From tameyer at ihug.co.nz Mon Dec 1 01:04:50 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 1 01:04:58 2003 Subject: [Spambayes] RE: Installing Spambayes In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304315B45@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1BD@its-xchg4.massey.ac.nz> [Please always direct spambayes questions to the spambayes mailing list, not individual developers.] > I wrote earlier about not being able to connect to localhost > using spambayes to connect to my pop3 server. Now I have another > problem. I uninstalled then attempted to reinstall Spambayes. > I cannot. The error message is: > running install_lib > creating /usr/lib/python2.3/site-packages/spambayes > error: could not create '/usr/lib/python2.3/site-packages/spambayes': > Permission denied This seems a fairly straightforward message. You don't have permission to create the directory named in the message. To install spambayes, you do need that permission; either install it somewhere else, or get permission. > I can install Spambayes and start the server as root but when > I use a browser > to go to http://localhost:8080/ I get the error message: > Cannot connect to localhost Are you deliberately using port 8080, rather than the default of 8880? If not, you may simply be using the wrong address. What does the console window where spambayes is running have in it? =Tony Meyer From gromit at ntlworld.com Mon Dec 1 03:12:16 2003 From: gromit at ntlworld.com (Gromit) Date: Mon Dec 1 03:12:18 2003 Subject: [Spambayes] Problem with spambayes Message-ID: I have Windows XP and Outlook 2000. Outlook loads and SpamBayes looks like it's loaded, but when I click on the SpamBayes button I just get a small grey square, and the Delete As Spam button doesn't work. I think the program has stopped responding/loading properly. Thank you.
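Tony's question about the port can be checked directly from the machine running the server. The following is a minimal sketch, not part of SpamBayes: it assumes Python 3 and simply attempts a TCP connection to each port, reporting whether anything accepted it. 8880 is the default web interface port mentioned above and 8080 is the port the poster tried; a successful connection only shows that some program is listening, not that it is SpamBayes.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or unreachable: nothing accepted it.
        return False


if __name__ == "__main__":
    # 8880: default web UI port per the reply above; 8080: the port the poster used.
    for port in (8880, 8080):
        state = "listening" if port_is_listening("localhost", port) else "not listening"
        print(f"localhost:{port} is {state}")
```

If only 8880 reports "listening", the browser is simply pointed at the wrong address.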
From Mark.Howells at softoption.com Mon Dec 1 05:29:06 2003 From: Mark.Howells at softoption.com (Mark Howells) Date: Mon Dec 1 05:30:09 2003 Subject: [Spambayes] Clasification difficulties using pop3proxy / sb_server Message-ID: <5846CF419D2EF5439036CC3126A3A995017B55@SOSERVER1.softoption.local> > -----Original Message----- > From: Tony Meyer [mailto:tameyer@ihug.co.nz] > > > I have been getting a lot of 'unsures' (30% or so). When I > > use the web interface to review the messages and look at the > > clues for each unsure, I see that the spam rating is often > > 0.5 or even higher - sometimes well within the Spam zone (I > > currently use 20% and 80% as my thresholds). So, why is the > > classifier rating them as unsure? > > To paraphrase Tim, because they don't resemble the messages > you have trained > as ham more strongly than the messages you have trained as > spam (or vice > versa). Ooops - I've just looked at my question again and realise that the question looks ridiculous ;) What I _meant_ to write was that I get some messages that are classified as 'unsure' when the declared spam probability is 0.8 or higher, i.e. the spam rating is higher than the spam threshold. > Clues! Clues! Give us the clues! I'm not at home just now but will grab one when I get a chance. > > Maybe related to the above, I have noticed that the 'unsure' > > subject line is marked as such in the cache. > Yes, that's correct. [Note that I incorrectly guessed otherwise when > someone else asked pretty much this exact question last week. And I apologise for asking a repeat question :( I've been following the list for a while but somehow didn't read that thread - sorry. I've lodged a bug report. Cheers Mark -- Outgoing mail is certified Virus Free. Checked by AVG Anti-Virus (http://www.grisoft.com).
Version: 7.0.203 / Virus Database: 261.3.2 - Release Date: 11/27/2003 From papaDoc at videotron.ca Mon Dec 1 08:54:47 2003 From: papaDoc at videotron.ca (papaDoc) Date: Mon Dec 1 08:54:55 2003 Subject: [Spambayes] Clasification difficulties using pop3proxy / sb_server In-Reply-To: <5846CF419D2EF5439036CC3126A3A995017B55@SOSERVER1.softoption.local> References: <5846CF419D2EF5439036CC3126A3A995017B55@SOSERVER1.softoption.local> Message-ID: <3FCB4827.80803@videotron.ca> Hi Mark, >Ooops - I've just looked at my question again and realise that the question looks ridiculous ;) What I _meant_ to write was that I get some messages that are classified as 'unsure' when the declared spam probability is 0.8 or higher, i.e. the spam rating is higher than the spam threshold. > This can happen if you trained on some messages in the Web UI. The remaining messages stay in the same section, but their scores can change. Remi From michel.dhollosy at siemens.com Mon Dec 1 07:59:02 2003 From: michel.dhollosy at siemens.com (d'Hollosy Michel) Date: Mon Dec 1 09:11:01 2003 Subject: [Spambayes] Is it a bug? Message-ID: Hello! Since our Exchange mail server does not filter out spam, I installed your SpamBayes about 2 weeks ago and I'm very happy with it. It helps me so efficiently in separating the useful from the garbage... However, I have now noticed that my AutoArchive doesn't work properly anymore. Today, I tried to force OL2000 to archive the inbox manually by selecting File->Archive...->... (see picture); it did run the process, but all the mails are still in the inbox, i.e. they were not moved as expected. I checked whether the "Do not archive" flag was set on those messages; it is not. The phenomenon does not appear in other parts of the mailbox, e.g. the sent items are handled OK. Is this an issue that relates to your plug-in?
Best regards Michel d'Hollosy Header of spambayes1.log: Loaded bayes database from 'C:\WINNT\Profiles\DhollosM\Application Data\SpamBayes\default_bayes_database.db' Loaded message database from 'C:\WINNT\Profiles\DhollosM\Application Data\SpamBayes\default_message_database.db' Bayes database initialized with 51 spam and 146 good messages SpamBayes Outlook Addin, Binary version 0.81 (September 9, 2003) starting (with engine SpamBayes Beta2, version 0.2 (July 2003)) on Windows 4.0.1381 (Service Pack 6) using Python 2.3+ (#46, Aug 6 2003, 16:39:24) [MSC v.1200 32 bit (Intel)] SpamBayes: Watching for new messages in folder Inbox SpamBayes: Watching for new messages in folder Spam -------------- next part -------------- A non-text attachment was scrubbed... Name: OL-SpamBayesBug.jpg Type: image/jpeg Size: 31851 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031201/92ab38d2/OL-SpamBayesBug-0001.jpg From Mark.Howells at softoption.com Mon Dec 1 10:45:53 2003 From: Mark.Howells at softoption.com (Mark Howells) Date: Mon Dec 1 10:47:26 2003 Subject: [Spambayes] Clasification difficulties using pop3proxy / sb_server Message-ID: <5846CF419D2EF5439036CC3126A3A995017B5D@SOSERVER1.softoption.local> > -----Original Message----- > From: Mark Howells > Sent: 01 December 2003 10:29 > To: spambayes@python.org > Subject: RE: [Spambayes] Clasification difficulties using pop3proxy / > sb_server > > Ooops - I've just looked at my question again and realise > that the question looks ridiculous ;) What I _meant_ to > write was that I get some messages that are classified as > 'unsure' when the declared spam probability is 0.8 or higher, > i.e. the spam rating is higher than the spam threshold. > > > Clues! Clues! Give us the clues! Here's one that (with no training since it was classified) has a spam probability of 0.84, and is classified as unsure.
The spam cutoff is 0.8. I'm guessing now, but it's possible that the 'unsure' token added to the subject (see earlier part of the thread) may have taken it over the brink of 0.8. E.g. it may have had a rating of 0.79, been classified (correctly) as unsure, had the (spammy) unsure token added, and is now showing as 0.84. In which case it's the clues production that's incorrect. Any ideas? Mark From tony-bayes at lownds.com Mon Dec 1 10:48:07 2003 From: tony-bayes at lownds.com (Tony Lownds) Date: Mon Dec 1 10:48:26 2003 Subject: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212B1AC@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130212B1AC@its-xchg4.massey.ac.nz> Message-ID: At 12:19 PM +1300 12/1/03, Tony Meyer wrote: > > When I classify through sb_imapfilter.py, I am getting an >> AssertionError. Any ideas? I am using spambayes from CVS; courier >> IMAP; python 2.2.2; and a fresh database. See below for commands. >[...] >> assert hamcount <= nham >> AssertionError > >How fresh? Very... I remove the database files right before training. >>[tony ~]$ rm hammie.db spambayes.messageinfo.db >>[tony ~]$ /usr/bin/sb_imapfilter.py -t > This error says that you have a token in your database that has >appeared in more ham than you have trained it on - which isn't possible. Ah... while training it said 14 ham trained, while classifying it only said 10 ham. >> Training ham folder INBOX.Ham >>************** 14 trained. >>... >>hammie.db is an existing database, with 44 spam and 10 ham I didn't notice that before. >If this happens regularly, it would be great to know the sequence of events >that can reproduce it (in a sf bug tracker >), as we still don't really know what >causes this error.
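Tony Meyer's reading of the assertion can be written as a consistency check over the token counts. The sketch below is hypothetical: it does not read SpamBayes' real database format, it models the database as a plain dictionary mapping each token to its (spam count, ham count), and it assumes, as the assertion implies, that a token is counted at most once per trained message. The token names used with it are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory view of a token database: token -> (spamcount, hamcount).
TokenCounts = Dict[str, Tuple[int, int]]


def find_inconsistent_tokens(nspam: int, nham: int, counts: TokenCounts) -> List[str]:
    """Return tokens whose per-class counts exceed the trained-message totals.

    A token can appear in at most as many ham messages as were trained as ham
    (and likewise for spam), so any hit means the stored totals and the token
    records were saved out of step with each other.
    """
    bad = []
    for token, (spamcount, hamcount) in counts.items():
        if hamcount > nham or spamcount > nspam:
            bad.append(token)
    return sorted(bad)
```

With the totals Tony's database reported (44 spam, 10 ham), a token recorded in 14 ham messages, matching the "14 trained" he saw, would be flagged: `find_inconsistent_tokens(44, 10, {"subject:meeting": (0, 14)})` returns `["subject:meeting"]`.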
Sure, bug #852137, although without access to my IMAP server I don't see how it will be reproducible. Has anyone used Courier IMAP? Maybe the way it returns message identifiers is causing problems. -Tony From tony-bayes at lownds.com Mon Dec 1 10:52:40 2003 From: tony-bayes at lownds.com (Tony Lownds) Date: Mon Dec 1 10:53:57 2003 Subject: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham In-Reply-To: <16330.32689.397613.63493@montanaro.dyndns.org> References: <16330.32689.397613.63493@montanaro.dyndns.org> Message-ID: At 5:39 PM -0600 11/30/03, Skip Montanaro wrote: > Tony> When I classify through sb_imapfilter.py, I am getting an > Tony> AssertionError. Any ideas? > >That indicates a corrupt database. Do you ever kill sb_imapfilter.py >without giving it a chance to clean up properly or run two SpamBayes >applications which might both want to write to the same database (say, >sb_server.py or sb_filter.py as well as sb_imapfilter.py)? Nope, I am running everything in sequence. The sequence of commands I showed zapped the database right before training. I'm guessing that sb_imapfilter.py and Courier IMAP don't work nicely together (yet). -Tony From papaDoc at videotron.ca Mon Dec 1 10:57:23 2003 From: papaDoc at videotron.ca (papaDoc) Date: Mon Dec 1 10:57:28 2003 Subject: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham In-Reply-To: References: <1ED4ECF91CDED24C8D012BCF2B034F130212B1AC@its-xchg4.massey.ac.nz> Message-ID: <3FCB64E3.9070504@videotron.ca> Hi Tony, >> This error says that you have a token in your database that has >> appeared in more ham than you have trained it on - which isn't possible. > > > Ah... while training it said 14 ham trained, while classifying it only > said 10 ham. Do you force the training (i.e. use the -f switch) ? sb_mboxtrain -f If you don't force the training, old mail will still be marked as trained, but it won't be included in the database (this is a feature, so that we don't train on the same email many times). If you force the training, even old mail marked as trained will be used for the training.
When you train from scratch you should always force the training. Remi From mossesq at earthlink.net Mon Dec 1 11:08:06 2003 From: mossesq at earthlink.net (Jeff) Date: Mon Dec 1 11:08:04 2003 Subject: [Spambayes] Outlook Problem Message-ID: <048f01c3b825$4e729a20$6401a8c0@jmosslpc> I do not know what happened, but Spambayes was working great, then stopped! I tried uninstalling, redownloading and reinstalling, but it made no difference. Even after uninstalling and before reinstalling, the "Delete as Spam" icon and the "SpamBayes" drop-down box were there, but clicking on "Delete as Spam" did nothing, and there was no drop-down on the SpamBayes button. I used to get a drop-down box for the Manager. What happened? Jeff Moss -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 145 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031201/82b52b62/attachment.gif From skip at pobox.com Mon Dec 1 11:30:18 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Dec 1 11:30:29 2003 Subject: [Spambayes] Clasification difficulties using pop3proxy / sb_server In-Reply-To: <5846CF419D2EF5439036CC3126A3A995017B5E@SOSERVER1.softoption.local> References: <5846CF419D2EF5439036CC3126A3A995017B5E@SOSERVER1.softoption.local> Message-ID: <16331.27802.872745.363846@montanaro.dyndns.org> >> > Clues! Clues! Give us the clues! Mark> Here's one that (with no training since it was classified) has a Mark> spam probability of 0.84, and is classified as unsure. Nothing was attached as far as I could tell. Note that the *S* prob of 0.84 isn't enough to drop it into the spam bucket. Was there an *H* prob which wasn't 0.0? Just because it has a lot of spammy clues doesn't mean it doesn't also have a lot of hammy clues. BTW, what were the clues?
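Skip's point about the *H* prob can be made concrete. As we understand SpamBayes' chi-squared combining, the final score is (S - H + 1) / 2, where S and H are the *S* and *H* probs shown among the clues, so a high S alone does not guarantee a spam verdict. The sketch below assumes that formula and the 0.20/0.80 cutoffs Mark mentioned; the function names are illustrative rather than SpamBayes' API, and treating a score exactly on a cutoff as unsure is an assumption.

```python
def combined_score(s_prob: float, h_prob: float) -> float:
    """Combine the *S* and *H* probabilities into one score in [0, 1]."""
    # Strong spam evidence and no ham evidence -> near 1; the reverse -> near 0;
    # strong evidence both ways (or none at all) -> near 0.5.
    return (s_prob - h_prob + 1.0) / 2.0


def classify(score: float, ham_cutoff: float = 0.20, spam_cutoff: float = 0.80) -> str:
    """Map a combined score onto ham / unsure / spam using two cutoffs."""
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


if __name__ == "__main__":
    # The same S prob of 0.84 with two different H probs.
    for h in (0.0, 0.5):
        score = combined_score(0.84, h)
        print(f"S=0.84 H={h:.2f} -> score {score:.2f} -> {classify(score)}")
```

With an *H* prob of 0.0 the message scores 0.92 and is spam; with an *H* prob of 0.5 the same *S* prob gives 0.67, which falls in the unsure band.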
;-) Skip From skip at pobox.com Mon Dec 1 11:32:01 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Dec 1 11:33:26 2003 Subject: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham In-Reply-To: References: <16330.32689.397613.63493@montanaro.dyndns.org> Message-ID: <16331.27905.11850.356984@montanaro.dyndns.org> Tony> I'm guessing that sb_imapfilter.py and Courier IMAP don't work Tony> nicely together (yet). I'm guessing that the problem Remi pointed out (training without forcing it with -f) is the cause. Skip From Tblatny at gasketeng.com Mon Dec 1 12:04:10 2003 From: Tblatny at gasketeng.com (Tamara Blatny) Date: Mon Dec 1 11:53:48 2003 Subject: [Spambayes] Outlook 97 and SpamBayes Message-ID: <51BD4A983557D411A5EA0050DA7B9C9D462114@EXCHANGE> We are using WinNT 4.0, Office 97, and I just installed the latest version of SpamBayes. I don't have any log files. My problem is that I'm not seeing SpamBayes in my Outlook. Thank you for any assistance. Thank you, Tamara From TiagoTiago at Globo.com Mon Dec 1 12:50:27 2003 From: TiagoTiago at Globo.com (Tiago Estill de Noronha) Date: Mon Dec 1 12:48:48 2003 Subject: RES: [Spambayes] n-way in outlook? In-Reply-To: <16330.31241.322910.881219@montanaro.dyndns.org> Message-ID: <001c01c3b833$9d95f800$0860b7c8@virtua.com.br> Thanx for the answer ********************* Tiago Estill de Noronha TiagoTiago@Globo.com -=> -----Original Message----- -=> From: spambayes-bounces@python.org -=> [mailto:spambayes-bounces@python.org] On behalf of Skip Montanaro -=> Sent: Sunday, November 30, 2003 20:15 -=> To: Tiago Estill de Noronha -=> Cc: spambayes@python.org -=> Subject: Re: [Spambayes] n-way in outlook? -=> -=> -=> -=> Tiago> has anyone made n-way classification code for -=> use with the -=> Tiago> outlook plugin? -=> -=> Not that I'm aware of. -=> -=> Tiago> or does the nway code in the contrib folder -=> work with outlook? -=> -=> Nope. nway.py is just a simple demo I wrote.
Since I'm -=> not an Outlook user, I made no attempt to make it work with -=> Outlook. After playing with it a bit, I'm not convinced -=> it's good enough for anything but experimental use. -=> -=> Skip -=> -=> _______________________________________________ -=> Spambayes@python.org -=> -=> http://mail.python.org/mailman/listinfo/spambayes -=> Check the -=> -=> FAQ before asking: -=> http://spambayes.sf.net/faq.html -=> -=> --- -=> Incoming mail is certified Virus Free. -=> Checked by AVG anti-virus system (http://www.grisoft.com). -=> Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 -=> -=> --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 From tony-bayes at lownds.com Mon Dec 1 13:35:28 2003 From: tony-bayes at lownds.com (Tony Lownds) Date: Mon Dec 1 13:35:29 2003 Subject: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham In-Reply-To: <3FCB64E3.9070504@videotron.ca> References: <1ED4ECF91CDED24C8D012BCF2B034F130212B1AC@its-xchg4.massey.ac.nz> <3FCB64E3.9070504@videotron.ca> Message-ID: At 10:57 AM -0500 12/1/03, papaDoc wrote: >Hi Tony, > >>> This error says that you have a token in your database that has >>>appeared in more ham than you have trained it on - which isn't possible. >> >> >>Ah... while training it said 14 ham trained, while classifying it >>only said 10 ham. > >Do you force the training (i.e. use the -f switch) ? >sb_mboxtrain -f I'm using sb_imapfilter.py, not sb_mboxtrain. sb_imapfilter.py does not have an -f switch. I removed spambayes.messageinfo.db before running the training database. That should do the same thing as -f, right? Here are the commands I use to reproduce this: [tony ~]$ rm hammie.db spambayes.messageinfo.db [tony ~]$ /usr/bin/sb_imapfilter.py -t [tony ~]$ /usr/bin/sb_imapfilter.py -c I found something else that is interesting.
When I train again, it trains 4 more messages, which should have been trained already. Here I train twice in a row and it trains 8 more messages each time: [tony ~]$ /usr/bin/sb_imapfilter.py -t SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003). Loading state from hammie.db database hammie.db is an existing database, with 45 spam and 12 ham Loading database hammie.db... Done. Training Training ham folder INBOX.Ham .......***...*.. 4 trained. Training spam folder INBOX.Spam ..............***...................*........ 4 trained. Persisting hammie.db state in database Training took 0.65074801445 seconds, 8 messages were trained [tony ~]$ /usr/bin/sb_imapfilter.py -t SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003). Loading state from hammie.db database hammie.db is an existing database, with 45 spam and 12 ham Loading database hammie.db... Done. Training Training ham folder INBOX.Ham .......***...*.. 4 trained. Training spam folder INBOX.Spam ..............***...................*........ 4 trained. Persisting hammie.db state in database Training took 0.65074801445 seconds, 8 messages were trained I'll try to figure out why sb_imapfilter.py is retraining those messages.
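The retraining Tony describes, and the thread's discussion of removing the message-info database, both come down to how a trainer remembers which messages it has already seen. The sketch below is a toy model, not sb_imapfilter.py's implementation: the `ToyTrainer` class is hypothetical and assumes training is skipped for any message ID already recorded. It shows three cases: a repeated ID is ignored, a message presented under a different ID (one possible reading of Tony's unconfirmed Courier IMAP theory) is trained again, and clearing the record while keeping the counts trains the same message a second time.

```python
from typing import Dict


class ToyTrainer:
    """Hypothetical trainer that records which message IDs it has trained on."""

    def __init__(self) -> None:
        self.nham = 0
        self.nspam = 0
        # message ID -> True if it was trained as spam, False if as ham
        self.trained: Dict[str, bool] = {}

    def train(self, msg_id: str, is_spam: bool) -> bool:
        """Train at most once per message ID; return True if counts changed."""
        if msg_id in self.trained:
            return False  # already trained under this ID, so skip it
        self.trained[msg_id] = is_spam
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        return True


if __name__ == "__main__":
    t = ToyTrainer()
    t.train("<a@example.com>", is_spam=False)
    t.train("<a@example.com>", is_spam=False)     # same ID: ignored
    t.train("<a@example.com>-v2", is_spam=False)  # same message, new ID: trained again
    t.trained.clear()                             # like discarding only the message-info record
    t.train("<a@example.com>", is_spam=False)     # forgotten, so trained once more
    print("nham =", t.nham)
```

Only the first call of the repeated pair changes the counts, but each of the last two scenarios adds another ham, so the toy ends with `nham` of 3 for what is really a single message.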
-Tony From tim at fourstonesExpressions.com Mon Dec 1 13:56:55 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Mon Dec 1 13:57:04 2003 Subject: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham In-Reply-To: References: <1ED4ECF91CDED24C8D012BCF2B034F130212B1AC@its-xchg4.massey.ac.nz> <3FCB64E3.9070504@videotron.ca> Message-ID: On Mon, 1 Dec 2003 10:35:28 -0800, Tony Lownds wrote: > > > At 10:57 AM -0500 12/1/03, papaDoc wrote: >> Hi Tony, >> >>>> This error says that you have a token in your database that has >>>> appeared in more ham than you have trained it on - which isn't >>>> possible. >>> >>> >>> Ah... while training it said 14 ham trained, while classifying it only >>> said 10 ham. >> >> Do you force the training (i.e. use the -f switch) ? >> sb_mboxtrain -f > > I'm using sb_imapfilter.py, not sb_mboxtrain. sb_imapfilter.py does not > have an -f switch. I removed spambayes.messageinfo.db before running the > training database. That should do the same thing as -f, right? Not necessarily. If you remove the messageinfo db, then sb has forgotten what messages it has already trained on, and will quite possibly train a message that should otherwise be ignored... If you do this, then you must start with a completely new training database as well... > -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From track at neomatrix.com.au Mon Dec 1 16:50:38 2003 From: track at neomatrix.com.au (Neomatrix) Date: Mon Dec 1 16:50:49 2003 Subject: [Spambayes] Feature request (for payment) Message-ID: <001401c3b855$2876a160$0200a8c0@Inspiron5150> Hello. Great product. Did I mention you have a great product? One itsy bitsy feature request, for which I will be happy to make a donation or payment of sorts, and it is perhaps one that others may appreciate. When mail is directed to the Junk folder, it is still marked UNREAD.
This means it catches the eye, and must be 'cleared', which removes some of the effectiveness of the product. Can I ask that you (or smarter people than I who can actually code) add a check box option or similar, which allows for AUTOMATED clearing of junk mail, setting the read flag to READ. Even if you direct it straight to the Deleted folder, it still has a bold UNREAD indicator, so I keep getting excited thinking I have mail and somebody loves me, but instead it is just somebody suggesting I have a small penis and should consider their products (spam) etc. Thank you, Chris So give me a suggested donation and account details, or I will go to the PayPal link on your site. From kennypitt at hotmail.com Mon Dec 1 17:02:35 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Dec 1 17:03:25 2003 Subject: [Spambayes] Feature request (for payment) In-Reply-To: <001401c3b855$2876a160$0200a8c0@Inspiron5150> Message-ID: Neomatrix wrote: > Great product. > > Did I mention you have a great product? Glad you like it. > When mail is directed to the Junk folder, it is still marked UNREAD. > This means it catches the eye, and must be 'cleared', which removes > some of the effectiveness of the product. > > Can I ask that you (or smarter people than I who can actually code) > add a check box option or similar, which allows for AUTOMATED > clearing of junk mail, setting the read flag to READ. I assume that you are running the Outlook plugin, but you don't mention what version. The latest version (0.81) has a checkbox labeled "Mark spam as read" on the Filtering tab of SpamBayes Manager. This option should do what you are asking. Be warned, however, that Outlook's new mail envelope icon will still be shown even if all you have is spam. -- Kenny Pitt From dgivens at channellcorp.com Mon Dec 1 17:07:15 2003 From: dgivens at channellcorp.com (Givens, Dallas) Date: Mon Dec 1 17:07:05 2003 Subject: [Spambayes] Where do the rejects go?
Message-ID: <2C68E1F1B3E3D1118A1D0008C7244C92032635A1@CHANNELLCOMM> I am running a Windows 2000 system which has Office XP. I have installed the Outlook plugin and have forwarded junk email from our Exchange server into my email account. The emails went into the Junk Suspects folder and I deleted them as spam. I then deleted them from the junk folder and the Deleted Items folder. After that, I forwarded the exact same email from the Exchange server again, but this time it does not show up anywhere. So what I want to know is: where are they going now? Are they being permanently deleted? If so, is there a way to have them deleted only to the Deleted Items folder? Your help would be greatly appreciated. Thank you, Dallas Givens Desktop Administrator CHANNELL Commercial 26040 Ynez Road Temecula, CA. 92589 Office: (909) 719-2600 X2733 Cell: (760) 415-7510 From tameyer at ihug.co.nz Mon Dec 1 18:30:16 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 1 18:30:29 2003 Subject: [Spambayes] Outlook 97 and SpamBayes In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304477C58@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C1@its-xchg4.massey.ac.nz> > We are using WinNT 4.0, Office 97, and I just installed the > latest version of SpamBayes. I don't have any log files. > My problem is that I'm not seeing SpamBayes in my Outlook. > Thank you for any assistance. What mail program are you using? If it's Outlook 97, and you're trying to use the Outlook plug-in, you're out of luck (it works with Outlook 2000 and above). You can use the pop3 proxy or imap filter (assuming you get mail via pop3 or imap) instead, although you don't get the same integrated experience as with the plug-in.
=Tony Meyer From rking at preflightventures.com Mon Dec 1 19:15:48 2003 From: rking at preflightventures.com (Rick King) Date: Mon Dec 1 19:15:26 2003 Subject: [Spambayes] SpamBayes Load Problem - Pls Help Message-ID: It says it gets installed, and all the files show up on the hard drive, but then: a) there are no buttons or change in the menu (Windows 98) indicating Spam detection b) the box below in item #3 is not checked even though it shows up every time. BUT every time I check it and try to close or hit OK or reboot, it shows up again as that same box not being checked?! PROBLEM: Addin doesn't load When you start Outlook, there is no Anti-Spam item in the toolbar. To resolve this: If you are running a binary version, then perform the following steps: 1. Start Outlook, and select Tools->Options to display the main Options dialog. Select the tab labelled Other, then click on the Advanced button. 2. Click on the COM Add-Ins button. If the SpamBayes addin is not listed, then SpamBayes should be reinstalled (Note that running regsvr32.exe spambayes_addin.dll from the SpamBayes directory may also solve this problem) ** 3. If the SpamBayes addin is listed but not checked, then simply check it and close the dialog. 4. If the SpamBayes addin is listed and checked, but still not working and still not creating log files, then I am stumped! Please send any help! It seems like it is very near to being set up right. Thanks, - Rick __________________________________ Rick King PreFlight Ventures (919) 806-1166 rking@preflightventures.com www.preflightventures.com Based in Research Triangle Park, NC, PreFlight Ventures helps entrepreneurs and new ventures grow their business through corporate partnering, technology licensing, and acquisitions. In addition, we provide coaching tools for entrepreneurs such as audio PreFlight PowerTips (www.preflightpowertips.com) and lead national workshops on "The Art of Telling Your Story", commercialization, and "Doing Smart Deals."
<>< From tameyer at ihug.co.nz Mon Dec 1 19:25:47 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 1 19:25:54 2003 Subject: [Spambayes] SpamBayes Load Problem - Pls Help In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304477D42@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C3@its-xchg4.massey.ac.nz> [...] > PROBLEM: Addin doesn't load > When you start Outlook, there is no Anti-Spam item in the toolbar. To > resolve this: [...] What version are you trying to install? This sounds like an old one. Please ensure that you are using the latest version (0.81). Please also note that the Outlook plug-in requires Outlook 2000 or above. =Tony Meyer From Tom.Larkin at pacificedge.com Mon Dec 1 19:38:45 2003 From: Tom.Larkin at pacificedge.com (Tom Larkin) Date: Mon Dec 1 19:38:52 2003 Subject: [Spambayes] Broken link Message-ID: <4543928AC7BCE2498D2D2DBA72D89285015AD710@smith.pacificedge.com> Broken image link at: http://spambayes.sourceforge.net/windows.html -- T o m L a r k i n From tameyer at ihug.co.nz Mon Dec 1 19:43:34 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 1 19:43:43 2003 Subject: [Spambayes] Broken link In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304477D4E@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C4@its-xchg4.massey.ac.nz> > Broken image link at: > > http://spambayes.sourceforge.net/windows.html Everything looks fine here. I presume you mean on the sidebar on the left where the sourceforge logo should appear? (I can't see any other images on that page). It loads fine here - a temporary glitch, perhaps? =Tony Meyer From cindy at pizer.com Mon Dec 1 21:22:13 2003 From: cindy at pizer.com (Cindy Peyser) Date: Mon Dec 1 21:19:25 2003 Subject: [Spambayes] SpamBayes Q: Backup of PST file Message-ID: Dear SpamBayes, I really, really like SpamBayes; it is so far the best spam filter I have tried (out of 3).
One problem: I have noticed since the time I set it up, my backup system has not been able to backup my Outlook PST file (It gives a message that the file is in use even when Outlook is closed). And I have been unable to export, copy, or move my PST file (I just got a new computer, so am trying to get set up again). Is there something about SpamBayes that keeps using the PST file even after Outlook itself is closed? Also, how do I un-install? Regards, Cindy 206-634-2808 x1 cindy@pizer.com From sysadmin at scr.siemens.com Mon Dec 1 21:19:53 2003 From: sysadmin at scr.siemens.com (sysadmin@scr.siemens.com) Date: Mon Dec 1 21:20:15 2003 Subject: [Spambayes] Blocked Mail Notification Message-ID: <200312020220.hB22K4b11209@scr.siemens.com> ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 21:19:50 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB22Jnb11202 for ; Mon, 1 Dec 2003 21:19:49 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte5.siemens.com (lte5.siemens.com [217.194.35.73]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB22Jqvd011910 for ; Mon, 1 Dec 2003 21:19:53 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte5.siemens.com (8.11.6/8.11.2) with ESMTP id hB22Ja829465 for ; Tue, 2 Dec 2003 03:19:36 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR08L-00026S-Eg; Mon, 01 Dec 2003 21:19:25 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 3 To: spambayes@python.org 
Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 21:19:25 -0500 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by scr.siemens.com id hB22Jnb11202 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. RES: [Spambayes] n-way in outlook? (Tiago Estill de Noronha) 2. Re: sb_imapfilter.py AssertionError: hamcount <=3D nham (Tony Lownds) 3. Re: sb_imapfilter.py AssertionError: hamcount <=3D nham (Tim Stone) 4. Feature request (for payment) (Neomatrix) 5. RE: Feature request (for payment) (Kenny Pitt) 6. Where do the rejects go? (Givens, Dallas) 7. RE: Outlook 97 and SpamBayes (Tony Meyer) 8. SpamBayes Load Problem - Pls Help (Rick King) 9. RE: SpamBayes Load Problem - Pls Help (Tony Meyer) 10. Broken link (Tom Larkin) 11. RE: Broken link (Tony Meyer) 12. SpamBayes Q: Backup of PST file (Cindy Peyser) ---------------------------------------------------------------------- Message: 1 Date: Mon, 1 Dec 2003 15:50:27 -0200 From: "Tiago Estill de Noronha" Subject: RES: [Spambayes] n-way in outlook? 
Message-ID: <001c01c3b833$9d95f800$0860b7c8@virtua.com.br> Thanx for the answer ********************* Tiago Estill de Noronha TiagoTiago@Globo.com -=> -----Original Message----- -=> From: spambayes-bounces@python.org -=> [mailto:spambayes-bounces@python.org] On behalf of Skip Montanaro -=> Sent: Sunday, 30 November 2003 20:15 -=> To: Tiago Estill de Noronha -=> Cc: spambayes@python.org -=> Subject: Re: [Spambayes] n-way in outlook? -=> -=> Tiago> have some1 made a n-way classification code for using with the -=> Tiago> outlook plugin? -=> -=> Not that I'm aware of. -=> -=> Tiago> or does the nway code on the contrib folder works on outlook? -=> -=> Nope. nway.py is just a simple demo I wrote. Since I'm not an Outlook user, I made no attempt to make it work with Outlook. After playing with it a bit, I'm not convinced it's good enough for anything but experimental use. -=> -=> Skip -=> -=> _______________________________________________ -=> Spambayes@python.org -=> http://mail.python.org/mailman/listinfo/spambayes -=> Check the FAQ before asking: -=> http://spambayes.sf.net/faq.html -=> -=> --- -=> Incoming mail is certified Virus Free. -=> Checked by AVG anti-virus system (http://www.grisoft.com). -=> Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 -=> --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 ------------------------------ Message: 2 Date: Mon, 1 Dec 2003 10:35:28 -0800 From: Tony Lownds Subject: Re: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham To: papaDoc , spambayes@python.org Message-ID: At 10:57 AM -0500 12/1/03, papaDoc wrote: >Hi Tony, > >>> This error says that you have a token in your database that has >>>appeared in more ham than you have trained it on - which isn't possible. >> >> >>Ah... while training it said 14 ham trained, while classifying it >>only said 10 ham. > >Do you force the training (i.e. use the -f switch) ? >sb_mboxtrain -f I'm using sb_imapfilter.py, no sb_mboxtrain. sb_imapfilter.py does not have an -f switch. I removed spambayes.messageinfo.db before running the training database. That should do the same thing as -f, right? Here are the commands I use to reproduce this: [tony ~]$ rm hammie.db spambayes.messageinfo.db [tony ~]$ /usr/bin/sb_imapfilter.py -t [tony ~]$ /usr/bin/sb_imapfilter.py -c I found something else that is interesting. When I train again, it trains 4 more messages, which should have been trained already. Here I train twice in a row and it trains 8 more messages each time: [tony ~]$ /usr/bin/sb_imapfilter.py -t SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003). Loading state from hammie.db database hammie.db is an existing database, with 45 spam and 12 ham Loading database hammie.db... Done. Training Training ham folder INBOX.Ham .......***...*.. 4 trained. Training spam folder INBOX.Spam ..............***...................*........ 4 trained.
Persisting hammie.db state in database Training took 0.65074801445 seconds, 8 messages were trained [tony ~]$ /usr/bin/sb_imapfilter.py -t SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003). Loading state from hammie.db database hammie.db is an existing database, with 45 spam and 12 ham Loading database hammie.db... Done. Training Training ham folder INBOX.Ham .......***...*.. 4 trained. Training spam folder INBOX.Spam ..............***...................*........ 4 trained. Persisting hammie.db state in database Training took 0.65074801445 seconds, 8 messages were trained I'll try to figure out why sb_imapfilter.py is retraining those messages. -Tony ------------------------------ Message: 3 Date: Mon, 01 Dec 2003 12:56:55 -0600 From: Tim Stone Subject: Re: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham To: Tony Lownds , papaDoc , spambayes@python.org Message-ID: On Mon, 1 Dec 2003 10:35:28 -0800, Tony Lownds wrote: > > > At 10:57 AM -0500 12/1/03, papaDoc wrote: >> Hi Tony, >> >>>> This error says that you have a token in your database that has >>>> appeared in more ham than you have trained it on - which isn't >>>> possible. >>> >>> >>> Ah... while training it said 14 ham trained, while classifying it only >>> said 10 ham. >> >> Do you force the training (i.e. use the -f switch) ? >> sb_mboxtrain -f > > I'm using sb_imapfilter.py, no sb_mboxtrain. sb_imapfilter.py does not > have an -f switch. I removed spambayes.messageinfo.db before running the > training database. That should do the same thing as -f, right? Not necessarily. If you remove the messageinfo db, then sb has forgotten what messages it has already trained on, and will quite possibly train a message that should otherwise be ignored...
If you do this, then you must start with a completely new training database as well... > -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun ------------------------------ Message: 4 Date: Tue, 2 Dec 2003 08:50:38 +1100 From: "Neomatrix" Subject: [Spambayes] Feature request (for payment) Message-ID: <001401c3b855$2876a160$0200a8c0@Inspiron5150> Hello. Great product. Did I mention you have a great product? One itsy bitsy feature request, for which I will be happy to make a donation or payment of sorts, and it is perhaps one that others may appreciate. When mail is directed to the Junk folder, it is still marked UNREAD. This means it catches the eye, and must be 'cleared', which removes some of the effectiveness in the product. Can I ask that you (or smarter people than I who can actually code) add a check box option or similar, which allows for AUTOMATED clearing of junk mail, setting the read flag to READ. Even if you direct it straight to deleted folder, it still has a bold UNREAD indicator, so I keep getting excited thinking I have mail and somebody loves me, but instead it is just somebody suggesting I have a small penis and should consider their products (spam) etc. Thank you, Chris So give me a suggested donation and account details or I will go to paypal link on your site ------------------------------ Message: 5 Date: Mon, 1 Dec 2003 17:02:35 -0500 From: "Kenny Pitt" Subject: RE: [Spambayes] Feature request (for payment) To: "'Neomatrix'" , Message-ID: Neomatrix wrote: > Great product. > > Did I mention you have a great product? Glad you like it. > When mail is directed to the Junk folder, it is still marked UNREAD.
> This means it catches the eye, and must be 'cleared', which removes > some of the effectiveness in the product. > > Can I ask that you (or smarter people than I who can actually code) > add a check box option or similar, which allows for AUTOMATED > clearing of junk mail, setting the read flag to READ. I assume that you are running the Outlook plugin, but you don't mention what version. The latest version (0.81) has a checkbox labeled "Mark spam as read" on the Filtering tab of SpamBayes Manager. This option should do what you are asking. Be warned, however, that Outlook's new mail envelope icon will still be shown even if all you have is spam. -- Kenny Pitt ------------------------------ Message: 6 Date: Mon, 1 Dec 2003 14:07:15 -0800 From: "Givens, Dallas" Subject: [Spambayes] Where do the rejects go? To: "'spambayes@python.org'" Message-ID: <2C68E1F1B3E3D1118A1D0008C7244C92032635A1@CHANNELLCOMM> I am running a Windows 2000 system which has Office XP. I have installed the outlook plugin and have forwarded junk email from our exchange server into my email account. The email went into the Junk Suspects folder and I deleted them as spam. I then deleted them from the junk folder and the deleted items folder. After that, I forwarded the exact same email from the exchange server again but this time it does not show up anywhere. Therefore, what I want to know is where are they going now? Are they being permanently deleted? If so, is there a way to have them deleted only to the deleted items folder? Your help would be greatly appreciated. Thank you, Dallas Givens Desktop Administrator CHANNELL Commercial 26040 Ynez Road Temecula, CA.
92589 Office: (909) 719-2600 X2733 Cell: (760) 415-7510 ------------------------------ Message: 7 Date: Tue, 2 Dec 2003 12:30:16 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Outlook 97 and SpamBayes To: "'Tamara Blatny'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C1@its-xchg4.massey.ac.nz> > We are using WinNT 4.0, Office 97, and I just installed the > latest version of Spam Bayes. I don't have any log files. > My problem is that I'm not seeing SpamBayes in my Outlook. > Thank you for any assistance. What mail program are you using? If it's Outlook 97, and you're trying to use the Outlook plug-in, you're out of luck (it works with Outlook 2000 and above). You can use the pop3 proxy or imap filter (assuming you get mail via pop3 or imap) instead, although you don't get the same integrated experience as with the plug-in. =Tony Meyer ------------------------------ Message: 8 Date: Mon, 1 Dec 2003 19:15:48 -0500 From: "Rick King" Subject: [Spambayes] SpamBayes Load Problem - Pls Help Message-ID: it says it gets installed, all the files show up on the hard drive, but then: a) there are no buttons or change in the menu (Windows 98) indicating Spam detection b) the box below in item #3 is not checked even though it shows up every time. BUT every time I check it and try to close or hit OK or reboot, it shows up again as that same box not being checked?! PROBLEM: Addin doesn't load When you start Outlook, there is no Anti-Spam item in the toolbar. To resolve this: If you are running a binary version, then perform the following steps: 1. Start Outlook, and select Tools->Options to display the main Options dialog. Select the tab labelled Other, then click on the Advanced button. 2. Click on the COM Add-Ins button.
If the SpamBayes addin is not listed, then SpamBayes should be reinstalled (Note that running regsvr32.exe spambayes_addin.dll from the SpamBayes directory may also solve this problem) ** 3. If the SpamBayes addin is listed but not checked, then simply check it and close the dialog. 4. If the SpamBayes addin is listed and checked, but still not working and still not creating log files, then I am stumped! Please send any help! It seems like it is very near to being set up right. Thanks, - Rick __________________________________ Rick King PreFlight Ventures (919) 806-1166 rking@preflightventures.com www.preflightventures.com Based in Research Triangle Park, NC, PreFlight Ventures helps entrepreneurs and new ventures grow their business through corporate partnering, technology licensing, and acquisitions. In addition, we provide coaching tools for entrepreneurs such as audio PreFlight PowerTips (www.preflightpowertips.com) and lead national workshops on "The Art of Telling Your Story", commercialization, and "Doing Smart Deals." <>< ------------------------------ Message: 9 Date: Tue, 2 Dec 2003 13:25:47 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] SpamBayes Load Problem - Pls Help To: "'Rick King'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C3@its-xchg4.massey.ac.nz> [...] > PROBLEM: Addin doesn't load > When you start Outlook, there is no Anti-Spam item in the toolbar. To > resolve this: [...] What version are you trying to install? This sounds like an old one. Please ensure that you are using the latest version (008.1). Please also note that the Outlook plug-in requires Outlook 2000 or above.
=Tony Meyer ------------------------------ Message: 10 Date: Mon, 1 Dec 2003 16:38:45 -0800 From: "Tom Larkin" Subject: [Spambayes] Broken link Message-ID: <4543928AC7BCE2498D2D2DBA72D89285015AD710@smith.pacificedge.com> Broken image link at: http://spambayes.sourceforge.net/windows.html -- T o m L a r k i n ------------------------------ Message: 11 Date: Tue, 2 Dec 2003 13:43:34 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Broken link To: "'Tom Larkin'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C4@its-xchg4.massey.ac.nz> > Broken image link at: > > http://spambayes.sourceforge.net/windows.html Everything looks fine here. I presume you mean on the sidebar on the left where the sourceforge logo should appear? (I can't see any other images on that page). It loads fine here - a temporary glitch, perhaps? =Tony Meyer ------------------------------ Message: 12 Date: Mon, 1 Dec 2003 18:22:13 -0800 From: "Cindy Peyser" Subject: [Spambayes] SpamBayes Q: Backup of PST file Message-ID: Dear SpamBayes, I really really like SpamBayes, it is so far the best spam filter I have tried (out of 3). One problem: I have noticed since the time I set it up, my backup system has not been able to backup my Outlook PST file (It gives a message that the file is in use even when Outlook is closed). And I have been unable to export, copy, or move my PST file (I just got a new computer, so am trying to get set up again). Is there something about SpamBayes that keeps using the PST file even after Outlook itself is closed? Also, how do I un-install?
Regards, Cindy 206-634-2808 x1 cindy@pizer.com ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 3 **************************************** From atom at suspicious.org Mon Dec 1 21:27:50 2003 From: atom at suspicious.org (Atom 'Smasher') Date: Mon Dec 1 21:28:38 2003 Subject: [Spambayes] spam tokens IibKrw0yteNAtHyZDDw (fwd) Message-ID: two things i've noticed about spam, i'm not sure if either of them are taken into account with SB, but maybe someone can look into this further... or maybe someone already has and they can tell me why these don't work... 1) so many spams have a *lot* of spaces (and tabs?) in the subject line. (like above {taken from real spam}). i know... multiple spaces aren't tokens, they *separate* tokens... but when there are 20+ in a row, in the subject line, that usually means spam. 2) so many spams are filled with nonsense and random strings rldvlzgj coldokiue i q wfup cadrhs r cqufqc e p fnlcgv fipv which probably don't appear in legit email. can these be used to detect spam? are they used? my understanding of bayesian filtering, is that if it never before encountered the word "rldvlzgj", then it scores 0.5 (or something fairly neutral). well, after i've trained it on a few hundred or a few thousand emails, i think it should have a good handle on my vocabulary and maybe be less forgiving with words i haven't seen before. i fully understand that the nature of bayesian filtering is often counter-intuitive when it comes to what to look at and what to ignore, so i'm fully prepared for someone to tell me exactly why these things don't work the way my brain thinks they should. 
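Atom's two questions (what a never-seen word like "rldvlzgj" scores, and whether long blank runs in a Subject: could count as evidence) can be made concrete with a small sketch. The Python 3 code below is a minimal illustration assuming Gary Robinson-style smoothing, f(w) = (s*x + n*p(w)) / (s + n), where x is the score assumed for a token with no evidence, s is how strongly that assumption is weighted, and n is how many trained messages contained the token. The constants 0.5 and 0.45, the function names, and the "subject:long-blank-run" token are all illustrative assumptions, not taken from SpamBayes' code or configuration.

```python
import re

# Assumed constants for illustration only (not read from SpamBayes' options).
UNKNOWN_WORD_PROB = 0.5       # x: score assumed for a token with no evidence
UNKNOWN_WORD_STRENGTH = 0.45  # s: weight given to that assumption


def token_spamprob(spamcount, hamcount, nspam, nham,
                   x=UNKNOWN_WORD_PROB, s=UNKNOWN_WORD_STRENGTH):
    """Smoothed spam probability of one token (Robinson-style sketch).

    spamcount/hamcount: trained spam/ham messages containing the token.
    nspam/nham: total trained spam/ham messages.
    """
    # Use ratios so unequal amounts of ham and spam training don't skew p.
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    if spamratio + hamratio == 0.0:
        # Never seen in training: no evidence either way, return the prior x.
        return x
    p = spamratio / (spamratio + hamratio)
    n = spamcount + hamcount
    # f(w) = (s*x + n*p) / (s + n): rarely seen tokens are pulled toward x.
    return (s * x + n * p) / (s + n)


def subject_whitespace_token(subject, min_run=20):
    """Hypothetical extra token for a long run of spaces/tabs in a Subject."""
    runs = re.findall(r"[ \t]+", subject)
    longest = max((len(run) for run in runs), default=0)
    return ["subject:long-blank-run"] if longest >= min_run else []


if __name__ == "__main__":
    # Assume 100 spam and 100 ham have been trained.
    print(token_spamprob(0, 0, 100, 100))            # never seen: 0.5
    print(round(token_spamprob(1, 0, 100, 100), 3))  # seen once, in spam: 0.845
    print(subject_whitespace_token("spam tokens" + " " * 25 + "IibKrw0yteNAtHyZDDw"))
```

Under these assumptions a token never seen in training scores exactly 0.5 and so pulls a message neither way, however large the training set, while a token seen in a single spam scores only about 0.845 because the smoothing stops one sighting from counting as near-certain evidence. A blank-run token like the hypothetical one above would only help once it had been trained on like any other token.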
...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "IDEA's key length is 128 bits - over twice as long as DES. Assuming that a brute force attack is the most efficient, it would require 2^128 (10^38) encryptions to recover the key. Design a chip that can test a billion keys per second and throw a billion of them at the problem, and it will still take 10^13 years - that's longer than the age of the universe. An array of 10^24 such chips can find the key in a day, but there aren't enough silicon atoms in the universe to build such a machine. Now we're getting somewhere - although I'd keep my eye on the dark matter debate." -- Bruce Schneier, Applied Cryptography From tim.one at comcast.net Mon Dec 1 22:36:37 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 1 22:36:42 2003 Subject: [Spambayes] SpamBayes Q: Backup of PST file In-Reply-To: Message-ID: [Cindy Peyser] > Dear SpamBayes, > > I really really like SpamBayes, it is so far the best spam filter I > have tried (out of 3). Glad you like it! > One problem: I have noticed since the time I set it up, my backup > system has not been able to backup my Outlook PST file (It gives a > message that the file is in use even when Outlook is closed). And I > have been unable to export, copy, or move my PST file (I just got a > new computer, so am trying to get set up again). Is there something > about SpamBayes that keeps using the PST file even after Outlook > itself is closed? No, nothing that we know of, but closing Outlook doesn't always stop Outlook from running, with or without SpamBayes installed. I've always had this problem (with three different Outlook 2000 installations), and don't think it's any more-- or less --common since installing Outlook. You didn't say which version of Windows or Outlook you're using.
The best thing to do before running backups is to use the Windows Task Manager to kill off all non-essential programs first. Exactly how you do that depends on which flavor of Windows you're using. You'll sometimes find that an Outlook process is still running despite that you closed Outlook. That's life. The same procedure applies if you get any sort of "permission denied -- file in use" error when trying to copy, move, or rename a file. > Also, how do I un-install? Anyone? I run from CVS, so have no idea what the binary installer sets up here (assuming the poster used a binary installer, which seems too likely to question ). This one is nearly a FAQ lately! From delete at GoodmanAssociates.com Mon Dec 1 23:26:31 2003 From: delete at GoodmanAssociates.com (Seth Goodman) Date: Mon Dec 1 23:26:34 2003 Subject: [Spambayes] SpamBayes Q: Backup of PST file In-Reply-To: Message-ID: > No, nothing that we know of, but closing Outlook doesn't always > stop Outlook from running, with or without SpamBayes installed. I've always had this > problem (with three different Outlook 2000 installations), and don't think > it's any more-- or less --common since installing Outlook. Just as you say, I noticed the same thing with Outlook 2000 long before I tried SpamBayes. You close Outlook and an Outlook.exe process is still running. It's neither better nor worse with the SpamBayes Add-In. Oddly, I have even noticed that after Outlook is running for a long time, a second instance of Outlook.exe appears, according to the task manager. The symptom that prompts me to notice this is that a message does not become "read" after I display it for the appropriate number of seconds in the preview pane. Then I look in the task manager and see two Outlook.exe processes. What a bizarre piece of code this Outlook is, but also very useful.
-- Seth Goodman Humans: change "delete" to "sethg" to email me Spambots: disregard the above From sysadmin at scr.siemens.com Mon Dec 1 23:27:09 2003 From: sysadmin at scr.siemens.com (sysadmin@scr.siemens.com) Date: Mon Dec 1 23:27:19 2003 Subject: [Spambayes] Blocked Mail Notification Message-ID: <200312020427.hB24RCb12625@scr.siemens.com> ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 4 To: spambayes@python.org
Message-Id: Date: Mon, 01 Dec 2003 23:26:39 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) 2. spam tokens IibKrw0yteNAtHyZDDw (fwd) (Atom 'Smasher') 3. RE: SpamBayes Q: Backup of PST file (Tim Peters) 4. RE: SpamBayes Q: Backup of PST file (Seth Goodman) ---------------------------------------------------------------------- Message: 1 Date: Mon, 01 Dec 2003 21:19:53 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312020220.hB22K4b11209@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. 
Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 21:19:50 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB22Jnb11202 for ; Mon, 1 Dec 2003 21:19:49 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte5.siemens.com (lte5.siemens.com [217.194.35.73]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB22Jqvd011910 for ; Mon, 1 Dec 2003 21:19:53 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte5.siemens.com (8.11.6/8.11.2) with ESMTP id hB22Ja829465 for ; Tue, 2 Dec 2003 03:19:36 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR08L-00026S-Eg; Mon, 01 Dec 2003 21:19:25 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 3 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 21:19:25 -0500 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by scr.siemens.com id hB22Jnb11202 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing 
the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. RES: [Spambayes] n-way in outlook? (Tiago Estill de Noronha) 2. Re: sb_imapfilter.py AssertionError: hamcount <=3D nham (Tony Lownds) 3. Re: sb_imapfilter.py AssertionError: hamcount <=3D nham (Tim Stone) 4. Feature request (for payment) (Neomatrix) 5. RE: Feature request (for payment) (Kenny Pitt) 6. Where do the rejects go? (Givens, Dallas) 7. RE: Outlook 97 and SpamBayes (Tony Meyer) 8. SpamBayes Load Problem - Pls Help (Rick King) 9. RE: SpamBayes Load Problem - Pls Help (Tony Meyer) 10. Broken link (Tom Larkin) 11. RE: Broken link (Tony Meyer) 12. SpamBayes Q: Backup of PST file (Cindy Peyser) ---------------------------------------------------------------------- Message: 1 Date: Mon, 1 Dec 2003 15:50:27 -0200 From: "Tiago Estill de Noronha" Subject: RES: [Spambayes] n-way in outlook? To: Message-ID: <001c01c3b833$9d95f800$0860b7c8@virtua.com.br> Content-Type: text/plain; charset=3D"Windows-1252" Thanx for the answer =20 =20 ********************* Tiago Estill de Noronha TiagoTiago@Globo.com -=3D> -----Mensagem original----- -=3D> De: spambayes-bounces@python.org=20 -=3D> [mailto:spambayes-bounces@python.org] Em nome de Skip Montanaro -=3D> Enviada em: domingo, 30 de novembro de 2003 20:15 -=3D> Para: Tiago Estill de Noronha -=3D> Cc: spambayes@python.org -=3D> Assunto: Re: [Spambayes] n-way in outlook? -=3D>=20 -=3D>=20 -=3D>=20 -=3D> Tiago> have some1 made a n-way classification code for=20 -=3D> using with the -=3D> Tiago> outlook plugin? -=3D>=20 -=3D> Not that I'm aware of. -=3D>=20 -=3D> Tiago> or does the nway code on the contrib folder=20 -=3D> works on outlook?=20 -=3D>=20 -=3D> Nope. nway.py is just a simple demo I wrote. Since I'm=20 -=3D> not an Outlook user, I made no attempt to make it work with=20 -=3D> Outlook. 
After playing with it a bit, I'm not convinced=20 -=3D> it's good enough for anything but experimental use. -=3D>=20 -=3D> Skip -=3D>=20 -=3D> _______________________________________________ -=3D> Spambayes@python.org=20 -=3D> -=3D> http://mail.python.org/mailman/listinfo/spambaye-=3D> s -=3D> Check the=20 -=3D>=20 -=3D> FAQ before asking:=20 -=3D> http://spambayes.sf.net/faq.html -=3D>=20 -=3D> --- -=3D> Incoming mail is certified Virus Free. -=3D> Checked by AVG anti-virus system (http://www.grisoft.com). -=3D> Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 -=3D> =20 -=3D>=20 --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 =20 ------------------------------ Message: 2 Date: Mon, 1 Dec 2003 10:35:28 -0800 From: Tony Lownds Subject: Re: [Spambayes] sb_imapfilter.py AssertionError: hamcount <=3D nham To: papaDoc , spambayes@python.org Message-ID: Content-Type: text/plain; charset=3D"us-ascii" ; format=3D"flowed" At 10:57 AM -0500 12/1/03, papaDoc wrote: >Hi Tony, > >>> This error says that you have a token in your database that has >>>appeared in more ham than you have trained it on - which isn't possibl= e. >> >> >>Ah... while training it said 14 ham trained, while classifying it=20 >>only said 10 ham. > >Do you force the training (i.e. use the -f switch) ? >sb_mboxtrain -f I'm using sb_imapfilter.py, no sb_mboxtrain. sb_imapfilter.py does=20 not have an -f switch. I removed spambayes.messageinfo.db before=20 running the training database. That should do the same thing as -f,=20 right? Here are the commands I use to reproduce this: [tony ~]$ rm hammie.db spambayes.messageinfo.db [tony ~]$ /usr/bin/sb_imapfilter.py -t [tony ~]$ /usr/bin/sb_imapfilter.py -c I found something else that is interesting. When I train again, it=20 trains 4 more messages, which should have been trained already. 
Here=20 I train twice in a row and it trains 8 more messages each time: [tony ~]$ /usr/bin/sb_imapfilter.py -t SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003). Loading state from hammie.db database hammie.db is an existing database, with 45 spam and 12 ham Loading database hammie.db... Done. Training Training ham folder INBOX.Ham .......***...*.. 4 trained. Training spam folder INBOX.Spam ..............***...................*........ 4 trained. Persisting hammie.db state in database Training took 0.65074801445 seconds, 8 messages were trained [tony ~]$ /usr/bin/sb_imapfilter.py -t SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003). Loading state from hammie.db database hammie.db is an existing database, with 45 spam and 12 ham Loading database hammie.db... Done. Training Training ham folder INBOX.Ham .......***...*.. 4 trained. Training spam folder INBOX.Spam ..............***...................*........ 4 trained. Persisting hammie.db state in database Training took 0.65074801445 seconds, 8 messages were trained I'll try to figure out why sp_imapfilter.py is retraining those messages. -Tony ------------------------------ Message: 3 Date: Mon, 01 Dec 2003 12:56:55 -0600 From: Tim Stone Subject: Re: [Spambayes] sb_imapfilter.py AssertionError: hamcount <=3D nham To: Tony Lownds , papaDoc , spambayes@python.org Message-ID: Content-Type: text/plain; format=3Dflowed; charset=3Diso-8859-15 On Mon, 1 Dec 2003 10:35:28 -0800, Tony Lownds =20 wrote: > > > At 10:57 AM -0500 12/1/03, papaDoc wrote: >> Hi Tony, >> >>>> This error says that you have a token in your database that has >>>> appeared in more ham than you have trained it on - which isn't=20 >>>> possible. >>> >>> >>> Ah... 
while training it said 14 ham trained, while classifying it onl= y=20 >>> said 10 ham. >> >> Do you force the training (i.e. use the -f switch) ? >> sb_mboxtrain -f > > I'm using sb_imapfilter.py, no sb_mboxtrain. sb_imapfilter.py does not=20 > have an -f switch. I removed spambayes.messageinfo.db before running th= e=20 > training database. That should do the same thing as -f, right? Not necessarily. If you remove the messageinfo db, then sb has forgotten= =20 what messages it has already trained on, and will quite possibly train a=20 message that should otherwise be ignored... If you do this, then you mus= t=20 start with a completely new training database as well... > -- Vous exprimer; Expr=E9sese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun ------------------------------ Message: 4 Date: Tue, 2 Dec 2003 08:50:38 +1100 From: "Neomatrix" Subject: [Spambayes] Feature request (for payment) To: Message-ID: <001401c3b855$2876a160$0200a8c0@Inspiron5150> Content-Type: text/plain; charset=3D"us-ascii" Hello. Great product. Did I mention you have a great product? =20 One itsy bitsy feature request, for which I will be happy to make a donat= ion or payment of sorts. and it is perhaps one that others may appreciate. =20 When mail is directed to the Junk folder, it is still marked UNREAD. This means it catches the eye, and must be 'cleared', which removes some of th= e effectiveness in the product. =20 Can I ask that you (or smarter people than I who can actually code) add a check box option or similar, which allows for AUTOMATED clearing of junk mail, setting the read flag to READ. Even if you direct it straight to deleted folder, it still has a bold UNREAD indicator, so I keep getting excited thinking I have mail and somebody loves me, but instead it is jus= t somebody suggesting I have a small penis and should consider their produc= ts (spam) etc. 
Thank you, Chris So give me a suggested donation and account details or I will go to paypal link on your site ------------------------------ Message: 5 Date: Mon, 1 Dec 2003 17:02:35 -0500 From: "Kenny Pitt" Subject: RE: [Spambayes] Feature request (for payment) To: "'Neomatrix'" , Message-ID: Content-Type: text/plain; charset="us-ascii" Neomatrix wrote: > Great product. > > Did I mention you have a great product? Glad you like it. > When mail is directed to the Junk folder, it is still marked UNREAD. > This means it catches the eye, and must be 'cleared', which removes > some of the effectiveness in the product. > > Can I ask that you (or smarter people than I who can actually code) > add a check box option or similar, which allows for AUTOMATED > clearing of junk mail, setting the read flag to READ. I assume that you are running the Outlook plugin, but you don't mention what version. The latest version (0.81) has a checkbox labeled "Mark spam as read" on the Filtering tab of SpamBayes Manager. This option should do what you are asking. Be warned, however, that Outlook's new mail envelope icon will still be shown even if all you have is spam. -- Kenny Pitt ------------------------------ Message: 6 Date: Mon, 1 Dec 2003 14:07:15 -0800 From: "Givens, Dallas" Subject: [Spambayes] Where do the rejects go? To: "'spambayes@python.org'" Message-ID: <2C68E1F1B3E3D1118A1D0008C7244C92032635A1@CHANNELLCOMM> Content-Type: text/plain I am running a Windows 2000 system which has Office XP. I have installed the outlook plugin and have forwarded junk email from our exchange server into my email account. The email went into the Junk Suspects folder and I deleted them as spam. I then deleted them from the junk folder and the deleted items folder. After that, I forwarded the exact same email from the exchange server again but this time it does not show up anywhere. Therefore, what I want to know is where are they going now?
Are they being permanently deleted? If so, is there a way to have them deleted only to the deleted items folder? Your help would be greatly appreciated. Thank you, Dallas Givens Desktop Administrator CHANNELL Commercial 26040 Ynez Road Temecula, CA. 92589 Office: (909) 719-2600 X2733 Cell: (760) 415-7510 ------------------------------ Message: 7 Date: Tue, 2 Dec 2003 12:30:16 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Outlook 97 and SpamBayes To: "'Tamara Blatny'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C1@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > We are using WinNT 4.0, Office 97, and I just installed the > latest version of Spam Bayes. I don't have any log files. > My problem is that I'm not seeing SpamBayes in my Outlook. > Thank you for any assistance. What mail program are you using? If it's Outlook 97, and you're trying to use the Outlook plug-in, you're out of luck (it works with Outlook 2000 and above). You can use the pop3 proxy or imap filter (assuming you get mail via pop3 or imap) instead, although you don't get the same integrated experience as with the plug-in. =Tony Meyer ------------------------------ Message: 8 Date: Mon, 1 Dec 2003 19:15:48 -0500 From: "Rick King" Subject: [Spambayes] SpamBayes Load Problem - Pls Help To: Message-ID: Content-Type: text/plain; charset="iso-8859-1" it says it gets installed, all the files show up on the hard drive, but then: a) there are no buttons or change in the menu (Windows 98) indicating Spam detection b) the box below in item #3 is not checked even though it shows up every time. BUT every time I check it and try to close or hit OK or reboot, it shows up again as that same box not being checked?! PROBLEM: Addin doesn't load When you start Outlook, there is no Anti-Spam item in the toolbar. To resolve this: If you are running a binary version, then perform the following steps: 1.
Start Outlook, and select Tools->Options to display the main Options dialog. Select the tab labelled Other, then click on the Advanced button. 2. Click on the COM Add-Ins button. If the SpamBayes addin is not listed, then SpamBayes should be reinstalled (Note that running regsvr32.exe spambayes_addin.dll from the SpamBayes directory may also solve this problem) ** 3. If the SpamBayes addin is listed but not checked, then simply check it and close the dialog. 4. If the SpamBayes addin is listed and checked, but still not working and still not creating log files, then I am stumped! Please send any help! It seems like it is very near to being set up right. Thanks, - Rick __________________________________ Rick King PreFlight Ventures (919) 806-1166 rking@preflightventures.com www.preflightventures.com Based in Research Triangle Park, NC, PreFlight Ventures helps entrepreneurs and new ventures grow their business through corporate partnering, technology licensing, and acquisitions. In addition, we provide coaching tools for entrepreneurs such as audio PreFlight PowerTips (www.preflightpowertips.com) and lead national workshops on "The Art of Telling Your Story", commercialization, and "Doing Smart Deals." <>< ------------------------------ Message: 9 Date: Tue, 2 Dec 2003 13:25:47 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] SpamBayes Load Problem - Pls Help To: "'Rick King'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C3@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" [...] > PROBLEM: Addin doesn't load > When you start Outlook, there is no Anti-Spam item in the toolbar. To > resolve this: [...] What version are you trying to install? This sounds like an old one. Please ensure that you are using the latest version (008.1). Please also note that the Outlook plug-in requires Outlook 2000 or above.
=Tony Meyer ------------------------------ Message: 10 Date: Mon, 1 Dec 2003 16:38:45 -0800 From: "Tom Larkin" Subject: [Spambayes] Broken link To: Message-ID: <4543928AC7BCE2498D2D2DBA72D89285015AD710@smith.pacificedge.com> Content-Type: text/plain; charset="us-ascii" Broken image link at: http://spambayes.sourceforge.net/windows.html -- T o m L a r k i n ------------------------------ Message: 11 Date: Tue, 2 Dec 2003 13:43:34 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Broken link To: "'Tom Larkin'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C4@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > Broken image link at: > > http://spambayes.sourceforge.net/windows.html Everything looks fine here. I presume you mean on the sidebar on the left where the sourceforge logo should appear? (I can't see any other images on that page). It loads fine here - a temporary glitch, perhaps? =Tony Meyer ------------------------------ Message: 12 Date: Mon, 1 Dec 2003 18:22:13 -0800 From: "Cindy Peyser" Subject: [Spambayes] SpamBayes Q: Backup of PST file To: Message-ID: Content-Type: text/plain; charset="iso-8859-1" Dear SpamBayes, I really really like SpamBayes, it is so far the best spam filter I have tried (out of 3). One problem: I have noticed since the time I set it up, my backup system has not been able to backup my Outlook PST file (It gives a message that the file is in use even when Outlook is closed). And I have been unable to export, copy, or move my PST file (I just got a new computer, so am trying to get set up again). Is there something about SpamBayes that keeps using the PST file even after Outlook itself is closed? Also, how do I un-install?
Regards, Cindy 206-634-2808 x1 cindy@pizer.com ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 3 **************************************** ------------------------------ Message: 2 Date: Mon, 1 Dec 2003 18:27:50 -0800 (PST) From: Atom 'Smasher' Subject: [Spambayes] spam tokens IibKrw0yteNAtHyZDDw (fwd) To: spambayes@python.org Message-ID: Content-Type: TEXT/PLAIN; charset=US-ASCII two things i've noticed about spam, i'm not sure if either of them are taken into account with SB, but maybe someone can look into this further... or maybe someone already has and they can tell me why these don't work... 1) so many spams have a *lot* of spaces (and tabs?) in the subject line. (like above {taken from real spam}). i know... multiple spaces aren't tokens, they *separate* tokens... but when there are 20+ in a row, in the subject line, that usually means spam. 2) so many spams are filled with nonsense and random strings rldvlzgj coldokiue i q wfup cadrhs r cqufqc e p fnlcgv fipv which probably don't appear in legit email. can these be used to detect spam? are they used? my understanding of bayesian filtering, is that if it never before encountered the word "rldvlzgj", then it scores 0.5 (or something fairly neutral). well, after i've trained it on a few hundred or a few thousand emails, i think it should have a good handle on my vocabulary and maybe be less forgiving with words i haven't seen before. i fully understand that the nature of bayesian filtering is often counter-intuitive when it comes to what to look at and what to ignore, so i'm fully prepared for someone to tell me exactly why these things don't work the way my brain thinks they should. 
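For reference, here is a minimal Python sketch of the smoothing step this question is about. It assumes a Robinson-style estimate f(w) = (s*x + n*p(w)) / (s + n), where n is the number of trained messages containing the token, p(w) is the raw spam probability, x is the prior used for unseen words and s is how strongly that prior is weighted. The defaults below (x = 0.5, s = 0.45) are assumptions, not values read from a SpamBayes install, and `token_spamprob` is a hypothetical name.

```python
def token_spamprob(spamcount, hamcount, nspam, nham,
                   unknown_word_prob=0.5, unknown_word_strength=0.45):
    """Smoothed spam probability for one token (Robinson-style sketch).

    spamcount/hamcount: trained spam/ham messages containing the token.
    nspam/nham: total trained spam/ham messages.
    The two keyword defaults are assumed values, not read from SpamBayes.
    """
    # Fraction of each class that contained the token (0 if a class is empty).
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    n = spamcount + hamcount
    if n == 0 or spamratio + hamratio == 0.0:
        # Never seen in training: all we have is the prior.
        return unknown_word_prob
    raw = spamratio / (spamratio + hamratio)
    s, x = unknown_word_strength, unknown_word_prob
    # Pull the raw estimate toward x; the pull weakens as n grows.
    return (s * x + n * raw) / (s + n)


if __name__ == "__main__":
    print(token_spamprob(0, 0, 1000, 1000))   # unseen token, e.g. "rldvlzgj"
    print(token_spamprob(1, 0, 1000, 1000))   # seen once, only in spam
    print(token_spamprob(10, 0, 1000, 1000))  # seen in 10 spams, no ham
```

With these assumed values a never-seen token scores exactly 0.5, while one seen in a single spam and no ham already scores about 0.84, and one seen in ten spams about 0.98. Tokens that stay near 0.5 contribute little evidence either way in Robinson-style combining. Raising x would make the filter less forgiving of unfamiliar words, at the cost of pushing rare but legitimate words toward spam.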
...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "IDEA's key length is 128 bits - over twice as long as DES. Assuming that a brute force attack is the most efficient, it would require 2^128 (10^38) encryptions to recover the key. Design a chip that can test a billion keys per second and throw a billion of them at the problem, and it will still take 10^13 years - that's longer than the age of the universe. An array of 10^24 such chips can find the key in a day, but there aren't enough silicon atoms in the universe to build such a machine. Now we're getting somewhere - although I'd keep my eye on the dark matter debate." -- Bruce Schneier, Applied Cryptography ------------------------------ Message: 3 Date: Mon, 1 Dec 2003 22:36:37 -0500 From: "Tim Peters" Subject: RE: [Spambayes] SpamBayes Q: Backup of PST file To: "Cindy Peyser" Cc: spambayes@python.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" [Cindy Peyser] > Dear SpamBayes, > > I really really like SpamBayes, it is so far the best spam filter I > have tried (out of 3). Glad you like it! > One problem: I have noticed since the time I set it up, my backup > system has not been able to backup my Outlook PST file (It gives a > message that the file is in use even when Outlook is closed). And I > have been unable to export, copy, or move my PST file (I just got a > new computer, so am trying to get set up again). Is there something > about SpamBayes that keeps using the PST file even after Outlook > itself is closed? No, nothing that we know of, but closing Outlook doesn't always stop Outlook from running, with or without SpamBayes installed. I've always had this problem (with three different Outlook 2000 installations), and don't think it's any more-- or less --common since installing Outlook.
You didn't say which version of Windows or Outlook you're using. The best thing to do before running backups is to use the Windows Task Manager to kill off all non-essential programs first. Exactly how you do that depends on which flavor of Windows you're using. You'll sometimes find that an Outlook process is still running despite that you closed Outlook. That's life. The same procedure applies if you get any sort of "permission denied -- file in use" error when trying to copy, move, or rename a file. > Also, how do I un-install? Anyone? I run from CVS, so have no idea what the binary installer sets up here (assuming the poster used a binary installer, which seems too likely to question ). This one is nearly a FAQ lately! ------------------------------ Message: 4 Date: Mon, 1 Dec 2003 22:26:31 -0600 From: "Seth Goodman" Subject: RE: [Spambayes] SpamBayes Q: Backup of PST file To: Message-ID: Content-Type: text/plain; charset="iso-8859-1" > No, nothing that we know of, but closing Outlook doesn't always > stop Outlook > from running, with or without SpamBayes installed. I've always had this > problem (with three different Outlook 2000 installations), and don't think > it's any more-- or less --common since installing Outlook. Just as you say, I noticed the same thing with Outlook 2000 long before I tried SpamBayes. You close Outlook and an Outlook.exe process is still running. It's neither better nor worse with the SpamBayes Add-In. Oddly, I have even noticed that after Outlook is running for a long time, a second instance of Outlook.exe appears, according to the task manager. The symptom that prompts me to notice this is that a message does not become "read" after I display it for the appropriate number of seconds in the preview pane. Then I look in the task manager and see two Outlook.exe processes. What a bizarre piece of code this Outlook is, but also very useful.
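The Task Manager check described above can also be scripted. Below is a minimal Python sketch that lists any OUTLOOK.EXE processes still alive, for example before starting a PST backup. It assumes a Windows version that ships `tasklist.exe` (not every 2003-era edition does) and Python 3.7+ for `subprocess.run(capture_output=...)`. The helper names `outlook_pids` and `running_outlook_pids` are hypothetical.

```python
import csv
import io
import subprocess
import sys


def outlook_pids(tasklist_csv):
    """Return PIDs of OUTLOOK.EXE rows in `tasklist /FO CSV /NH` output."""
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Data rows look like: "OUTLOOK.EXE","1234","Console","0","45,000 K"
        # Blank lines and the one-field "INFO: No tasks..." line are skipped.
        if len(row) >= 2 and row[0].lower() == "outlook.exe":
            pids.append(int(row[1]))
    return pids


def running_outlook_pids():
    """Ask Windows for lingering Outlook processes (Windows only)."""
    out = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq OUTLOOK.EXE", "/FO", "CSV", "/NH"],
        capture_output=True, text=True,
    ).stdout
    return outlook_pids(out)


if __name__ == "__main__" and sys.platform == "win32":
    pids = running_outlook_pids()
    if pids:
        print("Outlook is still running (PIDs %s); end it before backing up." % pids)
    else:
        print("No OUTLOOK.EXE process found.")
```

Parsing is kept separate from the `tasklist` call so the parsing logic can be checked on any platform. Seeing two PIDs here would correspond to the duplicate Outlook.exe processes described above.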
-- Seth Goodman Humans: change "delete" to "sethg" to email me Spambots: disregard the above ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 4 **************************************** From sysadmin at scr.siemens.com Mon Dec 1 23:27:44 2003 From: sysadmin at scr.siemens.com (sysadmin@scr.siemens.com) Date: Mon Dec 1 23:27:58 2003 Subject: [Spambayes] Blocked Mail Notification Message-ID: <200312020427.hB24Rrb12643@scr.siemens.com> ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 23:27:40 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB24Rdb12633 for ; Mon, 1 Dec 2003 23:27:39 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte2.siemens.com ([212.114.202.115]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB24Rgvd013762 for ; Mon, 1 Dec 2003 23:27:42 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte2.siemens.com (8.11.6/8.11.6) with ESMTP id hB24RQk23003 for ; Tue, 2 Dec 2003 05:27:26 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR28D-0000UE-FG; Mon, 01 Dec 2003 23:27:25 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 5 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit 
X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 23:27:25 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) ---------------------------------------------------------------------- Message: 1 Date: Mon, 01 Dec 2003 23:27:09 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312020427.hB24RCb12625@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. 
Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 23:27:06 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB24R5b12617 for ; Mon, 1 Dec 2003 23:27:05 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte2.siemens.com ([212.114.202.115]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB24R8vd013744 for ; Mon, 1 Dec 2003 23:27:08 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte2.siemens.com (8.11.6/8.11.6) with ESMTP id hB24Qqk22783 for ; Tue, 2 Dec 2003 05:26:52 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR27T-0007zj-Jh; Mon, 01 Dec 2003 23:26:39 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 4 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 23:26:39 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than 
"Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) 2. spam tokens IibKrw0yteNAtHyZDDw (fwd) (Atom 'Smasher') 3. RE: SpamBayes Q: Backup of PST file (Tim Peters) 4. RE: SpamBayes Q: Backup of PST file (Seth Goodman) ---------------------------------------------------------------------- Message: 1 Date: Mon, 01 Dec 2003 21:19:53 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312020220.hB22K4b11209@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 21:19:50 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB22Jnb11202 for ; Mon, 1 Dec 2003 21:19:49 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte5.siemens.com (lte5.siemens.com [217.194.35.73]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB22Jqvd011910 for ; Mon, 1 Dec 2003 21:19:53 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte5.siemens.com (8.11.6/8.11.2) with ESMTP id hB22Ja829465 for ; Tue, 2 Dec 2003 03:19:36 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR08L-00026S-Eg; Mon, 01 Dec 2003 21:19:25 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 3 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-BeenThere: spambayes@python.org X-Mailman-Version: 
2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 21:19:25 -0500 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by scr.siemens.com id hB22Jnb11202 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. RES: [Spambayes] n-way in outlook? (Tiago Estill de Noronha) 2. Re: sb_imapfilter.py AssertionError: hamcount <= nham (Tony Lownds) 3. Re: sb_imapfilter.py AssertionError: hamcount <= nham (Tim Stone) 4. Feature request (for payment) (Neomatrix) 5. RE: Feature request (for payment) (Kenny Pitt) 6. Where do the rejects go? (Givens, Dallas) 7. RE: Outlook 97 and SpamBayes (Tony Meyer) 8. SpamBayes Load Problem - Pls Help (Rick King) 9. RE: SpamBayes Load Problem - Pls Help (Tony Meyer) 10. Broken link (Tom Larkin) 11. RE: Broken link (Tony Meyer) 12. SpamBayes Q: Backup of PST file (Cindy Peyser) ---------------------------------------------------------------------- Message: 1 Date: Mon, 1 Dec 2003 15:50:27 -0200 From: "Tiago Estill de Noronha" Subject: RES: [Spambayes] n-way in outlook?
To: Message-ID: <001c01c3b833$9d95f800$0860b7c8@virtua.com.br> Content-Type: text/plain; charset="Windows-1252" Thanx for the answer ********************* Tiago Estill de Noronha TiagoTiago@Globo.com -=> -----Original Message----- -=> From: spambayes-bounces@python.org -=> [mailto:spambayes-bounces@python.org] On behalf of Skip Montanaro -=> Sent: Sunday, November 30, 2003 20:15 -=> To: Tiago Estill de Noronha -=> Cc: spambayes@python.org -=> Subject: Re: [Spambayes] n-way in outlook? -=> -=> Tiago> have some1 made a n-way classification code for -=> using with the -=> Tiago> outlook plugin? -=> -=> Not that I'm aware of. -=> -=> Tiago> or does the nway code on the contrib folder -=> works on outlook? -=> -=> Nope. nway.py is just a simple demo I wrote. Since I'm -=> not an Outlook user, I made no attempt to make it work with -=> Outlook. After playing with it a bit, I'm not convinced -=> it's good enough for anything but experimental use. -=> -=> Skip -=> -=> _______________________________________________ -=> Spambayes@python.org -=> http://mail.python.org/mailman/listinfo/spambayes -=> Check the FAQ before asking: -=> http://spambayes.sf.net/faq.html -=> -=> --- -=> Incoming mail is certified Virus Free. -=> Checked by AVG anti-virus system (http://www.grisoft.com). -=> Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 -=> --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 ------------------------------ Message: 2 Date: Mon, 1 Dec 2003 10:35:28 -0800 From: Tony Lownds Subject: Re: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham To: papaDoc , spambayes@python.org Message-ID: Content-Type: text/plain; charset="us-ascii" ; format="flowed" At 10:57 AM -0500 12/1/03, papaDoc wrote: >Hi Tony, > >>> This error says that you have a token in your database that has >>>appeared in more ham than you have trained it on - which isn't possible. >> >> >>Ah... while training it said 14 ham trained, while classifying it >>only said 10 ham. > >Do you force the training (i.e. use the -f switch) ? >sb_mboxtrain -f I'm using sb_imapfilter.py, no sb_mboxtrain. sb_imapfilter.py does not have an -f switch. I removed spambayes.messageinfo.db before running the training database. That should do the same thing as -f, right? Here are the commands I use to reproduce this: [tony ~]$ rm hammie.db spambayes.messageinfo.db [tony ~]$ /usr/bin/sb_imapfilter.py -t [tony ~]$ /usr/bin/sb_imapfilter.py -c I found something else that is interesting. When I train again, it trains 4 more messages, which should have been trained already. Here I train twice in a row and it trains 8 more messages each time: [tony ~]$ /usr/bin/sb_imapfilter.py -t SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003). Loading state from hammie.db database hammie.db is an existing database, with 45 spam and 12 ham Loading database hammie.db... Done. Training Training ham folder INBOX.Ham .......***...*.. 4 trained. Training spam folder INBOX.Spam ..............***...................*........ 4 trained.
Persisting hammie.db state in database Training took 0.65074801445 seconds, 8 messages were trained [tony ~]$ /usr/bin/sb_imapfilter.py -t SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003). Loading state from hammie.db database hammie.db is an existing database, with 45 spam and 12 ham Loading database hammie.db... Done. Training Training ham folder INBOX.Ham .......***...*.. 4 trained. Training spam folder INBOX.Spam ..............***...................*........ 4 trained. Persisting hammie.db state in database Training took 0.65074801445 seconds, 8 messages were trained I'll try to figure out why sb_imapfilter.py is retraining those messages. -Tony ------------------------------ Message: 3 Date: Mon, 01 Dec 2003 12:56:55 -0600 From: Tim Stone Subject: Re: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham To: Tony Lownds , papaDoc , spambayes@python.org Message-ID: Content-Type: text/plain; format=flowed; charset=iso-8859-15 On Mon, 1 Dec 2003 10:35:28 -0800, Tony Lownds wrote: > > > At 10:57 AM -0500 12/1/03, papaDoc wrote: >> Hi Tony, >> >>>> This error says that you have a token in your database that has >>>> appeared in more ham than you have trained it on - which isn't >>>> possible. >>> >>> >>> Ah... while training it said 14 ham trained, while classifying it only >>> said 10 ham. >> >> Do you force the training (i.e. use the -f switch) ? >> sb_mboxtrain -f > > I'm using sb_imapfilter.py, no sb_mboxtrain. sb_imapfilter.py does not > have an -f switch. I removed spambayes.messageinfo.db before running the > training database. That should do the same thing as -f, right? Not necessarily. If you remove the messageinfo db, then sb has forgotten what messages it has already trained on, and will quite possibly train a message that should otherwise be ignored...
If you do this, then you must start with a completely new training database as well... > -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun ------------------------------ Message: 4 Date: Tue, 2 Dec 2003 08:50:38 +1100 From: "Neomatrix" Subject: [Spambayes] Feature request (for payment) To: Message-ID: <001401c3b855$2876a160$0200a8c0@Inspiron5150> Content-Type: text/plain; charset="us-ascii" Hello. Great product. Did I mention you have a great product? One itsy bitsy feature request, for which I will be happy to make a donation or payment of sorts, and it is perhaps one that others may appreciate. When mail is directed to the Junk folder, it is still marked UNREAD. This means it catches the eye, and must be 'cleared', which removes some of the effectiveness in the product. Can I ask that you (or smarter people than I who can actually code) add a check box option or similar, which allows for AUTOMATED clearing of junk mail, setting the read flag to READ. Even if you direct it straight to deleted folder, it still has a bold UNREAD indicator, so I keep getting excited thinking I have mail and somebody loves me, but instead it is just somebody suggesting I have a small penis and should consider their products (spam) etc. Thank you, Chris So give me a suggested donation and account details or I will go to paypal link on your site ------------------------------ Message: 5 Date: Mon, 1 Dec 2003 17:02:35 -0500 From: "Kenny Pitt" Subject: RE: [Spambayes] Feature request (for payment) To: "'Neomatrix'" , Message-ID: Content-Type: text/plain; charset="us-ascii" Neomatrix wrote: > Great product. > > Did I mention you have a great product? Glad you like it. > When mail is directed to the Junk folder, it is still marked UNREAD.
> This means it catches the eye, and must be 'cleared', which removes > some of the effectiveness in the product. > > Can I ask that you (or smarter people than I who can actually code) > add a check box option or similar, which allows for AUTOMATED > clearing of junk mail, setting the read flag to READ. I assume that you are running the Outlook plugin, but you don't mention what version. The latest version (0.81) has a checkbox labeled "Mark spam as read" on the Filtering tab of SpamBayes Manager. This option should do what you are asking. Be warned, however, that Outlook's new mail envelope icon will still be shown even if all you have is spam. -- Kenny Pitt ------------------------------ Message: 6 Date: Mon, 1 Dec 2003 14:07:15 -0800 From: "Givens, Dallas" Subject: [Spambayes] Where do the rejects go? To: "'spambayes@python.org'" Message-ID: <2C68E1F1B3E3D1118A1D0008C7244C92032635A1@CHANNELLCOMM> Content-Type: text/plain I am running a Windows 2000 system which has Office XP. I have installed the outlook plugin and have forwarded junk email from our exchange server into my email account. The email went into the Junk Suspects folder and I deleted them as spam. I then deleted them from the junk folder and the deleted items folder. After that, I forwarded the exact same email from the exchange server again but this time it does not show up anywhere. Therefore, what I want to know is where are they going now? Are they being permanently deleted? If so, is there a way to have them deleted only to the deleted items folder? Your help would be greatly appreciated. Thank you, Dallas Givens Desktop Administrator CHANNELL Commercial 26040 Ynez Road Temecula, CA.
92589 Office: (909) 719-2600 X2733 Cell: (760) 415-7510 ------------------------------ Message: 7 Date: Tue, 2 Dec 2003 12:30:16 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Outlook 97 and SpamBayes To: "'Tamara Blatny'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C1@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > We are using WinNT 4.0, Office 97, and I just installed the > latest version of Spam Bayes. I don't have any log files. > My problem is that I'm not seeing SpamBayes in my Outlook. > Thank you for any assistance. What mail program are you using? If it's Outlook 97, and you're trying to use the Outlook plug-in, you're out of luck (it works with Outlook 2000 and above). You can use the pop3 proxy or imap filter (assuming you get mail via pop3 or imap) instead, although you don't get the same integrated experience as with the plug-in. =Tony Meyer ------------------------------ Message: 8 Date: Mon, 1 Dec 2003 19:15:48 -0500 From: "Rick King" Subject: [Spambayes] SpamBayes Load Problem - Pls Help To: Message-ID: Content-Type: text/plain; charset="iso-8859-1" it says it gets installed, all the files show up on the hard drive, but then: a) there are no buttons or change in the menu (Windows 98) indicating Spam detection b) the box below in item #3 is not checked even though it shows up every time. BUT every time I check it and try to close or hit OK or reboot, it shows up again as that same box not being checked?! PROBLEM: Addin doesn't load When you start Outlook, there is no Anti-Spam item in the toolbar. To resolve this: If you are running a binary version, then perform the following steps: 1. Start Outlook, and select Tools->Options to display the main Options dialog. Select the tab labelled Other, then click on the Advanced button. 2. Click on the COM Add-Ins button.
If the SpamBayes addin is not listed, then SpamBayes should be reinstalled (Note that running regsvr32.exe spambayes_addin.dll from the SpamBayes directory may also solve this problem) ** 3. If the SpamBayes addin is listed but not checked, then simply check it and close the dialog. 4. If the SpamBayes addin is listed and checked, but still not working and still not creating log files, then I am stumped! Please send any help! It seems like it is very near to being set up right. Thanks, - Rick __________________________________ Rick King PreFlight Ventures (919) 806-1166 rking@preflightventures.com www.preflightventures.com Based in Research Triangle Park, NC, PreFlight Ventures helps entrepreneurs and new ventures grow their business through corporate partnering, technology licensing, and acquisitions. In addition, we provide coaching tools for entrepreneurs such as audio PreFlight PowerTips (www.preflightpowertips.com) and lead national workshops on "The Art of Telling Your Story", commercialization, and "Doing Smart Deals." <>< ------------------------------ Message: 9 Date: Tue, 2 Dec 2003 13:25:47 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] SpamBayes Load Problem - Pls Help To: "'Rick King'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C3@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" [...] > PROBLEM: Addin doesn't load > When you start Outlook, there is no Anti-Spam item in the toolbar. To > resolve this: [...] What version are you trying to install? This sounds like an old one. Please ensure that you are using the latest version (008.1). Please also note that the Outlook plug-in requires Outlook 2000 or above.
=Tony Meyer ------------------------------ Message: 10 Date: Mon, 1 Dec 2003 16:38:45 -0800 From: "Tom Larkin" Subject: [Spambayes] Broken link To: Message-ID: <4543928AC7BCE2498D2D2DBA72D89285015AD710@smith.pacificedge.com> Content-Type: text/plain; charset="us-ascii" Broken image link at: http://spambayes.sourceforge.net/windows.html -- T o m L a r k i n ------------------------------ Message: 11 Date: Tue, 2 Dec 2003 13:43:34 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Broken link To: "'Tom Larkin'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C4@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > Broken image link at: > > http://spambayes.sourceforge.net/windows.html Everything looks fine here. I presume you mean on the sidebar on the left where the sourceforge logo should appear? (I can't see any other images on that page). It loads fine here - a temporary glitch, perhaps? =Tony Meyer ------------------------------ Message: 12 Date: Mon, 1 Dec 2003 18:22:13 -0800 From: "Cindy Peyser" Subject: [Spambayes] SpamBayes Q: Backup of PST file To: Message-ID: Content-Type: text/plain; charset="iso-8859-1" Dear SpamBayes, I really really like SpamBayes, it is so far the best spam filter I have tried (out of 3). One problem: I have noticed since the time I set it up, my backup system has not been able to backup my Outlook PST file (It gives a message that the file is in use even when Outlook is closed). And I have been unable to export, copy, or move my PST file (I just got a new computer, so am trying to get set up again). Is there something about SpamBayes that keeps using the PST file even after Outlook itself is closed? Also, how do I un-install?
Regards, Cindy 206-634-2808 x1 cindy@pizer.com ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 3 **************************************** ------------------------------ Message: 2 Date: Mon, 1 Dec 2003 18:27:50 -0800 (PST) From: Atom 'Smasher' Subject: [Spambayes] spam tokens IibKrw0yteNAtHyZDDw (fwd) To: spambayes@python.org Message-ID: Content-Type: TEXT/PLAIN; charset=US-ASCII two things i've noticed about spam, i'm not sure if either of them are taken into account with SB, but maybe someone can look into this further... or maybe someone already has and they can tell me why these don't work... 1) so many spams have a *lot* of spaces (and tabs?) in the subject line. (like above {taken from real spam}). i know... multiple spaces aren't tokens, they *separate* tokens... but when there are 20+ in a row, in the subject line, that usually means spam. 2) so many spams are filled with nonsense and random strings rldvlzgj coldokiue i q wfup cadrhs r cqufqc e p fnlcgv fipv which probably don't appear in legit email. can these be used to detect spam? are they used? my understanding of bayesian filtering, is that if it never before encountered the word "rldvlzgj", then it scores 0.5 (or something fairly neutral). well, after i've trained it on a few hundred or a few thousand emails, i think it should have a good handle on my vocabulary and maybe be less forgiving with words i haven't seen before. i fully understand that the nature of bayesian filtering is often counter-intuitive when it comes to what to look at and what to ignore, so i'm fully prepared for someone to tell me exactly why these things don't work the way my brain thinks they should. 
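The neutral score for a never-seen word such as "rldvlzgj" can be illustrated with a short sketch. It assumes Gary Robinson's smoothing formula, f(w) = (s*x + n*p(w)) / (s + n), which is the approach SpamBayes is generally described as using; the constant values (x = 0.5, s = 0.45) and the function name `token_spamprob` are illustrative assumptions, not code taken from the SpamBayes source.

```python
# Hypothetical sketch of Robinson's adjusted per-token spam probability.
# The constants are assumptions chosen to resemble SpamBayes-style defaults.
UNKNOWN_WORD_PROB = 0.5       # x: prior score for a token with no history
UNKNOWN_WORD_STRENGTH = 0.45  # s: how much weight the prior receives


def token_spamprob(spamcount, hamcount, nspam, nham):
    """Smoothed probability that a message containing this token is spam.

    spamcount/hamcount: trained spam/ham messages that contained the token.
    nspam/nham: total spam/ham messages trained.
    """
    # Fraction of trained spam and ham that contained the token.
    spam_ratio = spamcount / nspam if nspam else 0.0
    ham_ratio = hamcount / nham if nham else 0.0
    total = spam_ratio + ham_ratio
    # Raw estimate p(w); undefined for an unseen token, so fall back to x.
    raw = spam_ratio / total if total else UNKNOWN_WORD_PROB
    # Shrink toward x in proportion to how little evidence there is.
    n = spamcount + hamcount
    s, x = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB
    return (s * x + n * raw) / (s + n)


if __name__ == "__main__":
    # An unseen token stays neutral however large the training set is.
    print(token_spamprob(0, 0, nspam=2000, nham=2000))  # 0.5
    # A token seen in 10 of 100 spam and no ham is strongly spammy.
    print(round(token_spamprob(10, 0, nspam=100, nham=100), 3))  # 0.978
```

Under these assumptions, training volume never moves an unseen token away from x, so it contributes no evidence either way. Treating unknown words as spammy would amount to choosing x above 0.5, which would also penalize new vocabulary from legitimate correspondents; that trade-off is one plausible reason a neutral prior is the usual default.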
...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "IDEA's key length is 128 bits - over twice as long as DES. Assuming that a brute force attack is the most efficient, it would require 2^128 (10^38) encryptions to recover the key. Design a chip that can test a billion keys per second and throw a billion of them at the problem, and it will still take 10^13 years - that's longer than the age of the universe. An array of 10^24 such chips can find the key in a day, but there aren't enough silicon atoms in the universe to build such a machine. Now we're getting somewhere - although I'd keep my eye on the dark matter debate." -- Bruce Schneier, Applied Cryptography ------------------------------ Message: 3 Date: Mon, 1 Dec 2003 22:36:37 -0500 From: "Tim Peters" Subject: RE: [Spambayes] SpamBayes Q: Backup of PST file To: "Cindy Peyser" Cc: spambayes@python.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" [Cindy Peyser] > Dear SpamBayes, > > I really really like SpamBayes, it is so far the best spam filter I > have tried (out of 3). Glad you like it! > One problem: I have noticed since the time I set it up, my backup > system has not been able to backup my Outlook PST file (It gives a > message that the file is in use even when Outlook is closed). And I > have been unable to export, copy, or move my PST file (I just got a > new computer, so am trying to get set up again). Is there something > about SpamBayes that keeps using the PST file even after Outlook > itself is closed? No, nothing that we know of, but closing Outlook doesn't always stop Outlook from running, with or without SpamBayes installed. I've always had this problem (with three different Outlook 2000 installations), and don't think it's any more-- or less --common since installing Outlook.
You didn't say which version of Windows or Outlook you're using. The best thing to do before running backups is to use the Windows Task Manager to kill off all non-essential programs first. Exactly how you do that depends on which flavor of Windows you're using. You'll sometimes find that an Outlook process is still running despite that you closed Outlook. That's life. The same procedure applies if you get any sort of "permission denied -- file in use" error when trying to copy, move, or rename a file. > Also, how do I un-install? Anyone? I run from CVS, so have no idea what the binary installer sets up here (assuming the poster used a binary installer, which seems too likely to question). This one is nearly a FAQ lately! ------------------------------ Message: 4 Date: Mon, 1 Dec 2003 22:26:31 -0600 From: "Seth Goodman" Subject: RE: [Spambayes] SpamBayes Q: Backup of PST file To: Message-ID: Content-Type: text/plain; charset="iso-8859-1" > No, nothing that we know of, but closing Outlook doesn't always > stop Outlook > from running, with or without SpamBayes installed. I've always had this > problem (with three different Outlook 2000 installations), and don't think > it's any more-- or less --common since installing Outlook. Just as you say, I noticed the same thing with Outlook 2000 long before I tried SpamBayes. You close Outlook and an Outlook.exe process is still running. It's neither better nor worse with the SpamBayes Add-In. Oddly, I have even noticed that after Outlook is running for a long time, a second instance of Outlook.exe appears, according to the task manager. The symptom that prompts me to notice this is that a message does not become "read" after I display it for the appropriate number of seconds in the preview pane. Then I look in the task manager and see two Outlook.exe processes. What a bizarre piece of code this Outlook is, but also very useful.
-- Seth Goodman Humans: change "delete" to "sethg" to email me Spambots: disregard the above ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 4 **************************************** ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 5 **************************************** From sysadmin at scr.siemens.com Mon Dec 1 23:28:24 2003 From: sysadmin at scr.siemens.com (sysadmin@scr.siemens.com) Date: Mon Dec 1 23:28:39 2003 Subject: [Spambayes] Blocked Mail Notification Message-ID: <200312020428.hB24SXb12659@scr.siemens.com> ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. 
Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 23:28:20 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB24SJb12648 for ; Mon, 1 Dec 2003 23:28:19 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte1.siemens.com ([212.114.202.114]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB24SKvd013772 for ; Mon, 1 Dec 2003 23:28:22 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte1.siemens.com (8.11.6/8.11.6) with ESMTP id hB24S4U16715 for ; Tue, 2 Dec 2003 05:28:04 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR28o-0001AL-K3; Mon, 01 Dec 2003 23:28:02 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 6 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 23:28:02 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than 
"Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) ---------------------------------------------------------------------- Message: 1 Date: Mon, 01 Dec 2003 23:27:44 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312020427.hB24Rrb12643@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 23:27:40 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB24Rdb12633 for ; Mon, 1 Dec 2003 23:27:39 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte2.siemens.com ([212.114.202.115]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB24Rgvd013762 for ; Mon, 1 Dec 2003 23:27:42 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte2.siemens.com (8.11.6/8.11.6) with ESMTP id hB24RQk23003 for ; Tue, 2 Dec 2003 05:27:26 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR28D-0000UE-FG; Mon, 01 Dec 2003 23:27:25 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 5 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 23:27:25 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) ---------------------------------------------------------------------- Message: 1 Date: Mon, 01 Dec 2003 23:27:09 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312020427.hB24RCb12625@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. 
Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 23:27:06 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB24R5b12617 for ; Mon, 1 Dec 2003 23:27:05 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte2.siemens.com ([212.114.202.115]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB24R8vd013744 for ; Mon, 1 Dec 2003 23:27:08 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte2.siemens.com (8.11.6/8.11.6) with ESMTP id hB24Qqk22783 for ; Tue, 2 Dec 2003 05:26:52 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR27T-0007zj-Jh; Mon, 01 Dec 2003 23:26:39 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 4 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 23:26:39 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than 
"Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) 2. spam tokens IibKrw0yteNAtHyZDDw (fwd) (Atom 'Smasher') 3. RE: SpamBayes Q: Backup of PST file (Tim Peters) 4. RE: SpamBayes Q: Backup of PST file (Seth Goodman) ---------------------------------------------------------------------- Message: 1 Date: Mon, 01 Dec 2003 21:19:53 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312020220.hB22K4b11209@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 21:19:50 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB22Jnb11202 for ; Mon, 1 Dec 2003 21:19:49 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte5.siemens.com (lte5.siemens.com [217.194.35.73]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB22Jqvd011910 for ; Mon, 1 Dec 2003 21:19:53 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte5.siemens.com (8.11.6/8.11.2) with ESMTP id hB22Ja829465 for ; Tue, 2 Dec 2003 03:19:36 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR08L-00026S-Eg; Mon, 01 Dec 2003 21:19:25 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 3 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-BeenThere: spambayes@python.org X-Mailman-Version: 
2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 21:19:25 -0500 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by scr.siemens.com id hB22Jnb11202 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. RES: [Spambayes] n-way in outlook? (Tiago Estill de Noronha) 2. Re: sb_imapfilter.py AssertionError: hamcount <= nham (Tony Lownds) 3. Re: sb_imapfilter.py AssertionError: hamcount <= nham (Tim Stone) 4. Feature request (for payment) (Neomatrix) 5. RE: Feature request (for payment) (Kenny Pitt) 6. Where do the rejects go? (Givens, Dallas) 7. RE: Outlook 97 and SpamBayes (Tony Meyer) 8. SpamBayes Load Problem - Pls Help (Rick King) 9. RE: SpamBayes Load Problem - Pls Help (Tony Meyer) 10. Broken link (Tom Larkin) 11. RE: Broken link (Tony Meyer) 12. SpamBayes Q: Backup of PST file (Cindy Peyser) ---------------------------------------------------------------------- Message: 1 Date: Mon, 1 Dec 2003 15:50:27 -0200 From: "Tiago Estill de Noronha" Subject: RES: [Spambayes] n-way in outlook?
To: Message-ID: <001c01c3b833$9d95f800$0860b7c8@virtua.com.br> Content-Type: text/plain; charset="Windows-1252" Thanx for the answer ********************* Tiago Estill de Noronha TiagoTiago@Globo.com -=> -----Original Message----- -=> From: spambayes-bounces@python.org -=> [mailto:spambayes-bounces@python.org] On behalf of Skip Montanaro -=> Sent: Sunday, 30 November 2003 20:15 -=> To: Tiago Estill de Noronha -=> Cc: spambayes@python.org -=> Subject: Re: [Spambayes] n-way in outlook? -=> -=> Tiago> have some1 made a n-way classification code for -=> using with the -=> Tiago> outlook plugin? -=> -=> Not that I'm aware of. -=> -=> Tiago> or does the nway code on the contrib folder -=> works on outlook? -=> -=> Nope. nway.py is just a simple demo I wrote. Since I'm -=> not an Outlook user, I made no attempt to make it work with -=> Outlook. After playing with it a bit, I'm not convinced -=> it's good enough for anything but experimental use. -=> -=> Skip -=> -=> _______________________________________________ -=> Spambayes@python.org -=> http://mail.python.org/mailman/listinfo/spambayes -=> Check the FAQ before asking: -=> http://spambayes.sf.net/faq.html -=> -=> --- -=> Incoming mail is certified Virus Free. -=> Checked by AVG anti-virus system (http://www.grisoft.com). -=> Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 -=> --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 ------------------------------ Message: 2 Date: Mon, 1 Dec 2003 10:35:28 -0800 From: Tony Lownds Subject: Re: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham To: papaDoc , spambayes@python.org Message-ID: Content-Type: text/plain; charset="us-ascii" ; format="flowed" At 10:57 AM -0500 12/1/03, papaDoc wrote: >Hi Tony, > >>> This error says that you have a token in your database that has >>>appeared in more ham than you have trained it on - which isn't possible. >> >> >>Ah... while training it said 14 ham trained, while classifying it >>only said 10 ham. > >Do you force the training (i.e. use the -f switch) ? >sb_mboxtrain -f I'm using sb_imapfilter.py, not sb_mboxtrain. sb_imapfilter.py does not have an -f switch. I removed spambayes.messageinfo.db before running the training database. That should do the same thing as -f, right? Here are the commands I use to reproduce this: [tony ~]$ rm hammie.db spambayes.messageinfo.db [tony ~]$ /usr/bin/sb_imapfilter.py -t [tony ~]$ /usr/bin/sb_imapfilter.py -c I found something else that is interesting. When I train again, it trains 4 more messages, which should have been trained already. Here I train twice in a row and it trains 8 more messages each time: [tony ~]$ /usr/bin/sb_imapfilter.py -t SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003). Loading state from hammie.db database hammie.db is an existing database, with 45 spam and 12 ham Loading database hammie.db... Done. Training Training ham folder INBOX.Ham .......***...*.. 4 trained. Training spam folder INBOX.Spam ..............***...................*........ 4 trained.
Persisting hammie.db state in database Training took 0.65074801445 seconds, 8 messages were trained [tony ~]$ /usr/bin/sb_imapfilter.py -t SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003). Loading state from hammie.db database hammie.db is an existing database, with 45 spam and 12 ham Loading database hammie.db... Done. Training Training ham folder INBOX.Ham .......***...*.. 4 trained. Training spam folder INBOX.Spam ..............***...................*........ 4 trained. Persisting hammie.db state in database Training took 0.65074801445 seconds, 8 messages were trained I'll try to figure out why sb_imapfilter.py is retraining those messages. -Tony ------------------------------ Message: 3 Date: Mon, 01 Dec 2003 12:56:55 -0600 From: Tim Stone Subject: Re: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham To: Tony Lownds , papaDoc , spambayes@python.org Message-ID: Content-Type: text/plain; format=flowed; charset=iso-8859-15 On Mon, 1 Dec 2003 10:35:28 -0800, Tony Lownds wrote: > > > At 10:57 AM -0500 12/1/03, papaDoc wrote: >> Hi Tony, >> >>>> This error says that you have a token in your database that has >>>> appeared in more ham than you have trained it on - which isn't >>>> possible. >>> >>> >>> Ah... while training it said 14 ham trained, while classifying it only >>> said 10 ham. >> >> Do you force the training (i.e. use the -f switch) ? >> sb_mboxtrain -f > > I'm using sb_imapfilter.py, not sb_mboxtrain. sb_imapfilter.py does not > have an -f switch. I removed spambayes.messageinfo.db before running the > training database. That should do the same thing as -f, right? Not necessarily. If you remove the messageinfo db, then sb has forgotten what messages it has already trained on, and will quite possibly train a message that should otherwise be ignored...
If you do this, then you must start with a completely new training database as well... > -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun ------------------------------ Message: 4 Date: Tue, 2 Dec 2003 08:50:38 +1100 From: "Neomatrix" Subject: [Spambayes] Feature request (for payment) To: Message-ID: <001401c3b855$2876a160$0200a8c0@Inspiron5150> Content-Type: text/plain; charset="us-ascii" Hello. Great product. Did I mention you have a great product? One itsy bitsy feature request, for which I will be happy to make a donation or payment of sorts, and it is perhaps one that others may appreciate. When mail is directed to the Junk folder, it is still marked UNREAD. This means it catches the eye, and must be 'cleared', which removes some of the effectiveness in the product. Can I ask that you (or smarter people than I who can actually code) add a check box option or similar, which allows for AUTOMATED clearing of junk mail, setting the read flag to READ. Even if you direct it straight to deleted folder, it still has a bold UNREAD indicator, so I keep getting excited thinking I have mail and somebody loves me, but instead it is just somebody suggesting I have a small penis and should consider their products (spam) etc. Thank you, Chris So give me a suggested donation and account details or I will go to paypal link on your site ------------------------------ Message: 5 Date: Mon, 1 Dec 2003 17:02:35 -0500 From: "Kenny Pitt" Subject: RE: [Spambayes] Feature request (for payment) To: "'Neomatrix'" , Message-ID: Content-Type: text/plain; charset="us-ascii" Neomatrix wrote: > Great product. > > Did I mention you have a great product? Glad you like it. > When mail is directed to the Junk folder, it is still marked UNREAD.
> This means it catches the eye, and must be 'cleared', which removes > some of the effectiveness in the product. > > Can I ask that you (or smarter people than I who can actually code) > add a check box option or similar, which allows for AUTOMATED > clearing of junk mail, setting the read flag to READ. I assume that you are running the Outlook plugin, but you don't mention what version. The latest version (0.81) has a checkbox labeled "Mark spam as read" on the Filtering tab of SpamBayes Manager. This option should do what you are asking. Be warned, however, that Outlook's new mail envelope icon will still be shown even if all you have is spam. -- Kenny Pitt ------------------------------ Message: 6 Date: Mon, 1 Dec 2003 14:07:15 -0800 From: "Givens, Dallas" Subject: [Spambayes] Where do the rejects go? To: "'spambayes@python.org'" Message-ID: <2C68E1F1B3E3D1118A1D0008C7244C92032635A1@CHANNELLCOMM> Content-Type: text/plain I am running a Windows 2000 system which has Office XP. I have installed the outlook plugin and have forwarded junk email from our exchange server into my email account. The email went into the Junk Suspects folder and I deleted them as spam. I then deleted them from the junk folder and the deleted items folder. After that, I forwarded the exact same email from the exchange server again but this time it does not show up anywhere. Therefore, what I want to know is where are they going now? Are they being permanently deleted? If so, is there a way to have them deleted only to the deleted items folder? Your help would be greatly appreciated. Thank you, Dallas Givens Desktop Administrator CHANNELL Commercial 26040 Ynez Road Temecula, CA.
92589 Office: (909) 719-2600 X2733 Cell: (760) 415-7510 ------------------------------ Message: 7 Date: Tue, 2 Dec 2003 12:30:16 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Outlook 97 and SpamBayes To: "'Tamara Blatny'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C1@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > We are using WinNT 4.0, Office 97, and I just installed the > latest version of Spam Bayes. I don't have any log files. > My problem is that I'm not seeing SpamBayes in my Outlook. > Thank you for any assistance. What mail program are you using? If it's Outlook 97, and you're trying to use the Outlook plug-in, you're out of luck (it works with Outlook 2000 and above). You can use the pop3 proxy or imap filter (assuming you get mail via pop3 or imap) instead, although you don't get the same integrated experience as with the plug-in. =Tony Meyer ------------------------------ Message: 8 Date: Mon, 1 Dec 2003 19:15:48 -0500 From: "Rick King" Subject: [Spambayes] SpamBayes Load Problem - Pls Help To: Message-ID: Content-Type: text/plain; charset="iso-8859-1" it says it gets installed, all the files show up on the hard drive, but then: a) there are no buttons or change in the menu (Windows 98) indicating Spam detection b) the box below in item #3 is not checked even though it shows up every time. BUT every time I check it and try to close or hit OK or reboot, it shows up again as that same box not being checked?! PROBLEM: Addin doesn't load When you start Outlook, there is no Anti-Spam item in the toolbar. To resolve this: If you are running a binary version, then perform the following steps: 1. Start Outlook, and select Tools->Options to display the main Options dialog. Select the tab labelled Other, then click on the Advanced button. 2. Click on the COM Add-Ins button.
If the SpamBayes addin is not listed, then SpamBayes should be reinstalled (Note that running regsvr32.exe spambayes_addin.dll from the SpamBayes directory may also solve this problem) ** 3. If the SpamBayes addin is listed but not checked, then simply check it and close the dialog. 4. If the SpamBayes addin is listed and checked, but still not working and still not creating log files, then I am stumped! Please send any help! It seems like it is very near to being set up right. Thanks, - Rick __________________________________ Rick King PreFlight Ventures (919) 806-1166 rking@preflightventures.com www.preflightventures.com Based in Research Triangle Park, NC, PreFlight Ventures helps entrepreneurs and new ventures grow their business through corporate partnering, technology licensing, and acquisitions. In addition, we provide coaching tools for entrepreneurs such as audio PreFlight PowerTips (www.preflightpowertips.com) and lead national workshops on "The Art of Telling Your Story", commercialization, and "Doing Smart Deals." <>< ------------------------------ Message: 9 Date: Tue, 2 Dec 2003 13:25:47 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] SpamBayes Load Problem - Pls Help To: "'Rick King'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C3@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" [...] > PROBLEM: Addin doesn't load > When you start Outlook, there is no Anti-Spam item in the toolbar. To > resolve this: [...] What version are you trying to install? This sounds like an old one. Please ensure that you are using the latest version (008.1). Please also note that the Outlook plug-in requires Outlook 2000 or above.
=Tony Meyer ------------------------------ Message: 10 Date: Mon, 1 Dec 2003 16:38:45 -0800 From: "Tom Larkin" Subject: [Spambayes] Broken link To: Message-ID: <4543928AC7BCE2498D2D2DBA72D89285015AD710@smith.pacificedge.com> Content-Type: text/plain; charset="us-ascii" Broken image link at: http://spambayes.sourceforge.net/windows.html -- T o m L a r k i n ------------------------------ Message: 11 Date: Tue, 2 Dec 2003 13:43:34 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Broken link To: "'Tom Larkin'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C4@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > Broken image link at: > > http://spambayes.sourceforge.net/windows.html Everything looks fine here. I presume you mean on the sidebar on the left where the sourceforge logo should appear? (I can't see any other images on that page). It loads fine here - a temporary glitch, perhaps? =Tony Meyer ------------------------------ Message: 12 Date: Mon, 1 Dec 2003 18:22:13 -0800 From: "Cindy Peyser" Subject: [Spambayes] SpamBayes Q: Backup of PST file To: Message-ID: Content-Type: text/plain; charset="iso-8859-1" Dear SpamBayes, I really really like SpamBayes, it is so far the best spam filter I have tried (out of 3). One problem: I have noticed since the time I set it up, my backup system has not been able to backup my Outlook PST file (It gives a message that the file is in use even when Outlook is closed). And I have been unable to export, copy, or move my PST file (I just got a new computer, so am trying to get set up again). Is there something about SpamBayes that keeps using the PST file even after Outlook itself is closed? Also, how do I un-install?
Regards, Cindy 206-634-2808 x1 cindy@pizer.com ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 3 **************************************** ------------------------------ Message: 2 Date: Mon, 1 Dec 2003 18:27:50 -0800 (PST) From: Atom 'Smasher' Subject: [Spambayes] spam tokens IibKrw0yteNAtHyZDDw (fwd) To: spambayes@python.org Message-ID: Content-Type: TEXT/PLAIN; charset=US-ASCII two things i've noticed about spam, i'm not sure if either of them are taken into account with SB, but maybe someone can look into this further... or maybe someone already has and they can tell me why these don't work... 1) so many spams have a *lot* of spaces (and tabs?) in the subject line. (like above {taken from real spam}). i know... multiple spaces aren't tokens, they *separate* tokens... but when there are 20+ in a row, in the subject line, that usually means spam. 2) so many spams are filled with nonsense and random strings rldvlzgj coldokiue i q wfup cadrhs r cqufqc e p fnlcgv fipv which probably don't appear in legit email. can these be used to detect spam? are they used? my understanding of bayesian filtering, is that if it never before encountered the word "rldvlzgj", then it scores 0.5 (or something fairly neutral). well, after i've trained it on a few hundred or a few thousand emails, i think it should have a good handle on my vocabulary and maybe be less forgiving with words i haven't seen before. i fully understand that the nature of bayesian filtering is often counter-intuitive when it comes to what to look at and what to ignore, so i'm fully prepared for someone to tell me exactly why these things don't work the way my brain thinks they should. 
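The scoring Atom describes (an unseen word lands at a neutral 0.5) can be sketched in Python. This is a minimal sketch, assuming a Robinson-style adjustment f(w) = (s*x + n*p(w)) / (s + n), where p(w) is the raw spam ratio from training counts, n is the number of trained messages containing w, x is the probability assumed for unknown words and s is how strongly that assumption is weighted. The values x = 0.5 and s = 0.45 below are illustrative assumptions, not a statement of SpamBayes's configured defaults.

```python
def raw_spamprob(hamcount, spamcount, nham, nspam):
    """Spam ratio of a word from its per-corpus frequencies.

    Assumes nham > 0, nspam > 0 and hamcount + spamcount > 0.
    """
    hamratio = hamcount / nham     # fraction of ham messages containing the word
    spamratio = spamcount / nspam  # fraction of spam messages containing the word
    return spamratio / (hamratio + spamratio)


def word_spamprob(hamcount, spamcount, nham, nspam, x=0.5, s=0.45):
    """Blend the raw ratio with the unknown-word prior x (weighted by s).

    x and s are illustrative assumptions, not SpamBayes's real settings.
    """
    n = hamcount + spamcount       # how much evidence we have for this word
    if n == 0:
        return x                   # never seen in training: fall back to the prior
    p = raw_spamprob(hamcount, spamcount, nham, nspam)
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    print(word_spamprob(0, 0, 1000, 1000))   # unseen word, e.g. "rldvlzgj"
    print(word_spamprob(0, 1, 1000, 1000))   # seen once, in spam only
    print(word_spamprob(0, 50, 1000, 1000))  # seen in 50 spams, no ham
```

Under these assumptions a word seen in a single spam scores about 0.84 rather than 1.0, and a word never seen at all scores exactly x. Raising x above 0.5 would be one way to be "less forgiving" of new words, at the cost of pushing ham that uses unfamiliar vocabulary toward the spam side.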
...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "IDEA's key length is 128 bits - over twice as long as DES. Assuming that a brute force attack is the most efficient, it would require 2^128 (10^38) encryptions to recover the key. Design a chip that can test a billion keys per second and throw a billion of them at the problem, and it will still take 10^13 years - that's longer than the age of the universe. An array of 10^24 such chips can find the key in a day, but there aren't enough silicon atoms in the universe to build such a machine. Now we're getting somewhere - although I'd keep my eye on the dark matter debate." -- Bruce Schneier, Applied Cryptography ------------------------------ Message: 3 Date: Mon, 1 Dec 2003 22:36:37 -0500 From: "Tim Peters" Subject: RE: [Spambayes] SpamBayes Q: Backup of PST file To: "Cindy Peyser" Cc: spambayes@python.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" [Cindy Peyser] > Dear SpamBayes, > > I really really like SpamBayes, it is so far the best spam filter I > have tried (out of 3). Glad you like it! > One problem: I have noticed since the time I set it up, my backup > system has not been able to backup my Outlook PST file (It gives a > message that the file is in use even when Outlook is closed). And I > have been unable to export, copy, or move my PST file (I just got a > new computer, so am trying to get set up again). Is there something > about SpamBayes that keeps using the PST file even after Outlook > itself is closed? No, nothing that we know of, but closing Outlook doesn't always stop Outlook from running, with or without SpamBayes installed. I've always had this problem (with three different Outlook 2000 installations), and don't think it's any more-- or less --common since installing Outlook.
You didn't say which version of Windows or Outlook you're using. The best thing to do before running backups is to use the Windows Task Manager to kill off all non-essential programs first. Exactly how you do that depends on which flavor of Windows you're using. You'll sometimes find that an Outlook process is still running despite that you closed Outlook. That's life. The same procedure applies if you get any sort of "permission denied -- file in use" error when trying to copy, move, or rename a file. > Also, how do I un-install? Anyone? I run from CVS, so have no idea what the binary installer sets up here (assuming the poster used a binary installer, which seems too likely to question ). This one is nearly a FAQ lately! ------------------------------ Message: 4 Date: Mon, 1 Dec 2003 22:26:31 -0600 From: "Seth Goodman" Subject: RE: [Spambayes] SpamBayes Q: Backup of PST file To: Message-ID: Content-Type: text/plain; charset="iso-8859-1" > No, nothing that we know of, but closing Outlook doesn't always > stop Outlook from running, with or without SpamBayes installed. I've always had this > problem (with three different Outlook 2000 installations), and don't think > it's any more-- or less --common since installing Outlook. Just as you say, I noticed the same thing with Outlook 2000 long before I tried SpamBayes. You close Outlook and an Outlook.exe process is still running. It's neither better nor worse with the SpamBayes Add-In. Oddly, I have even noticed that after Outlook is running for a long time, a second instance of Outlook.exe appears, according to the task manager. The symptom that prompts me to notice this is that a message does not become "read" after I display for the appropriate number of seconds in the preview pane. Then I look in the task manager and see two Outlook.exe processes. What a bizarre piece of code this Outlook is, but also very useful.
-- Seth Goodman Humans: change "delete" to "sethg" to email me Spambots: disregard the above ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 4 **************************************** ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 5 **************************************** ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 6 **************************************** From sysadmin at scr.siemens.com Mon Dec 1 23:29:04 2003 From: sysadmin at scr.siemens.com (sysadmin@scr.siemens.com) Date: Mon Dec 1 23:29:19 2003 Subject: [Spambayes] Blocked Mail Notification Message-ID: <200312020429.hB24TDb12675@scr.siemens.com> ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. 
Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 23:28:59 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB24Swb12671 for ; Mon, 1 Dec 2003 23:28:59 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte2.siemens.com ([212.114.202.115]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB24T1vd013787 for ; Mon, 1 Dec 2003 23:29:02 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte2.siemens.com (8.11.6/8.11.6) with ESMTP id hB24Sjk23382 for ; Tue, 2 Dec 2003 05:28:45 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR29U-00021Y-VV; Mon, 01 Dec 2003 23:28:44 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 7 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 23:28:44 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than 
"Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) ---------------------------------------------------------------------- Message: 1 Date: Mon, 01 Dec 2003 23:28:24 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312020428.hB24SXb12659@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 23:28:20 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB24SJb12648 for ; Mon, 1 Dec 2003 23:28:19 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte1.siemens.com ([212.114.202.114]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB24SKvd013772 for ; Mon, 1 Dec 2003 23:28:22 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte1.siemens.com (8.11.6/8.11.6) with ESMTP id hB24S4U16715 for ; Tue, 2 Dec 2003 05:28:04 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR28o-0001AL-K3; Mon, 01 Dec 2003 23:28:02 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 6 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 23:28:02 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) ---------------------------------------------------------------------- Message: 1 Date: Mon, 01 Dec 2003 23:27:44 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312020427.hB24Rrb12643@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. 
Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 23:27:40 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB24Rdb12633 for ; Mon, 1 Dec 2003 23:27:39 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte2.siemens.com ([212.114.202.115]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB24Rgvd013762 for ; Mon, 1 Dec 2003 23:27:42 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte2.siemens.com (8.11.6/8.11.6) with ESMTP id hB24RQk23003 for ; Tue, 2 Dec 2003 05:27:26 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR28D-0000UE-FG; Mon, 01 Dec 2003 23:27:25 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 5 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 23:27:25 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than 
"Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) ---------------------------------------------------------------------- Message: 1 Date: Mon, 01 Dec 2003 23:27:09 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312020427.hB24RCb12625@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 23:27:06 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB24R5b12617 for ; Mon, 1 Dec 2003 23:27:05 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte2.siemens.com ([212.114.202.115]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB24R8vd013744 for ; Mon, 1 Dec 2003 23:27:08 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte2.siemens.com (8.11.6/8.11.6) with ESMTP id hB24Qqk22783 for ; Tue, 2 Dec 2003 05:26:52 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR27T-0007zj-Jh; Mon, 01 Dec 2003 23:26:39 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 4 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 23:26:39 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) 2. spam tokens IibKrw0yteNAtHyZDDw (fwd) (Atom 'Smasher') 3. RE: SpamBayes Q: Backup of PST file (Tim Peters) 4. RE: SpamBayes Q: Backup of PST file (Seth Goodman) ---------------------------------------------------------------------- Message: 1 Date: Mon, 01 Dec 2003 21:19:53 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312020220.hB22K4b11209@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. 
Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Mon, 01 Dec 2003 21:19:50 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB22Jnb11202 for ; Mon, 1 Dec 2003 21:19:49 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte5.siemens.com (lte5.siemens.com [217.194.35.73]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB22Jqvd011910 for ; Mon, 1 Dec 2003 21:19:53 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte5.siemens.com (8.11.6/8.11.2) with ESMTP id hB22Ja829465 for ; Tue, 2 Dec 2003 03:19:36 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1AR08L-00026S-Eg; Mon, 01 Dec 2003 21:19:25 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 3 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Mon, 01 Dec 2003 21:19:25 -0500 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by scr.siemens.com id hB22Jnb11202 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing 
the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. RES: [Spambayes] n-way in outlook? (Tiago Estill de Noronha) 2. Re: sb_imapfilter.py AssertionError: hamcount <= nham (Tony Lownds) 3. Re: sb_imapfilter.py AssertionError: hamcount <= nham (Tim Stone) 4. Feature request (for payment) (Neomatrix) 5. RE: Feature request (for payment) (Kenny Pitt) 6. Where do the rejects go? (Givens, Dallas) 7. RE: Outlook 97 and SpamBayes (Tony Meyer) 8. SpamBayes Load Problem - Pls Help (Rick King) 9. RE: SpamBayes Load Problem - Pls Help (Tony Meyer) 10. Broken link (Tom Larkin) 11. RE: Broken link (Tony Meyer) 12. SpamBayes Q: Backup of PST file (Cindy Peyser) ---------------------------------------------------------------------- Message: 1 Date: Mon, 1 Dec 2003 15:50:27 -0200 From: "Tiago Estill de Noronha" Subject: RES: [Spambayes] n-way in outlook? To: Message-ID: <001c01c3b833$9d95f800$0860b7c8@virtua.com.br> Content-Type: text/plain; charset="Windows-1252" Thanx for the answer ********************* Tiago Estill de Noronha TiagoTiago@Globo.com -=> -----Original Message----- -=> From: spambayes-bounces@python.org -=> [mailto:spambayes-bounces@python.org] On behalf of Skip Montanaro -=> Sent: Sunday, 30 November 2003 20:15 -=> To: Tiago Estill de Noronha -=> Cc: spambayes@python.org -=> Subject: Re: [Spambayes] n-way in outlook? -=> Tiago> have some1 made a n-way classification code for using with the outlook plugin? -=> Not that I'm aware of. -=> Tiago> or does the nway code on the contrib folder works on outlook? -=> Nope. nway.py is just a simple demo I wrote. Since I'm not an Outlook user, I made no attempt to make it work with Outlook.
After playing with it a bit, I'm not convinced it's good enough for anything but experimental use. -=> Skip -=> _______________________________________________ -=> Spambayes@python.org -=> http://mail.python.org/mailman/listinfo/spambayes -=> Check the FAQ before asking: -=> http://spambayes.sf.net/faq.html -=> --- -=> Incoming mail is certified Virus Free. -=> Checked by AVG anti-virus system (http://www.grisoft.com). -=> Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 ------------------------------ Message: 2 Date: Mon, 1 Dec 2003 10:35:28 -0800 From: Tony Lownds Subject: Re: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham To: papaDoc , spambayes@python.org Message-ID: Content-Type: text/plain; charset="us-ascii" ; format="flowed" At 10:57 AM -0500 12/1/03, papaDoc wrote: >Hi Tony, > >>> This error says that you have a token in your database that has >>>appeared in more ham than you have trained it on - which isn't possible. >> >> >>Ah... while training it said 14 ham trained, while classifying it >>only said 10 ham. > >Do you force the training (i.e. use the -f switch) ? >sb_mboxtrain -f I'm using sb_imapfilter.py, not sb_mboxtrain. sb_imapfilter.py does not have an -f switch. I removed spambayes.messageinfo.db before running the training database. That should do the same thing as -f, right? Here are the commands I use to reproduce this: [tony ~]$ rm hammie.db spambayes.messageinfo.db [tony ~]$ /usr/bin/sb_imapfilter.py -t [tony ~]$ /usr/bin/sb_imapfilter.py -c I found something else that is interesting. When I train again, it trains 4 more messages, which should have been trained already.
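papaDoc's explanation of the AssertionError amounts to an invariant on the training database: no token can appear in more ham messages than the total number of ham messages trained (and likewise for spam). A minimal sketch, assuming a hypothetical in-memory `TrainingDB` class whose names are illustrative only (this is not how SpamBayes actually stores hammie.db or spambayes.messageinfo.db), showing the invariant check and why a record of already-trained message IDs matters.

```python
from collections import Counter


class TrainingDB:
    """Hypothetical in-memory training store, for illustration only."""

    def __init__(self):
        self.nham = 0                # total ham messages trained
        self.nspam = 0               # total spam messages trained
        self.hamcounts = Counter()   # token -> ham messages containing it
        self.spamcounts = Counter()  # token -> spam messages containing it
        self.trained_ids = set()     # stands in for a message-info record

    def train(self, msg_id, tokens, is_spam):
        """Train one message; return False if msg_id was already trained."""
        if msg_id in self.trained_ids:
            return False
        self.trained_ids.add(msg_id)
        counts = self.spamcounts if is_spam else self.hamcounts
        for tok in set(tokens):      # count each token at most once per message
            counts[tok] += 1
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        return True

    def check(self):
        """Raise AssertionError if any per-token count exceeds its total."""
        for tok, count in self.hamcounts.items():
            if count > self.nham:
                raise AssertionError(
                    f"hamcount {count} > nham {self.nham} for {tok!r}")
        for tok, count in self.spamcounts.items():
            if count > self.nspam:
                raise AssertionError(
                    f"spamcount {count} > nspam {self.nspam} for {tok!r}")


if __name__ == "__main__":
    db = TrainingDB()
    db.train("msg-1", ["hello", "world"], is_spam=False)
    db.train("msg-1", ["hello", "world"], is_spam=False)  # skipped: already trained
    db.check()                   # invariant holds
    db.trained_ids.clear()       # forget which messages were trained...
    db.train("msg-1", ["hello", "world"], is_spam=False)  # ...so it is counted again
    print(db.nham, db.hamcounts["hello"])
```

In this sketch, clearing `trained_ids` while keeping the counts (roughly analogous to deleting only the message-info file) lets the same message be counted twice, so `nham` and the token counts both grow. The check itself only fails when the totals and the per-token counts get out of step, for example if a stale `nham` were saved.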
Here I train twice in a row and it trains 8 more messages each time: [tony ~]$ /usr/bin/sb_imapfilter.py -t SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003). Loading state from hammie.db database hammie.db is an existing database, with 45 spam and 12 ham Loading database hammie.db... Done. Training Training ham folder INBOX.Ham .......***...*.. 4 trained. Training spam folder INBOX.Spam ..............***...................*........ 4 trained. Persisting hammie.db state in database Training took 0.65074801445 seconds, 8 messages were trained [tony ~]$ /usr/bin/sb_imapfilter.py -t SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003). Loading state from hammie.db database hammie.db is an existing database, with 45 spam and 12 ham Loading database hammie.db... Done. Training Training ham folder INBOX.Ham .......***...*.. 4 trained. Training spam folder INBOX.Spam ..............***...................*........ 4 trained. Persisting hammie.db state in database Training took 0.65074801445 seconds, 8 messages were trained I'll try to figure out why sb_imapfilter.py is retraining those messages. -Tony ------------------------------ Message: 3 Date: Mon, 01 Dec 2003 12:56:55 -0600 From: Tim Stone Subject: Re: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham To: Tony Lownds , papaDoc , spambayes@python.org Message-ID: Content-Type: text/plain; format=flowed; charset=iso-8859-15 On Mon, 1 Dec 2003 10:35:28 -0800, Tony Lownds wrote: > > > At 10:57 AM -0500 12/1/03, papaDoc wrote: >> Hi Tony, >> >>>> This error says that you have a token in your database that has >>>> appeared in more ham than you have trained it on - which isn't >>>> possible. >>> >>> >>> Ah...
while training it said 14 ham trained, while classifying it only >>> said 10 ham. >> >> Do you force the training (i.e. use the -f switch) ? >> sb_mboxtrain -f > > I'm using sb_imapfilter.py, not sb_mboxtrain. sb_imapfilter.py does not > have an -f switch. I removed spambayes.messageinfo.db before running the > training database. That should do the same thing as -f, right? Not necessarily. If you remove the messageinfo db, then sb has forgotten what messages it has already trained on, and will quite possibly train a message that should otherwise be ignored... If you do this, then you must start with a completely new training database as well... > -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun ------------------------------ Message: 4 Date: Tue, 2 Dec 2003 08:50:38 +1100 From: "Neomatrix" Subject: [Spambayes] Feature request (for payment) To: Message-ID: <001401c3b855$2876a160$0200a8c0@Inspiron5150> Content-Type: text/plain; charset="us-ascii" Hello. Great product. Did I mention you have a great product? One itsy bitsy feature request, for which I will be happy to make a donation or payment of sorts, and it is perhaps one that others may appreciate. When mail is directed to the Junk folder, it is still marked UNREAD. This means it catches the eye, and must be 'cleared', which removes some of the effectiveness in the product. Can I ask that you (or smarter people than I who can actually code) add a check box option or similar, which allows for AUTOMATED clearing of junk mail, setting the read flag to READ. Even if you direct it straight to deleted folder, it still has a bold UNREAD indicator, so I keep getting excited thinking I have mail and somebody loves me, but instead it is just somebody suggesting I have a small penis and should consider their products (spam) etc.
Thank you, Chris So give me a suggested donation and account details or I will go to paypal link on your site ------------------------------ Message: 5 Date: Mon, 1 Dec 2003 17:02:35 -0500 From: "Kenny Pitt" Subject: RE: [Spambayes] Feature request (for payment) To: "'Neomatrix'" , Message-ID: Content-Type: text/plain; charset="us-ascii" Neomatrix wrote: > Great product. > > Did I mention you have a great product? Glad you like it. > When mail is directed to the Junk folder, it is still marked UNREAD. > This means it catches the eye, and must be 'cleared', which removes > some of the effectiveness in the product. > > Can I ask that you (or smarter people than I who can actually code) > add a check box option or similar, which allows for AUTOMATED > clearing of junk mail, setting the read flag to READ. I assume that you are running the Outlook plugin, but you don't mention what version. The latest version (0.81) has a checkbox labeled "Mark spam as read" on the Filtering tab of SpamBayes Manager. This option should do what you are asking. Be warned, however, that Outlook's new mail envelope icon will still be shown even if all you have is spam. -- Kenny Pitt ------------------------------ Message: 6 Date: Mon, 1 Dec 2003 14:07:15 -0800 From: "Givens, Dallas" Subject: [Spambayes] Where do the rejects go? To: "'spambayes@python.org'" Message-ID: <2C68E1F1B3E3D1118A1D0008C7244C92032635A1@CHANNELLCOMM> Content-Type: text/plain I am running a Windows 2000 system which has Office XP. I have installed the outlook plugin and have forwarded junk email from our exchange server into my email account. The email went into the Junk Suspects folder and I deleted them as spam. I then deleted them from the junk folder and the deleted items folder. After that, I forwarded the exact same email from the exchange server again but this time it does not show up anywhere. Therefore, what I want to know is where are they going now?
Are they being permanently deleted? If so, is there a way to have them deleted only to the deleted items folder? Your help would be greatly appreciated. Thank you, Dallas Givens Desktop Administrator CHANNELL Commercial 26040 Ynez Road Temecula, CA. 92589 Office: (909) 719-2600 X2733 Cell: (760) 415-7510 ------------------------------ Message: 7 Date: Tue, 2 Dec 2003 12:30:16 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Outlook 97 and SpamBayes To: "'Tamara Blatny'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C1@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > We are using WinNT 4.0, Office 97, and I just installed the > latest version of Spam Bayes. I don't have any log files. > My problem is that I'm not seeing SpamBayes in my Outlook. > Thank you for any assistance. What mail program are you using? If it's Outlook 97, and you're trying to use the Outlook plug-in, you're out of luck (it works with Outlook 2000 and above). You can use the pop3 proxy or imap filter (assuming you get mail via pop3 or imap) instead, although you don't get the same integrated experience as with the plug-in. =Tony Meyer ------------------------------ Message: 8 Date: Mon, 1 Dec 2003 19:15:48 -0500 From: "Rick King" Subject: [Spambayes] SpamBayes Load Problem - Pls Help To: Message-ID: Content-Type: text/plain; charset="iso-8859-1" it says it gets installed, all the files show up on the hard drive, but then: a) there are no buttons or change in the menu (Windows 98) indicating Spam detection b) the box below in item #3 is not checked even though it shows up every time. BUT every time I check it and try to close or hit OK or reboot, it shows up again as that same box not being checked?! PROBLEM: Addin doesn't load When you start Outlook, there is no Anti-Spam item in the toolbar. To resolve this: If you are running a binary version, then perform the following steps: 1.
Start Outlook, and select Tools->Options to display the main Options dialog. Select the tab labelled Other, then click on the Advanced button. 2. Click on the COM Add-Ins button. If the SpamBayes addin is not listed, then SpamBayes should be reinstalle= d (Note that running regsvr32.exe spambayes_addin.dll from the SpamBayes directory may also solve this problem) ** 3. If the SpamBayes addin is listed but not checked, then simply chec= k it and close the dialog. 4. If the SpamBayes addin is listed and checked, but still not working a= nd still not creating log files, then I am stumped! Plesae send any help! It seems like it is very near to being set up righ= t. Thnaks, - Rcik __________________________________ Rick King PreFlight Ventures (919) 806-1166 rking@preflightventures.com www.preflightventures.com Based in Research Triangle Park, NC, PreFlight Ventures helps entreprene= urs and new ventures grow their business through corporate partnering, technology licensing, and acquisitions. In addition, we provide coaching tools for entrepreneurs such as audio PreFlight PowerTips (www.preflightpowertips.com) and lead national workshops on "The Art of Telling Your Story", commercialization, and "Doing Smart Deals." <>< ------------------------------ Message: 9 Date: Tue, 2 Dec 2003 13:25:47 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] SpamBayes Load Problem - Pls Help To: "'Rick King'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C3@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset=3D"us-ascii" [...] > PROBLEM: Addin doesn't load > When you start Outlook, there is no Anti-Spam item in the toolbar. To > resolve this: [...] What version are you trying to install? This sounds like an old one. Please ensure that you are using the latest version (008.1). Please also note that the Outlook plug-in requires Outlook 2000 or above. 
=Tony Meyer

------------------------------

Message: 10
Date: Mon, 1 Dec 2003 16:38:45 -0800
From: "Tom Larkin"
Subject: [Spambayes] Broken link
To:
Message-ID: <4543928AC7BCE2498D2D2DBA72D89285015AD710@smith.pacificedge.com>
Content-Type: text/plain; charset="us-ascii"

Broken image link at:

http://spambayes.sourceforge.net/windows.html

--
T o m L a r k i n

------------------------------

Message: 11
Date: Tue, 2 Dec 2003 13:43:34 +1300
From: "Tony Meyer"
Subject: RE: [Spambayes] Broken link
To: "'Tom Larkin'" ,
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1C4@its-xchg4.massey.ac.nz>
Content-Type: text/plain; charset="us-ascii"

> Broken image link at:
>
> http://spambayes.sourceforge.net/windows.html

Everything looks fine here. I presume you mean on the sidebar on the left where the SourceForge logo should appear? (I can't see any other images on that page.) It loads fine here - a temporary glitch, perhaps?

=Tony Meyer

------------------------------

Message: 12
Date: Mon, 1 Dec 2003 18:22:13 -0800
From: "Cindy Peyser"
Subject: [Spambayes] SpamBayes Q: Backup of PST file
To:
Message-ID:
Content-Type: text/plain; charset="iso-8859-1"

Dear SpamBayes,

I really really like SpamBayes, it is so far the best spam filter I have tried (out of 3).

One problem: I have noticed since the time I set it up, my backup system has not been able to backup my Outlook PST file (It gives a message that the file is in use even when Outlook is closed). And I have been unable to export, copy, or move my PST file (I just got a new computer, so am trying to get set up again). Is there something about SpamBayes that keeps using the PST file even after Outlook itself is closed?

Also, how do I un-install?
Regards,
Cindy
206-634-2808 x1
cindy@pizer.com

------------------------------

_______________________________________________
Spambayes@python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html

End of Spambayes Digest, Vol 64, Issue 3
****************************************

------------------------------

Message: 2
Date: Mon, 1 Dec 2003 18:27:50 -0800 (PST)
From: Atom 'Smasher'
Subject: [Spambayes] spam tokens IibKrw0yteNAtHyZDDw (fwd)
To: spambayes@python.org
Message-ID:
Content-Type: TEXT/PLAIN; charset=US-ASCII

two things i've noticed about spam, i'm not sure if either of them are taken into account with SB, but maybe someone can look into this further... or maybe someone already has and they can tell me why these don't work...

1) so many spams have a *lot* of spaces (and tabs?) in the subject line (like above {taken from real spam}). i know... multiple spaces aren't tokens, they *separate* tokens... but when there are 20+ in a row, in the subject line, that usually means spam.

2) so many spams are filled with nonsense and random strings

rldvlzgj coldokiue i q wfup cadrhs r cqufqc e p fnlcgv fipv

which probably don't appear in legit email. can these be used to detect spam? are they used?

my understanding of bayesian filtering is that if it has never before encountered the word "rldvlzgj", then it scores 0.5 (or something fairly neutral). well, after i've trained it on a few hundred or a few thousand emails, i think it should have a good handle on my vocabulary and maybe be less forgiving with words i haven't seen before.

i fully understand that the nature of bayesian filtering is often counter-intuitive when it comes to what to look at and what to ignore, so i'm fully prepared for someone to tell me exactly why these things don't work the way my brain thinks they should.
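Atom's reading of how an unseen word is scored can be made concrete. The Python sketch below is an illustration under stated assumptions, not the SpamBayes source: it applies Gary Robinson's smoothing to a token's raw spam ratio and then keeps only tokens far enough from 0.5 to act as clues. The function names are hypothetical, and the constants (0.5, 0.45, 0.1 and 150) are our assumption of SpamBayes' defaults at the time; check your own options before relying on them.

```python
# Hypothetical sketch of Robinson-style token scoring and clue selection.
# Constant names and values are ASSUMPTIONS modelled on what we believe
# SpamBayes' defaults to be; they are not copied from its source.

UNKNOWN_WORD_PROB = 0.5       # x: prior spam probability for a word with no evidence
UNKNOWN_WORD_STRENGTH = 0.45  # s: weight given to that prior
MINIMUM_PROB_STRENGTH = 0.1   # a clue must be at least this far from 0.5
MAX_DISCRIMINATORS = 150      # at most this many clues are used


def token_prob(spamcount, hamcount, nspam, nham):
    """Smoothed spam probability for one token.

    spamcount/hamcount: trained spam/ham messages containing the token.
    nspam/nham: total trained spam/ham messages.
    """
    n = spamcount + hamcount
    if n == 0:
        # Never seen in training: only the prior is left.
        return UNKNOWN_WORD_PROB
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    total = spamratio + hamratio
    raw = spamratio / total if total else UNKNOWN_WORD_PROB
    # Shrink the raw estimate toward the prior; the pull weakens as n grows.
    s, x = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB
    return (s * x + n * raw) / (s + n)


def select_clues(probs):
    """Return (token, prob) pairs strong enough to be scored.

    Tokens within MINIMUM_PROB_STRENGTH of 0.5 are dropped; of the rest,
    only the MAX_DISCRIMINATORS furthest from 0.5 are kept.
    """
    strong = [(abs(p - 0.5), tok, p) for tok, p in probs.items()
              if abs(p - 0.5) >= MINIMUM_PROB_STRENGTH]
    strong.sort(reverse=True)
    return [(tok, p) for _, tok, p in strong[:MAX_DISCRIMINATORS]]


if __name__ == "__main__":
    probs = {
        "rldvlzgj": token_prob(0, 0, 1000, 1000),            # never seen
        "seen-twice-in-spam": token_prob(2, 0, 1000, 1000),  # 2 spam, 0 ham
        "the": token_prob(500, 520, 1000, 1000),             # everywhere
    }
    for tok, p in sorted(probs.items()):
        print(f"{tok:20s} {p:.3f}")
    print("clues:", [tok for tok, _ in select_clues(probs)])
```

Under these assumptions a word that has never been trained on, such as "rldvlzgj", scores exactly 0.5 and is discarded by the 0.1 cutoff, so it neither helps nor hurts; a word seen in just two spams and no ham already scores about 0.91. Making unseen words count against a message would mean moving the prior x away from 0.5, which this sketch leaves alone.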
...atom

_______________________________________________
PGP key - http://smasher.suspicious.org/pgp.txt
3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3
-------------------------------------------------
"IDEA's key length is 128 bits - over twice as long as DES. Assuming that a brute force attack is the most efficient, it would require 2^128 (10^38) encryptions to recover the key. Design a chip that can test a billion keys per second and throw a billion of them at the problem, and it will still take 10^13 years - that's longer than the age of the universe. An array of 10^24 such chips can find the key in a day, but there aren't enough silicon atoms in the universe to build such a machine. Now we're getting somewhere - although I'd keep my eye on the dark matter debate."
-- Bruce Schneier, Applied Cryptography

------------------------------

Message: 3
Date: Mon, 1 Dec 2003 22:36:37 -0500
From: "Tim Peters"
Subject: RE: [Spambayes] SpamBayes Q: Backup of PST file
To: "Cindy Peyser"
Cc: spambayes@python.org
Message-ID:
Content-Type: text/plain; charset="iso-8859-1"

[Cindy Peyser]
> Dear SpamBayes,
>
> I really really like SpamBayes, it is so far the best spam filter I
> have tried (out of 3).

Glad you like it!

> One problem: I have noticed since the time I set it up, my backup
> system has not been able to backup my Outlook PST file (It gives a
> message that the file is in use even when Outlook is closed). And I
> have been unable to export, copy, or move my PST file (I just got a
> new computer, so am trying to get set up again). Is there something
> about SpamBayes that keeps using the PST file even after Outlook
> itself is closed?

No, nothing that we know of, but closing Outlook doesn't always stop Outlook from running, with or without SpamBayes installed. I've always had this problem (with three different Outlook 2000 installations), and don't think it's any more-- or less --common since installing Outlook.
You didn't say which version of Windows or Outlook you're using. The best thing to do before running backups is to use the Windows Task Manager to kill off all non-essential programs first. Exactly how you do that depends on which flavor of Windows you're using. You'll sometimes find that an Outlook process is still running even though you closed Outlook. That's life. The same procedure applies if you get any sort of "permission denied -- file in use" error when trying to copy, move, or rename a file.

> Also, how do I un-install?

Anyone? I run from CVS, so have no idea what the binary installer sets up here (assuming the poster used a binary installer, which seems too likely to question). This one is nearly a FAQ lately!

------------------------------

Message: 4
Date: Mon, 1 Dec 2003 22:26:31 -0600
From: "Seth Goodman"
Subject: RE: [Spambayes] SpamBayes Q: Backup of PST file
To:
Message-ID:
Content-Type: text/plain; charset="iso-8859-1"

> No, nothing that we know of, but closing Outlook doesn't always
> stop Outlook from running, with or without SpamBayes installed.
> I've always had this problem (with three different Outlook 2000
> installations), and don't think it's any more-- or less --common
> since installing Outlook.

Just as you say, I noticed the same thing with Outlook 2000 long before I tried SpamBayes. You close Outlook and an Outlook.exe process is still running. It's neither better nor worse with the SpamBayes Add-In.

Oddly, I have even noticed that after Outlook is running for a long time, a second instance of Outlook.exe appears, according to the Task Manager. The symptom that prompts me to notice this is that a message does not become "read" after I display it for the appropriate number of seconds in the preview pane. Then I look in the Task Manager and see two Outlook.exe processes. What a bizarre piece of code this Outlook is, but also very useful.
--
Seth Goodman
Humans: change "delete" to "sethg" to email me
Spambots: disregard the above

------------------------------

End of Spambayes Digest, Vol 64, Issue 4
****************************************

------------------------------

End of Spambayes Digest, Vol 64, Issue 5
****************************************

------------------------------

End of Spambayes Digest, Vol 64, Issue 6
****************************************

------------------------------

End of Spambayes Digest, Vol 64, Issue 7
****************************************

From deltan at lithium.com Tue Dec 2 04:49:03 2003
From: deltan at lithium.com (Mark Sears)
Date: Tue Dec 2 04:48:46 2003
Subject: [Spambayes] Restore from Spam
Message-ID: <20031202094841.CE63939218@ion.lithium.com>

Windows Version: XP
Outlook Version: 2003
Spam Bayes Version: .81 (Sept.9 2003)

Hi,

I'm hoping someone can help me. I've been back & forth through the FAQ as well as the Bug Database. I can't seem to find any reference to my issue.

For work I use IMAP with Outlook; I also have some POP accounts (my home ISP & such). The rules are set on the IMAP server via Procmail, and I have SpamBayes monitoring multiple folders with incoming mail.
When I get a false positive or a questionable message and I want to "Recover From Spam", I click the recover button and the message is sent to the Inbox of my personal folders. It's not recovered to the IMAP folder in which the message originated, as I'd expect it to be. Is this a known issue? Is there a workaround? Or maybe I've misconfigured something?

Thanks,
Mark
deltan@lithium.com

From Mike at Quickiemail.co.uk Tue Dec 2 05:27:36 2003
From: Mike at Quickiemail.co.uk (Quickiemail)
Date: Tue Dec 2 05:28:05 2003
Subject: [Spambayes] spam folder/deleted folder
Message-ID: <000c01c3b8be$e7969b90$a52efea9@Laptop>

Hi,

Why can't I send my spam directly to my deleted folder?

Mike Smith

From Mark.Howells at softoption.com Tue Dec 2 07:28:45 2003
From: Mark.Howells at softoption.com (Mark Howells)
Date: Tue Dec 2 07:30:08 2003
Subject: [Spambayes] spam folder/deleted folder
Message-ID: <5846CF419D2EF5439036CC3126A3A995017B62@SOSERVER1.softoption.local>

> -----Original Message-----
> From: Quickiemail [mailto:Mike@Quickiemail.co.uk]
> Why can't I send my spam directly to my deleted folder?

From the FAQ 3.12 ;)

----
Why can't I set spam to be moved to the Deleted Items folder?
-------------------------------------------------------------
The problem with this is that you can also set SpamBayes to train all messages moved to the designated spam folder. If you set the Deleted Items folder as the spam folder (early versions of the plug-in allowed this), then *all* messages that you delete would be trained as spam.

To get this restriction removed, you'll have to convince the developers that there is a way to do this without confusing people - for example, if we let you choose the Deleted Items folder as the spam folder only if the 'incremental training' option was off, people would get confused about why it sometimes works and sometimes doesn't.
Note that Outlook 2003 has a "Junk Mail" folder that has many of the Deleted Items folder's properties, and you *can* get SpamBayes to move spam to this folder. You may also find some good advice in the answer to .

Mark

From mtiller at ford.com Tue Dec 2 09:01:43 2003
From: mtiller at ford.com (Tiller, Michael (M.M.))
Date: Tue Dec 2 09:02:03 2003
Subject: [Spambayes] IMAP Problem...and solution
Message-ID:

I'm trying to get SpamBayes 1.0a7 to work with the Mercury IMAP server (a Win32-based IMAP server). I have had problems for a while, but I never had time until now to actually sit down and work out what the problem was. Here is my summary. Hopefully these changes can be folded back into the source...

First, the problem I was having was that I would try to run the IMAP filter with the following command:

    python sb_imapfilter.py -c -t -v -l 5

and then I would get this error:

    Traceback (most recent call last):
      File "sb_imapfilter.py", line 838, in ?
        run()
      File "sb_imapfilter.py", line 824, in run
        imap_filter.Train()
      File "sb_imapfilter.py", line 646, in Train
        num_ham_trained = folder.Train(self.classifier, False)
      File "sb_imapfilter.py", line 573, in Train
        msg.get_substance()
      File "sb_imapfilter.py", line 372, in get_substance
        data = _extract_fetch_data(response[1][0])
      File "sb_imapfilter.py", line 158, in _extract_fetch_data
        mo = FETCH_RESPONSE_RE.match(response)
    TypeError: expected string or buffer

So I looked in _extract_fetch_data and there is a call that looks like:

    mo = FETCH_RESPONSE_RE.match(response)

There is a check to make sure that "mo" isn't equal to "None", but there is no check to make sure that "response" isn't None to begin with. So I looked where _extract_fetch_data is called.
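The missing check Tiller describes, together with the fallback he proposes further on, can be sketched with Python's standard imaplib. This is a hypothetical illustration, not SpamBayes code: fetch_message, first_payload and FETCH_COMMANDS are invented names, and the body item is spelled BODY.PEEK[] because RFC 3501 requires a section specifier after BODY.PEEK. A server error and an "OK" reply whose data list is [None] are both treated as "try the next item".

```python
# Hypothetical fallback order (an assumption, not SpamBayes' actual list):
# RFC822.PEEK is what Tiller reports SpamBayes trying first; BODY.PEEK[] is
# the RFC 3501 way to read a whole message without setting \Seen; plain
# RFC822 is the last resort.
FETCH_COMMANDS = ("RFC822.PEEK", "BODY.PEEK[]", "RFC822")


def first_payload(data):
    """Return the first item of an imaplib FETCH data list, or None.

    imaplib reports "no FETCH data came back" as [None] rather than
    raising, which is how a None can reach a regex match later on.
    """
    if not data or data[0] is None:
        return None
    return data[0]


def fetch_message(conn, uid):
    """Try each FETCH item in turn until the server returns real data.

    conn is an imaplib.IMAP4 (or IMAP4_SSL) connection with a mailbox
    already selected. Returns (item_name, payload).
    """
    for item in FETCH_COMMANDS:
        try:
            typ, data = conn.uid("FETCH", uid, "(%s)" % item)
        except conn.error:
            continue  # imaplib raised (e.g. server answered BAD); try next
        payload = first_payload(data)
        if typ == "OK" and payload is not None:
            return item, payload
    raise RuntimeError("no FETCH variant returned data for UID %s" % uid)
```

With a real server this would be called as fetch_message(conn, "42") after conn.login(...) and conn.select("INBOX"). Trying BODY.PEEK[] before plain RFC822 matters: under RFC 3501 an RFC822 (or BODY[]) fetch sets the \Seen flag on the message, while BODY.PEEK[] leaves it alone.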
That call looks like this:

    data = _extract_fetch_data(response[1][0])

After examining this, I found that the value of response *in this context* (note there are two different variables named response) was a tuple containing ("OK", None). That explains how a None gets passed to _extract_fetch_data.

So, now how to fix it? Well, looking at get_substance (the routine that calls _extract_fetch_data) shows that perhaps the exact set of commands is not well defined. :-) Specifically, the routine appears to try to use "RFC822.PEEK" as the rfc822_command. Then it traps errors on that and tries "RFC822" instead. Very interesting. All this while the comments in the routine say that it should be using "BODY.PEEK" ?!?!? I changed the code to use "BODY.PEEK" and... not surprisingly, I guess... it worked!!!

I'm not sure why it is set up the way it is now, but I see two ways to remedy this. First, add a command-line switch to choose which command to use ("RFC822.PEEK", "RFC822" or "BODY.PEEK") as the initial attempt. The alternative is to add an additional check after attempting "RFC822.PEEK" that looks for a response like ("OK", None) and then keys off of that to switch to "BODY.PEEK". Since I know what IMAP server I'm working with, I've just changed the default in init().

Comments?

--
Mike

From mtiller at ford.com Tue Dec 2 09:03:50 2003
From: mtiller at ford.com (Tiller, Michael (M.M.))
Date: Tue Dec 2 09:04:00 2003
Subject: [Spambayes] IMAP or Netscape Quirk
Message-ID:

I'm running the sb_imapfilter script in SpamBayes in conjunction with the Mercury IMAP server. One strange thing I noticed, though, while using Netscape's mail client to access my IMAP server after SpamBayes runs, is that Netscape tells me I have X new messages. When I look in my Inbox they are all duplicated?!?! Yikes!!! If I exit Netscape and restart, they are all back to normal (i.e. one each). I'm assuming this is an IMAP support quirk, but I'll have to look into it more.
--
Mike

From tdm at kerridge.com Tue Dec 2 09:49:15 2003
From: tdm at kerridge.com (Timothy May)
Date: Tue Dec 2 09:37:07 2003
Subject: [Spambayes] Compatibility
Message-ID: <011727852083D3119F780000E21F014E3A21F2@diablo.kerridge.com>

Good Afternoon,

I was wondering if Spambayes works with Terminal Server. I have read through most of the website and I can't find anything about Terminal Server.

Thank you for your time.

Timothy
-------------------------------------------------------------------
Timothy May
Systems Integration Department
Kerridge Computer Company
Tel: 01635 214670
email: tdm@kerridge.com
For support e-mail: sidsupp@kerridge.com
-------------------------------------------------------------------

From skip at pobox.com Tue Dec 2 09:42:44 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Dec 2 09:43:29 2003
Subject: [Spambayes] Compatibility
In-Reply-To: <011727852083D3119F780000E21F014E3A21F2@diablo.kerridge.com>
References: <011727852083D3119F780000E21F014E3A21F2@diablo.kerridge.com>
Message-ID: <16332.42212.24068.990073@montanaro.dyndns.org>

    Timothy> I was wondering if Spambayes works with Terminal Server. I have
    Timothy> read through most of the website and I can't find anything
    Timothy> about Terminal Server.

I don't see why it wouldn't work. Have you tried it? If Outlook works through Terminal Server, then Outlook with the SpamBayes plugin ought to work as well. (By "Terminal Server" I assume you mean Microsoft's proprietary VNC-like thing for logging into and displaying a remote machine's desktop in a window on your local machine.)

Skip

From dbulgrien at vcsd.com Tue Dec 2 09:50:18 2003
From: dbulgrien at vcsd.com (Dennis W. Bulgrien)
Date: Tue Dec 2 10:00:23 2003
Subject: [Spambayes] Re: SpamBayes vs. Outlook Mailbox Rules
References: <023d01c38ec7$69c6d3e0$f502a8c0@eden>
Message-ID:

You don't need a "?" to get the tool tips. Put focus on the control you want help with (tab to it or click it), then press F1. The tool tip pops up.

"Tim Peters" wrote in message news:LNBBLJKPBEHFEDALKOLCCEDOGHAB.tim.one@comcast.net...
[Mark Hammond]
> It is subtle, but the "tool tips" on all these dialogs work - click
> on the "?" at the top of the dialog, then on any item in the dialog
> to see what the option does.

Ain't no "?" here (OL2K, CVS spambayes, Win98SE).
...

From jmillican at connorms.com Tue Dec 2 10:16:37 2003
From: jmillican at connorms.com (Jim Millican)
Date: Tue Dec 2 10:30:02 2003
Subject: [Spambayes] Configuration problem
Message-ID: <3380498A8023AD41A6C8DD13D41BB6190433CD@connor-mail.connor.com>

I received an "Invalid Configuration" message when I pressed "Delete as spam": "You must configure spam folder". I have read all the troubleshooting guides that I can find, deleted and reloaded the program several times, and still get the same results. Any ideas?

Thanks
Jim

From kennypitt at hotmail.com Tue Dec 2 10:42:08 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Tue Dec 2 10:42:37 2003
Subject: RE: [Spambayes] Configuration problem
In-Reply-To: <3380498A8023AD41A6C8DD13D41BB6190433CD@connor-mail.connor.com>
Message-ID:

Jim Millican wrote:
> I received "Invalid Configuration" message when I pressed "Delete
> as spam" "You must configure spam folder". I have read all the
> trouble shooting guides that I can find, deleted and reloaded the
> program several times and still get the same results. Any Ideas?

Try this: open SpamBayes Manager from the SpamBayes button on the toolbar, then go to the Filtering tab and use the Browse button under Certain Spam to select a spam folder. That should do the trick. If your spam folder isn't set, then some of your other settings may also be incorrect (filtered folders and possible spam folder in particular), so check the other settings on the Filtering tab while you're there.
--
Kenny Pitt

From david at digitallyhip.com Tue Dec 2 10:51:40 2003
From: david at digitallyhip.com (David McCullum)
Date: Tue Dec 2 10:51:42 2003
Subject: [Spambayes] Reinstall issue
Message-ID:

Pardon me if this is a FAQ, but I haven't found any solution for the following:

I am very, very happy with the Spambayes plugin for Outlook, but I have a problem with reinstalling it with Outlook 2003. Basically, I can't do it! I have removed the plugin via Add/Remove Programs, then cleaned the registry, and then reinstalled. No errors show up, but no icons or toolbar either! I have rebooted between uninstalling and reinstalling, and I have checked for rogue copies of Outlook or Word running in the background, but no dice! Any suggestions?

From dave at boost-consulting.com Tue Dec 2 12:22:05 2003
From: dave at boost-consulting.com (David Abrahams)
Date: Tue Dec 2 12:22:30 2003
Subject: [Spambayes] Serious Problem
Message-ID:

Something curious started happening to all the email I receive after I upgraded from the "old" (pre-reorganization) Spambayes to the new one (e.g. that includes "sb_filter.py"). When mail is run through sb_filter.py, any line which begins with a period ("."), and all lines thereafter, are stripped from the email. For example, the following paragraph begins with "./configure". If you don't see it, you're seeing the bug.

./configure is the command most people use to run a configure script.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From papaDoc at videotron.ca Tue Dec 2 12:27:56 2003
From: papaDoc at videotron.ca (papaDoc)
Date: Tue Dec 2 12:28:01 2003
Subject: Re: [Spambayes] Serious Problem
In-Reply-To:
References:
Message-ID: <3FCCCB9C.4070507@videotron.ca>

Hi David,

>Something curious started happening to all the email I receive after I
>upgraded from the "old" (pre-reorganization) Spambayes to the new one
>(e.g. that includes "sb_filter.py"). When mail is run through
>sb_filter.py, any line which begins with a period ("."), and all lines
>thereafter, are stripped from the email. For example, the following
>paragraph begins with "./configure". If you don't see it, you're
>seeing the bug.
>
>./configure is the command most people use to run a configure script.
>

I don't know if this is related, but on Unix (I don't know if this is in one of the RFCXXX) you indicate the end of an email by starting a line with a dot ".". So what I think is happening is that the email package in sb_filter sees your mail and forwards everything up to a line starting with a dot ".".

Remi

From jacob-spambayes-list at statisticalanomaly.com Tue Dec 2 12:30:02 2003
From: jacob-spambayes-list at statisticalanomaly.com (Jacob Farmer)
Date: Tue Dec 2 12:29:15 2003
Subject: Re: [Spambayes] IMAP or Netscape Quirk
In-Reply-To:
References:
Message-ID: <3FCCCC1A.9050202@statisticalanomaly.com>

Mike,

I occasionally see this myself, and the nearest thing I can figure is that I happened to look at the Inbox while Spambayes was training or classifying. It appears to download each message, delete it, then put it back where it belongs. Perhaps your IMAP server just takes a few seconds to register the delete flag, or else your mail client doesn't indicate when it is set.

Jacob

Tiller, Michael (M.M.) wrote:
> have X new messages. When I look in my Inbox they are all
> duplicated?!?! Yikes!!! If I exit Netscape and
> restart, they are all back to normal (i.e. one each).
> I'm assuming this is an IMAP support quirk, but I'll
> have to look into it more.
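Remi's guess in reply to David Abrahams points at the "transparency" rule shared by SMTP (RFC 5321) and POP3 (RFC 1939): the end of a message is signalled by a line containing only a dot, and the sender doubles the dot on any other line that begins with one, so the receiver can strip it again. The Python sketch below illustrates that rule with hypothetical function names; it is not SpamBayes code and not a diagnosis of the sb_filter.py bug. naive_unstuff shows how a reader that treats any line merely *starting* with "." as the terminator would produce the kind of truncation David describes.

```python
def dot_stuff(lines):
    """Encode message lines for an SMTP DATA / POP3 RETR style transfer.

    A line that already starts with "." gets one more "." in front, and a
    line holding only "." is appended as the end-of-data marker.
    """
    out = ["." + line if line.startswith(".") else line for line in lines]
    out.append(".")
    return out


def dot_unstuff(lines):
    """Decode: stop at the lone "." and strip one leading dot elsewhere."""
    body = []
    for line in lines:
        if line == ".":
            break  # end-of-data marker
        body.append(line[1:] if line.startswith(".") else line)
    return body


def naive_unstuff(lines):
    """Buggy decoder: treats any line *starting* with "." as the end."""
    body = []
    for line in lines:
        if line.startswith("."):
            break
        body.append(line)
    return body


if __name__ == "__main__":
    message = [
        'For example, the following paragraph begins with "./configure".',
        "",
        "./configure is the command most people use to run a configure script.",
    ]
    assert dot_unstuff(dot_stuff(message)) == message  # round trip is lossless
    print(naive_unstuff(message))  # drops the ./configure line and the rest
```

The correct decoder only stops at a line that is exactly "."; everything else that starts with a dot loses one dot and is kept.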
>
> --
> Mike

From AAnders at Thomcomp.com Tue Dec 2 12:55:19 2003
From: AAnders at Thomcomp.com (Anders, Anthony)
Date: Tue Dec 2 12:58:22 2003
Subject: [Spambayes] Spambayes Installation
Message-ID: <3859F1A6D93FD411AC0600D0B774B2CB0562EBC0@tci1.thomcomp.com>

I am trying to install Spambayes, but am getting an error when the installation is registering files ...

<<...OLE_Obj...>>

Of course, if I Ignore, then it does not install (because when I open Outlook, it will not start the configuration). I am using Windows XP, with Office 2000.

I would appreciate any information.

Anthony Anders
Email: aanders@thomcomp.com
Thomcomp, Inc.
Phone: 610 834 1120 Ext 369
Fax: 610 834 0742

From kennypitt at hotmail.com Tue Dec 2 13:13:27 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Tue Dec 2 13:14:01 2003
Subject: RE: [Spambayes] Spambayes Installation
In-Reply-To: <3859F1A6D93FD411AC0600D0B774B2CB0562EBC0@tci1.thomcomp.com>
Message-ID:

Anders, Anthony wrote:
> I am trying to install Spambayes, but am getting an error when the
> installation is registering files ...
> <<...OLE_Obj...>>

Is that the complete error message? If not, could you send the complete message or maybe a screenshot of the error message box? What version of the SpamBayes plugin are you trying to install?

--
Kenny Pitt

From nobody at spamcop.net Tue Dec 2 14:23:46 2003
From: nobody at spamcop.net (Seth Goodman)
Date: Tue Dec 2 14:47:36 2003
Subject: [Spambayes] training problem?
Message-ID:

Attached are two similar spams that I trained on. Since they are regular-looking newsletters that I can't succeed in opting out of, I'm not surprised that they look hammy to the classifier.
However, many of the tokens that the tokenizer found were *not* listed in the spam score section for either message, despite the fact that these tokens appear in both trained spams. Conspicuously absent are the tokens 'subject:ADV', 'email addr:wsntv7511.com' and 'email name:info'. All three of these tokens appeared in both spams, and I imagine that 'subject:ADV' and 'email name:info' have appeared in other spam as well. The second token is specific to this sender, but it still should be listed as a spam clue.

After training on the second spam, I had SpamBayes re-run the filter process to make sure that the classifier output was fresh, but nothing changed. I even tried to re-train on the earlier of the two messages, and the log file did show that the message was already trained as spam.

Any ideas? Shouldn't all the message tokens listed be included in the spam database if the message is trained as such? Maybe someone could educate me as to how this actually works. I am willing to look for these tokens in the databases if someone can explain how to view or translate the .db format.

I have appended the spam clue outputs from both messages (sorry, these are long).

--
Seth Goodman

Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com
Spambots: please disregard the above

first message rec'd 11-22-03
----------------------------

Spam Score: 0% (3.11862e-013)

word                  spamprob   #ham  #spam
'*H*'                 1          -     -
'*S*'                 0          -     -
'feedback'            0.0463033  66    3
'jobs'                0.0522704  40    2
'editor'              0.0535317  39    2
'at:'                 0.0629429  107   7
"i'd"                 0.0637449  47    3
'interesting'         0.0663369  31    2
'2003,'               0.0678725  85    6
'2003'                0.0737041  416   33
'hope'                0.078316   108   9
'services,'           0.0795076  37    3
'educational'         0.0946334  21    2
'did'                 0.0955481  134   14
'spent'               0.0988499  20    2
'perform'             0.0991147  29    3
'url:newsletter'      0.0991147  29    3
'editorial'           0.10346    19    2
"you'd"               0.10537    44    5
'lead'                0.108054   51    6
'released'            0.114102   17    2
'know.'               0.114647   40    5
'copyright'           0.115228   101   13
'books'               0.117175   24    3
'inc.'                0.119021   149   20
'non-profit'          0.120288   16    2
'yet,'                0.120288   16    2
'since'               0.121167   146   20
'consultant'          0.121606   23    3
'developing'          0.121606   23    3
'rights'              0.122985   108   15
'group'               0.12519    71    10
'care'                0.125417   50    7
'case'                0.125417   50    7
'news.'               0.126387   22    3
'noise'               0.126387   22    3
'innovative'          0.127184   15    2
'table'               0.127184   15    2
'between'             0.12873    89    13
'hear'                0.128757   62    9
'still'               0.131436   133   20
'thought'             0.132466   60    9
'safety'              0.133985   27    4
'development.'        0.134919   14    2
'using'               0.135915   223   35
'rather'              0.136966   64    10
'this,'               0.13967    38    6
'members'             0.140566   56    9
'trying'              0.142782   49    8
'caused'              0.143283   19    3
'newsletter.'         0.143283   19    3
'edition,'            0.143655   13    2
'original'            0.145367   89    15
'bit'                 0.146253   36    6
'was'                 0.151597   319   57
'album'               0.153601   12    2
'contents'            0.153601   12    2
"we're"               0.155415   66    12
'servers'             0.157302   17    3
'including'           0.158365   144   27
'yes,'                0.158547   38    7
'host'                0.160599   27    5
'had'                 0.160809   157   30
'it.'                 0.16166    125   24
'release'             0.162604   42    8
'say'                 0.16309    93    18
'during'              0.163769   62    12
'part'                0.167466   100   20
'physical'            0.16932    40    8
'broadband'           0.174363   15    3
'industry'            0.176954   75    16
'site.'               0.177519   47    10
'issues,'             0.178289   10    2
'subject: ('          0.178289   10    2
'western'             0.178289   10    2
'users'               0.178836   42    9
'contact'             0.179544   183   40
'speak'               0.179719   19    4
'effort'              0.182383   41    9
'reach'               0.182383   41    9
'education'           0.182675   32    7
'both'                0.187571   113   26
'customers.'          0.187729   18    4
'news'                0.187983   117   27
'issue'               0.188469   78    18
'corporate'           0.188816   35    8
'directly'            0.189935   56    13
'link'                0.192469   122   29
'fell'                0.193868   9     2
'group.'              0.193868   9     2
'jobs,'               0.193868   9     2
'chapter'             0.195575   13    3
'introduction'        0.195575   13    3
'trend'               0.195575   13    3
'speed'               0.197719   29    7
'december'            0.198233   41    10
'each'                0.199012   109   27
'newsletter'          0.201923   44    11
'first'               0.202177   158   40
'leading'             0.20246    36    9
'him'                 0.204306   55    14
'high-speed'          0.2048     20    5
'overview'            0.2048     20    5
'worldwide.'          0.208241   12    3
'when'                0.208954   261   69
'always'              0.2093     91    24
'world'               0.209694   72    19
'social'              0.210856   23    6
'card'                0.211239   49    13
'focusing'            0.212432   8     2
'highway'             0.212432   8     2
'programs,'           0.212432   8     2
'reviews'             0.212432   8     2
'...'                 0.213536   52    14
'where'               0.213651   118   32
'month,'              0.215464   26    7
'sending'             0.216261   62    17
'his'                 0.218999   93    26
'book'                0.221906   46    13
'through'             0.222143   203   58
'talk'                0.223966   42    12
'technology'          0.224165   73    21
'unsubscribe'         0.225658   110   32
'some'                0.225992   260   76
'included'            0.226441   38    11
'few'                 0.226592   89    26
'people'              0.228217   132   39
"what's"              0.22974    54    16
'those'               0.229959   104   31
'five'                0.231118   37    11
"haven't"             0.231118   37    11
'beyond'              0.234789   20    6
'positive'            0.234789   20    6
'homes'               0.234927   7     2
'introduce'           0.234927   7     2
'url:content'         0.234927   7     2
'voluntary'           0.234927   7     2
'header:Reply-To:1'   0.771068   178   602
'debt,'               0.774695   1     4
'finest'              0.774695   1     4
'relations.'          0.774695   1     4
'treasured'           0.786554   2     8
'works!'              0.786554   2     8
'$10,000'             0.805182   2     9
'books,'              0.805182   2     9
'awaits'              0.809601   1     5
'stimulates'          0.809601   1     5
'unique,'             0.809601   1     5
'health,'             0.822113   5     24
'admiration'          0.835142   1     6
'url:php'             0.857859   34    207
'100%'                0.876571   13    94

Message Stream:

Return-Path:
Received: from inbound-mx5.atl.registeredsite.com ([64.224.219.93])
    by imta05a2.registeredsite.com with ESMTP
    id <20031123052424.QQPD21920.imta05a2.registeredsite.com@inbound-mx5.atl.registeredsite.com>
    for ; Sun, 23 Nov 2003 00:24:24 -0500
Received: from aomto4919.com ([221.140.69.26])
    by inbound-mx5.atl.registeredsite.com (8.12.9/8.12.9) with SMTP id hAN5NMMM026249
    for ; Sun, 23 Nov 2003 00:23:24 -0500
Message-Id: <200311230523.hAN5NMMM026249@inbound-mx5.atl.registeredsite.com>
From: "Anne Normandy"
Reply-To: info@wsntv7511.com
To: sethg@goodmanassociates.com
Date: Sun, 23 Nov 2003 14:25:22 +0900
Subject: From the Trenches of Fiber Home Society (ADV)
MIME-Version: 1.0
Content-Type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: 7bit
X-SPAM-Tagged: The server [221.140.69.26] that
    sent this message is listed on bl.spamcop.net

WSNTV Newsletter
WSNTV 75 studio

Newsletter
Issue 1/Nov. 2003

Table of Content

Editorial: A word from our chairman

It is with great admiration for the work of our WSN TV professionals, that I am so enthusiastic in sending this, our first edition, of the official WSN TV 75 Newsletter.
This month, I want to take the opportunity to introduce you to just a few of these wonderfully creative people.

The professionals presenting their work through WSN TV.
WSN TV have brought their gifts and talents to you in the the form of products, services, books, newsletters, film and video. Many more are in the works!

We are building a content site that will host some of the finest work of gifted film makers, video creators, writers, documentary makers, artist, musicians, inventors, manufacturers and marketers.

One of the things we did was attending the FTTH Convention in New Orleans.

This is where the gurus of the fiber to the home industry gather and learn about important issues, case studies, law and technology in the industry.

Why do we care about high-speed fiber to our homes?  Because it means you, the consumer, will be able to obtain movies, videos and a whole array of entertaining and
educational programs with just a click.

Soon, you won't have to get in your car to go get a video or wait for the mail to arrive. You'll simply click your mouse, find what you want and pay online to get it when you want it.

The Commissioner of the FCC was the lead speaker at the FTTH Convention.  He talked about fiber deployment in the United States.  Asian countries like Korea are leading the way in demonstrating that it stimulates economic development.

Ah yes, this is an exciting time to be an internet entrepreneur!
Have content of your own? Contact us if you'd like to hear more about WSN TV info@wsntv7511.com and how we can help you earn more income from your creative work.

Meantime, please check out our professionals'sites!

ANNE NORMANDY
Chairman, WSN TV 75

 

Five Cent wish: Indie-rock, anyone?

Indie-rock, anyone? Since the release of their 2002 debut EP, "Actions Speak Louder Than Apologies," Five Cent Wish has been amassing a dedicated legion of fans from all over the world with their energetic live performances and emotionally fueled brand of indie-rock. With a highly anticipated, debut full-length album to be released in December of 2003 and each member of the four-piece group still in high school, a bright future awaits the members of this innovative group. To learn more about the band and be part of their experience, click the link below.
http://wsntv7511.com/content/fivecentwish/Home.html

 

From the Trenches of Fiber Home Society
An Editorial from the Foundation

The evolution of society to one based on transference of information through fiber rather than by truck or plane has created both interesting and challenging times for traditional business and people.

Some interesting social trends have emerged, both positive and negative from the introduction of high speed fiber connected directly to the home or business.  
Korea is one of the countries that has responded by deploying fiber massively.

On a positive note, more IT friendly people make extra money by using fiber homes for hosting web sites, marketing, and creating content.  They save more money by having cheaper, faster and better education, entertainment, health, banking, socializing and playing from their super fast fiber homes.

The negative side of this trend is that those who haven't adapted by learning IT have lost salaried jobs and retirement funds by investing in obsolete physical business.
Offices, shopping malls and factories are declining by fiber home servers based new business.
In some countries this loss of jobs, business and retirement funds, and hope has even resulted in an increase in the suicide rate by people who are affected.  After trying to survive by using credit card debt, 20% of credit card users and loans are now delinquent.


During the past 5 years of 100% broadband penetration, 15% of college graduates couldn't get jobs by corporate massive downsizing, even Korean Ivy league graduates couldn't get jobs after years spent pursuing undergraduate, Masters and Ph.D. degrees.

Solution: Education of the whole population about IT and broadband impact on their life. Developing a voluntary social safety net to rescue information super highway roadkill victims who are against a major global trend.
WSN is leading the effort to build a social safety net globally by growing 10 million non-profit foundation trusts by 2025, targeting to pay an average $10,000 annual dividend for 7 billion people in the world.
More info: http://wsntv7511.com/index.php?content=wsntvbp.htm

 

Writer Extraordinaire

When it comes to writing, Jack Burney is just about the most captivating writer I know. His talent was evident the first time he ever contacted us with his story about how he fell in love in writing.  And best of all?he can perform that magic for clients who want
to add to their marketing and public relations.

Jack is a Writer, Journalist, Marketing and Public Relations Consultant for Hire.  He is adept at conceiving unique, results-oriented communication programs and projects for his clients to reach their customers.

He creates and delivers marketing, sales promotion, advertising and public relations programs, newsletters, news release campaigns, direct mail campaigns. His clients have included such Notables as Dupont, and the EPA. Here you'll see an example of his work as Editor in Chief of The Stock Trader News. Subscriptions available :-)
More info: http://www.wsntv7511.com/content/stocktradernews

 

London Series: Digital artwork

Both an artist and a multimedia expert, Rudolf Boogerman has taken art to the public through this series of digital images. Focusing on London as the artist pallet, he spent a great deal of time there viewing the historic buildings dwarved by the skyscrapers and the interesting collage it presented.
Seeing it as "architectural jewels tucked between steel and glass" with all the hues and patterns caused by overhanging wires, street noise and dust allowed him to create what you see here in his art site.

A truly talented artist, you must see his work at:
http://www.wsntv7511.com/content/raboo

 

130 Great Books in one Great Book

If you have ever wanted to know a bit about the great books of western thought authored over the centuries, anything from Homer to Nietzche you can't go wrong buying "The Great Books". Say you want to get a quick overview of Euclid, because your boss is a math fan, or you want to impress a date, or you're giving a talk to the rotary club, or you name it ... it's as easy as clicking here and paying 49 cents.
Better yet, buy the whole book of condensed reviews for a mere pittance. The first ten chapters in the series are now available.
The whole book will be online later this fall. To see what's in store for you, we're making one chapter available FREE for the asking. Wish I'd have known about this book when I had to do my book reviews in school :-)

More info: http://www.wsntv7511.com/content/greatbooks

 

The Dazzling Brilliance of Quartz Art Sculpture

As one of the original sculptors to use Quartz as a medium for sculpting, artist Sergey Shirokov has gone beyond traditional art to create one of a kind, treasured masterpieces.

This art is rare and beautiful. After training in Russia with some of the leading masters of blown glass art and crystal sculpting, Sergey went on to create his own special style of quartz art sculpture that commands the attention of the most devoted art enthusiasts worldwide.

He creates unique sculptures for the Art Collector including customized pieces based on the specifications and requests of his clients.

See his work at http://wsntv7511.com/content/quartzcrafters

 

Unsubscribe

We hate to see you go, but we always honour your wish to be removed from our subscriber list at any time. Please allow ten days before the removal instructions take effect.
Click here to unsubscribe

[ Table of Contents | About WSNTV | Feedback | Unsubscribe ]

Copyright WSNTV 75 Inc. 2003, All rights reserved

Message Tokens:

684 unique tokens

'$10,000' '"the' '...' '1/nov.' '100%' '130' '15%' '20%' '2002' '2003' '2003,' '2025,' ':-)' 'able' 'about' 'adapted' 'add' 'adept' 'admiration' 'advertising' 'affected.' 'after' 'against' 'album' 'all' 'all?he' 'allow' 'allowed' 'always' 'amassing' 'and' 'anne' 'annual' 'anticipated,' 'any' 'anyone?' 'anything' 'are' 'array' 'arrive.' 'art' 'artist' 'artist,' 'artwork' 'asian' 'asking.' 'at:' 'attending' 'attention' 'authored' 'available' 'available.' 'average' 'awaits' 'band' 'banking,' 'based' 'beautiful.' 'because' 'been' 'before' 'below.' 'best' 'better' 'between' 'beyond' 'billion' 'bit' 'blown' 'boogerman' 'book' 'books' 'books".' 'books,' 'boss' 'both' 'brand' 'bright' 'brilliance' 'broadband' 'brought' 'build' 'building' 'buildings' 'burney' 'business' 'business.' 'but' 'buy' 'buying' 'campaigns,' 'campaigns.' 'can' "can't" 'captivating' 'car' 'card' 'care' 'case' 'caused' 'cc:none' 'cent' 'cents.' 'centuries,' 'chairman' 'chairman,' 'challenging' 'chapter' 'chapters' 'cheaper,' 'check' 'chief' 'click' 'click.' 'clicking' 'clients' 'clients.' 'club,' 'collage' 'collector' 'college' 'comes' 'commands' 'commissioner' 'conceiving' 'condensed' 'connected' 'consultant' 'consumer,' 'contact' 'contacted' 'content' 'content-type:text/plain' 'content.' 'contents' 'convention' 'convention.' 'copyright' 'corporate' "couldn't" 'countries' 'create' 'created' 'creates' 'creating' 'creative' 'creators,' 'credit' 'crystal' 'customers.' 'customized' 'date,' 'days' 'dazzling' 'deal' 'debt,' 'debut' 'december' 'declining' 'dedicated' 'degrees.' 'delinquent.' 'delivers' 'deploying' 'deployment' 'developing' 'development.' 'devoted' 'did' 'digital' 'direct' 'directly' 'dividend' 'documentary' 'downsizing,' 'dupont,' 'during' 'dust' 'dwarved' 'each' 'earn' 'easy' 'economic' 'edition,' 'editor' 'editorial' 'editorial:' 'education' 'education,' 'educational' 'effect.'
'effort' 'email addr:wsntv7511.com' 'email name:info' 'emerged,' 'emotionally' 'energetic' 'entertaining' 'enthusiastic' 'enthusiasts' 'ep,' 'epa.' 'euclid,' 'even' 'ever' 'evident' 'evolution' 'example' 'exciting' 'experience,' 'expert,' 'extra' 'factories' 'fall.' 'fan,' 'fans' 'fast' 'faster' 'fcc' 'feedback' 'fell' 'few' 'fiber' 'film' 'find' 'finest' 'first' 'five' 'focusing' 'for' 'form' 'foundation' 'four-piece' 'free' 'friendly' 'from' 'from:addr:info' 'from:addr:wsntv75111studio.com' 'from:name:anne normandy' 'ftth' 'fueled' 'full-length' 'funds' 'funds,' 'future' 'gather' 'get' 'gifted' 'gifts' 'giving' 'glass' 'glass"' 'global' 'globally' 'go,' 'gone' 'graduates' 'great' 'group' 'group.' 'growing' 'gurus' 'had' 'has' 'hate' 'have' "haven't" 'having' 'header:Date:1' 'header:From:1' 'header:MIME-Version:1' 'header:Message-Id:1' 'header:Received:2' 'header:Reply-To:1' 'header:Return-Path:1' 'header:Subject:1' 'header:To:1' 'health,' 'hear' 'help' 'here' 'high' 'high-speed' 'highly' 'highway' 'him' 'hire.' 'his' 'historic' 'home' 'homer' 'homes' 'homes.' 'homes?' 'honour' 'hope' 'host' 'hosting' 'how' 'hues' "i'd" 'images.' 'impact' 'important' 'impress' 'inc.' 'included' 'including' 'income' 'increase' 'indie-rock,' 'indie-rock.' 'industry' 'industry.' 'info:' 'information' 'innovative' 'instructions' 'interesting' 'internet' 'introduce' 'introduction' 'inventors,' 'investing' 'issue' 'issues,' "it's" 'it.' 'ivy' 'jack' 'jewels' 'jobs' 'jobs,' 'journalist,' 'just' 'kind,' 'know' 'know.' 'known' 'korea' 'korean' 'later' 'law' 'lead' 'leading' 'league' 'learn' 'learning' 'legion' 'life.' 'like' 'link' 'list' 'live' 'loans' 'london' 'loss' 'lost' 'louder' 'love' 'magic' 'mail' 'major' 'make' 'makers,' 'making' 'malls' 'many' 'marketers.' 'marketing' 'marketing,' 'massive' 'massively.' 
'masters' 'math' 'means' 'meantime,' 'medium' 'member' 'members' 'mere' 'message-id:@inbound-mx5.atl.registeredsite.com' 'million' 'money' 'month,' 'more' 'most' 'mouse,' 'movies,' 'multimedia' 'musicians,' 'must' 'name' 'negative' 'net' 'new' 'news' 'news.' 'newsletter' 'newsletter.' 'newsletters,' 'nietzche' 'noise' 'non-profit' 'normandy' 'notables' 'note,' 'now' 'obsolete' 'obtain' 'offices,' 'official' 'one' 'online' 'opportunity' 'original' 'orleans.' 'our' 'out' 'over' 'overhanging' 'overview' 'own' 'own?' 'pallet,' 'part' 'past' 'patterns' 'pay' 'paying' 'penetration,' 'people' 'people.' 'perform' 'performances' 'ph.d.' 'physical' 'pieces' 'pittance.' 'plane' 'playing' 'please' 'population' 'positive' 'presented.' 'presenting' 'products,' 'programs' 'programs,' 'projects' 'promotion,' 'proto:http' 'public' 'pursuing' 'quartz' 'quick' 'rare' 'rate' 'rather' 'reach' 'relations' 'relations.' 'release' 'released' 'removal' 'removed' 'reply-to:addr:info' 'reply-to:addr:wsntv7511.com' 'reply-to:no real name:2**0' 'requests' 'rescue' 'reserved' 'responded' 'resulted' 'retirement' 'reviews' 'rights' 'roadkill' 'rotary' 'rudolf' 'russia' 'safety' 'salaried' 'sales' 'save' 'say' 'school' 'school,' 'sculpting,' 'sculptors' 'sculpture' 'sculptures' 'see' 'seeing' 'sender:none' 'sending' 'sergey' 'series' 'series:' 'servers' 'services,' 'shirokov' 'shopping' 'side' 'simply' 'since' 'site' 'site.' 'sites,' 'skip:& 10' 'skip:a 10' 'skip:c 10' 'skip:d 10' 'skip:e 10' 'skip:m 10' 'skip:p 10' 'skip:p 20' 'skip:r 10' 'skip:s 10' 'skip:u 10' 'skyscrapers' 'social' 'socializing' 'society' 'solution:' 'some' 'soon,' 'speak' 'speaker' 'special' 'speed' 'spent' 'states.' 
'steel' 'still' 'stimulates' 'stock' 'store' 'story' 'street' 'studies,' 'studio' 'style' 'subject: ' 'subject: (' 'subject:)' 'subject:ADV' 'subject:Fiber' 'subject:From' 'subject:Home' 'subject:Society' 'subject:Trenches' 'subject:the' 'subscriber' 'such' 'suicide' 'super' 'survive' 'table' 'take' 'taken' 'talent' 'talented' 'talents' 'talk' 'talked' 'targeting' 'technology' 'ten' 'than' 'that' 'the' 'their' 'there' 'these' 'they' 'things' 'this' 'this,' 'those' 'thought' 'through' 'time' 'time.' 'times' 'to:2**0' 'to:addr:goodmanassociates.com' 'to:addr:sethg' 'to:no real name:2**0' 'trader' 'traditional' 'training' 'transference' 'treasured' 'trenches' 'trend' 'trend.' 'trends' 'truck' 'truly' 'trusts' 'trying' 'tucked' 'tv.' 'unique' 'unique,' 'united' 'unsubscribe' 'url:com' 'url:content' 'url:fivecentwish' 'url:greatbooks' 'url:home' 'url:htm' 'url:html' 'url:index' 'url:jpg' 'url:news_spot' 'url:newsletter' 'url:php' 'url:quartzcrafters' 'url:raboo' 'url:stocktradernews' 'url:studio' 'url:wsntv7511' 'url:wsntvbp' 'url:www' 'use' 'users' 'using' 'victims' 'video' 'video.' 'videos' 'viewing' 'voluntary' 'wait' 'want' 'wanted' 'was' 'way' "we're" 'web' 'went' 'western' 'what' "what's" 'when' 'where' 'who' 'whole' 'why' 'will' 'wires,' 'wish' 'wish:' 'with' "won't" 'wonderfully' 'word' 'work' 'work.' 'works!' 'world' 'world.' 'worldwide.' 'writer' 'writer,' 'writers,' 'writing,' 'writing.' 
'wrong' 'wsn' 'wsntv' 'x-mailer:none' 'years' 'yes,' 'yet,' 'you' "you'd" "you'll" "you're" 'you,' 'your'

Second message rec'd 12-02-03
----------------------------
Spam Score: 0% (8.88734e-013)

word spamprob #ham #spam
'*H*' 1 - -
'*S*' 0 - -
'feedback' 0.0463033 66 3
'jobs' 0.0522704 40 2
'editor' 0.0535317 39 2
'at:' 0.0629429 107 7
"i'd" 0.0637449 47 3
'interesting' 0.0663369 31 2
'2003,' 0.0678725 85 6
'2003' 0.0737041 416 33
'hope' 0.078316 108 9
'services,' 0.0795076 37 3
'educational' 0.0946334 21 2
'did' 0.0955481 134 14
'spent' 0.0988499 20 2
'perform' 0.0991147 29 3
'url:newsletter' 0.0991147 29 3
'editorial' 0.10346 19 2
"you'd" 0.10537 44 5
'lead' 0.108054 51 6
'released' 0.114102 17 2
'know.' 0.114647 40 5
'copyright' 0.115228 101 13
'books' 0.117175 24 3
'inc.' 0.119021 149 20
'non-profit' 0.120288 16 2
'yet,' 0.120288 16 2
'since' 0.121167 146 20
'consultant' 0.121606 23 3
'developing' 0.121606 23 3
'rights' 0.122985 108 15
'group' 0.12519 71 10
'care' 0.125417 50 7
'case' 0.125417 50 7
'news.' 0.126387 22 3
'noise' 0.126387 22 3
'innovative' 0.127184 15 2
'table' 0.127184 15 2
'between' 0.12873 89 13
'hear' 0.128757 62 9
'still' 0.131436 133 20
'thought' 0.132466 60 9
'safety' 0.133985 27 4
'development.' 0.134919 14 2
'using' 0.135915 223 35
'rather' 0.136966 64 10
'this,' 0.13967 38 6
'members' 0.140566 56 9
'trying' 0.142782 49 8
'caused' 0.143283 19 3
'newsletter.' 0.143283 19 3
'edition,' 0.143655 13 2
'original' 0.145367 89 15
'bit' 0.146253 36 6
'was' 0.151597 319 57
'album' 0.153601 12 2
'contents' 0.153601 12 2
"we're" 0.155415 66 12
'servers' 0.157302 17 3
'including' 0.158365 144 27
'yes,' 0.158547 38 7
'host' 0.160599 27 5
'had' 0.160809 157 30
'it.' 0.16166 125 24
'release' 0.162604 42 8
'say' 0.16309 93 18
'during' 0.163769 62 12
'part' 0.167466 100 20
'physical' 0.16932 40 8
'broadband' 0.174363 15 3
'industry' 0.176954 75 16
'site.' 0.177519 47 10
'issues,' 0.178289 10 2
'subject: (' 0.178289 10 2
'western' 0.178289 10 2
'users' 0.178836 42 9
'contact' 0.179544 183 40
'speak' 0.179719 19 4
'effort' 0.182383 41 9
'reach' 0.182383 41 9
'education' 0.182675 32 7
'both' 0.187571 113 26
'customers.' 0.187729 18 4
'news' 0.187983 117 27
'issue' 0.188469 78 18
'corporate' 0.188816 35 8
'directly' 0.189935 56 13
'link' 0.192469 122 29
'fell' 0.193868 9 2
'group.' 0.193868 9 2
'jobs,' 0.193868 9 2
'chapter' 0.195575 13 3
'introduction' 0.195575 13 3
'trend' 0.195575 13 3
'speed' 0.197719 29 7
'december' 0.198233 41 10
'each' 0.199012 109 27
'newsletter' 0.201923 44 11
'first' 0.202177 158 40
'leading' 0.20246 36 9
'him' 0.204306 55 14
'high-speed' 0.2048 20 5
'overview' 0.2048 20 5
'worldwide.' 0.208241 12 3
'when' 0.208954 261 69
'always' 0.2093 91 24
'world' 0.209694 72 19
'social' 0.210856 23 6
'card' 0.211239 49 13
'focusing' 0.212432 8 2
'highway' 0.212432 8 2
'programs,' 0.212432 8 2
'reviews' 0.212432 8 2
'...' 0.213536 52 14
'where' 0.213651 118 32
'month,' 0.215464 26 7
'sending' 0.216261 62 17
'his' 0.218999 93 26
'book' 0.221906 46 13
'through' 0.222143 203 58
'talk' 0.223966 42 12
'technology' 0.224165 73 21
'unsubscribe' 0.225658 110 32
'some' 0.225992 260 76
'included' 0.226441 38 11
'few' 0.226592 89 26
'people' 0.228217 132 39
"what's" 0.22974 54 16
'those' 0.229959 104 31
'five' 0.231118 37 11
"haven't" 0.231118 37 11
'beyond' 0.234789 20 6
'positive' 0.234789 20 6
'url:content' 0.234927 7 2
'voluntary' 0.234927 7 2
'header:Reply-To:1' 0.771068 178 602
'debt,' 0.774695 1 4
'finest' 0.774695 1 4
'relations.' 0.774695 1 4
'treasured' 0.786554 2 8
'works!' 0.786554 2 8
'$10,000' 0.805182 2 9
'books,' 0.805182 2 9
'awaits' 0.809601 1 5
'stimulates' 0.809601 1 5
'unique,' 0.809601 1 5
'message-id:@inbound-mx9.atl.registeredsite.com' 0.82082 2 10
'health,' 0.822113 5 24
'x-mailer:microsoft outlook express 6.00.2600.0000' 0.833224 8 41
'admiration' 0.835142 1 6
'url:php' 0.857859 34 207
'100%' 0.876571 13 94

Message Stream:
Return-Path:
Received: from inbound-mx9.atl.registeredsite.com ([64.224.219.101]) by imta03a2.registeredsite.com with ESMTP id <20031202173332.KEAV18169.imta03a2.registeredsite.com@inbound-mx9.atl.registeredsite.com> for ; Tue, 2 Dec 2003 12:33:32 -0500
Received: from abc74746.com ([218.52.79.104]) by inbound-mx9.atl.registeredsite.com (8.12.9/8.12.9) with SMTP id hB2HWqTB011397 for ; Tue, 2 Dec 2003 12:32:53 -0500
Message-Id: <200312021732.hB2HWqTB011397@inbound-mx9.atl.registeredsite.com>
From: "Anne Normandy"
Reply-To: info@wsntv7511.com
To: sethg@goodmanassociates.com
Date: Wed, 3 Dec 2003 02:43:30 +0900
Subject: From the Trenches of Fiber Home Society (ADV)
X-Mailer: Microsoft Outlook Express 6.00.2600.0000
MIME-Version: 1.0
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-SPAM-Tagged: The server [218.52.79.104] that sent this message is listed on bl.spamcop.net

WSNTV Newsletter WSNTV 75 studio
image of buildings

 

Newsletter
Issue 1/Nov. 2003

Table of Content

 

Editorial: A word from our chairman

It is with great admiration for the work of our WSN TV professionals, that I am so enthusiastic in sending this, our first edition, of the official WSN TV 75 Newsletter.
This month, I want to take the opportunity to introduce you to just a few of these wonderfully creative people.

The professionals presenting their work through WSN TV.
WSN TV have brought their gifts and talents to you in the the form of products, services, books, newsletters, film and video. Many more are in the works!

We are building a content site that will host some of the finest work of gifted film makers, video creators, writers, documentary makers, artist, musicians, inventors, manufacturers and marketers.

One of the things we did was attending the FTTH Convention in New Orleans.

This is where the gurus of the fiber to the home industry gather and learn about important issues, case studies, law and technology in the industry.

Why do we care about high-speed fiber to our homes?  Because it means you, the consumer, will be able to obtain movies, videos and a whole array of entertaining and
educational programs with just a click.

Soon, you won't have to get in your car to go get a video or wait for the mail to arrive. You'll simply click your mouse, find what you want and pay online to get it when you want it.

The Commissioner of the FCC was the lead speaker at the FTTH Convention.  He talked about fiber deployment in the United States.  Asian countries like Korea are leading the way in demonstrating that it stimulates economic development.

Ah yes, this is an exciting time to be an internet entrepreneur!
Have content of your own? Contact us if you'd like to hear more about WSN TV info@wsntv7511.com and how we can help you earn more income from your creative work.

Meantime, please check out our professionals'sites!

ANNE NORMANDY
Chairman, WSN TV 75

 

Five Cent wish: Indie-rock, anyone?

Indie-rock, anyone? Since the release of their 2002 debut EP, "Actions Speak Louder Than Apologies," Five Cent Wish has been amassing a dedicated legion of fans from all over the world with their energetic live performances and emotionally fueled brand of indie-rock. With a highly anticipated, debut full-length album to be released in December of 2003 and each member of the four-piece group still in high school, a bright future awaits the members of this innovative group. To learn more about the band and be part of their experience, click the link below.
http://wsntv7511.com/content/fivecentwish/Home.html

 

From the Trenches of Fiber Home Society
An Editorial from the Foundation

The evolution of society to one based on transference of information through fiber rather than by truck or plane has created both interesting and challenging times for traditional business and people.

Some interesting social trends have emerged, both positive and negative from the introduction of high speed fiber connected directly to the home or business.  
Korea is one of the countries that has responded by deploying fiber massively.

On a positive note, more IT friendly people make extra money by using fiber homes for hosting web sites, marketing, and creating content.  They save more money by having cheaper, faster and better education, entertainment, health, banking, socializing and playing from their super fast fiber homes.

The negative side of this trend is that those who haven't adapted by learning IT have lost salaried jobs and retirement funds by investing in obsolete physical business.
Offices, shopping malls and factories are declining by fiber home servers based new business.
In some countries this loss of jobs, business and retirement funds, and hope has even resulted in an increase in the suicide rate by people who are affected.  After trying to survive by using credit card debt, 20% of credit card users and loans are now delinquent.


During the past 5 years of 100% broadband penetration, 15% of college graduates couldn't get jobs by corporate massive downsizing, even Korean Ivy league graduates couldn't get jobs after years spent pursuing undergraduate, Masters and Ph.D. degrees.

Solution: Education of the whole population about IT and broadband impact on their life. Developing a voluntary social safety net to rescue information super highway roadkill victims who are against a major global trend.
WSN is leading the effort to build a social safety net globally by growing 10 million non-profit foundation trusts by 2025, targeting to pay an average $10,000 annual dividend for 7 billion people in the world.
More info: http://wsntv7511.com/index.php?content=wsntvbp.htm

 

Writer Extraordinaire

When it comes to writing, Jack Burney is just about the most captivating writer I know. His talent was evident the first time he ever contacted us with his story about how he fell in love in writing.  And best of all?he can perform that magic for clients who want
to add to their marketing and public relations.

Jack is a Writer, Journalist, Marketing and Public Relations Consultant for Hire.  He is adept at conceiving unique, results-oriented communication programs and projects for his clients to reach their customers.

He creates and delivers marketing, sales promotion, advertising and public relations programs, newsletters, news release campaigns, direct mail campaigns. His clients have included such Notables as Dupont, and the EPA. Here you'll see an example of his work as Editor in Chief of The Stock Trader News. Subscriptions available :-)
More info: http://www.wsntv7511.com/content/stocktradernews

 

London Series: Digital artwork

Both an artist and a multimedia expert, Rudolf Boogerman has taken art to the public through this series of digital images. Focusing on London as the artist pallet, he spent a great deal of time there viewing the historic buildings dwarved by the skyscrapers and the interesting collage it presented.
Seeing it as "architectural jewels tucked between steel and glass" with all the hues and patterns caused by overhanging wires, street noise and dust allowed him to create what you see here in his art site.

A truly talented artist, you must see his work at:
http://www.wsntv7511.com/content/raboo

 

130 Great Books in one Great Book

If you have ever wanted to know a bit about the great books of western thought authored over the centuries, anything from Homer to Nietzche you can't go wrong buying "The Great Books". Say you want to get a quick overview of Euclid, because your boss is a math fan, or you want to impress a date, or you're giving a talk to the rotary club, or you name it ... it's as easy as clicking here and paying 49 cents.
Better yet, buy the whole book of condensed reviews for a mere pittance. The first ten chapters in the series are now available.
The whole book will be online later this fall. To see what's in store for you, we're making one chapter available FREE for the asking. Wish I'd have known about this book when I had to do my book reviews in school :-)

More info: http://www.wsntv7511.com/content/greatbooks

 

The Dazzling Brilliance of Quartz Art Sculpture

As one of the original sculptors to use Quartz as a medium for sculpting, artist Sergey Shirokov has gone beyond traditional art to create one of a kind, treasured masterpieces.

This art is rare and beautiful. After training in Russia with some of the leading masters of blown glass art and crystal sculpting, Sergey went on to create his own special style of quartz art sculpture that commands the attention of the most devoted art enthusiasts worldwide.

He creates unique sculptures for the Art Collector including customized pieces based on the specifications and requests of his clients.

See his work at http://wsntv7511.com/content/quartzcrafters

 

Unsubscribe

We hate to see you go, but we always honour your wish to be removed from our subscriber list at any time. Please allow ten days before the removal instructions take effect.
Click here to unsubscribe

[ Table of Contents | About WSNTV | Feedback | Unsubscribe ]

Copyright WSNTV 75 Inc. 2003, All rights reserved

Message Tokens:

684 unique tokens

'$10,000' '"the' '...' '1/nov.' '100%' '130' '15%' '20%' '2002' '2003' '2003,' '2025,' ':-)' 'able' 'about' 'adapted' 'add' 'adept' 'admiration' 'advertising' 'affected.' 'after' 'against' 'album' 'all' 'all?he' 'allow' 'allowed' 'always' 'amassing' 'and' 'anne' 'annual' 'anticipated,' 'any' 'anyone?' 'anything' 'are' 'array' 'arrive.' 'art' 'artist' 'artist,' 'artwork' 'asian' 'asking.' 'at:' 'attending' 'attention' 'authored' 'available' 'available.' 'average' 'awaits' 'band' 'banking,' 'based' 'beautiful.' 'because' 'been' 'before' 'below.' 'best' 'better' 'between' 'beyond' 'billion' 'bit' 'blown' 'boogerman' 'book' 'books' 'books".' 'books,' 'boss' 'both' 'brand' 'bright' 'brilliance' 'broadband' 'brought' 'build' 'building' 'buildings' 'burney' 'business' 'business.' 'but' 'buy' 'buying' 'campaigns,' 'campaigns.' 'can' "can't" 'captivating' 'car' 'card' 'care' 'case' 'caused' 'cc:none' 'cent' 'cents.' 'centuries,' 'chairman' 'chairman,' 'challenging' 'chapter' 'chapters' 'cheaper,' 'check' 'chief' 'click' 'click.' 'clicking' 'clients' 'clients.' 'club,' 'collage' 'collector' 'college' 'comes' 'commands' 'commissioner' 'conceiving' 'condensed' 'connected' 'consultant' 'consumer,' 'contact' 'contacted' 'content' 'content-type:text/plain' 'content.' 'contents' 'convention' 'convention.' 'copyright' 'corporate' "couldn't" 'countries' 'create' 'created' 'creates' 'creating' 'creative' 'creators,' 'credit' 'crystal' 'customers.' 'customized' 'date,' 'days' 'dazzling' 'deal' 'debt,' 'debut' 'december' 'declining' 'dedicated' 'degrees.' 'delinquent.' 'delivers' 'deploying' 'deployment' 'developing' 'development.' 'devoted' 'did' 'digital' 'direct' 'directly' 'dividend' 'documentary' 'downsizing,' 'dupont,' 'during' 'dust' 'dwarved' 'each' 'earn' 'easy' 'economic' 'edition,' 'editor' 'editorial' 'editorial:' 'education' 'education,' 'educational' 'effect.'
'effort' 'email addr:wsntv7511.com' 'email name:info' 'emerged,' 'emotionally' 'energetic' 'entertaining' 'enthusiastic' 'enthusiasts' 'ep,' 'epa.' 'euclid,' 'even' 'ever' 'evident' 'evolution' 'example' 'exciting' 'experience,' 'expert,' 'extra' 'factories' 'fall.' 'fan,' 'fans' 'fast' 'faster' 'fcc' 'feedback' 'fell' 'few' 'fiber' 'film' 'find' 'finest' 'first' 'five' 'focusing' 'for' 'form' 'foundation' 'four-piece' 'free' 'friendly' 'from' 'from:addr:info' 'from:addr:wsntv75111studio.com' 'from:name:anne normandy' 'ftth' 'fueled' 'full-length' 'funds' 'funds,' 'future' 'gather' 'get' 'gifted' 'gifts' 'giving' 'glass' 'glass"' 'global' 'globally' 'go,' 'gone' 'graduates' 'great' 'group' 'group.' 'growing' 'gurus' 'had' 'has' 'hate' 'have' "haven't" 'having' 'header:Date:1' 'header:From:1' 'header:MIME-Version:1' 'header:Message-Id:1' 'header:Received:2' 'header:Reply-To:1' 'header:Return-Path:1' 'header:Subject:1' 'header:To:1' 'health,' 'hear' 'help' 'here' 'high' 'high-speed' 'highly' 'highway' 'him' 'hire.' 'his' 'historic' 'home' 'homer' 'homes' 'homes.' 'homes?' 'honour' 'hope' 'host' 'hosting' 'how' 'hues' "i'd" 'images.' 'impact' 'important' 'impress' 'inc.' 'included' 'including' 'income' 'increase' 'indie-rock,' 'indie-rock.' 'industry' 'industry.' 'info:' 'information' 'innovative' 'instructions' 'interesting' 'internet' 'introduce' 'introduction' 'inventors,' 'investing' 'issue' 'issues,' "it's" 'it.' 'ivy' 'jack' 'jewels' 'jobs' 'jobs,' 'journalist,' 'just' 'kind,' 'know' 'know.' 'known' 'korea' 'korean' 'later' 'law' 'lead' 'leading' 'league' 'learn' 'learning' 'legion' 'life.' 'like' 'link' 'list' 'live' 'loans' 'london' 'loss' 'lost' 'louder' 'love' 'magic' 'mail' 'major' 'make' 'makers,' 'making' 'malls' 'many' 'marketers.' 'marketing' 'marketing,' 'massive' 'massively.' 
'masters' 'math' 'means' 'meantime,' 'medium' 'member' 'members' 'mere' 'message-id:@inbound-mx9.atl.registeredsite.com' 'million' 'money' 'month,' 'more' 'most' 'mouse,' 'movies,' 'multimedia' 'musicians,' 'must' 'name' 'negative' 'net' 'new' 'news' 'news.' 'newsletter' 'newsletter.' 'newsletters,' 'nietzche' 'noise' 'non-profit' 'normandy' 'notables' 'note,' 'now' 'obsolete' 'obtain' 'offices,' 'official' 'one' 'online' 'opportunity' 'original' 'orleans.' 'our' 'out' 'over' 'overhanging' 'overview' 'own' 'own?' 'pallet,' 'part' 'past' 'patterns' 'pay' 'paying' 'penetration,' 'people' 'people.' 'perform' 'performances' 'ph.d.' 'physical' 'pieces' 'pittance.' 'plane' 'playing' 'please' 'population' 'positive' 'presented.' 'presenting' 'products,' 'programs' 'programs,' 'projects' 'promotion,' 'proto:http' 'public' 'pursuing' 'quartz' 'quick' 'rare' 'rate' 'rather' 'reach' 'relations' 'relations.' 'release' 'released' 'removal' 'removed' 'reply-to:addr:info' 'reply-to:addr:wsntv7511.com' 'reply-to:no real name:2**0' 'requests' 'rescue' 'reserved' 'responded' 'resulted' 'retirement' 'reviews' 'rights' 'roadkill' 'rotary' 'rudolf' 'russia' 'safety' 'salaried' 'sales' 'save' 'say' 'school' 'school,' 'sculpting,' 'sculptors' 'sculpture' 'sculptures' 'see' 'seeing' 'sender:none' 'sending' 'sergey' 'series' 'series:' 'servers' 'services,' 'shirokov' 'shopping' 'side' 'simply' 'since' 'site' 'site.' 'sites,' 'skip:& 10' 'skip:a 10' 'skip:c 10' 'skip:d 10' 'skip:e 10' 'skip:m 10' 'skip:p 10' 'skip:p 20' 'skip:r 10' 'skip:s 10' 'skip:u 10' 'skyscrapers' 'social' 'socializing' 'society' 'solution:' 'some' 'soon,' 'speak' 'speaker' 'special' 'speed' 'spent' 'states.' 
'steel' 'still' 'stimulates' 'stock' 'store' 'story' 'street' 'studies,' 'studio' 'style' 'subject: ' 'subject: (' 'subject:)' 'subject:ADV' 'subject:Fiber' 'subject:From' 'subject:Home' 'subject:Society' 'subject:Trenches' 'subject:the' 'subscriber' 'such' 'suicide' 'super' 'survive' 'table' 'take' 'taken' 'talent' 'talented' 'talents' 'talk' 'talked' 'targeting' 'technology' 'ten' 'than' 'that' 'the' 'their' 'there' 'these' 'they' 'things' 'this' 'this,' 'those' 'thought' 'through' 'time' 'time.' 'times' 'to:2**0' 'to:addr:goodmanassociates.com' 'to:addr:sethg' 'to:no real name:2**0' 'trader' 'traditional' 'training' 'transference' 'treasured' 'trenches' 'trend' 'trend.' 'trends' 'truck' 'truly' 'trusts' 'trying' 'tucked' 'tv.' 'unique' 'unique,' 'united' 'unsubscribe' 'url:com' 'url:content' 'url:fivecentwish' 'url:greatbooks' 'url:home' 'url:htm' 'url:html' 'url:index' 'url:jpg' 'url:news_spot' 'url:newsletter' 'url:php' 'url:quartzcrafters' 'url:raboo' 'url:stocktradernews' 'url:studio' 'url:wsntv7511' 'url:wsntvbp' 'url:www' 'use' 'users' 'using' 'victims' 'video' 'video.' 'videos' 'viewing' 'voluntary' 'wait' 'want' 'wanted' 'was' 'way' "we're" 'web' 'went' 'western' 'what' "what's" 'when' 'where' 'who' 'whole' 'why' 'will' 'wires,' 'wish' 'wish:' 'with' "won't" 'wonderfully' 'word' 'work' 'work.' 'works!' 'world' 'world.' 'worldwide.' 'writer' 'writer,' 'writers,' 'writing,' 'writing.' 'wrong' 'wsn' 'wsntv' 'x-mailer:microsoft outlook express 6.00.2600.0000' 'years' 'yes,' 'yet,' 'you' "you'd" "you'll" "you're" 'you,' 'your' From nobody at spamcop.net Tue Dec 2 15:01:27 2003 From: nobody at spamcop.net (Seth Goodman) Date: Tue Dec 2 15:01:30 2003 Subject: [Spambayes] training problem? In-Reply-To: Message-ID: > Attached are two similar spams that I trained on. Since they are For some reason, the attached spam samples did not make it with my previous post. I'll try again. 
-- Seth Goodman Humans: personal replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From kennypitt at hotmail.com Tue Dec 2 15:04:36 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Dec 2 15:05:09 2003 Subject: [Spambayes] training problem? In-Reply-To: Message-ID: Seth Goodman wrote: > Attached are two similar spams that I trained on. Since they are > regular-looking newsletters that I can't succeed in opting out of, > I'm not surprised that they look hammy to the classifier. However, > many of the tokens that the tokenizer found were *not* listed in the > spam score section for either message, despite the fact that these > tokens appear in both trained spam. Conspicuously absent are the > tokens 'subject:ADV', 'email addr:wsntv7511.com' and 'email > name:info'. [snip] > > Message Tokens: > > 684 unique tokens SpamBayes will use at most 150 tokens to determine the spam probability, while the complete message has 684. SpamBayes chooses the 150 strongest tokens (i.e. those with probabilities farthest from a neutral 0.5), and the rest are not used so are only shown in the Message Tokens section. SpamBayes also ignores any tokens that don't have a probability <0.4 or >0.6. -- Kenny Pitt From tim.one at comcast.net Tue Dec 2 15:07:22 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 2 15:07:23 2003 Subject: [Spambayes] training problem? In-Reply-To: Message-ID: [Seth Goodman] >> Attached are two similar spams that I trained on. Since they are > For some reason, the attached spam samples did not make it with my > previous post. I'll try again. Please don't. The full text of your spam samples already got sent to the list, and sending multiple 150KB messages would be anti-social. From nobody at spamcop.net Tue Dec 2 15:18:47 2003 From: nobody at spamcop.net (Seth Goodman) Date: Tue Dec 2 15:18:49 2003 Subject: [Spambayes] training problem? In-Reply-To: Message-ID: > > Attached are two similar spams that I trained on.
Since they are > > For some reason, the attached spam samples did not make it with > my previous > post. I'll try again. OK, mailing list: 2, poster: 0. Looks like attachments are stripped for our protection. If anyone needs the original two spam referenced in the original post, contact me off-line and I will forward them. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From tim.one at comcast.net Tue Dec 2 15:53:16 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 2 15:53:17 2003 Subject: [Spambayes] training problem? In-Reply-To: Message-ID: [Kenny Pitt] > SpamBayes will use at most 150 tokens to determine the spam > probability, while the complete message has 684. SpamBayes chooses > the 150 strongest tokens (i.e. those with probabilities farthest from > a neutral 0.5), and the rest are not used so are only shown in the > Message Tokens section. That's right. Note that this 150 is the default value of the Classifier's max_discriminators option. Setting it much higher than that can cause numerical problems in the inverse chi-squared probability computation, specifically at the # XXX If x2 is very large, exp(-m) will underflow to 0. comment in chi2Q(). Testing showed that the exact value of max_discriminators didn't matter much, provided it was at least 30 (or so). Then again, most emails don't have 150 tokens, let alone 150 strong ones. From dave at boost-consulting.com Tue Dec 2 16:30:33 2003 From: dave at boost-consulting.com (David Abrahams) Date: Tue Dec 2 16:31:02 2003 Subject: [Spambayes] suggestions for training and filtering? Message-ID: I've been pretty happy using SpamBayes. For a long time I had things set up so everything classified as spam would get thrown out, and so I'd classify everything that came up "unsure", with automatic training every night. 
I still had to manually throw out a few spams a day, but I was also missing some things like news stories my wife sent me from Yahoo, "forgot your password?" response emails, and the like. So I switched to discarding only "spam 1.00" messages for a while, and classifying everything else. Now I'm discarding all spam down to 0.96. I still get quite a few messages to classify, about 30 a day. My spam training folder has 24 Meg in it. My Ham folder has 68 Meg. I feel as though the quality of my filtering has sort of levelled off. From other people's reports, I get the feeling I could be doing much better. Any suggestions? Thanks in advance, -- Dave Abrahams Boost Consulting www.boost-consulting.com From jacob-spambayes-list at statisticalanomaly.com Tue Dec 2 16:46:38 2003 From: jacob-spambayes-list at statisticalanomaly.com (Jacob Farmer) Date: Tue Dec 2 16:45:45 2003 Subject: [Spambayes] suggestions for training and filtering? In-Reply-To: References: Message-ID: <3FCD083E.7020206@statisticalanomaly.com> David, This has come up before, and I think the general solution is to get your Spam:Ham ratio to about 1:1. My guess is yours is way off. I get nearly perfect results with about 600 of each, and I haven't trained since. Jacob David Abrahams wrote: > My spam training folder has 24 Meg in it. My Ham folder has 68 Meg. > I feel as though the quality of my filtering has sort of levelled off. From nobody at spamcop.net Tue Dec 2 19:01:25 2003 From: nobody at spamcop.net (Seth Goodman) Date: Tue Dec 2 19:01:30 2003 Subject: [Spambayes] training problem? In-Reply-To: Message-ID: [Kenny Pitt] > > Message Tokens: > > > > 684 unique tokens > > SpamBayes will use at most 150 tokens to determine the spam probability, > while the complete message has 684. SpamBayes chooses the 150 strongest > tokens (i.e. those with probabilities farthest from a neutral 0.5), and > the rest are not used so are only shown in the Message Tokens section.
> SpamBayes also ignores any tokens that don't have a probability <0.4 or > >0.6. [Tim Peters] > That's right. Note that this 150 is the default value of the Classifier's > max_discriminators option. Setting it much higher than that can cause > numerical problems in the inverse chi-squared probability computation, > specifically at the > > # XXX If x2 is very large, exp(-m) will underflow to 0. > > comment in chi2Q(). Testing showed that the exact value of > max_discriminators didn't matter much, provided it was at least > 30 (or so). > Then again, most emails don't have 150 tokens, let alone 150 strong ones. Thanks to both of you for clearing this up. The present problem I am fighting is false negatives. The two messages I posted about in this thread were just examples. Performance is obviously highly dependent on initial training set size and subsequent training strategy, but I have not done terribly well with false negatives (yet!). I now have two weeks' worth of data using the following tactics:

1) Initial training set 650 spam, 654 ham on 11-16-03.
2) Initial filter thresholds 90/15.
3) Train on any spam that scores below 50, any ham that scores above 15. Filter all unread mail after each training event to simulate
4) On 11-22-03, changed filter thresholds to 90/5. Train on any ham that scores above 5. Trained 154 additional ham to rebalance databases. In reality, very, very few of the false negatives scored between 5 and 15, so the threshold change did not make a large difference.
5) On 11-29-03, trained 118 additional ham to rebalance databases.
Here are my results:

date      spam  fn  fn%    fp  fp%   comments
--------  ----  --  -----  --  ----  --------
11-17-03   137  18  13.1%   0  0.0%  first full day after training
11-18-03   157  14   8.9%   0  0.0%
11-19-03   135  11   8.2%   0  0.0%
11-20-03   157  13   8.3%   0  0.0%
11-21-03   147   9   6.1%   0  0.0%
11-22-03   166   8   4.8%   0  0.0%  trained 154 add'l ham, lowered ham threshold
11-23-03   164  11   6.7%   0  0.0%
11-24-03   146   3   2.1%   0  0.0%
11-25-03   154   5   3.3%   0  0.0%
11-26-03   133   3   2.3%   0  0.0%
11-27-03   134   0   0.0%   0  0.0%
11-28-03   135   8   5.9%   0  0.0%
11-29-03   152   7   4.6%   0  0.0%  trained 118 add'l ham
11-30-03   138   6   4.4%   0  0.0%
12-01-03   157   9   5.7%   0  0.0%
12-02-03   106   8   7.6%   0  0.0%  partial day, not yet complete

SpamBayes currently has trained 926 ham and 929 spam. The very good news is no false positives, and that seems to be the forte of this program. It appears that the system reached an optimum around 11-27-03 and has gotten worse after that. Alternatively, you could interpret this as stabilized by 11-21-03 with a few unusually good days following that. This false negative rate is similar to the results I had before, though I did not use a pre-defined training scheme as I do now. My questions are:

1) Is this typical or should I expect better?
2) What training tactics would you suggest that might work better?

Under the assumption that the basic classifier has undergone lots of testing and is well-optimized, my guess is that most future performance improvements, aside from bug fixes and parsing changes, will result from training strategy. Hoping that this is not completely misguided, I put some ideas on training tactics on the wiki at http://entrian.com/sbwiki/TrainingIdeas. Comments, corrections and feedback would be most appreciated. I have no idea how many of these ideas have already been tried and the results known. As I don't care to waste other people's time with old or naive ideas, let me know if that wiki discussion is out to lunch and I'll either fix it or rip it down.
-- Seth Goodman Humans: personal replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From mbeloff at comcast.net Tue Dec 2 20:55:31 2003 From: mbeloff at comcast.net (Marv Beloff) Date: Tue Dec 2 20:55:46 2003 Subject: [Spambayes] Where are my e-mails? Message-ID: Hi, I installed Spambayes a few weeks ago and reported to my cybersenior group how pleased I was with it. I have two web sites as a sculptor www.woodbowties.com and www.sanddevil.com. I created two folders as directed JUNK MAIL and JUNK MAYBE. It was working wonderfully. I got a little careless and instead of deleting JUNK MAIL as was my daily activity I deleted the JUNK MAIL folder itself which caused me to have to go into Spambayes manager and reinstall with your wizard. I actually did this a couple of times and the last time was yesterday afternoon. I received no e-mails for the next 24 hours. I contacted friends and had them send me an e-mail. It did not make it through. In order to again receive e-mails I uninstalled the Spambayes. My question is Where are my lost e-mails? I have a few important ones and orders and cannot find them anywhere. Also, I would like to suggest that you put in a fix so that if one inadvertently deletes a JUNK MAIL folder I would have to click on a box that questions whether I really want to do this. Hope you can help me find my mail and reinstall what I believe is a super software system to quickly eliminate the nasty, disgusting, degrading and uninvited mail that fills my INBOX. Best wishes, Marv Beloff, mbeloff@comcast.net, 800-974-3557 From tim.one at comcast.net Tue Dec 2 21:18:24 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 2 21:18:27 2003 Subject: [Spambayes] training problem? In-Reply-To: Message-ID: [Seth Goodman] > The present problem I am fighting is false negatives. What do you mean by false negative? We use it here to mean spam scoring below your ham cutoff.
> The two messages I posted about in this thread were just examples. One would have sufficed. > Performance is obviously highly dependent on initial training set > size and subsequent training strategy, but I have not done terribly > well with false negatives (yet!). I now have two weeks' worth of data > using the following tactics: > > 1) Initial training set 650 spam, 654 ham on 11-16-03. > > 2) Initial filter thresholds 90/15. So by "false negative" here you mean spam scoring below 15? If so, I have no theory, as I see maybe one of those per month (with about 700 emails per day, including 200-250 daily spam). > 3) Train on any spam that scores below 50, any ham that scores above > 15. Filter all unread mail after each training event to simulate If your spam cutoff is 90, why do you only train on spam scoring below 50? Something doesn't sound right here. > 4) On 11-22-03, changed filter thresholds to 90/5. Train on any ham > that scores above 5. Trained 154 additional ham to rebalance > databases. In reality, very, very few of the false negatives scored > between 5 and 15, so the threshold change did not make a large > difference. Sorry, still don't know what you mean by false negative. If you meant the conventional "scored below 15" (your former ham cutoff), yet very, very few of them scored between 5 and 15, it must mean that almost all of your false negatives are scoring below 5. Is that what you mean? > 5) On 11-29-03, trained 118 additional ham to rebalance databases.
> Here are my results:
>
> date      spam  fn  fn%    fp  fp%   comments
> --------  ----  --  -----  --  ----  --------
> 11-17-03   137  18  13.1%   0  0.0%  first full day after training
> 11-18-03   157  14   8.9%   0  0.0%
> 11-19-03   135  11   8.2%   0  0.0%
> 11-20-03   157  13   8.3%   0  0.0%
> 11-21-03   147   9   6.1%   0  0.0%
> 11-22-03   166   8   4.8%   0  0.0%  trained 154 add'l ham, lowered ham threshold
> 11-23-03   164  11   6.7%   0  0.0%
> 11-24-03   146   3   2.1%   0  0.0%
> 11-25-03   154   5   3.3%   0  0.0%
> 11-26-03   133   3   2.3%   0  0.0%
> 11-27-03   134   0   0.0%   0  0.0%
> 11-28-03   135   8   5.9%   0  0.0%
> 11-29-03   152   7   4.6%   0  0.0%  trained 118 add'l ham
> 11-30-03   138   6   4.4%   0  0.0%
> 12-01-03   157   9   5.7%   0  0.0%
> 12-02-03   106   8   7.6%   0  0.0%  partial day, not yet complete
>
> SpamBayes currently has trained 926 ham and 929 spam. The very good > news is no false positives, and that seems to be the forte of this > program. I expect it varies by person, but yes, I most often hear that people who have used many spam gimmicks are most surprised by this gimmick's low-to-zero FP rate. > It appears that the system reached an optimum around 11-27-03 and has > gotten worse after that. Alternatively, you could interpret this as > stabilized by 11-21-03 with a few unusually good days following that. > This false negative rate is similar to the results I had before, though > I did not use a pre-defined training scheme as I do now. My questions > are: > > 1) Is this typical The only believable answer to that would have to come from broad testing. At best, we had a peak of about a dozen active testers here, but half the most active were focused on high-volume email filtering, not personal application. IOW, the broad testing needed to answer that question has never been done. > or should I expect better? Ditto. My own FP and FN rates are trivial (I'm genuinely surprised to see any spam in my Inbox, and shocked to see a ham in my Spam folder, using cutoffs of 20 and 80).
My Unsure rate (scores between 20 and 80) is heading toward 5% -- but I don't care (I review all my spam anyway, and I'm on enough admin-type mailing lists that I get a ton of weird email -- I can't myself decide whether fully half the stuff in my Unsure folder is "really ham" or "really spam", and toss it untrained after mentally shrugging). > 2) What training tactics would you suggest that might work better? Until we know what you meant by false negative, none. If you're calling spam that ends up Unsure "false negative", then reducing your spam cutoff should help. If you really are getting lots of spam scoring below 5, then that's something I've never heard of before (anyone?). > Under the assumption that the basic classifier has undergone lots of > testing and is well-optimized, my guess is that most future > performance improvements, aside from bug fixes and parsing changes, > will result from training strategy. Hoping that this is not > completely misguided, I put some ideas on training tactics on the > wiki at http://entrian.com/sbwiki/TrainingIdeas. Comments, > corrections and feedback would be most appreciated. I have no idea > how many of these ideas have already been tried and the results > known. As I don't care to waste other people's time with old or > naive ideas, let me know if that wiki discussion is out to lunch and > I'll either fix it or rip it down. Thanks! That's an excellent use for the Wiki. People who disagree can add their disagreements directly to the Wiki page. Wikis are best when the authors cooperate to change the text as agreements appear, so it doesn't end up as a static ever-growing argument.
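Kenny's and Tim's explanations earlier in this thread (keep at most 150 of the strongest clues, drop anything between 0.4 and 0.6, combine them with an inverse chi-squared test) and Tim's ham/spam cutoffs fit together roughly as in the Python sketch below. It is a simplified illustration, not SpamBayes' own code: the constant names, the exact comparison boundaries, and the 20/80 cutoffs are assumptions taken from the numbers quoted in these messages, and the per-token probabilities are assumed to be already computed.

```python
import math

# Hypothetical constants chosen to match numbers quoted in this thread.
MAX_DISCRIMINATORS = 150          # "at most 150 tokens"
MIN_PROB_STRENGTH = 0.1           # ignore clues with 0.4 < p < 0.6
HAM_CUTOFF, SPAM_CUTOFF = 0.20, 0.80


def chi2Q(x2, v):
    """Return P(chisq >= x2) for v (even) degrees of freedom."""
    assert v % 2 == 0
    m = x2 / 2.0
    # If x2 is very large, exp(-m) underflows to 0.0 and so does the sum.
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Roundoff can push the sum slightly above 1.0; clamp it.
    return min(total, 1.0)


def pick_clues(token_probs):
    """Return up to MAX_DISCRIMINATORS (token, p) pairs, strongest first,
    skipping any token whose p is within MIN_PROB_STRENGTH of 0.5."""
    strong = [(abs(p - 0.5), tok, p) for tok, p in token_probs.items()
              if abs(p - 0.5) >= MIN_PROB_STRENGTH]
    strong.sort(reverse=True)
    return [(tok, p) for _, tok, p in strong[:MAX_DISCRIMINATORS]]


def chi2_spamprob(token_probs):
    """Chi-squared combining of per-token probabilities (each 0 < p < 1)."""
    clues = pick_clues(token_probs)
    n = len(clues)
    if n == 0:
        return 0.5
    # Sum logs rather than multiplying many small probabilities directly.
    s_log = sum(math.log(1.0 - p) for _, p in clues)
    h_log = sum(math.log(p) for _, p in clues)
    spamminess = 1.0 - chi2Q(-2.0 * s_log, 2 * n)
    hamminess = 1.0 - chi2Q(-2.0 * h_log, 2 * n)
    return (spamminess - hamminess + 1.0) / 2.0


def classify(score):
    """Map a score to ham / unsure / spam using the two cutoffs."""
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"
```

For example, `chi2Q(2000.0, 2000)` returns 0.0 because `exp(-1000)` underflows, although the true tail probability for 2000 degrees of freedom is close to one half; that is the kind of breakdown the XXX comment warns about once the clue count, and hence the degrees of freedom, grows far past 150. In these terms, Tim's "false negative" is a spam whose score makes `classify()` return "ham".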
From joef at taylorriver.com Tue Dec 2 21:33:17 2003 From: joef at taylorriver.com (Joe Farrell) Date: Tue Dec 2 21:33:31 2003 Subject: [Spambayes] Cannot Get SpamBayes to Install Message-ID: <000001c3b945$d139e460$6501a8c0@DELL> I have tried to install SB three times, and each time I got an error message at the end of the installation saying that it could not register a .dll file called DLL/OCX. Has anyone had a similar experience? Do you know what the solution could be? From tim.one at comcast.net Tue Dec 2 21:36:09 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 2 21:36:13 2003 Subject: [Spambayes] Where are my e-mails? In-Reply-To: Message-ID: [Marv Beloff] > I installed Spambayes a few weeks ago and reported to my cybersenior > group how pleased I was with it. > ... > I created two folders as directed JUNK MAIL and JUNK MAYBE. > It was working wonderfully. I got a little careless and instead of > deleting JUNK MAIL as was my daily activity I deleted the JUNK MAIL > folder itself which caused me to have to go into Spambayes manager > and reinstall with your wizard. Is this the Outlook addin (there are many ways to use SpamBayes, and we're guessing about how you're using it)? If so, you should have been able to use the SpamBayes manager's Filtering tab to tell SpamBayes about your new JUNK MAIL folder (it may have had the same name to you, but it would have been an entirely different folder to Outlook). > I actually did this a couple of times and the last time was yesterday > afternoon. I received no e-mails for the next 24 hours. I contacted > friends and had them send me an e-mail. It's easier to send yourself email (just send an email to your own email address). > It did not make it through. So far, you haven't said anything to make me suspect that it's not just your ISP email account that isn't working. I'm not sure what "not make it through" means, exactly, either. It *sounds* like new email wasn't even downloaded from your ISP.
> In order to again receive e-mails I uninstalled the Spambayes. Did that work? > My question is Where are my lost e-mails? Sorry, don't know. The Outlook addin never deletes email. You should try using Outlook's Advanced Find (Tools -> Advanced Find) function to search for one of them. The others will probably be found in the same place, assuming Outlook actually downloaded any of these emails to begin with. > I have a few important ones and orders and cannot find them > anywhere. Also, I would like to suggest that you put in a fix so that > if one inadvertently deletes a JUNK MAIL folder I would have to click > on a box that questions whether I really want to do this. Sorry, you'll have to ask Microsoft for that. Deleting, and moving, folders is *way* too easy to do by mistake in Outlook, but deleting and moving folders is strictly between you and Outlook -- Outlook doesn't ask SpamBayes whether it's OK. By the way, when you deleted your folder, didn't Outlook pop up a box asking you whether you really wanted to do that? If it didn't, your old JUNK MAIL folder is probably hiding now as a subfolder in your Deleted Items folder. From tameyer at ihug.co.nz Tue Dec 2 21:41:20 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Dec 2 21:41:33 2003 Subject: [Spambayes] Where are my e-mails? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13044781F1@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1CD@its-xchg4.massey.ac.nz> [Tim] > By the way, when you deleted your folder, didn't Outlook > pop up a box asking you whether you really wanted to do that? > If it didn't, your old JUNK MAIL folder is probably hiding > now as a subfolder in your Deleted Items folder. I don't know about all versions of Outlook, but I get this "are you sure" warning even if I'm just moving the folder into the Deleted Items folder. It would seem pointless to add a second check, even if it was a simple addition to the plug-in.
BTW, has anyone else noticed that in the last week or two these 'missing mail' posts have become quite common? (Although I note that it's never ended up being spambayes at fault). Maybe we need a FAQ for this, too? =Tony Meyer From mdhpub at blueyonder.co.uk Tue Dec 2 21:46:17 2003 From: mdhpub at blueyonder.co.uk (Mathew Hendry) Date: Tue Dec 2 21:46:22 2003 Subject: [Spambayes] Table munging defeats SpamBayes Message-ID: Here's a new one on me. I spotted it just now when looking through my spam corpus for "low scorers". The rendered version of the spam (in Outlook 2003, anyway) looks like this: ------------------------------begin All Rx Products Consultation at no cost No embarassing M.D. visits I want to know more ------------------------------end Most of that text is broken up into tiny pieces and inserted into tables, followed by a huge block filled mostly with randomized English text but also, at the end, containing the web site href and "I want to know more". SpamBayes gobbles up all the text inside the but doesn't spot the contents of the table because each apparent token is only 1-3 characters long. Fortunately the spammer scuppered s/h/itself a bit by including some spammy text in the block. That and my SpamAssassin/SpamCop mods pushed the score up 50%, so it didn't escape entirely.
Return-Path: Delivered-To: MUNGED Received: (qmail 30533 invoked from network); 1 Dec 2003 19:56:10 -0000 Received: from unknown (HELO blade2.cesmail.net) (192.168.1.212) by blade4.cesmail.net with SMTP; 1 Dec 2003 19:56:10 -0000 Received: (qmail 27132 invoked from network); 1 Dec 2003 19:38:24 -0000 Received: from mailgate.cesmail.net (216.154.195.36) by blade2.cesmail.net with SMTP; 1 Dec 2003 19:38:22 -0000 Received: (qmail 12032 invoked from network); 1 Dec 2003 19:38:20 -0000 Received: from unknown (HELO mailgate.cesmail.net) (192.168.1.101) by mailgate.cesmail.net with SMTP; 1 Dec 2003 19:38:20 -0000 X-Message-Info: +va6sINXO+/559KzjEJWSK7H4PWBM17OSlLJppzGa8s= Received: from popgate.cesmail.net [192.168.1.201] by mailgate.cesmail.net with POP3 (fetchmail-6.2.1) for MUNGED (single-drop); Mon, 01 Dec 2003 14:38:20 -0500 (EST) Received: from mc3-f6.hotmail.com ([64.4.50.142]) by mc3-s4.hotmail.com with Microsoft SMTPSVC(5.0.2195.6713); Mon, 1 Dec 2003 10:09:31 -0800 Received: from mail.ganesparanx.com ([61.32.9.47]) by mc3-f6.hotmail.com with Microsoft SMTPSVC(5.0.2195.6713); Mon, 1 Dec 2003 10:09:14 -0800 Message-ID: Date: Mon, 01 Dec 2003 13:04:40 -0900 (EST) From: "sharaf his new book I mostly specialize in creating a little magic in the" To: MUNGED Subject: peqm Prescribed meds from hoomme ddm MIME-Version: 1.0 Content-Type: text/html X-OriginalArrivalTime: 01 Dec 2003 18:09:15.0515 (UTC) FILETIME=[3B0FECB0:01C3B836] X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on blade4 X-Spam-Level: ******** X-Spam-Status: hits=8.3 tests=FORGED_HOTMAIL_RCVD,HTML_90_100,HTML_MESSAGE, HTML_MIME_NO_HTML_TAG,LARGE_HEX,MIME_HTML_NO_CHARSET,MIME_HTML_ONLY, USERPASS version=2.60 X-SpamCop-Checked: 192.168.1.212 216.154.195.36 192.168.1.101 192.168.1.201 64.4.50.142 61.32.9.47 X-SpamCop-Disposition: Blocked bl.spamcop.net
Al
Co
No
ns
e
Rx
ult
mba
Pr
at
ra
odu
ion
ssi
ct
a
ng
M
n
.
D.
cos
vi
t
s
i
ts
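Mathew's point about the 1-3 character pieces can be illustrated with a simplified word tokenizer. This is a hypothetical Python sketch, not SpamBayes' tokenizer: it assumes SpamBayes-like rules in which words shorter than 3 characters are ignored and words longer than 12 are replaced by a `skip:` token recording the first character and the length rounded down to a multiple of 10 (the same shape as the `skip:p 20` entries in Seth's token list earlier in this archive). The function name and the bounds are illustrative assumptions.

```python
# Hypothetical bounds, assumed to mirror SpamBayes' defaults: words of
# 3..12 characters become tokens, longer ones collapse to a "skip:" token,
# and shorter ones are dropped.
MIN_WORD, MAX_WORD = 3, 12


def tokenize_text(text):
    """Yield simplified body tokens from whitespace-separated words."""
    for word in text.lower().split():
        n = len(word)
        if MIN_WORD <= n <= MAX_WORD:
            yield word
        elif n > MAX_WORD:
            # Keep only the first character and the length rounded down to 10.
            yield "skip:%c %d" % (word[0], n // 10 * 10)
        # Words shorter than MIN_WORD fall through and produce nothing.


# Table-cell fragments from the sample, in document order.
cells = ["Al", "Co", "No", "ns", "e", "Rx", "ult", "mba", "Pr", "at",
         "ra", "odu", "ion", "ssi", "ct", "a", "ng", "M", "n", ".", "D.",
         "cos", "vi", "t", "s", "i", "ts"]

tokens = list(tokenize_text(" ".join(cells)))
rendered = list(tokenize_text("All Rx Products Consultation at no cost"))
```

Here `tokens` comes out as `['ult', 'mba', 'odu', 'ion', 'ssi', 'cos']`: the 1-2 character pieces vanish and the 3-character ones survive as meaningless tokens, while `rendered` is `['all', 'products', 'consultation', 'cost']`. The sketch only shows why the fragments slip through; it does not attempt a fix.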
From tameyer at ihug.co.nz Tue Dec 2 21:52:36 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Dec 2 21:52:42 2003 Subject: [Spambayes] Restore from Spam In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304477EF3@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1CF@its-xchg4.massey.ac.nz> > I click the recover button and the message is sent to > the inbox of my personal folders. It's not recovered to the > imap folder in which the message originated like I'd expect it to be. > > Is this a known issue? Is there a work around? Or maybe I've > misconfigured something? IIRC, the plug-in 'recovers' the mail to the Inbox when something goes wrong, either while trying to recover it to the original folder or while saving the information about which folder was the original. I suspect that one of these is the case here (I seem to recall IMAP being difficult in this respect). Would you be able to attach your latest log file (instructions in the troubleshooting guide)? This will have the information required to see what is going wrong. =Tony Meyer From tim.one at comcast.net Tue Dec 2 22:13:09 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 2 22:13:12 2003 Subject: [Spambayes] Where are my e-mails? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212B1CD@its-xchg4.massey.ac.nz> Message-ID: [Tony Meyer] > I don't know about all versions of Outlook, but I get this "are you > sure" warning even if I'm just moving the folder into the Deleted > Items folder. I do not (Outlook 2000 SP3), although there are well-hidden options in Outlook that can affect when and whether you get warnings. > It would seem pointless to add a second check, even if > it was a simple addition to the plug-in. So maybe Marv disabled Outlook's warning and wants us to change his mind. > BTW, has anyone else noticed that in the last week or two these > 'missing mail' posts have become quite common? (Although I note that > it's never ended up being spambayes at fault).
Yup, and it's pretty mysterious to see so many pop up all of a sudden. > Maybe we need a FAQ for this, too? I don't know how to answer that FAQ yet: where *do* people find their email in the end? Is there a pattern to it? I haven't seen many followup msgs from people with this complaint. From sysadmin at scr.siemens.com Tue Dec 2 22:13:38 2003 From: sysadmin at scr.siemens.com (sysadmin@scr.siemens.com) Date: Tue Dec 2 22:13:48 2003 Subject: [Spambayes] Blocked Mail Notification Message-ID: <200312030313.hB33Dh308556@scr.siemens.com> ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Tue, 02 Dec 2003 22:13:35 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB33DY308549 for ; Tue, 2 Dec 2003 22:13:34 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte2.siemens.com ([212.114.202.115]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB33Davd006205 for ; Tue, 2 Dec 2003 22:13:36 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte2.siemens.com (8.11.6/8.11.6) with ESMTP id hB33DLF06802 for ; Wed, 3 Dec 2003 04:13:21 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1ARNRz-0002XN-Ks; Tue, 02 Dec 2003 22:13:15 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 12 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion 
list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Tue, 02 Dec 2003 22:13:15 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Cannot Get SpamBayes to Install (Joe Farrell) 2. RE: Where are my e-mails? (Tim Peters) 3. RE: Where are my e-mails? (Tony Meyer) 4. Table munging defeats SpamBayes (Mathew Hendry) 5. RE: Restore from Spam (Tony Meyer) 6. RE: Where are my e-mails? (Tim Peters) ---------------------------------------------------------------------- Message: 1 Date: Tue, 2 Dec 2003 21:33:17 -0500 From: "Joe Farrell" Subject: [Spambayes] Cannot Get SpamBayes to Install To: Message-ID: <000001c3b945$d139e460$6501a8c0@DELL> Content-Type: text/plain; charset="us-ascii" I have tried to install SB three times, and each time I got an error message at the end of the installation saying that it could not register the a .dll file called DLL/OCX. Has anyone had a similar experience? Do you know what the solution could be? ------------------------------ Message: 2 Date: Tue, 2 Dec 2003 21:36:09 -0500 From: "Tim Peters" Subject: RE: [Spambayes] Where are my e-mails? To: "Marv Beloff" Cc: spambayes@python.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" [Marv Beloff] > I installed Spambayes a few weeks ago and reported to my cybersenior > group how pleased I was with it. > ... > I created two folders as directed JUNK MAIL and JUNK MAYBE. > It was working wonderfully. 
I got a little careless and instead of > deleting JUNK MAIL as was my daily activity I deleted the JUNK MAIL > folder itself which caused me to have to go into Spambayes manager > and reinstall with your wizard. Is this the Outlook addin (there are many ways to use SpamBayes, and we're guessing about how you're using it)? If so, you should have been able to use the SpamBayes manager's Filtering tab to tell SpamBayes about your new JUNK MAIL folder (it may have had the same name to you, but it would have been an entirely different folder to Outlook). > I actually did this a couple of times and the last time was yesterday > afternoon. I received no e-mails for the next 24 hours. I contacted > friends and had them send me an e-mail. It's easier to send yourself email (just send an email to your own email address). > It did not make it through. So far, you haven't said anything to make me suspect that it's not just your ISP email account that isn't working. I'm not sure what "not make it through" means, exactly, either. It *sounds* like new email wasn't even downloaded from your ISP. > In order to again receive e-mails I uninstalled the Spambayes. Did that work? > My question is Where are my lost e-mails? Sorry, don't know. The Outlook addin never deletes email. You should try using Outlook's Advanced Find (Tools -> Advanced Find) function to search for one of them. The others will probably be found in the same place, assuming Outlook actually downloaded any of these emails to begin with. > I have a few important ones and orders and cannot find them > anywhere. Also, would like to suggest that you put a fix-in so that > if one inadvertently deletes a JUNK MAIL folder I would have to click > on a box that questions whether I really want to do this. Sorry, you'll have to ask Microsoft for that. 
Deleting, and moving, folders is *way* too easy to do by mistake in Outlook, but deleting and moving folders is strictly between you and Outlook -- Outlook doesn't ask SpamBayes whether it's OK. By the way, when you deleted your folder, didn't Outlook pop up a box asking you whether you really wanted to do that? If it didn't, your old JUNK MAIL folder is probably hiding now as a subfolder in your Deleted Items folder. ------------------------------ Message: 3 Date: Wed, 3 Dec 2003 15:41:20 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Where are my e-mails? To: "'Tim Peters'" , "'Marv Beloff'" Cc: spambayes@python.org Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1CD@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" [Tim] > By the way, when you deleted your folder, didn't Outlook > pop up a box asking you whether you really wanted to do that? > If it didn't, your old JUNK MAIL folder is probably hiding > now as a subfolder in your Deleted Items folder. I don't know about all versions of Outlook, but I get this "are you sure" warning even if I'm just moving the folder into the Deleted Items folder. It would seem pointless to add a second check, even if it was a simple addition to the plug-in. BTW, has anyone else noticed that in the last week or two these 'missing mail' posts have become quite common? (Although I note that it's never ended up being spambayes at fault). Maybe we need a FAQ for this, too? =Tony Meyer ------------------------------ Message: 4 Date: Wed, 3 Dec 2003 02:46:17 -0000 From: "Mathew Hendry" Subject: [Spambayes] Table munging defeats SpamBayes To: Message-ID: Content-Type: text/plain; charset="US-ASCII" Here's a new one on me. I spotted it just now when looking through my spam corpus for "low scorers". The rendered version of the spam (in Outlook 2003, anyway) looks like this: ------------------------------begin All Rx Products Consultation at no cost No embarassing M.D.
visits I want to know more ------------------------------end Most of that text is broken up into tiny pieces and inserted into tables, followed by a huge block filled mostly with randomized English text but also, at the end, containing the web site href and "I want to know more". SpamBayes gobbles up all the text inside the but doesn't spot the contents of the table because each apparent token is only 1-3 characters long. Fortunately the spammer scuppered s/h/itself a bit by including some spammy text in the block. That and my SpamAssassin/SpamCop mods pushed the score up 50%, so it didn't escape entirely. Return-Path: Delivered-To: MUNGED Received: (qmail 30533 invoked from network); 1 Dec 2003 19:56:10 -0000 Received: from unknown (HELO blade2.cesmail.net) (192.168.1.212) by blade4.cesmail.net with SMTP; 1 Dec 2003 19:56:10 -0000 Received: (qmail 27132 invoked from network); 1 Dec 2003 19:38:24 -0000 Received: from mailgate.cesmail.net (216.154.195.36) by blade2.cesmail.net with SMTP; 1 Dec 2003 19:38:22 -0000 Received: (qmail 12032 invoked from network); 1 Dec 2003 19:38:20 -0000 Received: from unknown (HELO mailgate.cesmail.net) (192.168.1.101) by mailgate.cesmail.net with SMTP; 1 Dec 2003 19:38:20 -0000 X-Message-Info: +va6sINXO+/559KzjEJWSK7H4PWBM17OSlLJppzGa8s= Received: from popgate.cesmail.net [192.168.1.201] by mailgate.cesmail.net with POP3 (fetchmail-6.2.1) for MUNGED (single-drop); Mon, 01 Dec 2003 14:38:20 -0500 (EST) Received: from mc3-f6.hotmail.com ([64.4.50.142]) by mc3-s4.hotmail.com with Microsoft SMTPSVC(5.0.2195.6713); Mon, 1 Dec 2003 10:09:31 -0800 Received: from mail.ganesparanx.com ([61.32.9.47]) by mc3-f6.hotmail.com with Microsoft SMTPSVC(5.0.2195.6713); Mon, 1 Dec 2003 10:09:14 -0800 Message-ID: Date: Mon, 01 Dec 2003 13:04:40 -0900 (EST) From: "sharaf his new book I mostly specialize in creating a little magic in the" To: MUNGED Subject: peqm Prescribed meds from hoomme ddm MIME-Version: 1.0 Content-Type: text/html 
X-OriginalArrivalTime: 01 Dec 2003 18:09:15.0515 (UTC) FILETIME=[3B0FECB0:01C3B836] X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on blade4 X-Spam-Level: ******** X-Spam-Status: hits=8.3 tests=FORGED_HOTMAIL_RCVD,HTML_90_100,HTML_MESSAGE, HTML_MIME_NO_HTML_TAG,LARGE_HEX,MIME_HTML_NO_CHARSET,MIME_HTML_ONLY, USERPASS version=2.60 X-SpamCop-Checked: 192.168.1.212 216.154.195.36 192.168.1.101 192.168.1.201 64.4.50.142 61.32.9.47 X-SpamCop-Disposition: Blocked bl.spamcop.net
Al | Co | No | ns | e | Rx | ult | mba | Pr | at | ra | odu | ion | ssi | ct | a | ng | M | n | . | D. | cos | vi | t | s | i | ts
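The effect Mathew describes can be illustrated with a short Python sketch. It assumes a toy tokenizer that, like SpamBayes' tokenizer as we understand it, splits on whitespace and ignores words shorter than three characters. The real tokenizer does considerably more (HTML handling, header tokens, "skip" tokens for very long words), so treat this as an approximation, not SpamBayes' actual code. The fragment list is transcribed from the spam source above, with whitespace-only cells omitted.

```python
# Assumption: a toy tokenizer that, like SpamBayes' tokenizer as we
# understand it, discards any word shorter than 3 characters.
MIN_WORD_LEN = 3


def tokenize(text):
    """Split on whitespace, lowercase, and drop words under MIN_WORD_LEN."""
    return [w.lower() for w in text.split() if len(w) >= MIN_WORD_LEN]


# Table-cell fragments in the order they appear in the spam's HTML source.
fragments = ["Al", "Co", "No", "ns", "e", "Rx", "ult", "mba", "Pr", "at",
             "ra", "odu", "ion", "ssi", "ct", "a", "ng", "M", "n", ".",
             "D.", "cos", "vi", "t", "s", "i", "ts"]

# What the mail client shows once the table has been laid out.
rendered = ("All Rx Products Consultation at no cost "
            "No embarassing M.D. visits")

source_tokens = tokenize(" ".join(fragments))
rendered_tokens = tokenize(rendered)

print(source_tokens)    # ['ult', 'mba', 'odu', 'ion', 'ssi', 'cos']
print(rendered_tokens)  # ['all', 'products', 'consultation', 'cost', 'embarassing', 'm.d.', 'visits']
print(set(source_tokens) & set(rendered_tokens))  # set()
```

Only scraps such as "ult" and "odu" survive, and none of them matches a token the rendered words would produce, so the classifier sees almost nothing it has learned to associate with spam.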
------------------------------ Message: 5 Date: Wed, 3 Dec 2003 15:52:36 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Restore from Spam To: "'Mark Sears'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1CF@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > I click the recover button and the message is sent to > the inbox of my personal folders. It's not recovered to the > imap folder in which the message originated like I'd expect it to be. > > Is this a known issue? Is there a work around? Or maybe I've > misconfigured something? IIRC, the plug-in 'recovers' the mail to the Inbox when something goes wrong trying to recover it to the original folder, or when saving the information about which folder was the original. I suspect that one of these is the case here (I seem to recall IMAP being difficult in this respect). Would you be able to attach your latest log file (instructions in the troubleshooting guide)? This will have the information required to see what is going wrong. =Tony Meyer ------------------------------ Message: 6 Date: Tue, 2 Dec 2003 22:13:09 -0500 From: "Tim Peters" Subject: RE: [Spambayes] Where are my e-mails? To: "Tony Meyer" , "'Marv Beloff'" Cc: spambayes@python.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" [Tony Meyer] > I don't know about all versions of Outlook, but I get this "are you > sure" warning even if I'm just moving the folder into the Deleted > Items folder. I do not (Outlook 2000 SP3), although there are well-hidden options in Outlook that can affect when and whether you get warnings. > It would seem pointless to add a second check, even if > it was a simple addition to the plug-in. So maybe Marv disabled Outlook's warning and wants us to change his mind . > BTW, has anyone else noticed that in the last week or two these > 'missing mail' posts have become quite common? (Although I note that > it's never ended up being spambayes at fault). 
Yup, and it's pretty mysterious to see so many pop up all of a sudden. > Maybe we need a FAQ for this, too? I don't know how to answer that FAQ yet: where *do* people find their email in the end? Is there a pattern to it? I haven't seen many followup msgs from people with this complaint. ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 12 ***************************************** From tameyer at ihug.co.nz Tue Dec 2 22:25:39 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Dec 2 22:25:56 2003 Subject: [Spambayes] Where are my e-mails? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130447820C@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1D0@its-xchg4.massey.ac.nz> > > BTW, has anyone else noticed that in the last week or two these > > 'missing mail' posts have become quite common? (Although I > > note that it's never ended up being spambayes at fault). > > Yup, and it's pretty mysterious to see so many pop up all of a sudden. Especially since it's so long since the last release! > > Maybe we need a FAQ for this, too? > > I don't know how to answer that FAQ yet: where *do* people > find their email in the end? Is there a pattern to it? I > haven't seen many followup msgs from people with this complaint. I've seen a few (people replying off-list), although there's not a clear pattern. I've added a FAQ anyway, since I was doing a 'how to uninstall' one anyway; as we find out more information we can add to it. If anyone is reading this that has 'lost' mail and then found it again - it would be great to hear from you if there's more to finding it than is in the FAQ. 
=Tony Meyer From sysadmin at scr.siemens.com Tue Dec 2 22:26:20 2003 From: sysadmin at scr.siemens.com (sysadmin@scr.siemens.com) Date: Tue Dec 2 22:26:27 2003 Subject: [Spambayes] Blocked Mail Notification Message-ID: <200312030326.hB33QL308674@scr.siemens.com> ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Tue, 02 Dec 2003 22:26:16 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB33QF308669 for ; Tue, 2 Dec 2003 22:26:15 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte5.siemens.com (lte5.siemens.com [217.194.35.73]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB33QHvd006352 for ; Tue, 2 Dec 2003 22:26:18 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte5.siemens.com (8.11.6/8.11.2) with ESMTP id hB33Q2D28057 for ; Wed, 3 Dec 2003 04:26:02 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1ARNeG-0005w2-Ec; Tue, 02 Dec 2003 22:25:56 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 13 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Tue, 02 Dec 2003 22:25:56 -0500 Send Spambayes mailing 
list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) 2. RE: Where are my e-mails? (Tony Meyer) ---------------------------------------------------------------------- Message: 1 Date: Tue, 02 Dec 2003 22:13:38 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312030313.hB33Dh308556@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* ------------------------------ Message: 2 Date: Wed, 3 Dec 2003 16:25:39 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Where are my e-mails? To: "'Tim Peters'" , "'Marv Beloff'" Cc: spambayes@python.org Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1D0@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > > BTW, has anyone else noticed that in the last week or two these > > 'missing mail' posts have become quite common? (Although I > > note that it's never ended up being spambayes at fault). > > Yup, and it's pretty mysterious to see so many pop up all of a sudden. Especially since it's so long since the last release! > > Maybe we need a FAQ for this, too? > > I don't know how to answer that FAQ yet: where *do* people > find their email in the end? Is there a pattern to it? I > haven't seen many followup msgs from people with this complaint. 
I've seen a few (people replying off-list), although there's not a clear pattern. I've added a FAQ anyway, since I was doing a 'how to uninstall' one anyway; as we find out more information we can add to it. If anyone is reading this that has 'lost' mail and then found it again - it would be great to hear from you if there's more to finding it than is in the FAQ.
=Tony Meyer ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 13 ***************************************** From sysadmin at scr.siemens.com Tue Dec 2 22:26:49 2003 From: sysadmin at scr.siemens.com (sysadmin@scr.siemens.com) Date: Tue Dec 2 22:26:56 2003 Subject: [Spambayes] Blocked Mail Notification Message-ID: <200312030326.hB33Qp308682@scr.siemens.com> ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Tue, 02 Dec 2003 22:26:45 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB33Qi308678 for ; Tue, 2 Dec 2003 22:26:44 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte5.siemens.com (lte5.siemens.com [217.194.35.73]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB33Qkvd006360 for ; Tue, 2 Dec 2003 22:26:47 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte5.siemens.com (8.11.6/8.11.2) with ESMTP id hB33QVD28267 for ; Wed, 3 Dec 2003 04:26:32 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1ARNep-0006FL-1w; Tue, 02 Dec 2003 22:26:31 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 14 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 
Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Tue, 02 Dec 2003 22:26:31 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) ---------------------------------------------------------------------- Message: 1 Date: Tue, 02 Dec 2003 22:26:20 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312030326.hB33QL308674@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. 
Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Tue, 02 Dec 2003 22:26:16 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB33QF308669 for ; Tue, 2 Dec 2003 22:26:15 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte5.siemens.com (lte5.siemens.com [217.194.35.73]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB33QHvd006352 for ; Tue, 2 Dec 2003 22:26:18 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte5.siemens.com (8.11.6/8.11.2) with ESMTP id hB33Q2D28057 for ; Wed, 3 Dec 2003 04:26:02 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1ARNeG-0005w2-Ec; Tue, 02 Dec 2003 22:25:56 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 13 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Tue, 02 Dec 2003 22:25:56 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more 
specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) 2. RE: Where are my e-mails? (Tony Meyer) ---------------------------------------------------------------------- Message: 1 Date: Tue, 02 Dec 2003 22:13:38 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312030313.hB33Dh308556@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Tue, 02 Dec 2003 22:13:35 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB33DY308549 for ; Tue, 2 Dec 2003 22:13:34 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte2.siemens.com ([212.114.202.115]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB33Davd006205 for ; Tue, 2 Dec 2003 22:13:36 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte2.siemens.com (8.11.6/8.11.6) with ESMTP id hB33DLF06802 for ; Wed, 3 Dec 2003 04:13:21 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1ARNRz-0002XN-Ks; Tue, 02 Dec 2003 22:13:15 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 12 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Tue, 02 Dec 2003 22:13:15 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Cannot Get SpamBayes to Install (Joe Farrell) 2. RE: Where are my e-mails? (Tim Peters) 3. RE: Where are my e-mails? (Tony Meyer) 4. Table munging defeats SpamBayes (Mathew Hendry) 5. RE: Restore from Spam (Tony Meyer) 6. RE: Where are my e-mails? (Tim Peters) ---------------------------------------------------------------------- Message: 1 Date: Tue, 2 Dec 2003 21:33:17 -0500 From: "Joe Farrell" Subject: [Spambayes] Cannot Get SpamBayes to Install To: Message-ID: <000001c3b945$d139e460$6501a8c0@DELL> Content-Type: text/plain; charset="us-ascii" I have tried to install SB three times, and each time I got an error message at the end of the installation saying that it could not register a .dll file called DLL/OCX. Has anyone had a similar experience? Do you know what the solution could be? ------------------------------ Message: 2 Date: Tue, 2 Dec 2003 21:36:09 -0500 From: "Tim Peters" Subject: RE: [Spambayes] Where are my e-mails? To: "Marv Beloff" Cc: spambayes@python.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" [Marv Beloff] > I installed Spambayes a few weeks ago and reported to my cybersenior > group how pleased I was with it. > ... > I created two folders as directed JUNK MAIL and JUNK MAYBE. > It was working wonderfully.
I got a little careless and instead of > deleting JUNK MAIL as was my daily activity I deleted the JUNK MAIL > folder itself which caused me to have to go into Spambayes manager > and reinstall with your wizard. Is this the Outlook addin (there are many ways to use SpamBayes, and we're guessing about how you're using it)? If so, you should have been able to use the SpamBayes manager's Filtering tab to tell SpamBayes about your new JUNK MAIL folder (it may have had the same name to you, but it would have been an entirely different folder to Outlook). > I actually did this a couple of times and the last time was yesterday > afternoon. I received no e-mails for the next 24 hours. I contacted > friends and had them send me an e-mail. It's easier to send yourself email (just send an email to your own email address). > It did not make it through. So far, you haven't said anything to make me suspect that it's not just your ISP email account that isn't working. I'm not sure what "not make it through" means, exactly, either. It *sounds* like new email wasn't even downloaded from your ISP. > In order to again receive e-mails I uninstalled the Spambayes. Did that work? > My question is Where are my lost e-mails? Sorry, don't know. The Outlook addin never deletes email. You should try using Outlook's Advanced Find (Tools -> Advanced Find) function to search for one of them. The others will probably be found in the same place, assuming Outlook actually downloaded any of these emails to begin with. > I have a few important ones and orders and cannot find them > anywhere. Also, would like to suggest that you put a fix-in so that > if one inadvertently deletes a JUNK MAIL folder I would have to click > on a box that questions whether I really want to do this. Sorry, you'll have to ask Microsoft for that. 
Deleting, and moving, folders is *way* too easy to do by mistake in Outlook, but deleting and moving folders is strictly between you and Outlook -- Outlook doesn't ask SpamBayes whether it's OK. By the way, when you deleted your folder, didn't Outlook pop up a box asking you whether you really wanted to do that? If it didn't, your old JUNK MAIL folder is probably hiding now as a subfolder in your Deleted Items folder. ------------------------------ Message: 3 Date: Wed, 3 Dec 2003 15:41:20 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Where are my e-mails? To: "'Tim Peters'" , "'Marv Beloff'" Cc: spambayes@python.org Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1CD@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" [Tim] > By the way, when you deleted your folder, didn't Outlook > pop up a box asking you whether you really wanted to do that? > If it didn't, your old JUNK MAIL folder is probably hiding > now as a subfolder in your Deleted Items folder. I don't know about all versions of Outlook, but I get this "are you sure" warning even if I'm just moving the folder into the Deleted Items folder. It would seem pointless to add a second check, even if it was a simple addition to the plug-in. BTW, has anyone else noticed that in the last week or two these 'missing mail' posts have become quite common? (Although I note that it's never ended up being spambayes at fault). Maybe we need a FAQ for this, too? =Tony Meyer ------------------------------ Message: 4 Date: Wed, 3 Dec 2003 02:46:17 -0000 From: "Mathew Hendry" Subject: [Spambayes] Table munging defeats SpamBayes To: Message-ID: Content-Type: text/plain; charset="US-ASCII" Here's a new one on me. I spotted it just now when looking through my spam corpus for "low scorers". The rendered version of the spam (In Outlook 2003 anyway) looks like this: ------------------------------begin All Rx Products Consultation at no cost No embarassing M.D.
visits I want to know more ------------------------------end Most of that text is broken up into tiny pieces and inserted into tables, followed by a huge block filled mostly with randomized English text but also, at the end, containing the web site href and "I want to know more". SpamBayes gobbles up all the text inside the but doesn't spot the contents of the table because each apparent token is only 1-3 characters long. Fortunately the spammer scuppered s/h/itself a bit by including some spammy text in the block. That and my SpamAssassin/SpamCop mods pushed the score up 50%, so it didn't escape entirely. Return-Path: Delivered-To: MUNGED Received: (qmail 30533 invoked from network); 1 Dec 2003 19:56:10 -0000 Received: from unknown (HELO blade2.cesmail.net) (192.168.1.212) by blade4.cesmail.net with SMTP; 1 Dec 2003 19:56:10 -0000 Received: (qmail 27132 invoked from network); 1 Dec 2003 19:38:24 -0000 Received: from mailgate.cesmail.net (216.154.195.36) by blade2.cesmail.net with SMTP; 1 Dec 2003 19:38:22 -0000 Received: (qmail 12032 invoked from network); 1 Dec 2003 19:38:20 -0000 Received: from unknown (HELO mailgate.cesmail.net) (192.168.1.101) by mailgate.cesmail.net with SMTP; 1 Dec 2003 19:38:20 -0000 X-Message-Info: +va6sINXO+/559KzjEJWSK7H4PWBM17OSlLJppzGa8s= Received: from popgate.cesmail.net [192.168.1.201] by mailgate.cesmail.net with POP3 (fetchmail-6.2.1) for MUNGED (single-drop); Mon, 01 Dec 2003 14:38:20 -0500 (EST) Received: from mc3-f6.hotmail.com ([64.4.50.142]) by mc3-s4.hotmail.com with Microsoft SMTPSVC(5.0.2195.6713); Mon, 1 Dec 2003 10:09:31 -0800 Received: from mail.ganesparanx.com ([61.32.9.47]) by mc3-f6.hotmail.com with Microsoft SMTPSVC(5.0.2195.6713); Mon, 1 Dec 2003 10:09:14 -0800 Message-ID: Date: Mon, 01 Dec 2003 13:04:40 -0900 (EST) From: "sharaf his new book I mostly specialize in creating a little magic in the" To: MUNGED Subject: peqm Prescribed meds from hoomme ddm MIME-Version: 1.0 Content-Type: text/html 
X-OriginalArrivalTime: 01 Dec 2003 18:09:15.0515 (UTC) FILETIME=[3B0FECB0:01C3B836] X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on blade4 X-Spam-Level: ******** X-Spam-Status: hits=8.3 tests=FORGED_HOTMAIL_RCVD,HTML_90_100,HTML_MESSAGE, HTML_MIME_NO_HTML_TAG,LARGE_HEX,MIME_HTML_NO_CHARSET,MIME_HTML_ONLY, USERPASS version=2.60 X-SpamCop-Checked: 192.168.1.212 216.154.195.36 192.168.1.101 192.168.1.201 64.4.50.142 61.32.9.47 X-SpamCop-Disposition: Blocked bl.spamcop.net
Al Co No ns e Rx ult mba Pr at ra odu ion ssi ct a ng M n . D. cos vi t s i ts
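The evasion Mathew describes can be illustrated with a toy tokenizer. This is a minimal sketch assuming only that words shorter than three characters are discarded; `toy_tokenize`, `RENDERED` and `CELLS` are hypothetical illustrations, not SpamBayes' actual tokenizer, which applies many more rules.

```python
def toy_tokenize(text, min_len=3):
    """Lowercase whitespace-separated words, dropping any shorter than min_len.

    Hypothetical simplification: the real SpamBayes tokenizer has many more rules.
    """
    return [word.lower() for word in text.split() if len(word) >= min_len]


# The spam as rendered, versus the table-cell fragments left once the tags are gone.
RENDERED = "All Rx Products Consultation at no cost No embarassing M.D. visits"
CELLS = ["Al", "Co", "No", "ns", "e", "Rx", "ult", "mba", "Pr", "at", "ra",
         "odu", "ion", "ssi", "ct", "a", "ng", "M", "n", ".", "D.", "cos",
         "vi", "t", "s", "i", "ts"]

whole = toy_tokenize(RENDERED)
fragments = toy_tokenize(" ".join(CELLS))
print(whole)      # ['all', 'products', 'consultation', 'cost', 'embarassing', 'm.d.', 'visits']
print(fragments)  # ['ult', 'mba', 'odu', 'ion', 'ssi', 'cos']
print(set(whole) & set(fragments))  # set()
```

Every clue word the classifier could have learned from is missing in the fragmented version; the few fragments that pass the length filter are nonsense that carries no spam evidence either way.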
------------------------------ Message: 5 Date: Wed, 3 Dec 2003 15:52:36 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Restore from Spam To: "'Mark Sears'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1CF@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > I click the recover button and the message is sent to > the inbox of my personal folders. It's not recovered to the > imap folder in which the message originated like I'd expect it to be. > > Is this a known issue? Is there a work around? Or maybe I've > misconfigured something? IIRC, the plug-in 'recovers' the mail to the Inbox when something goes wrong trying to recover it to the original folder, or when saving the information about which folder was the original. I suspect that one of these is the case here (I seem to recall IMAP being difficult in this respect). Would you be able to attach your latest log file (instructions in the troubleshooting guide)? This will have the information required to see what is going wrong. =Tony Meyer ------------------------------ Message: 6 Date: Tue, 2 Dec 2003 22:13:09 -0500 From: "Tim Peters" Subject: RE: [Spambayes] Where are my e-mails? To: "Tony Meyer" , "'Marv Beloff'" Cc: spambayes@python.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" [Tony Meyer] > I don't know about all versions of Outlook, but I get this "are you > sure" warning even if I'm just moving the folder into the Deleted > Items folder. I do not (Outlook 2000 SP3), although there are well-hidden options in Outlook that can affect when and whether you get warnings. > It would seem pointless to add a second check, even if > it was a simple addition to the plug-in. So maybe Marv disabled Outlook's warning and wants us to change his mind . > BTW, has anyone else noticed that in the last week or two these > 'missing mail' posts have become quite common? (Although I note that > it's never ended up being spambayes at fault). 
Yup, and it's pretty mysterious to see so many pop up all of a sudden. > Maybe we need a FAQ for this, too? I don't know how to answer that FAQ yet: where *do* people find their email in the end? Is there a pattern to it? I haven't seen many followup msgs from people with this complaint. ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 12 ***************************************** ------------------------------ Message: 2 Date: Wed, 3 Dec 2003 16:25:39 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Where are my e-mails? To: "'Tim Peters'" , "'Marv Beloff'" Cc: spambayes@python.org Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1D0@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > > BTW, has anyone else noticed that in the last week or two these > > 'missing mail' posts have become quite common? (Although I > > note that it's never ended up being spambayes at fault). > > Yup, and it's pretty mysterious to see so many pop up all of a sudden. Especially since it's so long since the last release! > > Maybe we need a FAQ for this, too? > > I don't know how to answer that FAQ yet: where *do* people > find their email in the end? Is there a pattern to it? I > haven't seen many followup msgs from people with this complaint. I've seen a few (people replying off-list), although there's not a clear pattern. I've added a FAQ anyway, since I was doing a 'how to uninstall' one anyway; as we find out more information we can add to it. If anyone is reading this that has 'lost' mail and then found it again - it would be great to hear from you if there's more to finding it than is in the FAQ. 
=Tony Meyer ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 13 ***************************************** ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 14 ***************************************** From tameyer at ihug.co.nz Tue Dec 2 22:26:54 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Dec 2 22:27:06 2003 Subject: [Spambayes] SpamBayes Q: Backup of PST file In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304477DDC@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1D1@its-xchg4.massey.ac.nz> [Tim] > Anyone? I run from CVS, so have no idea what the binary > installer sets up here (assuming the poster used a binary > installer, which seems too likely to question ). This > one is nearly a FAQ lately! Agreed, and done: (As always, someone correct me if I've made any mistakes in there! I run from CVS, too, although I've tested the binary on occasions.) =Tony Meyer From sysadmin at scr.siemens.com Tue Dec 2 22:27:22 2003 From: sysadmin at scr.siemens.com (sysadmin@scr.siemens.com) Date: Tue Dec 2 22:27:35 2003 Subject: [Spambayes] Blocked Mail Notification Message-ID: <200312030327.hB33RR308693@scr.siemens.com> ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. 
Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Tue, 02 Dec 2003 22:27:18 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB33RH308689 for ; Tue, 2 Dec 2003 22:27:17 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte2.siemens.com ([212.114.202.115]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB33RIvd006369 for ; Tue, 2 Dec 2003 22:27:19 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte2.siemens.com (8.11.6/8.11.6) with ESMTP id hB33R3F11742 for ; Wed, 3 Dec 2003 04:27:03 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1ARNfJ-0006WF-TD; Tue, 02 Dec 2003 22:27:01 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 15 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Tue, 02 Dec 2003 22:27:01 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than 
"Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) ---------------------------------------------------------------------- Message: 1 Date: Tue, 02 Dec 2003 22:26:49 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312030326.hB33Qp308682@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Tue, 02 Dec 2003 22:26:45 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB33Qi308678 for ; Tue, 2 Dec 2003 22:26:44 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte5.siemens.com (lte5.siemens.com [217.194.35.73]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB33Qkvd006360 for ; Tue, 2 Dec 2003 22:26:47 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte5.siemens.com (8.11.6/8.11.2) with ESMTP id hB33QVD28267 for ; Wed, 3 Dec 2003 04:26:32 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1ARNep-0006FL-1w; Tue, 02 Dec 2003 22:26:31 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 14 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Tue, 02 Dec 2003 22:26:31 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) ---------------------------------------------------------------------- Message: 1 Date: Tue, 02 Dec 2003 22:26:20 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312030326.hB33QL308674@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. 
Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Tue, 02 Dec 2003 22:26:16 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB33QF308669 for ; Tue, 2 Dec 2003 22:26:15 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte5.siemens.com (lte5.siemens.com [217.194.35.73]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB33QHvd006352 for ; Tue, 2 Dec 2003 22:26:18 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte5.siemens.com (8.11.6/8.11.2) with ESMTP id hB33Q2D28057 for ; Wed, 3 Dec 2003 04:26:02 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1ARNeG-0005w2-Ec; Tue, 02 Dec 2003 22:25:56 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 13 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Tue, 02 Dec 2003 22:25:56 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more 
specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) 2. RE: Where are my e-mails? (Tony Meyer) ---------------------------------------------------------------------- Message: 1 Date: Tue, 02 Dec 2003 22:13:38 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312030313.hB33Dh308556@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Tue, 02 Dec 2003 22:13:35 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB33DY308549 for ; Tue, 2 Dec 2003 22:13:34 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte2.siemens.com ([212.114.202.115]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB33Davd006205 for ; Tue, 2 Dec 2003 22:13:36 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte2.siemens.com (8.11.6/8.11.6) with ESMTP id hB33DLF06802 for ; Wed, 3 Dec 2003 04:13:21 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1ARNRz-0002XN-Ks; Tue, 02 Dec 2003 22:13:15 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 12 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Tue, 02 Dec 2003 22:13:15 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Cannot Get SpamBayes to Install (Joe Farrell) 2. RE: Where are my e-mails? (Tim Peters) 3. RE: Where are my e-mails? (Tony Meyer) 4. Table munging defeats SpamBayes (Mathew Hendry) 5. RE: Restore from Spam (Tony Meyer) 6. RE: Where are my e-mails? (Tim Peters) ---------------------------------------------------------------------- Message: 1 Date: Tue, 2 Dec 2003 21:33:17 -0500 From: "Joe Farrell" Subject: [Spambayes] Cannot Get SpamBayes to Install To: Message-ID: <000001c3b945$d139e460$6501a8c0@DELL> Content-Type: text/plain; charset="us-ascii" I have tried to install SB three times, and each time I got an error message at the end of the installation saying that it could not register the a .dll file called DLL/OCX. Has anyone had a similar experience? Do you know what the solution could be? ------------------------------ Message: 2 Date: Tue, 2 Dec 2003 21:36:09 -0500 From: "Tim Peters" Subject: RE: [Spambayes] Where are my e-mails? To: "Marv Beloff" Cc: spambayes@python.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" [Marv Beloff] > I installed Spambayes a few weeks ago and reported to my cybersenior > group how pleased I was with it. > ... > I created two folders as directed JUNK MAIL and JUNK MAYBE. > It was working wonderfully. 
I got a little careless and instead of > deleting JUNK MAIL as was my daily activity I deleted the JUNK MAIL > folder itself which caused me to have to go into Spambayes manager > and reinstall with your wizard. Is this the Outlook addin (there are many ways to use SpamBayes, and we're guessing about how you're using it)? If so, you should have been able to use the SpamBayes manager's Filtering tab to tell SpamBayes about your new JUNK MAIL folder (it may have had the same name to you, but it would have been an entirely different folder to Outlook). > I actually did this a couple of times and the last time was yesterday > afternoon. I received no e-mails for the next 24 hours. I contacted > friends and had them send me an e-mail. It's easier to send yourself email (just send an email to your own email address). > It did not make it through. So far, you haven't said anything to make me suspect that it's not just your ISP email account that isn't working. I'm not sure what "not make it through" means, exactly, either. It *sounds* like new email wasn't even downloaded from your ISP. > In order to again receive e-mails I uninstalled the Spambayes. Did that work? > My question is Where are my lost e-mails? Sorry, don't know. The Outlook addin never deletes email. You should try using Outlook's Advanced Find (Tools -> Advanced Find) function to search for one of them. The others will probably be found in the same place, assuming Outlook actually downloaded any of these emails to begin with. > I have a few important ones and orders and cannot find them > anywhere. Also, would like to suggest that you put a fix-in so that > if one inadvertently deletes a JUNK MAIL folder I would have to click > on a box that questions whether I really want to do this. Sorry, you'll have to ask Microsoft for that. 
Deleting, and moving, folders is *way* too easy to do my mistake in Outlook, but deleting and moving folders is strictly between you and Outlook -- Outlook doesn't ask SpamBayes whether it's OK. By the way, when you deleted your folder, didn't Outlook pop up a box asking you whether you really wanted to do that? If it didn't, your old JUNK MAIL folder is probably hiding now as a subfolder in your Deleted Items folder. ------------------------------ Message: 3 Date: Wed, 3 Dec 2003 15:41:20 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Where are my e-mails? To: "'Tim Peters'" , "'Marv Beloff'" Cc: spambayes@python.org Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1CD@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" [Tim] > By the way, when you deleted your folder, didn't Outlook > pop up a box asking you whether you really wanted to do that? > If it didn't, your old JUNK MAIL folder is probably hiding > now as a subfolder in your Deleted Items folder. I don't know about all versions of Outlook, but I get this "are you sure" warning even if I'm just moving the folder into the Deleted Items folder. It would seem pointless to add a second check, even if it was a simple addition to the plug-in. BTW, has anyone else noticed that in the last week or two these 'missing mail' posts have become quite common? (Although I note that it's never ended up being spambayes at fault). Maybe we need a FAQ for this, too? =Tony Meyer ------------------------------ Message: 4 Date: Wed, 3 Dec 2003 02:46:17 -0000 From: "Mathew Hendry" Subject: [Spambayes] Table munging defeats SpamBayes To: Message-ID: Content-Type: text/plain; charset="US-ASCII" Here's a new one on me. I spotted just now when looking through my spam corpus for "low scorers". The rendered version of the spam (In Outlook 2003 anyway) looks like this: ------------------------------begin All Rx Products Consultation at no cost No embarassing M.D. 
visits I want to know more ------------------------------end Most of that text is broken up into tiny pieces and inserted into tables, followed by a huge block filled mostly with randomized English text but also, at the end, containing the web site href and "I want to know more". SpamBayes gobbles up all the text inside the but doesn't spot the contents of the table because each apparent token is only 1-3 characters long. Fortunately the spammer scuppered s/h/itself a bit by including some spammy text in the block. That and my SpamAssassin/SpamCop mods pushed the score up 50%, so it didn't escape entirely. Return-Path: Delivered-To: MUNGED Received: (qmail 30533 invoked from network); 1 Dec 2003 19:56:10 -0000 Received: from unknown (HELO blade2.cesmail.net) (192.168.1.212) by blade4.cesmail.net with SMTP; 1 Dec 2003 19:56:10 -0000 Received: (qmail 27132 invoked from network); 1 Dec 2003 19:38:24 -0000 Received: from mailgate.cesmail.net (216.154.195.36) by blade2.cesmail.net with SMTP; 1 Dec 2003 19:38:22 -0000 Received: (qmail 12032 invoked from network); 1 Dec 2003 19:38:20 -0000 Received: from unknown (HELO mailgate.cesmail.net) (192.168.1.101) by mailgate.cesmail.net with SMTP; 1 Dec 2003 19:38:20 -0000 X-Message-Info: +va6sINXO+/559KzjEJWSK7H4PWBM17OSlLJppzGa8s= Received: from popgate.cesmail.net [192.168.1.201] by mailgate.cesmail.net with POP3 (fetchmail-6.2.1) for MUNGED (single-drop); Mon, 01 Dec 2003 14:38:20 -0500 (EST) Received: from mc3-f6.hotmail.com ([64.4.50.142]) by mc3-s4.hotmail.com with Microsoft SMTPSVC(5.0.2195.6713); Mon, 1 Dec 2003 10:09:31 -0800 Received: from mail.ganesparanx.com ([61.32.9.47]) by mc3-f6.hotmail.com with Microsoft SMTPSVC(5.0.2195.6713); Mon, 1 Dec 2003 10:09:14 -0800 Message-ID: Date: Mon, 01 Dec 2003 13:04:40 -0900 (EST) From: "sharaf his new book I mostly specialize in creating a little magic in the" To: MUNGED Subject: peqm Prescribed meds from hoomme ddm MIME-Version: 1.0 Content-Type: text/html 
X-OriginalArrivalTime: 01 Dec 2003 18:09:15.0515 (UTC) FILETIME=[3B0FECB0:01C3B836] X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on blade4 X-Spam-Level: ******** X-Spam-Status: hits=8.3 tests=FORGED_HOTMAIL_RCVD,HTML_90_100,HTML_MESSAGE, HTML_MIME_NO_HTML_TAG,LARGE_HEX,MIME_HTML_NO_CHARSET,MIME_HTML_ONLY, USERPASS version=2.60 X-SpamCop-Checked: 192.168.1.212 216.154.195.36 192.168.1.101 192.168.1.201 64.4.50.142 61.32.9.47 X-SpamCop-Disposition: Blocked bl.spamcop.net
Al
  
Co
No

  
ns
 e
Rx 
   
ult
mba
Pr
  
at
ra
odu
   
ion
ssi
ct
  
 a
ng

  

 M
 
 
n
.
  
  

D.
< /tr>
   
   
cos
 vi
 
 
t
s
 
 
 
i
   
   
   
ts 
------------------------------ Message: 5 Date: Wed, 3 Dec 2003 15:52:36 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Restore from Spam To: "'Mark Sears'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1CF@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > I click the recover button and the message is sent to > the inbox of my personal folders. It's not recovered to the > imap folder in which the message originated like I'd expect it to be. > > Is this a known issue? Is there a work around? Or maybe I've > misconfigured something? IIRC, the plug-in 'recovers' the mail to the Inbox when something goes wrong trying to recover it to the original folder, or when saving the information about which folder was the original. I suspect that one of these is the case here (I seem to recall IMAP being difficult in this respect). Would you be able to attach your latest log file (instructions in the troubleshooting guide)? This will have the information required to see what is going wrong. =Tony Meyer ------------------------------ Message: 6 Date: Tue, 2 Dec 2003 22:13:09 -0500 From: "Tim Peters" Subject: RE: [Spambayes] Where are my e-mails? To: "Tony Meyer" , "'Marv Beloff'" Cc: spambayes@python.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" [Tony Meyer] > I don't know about all versions of Outlook, but I get this "are you > sure" warning even if I'm just moving the folder into the Deleted > Items folder. I do not (Outlook 2000 SP3), although there are well-hidden options in Outlook that can affect when and whether you get warnings. > It would seem pointless to add a second check, even if > it was a simple addition to the plug-in. So maybe Marv disabled Outlook's warning and wants us to change his mind . > BTW, has anyone else noticed that in the last week or two these > 'missing mail' posts have become quite common? (Although I note that > it's never ended up being spambayes at fault). 
Yup, and it's pretty mysterious to see so many pop up all of a sudden. > Maybe we need a FAQ for this, too? I don't know how to answer that FAQ yet: where *do* people find their email in the end? Is there a pattern to it? I haven't seen many followup msgs from people with this complaint. ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 12 ***************************************** ------------------------------ Message: 2 Date: Wed, 3 Dec 2003 16:25:39 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Where are my e-mails? To: "'Tim Peters'" , "'Marv Beloff'" Cc: spambayes@python.org Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1D0@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > > BTW, has anyone else noticed that in the last week or two these > > 'missing mail' posts have become quite common? (Although I > > note that it's never ended up being spambayes at fault). > > Yup, and it's pretty mysterious to see so many pop up all of a sudden. Especially since it's so long since the last release! > > Maybe we need a FAQ for this, too? > > I don't know how to answer that FAQ yet: where *do* people > find their email in the end? Is there a pattern to it? I > haven't seen many followup msgs from people with this complaint. I've seen a few (people replying off-list), although there's not a clear pattern. I've added a FAQ anyway, since I was doing a 'how to uninstall' one anyway; as we find out more information we can add to it. If anyone is reading this that has 'lost' mail and then found it again - it would be great to hear from you if there's more to finding it than is in the FAQ. 
=Tony Meyer ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 13 ***************************************** ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 14 ***************************************** ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 15 ***************************************** From tameyer at ihug.co.nz Tue Dec 2 22:33:32 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Dec 2 22:33:39 2003 Subject: [Spambayes] IMAP or Netscape Quirk In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130447801C@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1D2@its-xchg4.massey.ac.nz> [Jacob Farmer] > I occasionally see this myself, and the nearest thing I can > figure, is that I happened to look at the Inbox while > Spambayes was training or classifying. It appears to > download each message, delete it, then put it back where it > belongs. Perhaps your IMAP server just takes a few seconds > to register the delete flag or else your mail client doesn't > indicate when it is set. I suspect that this is the case. The imapfilter does, indeed, create new copies of messages, and remove the old ones (well, it marks them for removal, anyway). The main problem is that IMAP doesn't have any facility to move a message from one folder to another (only to delete the message and add it to the new location).
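To make the copy-then-delete behaviour concrete, here is a minimal sketch using Python's standard imaplib module. The helper name `move_by_copy_and_delete` and its optional expunge step are hypothetical illustrations, not code from the SpamBayes imapfilter. The sketch assumes an authenticated connection with the source folder already SELECTed and a message UID given as a string. (Servers that support the later IMAP MOVE extension, RFC 6851, can do this in a single command.)

```python
import imaplib


def move_by_copy_and_delete(conn, uid, dest_folder, expunge=False):
    """Emulate moving one message on a server that has no MOVE command.

    Hypothetical helper. `conn` is assumed to be an authenticated
    imaplib.IMAP4 / IMAP4_SSL connection with the source folder already
    SELECTed; `uid` is the message UID as a string; `dest_folder` is
    passed to the server unchanged.
    """
    # Step 1: add a copy of the message to the destination folder.
    typ, data = conn.uid("COPY", uid, dest_folder)
    if typ != "OK":
        raise imaplib.IMAP4.error("COPY of UID %s failed: %r" % (uid, data))

    # Step 2: flag the original for removal. Until the folder is expunged,
    # some clients still display it, so the message can briefly appear twice.
    typ, data = conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
    if typ != "OK":
        raise imaplib.IMAP4.error("STORE of UID %s failed: %r" % (uid, data))

    # Step 3 (optional): permanently remove every message flagged
    # \Deleted in the currently selected folder, not just this one.
    if expunge:
        conn.expunge()
```

Because EXPUNGE affects every flagged message in the selected folder, the sketch leaves it off by default. This matches Tony's description that the old copies are only "marked for removal".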
I'm afraid I can't really see any way around this; some of the duplication might be removed in the future (those that result from storing an id with the message), but those from moving messages to new folders seem unavoidable (if you look closely, even the mail clients do this). =Tony Meyer From tameyer at ihug.co.nz Tue Dec 2 22:36:58 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Dec 2 22:37:04 2003 Subject: FW: [Spambayes] Where are my e-mails? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1D5@its-xchg4.massey.ac.nz> [Marv Beloff, offlist] > Thanks! I found 97 e-mails in a sub folder > under Deleted Items. I guess that was my problem > since I had two JUNK MAIL folders. One mystery solved, at least :) =Tony Meyer From david.matos at comcast.net Tue Dec 2 23:07:18 2003 From: david.matos at comcast.net (David Matos) Date: Tue Dec 2 23:07:25 2003 Subject: [Spambayes] Please Kill the "Blocked Mail Notification" In-Reply-To: Message-ID: <000001c3b952$f1ea8040$8d80b042@dexter> Can one of the moderators do something about the "Blocked Mail Notification" from sysadmin@scr.siemens.com ? -----Original Message----- From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of spambayes-request@python.org Sent: Tuesday, December 02, 2003 10:28 PM To: spambayes@python.org Subject: Spambayes Digest, Vol 64, Issue 16 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. RE: SpamBayes Q: Backup of PST file (Tony Meyer) 2. 
Blocked Mail Notification (sysadmin@scr.siemens.com) ---------------------------------------------------------------------- Message: 1 Date: Wed, 3 Dec 2003 16:26:54 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] SpamBayes Q: Backup of PST file To: "'Tim Peters'" , "'Cindy Peyser'" Cc: spambayes@python.org Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1D1@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" [Tim] > Anyone? I run from CVS, so have no idea what the binary > installer sets up here (assuming the poster used a binary > installer, which seems too likely to question ). This > one is nearly a FAQ lately! Agreed, and done: (As always, someone correct me if I've made any mistakes in there! I run from CVS, too, although I've tested the binary on occasions.) =Tony Meyer ------------------------------ Message: 2 Date: Tue, 02 Dec 2003 22:27:22 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312030327.hB33RR308693@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. 
Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Tue, 02 Dec 2003 22:27:18 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB33RH308689 for ; Tue, 2 Dec 2003 22:27:17 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte2.siemens.com ([212.114.202.115]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB33RIvd006369 for ; Tue, 2 Dec 2003 22:27:19 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte2.siemens.com (8.11.6/8.11.6) with ESMTP id hB33R3F11742 for ; Wed, 3 Dec 2003 04:27:03 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1ARNfJ-0006WF-TD; Tue, 02 Dec 2003 22:27:01 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 15 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Tue, 02 Dec 2003 22:27:01 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than 
"Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) ---------------------------------------------------------------------- Message: 1 Date: Tue, 02 Dec 2003 22:26:49 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312030326.hB33Qp308682@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Tue, 02 Dec 2003 22:26:45 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB33Qi308678 for ; Tue, 2 Dec 2003 22:26:44 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte5.siemens.com (lte5.siemens.com [217.194.35.73]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB33Qkvd006360 for ; Tue, 2 Dec 2003 22:26:47 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte5.siemens.com (8.11.6/8.11.2) with ESMTP id hB33QVD28267 for ; Wed, 3 Dec 2003 04:26:32 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1ARNep-0006FL-1w; Tue, 02 Dec 2003 22:26:31 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 14 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Tue, 02 Dec 2003 22:26:31 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) ---------------------------------------------------------------------- Message: 1 Date: Tue, 02 Dec 2003 22:26:20 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312030326.hB33QL308674@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. 
Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Tue, 02 Dec 2003 22:26:16 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB33QF308669 for ; Tue, 2 Dec 2003 22:26:15 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte5.siemens.com (lte5.siemens.com [217.194.35.73]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB33QHvd006352 for ; Tue, 2 Dec 2003 22:26:18 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte5.siemens.com (8.11.6/8.11.2) with ESMTP id hB33Q2D28057 for ; Wed, 3 Dec 2003 04:26:02 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1ARNeG-0005w2-Ec; Tue, 02 Dec 2003 22:25:56 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 13 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Tue, 02 Dec 2003 22:25:56 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more 
specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Blocked Mail Notification (sysadmin@scr.siemens.com) 2. RE: Where are my e-mails? (Tony Meyer) ---------------------------------------------------------------------- Message: 1 Date: Tue, 02 Dec 2003 22:13:38 -0500 From: sysadmin@scr.siemens.com Subject: [Spambayes] Blocked Mail Notification To: spambayes@python.org Message-ID: <200312030313.hB33Dh308556@scr.siemens.com> Content-Type: text/plain; charset="iso-8859-1" ************* eManager Notification ************** Recipient, Content filter has detected a sensitive e-mail. Source mailbox: "spambayes-bounces@python.org" Destination mailbox(es): "spambayes@python.org" ******************* End of message ******************* -------------- next part -------------- Received: from 129.73.8.34 by postoffice.scr.siemens.com (InterScan E-Mail VirusWall NT); Tue, 02 Dec 2003 22:13:35 -0500 Received: from idmz1.scr.siemens.com ([129.73.8.9]) by scr.siemens.com (8.11.7/8.11.7) with ESMTP id hB33DY308549 for ; Tue, 2 Dec 2003 22:13:34 -0500 (EST) X-SCR-Return-Path: (as seen by idmz1.scr.siemens.com) Received: from lte2.siemens.com ([212.114.202.115]) by idmz1.scr.siemens.com (8.12.10/8.12.10) with ESMTP id hB33Davd006205 for ; Tue, 2 Dec 2003 22:13:36 -0500 (EST) Received: from mail.python.org (mail.python.org [12.155.117.29]) by lte2.siemens.com (8.11.6/8.11.6) with ESMTP id hB33DLF06802 for ; Wed, 3 Dec 2003 04:13:21 +0100 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.22) id 1ARNRz-0002XN-Ks; Tue, 02 Dec 2003 22:13:15 -0500 From: spambayes-request@python.org Subject: Spambayes Digest, Vol 64, Issue 12 To: spambayes@python.org Reply-To: spambayes@python.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-BeenThere: spambayes@python.org X-Mailman-Version: 2.1.4a0 Precedence: list List-Id: Discussion list for Pythonic Bayesian classifier 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: spambayes-bounces@python.org Errors-To: spambayes-bounces@python.org Message-Id: Date: Tue, 02 Dec 2003 22:13:15 -0500 Send Spambayes mailing list submissions to spambayes@python.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/spambayes or, via email, send a message with subject or body 'help' to spambayes-request@python.org You can reach the person managing the list at spambayes-owner@python.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Spambayes digest..." Today's Topics: 1. Cannot Get SpamBayes to Install (Joe Farrell) 2. RE: Where are my e-mails? (Tim Peters) 3. RE: Where are my e-mails? (Tony Meyer) 4. Table munging defeats SpamBayes (Mathew Hendry) 5. RE: Restore from Spam (Tony Meyer) 6. RE: Where are my e-mails? (Tim Peters) ---------------------------------------------------------------------- Message: 1 Date: Tue, 2 Dec 2003 21:33:17 -0500 From: "Joe Farrell" Subject: [Spambayes] Cannot Get SpamBayes to Install To: Message-ID: <000001c3b945$d139e460$6501a8c0@DELL> Content-Type: text/plain; charset="us-ascii" I have tried to install SB three times, and each time I got an error message at the end of the installation saying that it could not register a .dll file called DLL/OCX. Has anyone had a similar experience? Do you know what the solution could be? ------------------------------ Message: 2 Date: Tue, 2 Dec 2003 21:36:09 -0500 From: "Tim Peters" Subject: RE: [Spambayes] Where are my e-mails? To: "Marv Beloff" Cc: spambayes@python.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" [Marv Beloff] > I installed Spambayes a few weeks ago and reported to my cybersenior > group how pleased I was with it. ... > I created two folders as directed JUNK MAIL and JUNK MAYBE. > It was working wonderfully.
I got a little careless and instead of > deleting JUNK MAIL as was my daily activity I deleted the JUNK MAIL > folder itself which caused me to have to go into Spambayes manager > and reinstall with your wizard. Is this the Outlook addin (there are many ways to use SpamBayes, and we're guessing about how you're using it)? If so, you should have been able to use the SpamBayes manager's Filtering tab to tell SpamBayes about your new JUNK MAIL folder (it may have had the same name to you, but it would have been an entirely different folder to Outlook). > I actually did this a couple of times and the last time was yesterday > afternoon. I received no e-mails for the next 24 hours. I contacted > friends and had them send me an e-mail. It's easier to send yourself email (just send an email to your own email address). > It did not make it through. So far, you haven't said anything to make me suspect that it's not just your ISP email account that isn't working. I'm not sure what "not make it through" means, exactly, either. It *sounds* like new email wasn't even downloaded from your ISP. > In order to again receive e-mails I uninstalled the Spambayes. Did that work? > My question is Where are my lost e-mails? Sorry, don't know. The Outlook addin never deletes email. You should try using Outlook's Advanced Find (Tools -> Advanced Find) function to search for one of them. The others will probably be found in the same place, assuming Outlook actually downloaded any of these emails to begin with. > I have a few important ones and orders and cannot find them anywhere. > Also, would like to suggest that you put a fix-in so that if one > inadvertently deletes a JUNK MAIL folder I would have to click on a > box that questions whether I really want to do this. Sorry, you'll have to ask Microsoft for that. 
Deleting, and moving, folders is *way* too easy to do by mistake in Outlook, but deleting and moving folders is strictly between you and Outlook -- Outlook doesn't ask SpamBayes whether it's OK. By the way, when you deleted your folder, didn't Outlook pop up a box asking you whether you really wanted to do that? If it didn't, your old JUNK MAIL folder is probably hiding now as a subfolder in your Deleted Items folder. ------------------------------ Message: 3 Date: Wed, 3 Dec 2003 15:41:20 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Where are my e-mails? To: "'Tim Peters'" , "'Marv Beloff'" Cc: spambayes@python.org Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1CD@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" [Tim] > By the way, when you deleted your folder, didn't Outlook > pop up a box asking you whether you really wanted to do that? > If it didn't, your old JUNK MAIL folder is probably hiding > now as a subfolder in your Deleted Items folder. I don't know about all versions of Outlook, but I get this "are you sure" warning even if I'm just moving the folder into the Deleted Items folder. It would seem pointless to add a second check, even if it was a simple addition to the plug-in. BTW, has anyone else noticed that in the last week or two these 'missing mail' posts have become quite common? (Although I note that it's never ended up being spambayes at fault). Maybe we need a FAQ for this, too? =Tony Meyer ------------------------------ Message: 4 Date: Wed, 3 Dec 2003 02:46:17 -0000 From: "Mathew Hendry" Subject: [Spambayes] Table munging defeats SpamBayes To: Message-ID: Content-Type: text/plain; charset="US-ASCII" Here's a new one on me. I spotted it just now when looking through my spam corpus for "low scorers". The rendered version of the spam (In Outlook 2003 anyway) looks like this: ------------------------------begin All Rx Products Consultation at no cost No embarassing M.D.
visits I want to know more ------------------------------end Most of that text is broken up into tiny pieces and inserted into tables, followed by a huge block filled mostly with randomized English text but also, at the end, containing the web site href and "I want to know more". SpamBayes gobbles up all the text inside the but doesn't spot the contents of the table because each apparent token is only 1-3 characters long. Fortunately the spammer scuppered s/h/itself a bit by including some spammy text in the block. That and my SpamAssassin/SpamCop mods pushed the score up 50%, so it didn't escape entirely. Return-Path: Delivered-To: MUNGED Received: (qmail 30533 invoked from network); 1 Dec 2003 19:56:10 -0000 Received: from unknown (HELO blade2.cesmail.net) (192.168.1.212) by blade4.cesmail.net with SMTP; 1 Dec 2003 19:56:10 -0000 Received: (qmail 27132 invoked from network); 1 Dec 2003 19:38:24 -0000 Received: from mailgate.cesmail.net (216.154.195.36) by blade2.cesmail.net with SMTP; 1 Dec 2003 19:38:22 -0000 Received: (qmail 12032 invoked from network); 1 Dec 2003 19:38:20 -0000 Received: from unknown (HELO mailgate.cesmail.net) (192.168.1.101) by mailgate.cesmail.net with SMTP; 1 Dec 2003 19:38:20 -0000 X-Message-Info: +va6sINXO+/559KzjEJWSK7H4PWBM17OSlLJppzGa8s= Received: from popgate.cesmail.net [192.168.1.201] by mailgate.cesmail.net with POP3 (fetchmail-6.2.1) for MUNGED (single-drop); Mon, 01 Dec 2003 14:38:20 -0500 (EST) Received: from mc3-f6.hotmail.com ([64.4.50.142]) by mc3-s4.hotmail.com with Microsoft SMTPSVC(5.0.2195.6713); Mon, 1 Dec 2003 10:09:31 -0800 Received: from mail.ganesparanx.com ([61.32.9.47]) by mc3-f6.hotmail.com with Microsoft SMTPSVC(5.0.2195.6713); Mon, 1 Dec 2003 10:09:14 -0800 Message-ID: Date: Mon, 01 Dec 2003 13:04:40 -0900 (EST) From: "sharaf his new book I mostly specialize in creating a little magic in the" To: MUNGED Subject: peqm Prescribed meds from hoomme ddm MIME-Version: 1.0 Content-Type: text/html 
X-OriginalArrivalTime: 01 Dec 2003 18:09:15.0515 (UTC) FILETIME=[3B0FECB0:01C3B836] X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on blade4 X-Spam-Level: ******** X-Spam-Status: hits=8.3 tests=FORGED_HOTMAIL_RCVD,HTML_90_100,HTML_MESSAGE, HTML_MIME_NO_HTML_TAG,LARGE_HEX,MIME_HTML_NO_CHARSET,MIME_HTML_ONLY, USERPASS version=2.60 X-SpamCop-Checked: 192.168.1.212 216.154.195.36 192.168.1.101 192.168.1.201 64.4.50.142 61.32.9.47 X-SpamCop-Disposition: Blocked bl.spamcop.net
[Raw HTML body not reproduced: the visible text is split into table cells of one to three characters each, e.g. "Al", "Co", "No", "Rx", "ult", "Pr", "odu", "ion", "M", "D."]
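Mathew's point about tiny table cells can be illustrated with a deliberately simplified tokenizer. The sketch below is *not* the SpamBayes tokenizer. For illustration, it assumes a rule that ignores words shorter than three characters, then compares the tokens produced from the text as rendered with those produced from the table-cell fragments in the munged message. The function name and the `min_len` parameter are hypothetical.

```python
def simple_tokens(text, min_len=3):
    """Lower-cased whitespace-separated words at least `min_len` characters long.

    A simplified stand-in, NOT SpamBayes's tokenizer: the minimum-length
    rule is an assumption made for illustration.
    """
    return [word.lower() for word in text.split() if len(word) >= min_len]


# The start of the text as Mathew says it renders in Outlook 2003.
rendered = "All Rx Products Consultation at no cost No embarassing M.D. visits"

# Table-cell fragments from the raw message, in the order they appear.
cells = ["Al", "Co", "No", "ns", "e", "Rx", "ult", "mba", "Pr", "at",
         "ra", "odu", "ion", "ssi", "ct", "a", "ng", "M", "n", ".", "D."]

print(simple_tokens(rendered))
# -> ['all', 'products', 'consultation', 'cost', 'embarassing', 'm.d.', 'visits']
print(simple_tokens(" ".join(cells)))
# -> ['ult', 'mba', 'odu', 'ion', 'ssi']
```

None of the fragments that survive resembles a word a classifier is likely to have learned from trained spam. This is consistent with Mathew's report that the table contents went unnoticed, while the ordinary text block was still tokenized.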
------------------------------ Message: 5 Date: Wed, 3 Dec 2003 15:52:36 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Restore from Spam To: "'Mark Sears'" , Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1CF@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > I click the recover button and the message is sent to > the inbox of my personal folders. It's not recovered to the > imap folder in which the message originated like I'd expect it to be. > > Is this a known issue? Is there a work around? Or maybe I've > misconfigured something? IIRC, the plug-in 'recovers' the mail to the Inbox when something goes wrong trying to recover it to the original folder, or when saving the information about which folder was the original. I suspect that one of these is the case here (I seem to recall IMAP being difficult in this respect). Would you be able to attach your latest log file (instructions in the troubleshooting guide)? This will have the information required to see what is going wrong. =Tony Meyer ------------------------------ Message: 6 Date: Tue, 2 Dec 2003 22:13:09 -0500 From: "Tim Peters" Subject: RE: [Spambayes] Where are my e-mails? To: "Tony Meyer" , "'Marv Beloff'" Cc: spambayes@python.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" [Tony Meyer] > I don't know about all versions of Outlook, but I get this "are you > sure" warning even if I'm just moving the folder into the Deleted > Items folder. I do not (Outlook 2000 SP3), although there are well-hidden options in Outlook that can affect when and whether you get warnings. > It would seem pointless to add a second check, even if > it was a simple addition to the plug-in. So maybe Marv disabled Outlook's warning and wants us to change his mind . > BTW, has anyone else noticed that in the last week or two these > 'missing mail' posts have become quite common? (Although I note that > it's never ended up being spambayes at fault). 
Yup, and it's pretty mysterious to see so many pop up all of a sudden. > Maybe we need a FAQ for this, too? I don't know how to answer that FAQ yet: where *do* people find their email in the end? Is there a pattern to it? I haven't seen many followup msgs from people with this complaint. ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 12 ***************************************** ------------------------------ Message: 2 Date: Wed, 3 Dec 2003 16:25:39 +1300 From: "Tony Meyer" Subject: RE: [Spambayes] Where are my e-mails? To: "'Tim Peters'" , "'Marv Beloff'" Cc: spambayes@python.org Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1D0@its-xchg4.massey.ac.nz> Content-Type: text/plain; charset="us-ascii" > > BTW, has anyone else noticed that in the last week or two these > > 'missing mail' posts have become quite common? (Although I > > note that it's never ended up being spambayes at fault). > > Yup, and it's pretty mysterious to see so many pop up all of a sudden. Especially since it's so long since the last release! > > Maybe we need a FAQ for this, too? > > I don't know how to answer that FAQ yet: where *do* people > find their email in the end? Is there a pattern to it? I > haven't seen many followup msgs from people with this complaint. I've seen a few (people replying off-list), although there's not a clear pattern. I've added a FAQ anyway, since I was doing a 'how to uninstall' one anyway; as we find out more information we can add to it. If anyone is reading this that has 'lost' mail and then found it again - it would be great to hear from you if there's more to finding it than is in the FAQ. 
=Tony Meyer ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 13 ***************************************** ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 14 ***************************************** ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 15 ***************************************** ------------------------------ _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html End of Spambayes Digest, Vol 64, Issue 16 ***************************************** From tim at fourstonesExpressions.com Tue Dec 2 23:13:37 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Tue Dec 2 23:13:43 2003 Subject: [Spambayes] Please Kill the "Blocked Mail Notification" In-Reply-To: <000001c3b952$f1ea8040$8d80b042@dexter> References: <000001c3b952$f1ea8040$8d80b042@dexter> Message-ID: On Tue, 2 Dec 2003 23:07:18 -0500, David Matos wrote: > Can one of the moderators do something about the "Blocked Mail > Notification" > from sysadmin@scr.siemens.com ? Seriously, this is a good example of why a bouncing filter is a BAD idea... there's not much we can do about it. Siemens' filter is classifying some of the mail on this list as spam (imagine that) and doing the infamous and much debated (on this list) bounce operation.
Someone on this list works at Siemens, and that person will either have to unsubscribe, or tell their email admin type people to fix their filter... -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From tim.one at comcast.net Tue Dec 2 23:13:59 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 2 23:14:02 2003 Subject: [Spambayes] Please Kill the "Blocked Mail Notification" In-Reply-To: <000001c3b952$f1ea8040$8d80b042@dexter> Message-ID: [David Matos] > Can one of the moderators do something about the "Blocked Mail > Notification" from sysadmin@scr.siemens.com ? This isn't a moderated list. Proof: if it were, I would have blocked you from bothering everyone with 80KB of quoted digest email just to ask a one-sentence question. (BTW, I asked siemens to stop that silliness, but they didn't reply; then again, I haven't seen another one of those from them since I asked.) From tim at fourstonesExpressions.com Tue Dec 2 23:14:36 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Tue Dec 2 23:14:43 2003 Subject: [Spambayes] Please Kill the "Blocked Mail Notification" In-Reply-To: <000001c3b952$f1ea8040$8d80b042@dexter> References: <000001c3b952$f1ea8040$8d80b042@dexter> Message-ID: On Tue, 2 Dec 2003 23:07:18 -0500, David Matos wrote: > Can one of the moderators do something about the "Blocked Mail > Notification" > from sysadmin@scr.siemens.com ? Train your spambayes... -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself!
Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From tim.one at comcast.net Tue Dec 2 23:26:18 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 2 23:26:21 2003 Subject: [Spambayes] Please Kill the "Blocked Mail Notification" In-Reply-To: Message-ID: > (BTW, I asked siemens to stop that silliness, but they didn't reply; > then again, I haven't seen another one of those from them since I > asked.) Oops! I have now. I set the list to bounce the siemens bounces back to siemens. I expect the entire Internet to collapse as a result, and then we won't be bothered by any spam any more. From sethg at GoodmanAssociates.com Tue Dec 2 23:45:24 2003 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Tue Dec 2 23:45:26 2003 Subject: [Spambayes] training problem? In-Reply-To: Message-ID: [Tim Peters] > What do you mean by false negative? We use it here to mean spam scoring > below your ham cutoff. That's exactly what I meant by it. I don't count an unsure as a false negative, and I don't mind seeing unsures. Most of the false negatives were spam that scored 0% or 1%. Incidentally, *all* of my ham scores either 0% or 1%, with the great preponderance at 0%. That is why I later moved my ham threshold from 15 down to 5. [Tim Peters] > > 1) Initial training set 650 spam, 654 ham on 11-16-03. > > > > 2) Initial filter thresholds 90/15. > > So by "false negative" here you mean spam scoring below 15? If so, I have > no theory, as I see maybe one of those per month (with about 700 > emails per > day, including 200-250 daily spam). Exactly. At the outset of this experiment, a false negative was any spam scoring below 15. Most of them scored very close to 0, just like my ham. From the numbers in my results table, you can see that I get between 5 and 10 of these per day with a spam load of around 140 per day. [Tim Peters] > > The two messages I posted about in this thread were just examples. > > One would have sufficed.
Point well taken. Sorry for wasting everyone's bandwidth. [Tim Peters] > > 3) Train on any spam that scores below 50, any ham that scores above > > 15. Filter all unread mail after each training event to simulate > > If your spam cutoff is 90, why do you only train on spam scoring below 50? > Something doesn't sound right here. Yes, I agree it sounds fishy, but read on. My spam cutoff is 90, and I did decide to only train on spam that scored less than 50 for this run. In previous runs, I trained on all errors and all unsures using the default thresholds of 90/15. All the unsures were spam (every single one of them), and the unsures were numerous, so my spam database grew quickly. I trained extra ham periodically to balance it, but I still had a high false negative rate. This run was an experiment in training less (to see if that was the problem) by only training when the classifier was *really* wrong as opposed to training on all the unsures. I picked 50% as the cutoff for being really wrong since a completely "neutral" set of words would have an expected score of 50%. In any case, the experiment was a failure since my false negative rate is about the same as it was when I trained on all errors and all unsures. [Tim Peters] > Sorry, still don't know what you mean by false negative. If you meant the > conventional "scored below 15" (your former ham cutoff), yet > very, very few > of them scored between 5 and 15, it must mean that almost all of > your false > negatives are scoring below 5. Is that what you mean? Yes! It's sad but true. Most of these false negatives had the same scores as ham. [Tim Peters] > Ditto. My own FP and FN rates are trivial (I'm genuinely surprised to see > any spam in my Inbox, and shocked to see a ham in my Spam folder, using > cutoffs of 20 and 80). 
My Unsure rate (scores between 20 and 80) > is heading > toward 5% -- but I don't care (I review all my spam anyway, and I'm on > enough admin-type mailing lists that I get a ton of weird email -- I can't > myself decide whether fully half the stuff in my Unsure folder is "really > ham" or "really spam", and toss it untrained after mentally shrugging). I would be delighted if my system performed like that. Like you, I also don't care how many unsures I get. Since the system *says* it's unsure, I *will* look at those messages. I didn't track the number, but I think unsures amounted to 15-20% of my spam. The 5-10 spam in my Inbox with a score at or near zero, however, do bother me. A couple of them are "newsletters that won't quit" types, and I can understand the classifier having trouble with them. They don't have any sales jargon; they just don't bother with unsubscribe requests. If I weren't into experimenting with SpamBayes like this, I would just kill them off by sender. However, some of the spam that scores near zero is the real stomach-emptying stuff that I would have guessed had enough spammy words to light up the magic light bulb very brightly. There's also the 419 stuff that SpamBayes does not seem to catch, for whatever reason. [Tim Peters] > Until we know you meant by false negative, none. If you're calling spam > that ends up Unsure "false negative", then reducing your spam > cutoff should > help. If you really are getting lots of spam scoring below 5, then that's > something I've never heard of before (anyone?). It looks like this is a case you've never seen before, which is not good news. I can send any files that you care to see and will do any experiments that you suggest. I have also retained the entire message stream for the duration of this experiment.
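The two training policies Seth describes (training on every error and every unsure, versus training only on spam scoring below 50% and ham scoring above 15%) can be sketched in a few lines of Python. This is a hypothetical illustration, not SpamBayes code: the function names are invented, scores are treated as fractions in [0, 1] rather than the percentages shown in the Outlook UI, and whether a score exactly at a cutoff counts as ham, unsure, or spam is an assumption.

```python
# Hypothetical helpers for illustration only; these names are not part
# of the SpamBayes API. Scores are fractions in [0, 1], so the "90/15"
# thresholds discussed in this thread become 0.90 and 0.15.

def classify(score, ham_cutoff=0.15, spam_cutoff=0.90):
    """Map a spam probability to a three-way label.

    Treating the boundaries as ham < ham_cutoff and spam >= spam_cutoff
    is an assumption made for this sketch.
    """
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def train_on_errors_and_unsures(score, is_spam,
                                ham_cutoff=0.15, spam_cutoff=0.90):
    """Earlier policy: train whenever the label is wrong or "unsure".

    An "unsure" label never equals the true class, so a single
    comparison covers both mistakes and unsures.
    """
    truth = "spam" if is_spam else "ham"
    return classify(score, ham_cutoff, spam_cutoff) != truth


def train_only_when_really_wrong(score, is_spam,
                                 spam_train_below=0.50,
                                 ham_train_above=0.15):
    """Experimental policy: train spam scoring below 50% and ham
    scoring above 15%, ignoring spam that is merely unsure."""
    if is_spam:
        return score < spam_train_below
    return score > ham_train_above


if __name__ == "__main__":
    # A spam scoring 1% (a false negative) is trained under both policies;
    # a spam scoring 70% (unsure) is trained only under the earlier policy.
    for score in (0.01, 0.70, 0.95):
        print(score, classify(score),
              train_on_errors_and_unsures(score, is_spam=True),
              train_only_when_really_wrong(score, is_spam=True))
```

Under 90/15 cutoffs the first policy trains on every unsure spam, which is consistent with the spam database growing quickly when all the unsures were spam; the second skips spam scoring between 50% and 90% entirely.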
My assumption is that my results have something to do with my training tactics: the initial training set size, the thresholds that trigger me to train, how far out of balance I let the databases get before I add more ham, etc. Do you think my initial training set (around 650 of each) was too large? The next time I start over, I plan to use thresholds of 80/5. Do you recommend any particular initial training set size? The other configuration stuff that may or may not matter is: - SpamBayes Outlook Plug-In 0.81, clean install - Outlook 2000 SP-3 - Windows 2000 Pro SP-4 - mail fetched from two POP3 servers every five minutes - Outlook rules move all legit mailing list stuff out of the Inbox - background mode set with start delay = 2.0 sec, delay between messages = 1.0 sec - only the Inbox is watched -- Seth Goodman Humans: personal replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From lylesj002 at hawaii.rr.com Wed Dec 3 01:34:30 2003 From: lylesj002 at hawaii.rr.com (Jerome Lyles) Date: Wed Dec 3 01:34:34 2003 Subject: [Spambayes] Re: Installing Spambayes In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212B1BD@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130212B1BD@its-xchg4.massey.ac.nz> Message-ID: <200312022234.30051.lylesj002@hawaii.rr.com> On Sunday 30 November 2003 22:04, you wrote: > [Please always direct spambayes questions to the spambayes mailing list > , not individual developers.] I will. > > I wrote earlier about not being able to connect to localhost > > using spambayes to connect to my pop3 server. Now I have another > > problem. I uninstalled then attempted to reinstalled Spambayes. > > I cannot. The error message is: > > running install_lib > > creating /usr/lib/python2.3/site-packages/spambayes > > error: could not create '/usr/lib/python2.3/site-packages/spambayes': > > Permission denied > > This seems a fairly straight-forward message. 
You don't have permission to > create the directory named in the message. To install spambayes, you do > need that permission; either install it somewhere else, or get permission. It's the spambayes setup script that installs spambayes in the directory above. If I wanted to use a different directory I would have to modify the script; is this what you advise me to do? > > I can install Spambayes and start the server as root but when > > I use a browser > > to go to http://localhost:8080/ I get the error message: > > Cannot connect to localhost > > Are you deliberately using port 8080, rather than the default of 8880? If > not, you may simply be using the wrong address. What does the console > window where spambayes is running have in it? > > =Tony Meyer I was using the wrong port number, thank you. When I try to save the configuration I get this error message: ------ File "/usr/lib/python2.3/site-packages/spambayes/Dibbler.py", line 267, in __init__ self.bind(port) File "/usr/lib/python2.3/asyncore.py", line 300, in bind return self.socket.bind(addr) File "<string>", line 1, in bind error: (98, 'Address already in use') I'm using port 25 as my smtp port in spambayes configuration and port 110 as my pop3 port. I have four pop3 accounts, one of which I set up for spambayes to monitor. All four pop3 accounts are set up to use port 110. Which address then is already in use? From bspiker at stayfree.co.uk Wed Dec 3 02:02:32 2003 From: bspiker at stayfree.co.uk (Brian Spiker) Date: Wed Dec 3 02:02:41 2003 Subject: [Spambayes] Need to delete SPAM automagically... Message-ID: <017701c3b96b$6dd83130$0a00000a@HPa250n> I saw in the FAQ: 3.9 How can I configure SpamBayes to delete spam rather than moving it? Sorry, but you can't. However, Outlook has an excellent "auto-archive" facility which can be used to the same effect - simply configure auto-archive to periodically delete your Spam folder.
It is recommended that you configure auto-archive to keep at least a few days of Spam around, should the SpamBayes database become corrupt and require you to perform a full re-train. This is fine except I don't want auto-archive turned on for all of my Outlook folders, which is what this does. I didn't see any means to limit it just to my SPAM folder. I'm being overrun by SPAM to the point where I'm tempted to have a new email account every two weeks. (My ISP may be on a SPAMmers' hit list!) It is *VERY* tedious to always go and grab the 300+ emails, delete them, and then go to Deleted Items and remove them again! AAARRRGGGG!!! Can't we just execute a few of these SPAMmers as an example on CNN? Cheers! Brian... From d-r-lewis at tiscali.co.uk Wed Dec 3 06:42:47 2003 From: d-r-lewis at tiscali.co.uk (David Lewis) Date: Wed Dec 3 06:42:45 2003 Subject: [Spambayes] Identified Spam Message-ID: <008b01c3b992$92e300a0$d8902e50@acer5gi5q0ubzj> Sorry if I am a bit stupid, but what actions actually happen when a Spam has been identified ? Does the sender / administrator of the Spam get a message telling them to stop, or telling them that the address was not valid etc ? I cannot find this on the information pages. Whilst I am getting Spam identified, into junk or suspect, the actual volume is not getting any less - in fact it is possibly increasing ! Any help / advice would be welcome. Thanks in advance. DAVID From Mark.Howells at softoption.com Wed Dec 3 06:50:42 2003 From: Mark.Howells at softoption.com (Mark Howells) Date: Wed Dec 3 06:52:04 2003 Subject: [Spambayes] Identified Spam Message-ID: <5846CF419D2EF5439036CC3126A3A995017B66@SOSERVER1.softoption.local> > -----Original Message----- > From: David Lewis [mailto:d-r-lewis@tiscali.co.uk] > > Sorry if I am a bit stupid, but what actions actually happen > when a Spam has been identified ? SpamBayes is a message classifier. Once it has classified the message as Ham/Suspect/Spam, its job is done.
> Does the sender / administrator of the Spam get a message telling them > to stop, or telling them that the address was not valid etc ? No. There's no practical way of achieving anything. Most spam has falsified reply/from etc headers and most replies would just bounce, causing even more traffic. > Whilst I am getting Spam identified, into junk or suspect, the actual > volume is not getting any less - in fact it is possibly increasing ! At least it's not cluttering your inbox any more.... Cheers Mark -- Outgoing mail is certified Virus Free. Checked by AVG Anti-Virus (http://www.grisoft.com). Version: 7.0.203 / Virus Database: 261.3.3 - Release Date: 12/2/2003 From rcoe at CambridgeMA.GOV Wed Dec 3 07:03:18 2003 From: rcoe at CambridgeMA.GOV (Coe, Bob) Date: Wed Dec 3 07:03:21 2003 Subject: [Spambayes] Please help run an Outlook plugin test Message-ID: I would greatly appreciate it if one or more readers of this list would test the following proposition for me: Start with a Windows XP SP1 computer without the Outlook plugin installed. (Uninstall it if necessary.) In HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager, set "SafeDllSearchMode"=dword:00000000 Install version 0.81 of the Outlook plugin, using the self-extracting load module as distributed. Reboot and start up Outlook. It should crash immediately. Uninstall the plugin. In HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager, set "SafeDllSearchMode"=dword:00000001 Install the plugin as before. Reboot and start up Outlook. It should run correctly. I don't know that the proposition is true, but a test I ran last night suggests that it may be. I've been trying to track down a persistent problem with the plugin ever since I converted two of my computers to XP last month. The above approach was suggested by someone whose Windows credentials are unusually good, so I'm hopeful that I may be onto something. 
Bob MIS Department, City of Cambridge 831 Massachusetts Ave, Cambridge MA 02139 ? 617-349-4217 ? fax 617-349-6165 From avi.ron at southstreetgroup.com Wed Dec 3 07:33:17 2003 From: avi.ron at southstreetgroup.com (Avi Ron) Date: Wed Dec 3 07:33:26 2003 Subject: [Spambayes] sw issues Message-ID: Hi, When I click the "Configuration Wizard..." button, nothing happens. I have uninstalled and reinstalled twice. I have 1000+ email messages. Windows XP Home, Office 2000, Outlook 2000, SpamBayes version 0.81 I don't know what to do. Help! Thanks, Avi Ron Regards Avi Ron South Street Group LLC 860-657-4754 X 305 WWW.SOUTHSTREETGROUP.COM -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3545 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031203/1c0f7f7d/winmail-0001.bin From rcoe at CambridgeMA.GOV Wed Dec 3 07:50:05 2003 From: rcoe at CambridgeMA.GOV (Coe, Bob) Date: Wed Dec 3 07:50:08 2003 Subject: [Spambayes] RE: Please help run an Outlook plugin test Message-ID: I should have specified that I'm referring to XP Professional. I don't know whether the test would make sense using the Home Edition. Bob > -----Original Message----- > From: Coe, Bob > Sent: Wednesday, December 03, 2003 7:03 AM > To: spambayes@Python.org > Subject: [Spambayes] Please help run an Outlook plugin test > > > I would greatly appreciate it if one or more readers of this > list would test the following proposition for me: > > Start with a Windows XP SP1 computer without the Outlook > plugin installed. (Uninstall it if necessary.) > In HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager, > set "SafeDllSearchMode"=dword:00000000 > Install version 0.81 of the Outlook plugin, using the > self-extracting load module as distributed. > Reboot and start up Outlook. It should crash immediately. > Uninstall the plugin.
> In HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager, > set "SafeDllSearchMode"=dword:00000001 > Install the plugin as before. > Reboot and start up Outlook. It should run correctly. > > I don't know that the proposition is true, but a test I ran > last night suggests that it may be. I've been trying to track > down a persistent problem with the plugin ever since I > converted two of my computers to XP last month. The above > approach was suggested by someone whose Windows credentials > are unusually good, so I'm hopeful that I may be onto something. > > Bob > > MIS Department, City of Cambridge > 831 Massachusetts Ave, Cambridge MA 02139 ? 617-349-4217 ? fax 617-349-6165 From wim_leys at hotmail.com Wed Dec 3 08:38:54 2003 From: wim_leys at hotmail.com (Wim Leys) Date: Wed Dec 3 08:39:00 2003 Subject: [Spambayes] "No filterable mail items are selected" error with "undeliverable" e-mails Message-ID: Hi, Since yesterday, I have been receiving these "undeliverable" messages. I think a spammer is using my e-mail address as a reply address. I wanted to train Spambayes to recognize these e-mails, but I get this "No filterable mail items are selected" message. And indeed, the Spam column in Outlook is empty; these messages seem to be different from "normal" e-mails. Is there any way I can pass this kind of e-mail to Spambayes, so it can be filtered out? Kind regards Wim From: System Administrator Sent: Wednesday, December 3, 2003 7:07 To: allss@kos.net; bunspec@kos.net; ckbrooks@kos.net; murraym@kos.net Subject: Undeliverable: Straight talk about rc cars Your message did not reach some or all of the intended recipients. Subject: Straight talk about rc cars Sent: 3/12/2003 8:05 The following recipient(s) could not be reached: allss@kos.net on 3/12/2003 7:07 The e-mail system was unable to deliver the message, but did not report a specific reason. Check the address and try again. If it still fails, contact your system administrator.
< barracuda.kos.net #5.0.0 X-Barracuda-Spam-Firewall; host 127.0.0.1[127.0.0.1] said: 550 5.7.1 Message content rejected, UBE, id=25761-13-8 (in reply to end of DATA command)> bunspec@kos.net on 3/12/2003 7:07 The e-mail system was unable to deliver the message, but did not report a specific reason. Check the address and try again. If it still fails, contact your system administrator. < barracuda.kos.net #5.0.0 X-Barracuda-Spam-Firewall; host 127.0.0.1[127.0.0.1] said: 550 5.7.1 Message content rejected, UBE, id=25761-13-8 (in reply to end of DATA command)> ckbrooks@kos.net on 3/12/2003 7:07 The e-mail system was unable to deliver the message, but did not report a specific reason. Check the address and try again. If it still fails, contact your system administrator. < barracuda.kos.net #5.0.0 X-Barracuda-Spam-Firewall; host 127.0.0.1[127.0.0.1] said: 550 5.7.1 Message content rejected, UBE, id=25761-13-8 (in reply to end of DATA command)> murraym@kos.net on 3/12/2003 7:07 The e-mail system was unable to deliver the message, but did not report a specific reason. Check the address and try again. If it still fails, contact your system administrator. < barracuda.kos.net #5.0.0 X-Barracuda-Spam-Firewall; host 127.0.0.1[127.0.0.1] said: 550 5.7.1 Message content rejected, UBE, id=25761-13-8 (in reply to end of DATA command)> From dbulgrien at vcsd.com Wed Dec 3 09:01:41 2003 From: dbulgrien at vcsd.com (Dennis W. Bulgrien) Date: Wed Dec 3 09:01:56 2003 Subject: [Spambayes] Re: Outlook plugin - training, database status References: <9891913C5BFE87429D71E37F08210CB91839FE@zeus.sfhq.friskit.com> Message-ID: Some confirmation that the classifier does not automatically train on messages that it's sure about... SpamBayes Manager, Training database status, stated "Database only has... 7 good and 388 spam - you should consider performing additional training."
Then I received 57 e-mail classified as: 2 ham, 54 uncertain, and 29 spam. SpamBayes Manager, Training database status still stated... "7 good and 388 spam" instead of what I had hoped would be... "9 good and 417 spam". After classifying the 54 uncertain as spam the numbers expectedly went to... "7 good and 442 spam". With the proposed idea the potentially advantageous numbers could have been... "9 good and 471 spam". "Piers Haken" wrote in message news:9891913C5BFE87429D71E37F08210CB91839FE@zeus.sfhq.friskit.com... I don't believe you need this. I think that the classifier automatically trains on messages as they arrive (or at least on messages that it's sure about). ... > -----Original Message----- > From: Moore, Paul [mailto:Paul.Moore@atosorigin.com] ... > One thing I don't see, however, is a means of confirming the > classifier's decisions as correct. ... From jacob-spambayes-list at statisticalanomaly.com Wed Dec 3 09:15:54 2003 From: jacob-spambayes-list at statisticalanomaly.com (Jacob Farmer) Date: Wed Dec 3 09:15:00 2003 Subject: [Spambayes] sb_imapfilter (sort of) hangs when launched with -b argument Message-ID: <3FCDF01A.3000507@statisticalanomaly.com> Hello everyone, I've been encountering a strange problem recently that probably warrants a bug report, but I'm at a loss of how to provide enough useful information to make one worthwhile. When I start sb_imapfilter with the -b argument to launch the web interface, the program starts as expected: suslik% ./sb_imapfilter.py -b SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003). User interface url is http://localhost:8880/ It then launches lynx, which sits at "HTTP request sent; waiting for response." until it times out. If I Ctrl-C out of lynx, I'll get this message: Exiting via interrupt: 2 and then be dumped back to sb_imapfilter, which is still running. 
If I then Ctrl-Z the sb_imapfilter, bg it to start it again in the background, and then start lynx and point it to the same address, I get into the configuration interface without any trouble. I have no idea why this happens (or how I figured out the workaround, for that matter). Some system information: suslik% uname -X System = SunOS Node = suslik Release = 5.8 KernelID = Generic_108528-23 Machine = sun4u BusType = Serial = Users = OEM# = 0 Origin# = 1 NumCPU = 1 suslik% python -V Python 2.3.2 suslik% lynx -version Lynx Version 2.8.4rel.1 (17 Jul 2001) libwww-FM 2.14, SSL-MM 1.4.1, OpenSSL 0.9.5 Built on solaris2.7 Feb 26 2002 18:10:07 Any thoughts? Thanks! Jacob From dbulgrien at vcsd.com Wed Dec 3 09:27:11 2003 From: dbulgrien at vcsd.com (Dennis W. Bulgrien) Date: Wed Dec 3 09:27:24 2003 Subject: [Spambayes] Re: Outlook plugin - training, automatic References: <16E1010E4581B049ABC51D4975CEDB88619926@UKDCX001.uk.int.atosorigin.com> Message-ID: This is a wonderful idea! It would be terrific if a check box were added to the SpamBayes Manager, Advanced tab, which would cause automatic training on all ham left in InBox and all spam put in the spam folder. I emphasize something Mark Hammond said, "underestimating our own tool". If an e-mail is over a 90% spam limit, at 91%, the 9% that isn't classified as spam may contain clues as to how the SPAMming community is trying to outwit the population. For instance, the words "buy v iagra online" (the space is to help THIS email not to be classified as spam :-) are in the database. The SPAMmer sends "buy v.iagra online" (have you seen that technique?) which, let's assume, the classifier scores at 91% due to "buy" and "online". By automatically training on this correctly categorized email, the word "v.iagra" gets added to the database. Then if the SPAMmer tries to send "b.uy v.iagra o.nline" the classifier will recognize "v.iagra" from last time and have an advantage.
It morphs with changes in the black industry, sort of like artificial intelligence. "Delete As Spam" and "Recover from Spam" buttons correct any error; for those who don't care to check the spam folder, the feature can remain disabled. "Moore, Paul" wrote in message news:16E1010E4581B049ABC51D4975CEDB88619926@UKDCX001.uk.int.atosorigin.com... ... As I'm starting from a very small message base, I worry that correct classifications are still somewhat based on "luck", and training based on correct decisions would help to increase both my and the classifier's confidence level. ... From tstrobel at hcs.net Wed Dec 3 09:51:40 2003 From: tstrobel at hcs.net (Theresa Strobel) Date: Wed Dec 3 09:51:56 2003 Subject: [Spambayes] Junk e-mail suspect folder list Message-ID: <000201c3b9ac$f5a09080$6101a8c0@JSTOBEL2> I am currently using Spambayes and love it. But something strange happened. I had a junk mail suspect folder and it suddenly disappeared. How do I get it back? I like going through it and making sure that the ones marked as spam really are spam and the others are not. Can you tell me why this happened and how do I get it back? Thanks Theresa Strobel -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 145 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031203/5a56cd1c/attachment.gif From dbulgrien at vcsd.com Wed Dec 3 10:17:30 2003 From: dbulgrien at vcsd.com (Dennis W. Bulgrien) Date: Wed Dec 3 10:17:43 2003 Subject: [Spambayes] Re: Need to delete SPAM automagically... References: <017701c3b96b$6dd83130$0a00000a@HPa250n> Message-ID: I believe that if you turn Outlook auto-archive on (in Tools, Options, Other, AutoArchive...) you can then configure each and every folder to either archive or not. It appears that the folders that Outlook defaults to archive are Calendar, Deleted Items, Inbox, Sent Items, and Tasks.
Go to each of those, right-click, choose Properties, select AutoArchive, and uncheck the "Clean out items older than" check box. On your SPAM folder, configure AutoArchive as desired. Won't it then be the only folder that is archived (with the wonderful option to "Permanently delete old items")? "Brian Spiker" wrote in message news:017701c3b96b$6dd83130$0a00000a@HPa250n... ... Sorry, but you can't. However, Outlook has an excellent "auto-archive" facility which can be used to the same effect - simply configure auto-archive to periodically delete your Spam folder. ... This is fine except I don't want auto-archive turned on for all of my outlook folders which is what this does. I didn't see any means to limit it just to my SPAM folder. ... From nobody at spamcop.net Wed Dec 3 10:51:16 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 3 10:51:22 2003 Subject: [Spambayes] Re: Outlook plugin - training, database status In-Reply-To: Message-ID: [Dennis W. Bulgrien] > Some confirmation that classifier does not automatically train on > messages that > it's sure about... SpamBayes Manager, Training database status, stated Dennis, Your observations are correct. It appears that the configuration switch for "train on everything" is exposed for use in the POP3PROXY version (all other mailers besides Outlook), but not in the Outlook Plug-In. I would also like to experiment with it, but I use the plug-in like you. From your numbers, you might consider training some additional ham into your databases. Other people have found poor results when the number of spam and ham messages trained is drastically different. -- Seth Goodman Humans: personal replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From david.matos at comcast.net Wed Dec 3 10:58:04 2003 From: david.matos at comcast.net (David Matos) Date: Wed Dec 3 10:58:05 2003 Subject: [Spambayes] Please Kill the "Blocked Mail Notification" In-Reply-To: Message-ID: <001701c3b9b6$3cb77bb0$8d80b042@dexter> Yikes!
You got me on that 80 KB! I was half asleep and sick with a cold when I fired off that question last night. ;-p -----Original Message----- From: Tim Peters [mailto:tim.one@comcast.net] Sent: Tuesday, December 02, 2003 11:14 PM To: David Matos Cc: spambayes@python.org Subject: RE: [Spambayes] Please Kill the "Blocked Mail Notification" [David Matos] > Can one of the moderators do something about the "Blocked Mail > Notification" from sysadmin@scr.siemens.com ? This isn't a moderated list. Proof: if it were, I would have blocked you from bothering everyone with 80KB of quoted digest email just to ask a one-sentence question . (BTW, I asked siemens to stop that silliness, but they didn't reply; then again, I haven't seen another one of those from them since I asked.) From dbulgrien at vcsd.com Wed Dec 3 11:10:55 2003 From: dbulgrien at vcsd.com (Dennis W. Bulgrien) Date: Wed Dec 3 11:11:06 2003 Subject: [Spambayes] Re: Re: Outlook plugin - training, database status References: Message-ID: Thanks. I now understand a difference in POP3PROXY and Outlook. I too would REALLY like the configuration switch for additional training to be put into the Outlook plug-in. I'll wait patiently in great anticipation... "Seth Goodman" wrote in message news:MHEGIFHMACFNNIMMBACAEEMMGAAA.nobody@spamcop.net... ... Your observations are correct. It appears that the configuration switch for "train on everything" is exposed for use in the POP3PROXY version (all other mailers besides Outlook), but not in the Outlook Plug-In. ... From nobody at spamcop.net Wed Dec 3 11:15:38 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 3 11:15:43 2003 Subject: [Spambayes] Re: Outlook plugin - training, automatic In-Reply-To: Message-ID: [Dennis W. Bulgrien] > "Moore, Paul" wrote in message > news:16E1010E4581B049ABC51D4975CEDB88619926@UKDCX001.uk.int.atosorigin.com.. . What newsgroup was this in? I can't retrieve the message from the link (perhaps truncated?). 
I'm extremely intrigued by the possibilities of continuous training. Maybe it works better, maybe it doesn't. Does anyone have any experience in this regard? In any case, with continuous training comes a continuously growing database. Mistakes in classification will stay in the database forever, as will forms of spam that are no longer common. I don't *know* that this is a serious problem, but intuition says it won't help anything. A smaller database should also learn faster than a larger one. I have put some ideas up on the SpamBayes Wiki at http://entrian.com/sbwiki/TrainingIdeas concerning automatic pruning of database entries for use with continuous training. I encourage you, anyone else who shares this interest, and in particular any of the developers to add comments to the Wiki, post to the mailing list, or write to me off-line. I am willing to put work into this, write code and experiment, but I have no desire to waste time hashing out ideas that have already been explored. Thanks in advance. -- Seth Goodman Humans: personal replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From Mark.Howells at softoption.com Wed Dec 3 11:26:42 2003 From: Mark.Howells at softoption.com (Mark Howells) Date: Wed Dec 3 11:27:48 2003 Subject: [Spambayes] Spambayes vs. Popfile and other Bayesian classifiers Message-ID: <5846CF419D2EF5439036CC3126A3A995017B68@SOSERVER1.softoption.local> Does anyone have any observations / opinions on the use of Spambayes compared with other classifiers? I use Spambayes at work and home and I'm very happy with the results - however the quest for perfection is never quite finished ... ;) Popfile, in particular, looks good on paper as it does n-way classification. However I've not used it so can't comment on its effectiveness. Anyway, comments (subjective or otherwise) would be welcome. Cheers Mark -- Outgoing mail is certified Virus Free. Checked by AVG Anti-Virus (http://www.grisoft.com).
Version: 7.0.203 / Virus Database: 261.3.3 - Release Date: 12/2/2003 From TiagoTiago at Globo.com Wed Dec 3 11:42:13 2003 From: TiagoTiago at Globo.com (Tiago Estill de Noronha) Date: Wed Dec 3 11:42:46 2003 Subject: FW: [Spambayes] Re: Outlook plugin - training, automatic Message-ID: <000101c3b9bc$67c108c0$0860b7c8@virtua.com.br> ********************* Tiago Estill de Noronha TiagoTiago@Globo.com -=> -----Original Message----- -=> From: Tiago Estill de Noronha [mailto:TiagoTiago@Globo.com] -=> Sent: Wednesday, December 3, 2003 13:39 -=> To: 'Dennis W. Bulgrien' -=> Subject: RE: [Spambayes] Re: Outlook plugin - training, automatic -=> -=> -=> Automatic training? Yummy! -=> -=> -=> -=> ********************* -=> Tiago Estill de Noronha -=> TiagoTiago@Globo.com -=> -=> -=> -=> -----Original Message----- -=> -=> From: spambayes-bounces@python.org -=> -=> [mailto:spambayes-bounces@python.org] On Behalf Of Dennis -=> W. Bulgrien -=> Sent: Wednesday, December 3, -=> 2003 11:27 -=> To: spambayes@python.org -=> Subject: -=> [Spambayes] Re: Outlook plugin - training, automaticAuto -=> -=> -=> -=> -=> This is a wonderful idea! It would be terrific if a check -=> -=> box were added to the SpamBayes Manager, Advanced tab, -=> -=> which would cause automatic training on all ham left in -=> -=> InBox and all spam put in the spam folder. I emphasize -=> -=> something Mark Hammond said, "underestimating our own -=> -=> tool". If an e-mail is over a 90% spam limit, at 91%, the -=> -=> 9% that isn't classified as spam may contain clues as to -=> -=> how the SPAMming community is trying to outwit the -=> -=> population. For instance, the words "buy v iagra online" -=> -=> (the space is to help THIS email not to be classified as -=> -=> spam :-) is in the database. The SPAMmer sends "buy -=> -=> v.iagra online" (have you seen that technique?) which, lets -=> -=> assume, the classifier scores at 91% due to "buy" and -=> -=> "online".
By automatically training on this correctly -=> -=> categorized email, the word "v.iagra" gets added to the -=> -=> database. Then if the SPAMmer tries to send "b.uy v.iagra -=> -=> o.nline" the classifier will recognize "v.iagra" from last -=> -=> time and have an advantage. Its morphs with changes in the -=> -=> black industry, sort of like artificial intelligence. -=> -=> "Delete As Spam" and "Recover from Spam" buttons correct -=> -=> any error; for those who don't care to check the spam -=> -=> folder, the feature can remain disabled. -=> -=> -=> -=> "Moore, Paul" wrote in message -=> -=> news:16E1010E4581B049ABC51D4975CEDB88619926@UKDCX001.uk.int.atosorigin.com... -=> ... -=> As I'm starting from a very small message base, I worry -=> that correct classifications are still somewhat based on -=> "luck", and training based on correct decisions would help -=> to increase both my and the classifier's confidence level. ... -=> -=> -=> -=> -=> _______________________________________________ -=> Spambayes@python.org -=> -=> http://mail.python.org/mailman/listinfo/spambayes -=> Check the -=> -=> FAQ before asking: -=> http://spambayes.sf.net/faq.html -=> -=> --- -=> Incoming mail is certified Virus Free. -=> Checked by AVG anti-virus system (http://www.grisoft.com). -=> Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 -=> -=> -=> --- -=> Outgoing mail is certified Virus Free. -=> Checked by AVG anti-virus system (http://www.grisoft.com). -=> Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 -=> -=> --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 From lists at strisik.com Wed Dec 3 11:45:49 2003 From: lists at strisik.com (Peter Strisik) Date: Wed Dec 3 11:45:56 2003 Subject: [Spambayes] Spambayes vs.
Popfile and other Bayesian classifiers In-Reply-To: <5846CF419D2EF5439036CC3126A3A995017B68@SOSERVER1.softoption.local> Message-ID: <0HPB009J6VWEYK@mmp-2.gci.net> Mark, I went back and forth among popfile itself, outclass, and spambayes. I use Outlook. I've settled on outclass (www.vargonsoft.com). It has been more reliable in classification for me than spambayes; I don't know why. I also like the "safe view" button that opens the message's source code in notepad so you can see what it is without opening it in Outlook. The only difficulty is that the author is currently preoccupied with fatherhood, so development has slowed significantly. Still, the current alpha release is working fine for me. .....Peter Mark Howells <> emailed on Wednesday, December 03, 2003 7:27 am: (at least in part)..... > Does anyone have any observations / opinions on the use of Spambayes > compared with other classifiers. I use Spambayes at work and home > and I'm very happy with the results - however the quest for perfection > is never quite finished ... ;) > > Popfile, in particular, looks good on paper as it does n-way > classification. However I've not used it so can't comment on its > effectiveness. > > Anyway, comments (subjective or otherwise) would be welcome. > > Cheers > > Mark From dave at boost-consulting.com Wed Dec 3 12:24:44 2003 From: dave at boost-consulting.com (David Abrahams) Date: Wed Dec 3 12:24:52 2003 Subject: [Spambayes] Re: suggestions for training and filtering? References: <3FCD083E.7020206@statisticalanomaly.com> Message-ID: Jacob Farmer writes: > David, > > This has come up before, and I think the general solution is to get > your Spam:Ham ratio to about 1:1. My guess is your's is way off. > > I get nearly perfect results with about 600 of each, and I haven't > trained since. Which ones are you discarding? All that are classified as Spam?
-- Dave Abrahams Boost Consulting www.boost-consulting.com From nobody at spamcop.net Wed Dec 3 12:30:27 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 3 12:30:30 2003 Subject: [Spambayes] Spambayes vs. Popfile and other Bayesian classifiers In-Reply-To: <0HPB009J6VWEYK@mmp-2.gci.net> Message-ID: [Peter Strisik] > I went back and forth among popfile itself, outclass, and > spambayes. I use > Outlook. I've settled on outclass (www.vargonsoft.com). It has been more > reliable in classification for me than spambayes, I don't know > why. I also > like the "safe view" button that opens the message's source code > in notepad > so you can see what it is without opening it in Outlook. Outclass does have some nice UI features, as does the parser in the underlying PopFile. Could you be more specific as to what you mean by "more reliable"? Though SpamBayes still produces a fair amount of false negatives for my mail stream, I have yet to experience a false positive. This is extremely important, as I *never* want to lose an important email in a large spam folder. Visually inspecting the large spam folder is not totally reliable. Could you state approximately what were your false negative rates, false positive rates and total spam load per day? Since Outclass (and PopFile) support whitelists and blacklists, how much use did you make of those features? -- Seth Goodman Humans: personal replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From nobody at spamcop.net Wed Dec 3 12:37:31 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 3 12:37:33 2003 Subject: [Spambayes] Re: suggestions for training and filtering? In-Reply-To: Message-ID: > Jacob Farmer writes: > > > David, > > > > This has come up before, and I think the general solution is to get > > your Spam:Ham ratio to about 1:1. My guess is your's is way off. > > > > I get nearly perfect results with about 600 of each, and I haven't > > trained since. 
> > Which ones are you discarding? All that are classified as Spam? > > -- > Dave Abrahams He says he isn't training at all anymore. My question for Jacob is what was the initial size of his training set and what were his criteria for training before he reached his present state? -- Seth Goodman Humans: personal replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From dave at boost-consulting.com Wed Dec 3 12:50:48 2003 From: dave at boost-consulting.com (David Abrahams) Date: Wed Dec 3 12:50:58 2003 Subject: [Spambayes] Re: suggestions for training and filtering? In-Reply-To: <3FCD083E.7020206@statisticalanomaly.com> (Jacob Farmer's message of "Tue, 02 Dec 2003 16:46:38 -0500") References: <3FCD083E.7020206@statisticalanomaly.com> Message-ID: Jacob Farmer writes: > David, > > This has come up before, and I think the general solution is to get > your Spam:Ham ratio to about 1:1. My guess is your's is way off. > > I get nearly perfect results with about 600 of each, and I haven't > trained since. Which ones are you discarding? All that are classified as Spam? -- Dave Abrahams Boost Consulting www.boost-consulting.com From JSahlman at dispec.com Wed Dec 3 12:53:46 2003 From: JSahlman at dispec.com (Sahlman, John) Date: Wed Dec 3 12:53:39 2003 Subject: [Spambayes] Server side setup? Message-ID: <05C61C52D7CAD211A7830008C7DF6F106F6362@DISABILITYINS01> I've been using spambayes recently with excellent results on my individual Outlook account. I'd love to put it in place at the server level for our group of 38 folks. We use our webhost to collect mail and pull it down to our Exchange server (5) using a product called popbeamer. I read the server side page on the site but wondered if there's anything more detailed on how I could incorporate this into our platform. Thanks for any input! John E.
Sahlman, FLMI Disability Insurance Specialists, LLC (860)761-1864 (860)769-6986 Fax From lists at strisik.com Wed Dec 3 13:07:07 2003 From: lists at strisik.com (Peter Strisik) Date: Wed Dec 3 13:07:12 2003 Subject: [Spambayes] Spambayes vs. Popfile and other Bayesian classifiers In-Reply-To: Message-ID: <0HPB00IJLZNVU7@mmp-3.gci.net> Seth, I don't keep any stats on performance, though I do know that I average about 30 spam per day. By reliable, I mean consistent. Outclass keeps working with very occasional false negatives. Virtually no false positives. And it did this without having to do much training. With SpamBayes, I had to have messages available, worry about how many spam and ham I was using, etc. And I found that it drifted towards poorer performance, missing spam messages and requiring some retraining. Basically, I just experienced outclass working like I needed without effort or worry. I don't use any white/black list features in outclass, only the bayesian filtering. ......Peter -----Original Message----- From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Seth Goodman Outclass does have some nice UI features, as does the parser in the underlying PopFile. Could you be more specific as to what you mean by "more reliable"? Though SpamBayes still produces a fair amount of false negatives for my mail stream, I have yet to experience a false positive. This is extremely important, as I *never* want to lose an important email in a large spam folder. Visually inspecting the large spam folder is not totally reliable. Could you state approximately what were your false negative rates, false positive rates and total spam load per day? Since Outclass (and PopFile) support whitelists and blacklists, how much use did you make of those features? From dbulgrien at vcsd.com Wed Dec 3 13:37:22 2003 From: dbulgrien at vcsd.com (Dennis W.
Bulgrien) Date: Wed Dec 3 13:37:34 2003 Subject: [Spambayes] Re: Re: Outlook plugin - training, automatic References: Message-ID: None other, sorry. The "newsgroup" reference is to none other than this e-mail list, which is being recorded on news://news.gmane.org/gmane.mail.spam.spambayes.general . There you can see the large e-mail discussion history. "Seth Goodman" wrote in message... > "Moore, Paul" wrote in message news:16E1010E4581B049ABC51D4975CEDB88619926@UKDCX001.uk.int.atosorigin.com.. From jacob-spambayes-list at statisticalanomaly.com Wed Dec 3 13:54:47 2003 From: jacob-spambayes-list at statisticalanomaly.com (Jacob Farmer) Date: Wed Dec 3 13:53:55 2003 Subject: [Spambayes] Re: suggestions for training and filtering? In-Reply-To: References: Message-ID: <3FCE3177.1070105@statisticalanomaly.com> Seth, I started out with about 300 of each. I would always train on ham and unsures, and I would delete the spam. However, as the ham count in my database grew, I would classify some additional spam messages to keep the ratio even. When I did that, I tried to train on a block of about 100 messages (~3 days' worth for me) at a time, so that I had a diverse enough sample to avoid skewing my results. Once I got to the point where most of my messages were being properly sorted, I just started deleting the spam. To be honest, I still train my unsures, but I get very, very few of them. In addition, if I notice the number of unsures rising (or even messages that should be spam being marked as ham), I'll start saving new spam, and when I have enough to be at about a 1:1 ratio with my saved ham, I'll nuke the database and retrain it using the mail I've collected recently. This system has worked out really well for me so far. Jacob Seth Goodman wrote: > He says he isn't training at all anymore. My question for Jacob is what was > the initial size of his training set and what were his criteria for training > before he reached his present state?
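Jacob's rebuild rule (keep saving fresh spam until it roughly matches the saved ham 1:1, then retrain from scratch) can be sketched as two small helpers. This is a hypothetical illustration, not part of SpamBayes: the names `ready_to_rebuild` and `balanced_training_set`, and the 0.9 to 1.1 tolerance band, are assumptions made for the example.

```python
def ready_to_rebuild(n_saved_spam, n_saved_ham, low=0.9, high=1.1):
    """Return True once the saved spam:ham ratio is close to 1:1.

    Hypothetical helper: the 0.9-1.1 tolerance band is an assumption,
    not a SpamBayes setting.
    """
    if n_saved_ham == 0:
        return False  # nothing to balance against yet
    ratio = n_saved_spam / float(n_saved_ham)
    return low <= ratio <= high


def balanced_training_set(spam, ham):
    """Trim the larger list so both classes have the same count.

    Both lists are assumed to be ordered oldest-first, so the most
    recent messages are the ones kept.
    """
    n = min(len(spam), len(ham))
    # Slice from len - n rather than -n, so n == 0 yields an empty list.
    return spam[len(spam) - n:], ham[len(ham) - n:]
```

The sketch only decides what to retrain on; the actual "nuke and retrain" step would still go through whichever SpamBayes front end you use.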
From rmalayter at bai.org Wed Dec 3 14:11:46 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Wed Dec 3 14:11:49 2003 Subject: [Spambayes] training problem? Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A74E94@cliff.bai.org> Seth Goodman wrote: > 2) What training tactics would you suggest that might work better? I've recently done a few things to balance my training ratio, and the initial results are encouraging. I have my "spam" folder, with 773 messages in it, all less than a month old. I then use Outlook to do a search of all my mail folders *except* my spam folder (this is easy in Outlook 2002 and up, because you can exclude individual folders from search), for all mail messages newer than a month old. I move the "cutoff date" on this search until the number of messages returned by the search is very close to the number in my spam folder. I then *copy* all the messages from this search into a temporary Outlook folder called "Ham for training". Then, I train on this folder and my spam folder, rebuilding the database from scratch. I set my thresholds to 20/80, and train appropriately on all spam or ham that falls in the middle span. I'll add this to your Wiki... -ryan- From nobody at spamcop.net Wed Dec 3 14:32:33 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 3 14:32:40 2003 Subject: [Spambayes] training problem? In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01A74E94@cliff.bai.org> Message-ID: [Ryan Malayter] > Have my "spam" folder, with 773 messages in it, all less than a month > old. I then use Outlook to do a search of all my mail folders *except* > my spam folder (this is easy in Outlook 2002 and up, because you can > exclude individual folders from search), for all mail messages newer > than a month old. I move the "cutoff date" on this search until the > number of messages returned by the search is very close to the number in > my spam folder.
> > I then *copy* all the messages from this search into a temporary Outlook > folder called "Ham for training". Then, I train on this folder and my > spam folder, rebuilding the database from scratch. I set my thresholds > to 20/80, and train appropriately on all spam or ham that falls in the > middle span. > > I'll add this to your Wiki... Thanks for the feedback and contribution to the Wiki. This is close to what I did on my previous run, but the results were not so good. That run was similar to what you did except that I used the default thresholds of 90/15 and the initial training set size was around 600 each of spam and ham. Maybe using the lower spam threshold of 80 and training all the unsures is the important difference. I'll try that with my next run. OTOH, maybe my spam stream is just nasty. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From yatesg at bellsouth.net Wed Dec 3 14:48:57 2003 From: yatesg at bellsouth.net (Greg Yates) Date: Wed Dec 3 14:49:03 2003 Subject: [Spambayes] Outlook will only start in safe mode Message-ID: I've seen a few others with this problem, but nowhere did I find a solution. SpamBayes version 008.1 Outlook XP (Exchange server) I had successfully installed SpamBayes on two PCs at home and tried this at work, but when I started Outlook (originally to configure it after installing the add-in) it "encountered a problem" and shut down. I can only start Outlook in safe mode now. I've tried uninstalling and re-installing SpamBayes several times, but I always get the same thing: Outlook fails to start except in safe mode. I have rebooted :) but not re-installed Outlook or Windoze. Thanks Greg From nobody at spamcop.net Wed Dec 3 14:56:55 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 3 14:57:00 2003 Subject: [Spambayes] training problem? In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01A74E94@cliff.bai.org> Message-ID: Ryan, Nice Wiki work.
Another difference I see between your approach and my previous one is that you trained on 30 days' worth of spam. I was afraid to do that since I get 140 spam/day, so 30 days' worth is 4,200 messages. To get that much ham, I would need to go back almost 6 months. However, maybe that long of a history for spam is what it takes to get good detection. I'm amazed at your low unsure rate. When I originally trained on 650 spam and 650 ham, that amounted to about five days of spam and 26 days of ham. Now I'm wondering if the longer time frame for spam is the key. Does anyone have any thoughts on this? One small note: on the email list, you mentioned using thresholds of 80/20, but on the Wiki you said 90/10. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From dave at boost-consulting.com Wed Dec 3 15:33:49 2003 From: dave at boost-consulting.com (David Abrahams) Date: Wed Dec 3 15:33:58 2003 Subject: [Spambayes] Re: suggestions for training and filtering? References: Message-ID: "Seth Goodman" writes: >> Jacob Farmer writes: >> >> > David, >> > >> > This has come up before, and I think the general solution is to get >> > your Spam:Ham ratio to about 1:1. My guess is your's is way off. >> > >> > I get nearly perfect results with about 600 of each, and I haven't >> > trained since. Deleting all but 650 of each of my ham/spam, deleting my hammie.db and spambayes.messageinfo.db, and retraining caused me to almost immediately get a few spams classified as "ham 0.06". I haven't seen spam classified as anything other than spam or unsure for quite some time. So I guess that didn't work out too well :( >> Which ones are you discarding? All that are classified as Spam? >> >> -- >> Dave Abrahams > > He says he isn't training at all anymore. You misunderstand my question. I mean: which messages get thrown out automatically?
-- Dave Abrahams Boost Consulting www.boost-consulting.com From richie at entrian.com Wed Dec 3 16:30:44 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Dec 3 16:30:55 2003 Subject: [Spambayes] sb_imapfilter (sort of) hangs when launched with -b argument In-Reply-To: <3FCDF01A.3000507@statisticalanomaly.com> References: <3FCDF01A.3000507@statisticalanomaly.com> Message-ID: Jacob, > When I start sb_imapfilter with the -b argument [...] launches lynx, > which sits at "HTTP request sent; waiting for response." until it times > out. This is down to Python's webbrowser module. To launch a URL from a Python program, you say "webbrowser.open(url)". With a graphical browser, the browser starts up and the Python program continues. But with a text-mode browser, the Python program blocks until the user exits the browser. The upshot is that you can't use -b with a text-mode browser. Sorry! -- Richie Hindle richie@entrian.com From skip at pobox.com Wed Dec 3 16:32:25 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Dec 3 16:32:31 2003 Subject: [Spambayes] training problem? In-Reply-To: References: <792DE28E91F6EA42B4663AE761C41C2A01A74E94@cliff.bai.org> Message-ID: <16334.22121.23862.352713@montanaro.dyndns.org> Seth> Nice Wiki work. Another difference I see between your approach Seth> and my previous one is that you trained on 30 days worth of spam. Seth> I was afraid to do that since I get 140 spam/day, so 30 days worth Seth> is 4,200 messages. To get that much ham, I would need to go back Seth> almost 6 months. However, maybe that long of a history for spam Seth> is what it takes to get good detection. I'm amazed at your low Seth> unsure rate. I really doubt that anyone needs to train on every single spam message which comes through in a 30-day period. Most spam probably comes from a small handful of cretins, and spam from the same cretin seems to arrive in bunches (gotta make full use of a new account before it gets shut off). 
Consequently, training on a single spammy unsure message is often sufficient to nudge several messages out of the unsure region and into spam territory. I've appended a small script I use to help decide which of the spams and hams that turn up "unsure" I should train on first/next. I run a mailbox through sb_filter.py like so:

    sb_filter.py ~/Mail/unsure | python ~/tmp/scan.py

The scan.py script spits out the subject, message-id, date and classification headers sorted by score. By default, it only considers messages classified as "unsure". You can force it to consider any/all combinations though, e.g.:

    sb_filter.py ~/Mail/unsure | python ~/tmp/scan.py 'ham|spam|unsure'

The idea is that you train on one or a few of your lowest-scoring spams and/or highest-scoring hams, save your unsure file, then run the above again. Any previously "unsure" spams which now show up at the spam end of things get ignored. Lather, rinse, repeat. When you're tired of the cleansing cycle (or your hair is squeaky clean), rename your unsure folder, e.g.:

    mv ~/Mail/unsure ~/Mail/unsure.save

then train on it again, e.g.:

    formail -s procmail < ~/Mail/unsure.save

The above commands are what I use in my Unix-y/procmail-ish/sb_filter-laden environment. You will obviously have to adjust them according to the needs of your environment, but the basic idea is the same everywhere. I think this process is even easier in the Outlook plugin: sort your unsure folder by score, move a small number of the most out-of-whack messages where they belong, then reclassify your unsure folder.
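The selection step Skip describes (train first on the lowest-scoring spams and the highest-scoring hams among the unsures) reduces to two sorts. This is a minimal sketch, assuming each unsure message has already been reduced to a `(score, verdict, msgid)` tuple, for example from the scan.py output plus your own judgment; `pick_training_candidates` is a hypothetical name, and the train/rescore loop itself is still done by hand.

```python
def pick_training_candidates(unsures, k=3):
    """Pick the most out-of-whack unsure messages to train on first.

    `unsures` is a list of (score, verdict, msgid) tuples, where `score`
    is the SpamBayes probability (0.0-1.0) and `verdict` is the user's
    own judgment, "spam" or "ham". Returns the k lowest-scoring spams
    and the k highest-scoring hams.
    """
    spams = sorted(u for u in unsures if u[1] == "spam")  # lowest score first
    hams = sorted((u for u in unsures if u[1] == "ham"), reverse=True)  # highest first
    return spams[:k], hams[:k]


unsures = [(0.45, "spam", "a"), (0.70, "spam", "b"), (0.30, "ham", "c"),
           (0.60, "ham", "d"), (0.55, "spam", "e")]
spams, hams = pick_training_candidates(unsures, k=2)
print([m for _, _, m in spams], [m for _, _, m in hams])
```

After training on these few, the idea is to rescore the unsure folder and pick again, since each round should move other near-duplicates out of the middle band.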
Skip

#!/usr/bin/env python

import sys, re, getopt

msgid = date = cls = ""
sub = ""
scanfor = "unsure"

# Any arguments are joined into one regex, e.g. 'ham|spam|unsure'.
opts, args = getopt.getopt(sys.argv[1:], "")
if args:
    scanfor = '|'.join(args)

info = []
for line in sys.stdin:
    if line.startswith("From "):
        # Start of a new message in the mbox: forget the old headers.
        msgid = date = cls = ""
        sub = ""
    elif line.lower().startswith("subject: "):
        sub = line.strip()
    elif line.lower().startswith("message-id: "):
        msgid = line.strip()
    elif line.lower().startswith("date: "):
        date = line.strip()
    elif line.lower().startswith("x-spambayes-classification: "):
        cls = line.strip()
        # Match only the header value (which ends with the score), so a
        # lowercase header name can't make 'spam' match every message.
        value = cls.split(":", 1)[1]
        if re.search(scanfor, value) is not None:
            prob = float(value.split()[-1])
            info.append((prob, (sub, date, msgid, cls)))
        date = msgid = cls = ""
        sub = ""

# Sort by score, lowest first.
info.sort()
for (prob, (sub, date, msgid, cls)) in info:
    # A blank line, then the four saved headers.
    print("\n".join(["", sub, date, msgid, cls]))

From mwaxer at dcconvention.com Wed Dec 3 16:16:01 2003 From: mwaxer at dcconvention.com (Waxer, Michael) Date: Wed Dec 3 17:12:29 2003 Subject: [Spambayes] shared knowledge Message-ID: <4DCC2BD91F50AF4BB04DFF3F47228A7904270E@charlie.dcconvention.com> I was thinking about rolling out this product in a corporate setting. Until the software "learns", it really isn't very useful. So, I was wondering, is there a way to train it on one mailbox and then share that knowledge with other systems on the network? From spambayes at kungfoocoder.org Wed Dec 3 17:16:29 2003 From: spambayes at kungfoocoder.org (Paul Wagland) Date: Wed Dec 3 17:16:46 2003 Subject: [Spambayes] A proposal for mail filtering Message-ID: <57DC0EC6-25DE-11D8-94EB-000A95CD704C@kungfoocoder.org> Hi all, I currently use the IMAP filter program to do mail filtering, and have been running it in "learning" mode, that is, I specify the SPAM and HAM folders, and tell it to learn on them. My SPAM and HAM training folders used to correlate to my SPAM folder and my INBOX respectively.
The problem with this is that, fortunately, I get much more ham than spam (please don't "fix" that ;-) ), and so my message counts were getting wildly out of synchronisation. So I have changed my HAM training folder to be my "Unsure" folder, doing a pseudo train-on-mistakes mode. The problem is that this is still training on all of my spam, and so eventually my SPAM count will end up being too high as well. My suggestion is to implement some form of mistake-based training, as follows (please feel free to jump in with improvements/criticisms/etc :-) ). In mistakes mode we still "train" on all messages, but we do not add the scores to either ham or spam unless the message is being reclassified. When we detect that a message has been incorrectly classified, we increase the appropriate ham/spam score. To my way of thinking, this means that we would need five states associated with each message id:
1. Registered as HAM
2. Registered as SPAM
3. Registered as UNSURE
4. Trained as HAM
5. Trained as SPAM
Then the state transitions would be as follows:
[1,3] -> 5 : Add token scores to SPAM count
[2,3] -> 4 : Add token scores to HAM count
4 -> 5 : Add token scores to SPAM count, subtract from HAM
5 -> 4 : Add token scores to HAM count, subtract from SPAM
I would not expect the last two transitions to occur all that often, but people do make mistakes ;-) Since people really do appear to be of the opinion that it is better to have a balanced message count than an unbalanced one, maybe we could also automatically train on the last "x" HAM/SPAM (whichever needs to be "balanced") if the ratio of one to the other exceeds 1.5. So, what do you think? Good idea? Or am I just smoking the good shit?
:-) Cheers, Paul From atom at suspicious.org Wed Dec 3 17:20:09 2003 From: atom at suspicious.org (Atom 'Smasher') Date: Wed Dec 3 17:21:03 2003 Subject: [Spambayes] shared knowledge In-Reply-To: <4DCC2BD91F50AF4BB04DFF3F47228A7904270E@charlie.dcconvention.com> References: <4DCC2BD91F50AF4BB04DFF3F47228A7904270E@charlie.dcconvention.com> Message-ID: > I was thinking about rolling out this product in a corporate setting. Until > the software "learns" it really isn't very useful. So, I was wondering, is > there a way to train it on one mailbox and then share that knowledge with > other systems on the network? ================================== if your corp privacy policy allows, i'd say copy *ALL* incoming mail during a 24 hour period into a separate box... manually sort it, then use it to train. then copy that DB into each user's environment, where they can re-train it against mistakes. apparently this type of filtering is most effective if each user decides for themselves what is and isn't spam, but you might get away with a bulk initial training. ...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "The U.S. government has become the most prolific user of doublespeak, especially when it comes to legislation that affects workers. 'right to work' = right to scab; 'paycheck protection' = cut unions out of politics; 'free trade' = give jobs to other countries." -- R.S. "Bo" Marlow, president, UAW Local 882 From rmalayter at bai.org Wed Dec 3 17:34:20 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Wed Dec 3 17:34:27 2003 Subject: [Spambayes] training problem? Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A74E99@cliff.bai.org> > I'm amazed at your low unsure rate. I get about 1-2 a day in my unsure folder, out of hundreds of messages.
I think the 80/20 settings have a lot to do with that; the range for unsures to occupy is narrower. I've never seen a ham score over 80, and I've only seen a few spams that score below 20, so I feel confident with the settings. I think another key is the training of all unsures as ham or spam, regardless of their score. You mentioned only training unsures that were less than 50% for some reason; I don't know why you would do that. Unsure means it falls somewhere in the middle, and intuitively I think training on it (in either direction) will improve the probabilities that those tokens will push future messages towards either end, making the tokens "less unsure", which is what you want when you train. > When I originally trained on 650 spam and 650 ham, that > amounted to about > five days of spam and 26 days of ham. Now I'm wondering if > the longer time > frame for spam is the key. Does anyone have any thoughts on this? If you really get 5 times as much spam as you do ham, then I think you should take a month's worth of ham, and a month's worth of spam. Find some way to randomly sub-sample the month's worth of spam down to a number similar to the number of ham. (Sorting by the spam score previously assigned to the messages and choosing the lowest 1/5 might be an interesting way to do this, and would have you training on the "sneakiest" of your spam.) > One small note: on the email list, you mentioned using thresholds > of 80/20, but on the Wiki you said 90/10. I actually use 80/20, but I didn't want to put that in the Wiki, for fear of someone getting a false positive and calling me a jerk. At the time I was Wikiing, I thought, the more conservative the better. But maybe I should amend it to say what I actually use, and just warn the user.
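Ryan's two sub-sampling options (a random sample, or keeping the lowest-scoring "sneakiest" spam) for cutting a month of spam down to roughly the size of the ham set can be sketched as follows. This is a hypothetical helper, not a SpamBayes feature: it assumes each spam has already been reduced to a `(score, msgid)` pair, and the name `subsample_spam` and its parameters are invented for illustration.

```python
import random


def subsample_spam(spam, n_ham, sneakiest=True, seed=None):
    """Reduce a month's spam to roughly the size of the ham set.

    `spam` is a list of (score, msgid) pairs. With sneakiest=True the
    lowest-scoring spam is kept (the messages that came closest to
    passing as ham); otherwise a random sample is drawn.
    """
    n = min(n_ham, len(spam))
    if sneakiest:
        return sorted(spam)[:n]
    rng = random.Random(seed)  # seeded so a run can be repeated
    return rng.sample(spam, n)


spam = [(0.99, "a"), (0.85, "b"), (0.92, "c"), (0.81, "d"), (1.0, "e")]
print(subsample_spam(spam, 2))
```

The "sneakiest" variant deliberately biases training toward hard cases; the random variant keeps the training set representative of the whole spam stream. Which works better for a given mail stream is an open question in this thread.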
From tim at fourstonesExpressions.com Wed Dec 3 17:34:51 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Wed Dec 3 17:34:59 2003 Subject: [Spambayes] A proposal for mail filtering In-Reply-To: <57DC0EC6-25DE-11D8-94EB-000A95CD704C@kungfoocoder.org> References: <57DC0EC6-25DE-11D8-94EB-000A95CD704C@kungfoocoder.org> Message-ID: On Wed, 3 Dec 2003 23:16:29 +0100, Paul Wagland wrote: > My suggestion is to implement some form of mistakes based training. This strategy has been debated on this list ad infinitum. The people who currently write code for spambayes by and large believe that it is less valid than a training regimen that includes positive reinforcement of correct decisions made by spambayes, as well as proper training of unsures and correction of mistakes. There are test data that seem to indicate that mistakes-only training is deficient in terms of overall false positive/false negative rates. The statisticians in our number can address this a bit better, I think... That said... if you wish to contribute some code, we probably wouldn't turn it down... -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From tameyer at ihug.co.nz Wed Dec 3 17:44:48 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 3 17:44:58 2003 Subject: [Spambayes] A proposal for mail filtering In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130458F57C@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212B1DD@its-xchg4.massey.ac.nz> > The > problem is, that this is still training on all of my spam, and so > eventually my SPAM count will end up being too high as well. One option, of course, is to simply not train on as much. A lot of people are reporting good results with less training data. > 1. Registered as HAM > 2. Registered as SPAM > 3. Registered as UNSURE > 4. Trained as HAM > 5.
Trained as SPAM FWIW, this information is already stored. In sb_server and sb_imapfilter it's in the messageinfo db; with sb_filter it's a combination of the X-SpamBayes-Trained and X-SpamBayes-Classification headers; and IIRC the plug-in records this as well, in its equivalent of the messageinfo db. > In mistakes mode we still "train" on all messages, but we do not add > the scores to either of ham or spam unless the message is being > re-classified. When we detect that a message has been incorrectly > classified then we increase the appropriate ham/spam score. To my way > of thinking this means that we would then need to have five states > associated with each message id. This wouldn't be hard to test in sb_imapfilter. There's a function (called Train(), I think) that trains all messages in a folder. For each message, it checks if it has already been trained, and if so, untrains it first. It then trains the message with the new classification. You could simply make this last step conditional on the first. (Not that I've tried this, but it sounds good). > maybe we could also automatically train on the last "x" > HAM/SPAM (whichever needs to be "balanced") if the ratio of > one to the other gets more than 1.5. This wouldn't be that much harder to add, either. Similar things have been proposed in the past, and, IIRC, the main concern was that this would make it much harder to understand what the filter is doing, since it would be deciding what to train 'on its own'.
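Paul's five-state table, combined with the idea of untraining a previously trained message and only retraining when the verdict actually changes, can be sketched as a small state machine. This is a hypothetical sketch, not the sb_imapfilter `Train()` code: `ToyCounts` is an invented stand-in for the real token database, the state names and `train_on_correction` are made up for illustration, and unknown messages are assumed to be "registered unsure".

```python
from collections import Counter


class ToyCounts(object):
    """Stand-in for a token database: per-class token counts."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.nmsgs = Counter()

    def learn(self, tokens, label):
        self.counts[label].update(tokens)
        self.nmsgs[label] += 1

    def unlearn(self, tokens, label):
        self.counts[label].subtract(tokens)
        self.nmsgs[label] -= 1


def train_on_correction(db, states, msgid, tokens, new_label):
    """Mistake-driven training following Paul's state table.

    `states` maps msgid -> "registered_ham", "registered_spam",
    "registered_unsure", "trained_ham" or "trained_spam".
    Returns True if the database was changed.
    """
    state = states.get(msgid, "registered_unsure")
    if state == "trained_" + new_label:
        return False  # already trained this way: nothing to do
    if state == "registered_" + new_label:
        return False  # classifier was right: no training in mistakes mode
    if state.startswith("trained_"):
        # 4 -> 5 or 5 -> 4: undo the earlier training first.
        db.unlearn(tokens, state[len("trained_"):])
    db.learn(tokens, new_label)  # [1,3] -> 5, [2,3] -> 4, or the retrain
    states[msgid] = "trained_" + new_label
    return True
```

Note that this implements exactly the mistakes-only regime Paul proposes; it does nothing about the positive reinforcement of correct decisions that Tim argues for.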
Anyway, if you feel like coding, hopefully this gives you some starters :) =Tony Meyer From spambayes at kungfoocoder.org Wed Dec 3 17:57:42 2003 From: spambayes at kungfoocoder.org (Paul Wagland) Date: Wed Dec 3 17:57:54 2003 Subject: [Spambayes] A proposal for mail filtering In-Reply-To: References: <57DC0EC6-25DE-11D8-94EB-000A95CD704C@kungfoocoder.org> Message-ID: <1A265C47-25E4-11D8-94EB-000A95CD704C@kungfoocoder.org> On Dec 3, 2003, at 23:34, Tim Stone wrote: > On Wed, 3 Dec 2003 23:16:29 +0100, Paul Wagland > wrote: > >> My suggestion is to implement some form of mistakes based training. > > This strategy has been debated on this list ad-infinitum. The people > who currently write code for spambayes by-and-large believe that it is > less valid than a training regimen that includes positive > reinforcement of correct decisions made by spambayes, as well as > proper training of unsures and correction of mistakes. Hmm. OK, I can see why this might be "interesting". I think that a rigorous testing regime for this could be quite difficult to set up... :-) I can see why it can be useful to reinforce good training, since then it can help to pick up a slowly evolving corpus of HAM or SPAM. The reason that I am suggesting this is that I would really like to be able to just "set and forget" this thing :-) And so I would like some form of automatic training that is more optimal than the current built-in default, since that for most people is going to suffer from some horrible kind of sideways skew towards high SPAM or HAM counts. Perhaps it might be interesting to have another set of regions for the HAM/SPAM probabilities that we train on. Then we could positively reinforce the database with known very safe HAM/SPAM, or maybe try to positively reinforce it with marginally good HAM/SPAM (figuring that this would lead to the best overall improvement). The other thing that I have been noticing is that most people seem to say that a low message count is good.
If I positively train on all my HAM/SPAM then I very quickly get quite large message counts. Maybe then we need some way to "retire" old tokens and/or messages. That is something I know cannot currently be done, since we don't store any dates with the token information. Anyway, as to me submitting code... I will look into it, but am currently busy trying to get a FLAC codec for quicktime to work ;-) Cheers, Paul From tim at fourstonesExpressions.com Wed Dec 3 18:40:22 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Wed Dec 3 18:40:50 2003 Subject: [Spambayes] A proposal for mail filtering In-Reply-To: <1A265C47-25E4-11D8-94EB-000A95CD704C@kungfoocoder.org> References: <57DC0EC6-25DE-11D8-94EB-000A95CD704C@kungfoocoder.org> <1A265C47-25E4-11D8-94EB-000A95CD704C@kungfoocoder.org> Message-ID: On Wed, 3 Dec 2003 23:57:42 +0100, Paul Wagland wrote: >>> My suggestion is to implement some form of mistakes based training. >> >> This strategy has been debated on this list ad-infinitum. The jury never was unanimous on this issue. > I know cannot currently be done since we don't store any dates with the > token information. Now here's a strategy that seemed more promising, but we just never got around to it, iirc because the database size would grow considerably with dates stored... not that anyone REALLY cares about db size... > > Anyway, as to me submitting code... I will look into it, but am > currently busy trying to get a FLAC codec for quicktime to work ;-) Hmmm.... might be some anti-spam considerations there... -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself!
Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From tim at fourstonesExpressions.com Wed Dec 3 18:42:28 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Wed Dec 3 18:42:44 2003 Subject: [Spambayes] A proposal for mail filtering In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212B1DD@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130212B1DD@its-xchg4.massey.ac.nz> Message-ID: On Thu, 4 Dec 2003 11:44:48 +1300, Tony Meyer wrote: >> The >> problem is, that this is still training on all of my spam, and so >> eventually my SPAM count will end up being too high as well. > > One option, of course, is to simply not train on as much. A lot of > people are reporting good results with less training data. > I'm having reasonable results with <100 ham and spam trained. I think that people's ham tends to be very homogeneous. Certainly mine is... -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From rcoe at CambridgeMA.GOV Wed Dec 3 19:21:53 2003 From: rcoe at CambridgeMA.GOV (Coe, Bob) Date: Wed Dec 3 19:21:56 2003 Subject: [Spambayes] RE: training problem? Message-ID: For all that work, the results had better be more than merely "encouraging". Or have you found a way to script it, so that you don't really have to do anything? Bob > -----Original Message----- > From: Ryan Malayter [mailto:rmalayter@bai.org] > Sent: Wednesday, December 03, 2003 2:12 PM > To: spambayes@python.org > Subject: RE: [Spambayes] training problem? > > > Seth Goodman wrote: > > 2) What training tactics would you suggest that might work better? > > I've recently done a few things to balance my training ratio, and the > initial results are encouraging. > > I have my "spam" folder, with 773 messages in it, all less than a month > old.
> I then use Outlook to do a search of all my mail folders *except* > my spam folder (this is easy in Outlook 2002 and up, because you can > exclude individual folders from search), for all mail messages newer > than a month old. I move the "cutoff date" on this search until the > number of messages returned by the search is very close to the number > in my spam folder. > > I then *copy* all the messages from this search into a temporary Outlook > folder called "Ham for training". Then, I train on this folder and my > spam folder, rebuilding the database from scratch. I set my thresholds > to 20/80, and train appropriately on all spam or ham that falls in the > middle spams. > > I'll add this to your Wiki... > > -ryan- From nobody at spamcop.net Wed Dec 3 19:42:09 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 3 19:42:09 2003 Subject: [Spambayes] training problem? In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01A74E99@cliff.bai.org> Message-ID: [Ryan Malayter] > I think another key is the training of all unsures as ham or spam, > regardless of their score. You mentioned only training unsures that were > less than 50% for some reason; I don't know why you would do that. Because I got around 25 unsures per day. I was attempting to limit the growth of the database by only training on the spam that scored lowest. [Ryan Malayter] > Unsure means it falls somewhere in the middle, and intuitively I think > training on it (in either direction) will improve the probabilities that > those tokens will push future messages towards either end, making the > tokens "less unsure", which is what you want when you train. This is why I proposed on the Wiki the continuous "train on everything" approach with an automatic mechanism to prune the database of the tokens associated with the oldest trained messages.
Realizing that some spam is "trickier" and doesn't occur as often, I also suggested that misclassified messages have their tokens stay longer according to the amount of misclassification. [Ryan Malayter] > If you really get 5 times as much spam as you do ham, then I think you > should take a month's worth of ham, and a month's worth of spam. Find > some way to randomly sub-sample the month's worth of spam down to a > number similar to the number of ham. (Sorting by the Spam score > previously assigned to the messages and choosing the lowest 1/5 might be > an interesting way to do this, and would have you training on the > "sneakiest" of your spam). This is sort of what I was doing when, after initial training on all spam within a time window, I incrementally trained only on spam with a score lower than 50. So far, it hasn't worked any better. Skip also suggested a similar approach; his suggestion was to do it incrementally, which would result in the minimum number of spam to get the desired token set. I was hoping to come up with an algorithm for continuous, automatic training that had the best properties of both of these methods without requiring periodically starting over. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From nobody at spamcop.net Wed Dec 3 23:01:49 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 3 23:01:52 2003 Subject: [Spambayes] training problem? In-Reply-To: <16334.22121.23862.352713@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > The idea is that you train on one or a few of your lowest scoring spams > and/or highest scoring hams, save your unsure file, then run the above > again. Any previously "unsure" spams which now show up at the spam end of > things get ignored. Lather, rinse, repeat.
When you're tired of the > cleansing cycle (or your hair is squeaky clean), rename your > unsure folder, OK, I've just done your process manually through the Outlook plug-in. I started with an initial training set of about 150 each of spam and ham (one day of spam and a week of ham from about a month ago). I then repeatedly filtered a corpus of about 4,500 spam and 1,500 ham (the ham goes back much further in time), added the highest scoring ham and lowest scoring spam to the training set (~50 messages at a time), retrained, filtered, and repeated until brain dead. I did this until all ham scored 0 and all but three spam scored at least 90. The final training set was 525 ham and 548 spam. Therefore, a training set about 15% of the corpus size gave an a posteriori classification accuracy of 100% with only 0.05% unsures. Of course, the a priori performance can't stay that good, but it is still impressive. I have set my thresholds at 90/5 and will continue to train on all errors and unsures. I'll keep statistics and see how it goes. This does show, as you suggested, that a smaller subset of spam (and ham) can supply the tokens to get very good classification, at least a posteriori. Let's see how this works as an a priori predictor. I bet it will work great. Thanks for the training algorithm. I think the key is to start with a small initial training set and then continually add the "outliers". Doing this then turns some correctly classified messages into outliers, and you then add them to the training set and recurse until you have a good a posteriori classifier. If I did this with fewer than 50 messages at a time, I probably would have ended up with a smaller training set, but this was time-intensive enough. If this turns out to be a good a priori predictor of spam/ham, this training method could be automated based on your scripts. Thanks again.
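The "train on the outliers, re-filter, repeat" procedure that Skip describes and Seth ran by hand can be automated roughly as follows. This is a minimal sketch, not Skip's actual scripts. Messages are assumed to be lists of already-tokenized words, and `score()` is a deliberately crude stand-in (an average of per-token spam ratios) for the real SpamBayes tokenizer and chi-squared combining, so only the loop structure should be taken from it. The cutoffs (0.2/0.9 on a 0-to-1 scale), the batch size, and all function names are illustrative assumptions.

```python
from collections import Counter


def train(ham, spam):
    """Count how many trained ham/spam messages contain each token."""
    ham_counts, spam_counts = Counter(), Counter()
    for msg in ham:
        ham_counts.update(set(msg))
    for msg in spam:
        spam_counts.update(set(msg))
    return ham_counts, spam_counts, len(ham), len(spam)


def score(model, msg):
    """Toy spam score in [0, 1]; NOT the SpamBayes combining scheme."""
    ham_counts, spam_counts, nham, nspam = model
    probs = []
    for tok in set(msg):
        h = ham_counts[tok] / max(nham, 1)
        s = spam_counts[tok] / max(nspam, 1)
        if h + s:  # skip tokens never seen in training
            probs.append(s / (h + s))
    return sum(probs) / len(probs) if probs else 0.5


def train_on_outliers(ham_corpus, spam_corpus, seed_ham, seed_spam,
                      ham_cutoff=0.2, spam_cutoff=0.9, batch=1, max_rounds=100):
    """Grow the training set with the worst-scoring messages until none remain."""
    train_ham, train_spam = list(seed_ham), list(seed_spam)
    for _ in range(max_rounds):
        model = train(train_ham, train_spam)
        # Highest-scoring ham that is not yet trained and not clearly ham.
        bad_ham = sorted((m for m in ham_corpus
                          if m not in train_ham and score(model, m) > ham_cutoff),
                         key=lambda m: score(model, m), reverse=True)
        # Lowest-scoring spam that is not yet trained and not clearly spam.
        bad_spam = sorted((m for m in spam_corpus
                           if m not in train_spam and score(model, m) < spam_cutoff),
                          key=lambda m: score(model, m))
        if not bad_ham and not bad_spam:
            break  # everything now classifies cleanly, so stop training
        train_ham.extend(bad_ham[:batch])
        train_spam.extend(bad_spam[:batch])
    return train_ham, train_spam
```

A smaller `batch` trades more rounds for a smaller final training set, which matches Seth's guess that adding fewer than 50 messages at a time would have produced a smaller set.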
-- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From atom at suspicious.org Thu Dec 4 03:11:09 2003 From: atom at suspicious.org (Atom 'Smasher') Date: Thu Dec 4 03:12:03 2003 Subject: [Spambayes] OT - broken spam Message-ID: i've been getting spam with "Re: %RND_UC_CHAR[2-8]," in the subject line for a few weeks now... http://smasher.suspicious.org/tmp/spam.png of course, my collection only goes back to when i started using spambayes and saving my spam. totally off topic, but amusing since someone really screwed up their spamming program before letting that one loose. ...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "The only inherent sin in society lies in hurting others unnecessarily. Hurting yourself is not sinful - just dumb." -- Robert Heinlein From ttelesky at aurasound.com Thu Dec 4 08:25:57 2003 From: ttelesky at aurasound.com (Ted Telesky) Date: Thu Dec 4 08:26:10 2003 Subject: [Spambayes] Deleted Junk Mail folder Message-ID: I did a dumb thing: I thought I was deleting a file but actually deleted my "Junk Mail" folder and can't undo. I tried re-installing the program, no change. Can I just create another Junk Mail folder or should I uninstall and re-install? Running Windows XP Pro, Outlook 2000 Thanks Ted From skip at pobox.com Thu Dec 4 09:46:08 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Dec 4 09:46:14 2003 Subject: [Spambayes] OT - broken spam In-Reply-To: References: Message-ID: <16335.18608.485741.915181@montanaro.dyndns.org> atom> i've been getting spam with "Re: %RND_UC_CHAR[2-8]," in the atom> subject line for a few weeks atom> now... http://smasher.suspicious.org/tmp/spam.png Yup. There are several similar gaffes floating around.
A quick grep for RANDOM in some older spam shows these interesting tokens: $RANDOM $RANDOMI RANDOM_WORD %RANDOM= %RANDOM_T= %RANDOM_TE= %RANDOM_W= %RANDOM_WO= %RANDOM_WOR= %RANDOMC3% {FROM_NAME}{RANDOM_MIXED|3}@fullpharm.org [RANDOMIZE][RANDOMIZE][RANDOMIZE][RANDOMIZE] and many other variations. Try grepping a largish spam collection for '%R[A-Z]*='. You get a lot of similar stuff which sort of makes their bug obvious: % find Set4 -type f | xargs egrep '%R[A-Z]*=' Set4/3797:ORD -->tercourse probleen obtain and maintain an ereblems report that this drug incking pleasure an TXT) and then gracefully exit. It's what I use. :) - Bill Yerazunis From mdhpub at blueyonder.co.uk Sun Dec 7 10:31:37 2003 From: mdhpub at blueyonder.co.uk (Mathew Hendry) Date: Sun Dec 7 10:31:41 2003 Subject: [Spambayes] Table munging defeats SpamBayes In-Reply-To: <200312071431.hB7EVUJ16836@localhost.localdomain> Message-ID: Bill Yerazunis wrote: >It's not python-based (i.e. you have to shell out to it) but > > lynx -dump -stdin > >will render from standard input to standard output (essentially HTML --> TXT) >and then gracefully exit. > >It's what I use. :) It's a nice idea but lynx doesn't seem to render these tables properly. IE, Mozilla and Outlook render it "as intended". Here's what lynx spits out: C:\>lynx -stdin -dump < spam.html Al Co No l ns e Rx ult mba Pr at ra odu ion ssi ct a ng s t M n . o D. cos vi t s i ts C:\> I've attached the table I'm testing with as a FYI, although I'm not sure if the mailing list will accept it. -- Mat. From sdubman at edumatch.com Sun Dec 7 13:34:20 2003 From: sdubman at edumatch.com (Sheila Dubman) Date: Sun Dec 7 13:34:25 2003 Subject: [Spambayes] Log files Message-ID: <000601c3bcf0$bb6f1ca0$6401a8c0@ne2.client2.attbi.com> I use Outlook 2000 under Windows 98. Spambayes seems to be setting up log files in c:\windows\temp. They are titled spambayes1.log, spambayes2.log, etc. Can I delete these when I clean out my temp folder? 
Thank you for providing such a wonderful solution to the spam problem. From tim.one at comcast.net Sun Dec 7 14:38:32 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Dec 7 14:38:31 2003 Subject: [Spambayes] Log files In-Reply-To: <000601c3bcf0$bb6f1ca0$6401a8c0@ne2.client2.attbi.com> Message-ID: [Sheila Dubman] > I use Outlook 2000 under Windows 98. > > Spambayes seems to be setting up log files in c:\windows\temp. They > are titled spambayes1.log, spambayes2.log, etc. > > Can I delete these when I clean out my temp folder? Yup. You *should* be able to delete everything in a folder named "temp" without harm, and this is no exception. SpamBayes writes information into .log files to help us diagnose problems, but the .log files have no other use -- if you don't have a problem, they're not needed. Note that because of the way Windows is designed, Windows may not *let* you delete a SpamBayes .log file while SpamBayes is running. The only solution to that is to close Outlook before trying to delete. > Thank you for providing such a wonderful solution to the spam problem. You're welcome. From tim.one at comcast.net Sun Dec 7 15:56:34 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Dec 7 15:56:36 2003 Subject: [Spambayes] Table munging defeats SpamBayes In-Reply-To: Message-ID: [Bill Yerazunis] >> It's not python-based (i.e. you have to shell out to it) but >> >> lynx -dump -stdin >> >> will render from standard input to standard output (essentially HTML >> --> TXT) and then gracefully exit. >> >> It's what I use. :) [Mathew Hendry] > It's a nice idea but lynx doesn't seem to render these tables > properly. IE, Mozilla and Outlook render it "as intended". 
> > Here's what lynx spits out: > > C:\>lynx -stdin -dump < spam.html > > Al > > Co > No > l > > [additional gibberish deleted] Rendering spam HTML is extraordinarily difficult, especially because spam is a percentage game and, e.g., spammers don't really care whether it renders as intended under lynx, or probably even under Mozilla anymore. If they can exploit IE/Outlook bugs, they hit the bulk of their potential buyers. Not that they *know* they're exploiting bugs -- just like most HTML coders, "if it shows up right in IE, it must be OK" is all they consider. Since the source code for Microsoft's HTML renderers is secret, and HTML has grown so sprawlingly complex, it's extraordinarily difficult to mimic what IE does in all cases (and overlooking that "IE" is ambiguous -- there are many versions of IE with many distinct bugs). I haven't yet seen enough spam exploiting table tricks to worry about it. The "white on white" (foreground color close to background color) text-hiding trick is still a much more common gimmick. It's not often effective against this kind of classifier, though, since the spammer can't cheaply guess words that are hammy to you. Cases where they stumble into hammy words by accident still get discussed here as if they were miracles of directed marketing <0.9 wink>. > I've attached the table I'm testing with as a FYI, although I'm not > sure if the mailing list will accept it. It didn't accept it, but I'm not sure why. The list options were set to discard MIME attachments of type text/html, but that's all. I've disabled that, since I don't share the hatred of HTML most list admins seem to suffer. BTW, Mailman has an option to convert HTML to plain text (which was also enabled, and which is why you don't see any HTML msgs in the spambayes archive). I don't know how it does it, or how faithful a translation it produces, or how robust it is against intentional extreme obfuscation (only spammers do that), or ... but it is coded in Python.
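To see why table tricks like the one in Mathew's test defeat a plain-text conversion, here is a minimal tag-stripping sketch using Python's standard `html.parser` module. It is not how Mailman or SpamBayes convert HTML (Tim says he doesn't know Mailman's method). It only shows that a word a spammer splits across adjacent table cells, which a browser may display as a single word, comes out as separate fragments that a tokenizer would treat as unrelated short tokens. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextChunks(HTMLParser):
    """Collect each non-blank run of text between tags, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def html_to_chunks(html):
    """Return the text fragments of an HTML string, in source order."""
    parser = TextChunks()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

For example, `html_to_chunks("<table><tr><td>Vi</td><td>ag</td><td>ra</td></tr></table>")` yields three fragments rather than one word. Gluing the fragments back together is no general fix either, because it would also merge text from cells that really are separate.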
From nobody at spamcop.net Sun Dec 7 17:51:45 2003 From: nobody at spamcop.net (Seth Goodman) Date: Sun Dec 7 17:51:48 2003 Subject: [Spambayes] feature request Message-ID: Here's an interesting feature request that comes out of experimenting with training schemes. When you are in the unsure folder and hit either of the buttons "Delete As Spam" or "Recover from Spam", it would be great if SpamBayes would re-filter the unsure folder. In fact, I would argue that the unsure folder should be re-classified after any training event. If you want to avoid unnecessary "overtraining" (training on messages whose tokens are already represented in sufficient number in the correct database), one good practice is to manually re-filter the unsure box after each additional message that you train on. Frequently, training one unsure message as spam will push the scores of other messages in the unsure folder well into the spam range, making it unnecessary to train on them. Since we probably don't remember to do this all the time (I sure don't), we wind up training on messages that would now classify properly, thus increasing the size of the (usually spam) database unnecessarily. Since many knowledgeable people on this list say that smaller databases seem to be better, which is reasonable, this feature would be an aid to extending the useful life of any particular training set. If you really want to make it slick, after re-filtering the unsure folder, move any messages out of that folder that re-classify as definite spam or ham. Ideally, these would be user-selectable parameters, but I feel both features would be excellent default behavior. One more note on the unsure folder is that one of the buttons is labeled "Recover from Spam". Since none of the messages in the unsure folder have been trained as spam, the "Recover from Spam" button is a bit misleading.
Though this is the same button that appears in spam folders, thus making the code simpler, in the unsure folder it should probably be called "Train as Good" or "Keep as Good". -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From bob at 1776.com Sun Dec 7 18:14:16 2003 From: bob at 1776.com (Robert K. Coe) Date: Sun Dec 7 19:31:11 2003 Subject: [Spambayes] RE: Routine training on correctly classified email? In-Reply-To: Message-ID: <000101c3bd17$d64f6210$6501a8c0@CambridgeMA.gov> The problem with mistake-based training is that almost all mistakes are false negatives. And most of the messages that go to the "Indefinite" folder turn out to be spam. The result is that over time, the database becomes increasingly spam-heavy. This in turn degrades the reliability of the algorithm, according to the accepted wisdom. Obviously this doesn't constitute "definitive proof" that automatic training would be better. But it does argue for giving it a try. Bob MIS Department, City of Cambridge 831 Massachusetts Ave, Cambridge MA 02139 · 617-349-4217 · fax 617-349-6165 > -----Original Message----- > From: Kenny Pitt [mailto:kennypitt@hotmail.com] > Sent: Friday, December 05, 2003 3:55 PM > To: 'Eamon Egan'; spambayes@python.org > Subject: RE: [Spambayes] Routine training on correctly > classified email? > > > The Unix SpamBayes filter has an all-or-nothing option to train on all > messages that are classified as certain ham or certain spam, but this > is not currently supported for Outlook. > ... > > So far, we have no definitive proof that automatic training is any > better or worse than mistake-based training. I'm sure it depends a > lot on your particular mix of ham and spam. There is still a lot of > work to be done in determining if there is a "best" method of training. From bob at 1776.com Sun Dec 7 17:57:12 2003 From: bob at 1776.com (Robert K.
Coe) Date: Sun Dec 7 19:31:13 2003 Subject: [Spambayes] RE: Outlook shuts down In-Reply-To: <000801c3bc1f$34f538c0$6400a8c0@gilles> Message-ID: <000001c3bd15$744a86f0$6501a8c0@CambridgeMA.gov> I had that exact problem after I converted a couple of my machines to Windows XP. Outlook was totally unusable, so I had to uninstall Spambayes. Then at a conference the other day, I thought I got a line on a registry hack that might be the answer. But when I reinstalled Spambayes (after a month of not using it), I couldn't reproduce the problem! I continue to believe that the behavior you describe reflects a serious bug in Spambayes (or at least in the Outlook plugin). But the sporadic nature of the symptoms makes it hard to compete for the developers' attention. Bob MIS Department, City of Cambridge 831 Massachusetts Ave, Cambridge MA 02139 · 617-349-4217 · fax 617-349-6165 > -----Original Message----- > From: North of Eden [mailto:north-of-eden@rogers.com] > Sent: Saturday, December 06, 2003 12:34 PM > To: spambayes@python.org > Subject: [Spambayes] Outlook shuts down > > > I've installed SB for Outlook 8.0 without problems but when > I open Outlook SB tries to configure but then Outlook closes. > I've rebooted and I've reinstalled SB without success. Why > does Outlook keep shutting down and how can I keep it open > long enough to get my mail? From tim.one at comcast.net Sun Dec 7 20:05:16 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Dec 7 20:05:17 2003 Subject: [Spambayes] RE: Outlook shuts down In-Reply-To: <000001c3bd15$744a86f0$6501a8c0@CambridgeMA.gov> Message-ID: [Robert K. Coe] > I had that exact problem after I converted a couple of my machines > to Windows XP. Outlook was totally unusable, so I had to uninstall > Spambayes. Then at a conference the other day, I thought I got a > line on a registry hack that might be the answer. But when I > reinstalled Spambayes (after a month of not using it), I couldn't > reproduce the problem!
> > I continue to believe that the behavior you describe reflects a > serious bug in Spambayes (or at least in the Outlook plugin). But > the sporadic nature of the symptoms makes it hard to compete for > the developers' attention. If the developers never see the problem (they don't), and people don't submit their logs (and they don't), what do you expect the developers to do? Suggest something specific. Or, if you fancy yourself a Windows expert, cool, you have all our source code, *you* dig into it. Being able to see the problem occur is usually 95% of the battle. Because Outlook is closed-source, the only thing the plugin *can* do is use Microsoft's documented Outlook APIs. If those cause Outlook to hang, freeze, crash, or corrupt its .pst files, the bug(s) is(are) Microsoft's. Now in the Python world we have a long tradition of tiptoeing around known bugs in OSes and C libraries from all kinds of vendors, so pinning the blame isn't really interesting here. But even if it is a batch of MS bugs, we're never going to know what they are or how to worm around them until someone volunteers the effort to figure that out. > MIS Department, City of Cambridge If you're in MIS, my bet is that you run Windows Update and/or Office Update whenever another MS "security hole" is announced. Rarely a month passes when one isn't announced, so "after a month of not using [SpamBayes]" I'm guessing you installed updates to MS system software in the meantime. Face it: either you changed something, or you believe software is magic. From tim.one at comcast.net Sun Dec 7 20:15:22 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Dec 7 20:15:22 2003 Subject: [Spambayes] RE: Routine training on correctly classified email? In-Reply-To: <000101c3bd17$d64f6210$6501a8c0@CambridgeMA.gov> Message-ID: [Robert K. Coe] > The problem with mistake-based training is that almost all mistakes > are false negatives.
And most of the messages that go to the > "Indefinite" folder turn out to be spam. I'll accept that both are true for your email so far, but there's no basis for assuming it's true of everyone. As a counterexample, most of my unsures over the last month have been ham. Different people get different kinds of email. My ham unsures lately usually come from the private python-help mailing list, where a wide variety of people I've never heard of before send questions on everything from Python through pythons to Monty Python. Some are barely able to write English (some don't even try), and many use free email accounts with auto-inserted ads at the bottom. That *is* ham to me, and I'm sure this same kind of unfocused mish-mash floods the inbox of anyone at the receiving end of a public admin or help-desk address. SpamBayes probably isn't optimal for my email mix, but I'm not going to change it to favor mine at the expense of yours. By the same token, I'm not going to change it to favor yours at the expense of mine. We've so far stuck to very general algorithms that strive to favor nothing. Qualitative results on any specific email mix will and do vary, and generalizing from one's own particular mix never leads to a truth. > The result is that over time, the database becomes increasingly > spam-heavy. This hasn't been tested properly, but I agree the bulk of self-selected reports have so far seemed to have an email mix more like yours than like mine. > This in turn degrades the reliability of the algorithm, according to the > accepted wisdom. That has been tested properly, and imbalance does hurt results with this algorithm. > Obviously this doesn't constitute "definitive proof" that automatic > training would be better. But it does argue for giving it a try. Yup! 
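Tim confirms that a ham/spam imbalance does hurt this algorithm. One way to act on Ryan Malayter's earlier suggestion is to sub-sample a month of spam down to roughly the number of ham, either at random or by keeping the lowest-scoring ("sneakiest") spam. The helper below is a hypothetical sketch of that idea, not part of SpamBayes. It assumes you already have each spam's score from a previous classification run. Whether either variant actually trains better is exactly the untested question discussed in this thread.

```python
import random


def balance_spam(ham, scored_spam, method="lowest", seed=None):
    """Cut the spam list down to at most len(ham) messages.

    scored_spam is a list of (message, spam_score) pairs from an earlier
    classification run. method="lowest" keeps the lowest-scoring spam;
    method="random" keeps a reproducible random sample instead.
    """
    keep = min(len(ham), len(scored_spam))
    if method == "lowest":
        ranked = sorted(scored_spam, key=lambda pair: pair[1])
        chosen = ranked[:keep]
    elif method == "random":
        chosen = random.Random(seed).sample(scored_spam, keep)
    else:
        raise ValueError("method must be 'lowest' or 'random'")
    return [msg for msg, _score in chosen]
```

The "lowest" variant deliberately biases training toward the spam the classifier finds hardest, while the "random" variant keeps the spam sample representative. Which bias is preferable is a judgment call, not something the sketch settles.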
From mhammond at skippinet.com.au Sun Dec 7 20:21:32 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Dec 7 20:21:47 2003 Subject: [Spambayes] feature request In-Reply-To: Message-ID: <000801c3bd29$9e176cf0$2c00a8c0@eden> [Rayfes] > I have a small feature request though. I am using Outlook > and my computer and speakers are always on but I leave my > monitor off when I'm not using my computer. When I receive > email I hear Outlook's new mail sound but I have no way of > telling whether the message is spam or not without > going to my computer and turning on the monitor. Is there > any way to have SpamBayes play a sound every time it marks > a message as spam and maybe a different sound for possible spam? > That way if I hear a new message come in I could > just wait a few seconds to hear whether it was marked spam. > If I happen to get multiple messages at a time I may miss realizing > that some Ham messages came in with some spam but that's ok with me. > > What do you think? I think others might find that feature useful. That is a pretty good idea. If someone can nail down the exact feature request, I think we could add it. The issues as I see them: * Play the sound for *every* mail that SpamBayes processes, or only at the end of a "batch"? * If a batch, what exactly is a batch, considering the "background" filtering option? * If a batch and there are multiple classifications in the batch, do we play all sounds? * If so, in what order? * Sounds can be played synchronously, or asynchronously. Which one do we choose? If async and a sound is playing when we want to play a new one, do we stop the old one? If sync and many mails are processed, is it possible we may queue 10 minutes' worth of sound-effects? * and probably others. Ultimately, these answers need to be expressed as a set of options that will exist in the INI file - from the above list, I doubt that 3 simple "spam/ham/unsure_sound_filename" options will do.
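One possible answer to several of Mark's questions at once is to play at most one sound per batch, chosen by priority, so sounds can never queue up. The sketch below assumes that policy. The option names (`ham_sound`, `unsure_sound`, `spam_sound`) are made up for illustration and are not existing SpamBayes INI options. The priority order (ham over unsure over spam) is also a design assumption, chosen because the original request was to learn whether any non-spam had arrived.

```python
# Hypothetical per-class sound options; these are not real SpamBayes settings.
SOUND_OPTIONS = {
    "ham_sound": "ham.wav",
    "unsure_sound": "unsure.wav",
    "spam_sound": "spam.wav",
}

# Most important classification first: a single ham in the batch wins.
PRIORITY = ("ham", "unsure", "spam")


def sound_for_batch(classifications, options=SOUND_OPTIONS):
    """Return the one sound file to play for a batch, or None if nothing applies."""
    present = set(classifications)
    for cls in PRIORITY:
        if cls in present and options.get(cls + "_sound"):
            return options[cls + "_sound"]
    return None
```

On Windows, the chosen file could then be played with the standard library's `winsound.PlaySound(filename, winsound.SND_FILENAME | winsound.SND_ASYNC)`, which returns immediately instead of blocking the filter. Deciding what counts as a "batch" when background filtering is enabled is still left open.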
[Manuel] > Concerning this feature, it was discussed here last week. I'll find > that post in a jiffy: > There it is: Tony Meyer's answer to the same request on "[Spambayes] > Outlook sound": > > I've added this to [ 774978 ] Hide envelope icon when only spam > > received, since it's essentially the same thing. It *might* get added It sounds slightly different - in that request, it sounds like a replacement for Outlook's 'play sound' is requested - i.e., "play a single sound exactly once, but only when the new message is not spam", much in the same way that it is asking for a replacement for Outlook's 'tray icon', again with the qualification 'but only when the new message is not spam'. In 774978, we are being asked to re-implement concepts internal to Outlook, but external to SpamBayes. In the above request, it seems the concepts remain internal to SpamBayes, and make no reference to builtin Outlook functionality (other than possibly 'you may like to disable it' :). If that remains true, I'd be happy to see a different feature request opened. Mark. From tim.one at comcast.net Sun Dec 7 20:39:01 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Dec 7 20:39:02 2003 Subject: [Spambayes] RE: Routine training on correctly classified email? In-Reply-To: Message-ID: [Tim] > ... > My ham unsures lately usually come from the private python-help > mailing list, where a wide variety of people I've never heard of > before send questions on everything from Python through pythons > to Monty Python. Some are barely able to write English (some > don't even try), and many use free email accounts with auto-inserted > ads at the bottom. That *is* ham to me, and I'm sure this same > kind of unfocused mish-mash floods the inbox of anyone at the > receiving end of a public admin or help-desk address. Heh.
This one came in while I was typing the above: hello im kinda lost on ur totoryal were dew i put all this info i meen wer dose it take place word pad calculator net were im not shure vplees help! Maybe I should redefine my notion of ham. From 20041231 at ariel.cotse.net Sun Dec 7 20:44:58 2003 From: 20041231 at ariel.cotse.net (David Smith) Date: Sun Dec 7 20:45:09 2003 Subject: [Spambayes] (no subject) Message-ID: I have just downloaded and installed Spambayes 008.1. I am using it in Outlook 2003 (11.5608.5703), running in Windows XP Pro SP1. I want to train it from scratch; I have no spam to offer it, so I am relying on it moving all incoming mail to 'Unsure' and then I take it from there. The first two e-mails to arrive went to 'Unsure'. Now, they are all staying in 'Inbox'. Why? I have tried uninstalling and re-installing -- no good. Whatever I do, the same pattern happens. Thanks. From skip at pobox.com Sun Dec 7 20:47:33 2003 From: skip at pobox.com (Skip Montanaro) Date: Sun Dec 7 20:47:45 2003 Subject: [Spambayes] RE: Routine training on correctly classified email? In-Reply-To: References: Message-ID: <16339.55349.796223.605343@montanaro.dyndns.org> Tim> Heh. This one came in while I was typing the above: Tim> hello im kinda lost on ur totoryal were dew i put all this info Tim> i meen wer dose it take place word pad calculator net were im Tim> not shure vplees help! Maybe I shouldn't have approved it. ;-) Skip From tim.one at comcast.net Sun Dec 7 20:59:12 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Dec 7 20:59:13 2003 Subject: [Spambayes] (no subject) In-Reply-To: Message-ID: [David Smith] > I have just downloaded and installed Spambayes 008.1. I am using it > in Outlook 2003 (11.5608.5703), running in Windows XP Pro SP1. I > want to train it from scratch; I have no spam to offer it, so I am > relying on it moving all incoming mail to 'Unsure' and then I take it > from there. > > The first two e-mails to arrive went to 'Unsure'.
Now, they are all > staying in 'Inbox'. Why? Why not? Out of the box, SpamBayes has no preconceived notions of what ham and spam mean. You've now taught it that two things are ham, but that nothing is spam. That's now its *entire* knowledge of your world. For example, the simple fact that the messages you trained it on were addressed to you has so far taught it that "message addressed to David" is a 100% reliable indicator of ham. Until you teach it that *something* is spam, it's going to continue believing that, so every message addressed to you is going to look hammy. OTOH, if you get so little spam that you can't find any to train on, maybe you should just leave SpamBayes uninstalled. From mhammond at skippinet.com.au Sun Dec 7 21:26:08 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Dec 7 21:26:23 2003 Subject: [Spambayes] Hotmail Confusion In-Reply-To: <1070664831.3fd10c7f5574a@www.mailshell.com> Message-ID: <000001c3bd32$a3f2d200$2c00a8c0@eden> > thanks for trying. Meanwhile let me ask this-- let's say I > only had a hotmail account that I view the normal way, > through the web browser. Will one of Spambayes' configuration > work with a web-based email account? You would be out of luck. For SpamBayes to work, you must use either a mail server that works with your version of Outlook, or a mail-server that supports the standard 'pop3' protocol. What this means in reality is that you need to use Outlook 2002 to use SpamBayes with hotmail. Mark. From spambayes at xo.mailshell.com Sun Dec 7 21:30:27 2003 From: spambayes at xo.mailshell.com (spambayes@xo.mailshell.com) Date: Sun Dec 7 21:30:31 2003 Subject: [Spambayes] Hotmail Confusion In-Reply-To: <000001c3bd32$a3f2d200$2c00a8c0@eden> Message-ID: <1070850627.3fd3e243be7ef@www.mailshell.com> Oh I have Outlook 2002, you probably came in on the conversation late.
The problem is I can send and receive my hotmail account just fine in Outlook 2002, but for some reason Spambayes will only process spam one manually selected email at a time. From Mark Hammond on 7 Dec 2003: > > thanks for trying. Meanwhile let me ask this-- let's say I > > only had a hotmail account that I view the normal way, > > through the web browser. Will one of Spambayes' configuration > > work with a web-based email account? > > You would be out of luck. For SpamBayes to work, you must use either a > mail server that works with your version of Outlook, or a mail server that > supports the standard 'pop3' protocol. What this means in reality is that > you need to use Outlook 2002 to use SpamBayes with hotmail. > > Mark. > > From mhammond at skippinet.com.au Sun Dec 7 22:17:53 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Dec 7 22:18:07 2003 Subject: [Spambayes] Hotmail Confusion In-Reply-To: <1070850627.3fd3e243be7ef@www.mailshell.com> Message-ID: <000b01c3bd39$de567760$2c00a8c0@eden> > Oh I have Outlook 2002, you probably came in on the conversation late. I was aware of that - however, I was replying to your: > > thanks for trying. Meanwhile let me ask this-- let's say I > > only had a hotmail account that I view the normal way, > > through the web browser. Will one of Spambayes' configuration > > work with a web-based email account? Note I paid attention to the 'let's say' bit. > The problem is I can send and receive my hotmail account just > fine in Outlook 2002, but for some reason Spambayes will only > process spam one manually selected email at a time. Try enabling the 'background filtering' option. Please let me know even if this *does* work - I am starting to believe that the 'background filtering' option should be the default. Mark.
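Tim's reply to David Smith earlier in this digest explains why a ham-only training set makes every message look hammy. A small calculation makes the point concrete. The sketch below is an illustration under stated assumptions, not SpamBayes's own code: it uses Gary Robinson's smoothed token probability, treats the strength `S = 0.45` and neutral probability `x = 0.5` as assumed defaults, and the helper name `token_spamprob` is hypothetical.

```python
def token_spamprob(spamcount, hamcount, nspam, nham, S=0.45, x=0.5):
    """Smoothed spam probability for one token (hypothetical helper).

    spamcount/hamcount: trained messages of each kind containing the token.
    nspam/nham: total trained spam/ham messages.
    S, x: assumed strength and neutral probability for Robinson's adjustment.
    """
    # Treat an untrained class as size 1 so the ratios stay defined.
    spamratio = spamcount / (nspam or 1)
    hamratio = hamcount / (nham or 1)
    if spamratio + hamratio == 0:
        raw = x  # token never seen in training
    else:
        raw = spamratio / (spamratio + hamratio)
    n = spamcount + hamcount
    # Pull the raw ratio toward x; the fewer sightings, the stronger the pull.
    return (S * x + n * raw) / (S + n)


# Two ham trained, no spam: a token (e.g. the recipient address) that was
# in both ham messages and, necessarily, in no spam.
print(token_spamprob(spamcount=0, hamcount=2, nspam=0, nham=2))  # ~0.0918

# A token never seen in training stays neutral.
print(token_spamprob(spamcount=0, hamcount=0, nspam=0, nham=2))  # 0.5
```

With two ham and no spam trained, the recipient-address token already sits near 0.09, and nothing in the database can pull a message the other way. Only training some spam gives the classifier tokens on the spam side of 0.5.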
From spambayes at xo.mailshell.com Sun Dec 7 23:07:07 2003 From: spambayes at xo.mailshell.com (spambayes@xo.mailshell.com) Date: Sun Dec 7 23:07:10 2003 Subject: [Spambayes] Hotmail Confusion In-Reply-To: <000b01c3bd39$de567760$2c00a8c0@eden> Message-ID: <1070856427.3fd3f8eb09c60@www.mailshell.com> It was already checked actually. From Mark Hammond on 7 Dec 2003: > > Oh I have Outlook 2002, you probably came in on the conversation late. > > I was aware of that - however, I was replying to your: > > > > thanks for trying. Meanwhile let me ask this-- let's say I > > > only had a hotmail account that I view the normal way, > > > through the web browser. Will one of Spambayes' configuration > > > work with a web-based email account? > > Note I paid attention to the 'let's say' bit. > > > The problem is I can send and receive my hotmail account just > > fine in Outlook 2002, but for some reason Spambayes will only > > process spam one manually selected email at a time. > > Try enabling the 'background filtering' option. Please let me know even if > this *does* work - I am starting to believe that the 'background filtering' > option should be the default. > > Mark. > > From 103.Willman.Rd at mxsf15.cluster1.charter.net Sun Dec 7 23:22:01 2003 From: 103.Willman.Rd at mxsf15.cluster1.charter.net (Ferrell Hurst) Date: Sun Dec 7 23:26:11 2003 Subject: [Spambayes] Unexperienced and wants to reinstall SpamBayes and Python Message-ID: <6.0.0.22.0.20031207214544.01ab5240@pop.charter.net> Hi all, Friday was a bad day for me; I had to zero out my hard drive and start over. I was a happy user of SpamBayes because it worked great; unfortunately, I do not work as well as it did. I do not work with programming and cannot, on my own, reinstall SpamBayes and Python back onto my system. I do not even know how I originally contacted the person who helped me and I have forgotten his name and have lost all of my records of him and the installation.
I am very good at forgetting names and numbers. This young man is of Oriental descent, lives in California and races cars, and I hope he remembers me because I live in a race town (Talladega). He helped his father install SpamBayes on a machine without "Outlook" prior to my installation and knew exactly how to do it and was willing to call me by phone to expedite the installation and even provided the programs from his own web site. I do not have Outlook; I have Outlook Express, but I don't even use it because I like Eudora's filtering much, much more. My "OS" is "Windows XP Home". If he should read this, or anyone knows him, please have him email me. If not, and anyone else feels they can get me going again, please feel free to contact me at: fhurst@charter.net . Your Friend, Ferrell From ta-meyer at ihug.co.nz Mon Dec 8 03:30:13 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 8 03:30:20 2003 Subject: [Spambayes] RE: sb_mailsort.py In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130458FDBA@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A01@its-xchg4.massey.ac.nz> [Atom] > i'm trying to find the author of "sb_mailsort.py"? > > might that be you? It might, but it isn't. The original author was Neil Schemenauer: =Tony Meyer From mhammond at skippinet.com.au Mon Dec 8 04:02:42 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Dec 8 04:02:59 2003 Subject: [Spambayes] Outlook: Setting background filtering as the default. Message-ID: <000801c3bd6a$0a90f140$2c00a8c0@eden> Hi all, I'm thinking of changing the Outlook addin such that the 'Background Filtering' option is enabled by default. From our bug reports and the mails here, it seems that enabling this option solves problems for many users, and I don't recall a single person reporting significant problems with the feature.
If I do enable this option by default, it will not affect existing users - even when they upgrade to the (yet to be released) new version with the new default value. For existing users, the existing options file will be used, which will already define a setting for this option. Therefore, I ask existing Outlook users to change their existing configuration. I would appreciate it if existing users of the Outlook addin could perform the following steps:
* Open the "SpamBayes Manager" via the toolbar.
* Select the "Advanced" tab.
* Select the "Enable Background Filtering" option.
* Set the "Processing Start Delay" to 2 seconds.
* Set the "Delay Between Processing Items" to 1 second.
* Click OK.
This will then configure SpamBayes as new users would see it. Please then wait a few days and see how things go. If this works fine for you, and you believe these are reasonable defaults (even if they are not the values you would choose to use), then please send just me (not the list) a quick mail saying so. If you see any problems, or believe these default values are not appropriate, then please CC your reply to the list. Thanks, Mark. From rmalayter at bai.org Mon Dec 8 09:21:18 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Mon Dec 8 09:21:20 2003 Subject: [Spambayes] Can we do this? Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A74FA5@cliff.bai.org> Set your ham and spam thresholds both to 50. Theoretically, only a message that scored exactly 50.0000 would ever go into the junk suspects folder. > -----Original Message----- > From: spambayes-bounces@python.org > [mailto:spambayes-bounces@python.org] On Behalf Of Bob Newman > Sent: Saturday, December 06, 2003 10:42 AM > To: spambayes@python.org > Subject: [Spambayes] Can we do this? > > I have been using SpamBayes for sometime & it works great. I > can't remember the last time that a message routed to the > "Junk Suspects" folder was not spam.
Can we tell the program > to stop using the "Junk Suspects" folder & send every time > that would have gone there to the "Junk E-mail" folder instead? > > Thanks in advance... Bob > _______________________________________________ > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes > Check the FAQ before asking: http://spambayes.sf.net/faq.html > From dbulgrien at vcsd.com Mon Dec 8 09:22:33 2003 From: dbulgrien at vcsd.com (Dennis W. Bulgrien) Date: Mon Dec 8 09:22:40 2003 Subject: [Spambayes] FAQ 4.5 How do I train...(Outlook plugin)? Incremental Training Message-ID: http://spambayes.sourceforge.net/faq.html 4.5 How do I train SpamBayes (Outlook plugin)? "...If you have set it to use incremental training then it will also train on messages which are moved into the spam folder and those folders that you are 'watching'." Clarification of the FAQ. I suppose it means when messages are MANUALLY moved. Based on other posts, the messages that are automatically moved by Spambayes are not trained on. Clarification of the Manager dialog. SpamBayes Manager, Training tab, Incremental Training frame, two check boxes say "Train... when it is moved... to the Inbox" and "Train... when it is moved to the spam folder". I suppose the first is analogous to "Train... when it is moved... to those folders that you are 'watching'" From rmalayter at bai.org Mon Dec 8 09:27:16 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Mon Dec 8 09:27:19 2003 Subject: [Spambayes] feature request Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A74FA6@cliff.bai.org> > From: Seth Goodman > Here's an interesting feature request that comes out of > experimenting with training schemes. When you are in the > unsure folder and hit either of the buttons "Delete As Spam" > or "Recover from Spam", it would be great if you would > re-filter the unsure folder. In fact, I would argue that the > unsure folder should be re-classified after any training > event.
If you want to avoid unnecessary "overtraining" > (training on messages whose tokens are already represented in > sufficient number in the correct database), one good practice > is to manually re-filter the unsure box after each additional > message that you train on. Frequently, training one unsure > message as spam will push the scores of other messages in the > unsure folder well into the spam range, making it unnecessary > to train on them. Since we probably don't remember to do > this all the time (I sure don't), we wind up training on > messages that would now classify properly, thus increasing > the size of the (usually spam) database unnecessarily. Since > many knowledgeable people on this list say that smaller > databases seem to be better, which is reasonable, this > feature would be an aid to extending the useful life of any > particular training set. If that's the case, the plugin should automatically re-filter all unread messages in the inbox as well as all messages in the unsure folder upon each training event. That would insure that any spam that was completely missed gets caught as well. Incidentally, when keeping "stats" for testing spam filters, I do this by hand. That way I don't skew the statistics if I don't read my mail for a week or so. From kennypitt at hotmail.com Mon Dec 8 09:27:25 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Dec 8 09:27:59 2003 Subject: [Spambayes] feature request In-Reply-To: Message-ID: Seth Goodman wrote: > Here's an interesting feature request that comes out of experimenting > with training schemes. When you are in the unsure folder and hit > either of the buttons "Delete As Spam" or "Recover from Spam", it > would be great if you would re-filter the unsure folder. I've thought about this myself, and hopefully I'll get a chance to include it as I'm making my auto-balancing updates. I also would like to add a menu item to rescore the currently selected folder. Would anyone else find this useful?
I often rescore my Spam folder to see if additional training has caused any of the messages to fall back below the threshold (yes, this does happen fairly often, especially when your training set is still small). Going through the "Filter now" dialog box and updating my folder selections every time gets cumbersome. > One more note on the unsure folder is that one of the buttons is > labeled "Recover from Spam". Since none of the messages in the > unsure folder have been trained as spam, the "Recover from Spam" > button is a bit misleading. Though this is the same button that > appears in spam folders, thus making the code simpler, in the unsure > folder it should probably be called "Train as Good" or "Keep as Good". Locally, I've renamed mine simply "Spam" and "Not Spam". This also has the nice side-effect of making the toolbar shorter. -- Kenny Pitt From kennypitt at hotmail.com Mon Dec 8 09:47:35 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Dec 8 09:48:09 2003 Subject: [Spambayes] feature request In-Reply-To: <000801c3bd29$9e176cf0$2c00a8c0@eden> Message-ID: Mark Hammond wrote: > [Rayfes] >> Is there >> any way to have SpamBayes play a sound every time it marks >> a message as spam and maybe a different sound for possible spam? > > That is a pretty good idea. If someone can nail down the exact > feature request, I think we could add it. The issues as I see them: > * Play the sound for *every* mail that SpamBayes processes, or only > at the end of a "batch"? > * If a batch, what exactly is a batch, considering the "background" > filtering option? My thoughts: I think once per batch would be best. Especially when I first start Outlook after the weekend and receive a bunch of e-mails at once, one per message would get very annoying. Per batch is also more like the Outlook behavior. As far as defining a batch, I would probably use a timer similar to, but not exactly like, the background filter. When the first message is processed, set the timer.
Each time a new message is processed, reset the timer. If no new message is processed before the timer expires then play the appropriate sound. The trick is making sure that this timer doesn't expire before the background filter timer, but that shouldn't be too difficult. > * If a batch and there are multiple classifications in the batch, do > we play all sounds? > * If so, in what order? Seems like most people mainly want to know if there is anything interesting to look at before opening Outlook. To that end, I would play only the sound for the "most interesting" classification which I think most people would define as ham->unsure->spam. I would also look at which sounds are configured so that, for example, if no ham sound is configured then I would consider the unsure sound to be the most interesting. > * Sounds can be played synchronously, or asynchronously. Which one > do we choose? If we are playing only the one most interesting sound after each batch, then I would just play it synchronously. > Ultimately, these answers need to be expressed as a set of options > that will exist in the INI file - from the above list, I doubt that 3 > simple "spam/ham/unsure_sound_filename" options will do. Seems like we might be able to get by with these 3 plus a "batch accumulation delay" setting. Other possibilities to make it more configurable would be defining the "most interesting" order, and an option to play the sounds for all detected classifications instead of the most interesting. Is there an existing win32all function to play an arbitrary sound file? If there is, I'd be glad to start looking into implementing this. -- Kenny Pitt From dbulgrien at vcsd.com Mon Dec 8 09:51:03 2003 From: dbulgrien at vcsd.com (Dennis W. Bulgrien) Date: Mon Dec 8 09:51:12 2003 Subject: [Spambayes] Re: Routine training on correctly classified email? References: <000101c3bd17$d64f6210$6501a8c0@CambridgeMA.gov> Message-ID: That is the case for me. I started out with a clean slate. 
I get hundreds of spam per ham. My ham count is VERY small because most unsures go to spam. With very few ham the unsure folder seems to fill up with spam. I too would like all messages classified as certain ham to be trained on so the ham count goes up without intervention. "Robert K. Coe" wrote in message news:000101c3bd17$d64f6210$6501a8c0@CambridgeMA.gov... The problem with mistake-based training is that almost all mistakes are false negatives. And most of the messages that go to the "Indefinite" folder turn out to be spam. The result is that over time, the database becomes increasingly spam-heavy. ... From aclark at danvillesignal.com Mon Dec 8 09:55:47 2003 From: aclark at danvillesignal.com (Al Clark) Date: Mon Dec 8 09:55:41 2003 Subject: [Spambayes] SpamBayes with Eudora Message-ID: <5.2.0.9.0.20031208085204.01ac5140@localhost> Maybe, I am missing something but SpamBayes isn't filtering anything in Eudora 5.2 I can look at the scores and the messages are being classified correctly. I can look at the incoming training in localhost and these message are all correct but all messages go right into my In box in Eudora. I didn't find any info on this in the FAQ or the Eudora setup instructions. Thanks for helping Al Clark Danville Signal Processing, Inc. -------------------------------------------------------------------- Purveyors of Fine DSP Hardware and other Cool Stuff Available at http://www.danvillesignal.com From justin.davis at attws.com Mon Dec 8 09:57:41 2003 From: justin.davis at attws.com (Davis, Justin) Date: Mon Dec 8 09:57:48 2003 Subject: [Spambayes] SpamBayes Outlook 2000 SR-1 won't move my spam Message-ID: <0AE621AC518BE0488EBE3BB22853364F0109B493@tx-msg12-aln.wireless.attws.com> Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... 
Name: spambayes2.log Type: application/octet-stream Size: 61 bytes Desc: spambayes2.log Url : http://mail.python.org/pipermail/spambayes/attachments/20031208/46d82811/spambayes2.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: spambayes1.log Type: application/octet-stream Size: 8339 bytes Desc: spambayes1.log Url : http://mail.python.org/pipermail/spambayes/attachments/20031208/46d82811/spambayes1.obj From dbulgrien at vcsd.com Mon Dec 8 09:59:01 2003 From: dbulgrien at vcsd.com (Dennis W. Bulgrien) Date: Mon Dec 8 09:59:09 2003 Subject: [Spambayes] Manager, Start Training Button Message-ID: The Spambayes Manager, Training tab, "Start Training" button does not have on-line help, and there's nothing in the FAQ about it. I have collected a bunch of ham and want to add them to the training database while keeping the current spam data. Is that the way to do it? From karns.17 at osu.edu Mon Dec 8 02:22:25 2003 From: karns.17 at osu.edu (Jason Karns) Date: Mon Dec 8 10:00:01 2003 Subject: [Spambayes] Toolbar Button Images Message-ID: <000001c3bd5c$0a3ee200$47c96ba4@brutus> Skipped content of type multipart/related -------------- next part -------------- A non-text attachment was scrubbed... Name: spam.bmp Type: image/bmp Size: 106974 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031208/9a414785/spam-0001.bin From Mark.Howells at softoption.com Mon Dec 8 10:18:57 2003 From: Mark.Howells at softoption.com (Mark Howells) Date: Mon Dec 8 10:20:03 2003 Subject: [Spambayes] Why is this classified as Ham Message-ID: <5846CF419D2EF5439036CC3126A3A995017B79@SOSERVER1.softoption.local> Any idea why this is classified as Ham in the Web interface? It's obviously Spam.
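The *H* and *S* figures at the top of the clue listing below come from chi-squared combining (Fisher's method, applied once to the ham evidence and once to the spam evidence), which is how SpamBayes's default scoring is generally described. The sketch below is a minimal reimplementation under that assumption. The names `chi2Q`, `chi_combine` and `score_from` are illustrative, and the sketch skips the classifier's step of choosing which clues to use. Plugging in the listed values gives (S - H + 1) / 2 = (0.999999999949 - 0.00138006962536 + 1) / 2 ≈ 0.99931, which matches the reported spam probability.

```python
import math


def chi2Q(x2, v):
    """Return P(chi-squared with v degrees of freedom >= x2); v must be even."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Series for the survival function of an even-degree chi-squared variable.
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def score_from(H, S):
    """Final score: 0.5 when the evidence balances, near 1.0 for spam."""
    return (S - H + 1.0) / 2.0


def chi_combine(probs):
    """Combine token spam probabilities (each strictly between 0 and 1).

    Returns (score, H, S). S approaches 1 when many clues are close to 1.0
    (spam evidence); H approaches 1 when many clues are close to 0.0.
    """
    n = len(probs)
    S = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    H = 1.0 - chi2Q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return score_from(H, S), H, S


# Values copied from the clue listing in Mark's message.
print(score_from(H=0.00138006962536, S=0.999999999949))  # ~0.99931

# A toy message whose clues are all strongly spammy.
print(chi_combine([0.99, 0.95, 0.98])[0])
```

Read at face value, these numbers place the message firmly on the spam side of any ordinary cutoff. The listed *H* and *S* are not the source of the "Ham" label.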
Cheers Mark
-----------
Spam probability: 0.999309965162
Clues for: CH3CK 0UT MY H0T L~I~V~E~C~A~M tmmy
*H*  0.00138006962536
*S*  0.999999999949
i've  0.0587109384186
could  0.0783695417564
come  0.0792885882338
should  0.0835626574485
hope  0.0896793882134
can't  0.0901758244142
still  0.0977411944495
but  0.101442528546
i'm  0.103160470713
find  0.106071593545
address  0.11446324708
email  0.126670440102
need  0.13739139052
asked  0.140426781604
myself  0.140699300699
forward  0.144650473521
that  0.145581519932
been  0.148862940962
have  0.153124092909
soon  0.156743646261
got  0.161060900207
well  0.163932435784
looking  0.171137147574
it,  0.177700582011
much  0.177834344036
such  0.182099866435
his  0.184251293589
busy  0.187317680178
currently  0.193381299887
through  0.195558276346
having  0.19778199154
you  0.202511883715
can  0.211999010103
help  0.212361827705
really  0.21469769243
me,  0.221005451766
just  0.232632960164
with  0.242690276404
the  0.247447382308
cause  0.252040034264
front  0.25502750373
like  0.259105512997
bit  0.262509888234
people  0.267141100931
your  0.269414421461
guys  0.28041172801
stuff  0.280718805323
straight  0.296550535449
lot  0.322302003293
and  0.323547704938
time  0.323986678539
see  0.348430936048
show  0.355273649977
url:com  0.356540273378
name  0.360968143228
inside  0.377903943703
url:www  0.385957710686
little  0.395637155378
blue  0.39793927695
url:ultrapasswords  0.612524002827
url:webcams  0.612524002827
click  0.6128900103
to:no real name:2**0  0.650569116011
there!  0.653080750096
college.  0.704427602712
eye  0.704427602712
finger  0.704427602712
in!  0.704427602712
header:Reply-To:1  0.72390271017
content-type:multipart/alternative  0.734876132917
content-type:text/html  0.747415607763
header:Received:2  0.796462459916
hey  0.803208820999
sexual  0.839128920218
gzu  0.844827586207
subject:0UT  0.844827586207
subject:CH3CK  0.844827586207
subject:H0T  0.844827586207
see,  0.852590197161
chatting  0.882180199307
flirt  0.882180199307
watch  0.885059013265
chat  0.898617177983
url:deatoo  0.908913381574
to:addr:mark  0.923918591094
live  0.924325435254
100-120  0.949438202247
age:  0.949438202247
build:  0.949438202247
color:  0.949438202247
freshman  0.949438202247
glrl  0.949438202247
hates  0.949438202247
hehe.  0.949438202247
height:  0.949438202247
here!!!  0.949438202247
internet,  0.949438202247
lbs.  0.949438202247
length:  0.949438202247
medium  0.949438202247
michelle  0.949438202247
michelle's  0.949438202247
online.  0.949438202247
petite  0.949438202247
pledging  0.949438202247
preference:  0.949438202247
refuses  0.949438202247
sex:  0.949438202247
shaved  0.949438202247
sorority  0.949438202247
special:  0.949438202247
stuff!  0.949438202247
suck  0.949438202247
tall  0.949438202247
url:1450  0.949438202247
url:1453  0.949438202247
url:1457  0.949438202247
url:1458  0.949438202247
url:1510  0.949438202247
url:1511  0.949438202247
url:1521  0.949438202247
url:1533  0.949438202247
url:1540  0.949438202247
url:michelle_files  0.949438202247
w3b  0.949438202247
ya,  0.949438202247
url:jpg  0.958300371059
weight:  0.95871559633
to:addr:howellsfamily.fsnet.co.uk  0.961368306838
female  0.969798657718
hair  0.973372781065
cock  0.97619047619
from:addr:bluemail.ch  0.97619047619
myself.  0.97619047619
reply-to:addr:bluemail.ch  0.97619047619
blonde  0.978468899522
url:livecam  0.978468899522
h0tt3st  0.981927710843
boyfriend  0.983271375465
webcam  0.991493383743
-----Original Message----- From: Davis, Justin [mailto:justin.davis@attws.com] Sent: 08 December 2003 14:58 To: spambayes@python.org Subject: [Spambayes] SpamBayes Outlook 2000 SR-1 won't move my spam My installation of SpamBayes seems to be working up until the point that it should be moving mail from my inbox to one of the spam folders. I have read the help files, and checked my setup, but I can't find anything that seems to be off. When I check the spam clues for some of the spam in my inbox, I might find the rating at 100% but it is still in the inbox. In about a week of using SpamBayes I have not gotten any messages that were moved to my spam suspects folder, and I don't think any automatically were filtered to my spam folder either. I am running Windows 2000 Professional 5.0.2195 Service Pack 3 Build 2195 My SpamBayes version for outlook is 0.81 I have included the 2 logfiles I could find on my computer. Any help will be greatly appreciated. Thank you. Justin -- Incoming mail is certified Virus Free. Checked by AVG Anti-Virus (http://www.grisoft.com). Version: 7.0.206 / Virus Database: 261.4.0 - Release Date: 12/5/2003 -- Outgoing mail is certified Virus Free. Checked by AVG Anti-Virus (http://www.grisoft.com). Version: 7.0.206 / Virus Database: 261.4.0 - Release Date: 12/5/2003 -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031208/26dfebd3/attachment-0001.html From skip at pobox.com Mon Dec 8 10:24:43 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Dec 8 10:24:55 2003 Subject: [Spambayes] SpamBayes with Eudora In-Reply-To: <5.2.0.9.0.20031208085204.01ac5140@localhost> References: <5.2.0.9.0.20031208085204.01ac5140@localhost> Message-ID: <16340.38843.466829.922375@montanaro.dyndns.org> Al> Maybe, I am missing something but SpamBayes isn't filtering anything Al> in Eudora 5.2 Al> I can look at the scores and the messages are being classified correctly. I Al> can look at the incoming training in localhost and these message are all Al> correct but all messages go right into my In box in Eudora. You need to define rules in Eudora which tell it where to save messages which score as spam or unsure. Skip From tpeters at mixcom.com Mon Dec 8 10:55:47 2003 From: tpeters at mixcom.com (Tom Peters) Date: Mon Dec 8 10:56:23 2003 Subject: [Spambayes] SpamBayes with Eudora In-Reply-To: <5.2.0.9.0.20031208085204.01ac5140@localhost> Message-ID: <5.1.0.14.2.20031208094837.01d64b30@localhost> I'm sorry if you already know this and I'm sending you "Duh!" info, but... SpamBayes doesn't FILTER anything unless you are using Outlook and the Outlook plugin. I use Eudora 5.1 and the Pop3proxy, which is now called sbserver, which just inserts a new header into your mail that reads one of three things:
X-Spambayes-Classification: ham
X-Spambayes-Classification: spam
X-Spambayes-Classification: unsure
YOU write the filters using Eudora's filter dialog to do whatever you want with these messages. That part is not intuitively obvious; the filter has to be written to say:
Match Incoming and Manual (Manual is optional)
Header: X-Spambayes-Classification:
Contains: spam
[Ignore]
Action: Transfer to: Trash
Ignore Rest
My filters do a little more too.
Before the Action: Transfer To Trash, I actually change its label to a category I created "spam" and play a wave file. HTH -T At 08:55 AM 12/8/2003 -0600, Al Clark wrote: >Maybe, I am missing something but SpamBayes isn't filtering anything in >Eudora 5.2 > >I can look at the scores and the messages are being classified correctly. >I can look at the incoming training in localhost and these message are all >correct but all messages go right into my In box in Eudora. > >I didn't find any info on this in the FAQ or the Eudora setup instructions. > >Thanks for helping > >Al Clark >Danville Signal Processing, Inc. >-------------------------------------------------------------------- >Purveyors of Fine DSP Hardware and other Cool Stuff >Available at http://www.danvillesignal.com > > >_______________________________________________ >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes >Check the FAQ before asking: http://spambayes.sf.net/faq.html [Commentary] There is no idea so stupid that you can't find a professor who believes it. --H.L. Mencken --... ...-- -.. . -. ----. --.- --.- -... tpeters@nospam.mixcom.com (internet) remove "nospam." N9QQB (ham) "HEY YOU" (loud shouting) WEB ADDRESS http//www.mixweb.com/tpeters 43 7' 17.2" N, by 88° 6' 28.9" W, Elevation 815', Grid Square EN53wc WAN/LAN/Telcom Analyst, Tech Writer, MCP, Cisco Certified CCNA From info at openforex.com Mon Dec 8 20:11:09 2003 From: info at openforex.com (OpenForex) Date: Mon Dec 8 11:09:49 2003 Subject: [Spambayes]  FOREX News - OpenForex.Com  240983 Message-ID: ============================================================ Forex News Month Digest "COMMERCIAL FORECASTS OF THE FOREX MARKET" ============================================================ 12/8/2003 Analytical Center OpenForex.Com represent your attention Commercial Forecasts of the FOREX Market.
- Forecasts are given in the form of specific trade recommendations - According to statistics the probability of our forecasts execution is more than 80 percent - Forecasts include technical, fundamental and other types of analysis - Variety of forms of payment - Prompt notice on new forecasts via E-mail, ICQ, SMS - Free demo account More... http://www.openforex.com --- November 2003 QUOTATIONS Total profit in the amount of pips: - eur/usd 230 - gbp/usd 275 - usd/jpy 171 - usd/chf 10 Taking income into account on the condition of open position, in the size of one lot - $1000 - for every currency pair, the total profit, regardless of the base 1000 dollars, will equal for every currency pair: - eur/usd $2300 - gbp/usd $2750 - usd/jpy $1710 - usd/chf $100 More in detail... http://www.openforex.com/?detail --- CONTACTS E-mail: mailto:info@openforex.com ICQ: 778487 --- FEATURES Unique trading strategies are applied to market analysis. For the first time ever, we decided to combine technical analysis' achievements, macroeconomic indications and psychology of exchange crowd. We use various authors' developments in complex approach to market analysis: technical analysis (J. DiNapoli, J. Shwager, Leboe and others); fundamental analysis (Likhovidov and others); market psychology analysis (A. Elder and others). Other than that, we use non-standard methods of analysis: B. William's trade chaos, Japanese candles in Nisson's interpretation, Elliot's theory of waves, pitchfork method and so forth. Our group consists not only of specialists in technical and fundamental analysis, but also of experienced traders that know how to estimate correctly both, signals of technical indicators, and indications of macroeconomic parameters. It is worthwhile mentioning that our resource is not connected to dealing centers and we are not interested in involving our clients in stockjobbing.
Our aim is to assist trader-beginner with making intelligent trade decisions, while increasing but not losing his capital. Forecasted below are the most popular currency pairs: Euro - US Dollar, US Dollar - Swiss Frank, US Dollar - Japanese Yen, US Dollar - Canadian Dollar, Pound sterling - US Dollar. Subsequently, the spectrum of trade instruments will expand. Currency pairs and other instruments (stocks, options, index and precious metals futures) will also be added. Starting forecasts will be brought out on a higher level in terms of quality of the forecasts themselves and the design of material submission. Except for forecasts, market surveys will be organized in the following sessions: European, American and Asian. Technical evaluation of such market instruments as gold and exchange indexes will also be given. --- QUOTATIONS ON-LINE FOREX market http://www.openforex.com/forex.php Futures http://www.openforex.com/fuch.php Indexes http://www.openforex.com/indexes.php USA stocks http://www.openforex.com/usacfd.php --- SEE ALSO FORUM http://www.openforex.com/forum/ INFORMERS http://www.openforex.com/informers.php 9F61E30-3FB9E231-740315AF-3479F161-48D3EC62 54BD18E8-A05125E-754AECC9-3970ACA6-1506DE24 52F3851C-356ED081-459F0D1E-3F09CD1D-D3EF049 From tim.one at comcast.net Mon Dec 8 11:12:12 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 8 11:12:11 2003 Subject: [Spambayes] feature request In-Reply-To: Message-ID: [Kenny Pitt] > ... > Is there an existing win32all function to play an arbitrary sound > file? If there is, I'd be glad to start looking into implementing > this. It's available from base Python; see PlaySound() at http://www.python.org/doc/current/lib/module-winsound.html and you want (at least) the SND_FILENAME flag. From eliot at isogen.com Mon Dec 8 11:19:02 2003 From: eliot at isogen.com (W.
Eliot Kimber) Date: Mon Dec 8 11:20:03 2003 Subject: [Spambayes] POP3 Server Performance Issue Win2K SP4 Message-ID: <3FD4A476.4020202@isogen.com> I'm running the POP3 server 1.0a5 (I thought I had upgraded to 1.0a7 but the console reports 1.0a5). This is on a 2Ghz laptop running Win2K SP4. The proxy is installed as a service. I'm using Mozilla 1.4 as my mail client. The issue is that whenever my mail client connects to the POP3 proxy, the Python process sucks up all the system resources to the point that the system essentially stops responding for about a minute. The task manager shows that it is the PythonService.e that is using all the cycles. This issue has really only been a problem in the last few weeks, so I'm pretty sure it's a problem with the data on my machine, not an inherent performance problem with SpamBayes. I suspect the issue is related to the size of my spam database--I've been running SB since September 2003 and have collected over 25,000 spams (my email address has been unchanged for about 7 years and is in lots of public web pages and mail archives so I'm just a spam magnet). Does anyone have any suggestions for how I might tune the server or database to avoid this level of resource hogging? I haven't been able to develop enough of an understanding of how the database works to know what I can do or try to address this problem. Thanks, Eliot -- W.
Eliot Kimber Innodata Isogen eliot@isogen.com www.isogen.com From kennypitt at hotmail.com Mon Dec 8 11:20:16 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Dec 8 11:20:52 2003 Subject: [Spambayes] SpamBayes Outlook 2000 SR-1 won't move my spam In-Reply-To: <0AE621AC518BE0488EBE3BB22853364F0109B493@tx-msg12-aln.wireless.attws.com> Message-ID: I see several entries in your log that begin with: Recovering to folder 'Inbox' and ham training message The "Recover from Spam" button in the toolbar only shows up on the folders that you have configured for Possible Spam and Certain Spam, so it usually isn't possible to generate these entries unless SpamBayes thinks it is in one of these folders. You might check your Filtering configuration in SpamBayes Manager to be sure your "Certain Spam" and "Possible Spam" folders are set to what you think they are. On a side note, I notice that you initially trained on 1058 good messages, but only 37 spam messages. SpamBayes will be much more accurate at classifying your messages (assuming we can get it to move them at all <0.5 wink>) if the number of messages of each type is as close to equal as possible. If you have 1000 good messages trained then ideally you should also have 1000 spam messages. -- Kenny Pitt _____ From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Davis, Justin Sent: Monday, December 08, 2003 9:58 AM To: spambayes@python.org Subject: [Spambayes] SpamBayes Outlook 2000 SR-1 won't move my spam My installation of SpamBayes seems to be working up until the point that it should be moving mail from my inbox to one of the spam folders. I have read the help files, and checked my setup, but I can't find anything that seems to be off. When I check the spam clues for some of the spam in my inbox, I might find the rating at 100% but it is still in the inbox. 
In about a week of using SpamBayes I have not gotten any messages that were moved to my spam suspects folder, and I don't think any were automatically filtered to my spam folder either. I am running Windows 2000 Professional 5.0.2195 Service Pack 3 Build 2195. My SpamBayes version for Outlook is 0.81. I have included the 2 logfiles I could find on my computer. Any help will be greatly appreciated. Thank you. Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031208/c6a70d99/attachment.html From kennypitt at hotmail.com Mon Dec 8 11:26:52 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Dec 8 11:27:36 2003 Subject: [Spambayes] Toolbar Button Images In-Reply-To: <000001c3bd5c$0a3ee200$47c96ba4@brutus> Message-ID: Jason Karns wrote: > A suggestion I have is that the background to your toolbar button images > should be "invisible" rather than white. Unfortunately, Outlook uses a rather bizarre method for creating invisible backgrounds on custom buttons that it is not possible to accomplish with the current version of the Python language that is used to write the SpamBayes plugin. The best alternative I can think of right now is to design icons that fill the entire button area so that you don't see any "background". -- Kenny Pitt From guyh at forescout.com Mon Dec 8 11:36:26 2003 From: guyh at forescout.com (Guy Harpaz) Date: Mon Dec 8 11:36:34 2003 Subject: [Spambayes] Are you considering to ad the option of deleting the Spam mail from the servers Message-ID: <731D4474EFFCD644A6EB64B126EFDC7B5787AE@fs07.fsd.forescout.com> Hello, Let me start by saying that the general idea of your program sounds promising. I do have a question / remark about the fact that I have to download all the Spam mail to my PC instead of detecting the mail while still on the servers.
I read the answer in the FAQ, and I must say that I think the option of being able to identify and delete spam without having to download it would add great value. If I understand how the program works, then getting the sender address, subject and the body of the message is enough (no attachments), so it should be clear that this could make the tool even more powerful (for example, removing emails sent by "Spammer" senders that contain viruses exploiting the mail client). I am currently using a program called "MailWasher", and I think that a combination of both technologies could really make the difference. I know that I for one would be very interested if it were possible to use a standalone tool and remove spam before downloading it. Guy Harpaz ForeScout Technologies Ltd. 32 Habarzel St., Tel Aviv, 69710 Israel Tel : +972-3-6449987 ext. 119 Cell : +972-55-613365 From kennypitt at hotmail.com Mon Dec 8 11:38:51 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Dec 8 11:39:31 2003 Subject: [Spambayes] Why is this classified as Ham In-Reply-To: <5846CF419D2EF5439036CC3126A3A995017B79@SOSERVER1.softoption.local> Message-ID: Mark Howells wrote: > Any idea why this is classified as Ham in the Web interface? > > It's obviously Spam. > > ----------- > Spam probability: 0.999309965162 > Clues for: CH3CK 0UT MY H0T L~I~V~E~C~A~M tmmy > *H* 0.00138006962536 > *S* 0.999999999949 And SpamBayes obviously thinks it's spam based on your current training. What isn't clear is what SpamBayes thought about the message at the time it was received. Clicking on the Clues button in the Web UI shows the current clues that would be used to classify the message, which are not necessarily the same as the clues that were used when it was received. Try looking at the full headers for the message (click the subject in the Web UI to see the complete raw message) and look for "X-Spambayes-Classification:" and "X-Spambayes-Spam-Probability:" headers.
These will tell you what the original classification and scoring of the message was. -- Kenny Pitt From Melanie_Stern at dofasco.ca Mon Dec 8 11:49:48 2003 From: Melanie_Stern at dofasco.ca (Melanie_Stern@dofasco.ca) Date: Mon Dec 8 11:49:53 2003 Subject: [Spambayes] Where are my e-mails? continued Message-ID: We've been installing Spambayes on a limited basis and I've had nothing but happy people. In fact, one user has called me twice just to say thanks! Many of them liked it so much they've installed it at home too. It's a great program. Of the 20 people with SpamBayes, 2 of them have accidentally deleted their JUNK folders and had to get me to fix it. (recreate the folder and tell SpamBayes where it is) It would be nice if there was a second warning or if you just could not delete / move any folder that SpamBayes was using. Maybe it could be a separate bullet-proof-my-SpamBayes startup option or something. I have no idea how hard these are to do and I know that for most users they aren't necessary. But for the few that make a mistake, it would save them and me (and you guys - you've gotten a lot of email about this) some hassle. Apart from that, the interface is such a dream. I installed it on my mom's computer and she figured it out without any help from me! That is truly impressive. Melanie Stern Internet Group Dofasco, Inc. ____________________________________________________________ [Tim] > By the way, when you deleted your folder, didn't Outlook > pop up a box asking you whether you really wanted to do that? > If it didn't, your old JUNK MAIL folder is probably hiding > now as a subfolder in your Deleted Items folder. [TONY] I don't know about all versions of Outlook, but I get this "are you sure" warning even if I'm just moving the folder into the Deleted Items folder. It would seem pointless to add a second check, even if it was a simple addition to the plug-in. 
BTW, has anyone else noticed that in the last week or two these 'missing mail' posts have become quite common? (Although I note that it's never ended up being spambayes at fault). Maybe we need a FAQ for this, too? =Tony Meyer _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html From kennypitt at hotmail.com Mon Dec 8 12:07:43 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Dec 8 12:08:19 2003 Subject: [Spambayes] POP3 Server Performance Issue Win2K SP4 In-Reply-To: <3FD4A476.4020202@isogen.com> Message-ID: W. Eliot Kimber wrote: > I'm running the POP3 server 1.0a5 (I thought I had upgraded to 1.0a7 > but the console reports 1.0a5)... > > The issue is that whenever my mail client connects to the POP3 proxy, > the Python process sucks up all the system resources to the point that > the system essentially stops responding for about a minute... > > I suspect the issue is related to the size of my spam database--I've > been running SB since September 2003 had have collected over 25,000 > spams (my email address has been unchanged for about 7 years and is in > lots of public web pages and mail archives so I'm just a spam magnet). > > Does anyone have any suggestions for how I might tune the server or > databse to avoid this level of resource hogging? If your training database is very large then it could have an impact on processor utilization. Just how big is your training database? It usually isn't necessary to train on every message you receive, and if you've received 25,000 spams in about 3 months then I suspect your training is heavily over-balanced toward spam anyway. I would suggest retraining on a much smaller set of messages (50-100 of each is probably more than sufficient). After that, be more selective about which messages you actually train on. 
If most of your messages classify correctly then you shouldn't need to train much at all. -- Kenny Pitt From david.matos at comcast.net Mon Dec 8 12:32:13 2003 From: david.matos at comcast.net (David Matos) Date: Mon Dec 8 12:32:12 2003 Subject: [Spambayes] "Un-classifying" a message In-Reply-To: Message-ID: <000f01c3bdb1$391f75d0$07a62241@dexter> Using the Outlook plug-in, is it possible to correct ("un-classify") a message that I accidentally classified incorrectly? In other words, if I get an unsure, misclick the "spam" button, then go to the "spam" folder, highlight that message, then click the "ham" button, what--if anything--have I accomplished? From james at hostunite.com Mon Dec 8 10:41:52 2003 From: james at hostunite.com (James) Date: Mon Dec 8 12:42:38 2003 Subject: [Spambayes] REMOVE In-Reply-To: Message-ID: <001d01c3bda1$e3b52630$6604a8c0@HOM> REMOVE
From dbulgrien at vcsd.com Mon Dec 8 13:03:41 2003 From: dbulgrien at vcsd.com (Dennis W. Bulgrien) Date: Mon Dec 8 13:03:49 2003 Subject: [Spambayes] Strip Subject of Non-alpha Message-ID: I suggest that a filter be added which strips the subject line of all non-alpha characters before scoring. It can still be scored on the unstripped subject, but on the stripped one too. That will detect messages where the spam words are broken up by dots, periods, dashes, etc. From kennypitt at hotmail.com Mon Dec 8 13:03:50 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Dec 8 13:04:26 2003 Subject: [Spambayes] "Un-classifying" a message In-Reply-To: <000f01c3bdb1$391f75d0$07a62241@dexter> Message-ID: David Matos wrote: > Using the Outlook plug-in, is it possible to correct ("un-classify") a > message that I accidentally classified incorrectly? In other words, > if I get an unsure, misclick the "spam" button, then go to the "spam" > folder, highlight that message, then click the "ham" button, what--if > anything--have I accomplished? It will untrain the message from spam and retrain it as ham. The end result is the same training info that you would have had if you had clicked "ham" in the unsure folder originally.
-- Kenny Pitt From tim.one at comcast.net Mon Dec 8 13:09:05 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 8 13:09:04 2003 Subject: [Spambayes] "Un-classifying" a message In-Reply-To: <000f01c3bdb1$391f75d0$07a62241@dexter> Message-ID: [David Matos] > Using the Outlook plug-in, is it possible to correct ("un-classify") a > message that I accidentally classified incorrectly? In other words, > if I get an unsure, misclick the "spam" button, then go to the "spam" > folder, highlight that message, then click the "ham" button, what--if > anything--have I accomplished? You've accomplished two things then: (1) removed the training that said this msg was spam; and, (2) added training that this msg is ham. From rayfes at rayfes.com Mon Dec 8 13:11:07 2003 From: rayfes at rayfes.com (Rayfes Mondal) Date: Mon Dec 8 13:11:31 2003 Subject: [Spambayes] feature request In-Reply-To: Message-ID: I appreciate you guys working on this feature and keeping me in the loop! I'm sorry I don't have anything useful to offer; I'm a computer chip designer, and software is too non-deterministic for me. :) Thanks, Rayfes From tim.one at comcast.net Mon Dec 8 13:15:59 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 8 13:15:58 2003 Subject: [Spambayes] REMOVE In-Reply-To: <001d01c3bda1$e3b52630$6604a8c0@HOM> Message-ID: [James] > REMOVE Eh? If you're trying to unsubscribe, go to http://mail.python.org/mailman/listinfo/spambayes where you can unsubscribe yourself (we don't subscribe anyone, and we don't unsubscribe anyone -- readers do their own mailing list management here).
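Kenny's and Tim's answers to David Matos above treat training as reversible bookkeeping: training adds a message's tokens to the spam or ham counts, and untraining subtracts exactly what was added. Under that assumption, "untrain as spam, then train as ham" leaves the same state as clicking "ham" in the first place. The sketch below is a toy, count-based store for illustration only; `ToyTrainer` and its methods are hypothetical and are not the SpamBayes classifier API.

```python
from collections import Counter


class ToyTrainer:
    """Hypothetical count-based training store (not the SpamBayes API).

    Training adds each distinct token of a message to the ham or spam
    counts; untraining subtracts exactly what training added.
    """

    def __init__(self):
        self.nham = 0
        self.nspam = 0
        self.hamcounts = Counter()
        self.spamcounts = Counter()

    def _counts(self, is_spam):
        return self.spamcounts if is_spam else self.hamcounts

    def train(self, tokens, is_spam):
        self._counts(is_spam).update(set(tokens))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def untrain(self, tokens, is_spam):
        counts = self._counts(is_spam)
        counts.subtract(set(tokens))
        # Drop tokens whose count fell to zero so states compare cleanly.
        for tok in [t for t, n in counts.items() if n <= 0]:
            del counts[tok]
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1

    def state(self):
        return (self.nham, self.nspam, dict(self.hamcounts), dict(self.spamcounts))


msg = ["cheap", "meds", "meeting", "tomorrow"]

# Misclick: trained as spam, then corrected with the "ham" button.
corrected = ToyTrainer()
corrected.train(msg, is_spam=True)
corrected.untrain(msg, is_spam=True)
corrected.train(msg, is_spam=False)

# What a correct click in the Unsure folder would have done.
direct = ToyTrainer()
direct.train(msg, is_spam=False)

print(corrected.state() == direct.state())  # True
```

Seth's follow-up question (untraining without retraining) corresponds to calling only `untrain()` in this toy model.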
From skip at pobox.com Mon Dec 8 13:47:22 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Dec 8 13:47:29 2003 Subject: [Spambayes] POP3 Server Performance Issue Win2K SP4 In-Reply-To: <3FD4A476.4020202@isogen.com> References: <3FD4A476.4020202@isogen.com> Message-ID: <16340.51002.498155.120077@montanaro.dyndns.org> Eliot> I suspect the issue is related to the size of my spam Eliot> database--I've been running SB since September 2003 had have Eliot> collected over 25,000 spams (my email address has been unchanged Eliot> for about 7 years and is in lots of public web pages and mail Eliot> archives so I'm just a spam magnet). Eliot> Does anyone have any suggestions for how I might tune the server Eliot> or databse to avoid this level of resource hogging? Delete your ham and spam databases and start over? You have 25,000 spams. How many hams do you have? If you don't have at least 8,000, my guess is that SB is performing sub-par anyway. Any idea how many classification mistakes you might have in your database? Any idea how to find them? <0.5 wink> Take a look at http://www.entrian.com/sbwiki/TrainingIdeas for some ideas about more selective training. Eliot> I haven't been able to develop enough of an understanding of how Eliot> the database works to know what I can do or try to address this Eliot> problem. In general, SpamBayes doesn't need to see every email sent to you to have a pretty good idea what you consider ham and spam. Skip From eliot at isogen.com Mon Dec 8 13:59:58 2003 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Dec 8 14:00:59 2003 Subject: [Spambayes] POP3 Server Performance Issue Win2K SP4 In-Reply-To: References: Message-ID: <3FD4CA2E.6080504@isogen.com> Kenny Pitt wrote: > > If your training database is very large then it could have an impact on > processor utilization. Just how big is your training database? My hammie.db file is about 10.5 Meg. 
> It usually isn't necessary to train on every message you receive, and if > you've received 25,000 spams in about 3 months then I suspect your > training is heavily over-balanced toward spam anyway. I would suggest > retraining on a much smaller set of messages (50-100 of each is probably > more than sufficient). After that, be more selective about which > messages you actually train on. If most of your messages classify > correctly then you shouldn't need to train much at all. I'm feeling a bit slow--but what is the process for doing this sort of re-training? Do I simply delete hammie.db and then retrain again using either new messages or old messages that I think are representative? I couldn't find any docs that spoke to this process directly. Thanks, Eliot -- W. Eliot Kimber Innodata Isogen eliot@isogen.com www.isogen.com From kennypitt at hotmail.com Mon Dec 8 14:32:38 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Dec 8 14:33:11 2003 Subject: [Spambayes] POP3 Server Performance Issue Win2K SP4 In-Reply-To: <3FD4CA2E.6080504@isogen.com> Message-ID: W. Eliot Kimber wrote: > Kenny Pitt wrote: >> It usually isn't necessary to train on every message you receive, >> and if you've received 25,000 spams in about 3 months then I suspect >> your training is heavily over-balanced toward spam anyway. I would >> suggest retraining on a much smaller set of messages (50-100 of each >> is probably more than sufficient). After that, be more selective >> about which messages you actually train on. If most of your >> messages classify correctly then you shouldn't need to train much at >> all. > > I'm feeling a bit slow--but what is the process for doing this sort of > re-training? Do I simply delete hammie.db and then retrain again using > either new messages or old messages that I think are representative? > > I couldn't find any docs that spoke to this process directly. 
I use sb_server mostly when testing and not on a day-to-day basis so I may not be the best person to address that, but I'll give it a shot anyway. Yes, you can delete your database and start over. You should probably delete both your statistics_database and your message_info_database just to make sure they stay in sync. You can then retrain using a small, representative subset of messages. IIRC, you said you were using Mozilla Mail? If that is correct, then each of the folders in your Local Folders is stored in "mbox" format which is understood by the training option. You can create two folders such as "Ham Training" and "Spam Training" and copy the messages that you want to train on into those folders. You can then browse to the storage files for those folders (which are buried deep under your Mozilla profile directory) and feed each to the training option with the appropriate classification. After that, just watch your mail for unsures and mistakes. IMHO, there really isn't much reason to Review Messages as long as SpamBayes classifies everything correctly. Unreviewed messages in the cache will eventually expire, and you can use the Advanced Configuration page to set how long they are kept. When you do get some unsures or mistakes, do Review Messages to correct them. I would recommend also training on a few of the other messages there as needed, just enough to keep your training set balanced, and discard the rest. -- Kenny Pitt From dbulgrien at vcsd.com Mon Dec 8 14:41:38 2003 From: dbulgrien at vcsd.com (Dennis W. Bulgrien) Date: Mon Dec 8 14:41:47 2003 Subject: [Spambayes] Re: Manager, Start Training Button References: Message-ID: Yes, but first select a good and/or junk folder and uncheck "Rebuild entire database". "Dennis W. Bulgrien" wrote in message news:br23jo$c9v$1@sea.gmane.org... The Spambayes Manager, Training tab, "Start Training" button... add them to the training database while keeping the current spam data. Is that the way to do it?
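Kenny's retraining advice to Eliot above (start over with a small, balanced set, roughly 50-100 messages of each kind, stored as mbox files that the training option understands) can be prepared with Python's standard `mailbox` module. This is a sketch under assumptions: `make_training_mbox`, the file names, and the sample size are hypothetical, and it only writes a new mbox file; the training itself still happens through SpamBayes's own training option.

```python
import mailbox
import random


def make_training_mbox(source_path, dest_path, count=100, seed=None):
    """Copy a random sample of at most `count` messages into a new mbox file.

    Hypothetical helper: run it once on the ham folder and once on the spam
    folder with the same `count` to get a balanced training set.
    Returns the number of messages written.
    """
    src = mailbox.mbox(source_path, create=False)
    try:
        keys = list(src.keys())
        rng = random.Random(seed)
        picked = rng.sample(keys, min(count, len(keys)))
        # Load the chosen messages before the source mailbox is closed.
        chosen = [src[key] for key in picked]
    finally:
        src.close()

    dest = mailbox.mbox(dest_path)  # created if it does not exist
    dest.lock()
    try:
        for message in chosen:
            dest.add(message)
    finally:
        dest.close()  # close() flushes and releases the lock
    return len(chosen)
```

If the ham folder holds fewer messages than requested, passing the returned ham count as `count` for the spam call keeps the two sets the same size.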
From richie at entrian.com Mon Dec 8 14:56:44 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Dec 8 14:56:50 2003 Subject: [Spambayes] install spambayes under $HOME In-Reply-To: <200312062020.hB6KKxO11250@pandora.outcomes.chop.edu> References: <200312062020.hB6KKxO11250@pandora.outcomes.chop.edu> Message-ID: Hi Yuelin, > I would like to install spambayes under my $HOME directory on a > Sun workstation running Solaris 9 and Python 2.2.3. I don't have > root access to the default directory, /usr/local/bin. I don't > know much about Python. I think you should be able to: o Unpack the archive into, say, ~/spambayes o Add ~/spambayes/scripts to your PATH o Add ~/spambayes to your PYTHONPATH (which is an environment variable just like PATH, that tells Python where to look for library modules). I've not tried this, but it ought to work. (On Windows I do something analogous to run CVS SpamBayes). -- Richie Hindle richie@entrian.com From skip at pobox.com Mon Dec 8 14:56:52 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Dec 8 14:57:07 2003 Subject: [Spambayes] Strip Subject of Non-alpha In-Reply-To: References: Message-ID: <16340.55172.171511.255475@montanaro.dyndns.org> Dennis> I suggest that a filter be added which strips the subject line Dennis> of all non-alpha characters before scoring. It can be scored on Dennis> the unstripped subject too, but on the stripped one too. That Dennis> will detect messages where the spam words are broken up by dots, Dennis> periods, dashes, etc. I have a local mod which adds an asciify_subject option to the tokenizer. It uses a codec I wrote called 'latscii' which assumes the subject is encoded as latin-1 (which seems to be the case for all the examples I've seen) and then performs a mapping from accented to unaccented letters, and maps symbols to ASCII characters somewhat arbitrarily (e.g., mapping the registered trademark character to an 'R' and a British pound sign to '#'). 
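Dennis's suggestion and the mapping Skip describes both aim to recover words that spammers break up with punctuation or disguise with accented letters. A minimal sketch of that idea follows, assuming the extra tokens are generated in addition to the normal subject tokens; `normalized_subject_tokens` and the `subject-stripped:` prefix are hypothetical, and Unicode NFKD decomposition is a swapped-in technique, not Skip's `latscii` codec.

```python
import re
import unicodedata


def normalized_subject_tokens(subject):
    """Hypothetical extra tokens for subjects with punctuation-split words.

    Accented letters are folded to plain ASCII via NFKD decomposition,
    then every character that is not an ASCII letter or whitespace is
    dropped, so "V.i.a.g.r.a" becomes "viagra". The prefix keeps these
    tokens distinct from the ordinary subject tokens.
    """
    decomposed = unicodedata.normalize("NFKD", subject)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    letters = re.sub(r"[^A-Za-z\s]", "", ascii_only).lower()
    return ["subject-stripped:" + word for word in letters.split()]


print(normalized_subject_tokens("Cheap V.i.a.g.r.a for you"))
```

Digits used as letter substitutes ("H0T") are simply dropped by this approach, so it helps with split-up words more than with look-alike characters.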
I suppose I could check it in, though it's not clear that it makes much difference for the fairly small number of these sorts of messages I receive (though perhaps my code modification has a bug someone else could spot). I never got overwhelming encouragement for my ideas about how to add experimental extensions to the CVS repository. Skip From dbulgrien at vcsd.com Mon Dec 8 15:04:48 2003 From: dbulgrien at vcsd.com (Dennis W. Bulgrien) Date: Mon Dec 8 15:04:58 2003 Subject: [Spambayes] Re: Manager, Start Training Button References: Message-ID: Oops, must have both good AND junk folder. Why? I'd like to just add training of good messages. Not critical that it requires both, as I can create an empty folder and point the other to it. "Dennis W. Bulgrien" wrote... Yes, but first select a good and/or junk folder and uncheck "Rebuild entire database". "Dennis W. Bulgrien" wrote... The Spambayes Manager, Training tab, "Start Training" button... add them to the training database while keeping the current spam data... From richie at entrian.com Mon Dec 8 15:08:20 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Dec 8 15:08:27 2003 Subject: [Spambayes] Are you considering to ad the option of deleting the Spam mail from the servers In-Reply-To: <731D4474EFFCD644A6EB64B126EFDC7B5787AE@fs07.fsd.forescout.com> References: <731D4474EFFCD644A6EB64B126EFDC7B5787AE@fs07.fsd.forescout.com> Message-ID: [Guy] > If I understand how the program works, then getting the sender > address, subject and the body of the message is enough (no attachments), That's almost true. Depending upon your options, you might need all the headers, but that's no problem. The difficulty, at least for POP3, is that there's no way to download the body and not the attachments - all you can do is download the first N lines of the message, which might be just part of the body, or might be the whole body and some of an attachment, or (worst but least common case) might be all attachment and no body.
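The "first N lines of the message" that Richie describes is what the POP3 TOP command returns, and Python's standard `poplib` exposes it as `POP3.top(which, howmuch)`. A minimal sketch, assuming a server that supports TOP (it is optional in RFC 1939); `fetch_preview` and `parse_preview` are hypothetical helpers, and the 50-line default is just the figure from Richie's reply. Adding a final line break is one possible answer to his "fix up" question; the standard `email` parser is designed to tolerate malformed input and records problems in `msg.defects` rather than raising.

```python
import email


def parse_preview(lines):
    """Turn the line list returned by POP3.top() into an email.message.Message.

    poplib has already removed the line terminators and the byte-stuffed
    leading dots; rejoin with CRLF and add a final line break so the last,
    possibly cut-off, line is complete.
    """
    raw = b"\r\n".join(lines) + b"\r\n"
    return email.message_from_bytes(raw)


def fetch_preview(server, which, body_lines=50):
    """Fetch the headers plus the first `body_lines` body lines of message `which`.

    `server` is expected to be a connected, logged-in poplib.POP3 object.
    """
    _response, lines, _octets = server.top(which, body_lines)
    return parse_preview(lines)
```

With a real connection, `server` would be a `poplib.POP3` or `poplib.POP3_SSL` instance on which `user()` and `pass_()` have already been called; the preview could then be scored before deciding whether to download or delete the full message.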
Also, what you've then downloaded is not a valid message, because it's cut off partway through - that could confuse the email parser that SpamBayes uses to separate the pieces of the message from each other. These problems can be overcome - in 99% of cases the headers and the first, say, 50 lines of the body would be enough for accurate classification. The message could in most (all?) cases be fixed up sufficiently for the email parser to make some sense of it (email module experts: is it enough simply to add a newline?). So the answer is that we don't have any immediate plans for such a thing, but it wouldn't be very difficult to implement. It is on my long-term to-do list, because I'd like it myself. 8-) But my to-do list is very long and my free time is very short. -- Richie Hindle richie@entrian.com From dbulgrien at vcsd.com Mon Dec 8 15:09:44 2003 From: dbulgrien at vcsd.com (Dennis W. Bulgrien) Date: Mon Dec 8 15:10:25 2003 Subject: [Spambayes] Re: Manager, Start Training Button References: Message-ID: Odd, "Deleted Items" doesn't show up in the Browse list. I wanted to use it. I permanently delete spam, so deleted items contains ham (or what used to be). "Dennis W. Bulgrien" wrote... Yes, but first select a good and/or junk folder and uncheck "Rebuild entire database". "Dennis W. Bulgrien" wrote... The Spambayes Manager, Training tab, "Start Training" button... add them to the training database while keeping the current spam data... From nobody at spamcop.net Mon Dec 8 15:46:53 2003 From: nobody at spamcop.net (Seth Goodman) Date: Mon Dec 8 15:47:09 2003 Subject: [Spambayes] feature request In-Reply-To: Message-ID: [Kenny Pitt] > I've thought about this myself, and hopefully I'll get a chance to > include it as I'm making my auto-balancing updates. I also would like > to add a menu item to rescore the currently selected folder. Would > anyone else find this useful?
I often rescore my Spam folder to see if I sure would find it useful, as I notice exactly the same thing: on a small training set, each new trained message changes the scores of other messages in not obvious ways. Some spam drops below the threshold, and some ham (rarely) increases above the threshold, both suggesting that you should then retrain on those messages. The token databases are a statistical estimate of what the message stream is. When the database is small, one message can skew the estimate significantly. Though I would like to have the "filter current folder" button, I would rather have Ryan Malayter's suggestion (below), since it automatically filters all unread mail so you can look for changed classifications. [Ryan Malayter] > If that's the case, the plugin should automatically re-filter all unread > messages in the inbox as well as all messages in the unsure folder upon > each training event. That would insure that any spam that was completely > missed gets caught as well. On the matter of the "Recover from Spam" button in the unsure folder: [Seth Goodman] > > One more note on the unsure folder is that one of the buttons is > > labeled "Recover from Spam". Since none of the messages in the > > unsure folder have been trained as spam, the "Recover from Spam" > > button is a bit misleading. Though this is the same button that > > appears in spam folders, thus making the code simpler, in the unsure > > folder it should probably be called "Train as Good" or "Keep as Good". [Kenny Pitt] > > Locally, I've renamed mine simply "Spam" and "Not Spam". This also has > the nice side-effect of making the toolbar shorter. I don't see how naming the Unsure folder "Not Spam" solves the problem. Since none of these messages in your "Not Spam" folder were ever classified as spam, "Recover from Spam" still is confusing. I don't see how renaming the button when in the Unsure folder would make the toolbar any longer. 
There would still be two buttons: "Delete as Spam" and "Keep as Good" (or whatever else you want to call it). -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From nobody at spamcop.net Mon Dec 8 15:53:48 2003 From: nobody at spamcop.net (Seth Goodman) Date: Mon Dec 8 15:53:52 2003 Subject: [Spambayes] Outlook: Setting background filtering as the default. In-Reply-To: <000801c3bd6a$0a90f140$2c00a8c0@eden> Message-ID: I have a question on background mode. Why is the second timer necessary? I use it at the default value of 1.0 sec, and what it does is to process one message per second after Outlook downloads a pile of messages. I assume that this is necessary or you wouldn't have gone to the trouble of adding it, but it does slow things down a lot when you download a bunch of messages at once. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From kennypitt at hotmail.com Mon Dec 8 15:57:02 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Dec 8 15:57:38 2003 Subject: [Spambayes] feature request In-Reply-To: Message-ID: Seth Goodman wrote: > [Kenny Pitt] >> >> Locally, I've renamed mine simply "Spam" and "Not Spam". This also >> has the nice side-effect of making the toolbar shorter. > > I don't see how naming the Unsure folder "Not Spam" solves the > problem. Since none of these messages in your "Not Spam" folder were > ever classified as spam, "Recover from Spam" still is confusing. I > don't see how renaming the button when in the Unsure folder would > make the toolbar any longer. There would be still be two buttons: > "Delete as Spam" and "Keep as Good" (or whatever else you want to > call it). Sorry, I guess I wasn't clear on that statement. I meant I've changed the names of the *buttons* to "Spam" and "Not Spam" instead of "Delete as Spam" and "Recover from Spam".
I run from source so I am free to change them to whatever I want, and I was merely presenting another possibility for the names. -- Kenny Pitt From nobody at spamcop.net Mon Dec 8 16:19:58 2003 From: nobody at spamcop.net (Seth Goodman) Date: Mon Dec 8 16:20:29 2003 Subject: [Spambayes] "Un-classifying" a message In-Reply-To: Message-ID: [Tim Peters] > You've accomplished two things then: (1) removed the training that said > this msg was spam; and, (2) added training that this msg is ham. OK, but how about untraining a message that doesn't belong in either database but was trained on by mistake? If you do this, currently you are stuck. This would suggest an "Untrain message" button. In this case (sounds like a lot of work, but possibly worth it), the toolbar would be context sensitive to the message that is selected, not the folder. This obviously gets tougher for multiple messages selected, but you would then display only buttons that are applicable to all selected messages. For the selected message(s), look to see if it is in either database. Here is a summary of what the two buttons would be depending on the message's training status:

message is        first button        second button
---------------   -----------------   ------------------------
trained as spam   Recover from Spam   Remove from Training Set
trained as ham    Delete as Spam      Remove from Training Set
not trained       Delete as Spam      Train as Good (presently "Recover from Spam")

Or, if you change the button names to what Kenny uses in his setup, [Kenny Pitt] > Sorry, I guess I wasn't clear on that statement. I meant I've changed > the names of the *buttons* to "Spam" and "Not Spam" instead of "Delete > as Spam" and "Recover from Spam". I run from source so I am free to > change them to whatever I want, and I was merely presenting another > possibility for the names.
then the summary would be:

message is        first button   second button   third button
---------------   ------------   -------------   ------------
trained as spam   Not Spam       Can't Tell      -
trained as ham    Spam           Can't Tell      -
not trained       Spam           Not Spam        Can't Tell

The third button is displayed for untrained messages and is needed in the Unsure folder. This would move the message to the Inbox but *not* train it as ham. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From nobody at spamcop.net Mon Dec 8 16:27:02 2003 From: nobody at spamcop.net (Seth Goodman) Date: Mon Dec 8 16:27:07 2003 Subject: [Spambayes] "Un-classifying" a message In-Reply-To: Message-ID: Reposted with corrections: [Tim Peters] > You've accomplished two things then: (1) removed the training that said > this msg was spam; and, (2) added training that this msg is ham. OK, but how about untraining a message that doesn't belong in either database but was trained on by mistake? If you do this, currently you are stuck. This would suggest an "Untrain message" button. In this case (sounds like a lot of work, but possibly worth it), the toolbar would be context sensitive to the message that is selected, not the folder. This obviously gets tougher for multiple messages selected, but you would then display only buttons that are applicable to all selected messages. For the selected message(s), look to see if it is in either database.
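A minimal Python sketch of the context-sensitive rule just described (show only the buttons that apply to every selected message) follows. The `Status` enum, the `BUTTONS` table (using Kenny's shorter button names from the summary above) and `buttons_for_selection` are hypothetical illustrations, not part of the SpamBayes Outlook addin.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical training status of a message (not a SpamBayes type)."""
    TRAINED_SPAM = "trained as spam"
    TRAINED_HAM = "trained as ham"
    NOT_TRAINED = "not trained"


# Buttons offered for each training status, using Kenny's button names.
BUTTONS = {
    Status.TRAINED_SPAM: ["Not Spam", "Can't Tell"],
    Status.TRAINED_HAM: ["Spam", "Can't Tell"],
    Status.NOT_TRAINED: ["Spam", "Not Spam", "Can't Tell"],
}


def buttons_for_selection(statuses):
    """Return the buttons that apply to *every* selected message.

    The result keeps the order of the first message's button list;
    an empty selection gets no buttons at all.
    """
    if not statuses:
        return []
    common = set(BUTTONS[statuses[0]])
    for status in statuses[1:]:
        common &= set(BUTTONS[status])  # keep only buttons shared by all
    return [b for b in BUTTONS[statuses[0]] if b in common]
```

Selecting one message trained as spam together with one untrained message leaves "Not Spam" and "Can't Tell"; a mix of trained-spam and trained-ham messages leaves only "Can't Tell". Note that the same toolbar position then holds different buttons depending on the selection.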
Here is a summary of what the buttons would be depending on the message's training status:

message is        first button        second button              third button
---------------   -----------------   ------------------------   ------------
trained as spam   Recover from Spam   Remove from Training Set   -
trained as ham    Delete as Spam      Remove from Training Set   -
not trained       Delete as Spam      Train as Good (presently   Don't Train
                                      "Recover from Spam")

The third button is displayed for untrained messages and is needed in the Unsure folder. This would move the message to the Inbox but *not* train it as ham. Or, if you change the button names to what Kenny uses in his setup, [Kenny Pitt] > Sorry, I guess I wasn't clear on that statement. I meant I've changed > the names of the *buttons* to "Spam" and "Not Spam" instead of "Delete > as Spam" and "Recover from Spam". I run from source so I am free to > change them to whatever I want, and I was merely presenting another > possibility for the names. then the button names are shorter and less confusing. The summary would be:

message is        first button   second button   third button
---------------   ------------   -------------   ------------
trained as spam   Not Spam       Can't Tell      -
trained as ham    Spam           Can't Tell      -
not trained       Spam           Not Spam        Can't Tell

The third button is displayed for untrained messages and is needed in the Unsure folder. This would move the message to the Inbox but *not* train it as ham. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From tim.one at comcast.net Mon Dec 8 16:34:45 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 8 16:34:46 2003 Subject: [Spambayes] "Un-classifying" a message In-Reply-To: Message-ID: [Seth Goodman] > OK, but how about untraining a message that doesn't belong in either > database but was trained on by mistake?
If you do this, currently > you are stuck. Speak for yourself . I keep all my training ham and spam in a distinct .pst file dedicated to holding training data, and I retrain from that. There's no mistake I can't recover from easily that way. I've also disabled all the "automatically train as this-or-that when it's moved from here-to-there" options. Too much magic makes operation incomprehensible in the end. So do too many UI buttons. > This would suggest an "Untrain message" button. In this case (sounds > like a lot of work, but possibly worth it), the toolbar would be context > sensitive to the message that is selected, not the folder. This > obviously gets tougher for multiple messages selected, but you would > then display only buttons that are applicable to all selected messages. > ... I expect this is too complex for most users to understand, and that buttons that stay in the same place but change meaning based on what you've selected would cause more training errors than they help solve. A simpler solution (one I wouldn't use, but ...) would probably be to add a single new checkbox option analogous to the existing "train on move" options: untrain a message if it's moved to the Unsure folder from a Ham or Spam folder. From eliot at isogen.com Mon Dec 8 17:34:28 2003 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Dec 8 17:35:37 2003 Subject: [Spambayes] POP3 Server Performance Issue Win2K SP4 In-Reply-To: References: Message-ID: <3FD4FC74.7040302@isogen.com> Kenny Pitt wrote: > W. Eliot Kimber wrote: > >>Kenny Pitt wrote: >> >>>It usually isn't necessary to train on every message you receive, >>>and if you've received 25,000 spams in about 3 months then I suspect >>>your training is heavily over-balanced toward spam anyway. I would >>>suggest retraining on a much smaller set of messages (50-100 of each >>>is probably more than sufficient). After that, be more selective >>>about which messages you actually train on. 
If most of your >>>messages classify correctly then you shouldn't need to train much at >>>all. >> >>I'm feeling a bit slow--but what is the process for doing this sort of >>re-training? Do I simply delete hammie.db and then retrain again using >>either new messages or old messages that I think are representative? >> >>I couldn't find any docs that spoke to this process directly. > Yes, you can delete your database and start over. You should probably > delete both your statistics_database and your message_info_database just > to make sure they stay in sync. You can then retrain using a small, > representative subset of messages. This is what I did and it appears to have fixed the issue. I think I realize what I did: I think I trained on my collected spam folder, which probably had about 10K spams in it at the time. I guess the answer is: don't do that. Thanks, Eliot -- W. Eliot Kimber Innodata Isogen eliot@isogen.com www.isogen.com From tameyer at ihug.co.nz Mon Dec 8 19:11:03 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 8 19:11:12 2003 Subject: [Spambayes] Are you considering to ad the option of deleting theSpam mail from the servers In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304590427@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467772E@its-xchg4.massey.ac.nz> [Richie] > So the answer is that we don't have any immediate plans for > such a thing, but it wouldn't be very difficult to implement. > It is on my long term to-do list, because I'd like it > myself. 8-) But my to-do list is very long and my free time > is very short. I've thought about (and almost started playing with implementing) this, too (and, of course, ran across the same to-do list/free time problem). Were you thinking of having this as a standalone filter type thing (a la imapfilter), or still as a proxy? If the proxy, had you given any thought to what you'd feed back to the mail client? 
I was wondering about this and the best I came up with was substituting the message with an (easily recognisable and therefore filterable) message from spambayes. It seems that it would be easy to confuse the client, otherwise. I'm curious to know what your ideas are - this may move up my to-do list after the testing/etc for the '7.5' binary is done since my wife's connection is via dialup (in a rural area, as well, which doesn't help), has very homogeneous spam, and could speed things up greatly. =Tony Meyer From tameyer at ihug.co.nz Mon Dec 8 19:14:23 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 8 19:14:33 2003 Subject: [Spambayes] Can we do this? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304590320@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467772F@its-xchg4.massey.ac.nz> [Ryan Malayter] > Set your ham and spam thresholds both to 50. Theoretically, > only a message that scored exactly 50.0000 would ever go into > the junk suspects folder. Not even then, IIRC. I believe it's spam if it's greater than *or equal* to the threshold, so if the thresholds are equal, then it is a true binary classification. (Something like 'if score >= spam_thres: spam; elif score >= unsure_thres: unsure; else: ham', which has the two conditions identical in this case, so the second is never hit). =Tony Meyer From tameyer at ihug.co.nz Mon Dec 8 19:22:32 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 8 19:22:45 2003 Subject: [Spambayes] deleted file In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130458F6E8@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677731@its-xchg4.massey.ac.nz> [Ok, now I've seen the reply to the one I just replied to, but I can use the answer here ;)] > Somehow, the junk email file was deleted. I reinstalled the > program, and this didn't restore the file. So I manually > restored it.
However, it still doesn't work, and I can't get > the program to reconfigure even tho I keep getting a message > that I need to reconfigure. What do I do? By "junk email file", I presume you mean the folder that Outlook was set to move junk email to? (If you mean the database where information about this mail is kept, then you can just retrain). You *should* be able to go to the SpamBayes dialog, then to the "Filtering" tab and reselect the folder (it'll have "" in it, most likely). I tried this here just now and it worked fine. At most, if that doesn't work for some reason (I have vague recollections of a bug about this, which may mean (since I'm running from the latest source) that I'm using a fixed version), then you can just delete your configuration file (named outlook.ini, or [profile name].ini) which is in your data directory (the "Advanced" tab of the dialog will find this for you), and this will reset your setup (but not any of your training). =Tony Meyer From tameyer at ihug.co.nz Mon Dec 8 19:19:15 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 8 19:26:11 2003 Subject: [Spambayes] Deleted Junk Mail folder In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130458F6D5@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677730@its-xchg4.massey.ac.nz> > I did a dumb thing, I though I was deleting a file but > actually deleted my "Junk Mail" folder and can't undo. I > tried re-installing the program, no change. Can I just > create another Junk Mail folder or should I uninstall and re-install? I may have missed it, but I'm not sure I've seen a reply to this yet: You do not need to uninstall and re-install. You *should* be able to go to the SpamBayes dialog, then to the "Filtering" tab and reselect the folder (it'll have "" in it, most likely). I tried this here just now and it worked fine. 
At most, if that doesn't work for some reason (I have vague recollections of a bug about this, which may mean (since I'm running from the latest source) that I'm using a fixed version), then you can just delete your configuration file (named outlook.ini, or [profile name].ini) which is in your data directory (the "Advanced" tab of the dialog will find this for you), and this will reset your setup (but not any of your training). =Tony Meyer From tameyer at ihug.co.nz Mon Dec 8 19:29:51 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 8 19:29:58 2003 Subject: [Spambayes] Re: Manager, Start Training Button In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130459042A@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677732@its-xchg4.massey.ac.nz> [Dennis W. Bulgrien] > Odd, "Deleted Items" doesn't show up in the Browse list. I > wanted to use it. I permanently delete spam, so deleted > items contains ham (or what used to be). That means it would be fine for *you*, but in general letting people use the Deleted Items folder is asking for trouble (and lots of error reports). Most people just delete any mail they don't want - spam (hopefully from the spam folder!) and ham, so it's unlikely to be a folder of any use to SpamBayes. I think there may be a feature request open to try and convince Mark that this should be able to be disabled by an advanced/experimental option that isn't exposed via the GUI. If not, you could always open one. =Tony Meyer From tameyer at ihug.co.nz Mon Dec 8 19:34:24 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 8 19:35:34 2003 Subject: [Spambayes] Why is this classified as Ham In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304590352@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A02@its-xchg4.massey.ac.nz> [Mark Howells] > Any idea why this is classified as Ham in the Web interface? > It's obviously Spam. 
> Spam probability: 0.999309965162 > *H*0.00138006962536 > *S*0.999999999949 [...] SpamBayes agrees, unless (bizarrely) your spam threshold is above 99.93. What do you mean by "classified as Ham in the Web interface"? What classification did it get when it arrived in your mail client? Do you happen to have on the options to include the evidence/score in the headers as well? If so, what was the score there? The 1.0a7 version of the web interface (IIRC) has the option to see the clues for a message, but only with the *current* database - if you've done more training since the original classification, then the score can change (although .9993 is very high). =Tony Meyer From tameyer at ihug.co.nz Mon Dec 8 19:38:46 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 8 19:38:55 2003 Subject: [Spambayes] Review of Spam And Ham Mail In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130458F6D8@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677734@its-xchg4.massey.ac.nz> > I am currently using Outlook 2002. I installed the SpamBayes > program. Outlook tells me I have mail, but when I try and > find it, it is missing. I have tried to find the file that > the SpamBayes program sends the mail, but have been > unsuccessful todate. > > Can some one let me know how to open the Spam Ham files to > review these e-mails. I think what you're asking is FAQ 3.12: =Tony Meyer From tameyer at ihug.co.nz Mon Dec 8 19:39:42 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 8 19:39:55 2003 Subject: [Spambayes] Potential Problem In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130458F724@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677735@its-xchg4.massey.ac.nz> > I have been using your program for about a week and it works > well. I am, however, continuing to get "Suspect" messages > that have a high probability of being spam, 99%+. Why aren't > these going to the junk mail folder? What is your threshold set to? 
Is spambayes set up to move spam and unsure messages to different folders? What does your log have in it when you receive one of these messages? =Tony Meyer From tameyer at ihug.co.nz Mon Dec 8 19:45:59 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 8 19:46:05 2003 Subject: [Spambayes] Strip Subject of Non-alpha In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130459041B@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A03@its-xchg4.massey.ac.nz> > I never got > overwhelming encouragement for my ideas about how to add > experimental extensions to the CVS repository. I'm not sure it's possible at the moment to get overwhelming encouragement for any ideas <0.5 wink>. You got two or three positive comments and no negative ones, yes (I don't really remember)? I think people have had enough chance that you can simply go ahead with it; it's not going to have any impact on most users anyway. (OTOH, it would be good if experimental extensions had a limited life (unless they work, of course), but I'm sure people would weed things out eventually, anyway). =Tony Meyer From trlee1 at cox.net Mon Dec 8 19:49:16 2003 From: trlee1 at cox.net (Rita) Date: Mon Dec 8 19:49:05 2003 Subject: [Spambayes] deleted file In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304677731@its-xchg4.massey.ac.nz> Message-ID: <000301c3bdee$468a22f0$6501a8c0@RITA> Thanks a millions! This is an absolutely wonderful program!! I had just about given up ever using email again. Anyhow - you saved me!! the only thing that worked was deleting the config file. It then allowed me to reconfigure. Thanks so much Rita -----Original Message----- From: Tony Meyer [mailto:tameyer@ihug.co.nz] Sent: Monday, December 08, 2003 4:23 PM To: 'Rita'; spambayes@python.org Subject: RE: [Spambayes] deleted file [Ok, now I've seen the reply to the one I just replied to, but I can use the answer here ;)] > Somehow, the junk email file was deleted. 
I reinstalled the > program, and this didn't restore the file. So I manually > restored it. However, it still doesn't work, and I can't get > the program to reconfigure even tho I keep getting a message > that I need to reconfigure. What do I do? By "junk email file", I presume you mean the folder that Outlook was set to move junk email to? (If you mean the database where information about this mail is kept, then you can just retrain). You *should* be able to go to the SpamBayes dialog, then to the "Filtering" tab and reselect the folder (it'll have "" in it, most likely). I tried this here just now and it worked fine. At most, if that doesn't work for some reason (I have vague recollections of a bug about this, which may mean (since I'm running from the latest source) that I'm using a fixed version), then you can just delete your configuration file (named outlook.ini, or [profile name].ini) which is in your data directory (the "Advanced" tab of the dialog will find this for you), and this will reset your setup (but not any of your training). =Tony Meyer From papaDoc at videotron.ca Mon Dec 8 20:10:57 2003 From: papaDoc at videotron.ca (Remi Ricard) Date: Mon Dec 8 20:07:11 2003 Subject: [Spambayes] Why is this classified as Ham Message-ID: <1070932257.6317.10.camel@porsche.hq.simlog.com> Hi, [Mark Howells] > Any idea why this is classified as Ham in the Web interface? > It's obviously Spam. > Spam probability: 0.999309965162 > *H*0.00138006962536 > *S*0.999999999949 [...] {Tony} > What do you mean by "classified as Ham in the Web interface"? What classification did it get when it arrived in your mail client? Do you happen to have on the options to include the evidence/score in the headers as well? If so, what was the score there? 
The 1.0a7 version of the web interface (IIRC) has the option to see the clues for a message, but only with the *current* database - if you've done more training since the original classification, then the score can change (although .9993 is very high). I think what is happening here is that:
1- In the web UI there are Ham, Spam, and Unsure emails.
2- You train on some of the emails and defer the remaining emails.
3- You reload the UI.
4- Then the remaining emails from above stay in their current category.
An email classified as ham in step 1 will stay in the ham section even if its new score is now 99.99999999999999. I don't know if this is a "feature" but that's the way it works. -- Remi Ricard From nobody at spamcop.net Mon Dec 8 20:09:20 2003 From: nobody at spamcop.net (Seth Goodman) Date: Mon Dec 8 20:09:24 2003 Subject: [Spambayes] "Un-classifying" a message In-Reply-To: Message-ID: > [Seth Goodman] > > OK, but how about untraining a message that doesn't belong in either > > database but was trained on by mistake? If you do this, currently > > you are stuck. > [Tim Peters] > Speak for yourself . I keep all my training ham and spam in a > distinct .pst file dedicated to holding training data, and I retrain from > that. There's no mistake I can't recover from easily that way. I've also OK, but if I understand this right (50% odds at best), you have to manually copy each message you train on into these folders. If you are talking about the original training set, I agree with you and I do the same for exactly the reasons you gave. If you are talking about training during normal operation, that's a lot of work with a lot of room for manual errors. [Tim Peters] > disabled all the "automatically train as this-or-that when it's moved from > here-to-there" options. Too much magic makes operation > incomprehensible in > the end. So do too many UI buttons. I also agree here. I personally disable the "automatic classify upon move" as it is way too dangerous.
I reluctantly agree that simplicity in the UI often is more important than functionality. It's your choice. > [Seth Goodman] > > This would suggest an "Untrain message" button. In this case (sounds > > like a lot of work, but possibly worth it), the toolbar would be context > > sensitive to the message that is selected, not the folder. This > > obviously gets tougher for multiple messages selected, but you would > > then display only buttons that are applicable to all selected messages. > > ... > [Tim Peters] > I expect this is too complex for most users to understand, and > that buttons > that stay in the same place but change meaning based on what > you've selected > would cause more training errors than they help solve. Yup, changing the functions at each button position will confuse many people. I didn't think of that. See what working in hardware for too many years does to your mind? We generally build architectures that no one but the software developers see, and they are not typical users. > [Tim Peters] > A simpler solution (one I wouldn't use, but ...) would probably > be to add a > single new checkbox option analogous to the existing "train on move" > options: untrain a message if it's moved to the Unsure folder > from a Ham or > Spam folder. Yes, this would be simplest, but suffers from the same "invisible action" on folder move that we both don't like. How about a button that says, "Recover to Unsure"? At least it's unambiguous. 
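The "Recover to Unsure" idea amounts to undoing whatever training a message received and then moving it, untrained, to the Unsure folder. The sketch below assumes a toy token-count store; `ToyTrainingStore`, `recover_to_unsure` and the folder dictionary are hypothetical and do not reflect how SpamBayes actually stores its database. Untraining only works if the message produces the same tokens it was originally trained with.

```python
from collections import Counter


class ToyTrainingStore:
    """Toy token-count database (hypothetical; not the SpamBayes classifier)."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.trained = {}  # message id -> True if trained as spam, False if ham

    def train(self, msg_id, tokens, is_spam):
        """Count each distinct token once for this message."""
        if msg_id in self.trained:
            # Retraining first removes the old training (assumes same tokens).
            self.untrain(msg_id, tokens)
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.update(set(tokens))
        self.trained[msg_id] = is_spam

    def untrain(self, msg_id, tokens):
        """Undo earlier training; return False if the message was never trained."""
        if msg_id not in self.trained:
            return False
        is_spam = self.trained.pop(msg_id)
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.subtract(set(tokens))
        # Remove tokens whose count dropped to zero (or below).
        for tok in [t for t, n in counts.items() if n <= 0]:
            del counts[tok]
        return True


def recover_to_unsure(store, msg_id, tokens, folders):
    """Hypothetical 'Recover to Unsure' action: untrain, then move to Unsure.

    `folders` maps a folder name to a set of message ids.
    """
    store.untrain(msg_id, tokens)
    for ids in folders.values():
        ids.discard(msg_id)
    folders["Unsure"].add(msg_id)
```

In this model the existing "Recover from Spam" is simply `train(msg_id, tokens, False)` on a message previously trained as spam, while the proposed button stops after the untrain step and leaves the message untrained.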
-- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From tameyer at ihug.co.nz Mon Dec 8 20:52:25 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 8 20:52:35 2003 Subject: [Spambayes] Strip Subject of Non-alpha In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13045904C8@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677738@its-xchg4.massey.ac.nz> [Skip] > I never got > overwhelming encouragement for my ideas about how to add > experimental extensions to the CVS repository. [me] > I'm not sure it's possible at the moment to get overwhelming > encouragement for any ideas <0.5 wink>. Whoops. I just realised that my own message in favour of this is still sitting in my Drafts folder awaiting completion... In general, +1 from me ;) =Tony Meyer From tim.one at comcast.net Mon Dec 8 21:56:52 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 8 21:56:55 2003 Subject: [Spambayes] feature request In-Reply-To: Message-ID: [Kenny Pitt] > ... > I also would like to add a menu item to rescore the currently selected > folder. Would anyone else find this useful? +1. I do this a lot, especially on the Unsure folder. OTOH, I certainly don't want it to rescore my Inbox too (as others do seem to want) -- I've typically got more than 10,000 messages there waiting for replies, and I don't want to wait for that. From tim.one at comcast.net Mon Dec 8 22:07:53 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 8 22:07:56 2003 Subject: [Spambayes] Strip Subject of Non-alpha In-Reply-To: <16340.55172.171511.255475@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > ... > I suppose I could check it in, though it's not clear that for the > fairly small number of these sort of messages I receive that it makes > much difference (though perhaps my code modification has a bug > someone else could spot). 
I never got overwhelming encouragement for > my ideas about how to add experimental extensions to the CVS > repository. Probably because it came attached to such a weak change . Really, a few people tested it and it didn't seem to matter either way. I left it on for a month, then took it out again, and scores barely budged. No saved message changed classification as a result. Experimental extensions are fine by me, and you proposed a decent scheme for putting them in. The downside is that every piece of code complicates the whole, and I really don't know why you'd *want* to check in a gimmick that made no real difference to anyone who tried it (if I remember all the reports correctly -- maybe not). Hoping someone might find a bug in it isn't a good enough reason: I'm sure you looked (as I did) at specific before-and-after Subject lines to verify that it worked as intended. We're carrying around too much unused code already (e.g., extract_dow and generate_time_buckets generated some curious statistics at the time they were added, but, IIRC, they didn't make any bottom-line difference to anyone's testing results). From tim.one at comcast.net Mon Dec 8 22:24:47 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 8 22:24:50 2003 Subject: [Spambayes] "Un-classifying" a message In-Reply-To: Message-ID: [Tim Peters] >> Speak for yourself . I keep all my training ham and spam in a >> distinct .pst file dedicated to holding training data, and I retrain >> from that. There's no mistake I can't recover from easily that way. >> I've also [Seth Goodman] > OK, but if I understand this right (50% odds at best), you have to > manually copy each message you train on into these folders. That's 50% right . The addin sends spam directly to my trained Spam folder, leaving it marked unread. At the end of the day (or every two days, ...), I *delete* the unread msgs in the Spam folder (after reviewing for mistakes). I do copy ham to the ham training folder. 
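Tim's point that training from dedicated folders makes every mistake recoverable can be illustrated with a small sketch: if the token database is only ever rebuilt from the contents of the training folders, moving a misfiled message and rebuilding removes every trace of the mistake. The `tokenize` function and the dictionary layout below are hypothetical simplifications, not SpamBayes' own tokenizer or database format.

```python
from collections import Counter


def tokenize(text):
    """Hypothetical, deliberately crude tokenizer: lowercased whitespace split."""
    return set(text.lower().split())


def rebuild_database(ham_folder, spam_folder):
    """Build token counts from scratch from two training folders.

    Each folder is a list of message texts. Nothing from any earlier
    database is reused, so a message moved out of the wrong folder
    leaves no trace after the rebuild.
    """
    db = {
        "nham": len(ham_folder),
        "nspam": len(spam_folder),
        "ham": Counter(),
        "spam": Counter(),
    }
    for text in ham_folder:
        db["ham"].update(tokenize(text))
    for text in spam_folder:
        db["spam"].update(tokenize(text))
    return db


# Usage: a ham message was misfiled in the spam training folder.
spam = ["cheap pills now", "lunch at noon"]
ham = ["meeting notes attached"]
db = rebuild_database(ham, spam)
ham.append(spam.pop())            # fix the mistake in the folders...
db = rebuild_database(ham, spam)  # ...and rebuild; "lunch" is now ham-only
```

The cost of this design is a full retrain after each correction; with a training set of a few hundred messages, like the roughly 800 Tim mentions, that should be quick.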
> If you are talking about the original training set, I agree with you > and I do the same for exactly the reasons you gave. If you are talking > about training during normal operation, that's a lot of work with a > lot of room for manual errors. I expect it's different for everyone. My training database right now contains about 800 emails total, a little more than one day's worth. I don't find much need to train, and the manual error rate in dragging a multi-selection of Unsure ham to the training folder hasn't been high enough to care about. About once a week I rescore my training folders and look at "the wrong end" of each for manual mistakes; they're rare, and there hasn't been one for months. Because everyone's email mix is different, and tolerance for fiddling with training varies widely too, and ours is a statistical approach, I don't expect that a one-size-fits-all strategy can exist. ... >> A simpler solution (one I wouldn't use, but ...) would probably be >> to add a single new checkbox option analogous to the existing "train >> on move" options: untrain a message if it's moved to the Unsure >> folder from a Ham or Spam folder. > Yes, this would be simplest, but suffers from the same "invisible > action" on folder move that we both don't like. How about a button > that says, "Recover to Unsure"? At least it's unambiguous. Well, the toolbar is already so wide that we get a lot of bogus bug reports about one of the (just two!) buttons on it "missing" (Outlook simply doesn't show it because there's not enough screen real estate). Needing to untrain a message is (or should be) so rare that I'd settle for an "Untrain selected" action on the dropdown SpamBayes menu instead. Buttons should really be confined to frequent actions. A really gonzo approach would follow Outlook's lead, creating an icon for each possible action and letting users populate their SpamBayes toolbar with whichever buttons they want.
Without much effort along those lines, we could make the SpamBayes codebase bigger than Outlook's . From mhammond at skippinet.com.au Mon Dec 8 22:47:42 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Dec 8 22:47:54 2003 Subject: [Spambayes] Outlook: Setting background filtering as the default. In-Reply-To: Message-ID: <01e501c3be07$335b5d20$2c00a8c0@eden> > I have a question on background mode. Why is the second > timer necessary? I > use it at the default value of 1.0 sec, and what it does is > to process one > message per second after Outlook downloads a pile of > messages. I assume > that this is necessary or you wouldn't have gone to the > trouble of adding > it, but it does slow things down a lot when you download a > bunch of messages > at once. It seemed the best way to implement it (and I got the idea on another project). For my mail, having a 1 second start delay did not give enough time - mail often comes in at the rate of about 1 per second (when downloading in "background" mode), so conflicts with the rules were still common. I expect that some dialup users would find 2 seconds too slow. If we then stick with a single timer value, I end up with *all* messages being processed at one per 2 seconds. When mail delivery has stopped, watching them get moved at this rate is pretty painful - hence the second timer. The intention of the second timer is "once mail delivery has stopped", but there is no way (that I know of) to get that event from Outlook. Mark. From mhammond at skippinet.com.au Mon Dec 8 22:52:00 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Dec 8 22:52:24 2003 Subject: [Spambayes] Manager, Start Training Button In-Reply-To: Message-ID: <01f801c3be07$cd8c0520$2c00a8c0@eden> > The Spambayes Manager, Training tab, "Start Training" button > does not have > on-line help, There is supposed to be a little "?" button at the top-right of the dialog. I'm guessing you are on Win9x? > and there's nothing in the FAQ about it.
I > have collected a bunch > of ham and want to add them to the training database while > keeping the current > spam data. Is that the way to do it? Pick a new folder with new ham for the "good" messages, and pick either your existing "Spam" folder, or an empty folder. Make sure "rebuild database" is *not* selected. Assuming all the Spam in your Spam folder had already been trained on (or it is empty), then no additional spams will be added. Mark. From mhammond at skippinet.com.au Mon Dec 8 22:57:46 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Dec 8 22:57:56 2003 Subject: [Spambayes] FAQ 4.5 How do I train...(Outlook plugin)? IncrementalTraining In-Reply-To: Message-ID: <01f901c3be08$9b41ea70$2c00a8c0@eden> Thanks - I've changed both of these. Mark. > -----Original Message----- > From: spambayes-bounces@python.org > [mailto:spambayes-bounces@python.org]On Behalf Of Dennis W. Bulgrien > Sent: Tuesday, 9 December 2003 1:23 AM > To: spambayes@python.org > Subject: [Spambayes] FAQ 4.5 How do I train...(Outlook plugin)? > IncrementalTraining > > > http://spambayes.sourceforge.net/faq.html > 4.5 How do I train SpamBayes (Outlook plugin)? > "...If you have set it to use incremental training then it > will also train on > messages which are moved into the spam folder and those > folders that you are > 'watching'." > > Clarification of the FAQ. I suppose it means when messages > are MANUALLY moved. > Based on other posts, the messages that are automatically > moved by Spambayes are > not trained on. > > Clarification of the Manager dialog. SpamBayes Manager, Training tab, > Incremental Training frame, two check boxes say > "Train... when it is moved... to the Inbox" and > "Train... when it is moved to the spam folder". I suppose > the first is > analogous to > "Train... when it is moved...
to those folders that you are > 'watching'" From mhammond at skippinet.com.au Mon Dec 8 23:07:08 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Dec 8 23:07:19 2003 Subject: [Spambayes] Re: Manager, Start Training Button In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304677732@its-xchg4.massey.ac.nz> Message-ID: <01fa01c3be09$ea64bf00$2c00a8c0@eden> > I think there may be a feature request open to try and > convince Mark that > this should be able to be disabled by an > advanced/experimental option that > isn't exposed via the GUI. If not, you could always open one. It will also break "incremental training" (as lots of ham is moved into your 'deleted items' too, and you don't want to train them as spam), and indeed training from scratch should you lose your database (as you no longer have any spam separate from ham). Mark. From richie at entrian.com Tue Dec 9 04:37:56 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Dec 9 04:38:10 2003 Subject: [Spambayes] Are you considering to ad the option of deleting theSpam mail from the servers In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130467772E@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1304590427@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F130467772E@its-xchg4.massey.ac.nz> Message-ID: [Tony] > I'm curious to know what your ideas are - this may move up my to-do list > after the testing/etc for the '7.5' binary is done since my wife's > connection is via dialup (in a rural area, as well, which doesn't help), has > very homogeneous spam, and could speed things up greatly. I want a webmail client with SpamBayes in it, or to put it another way, I want sb_server to be a full POP3 client.
Your wife (and Guy, the OP) can run it locally on her machine and use it to clean out her POP3 account before downloading mail, and I can run it on www.entrian.com and use it to read my home email from work. It wouldn't interact with the proxy at all - you'd click a "Read email" button and you'd get three lists of messages (ham/spam/unsure). You could click on a message to read it (probably in another frame, a la Outlook). You'd get roughly the same radio buttons as the training interface, except that classifying a message would move it into the appropriate list, and Discard would delete it from the POP3 account (and hence should probably be called Delete). Any message left undiscarded would remain on the server to be picked up later. I have a simple webmail client at http://entrian.com/cgi-bin/pop3.py (but the code is horrible). It talks to the POP3 server, displays messages, and understands about multipart messages and attachments. So I want to either integrate SpamBayes into that, or (more likely) integrate a rewritten version of that into sb_server. -- Richie Hindle richie@entrian.com From richie at entrian.com Tue Dec 9 04:52:06 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Dec 9 04:52:14 2003 Subject: [Spambayes] Are you considering to ad the option of deleting theSpam mail from the servers In-Reply-To: References: <1ED4ECF91CDED24C8D012BCF2B034F1304590427@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F130467772E@its-xchg4.massey.ac.nz> Message-ID: [Richie] > I want a webmail client with SpamBayes in it, or to put it another way, I > want sb_server to be a full POP3 client. Your wife (and Guy, the OP) can > run it locally on her machine and use it to clean out her POP3 account > before downloading mail, and I can run it on www.entrian.com and use it to > read my home email from work. 
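At its core, the webmail idea sorts every message on the POP3 account into one of three lists (ham/spam/unsure) by its spam score. The following minimal Python sketch covers only that split. It assumes each message already has a score in [0, 1]. The function name bucket_messages and the cutoff values 0.20/0.80 are illustrative, not taken from sb_server, and fetching and scoring the messages are left out.

```python
from typing import Dict, Iterable, List, Tuple


def bucket_messages(
    scored: Iterable[Tuple[str, float]],
    ham_cutoff: float = 0.20,   # illustrative value, not a SpamBayes default
    spam_cutoff: float = 0.80,  # illustrative value, not a SpamBayes default
) -> Dict[str, List[str]]:
    """Split (message_id, spam_score) pairs into ham/unsure/spam lists.

    Scores are assumed to lie in [0.0, 1.0]. Anything at or above
    spam_cutoff is spam, anything below ham_cutoff is ham, and the
    rest is unsure.
    """
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("need 0 <= ham_cutoff <= spam_cutoff <= 1")
    buckets: Dict[str, List[str]] = {"ham": [], "unsure": [], "spam": []}
    for msg_id, score in scored:
        if score >= spam_cutoff:
            buckets["spam"].append(msg_id)
        elif score < ham_cutoff:
            buckets["ham"].append(msg_id)
        else:
            buckets["unsure"].append(msg_id)
    return buckets


# Hypothetical POP3 message numbers paired with made-up scores.
print(bucket_messages([("1", 0.02), ("2", 0.55), ("3", 0.9993)]))
# {'ham': ['1'], 'unsure': ['2'], 'spam': ['3']}
```

Nothing here deletes anything. Any message the user does not act on stays in its list, and on the server, which matches the idea that undiscarded mail is left there to be picked up later.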
Or in fact, if we make it multiuser (by keeping a separate training database for each POP3 account), then your wife can use the one on www.entrian.com as well - that'll work fine over her slow modem (until I start to run out of disk space... we'd probably need to limit the training DB size.) (For people with multiple POP3 accounts, we might want to let a database be shared across POP3 accounts.) -- Richie Hindle richie@entrian.com From Mark.Howells at softoption.com Tue Dec 9 05:32:58 2003 From: Mark.Howells at softoption.com (Mark Howells) Date: Tue Dec 9 05:34:41 2003 Subject: [Spambayes] Why is this classified as Ham Message-ID: <5846CF419D2EF5439036CC3126A3A995017B7B@SOSERVER1.softoption.local> > -----Original Message----- > From: Tony Meyer [mailto:tameyer@ihug.co.nz] > > > [Mark Howells] > > Any idea why this is classified as Ham in the Web interface? > > It's obviously Spam. > > Spam probability: 0.999309965162 > > [...] > > What do you mean by "classified as Ham in the Web interface"? I mean that the message was tagged with 'unsure,' in the subject and shown on the web page as a 'Ham' message. > What > classification did it get when it arrived in your mail client? 'Unsure' > Do you happen to have the options to include the evidence/score > in the headers turned on as well? If so, what was the score there? The > 1.0a7 version of the web interface (IIRC) has the option to see > the clues for a message, but only > with the *current* database - if you've done more training since the > original classification, then the score can change (although > .9993 is very high). Unfortunately, I have since trained SB and told it to discard this message - is there any way I can get it out of the cache again now that it's marked as trained? Cheers Mark
From drbob1954 at aol.com Tue Dec 9 07:51:08 2003 From: drbob1954 at aol.com (Bob Lehman) Date: Tue Dec 9 07:51:19 2003 Subject: [Spambayes] Netscape compatable? Message-ID: <000801c3be53$1e5473c0$6701a8c0@VB.PAHR.COM> If I install Python prior to SpamBayes, can I use SpamBayes with Netscape 7.0? Please respond to drbob1954@aol.com Thanks From talonkarrde at monmouth.com Tue Dec 9 08:08:08 2003 From: talonkarrde at monmouth.com (Robert J. Guadagno) Date: Tue Dec 9 08:09:19 2003 Subject: [Spambayes] Issues.. Message-ID: <0HPM00IHFPTKJF@mta11.srv.hcvlny.cv.net> Greetings! I have just upgraded to Outlook 2003 and I'm repeatedly receiving a dialog box about a "program wishing to access Outlook", asking me to either allow or deny access. The program that is attempting to access Outlook in this case is SpamBayes. I've attempted to uninstall SpamBayes, but I received a message that not all of the components could be removed. What is the proper uninstall procedure, or how can I prevent this dialog box from appearing (thereby keeping SpamBayes)? Thank you, Robert J. Guadagno Email: talonkarrde@monmouth.com FAX: (772) 325-3127 Website (Homepage): http://www.monmouth.com/~talonkarrde/ "Savor the fruit of life. It has a sweet taste when it is fresh from the vine. But don't live too long, because the taste turns bitter after a time." -Dahar Master: Kor From syver at inout.no Tue Dec 9 08:29:22 2003 From: syver at inout.no (Syver Enstad) Date: Tue Dec 9 08:29:27 2003 Subject: [Spambayes] pop3 proxy issue.
Message-ID: When my POP3 server is unreachable, the POP3 proxy just hangs for a long time (I haven't waited long enough to see whether it gives up). When I abort from my mail client and try again once the server is available, everything works okay, but SpamBayes reports 2 POP sessions active while downloading mail and 1 POP session active afterwards, indicating that the first, failed session is still hanging. Using: Win2K Pro. SpamBayes POP3 Proxy Beta3, version 0.3 (September 2003), using SpamBayes POP3 Proxy Web Interface Alpha3, version 0.03 From dbulgrien at vcsd.com Tue Dec 9 08:47:44 2003 From: dbulgrien at vcsd.com (Dennis W. Bulgrien) Date: Tue Dec 9 08:47:54 2003 Subject: [Spambayes] Re: Manager, Start Training Button References: <01f801c3be07$cd8c0520$2c00a8c0@eden> Message-ID: Why no, I'm on W2K. Clicking the "?" button and then the "Start training" button doesn't display a tip the way it does for the other controls on the dialog (nor does putting focus on the button and pressing F1, which might even work on 9x). "Mark Hammond" wrote... > The Spambayes Manager, Training tab, "Start Training" button > does not have on-line help, There is supposed to be a little "?" button at the top-right of the dialog. I'm guessing you are on Win9x? ... From kennypitt at hotmail.com Tue Dec 9 09:03:42 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Dec 9 09:04:26 2003 Subject: [Spambayes] feature request In-Reply-To: Message-ID: Tim Peters wrote: > [Kenny Pitt] >> ... >> I also would like to add a menu item to rescore the currently >> selected folder. Would anyone else find this useful? > > +1. I do this a lot, especially on the Unsure folder. > > OTOH, I certainly don't want it to rescore my Inbox too (as others do > seem to want) -- I've typically got more than 10,000 messages there > waiting for replies, and I don't want to wait for that. Resounding agreement here. I'm envisioning a menu item for a strictly manual operation to rescore just the selected folder.
Anyone who wants to use it to rescore their Inbox is welcome to do so, but it certainly won't do it automatically. I'll work on it as soon as I get this notification sound thing figured out. -- Kenny Pitt From kennypitt at hotmail.com Tue Dec 9 09:16:21 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Dec 9 09:16:58 2003 Subject: [Spambayes] Outlook: Setting background filtering as the default. In-Reply-To: <01e501c3be07$335b5d20$2c00a8c0@eden> Message-ID: Mark Hammond wrote: >> I have a question on background mode. Why is the second timer >> necessary? I use it at the default value of 1.0 sec, and what it >> does is to process one message per second after Outlook downloads a >> pile of messages. I assume that this is necessary or you wouldn't >> have gone to the trouble of adding it, but it does slow things down >> a lot when you download a bunch of messages at once. > > It seemed the best way to implement it (and I got the idea on another > project). For my mail, having a 1 second start delay did not give > enough time - mail often comes in at the rate of about 1 per second > (when downloading in "background" mode), so conflicts with the rules > were still common. I expect that some dialup users would find 2 > seconds too slow. > > If we then stick with a single timer value, I end up with *all* > messages being processed at one per 2 seconds. When mail delivery > has stopped, watching them get moved at this rate is pretty painful - > hence the second timer. The intention of the second timer is "once > mail delivery has stopped", but there is no way (that I know of) to get that > event from Outlook. Would it be a problem to wait until the entire batch is downloaded before doing any SpamBayes filtering? If not, we could change the meaning of the timers slightly. Instead of processing one message and then waiting again before processing the next one, we could restart the second timer as each new message is received.
When the timer finally expires without a new message having arrived, we start processing all of the received messages as quickly as possible. If a new message arrives while we are processing, we stop after we finish filtering the current message and restart the delay timer. Hopefully this would have the same effect in terms of giving the Outlook rules time to run, but would let us filter the remaining messages as quickly as possible once we are "idle" in terms of receiving new messages. -- Kenny Pitt From papaDoc at videotron.ca Tue Dec 9 10:06:51 2003 From: papaDoc at videotron.ca (papaDoc) Date: Tue Dec 9 10:06:54 2003 Subject: [Spambayes] Training by the command line on Windows Message-ID: <3FD5E50B.7010401@videotron.ca> Hi, Is there a way to train using the command line on Windows? This is what I get with sb_mboxtrain: Microbe% cripts/sb_mboxtrain.py -d hammie.db -g Tata/26 < Training ham (Tata/26): Reading as Unix mbox Traceback (most recent call last): File "c:\Devtools\SPAMBA~1\SPAMBA~1\scripts\SB_MBO~1.PY", line 317, in ? main() File "c:\Devtools\SPAMBA~1\SPAMBA~1\scripts\SB_MBO~1.PY", line 302, in main train(h, g, False, force, trainnew, removetrained) File "c:\Devtools\SPAMBA~1\SPAMBA~1\scripts\SB_MBO~1.PY", line 230, in train mbox_train(h, path, is_spam, force) File "c:\Devtools\SPAMBA~1\SPAMBA~1\scripts\SB_MBO~1.PY", line 139, in mbox_train import fcntl ImportError: No module named fcntl If I look in my Python directory I see a file FCNTL.py saying import warnings warnings.warn("the FCNTL module is deprecated; please use fcntl", DeprecationWarning) But I can't find fcntl.py. Then I thought, let's try to go further, so I copied the file FCNTL.py to fcntl.py, and then I got errors from lockf. I said I don't need locking since I know what I'm doing. I commented that out, and then I got an error from os.ftruncate(f.fileno(), 0), so before I remove all the code from the program, I thought I'd ask: what should I do? Remi P.S.
I don't want to train on my Linux box since it is my server, and it is only a poor 486 66MHz with 16MB of RAM... From tim at fourstonesExpressions.com Tue Dec 9 10:20:00 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Tue Dec 9 10:20:06 2003 Subject: [Spambayes] Netscape compatable? In-Reply-To: <000801c3be53$1e5473c0$6701a8c0@VB.PAHR.COM> References: <000801c3be53$1e5473c0$6701a8c0@VB.PAHR.COM> Message-ID: You can use the pop3proxy with Netscape. See spambayes.sourceforge.net for information about the pop3proxy, as well as the readme. On Tue, 9 Dec 2003 07:51:08 -0500, Bob Lehman wrote: > If I install Python prior to Spambayes, can I use Spambayes with > Netscape 7.0? > Please respond to drbob1954@aol.com > Thanks -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From tim.one at comcast.net Tue Dec 9 10:37:58 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 9 10:37:57 2003 Subject: [Spambayes] Training by the command line on Windows In-Reply-To: <3FD5E50B.7010401@videotron.ca> Message-ID: [papaDoc] > Is there a way to train using the command line on windows ? > > This is what I get with sb_mboxtrain > > Microbe% cripts/sb_mboxtrain.py -d hammie.db -g Tata/26 < > ... > import fcntl > ImportError: No module named fcntl Windows doesn't support fcntl -- that's a Unixism. Ditto os.ftruncate(). > If I look in my python directory I see a file FCNTL.py saying > import warnings > warnings.warn("the FCNTL module is deprecated; please use fcntl", > DeprecationWarning) > > But I can't find the fcntl.py ??? On Unix boxes there's no fcntl.py either, and the import of fcntl is satisfied by C code in Python's fcntlmodule.c. That isn't compiled on Windows because the OS simply doesn't support it. > Then I said lets try to go further so I copied the file FCNTL.py to > fcntl.py and then I get errors from lockf.
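The two Unix-only calls papaDoc hit, fcntl.lockf and os.ftruncate, can be avoided portably. The sketch below is illustrative only. rewrite_in_place is a hypothetical helper, not code from sb_mboxtrain.py. It takes the lock only when the fcntl module imports (Unix) and skips locking otherwise, so on Windows it gives up locking just as papaDoc chose to. It truncates through the file object's own seek()/truncate() methods instead of os.ftruncate(). Python 3.5 and later also provide os.ftruncate on Windows; the Python versions discussed in this thread did not.

```python
import os
import tempfile

try:
    import fcntl  # Unix only; this import fails on Windows
except ImportError:
    fcntl = None


def rewrite_in_place(path: str, new_text: str) -> None:
    """Replace the contents of an existing file, locking it where possible."""
    with open(path, "r+") as f:
        if fcntl is not None:
            # Exclusive POSIX record lock; released when the file is closed.
            fcntl.lockf(f, fcntl.LOCK_EX)
        # No lock on Windows in this sketch: concurrent writers are not guarded.
        f.seek(0)     # move the file pointer to the start first...
        f.truncate()  # ...then cut the file off at the current position (0)
        f.write(new_text)
        f.flush()


# Usage with a throwaway temporary file.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as f:
    f.write("old mailbox contents\n")
rewrite_in_place(path, "new\n")
with open(path) as f:
    print(f.read(), end="")  # prints: new
os.remove(path)
```

Calling seek(0) before truncate() matters here. truncate() with no argument cuts at the current file position, so seeking first leaves both the file size and the file pointer at 0. That is exactly the kind of assumption about the file pointer that has to hold for the rest of the program.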
That's futile -- but you already learned that. > I said I don't need lock since I know what I'm doing. Now you're cooking! > Commented this out then I get error from > os.ftruncate(f.fileno(), 0) Replacing that one with f.seek(0); f.truncate() instead might work on Windows, or maybe plain f.truncate(0) would work. It depends on what (if anything) the program assumes about the position of the file pointer after the truncate. > so before I remove all the code from the program I say let ask > what should I do !! You're doing fine -- Python supports ways of writing portable code, but doesn't *force* people to write portable code, and "it's a feature" that Python exposes OS-specific facilities for those who want them. It's a fact of life, though, that Unix-heads tend to write the least portable code, because there are so many gimmicks unique to Unixish systems. On Windows, we tend to hide the Windows-specific gimmicks in modules with discouraging names instead (like winsound and _winreg). From dmuller at cyberlogic.com Tue Dec 9 10:48:54 2003 From: dmuller at cyberlogic.com (dmuller@cyberlogic.com) Date: Tue Dec 9 10:49:07 2003 Subject: [Spambayes] Can I avoid printing spam scores? Message-ID: Using binary version 0.81 in Outlook 2002, when I print an e-mail, the spam score is included along with the From/To/Date/Subject information. Is there a way to stop this from printing? Dan Muller Director of Product Development Cyberlogic Technologies Inc. From nobody at spamcop.net Tue Dec 9 11:02:30 2003 From: nobody at spamcop.net (Seth Goodman) Date: Tue Dec 9 11:02:40 2003 Subject: [Spambayes] feature request In-Reply-To: Message-ID: > Tim Peters wrote: > > [Kenny Pitt] > >> ... > >> I also would like to add a menu item to rescore the currently > >> selected folder. Would anyone else find this useful? > > > > +1. I do this a lot, especially on the Unsure folder.
> > > > OTOH, I certainly don't want it to rescore my Inbox too (as others do > > seem to want) -- I've typically got more than 10,000 messages there > > waiting for replies, and I don't want to wait for that. > > Resounding agreement here. I'm envisioning a menu item for a strictly > manual operation to rescore just the selected folder. Anyone who wants > to use it to rescore their Inbox is welcome to do so, but it certainly > won't do it automatically. > > I'll work on it as soon as I get this notification sound thing figured > out. > How do you feel about automatically rescoring the Unsures after any training event? Most people probably don't have that many Unsures stored up and it would be helpful. Again, I'm just one user and I don't know how others use the program. I understand your dilemma with the large inboxes. It's certainly your call, but I hope you recognize that many (most?) users don't have 10K messages waiting for reply. That's a burden I can hardly imagine, so I do really appreciate your developing this open source code. Personally, I don't get 10K messages that need reply in a year, but maybe I'm not typical and I don't develop software, so different world. Since hardware is expected to be bug-free in the first proto board (and yes, there is a tooth fairy), not too many people find bugs. But when one does get out of the lab, they are sometimes, uh, irritated. When this occurs, I do get a message or two that day, or perhaps an avalanche. They are remarkably similar, usually starting with the adverbial phrase, "When?", with the remainder being filler. 
-- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From skip at pobox.com Tue Dec 9 11:03:18 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Dec 9 11:03:42 2003 Subject: [Spambayes] Strip Subject of Non-alpha In-Reply-To: References: <16340.55172.171511.255475@montanaro.dyndns.org> Message-ID: <16341.62022.48555.624970@montanaro.dyndns.org> >> I never got overwhelming encouragement for my ideas about how to add >> experimental extensions to the CVS repository. Tim> Probably because it came attached to such a weak change. Okay, ignore the bit about a specific "enhancement". We all know most of them don't work anyway. Still, suppose someone comes up with an idea (we get them all the time in the spambayes mailing list): "I know, how about using the new header transmogrification feature of RFC-4822?", but doesn't have the programming cojones to implement it. Someone else comes along, realizes it wouldn't be such a big deal to implement, does so and posts, "Okay, try the version in CVS. SpamBayes now has a "Headers:X-transmogrify" option. Let us know whether it helps or not." People can then experiment with RFC-4822 transmogrification. If it proves not to be a worthy addition, the code can be ripped out. The key is tweaking the options parser to not care if there is no "Tokenizer:X-transmogrify" option (because the code was ripped out later) or to map "Tokenizer:X-transmogrify" to "Tokenizer:transmogrify" if it gains acceptance and moves out of the trial stage. (In fact, perhaps it should work the other way as well, so we can rip stuff out that's not useful without breaking people's options files. See below.) I just checked in a change to spambayes/OptionsClass.py which implements an experimental/deprecated option feature. It works like this: * Option is "foo", user sets "foo". Status quo. * Option is "X-foo", user sets "X-foo". Status quo. * Option is "foo", user sets "X-foo". "foo" is set silently.
* Option is "X-foo", user sets "foo". "X-foo" is set and a warning is emitted. The third case covers experimental options. The fourth case covers deprecated options. (The description for deprecated options in Options.py should start with "(DEPRECATED) ".) Tim> Really, a few people tested it and it didn't seem to matter either Tim> way. Granted. One thing I wonder about is how "current" people's training databases are. New techniques like c?mm?nt ?cc?nt??t??n or em.bed-ed punc#tua_tion aren't likely to turn up much in older training databases. I canned my old training database recently and have been working on rebuilding it from scratch. I think it's important that our training databases evolve as spam does. Another change I have locally is the remove_punctuation tokenizer gimmick I alluded to above. It also doesn't seem to change fp/fn results at the level of pushing messages clearly out of one category into another; however, it seems to pretty consistently spread the ham/spam means apart a bit and reduce their standard deviations. I'm more interested in a framework for making such experimental changes easier for non-programmers to try out. Tim> Experimental extensions are fine by me, and you proposed a decent Tim> scheme for putting them in. The downside is that every piece of Tim> code complicates the whole, and I really don't know why you'd Tim> *want* to check in a gimmick that made no real difference to anyone Tim> who tried it (if I remember all the reports correctly -- maybe Tim> not). The point isn't sticking code in, it's being able to easily yank it back out. (I think my checkin should make that easier.) You mentioned generate_time_buckets and extract_dow. I'll turn the screws in a moment to deprecate them. If this idea doesn't fly with people, or these options are deemed crucial by enough people, we can just un-deprecate them. (BTW, has anyone on a Unix-ish system tried out testtools/Makefile when running timcv?
If so, does it help or am I the only person who finds it useful?) Skip From michael.nitabach at yale.edu Tue Dec 9 12:54:08 2003 From: michael.nitabach at yale.edu (Michael N. Nitabach) Date: Tue Dec 9 12:55:23 2003 Subject: [Spambayes] Looking for Turn-Key Solution Message-ID: I would like to implement the following system, and am hoping that there might be some existing turn-key solution available. If not, I would potentially be willing to try to learn some Python and implement something myself. Here is the automated server system I want to implement: (1) My server receives e-mail from my ISP by having my ISP redirect all incoming mail to my server. (2) My server filters all incoming e-mail using Spambayes. (3) Only the e-mail that is below a chosen spam probability gets redirected to another e-mail address. As far as I can tell, it seems like what I am looking for is an SMTP server that can be configured to redirect incoming mail, but only after filtering by Spambayes has occurred, and only conditionally based upon the spam probability. Incidentally, the purpose of this would be to allow me to filter incoming e-mails using Spambayes before pushing them to my Blackberry wireless e-mail device. Any thoughts and/or suggestions would be appreciated. Michael N. Nitabach, Ph.D., J.D. Assistant Professor Department of Cellular and Molecular Physiology Yale University School of Medicine (203) 737-2939 mnitabach@acedsl.com From mnitabach at acedsl.com Tue Dec 9 12:56:31 2003 From: mnitabach at acedsl.com (Michael N. Nitabach) Date: Tue Dec 9 12:56:38 2003 Subject: [Spambayes] Seeking Server Solution Message-ID: I would like to implement the following system, and am hoping that there might be some existing turn-key solution available. If not, I would potentially be willing to try to learn some Python and implement something myself. 
Here is the automated server system I want to implement: (1) My server receives e-mail from my ISP by having my ISP redirect all incoming mail to my server. (2) My server filters all incoming e-mail using Spambayes. (3) Only the e-mail that is below a chosen spam probability gets redirected to another e-mail address. As far as I can tell, it seems like what I am looking for is an SMTP server that can be configured to redirect incoming mail, but only after filtering by Spambayes has occurred, and only conditionally based upon the spam probability. Incidentally, the purpose of this would be to allow me to filter incoming e-mails using Spambayes before pushing them to my Blackberry wireless e-mail device. Any thoughts and/or suggestions would be appreciated. Michael N. Nitabach, Ph.D., J.D. Assistant Professor Department of Cellular and Molecular Physiology Yale University School of Medicine (203) 737-2939 mnitabach@acedsl.com From rmalayter at bai.org Tue Dec 9 13:13:15 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Tue Dec 9 13:13:19 2003 Subject: [Spambayes] Seeking Server Solution Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A7500D@cliff.bai.org> What you want is an "SMTP proxy" that does probabilistic (in some cases erroneously called "Bayesian") spam detection. See http://assp.sourceforge.net There is also an SMTP proxy module you can add to POPFile, which, like SpamBayes, is normally a POP3 proxy: http://popfile.sourceforge.net > -----Original Message----- > From: Michael N. Nitabach > > (1) My server receives e-mail from my ISP by having my ISP > redirect all incoming mail to my server. > > (2) My server filters all incoming e-mail using Spambayes. > > (3) Only the e-mail that is below a chosen spam probability > gets redirected to another e-mail address. > > Any thoughts and/or suggestions would be appreciated.
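Whichever proxy or filter ends up scoring the mail, step (3) of the setup described above reduces to a small routing decision. Below is a minimal Python sketch of that decision only. It assumes the filter has already written a numeric score into a header. The header name X-Spam-Probability and the helpers spam_score/should_forward are hypothetical placeholders, so substitute whatever your filter actually writes. Relaying the message to the other address (for example with smtplib) is deliberately left out.

```python
from email import message_from_string
from email.message import Message
from typing import Optional

# Hypothetical header name: substitute whatever your filter actually writes.
SCORE_HEADER = "X-Spam-Probability"


def spam_score(msg: Message) -> Optional[float]:
    """Return the score stored in SCORE_HEADER, or None if missing/garbled."""
    raw = msg.get(SCORE_HEADER)
    if raw is None:
        return None
    try:
        return float(str(raw).strip())
    except ValueError:
        return None


def should_forward(msg: Message, threshold: float = 0.2) -> bool:
    """Forward only messages scored strictly below `threshold`.

    Unscored messages are held back rather than forwarded.
    """
    score = spam_score(msg)
    return score is not None and score < threshold


raw = "From: a@example.com\nTo: b@example.com\nX-Spam-Probability: 0.03\n\nhi\n"
print(should_forward(message_from_string(raw)))  # True
```

Holding back unscored mail is a judgment call. A missing score usually means the filtering step failed, and for a device like a Blackberry, missing one message is arguably better than forwarding everything unfiltered.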
From kjanuski at phillynews.com Tue Dec 9 12:48:07 2003 From: kjanuski at phillynews.com (Januski, Ken) Date: Tue Dec 9 13:30:10 2003 Subject: [Spambayes] Can I move default database? Message-ID: Hi, I signed up for the users mailing list hoping to ask this there, but still haven't gotten a message from the list, so I'm trying here instead. First let me repeat what others have said: I'm very happy with SpamBayes and the Outlook plugin, which we need to use where I work. I'm slowly convincing users to use it. Maybe eventually, if we have any sense, we'll make it part of all our PCs. But I've run into a small problem that I'm wondering about. We have set a limit on the size of user profiles that are saved back to the W2K server when a user logs out of the network. The profile includes the fairly sizeable SpamBayes database. If we could change its location, we could get around this problem. Can anyone tell me if it's possible? Is it configurable? Is it configurable in source code? Thanks for any info, and thanks for a great piece of software. Ken Januski From kennypitt at hotmail.com Tue Dec 9 14:17:39 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Dec 9 14:18:16 2003 Subject: [Spambayes] Can I move default database? In-Reply-To: Message-ID: Yes, it's configurable. I believe this is the correct procedure. Maybe someone else will correct me if I'm wrong. Create the new data directory and move the existing *.ini and *.db files from the user's "Application Data\SpamBayes" directory to the new directory.
Then create a new file "default_configuration.ini" in the "Application Data\SpamBayes" directory with the following contents: [General] data_directory: C:\NewDataDirectory You are probably already aware of this, but there are some issues to consider with roaming profiles and SpamBayes if you make this change. First, you'll need to make sure that SpamBayes is installed on every computer that a user might roam to. Second, you'll need to make sure that the C:\NewDataDirectory you specify is valid on every computer (SpamBayes will create the directory, but the path has to be valid. For example, don't use D:\Directory if some of the computers don't have a D: drive). Third, it would be best to have a separate C:\NewDataDirectory for each user so that users don't end up sharing training data. And fourth, be aware that users will have to re-configure and re-train SpamBayes on each computer that they use, and that filtering accuracy will vary depending on which computer they are using. You can alleviate some of these issues by pointing the data_directory to a location on a network drive, but I have no idea what the performance would be like in that case. -- Kenny Pitt _____ From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Januski, Ken Sent: Tuesday, December 09, 2003 12:48 PM To: spambayes@python.org Subject: [Spambayes] Can I move default database? Hi, I signed up for users mailing list hoping to ask this there but still haven't gotten a message from list so I'm trying here instead. First let me repeat what others have said: I'm very happy with SpamBayes and the Outlook plugin, which we need to use where I work. I'm slowly convincing users to use it. Maybe eventually, if we have any sense, we'll make it part of all our pcs. But I've run into a small problem that I'm wondering about. We have set a limit on the size of user profiles that are saved back to the W2K server when a user logs out of network. 
The profile includes the fairly sizeable SpamBayes database. If we could change the location of it then we could get around this problem. Can anyone tell me if it's possible? Is it configurable? Is it configurable in source code? Thanks for any info, and thanks for a great piece of software. Ken Januski From TiagoTiago at Globo.com Tue Dec 9 14:45:43 2003 From: TiagoTiago at Globo.com (Tiago Estill de Noronha) Date: Tue Dec 9 14:43:43 2003 Subject: [Spambayes] can some1 fix the list reply adress? Message-ID: <000601c3be8d$08da5b60$2960b7c8@virtua.com.br> Because every time I try to reply to a message from the list, it only goes to the people who sent it, and not to the list ********************* Tiago Estill de Noronha TiagoTiago@Globo.com From kennypitt at hotmail.com Tue Dec 9 14:48:56 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Dec 9 14:49:34 2003 Subject: [Spambayes] can some1 fix the list reply adress? In-Reply-To: <000601c3be8d$08da5b60$2960b7c8@virtua.com.br> Message-ID: Are you using "Reply to All" instead of just "Reply"? -- Kenny Pitt _____ From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Tiago Estill de Noronha Sent: Tuesday, December 09, 2003 2:46 PM To: spambayes@python.org Subject: [Spambayes] can some1 fix the list reply adress?
'cause every time I try to reply to a message on the list, it only goes to the guys who sent it, and not the list

*********************
Tiago Estill de Noronha
TiagoTiago@Globo.com

From TiagoTiago at Globo.com Tue Dec 9 14:58:26 2003
From: TiagoTiago at Globo.com (Tiago Estill de Noronha)
Date: Tue Dec 9 14:56:24 2003
Subject: RES: [Spambayes] can some1 fix the list reply adress?
In-Reply-To:
Message-ID: <001501c3be8e$cf645b40$2960b7c8@virtua.com.br>

nice! Now there's another problem: I have to delete the sender from the To box, or else he/she will receive my reply both from me and from the list

*********************
Tiago Estill de Noronha
TiagoTiago@Globo.com

-----Original Message-----
From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Kenny Pitt
Sent: Tuesday, December 9, 2003 16:49
To: 'Tiago Estill de Noronha'; spambayes@python.org
Subject: RE: [Spambayes] can some1 fix the list reply adress?

Are you using "Reply to All" instead of just "Reply"?

--
Kenny Pitt

From TiagoTiago at Globo.com Tue Dec 9 15:00:15 2003
From: TiagoTiago at Globo.com (Tiago Estill de Noronha)
Date: Tue Dec 9 14:58:12 2003
Subject: RES: [Spambayes] can some1 fix the list reply adress?
In-Reply-To:
Message-ID: <001a01c3be8f$10abc840$2960b7c8@virtua.com.br>

and shouldn't I be receiving the msgs I send? I thought I set that in the subscription... perhaps I am mistaken... I am not known for my good memory 8o)

*********************
Tiago Estill de Noronha
TiagoTiago@Globo.com

-----Original Message-----
From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Kenny Pitt
Sent: Tuesday, December 9, 2003 16:49
To: 'Tiago Estill de Noronha'; spambayes@python.org
Subject: RE: [Spambayes] can some1 fix the list reply adress?

Are you using "Reply to All" instead of just "Reply"?

--
Kenny Pitt

From TiagoTiago at Globo.com Tue Dec 9 15:02:04 2003
From: TiagoTiago at Globo.com (Tiago Estill de Noronha)
Date: Tue Dec 9 15:00:03 2003
Subject: RES: [Spambayes] can some1 fix the list reply adress?
Message-ID: <002901c3be8f$513a93a0$2960b7c8@virtua.com.br>

hey, some of my msgs are getting here, and some aren't... I dunno what's happening

*********************
Tiago Estill de Noronha
TiagoTiago@Globo.com

-----Original Message-----
From: Tiago Estill de Noronha [mailto:TiagoTiago@Globo.com]
Sent: Tuesday, December 9, 2003 17:00
To: 'spambayes@python.org'
Subject: RES: [Spambayes] can some1 fix the list reply adress?

and shouldn't I be receiving the msgs I send? I thought I set that in the subscription... perhaps I am mistaken... I am not known for my good memory 8o)

*********************
Tiago Estill de Noronha
TiagoTiago@Globo.com

From kennypitt at hotmail.com Tue Dec 9 15:02:57 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Tue Dec 9 15:03:35 2003
Subject: [Spambayes] can some1 fix the list reply adress?
In-Reply-To: <001501c3be8e$cf645b40$2960b7c8@virtua.com.br>
Message-ID:

Actually, the list does a good job of filtering out duplicate replies. It's another subscription option (see the last option on the subscription page, "Avoid duplicate copies of messages?").

_____
From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Tiago Estill de Noronha
Sent: Tuesday, December 09, 2003 2:58 PM
To: spambayes@python.org
Subject: RES: [Spambayes] can some1 fix the list reply adress?

nice! Now there's another problem: I have to delete the sender from the To box, or else he/she will receive my reply both from me and from the list

From tim.one at comcast.net Tue Dec 9 15:03:37 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Dec 9 15:03:40 2003
Subject: [Spambayes] can some1 fix the list reply adress?
In-Reply-To: <000601c3be8d$08da5b60$2960b7c8@virtua.com.br>
Message-ID:

[Tiago Estill de Noronha]
> cause every time I try to reply to a message on the list, it only goes to
> the guys who sent it, and not the list

Sorry, that's intentional. I don't know which email client you use, but it probably has a "reply all" button, which will send your response to the original poster *and* to the list. Many people who post here aren't members of the list, so if you just replied to the list they'd never see your response.

From tim at fourstonesExpressions.com Tue Dec 9 15:04:32 2003
From: tim at fourstonesExpressions.com (Tim Stone)
Date: Tue Dec 9 15:04:38 2003
Subject: [Spambayes] can some1 fix the list reply adress?
In-Reply-To: <000601c3be8d$08da5b60$2960b7c8@virtua.com.br>
References: <000601c3be8d$08da5b60$2960b7c8@virtua.com.br>
Message-ID:

You have to do a "reply all" instead of just a reply.

On Tue, 9 Dec 2003 17:45:43 -0200, Tiago Estill de Noronha wrote:

> cause every time I try to reply to a message on the list, it only goes to the
> guys who sent it, and not the list
>
> *********************
> Tiago Estill de Noronha
> TiagoTiago@Globo.com

--
Vous exprimer; Exprésese; Te stesso esprimere; Express yourself!
Tim Stone
See my photography at www.fourstonesExpressions.com
See my writing at www.xanga.com/obj3kshun

From TiagoTiago at Globo.com Tue Dec 9 15:10:19 2003
From: TiagoTiago at Globo.com (Tiago Estill de Noronha)
Date: Tue Dec 9 15:08:20 2003
Subject: RES: [Spambayes] can some1 fix the list reply adress?
In-Reply-To:
Message-ID: <003301c3be90$7b2e3620$2960b7c8@virtua.com.br>

I got the "reply all" tip from the first message; thanks, all of you, for answering so quickly

*********************
Tiago Estill de Noronha
TiagoTiago@Globo.com

-=> -----Original Message-----
-=> From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Tim Stone
-=> Sent: Tuesday, December 9, 2003 17:05
-=> To: Tiago Estill de Noronha; spambayes@python.org
-=> Subject: Re: [Spambayes] can some1 fix the list reply adress?
-=>
-=> You have to do a "reply all" instead of just a reply.
-=>
-=> --
-=> Vous exprimer; Exprésese; Te stesso esprimere; Express yourself!
-=> Tim Stone
-=> See my photography at www.fourstonesExpressions.com
-=> See my writing at www.xanga.com/obj3kshun
-=>
-=> _______________________________________________
-=> Spambayes@python.org
-=> http://mail.python.org/mailman/listinfo/spambayes
-=> Check the FAQ before asking: http://spambayes.sf.net/faq.html

From TiagoTiago at Globo.com Tue Dec 9 15:19:34 2003
From: TiagoTiago at Globo.com (Tiago Estill de Noronha)
Date: Tue Dec 9 15:17:30 2003
Subject: RES: [Spambayes] can some1 fix the list reply adress?
In-Reply-To: <000601c3be8d$08da5b60$2960b7c8@virtua.com.br>
Message-ID: <003901c3be91$c2ca9ea0$2960b7c8@virtua.com.br>

now I got my first msg... the first one... the one that made me say they were not coming; the msgs following that one came to me b4 this LOL. well, better I stop talking (intentional) and flooding (non-intentional) the list... sorry y'all 8o)

*********************
Tiago Estill de Noronha
TiagoTiago@Globo.com

-----Original Message-----
From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Tiago Estill de Noronha
Sent: Tuesday, December 9, 2003 16:46
To: spambayes@python.org
Subject: [Spambayes] can some1 fix the list reply adress?

'cause every time I try to reply to a message on the list, it only goes to the guys who sent it, and not the list

*********************
Tiago Estill de Noronha
TiagoTiago@Globo.com

From TiagoTiago at Globo.com Tue Dec 9 15:46:33 2003
From: TiagoTiago at Globo.com (Tiago Estill de Noronha)
Date: Tue Dec 9 15:44:35 2003
Subject: RES: [Spambayes] can some1 fix the list reply adress?
In-Reply-To:
Message-ID: <004801c3be95$884da3e0$2960b7c8@virtua.com.br>

so if I just send the reply all, you won't get this twice? COOL! SPAMBAYES RULES!

*********************
Tiago Estill de Noronha
TiagoTiago@Globo.com

-----Original Message-----
From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Kenny Pitt
Sent: Tuesday, December 9, 2003 17:03
To: 'Tiago Estill de Noronha'; spambayes@python.org
Subject: RE: [Spambayes] can some1 fix the list reply adress?

Actually, the list does a good job of filtering out duplicate replies. It's another subscription option (see the last option on the subscription page, "Avoid duplicate copies of messages?").

From skip at pobox.com Tue Dec 9 17:42:35 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Dec 9 17:42:39 2003
Subject: [Spambayes] How low can you go?
Message-ID: <16342.20443.331861.376383@montanaro.dyndns.org>

Okay, time for a little contest. We've recently seen several users tout the size of their training database. I used to be one of those "enlarged database" types, but no more. Not long ago I dumped it all in favor of a more minimalist approach.

In the past few days I noticed SB seemed to be leaving a large number of spam in my unsure box, so today I deleted that (~450 spams and 250 hams) and started from scratch again. I figure I either had introduced some outright mistakes into the database or had trained on some messages which are sort of legitimately both ham and spam. At any rate, it seemed easier to just start from scratch than to really figure out what was wrong.

At the moment I have trained on 14 spams and 20 hams and am quite pleased with how it's performing so far. I've received mail for a half dozen or so different mailing lists, and it's catching spams left and right. I anticipate a slew of unsures overnight as I get new kinds of email (both ham and spam), but I will be damned selective about what I add to my database.

So, how small is yours?

Skip

From mhammond at skippinet.com.au Tue Dec 9 18:00:38 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Dec 9 18:00:54 2003
Subject: [Spambayes] feature request
In-Reply-To:
Message-ID: <032b01c3bea8$43ec2b00$2c00a8c0@eden>

> How do you feel about automatically rescoring the Unsures after any training
> event? Most people probably don't have that many Unsures stored up and it
> would be helpful.

That sounds like a good idea, and it may well help a lot when you receive multiple duplicate spams that all end up in Unsure.

Mark.
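Mark's reply endorses re-scoring the stored Unsure messages after every training event. Below is a minimal Python sketch of that idea. It is not SpamBayes code: it assumes a hypothetical `score` callable that returns a spam probability between 0 and 1 from the freshly retrained classifier, and the 0.20 / 0.90 cutoffs are illustrative values, not anyone's actual configuration.

```python
from typing import Callable, Dict, List

# Hypothetical sketch, not SpamBayes code: after a training event, re-score
# every message currently filed as Unsure and regroup it by its new score.
# `score` is an assumed callable returning a spam probability in [0, 1].


def classify(prob: float, ham_cutoff: float, spam_cutoff: float) -> str:
    """Map a spam probability onto 'ham', 'unsure', or 'spam'."""
    if prob < ham_cutoff:
        return "ham"
    if prob >= spam_cutoff:
        return "spam"
    return "unsure"


def rescore_unsures(
    unsures: List[str],
    score: Callable[[str], float],
    ham_cutoff: float = 0.20,   # illustrative value only
    spam_cutoff: float = 0.90,  # illustrative value only
) -> Dict[str, List[str]]:
    """Return the former Unsures grouped by their new classification."""
    groups: Dict[str, List[str]] = {"ham": [], "unsure": [], "spam": []}
    for msg in unsures:
        groups[classify(score(msg), ham_cutoff, spam_cutoff)].append(msg)
    return groups


if __name__ == "__main__":
    # Made-up scores standing in for a freshly retrained classifier.
    new_scores = {"msg-a": 0.05, "msg-b": 0.55, "msg-c": 0.97}
    print(rescore_unsures(list(new_scores), new_scores.__getitem__))
    # prints {'ham': ['msg-a'], 'unsure': ['msg-b'], 'spam': ['msg-c']}
```

Only messages that are already Unsure are re-scored, so the extra work stays proportional to the size of the Unsure folder, which matches the point that most people don't have many Unsures stored up.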
From rmalayter at bai.org Tue Dec 9 18:01:49 2003
From: rmalayter at bai.org (Ryan Malayter)
Date: Tue Dec 9 18:01:51 2003
Subject: [Spambayes] Can I move default database?
Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A7501E@cliff.bai.org>

> From: Kenny Pitt
> Yes, it's configurable...
> [General]
> data_directory: C:\NewDataDirectory

Technically, the correct place for per-user, per-machine settings like SpamBayes databases is (in Windows 2000 and newer):

c:\documents and settings\username\local settings\application data\SpamBayes

This keeps the databases as part of the individual user's profile, but they stay on the local machine and don't roam with the rest of the profile to the network.

> First, you'll need to make sure that SpamBayes is installed
> on every computer that a user might roam to.

This isn't strictly necessary. If SpamBayes isn't installed on a machine, any files in these directories will simply be ignored, with no error messages or anything. Of course, spam will not be filtered, but nothing really bad will happen.

> Second, you'll need to make sure that the C:\NewDataDirectory
> you specify is valid on every computer...

Yes. The directory:

c:\documents and settings\username\local settings\application data\

will already be in every profile, but you'll have to create a SpamBayes subdirectory. This can easily be automated by adding this command to your login scripts (all on one line):

IF NOT EXIST "%USERPROFILE%\Local Settings\Application Data\SpamBayes" mkdir "%USERPROFILE%\Local Settings\Application Data\SpamBayes"

> have a separate C:\NewDataDirectory for each user so that
> users don't end up sharing training data. And fourth, be
> aware that users will have to re-configure and re-train
> SpamBayes on each computer that they use,

You could copy small "default" databases to their profiles using commands similar to the one above.

> ...filtering accuracy will vary depending on which computer they are
> using.
This is why using a default database might be a bad idea.

> You can alleviate some of these issues by pointing the
> data_directory to a location on a network drive, but I have
> no idea what the performance would be like in that case.

I've tried this, and performance is not bad on my network. I've used the user's home directory, defined by the %HOMESHARE%\%HOMEPATH% environment variables. You can automate the INI file path settings with login scripts, or with Windows Installer packaging.

Regards,
Ryan

From cbateman at lmltechnologies.com Tue Dec 9 18:26:06 2003
From: cbateman at lmltechnologies.com (Craig Bateman)
Date: Tue Dec 9 18:30:17 2003
Subject: [Spambayes] How low can you go?
In-Reply-To: <618E6A93087FD011A32F00A024D618328C8D77@lmrvfs01.lmltechnologies.com>
Message-ID: <618E6A93087FD011A32F00A024D618326855CA@lmrvfs01.lmltechnologies.com>

I thought to do this a while back too... Since then I've registered 58 good and 1741 spam. I get about 50 suspects a day, with 2 or 3 spams slipping into my inbox a day as well. Still, it correctly files away about 150-250 spams per day, so I'm happier than I was a few months ago...

It's unfortunate that we can't legally disrupt them at the source... Maybe international spam legislation will be the catalyst that ushers in some "new world order" so that Armageddon can begin ;)

-----Original Message-----
From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Skip Montanaro
Sent: Tuesday, December 09, 2003 2:43 PM
To: spambayes@python.org
Subject: [Spambayes] How low can you go?

Okay, time for a little contest. We've recently seen several users tout the size of their training database. I used to be one of those "enlarged database" types, but no more. Not long ago I dumped it all in favor of a more minimalist approach.

In the past few days I noticed SB seemed to be leaving a large number of spam in my unsure box, so today I deleted that (~450 spams and 250 hams) and started from scratch again.
I figure I either had introduced some outright mistakes into the database or had trained on some messages which are sort of legitimately both ham and spam. At any rate, it seemed easier to just start from scratch than to really figure out what was wrong.

At the moment I have trained on 14 spams and 20 hams and am quite pleased with how it's performing so far. I've received mail for a half dozen or so different mailing lists, and it's catching spams left and right. I anticipate a slew of unsures overnight as I get new kinds of email (both ham and spam), but I will be damned selective about what I add to my database.

So, how small is yours?

Skip

From degroot at ieee.org Tue Dec 9 18:50:01 2003
From: degroot at ieee.org (Joe DeGroot)
Date: Tue Dec 9 18:50:12 2003
Subject: [Spambayes] Problem: (Outlook Plug-in) Messages scored but not moved to spam folder
Message-ID: <000601c3beaf$29d190a0$6501a8c0@laptop>

From degroot at ieee.org Tue Dec 9 19:02:11 2003
From: degroot at ieee.org (Joe DeGroot)
Date: Tue Dec 9 19:02:21 2003
Subject: [Spambayes] Problem: (Outlook Plug-in) Messages scored but not moved to spam folder
Message-ID: <000c01c3beb0$dca24750$6501a8c0@laptop>

Hello,

I am having a problem with spambayes, and have read the troubleshooting FAQ, but none of the categories seem to describe what I am seeing. As messages come in, they are scored and are given the spam field with this score. However, the messages are not moved to the spam folder, even though some are given a 100% spam score. If I then go and manually perform the filter through the spambayes menu, the messages are moved to the appropriate folder.
I am running Windows XP SP1 with spambayes version 0.81. I have attached two log files, one from just after the messages came in, and the other from after I performed the manual filter and trained a few messages from the suspected spam back into non-spam.

Any ideas?

Thanks in advance for your help,

Regards,
Joe Degroot

-------------- next part --------------
A non-text attachment was scrubbed...
Name: spambayes1 (1).log
Type: application/octet-stream
Size: 1489 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20031209/87415820/spambayes11-0001.obj
-------------- next part --------------
Loaded bayes database from 'C:\Documents and Settings\J\Application Data\SpamBayes\default_bayes_database.db'
Loaded message database from 'C:\Documents and Settings\J\Application Data\SpamBayes\default_message_database.db'
Bayes database initialized with 704 spam and 142 good messages
SpamBayes Outlook Addin, Binary version 0.81 (September 9, 2003)
starting (with engine SpamBayes Beta2, version 0.2 (July 2003))
on Windows 5.1.2600 (Service Pack 1)
using Python 2.3+ (#46, Aug 6 2003, 16:39:24) [MSC v.1200 32 bit (Intel)]
SpamBayes: Watching for new messages in folder Inbox
SpamBayes: Watching for new messages in folder MSU Mail
SpamBayes: Watching for new messages in folder Junk E-Mail
Processing 0 missed spam in folder 'Inbox' took 43.4966ms
Processing 0 missed spam in folder 'MSU Mail' took 0.571581ms
Message 'RE: Pictures' had a Spam classification of 'No'
Message '[All Students] Kwanzaa Celebration 2003' had a Spam classification of 'No'
Message 'DMB 2003 Holiday Extravaganza' had a Spam classification of 'Yes'
Message '[GROWBONSAI] Digest Number 968' had a Spam classification of 'No'
Training on message 'DMB 2003 Holiday Extravaganza' - trained as good
Message '$20 Wal-Mart Gift Card for [Degroot, Joseph]' had a Spam classification of 'Yes'
Training on message '$20 Wal-Mart Gift Card for [Degroot, Joseph]' - trained as good
Saving configuration -> C:\Documents and Settings\J\Application Data\SpamBayes\Outlook.ini
Current version is 0.81, latest is 0.81.
Saving configuration -> C:\Documents and Settings\J\Application Data\SpamBayes\Outlook.ini
Saving configuration -> C:\Documents and Settings\J\Application Data\SpamBayes\Outlook.ini
Training on message '$20 Wal-Mart Gift Card for [Degroot, Joseph]' - trained as spam
Saving configuration -> C:\Documents and Settings\J\Application Data\SpamBayes\Outlook.ini
Recovering to folder 'MSU Mail' and ham training message 'Order Confirmed at Half.com (243658883268)' - Training on message 'Order Confirmed at Half.com (243658883268)' - trained as good
Recovering to folder 'MSU Mail' and ham training message 'ORDER RECEIVED at Half.com' - Training on message 'ORDER RECEIVED at Half.com' - trained as good
Recovering to folder 'MSU Mail' and ham training message 'Your Discover(R) Card Statement Is Available Online' - Training on message 'Your Discover(R) Card Statement Is Available Online' - trained as good
Recovering to folder 'MSU Mail' and ham training message 'You can now check our website to find out the status of the jobs you've applied for' - Training on message 'You can now check our website to find out the status of the jobs you've applied for' - trained as good
Recovering to folder 'MSU Mail' and ham training message 'Engineering, MEMS Newsletter - 10/30/2003' - Training on message 'Engineering, MEMS Newsletter - 10/30/2003' - trained as good
Recovering to folder 'MSU Mail' and ham training message 'Your Discover(R) Card Statement Is Available Online' - Training on message 'Your Discover(R) Card Statement Is Available Online' - trained as good
Recovering to folder 'MSU Mail' and ham training message 'Glocap has created a new jobs newsletter that you might want to sign-up for - Software/Technical Sales' - Training on message 'Glocap has created a new jobs newsletter that you might want to sign-up for - Software/Technical Sales' - trained as good
Recovering to folder 'MSU Mail' and ham training message 'Receipt for your Payment to charharding@aroundthehounds.com ' - Training on message 'Receipt for your Payment to charharding@aroundthehounds.com ' - trained as good
Recovering to folder 'MSU Mail' and ham training message 'You have added a new address!' - Training on message 'You have added a new address!' - trained as good
Error filtering message '>>PUT ON BACK BURNER)' id=('0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000433A5C446F63756D656E747320616E642053657474696E67735C4A5C4C6F63616C2053657474696E67735C4170706C69636174696F6E20446174615C4D6963726F736F66745C4F75746C6F6F6B5C4F75746C6F6F6B2E70737400', '0000000074BFCD41C0542C46A94B986D86296194443F2000')>'
Traceback (most recent call last):
  File "out1.pyz/filter", line 104, in filter_folder
  File "out1.pyz/filter", line 14, in filter_message
  File "out1.pyz/manager", line 778, in score
  File "out1.pyz/msgstore", line 894, in GetEmailPackageObject
  File "out1.pyz/msgstore", line 736, in _GetMessageText
  File "out1.pyz/msgstore", line 758, in _GetMessageTextParts
  File "out1.pyz/msgstore", line 439, in GetHTMLFromRTFProperty
com_error: (-2147221221, 'OLE error 0x8004011b', None, None)
Saving configuration -> C:\Documents and Settings\J\Application Data\SpamBayes\Outlook.ini

From atom at suspicious.org Tue Dec 9 19:36:49 2003
From: atom at suspicious.org (Atom 'Smasher')
Date: Tue Dec 9 19:37:57 2003
Subject: [Spambayes] classifying tokens
In-Reply-To: <000c01c3beb0$dca24750$6501a8c0@laptop>
References: <000c01c3beb0$dca24750$6501a8c0@laptop>
Message-ID:

when scoring, i noticed that some tokens seem to be classified based on where (or how) they're found...

8bit%
cc
charset
content-disposition
content-type
email addr
email name
filename
from
header
message-id
reply-to
sender
skip
subject
subjectcharset
to
url
virus
x-mailer

most of these are self-explanatory, but what about "virus"?? is there a part of an email that lets SB know it's a virus?
is there part of an email that lets SB know it's a virus? what about "skip"? i haven't found any documentation on this... ...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "I know a lot of people without brains who do an awful lot of talking." -- The Scarecrow, Wizard of Oz From tim.one at comcast.net Tue Dec 9 20:19:37 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 9 20:19:38 2003 Subject: [Spambayes] classifying tokens In-Reply-To: Message-ID: [Atom 'Smasher'] > when scoring, i noticed that some tokens seem to be classified based > on where (or how) they're found... When testing showed that was helpful, yes, tokens get tagged. Tokens coming from an email header line are generally tagged with the name of the header line ("subject:...", "date:..."), and pieces coming from embedded URLs are tagged with "url:". There are some others. > most of these are self-explanatory, but what about "virus"?? is there > part of an email that lets SB know it's a virus? No, but there are certain tokens that appear to be *associated* with viruses. That doesn't mean an email containing one of those *is* a virus; it's just one more clue to throw into the pot. Remember that spambayes has no preconceived notions of what ham or spam are. The appearance of height=0 in an email will get tagged with a "virus:" prefix by spambayes, but in *your* data it might be a ham clue. SpamBayes doesn't pre-judge that. I'll add that height=0 and width=0 in HTML are almost always used to hide *something* from you, and that is a common trick in virus email. I'm not sure I've ever seen a legitimate use for it that I recognized, but in my (currently small) database I must have some: 'virus:width=0' spamcount: 3 hamcount: 0 'virus: what about "skip"? > > i haven't found any documentation on this... That's because there isn't any.
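To make the tagging Tim describes concrete, here is a minimal Python sketch. It is not SpamBayes's tokenizer.py: the function names `header_tokens` and `body_tokens` are hypothetical, and the 3-to-12 character word limit plus the `skip:<first char> <length rounded down to a multiple of 10>` format are my reading of tokenizer.py, so check the source before relying on them. Other prefixes such as "url:" and "virus:" come from separate special-case code that is not shown here.

```python
MAX_WORD = 12  # assumption: longer "words" are summarised as skip: tokens


def header_tokens(name, value):
    """Tag each word of a header line with the lower-cased header name."""
    prefix = name.lower() + ":"
    return [prefix + word for word in value.lower().split()]


def body_tokens(text):
    """Yield ordinary body words, replacing very long ones with skip: tokens."""
    for word in text.lower().split():
        n = len(word)
        if 3 <= n <= MAX_WORD:
            yield word
        elif n > MAX_WORD:
            # Keep only the first character and the approximate length.
            yield "skip:%s %d" % (word[0], n // 10 * 10)
        # Words shorter than 3 characters are dropped entirely.
```

For example, `header_tokens("Subject", "Cheap Meds")` gives `["subject:cheap", "subject:meds"]`, so the same word counts as a different clue in a header than in the body.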
The internals of the database aren't documented, and there's no promise that they'll remain the same. If you really want to know, that's cool: the way to do it is to get the source code and study it. All tokens are produced by the tokenizer.py module. Between the code and the comments in that, there's a long and detailed explanation about what "skip:" tokens mean and why they're generated. It's hard to explain more briefly than that, because it's one of the features SpamBayes generates "for no reason at all" -- as the comments say, I don't know *why* it helps; I only know that testing showed that it did help. From papaDoc at videotron.ca Tue Dec 9 20:26:35 2003 From: papaDoc at videotron.ca (Remi Ricard) Date: Tue Dec 9 20:22:48 2003 Subject: [Spambayes] classifying tokens In-Reply-To: References: <000c01c3beb0$dca24750$6501a8c0@laptop> Message-ID: <1071019594.3314.3.camel@porsche.hq.simlog.com> Hi, > when scoring, i noticed that some tokens seem to be classified based on > where (or how) they're found... Yes, that is true. And the other ones are from the body. > most of these are self-explanatory, but what about "virus"?? is there part > of an email that lets SB know it's a virus? what about "skip"? Some of your mail had the word virus in the body, and skip is one of the "famous" SpamBayes developers. > > i haven't found any documentation on this... I think there is none, but you can read Options.py to find out what you can add or remove from the list of places where you want to check for tokens. Remi -- Remi Ricard From nobody at spamcop.net Tue Dec 9 22:28:23 2003 From: nobody at spamcop.net (Seth Goodman) Date: Tue Dec 9 22:28:23 2003 Subject: [Spambayes] How low can you go? In-Reply-To: <16342.20443.331861.376383@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > At the moment I have trained on 14 spams and 20 hams and am quite pleased > with how it's performing so far.
I've received mail for a half dozen or so > different mailing lists, and it's catching spams left and right. I > anticipate a slew of unsures overnight as I get new kinds of > email (both ham > and spam), but I will be damned selective about what I add to my database. OK, I'll bite. How did you select those 14 spams and 20 hams? Just please don't say they're random. Even if you have to lie. Perhaps you selected them by incrementally training on a corpus of 100 each? What are your current thresholds? I would expect a lot of unsures, which doesn't bother me a bit, but what are you seeing (so far) for false positives and false negatives? Damned impressive, if you ask me. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From tim.one at comcast.net Tue Dec 9 23:31:01 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 9 23:31:05 2003 Subject: [Spambayes] feature request In-Reply-To: Message-ID: [Seth Goodman] > How do you feel about automatically rescoring the Unsures after any > training event? I'd feel good about that. But nothing is ever simple: if we did that, people will divide into four additional camps: those who want newly determined ham and spam to be moved by magic out of Unsure as a result of auto-rescoring; those who want neither moved by magic as a result of that; and two camps for those who want only a particular one of them moved by magic. I happen to fall in the second camp (I wouldn't want anything magically moved as a result of auto-rescoring), but it's not a right-versus-wrong issue. > Most people probably don't have that many Unsures stored up and it would > be helpful. Again, I'm just one user and I don't know how others use the > program. You've been reading this list long enough to know that no two people seem to use it exactly the same way, right? > I understand your dilemma with the large inboxes. It's certainly > your call, but I hope you recognize that many (most?) 
users don't > have 10K messages waiting for reply. Certainly! OTOH, I see almost no spam in my inbox anyway, so I wouldn't want to wait for a measly 200 messages to get rescored either (perceptible cost but no perceptible benefit). Every knob probably drives another 50% of potential users of a feature away. We've already got so many knobs in the UI it's a miracle anyone is still left here. > That's a burden I can hardly imagine, Nah, it's easy: there are only 10K msgs waiting for replies because I keep archiving away the ones so far down the stack that it takes Outlook 5 minutes to scroll that far. > so I do really appreciate your developing this open source code. Oh, this project didn't intend to target personal email. That was an afterthought, and it's why so much of the early heavy testing turned out not to be particularly relevant to most people here. The original purpose was to filter high-volume mailing lists, as a possible add-in to GNU Mailman (the mailing list software that runs *this* list, for example). So early testing was done against databases trained on tens of thousands of ham and spam, sliced and diced randomly, and well balanced by construction. It turns out nobody uses it that way, but that's still what it was designed for. There was a start toward testing strategies for real-life low-volume personal use, but that fizzled out around the time my employer yanked me off this project (they paid my salary for the initial development and testing, which is why it got done -- not really the salary part, but that I was able to spend major time on it then). That appears to have different characteristics than the high-volume mailing list use. I've been surprised it works as well as it does for as many users as it does.
I'm not surprised it works as well as it does for me, not because I warped it toward my own email (to the contrary, I never tested it on my own email), but because most of my email *comes* from tech mailing lists, and that's what it was developed against. > Personally, I don't get 10K messages that need reply in a > year, I said they're waiting for replies, not that they're going to get one -- maybe one per 100 will. I'd *like* to reply to all, but that's physically impossible; I can't even acknowledge them all. > but maybe I'm not typical and I don't develop software, so different > world. Since hardware is expected to be bug-free in the first proto > board (and yes, there is a tooth fairy), not too many people find bugs. I'm an exception, but I worked for computer manufacturers for 15 years, and finding CPU and FPU bugs was part of my job. Well, not an *intended* part of my job. > But when one does get out of the lab, they are sometimes, uh, irritated. > When this occurs, I do get a message or two that day, or perhaps an > avalanche. They are remarkably similar, usually starting with the > adverbial phrase, "When?", with the remainder being filler. SpamBayes should do great on those -- repetitive msgs the bulk of which is filler is pretty much the definition of a tech mailing list. From atom at suspicious.org Wed Dec 10 00:08:11 2003 From: atom at suspicious.org (Atom 'Smasher') Date: Wed Dec 10 00:09:19 2003 Subject: [Spambayes] ham/spam show 'n tell In-Reply-To: References: Message-ID: this is an interesting (to me) observation.... at this moment i have my database made up of 524 hams and 522 spams, of which, the 5 hammiest spams score: 0.310988054593 0.477222956608 0.69525821016 0.778912509175 0.882964949455 and the 5 spammiest hams score: 0.00282491682801 0.00295767979157 0.00548257011708 0.00566510201445 0.00933374699939 right now my spam-cutoff is 0.8, and looking at these numbers even that seems conservative.
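Atom's scores only mean something relative to the two cutoffs. Here is a minimal Python sketch of the three-way decision, assuming that scores below the ham cutoff are ham, scores at or above the spam cutoff are spam, and everything between is unsure. The boundary handling is an assumption, and Atom doesn't state his ham cutoff, so 0.20 is a placeholder.

```python
def classify(score, ham_cutoff, spam_cutoff):
    """Map a score in [0, 1] to "ham", "unsure" or "spam".

    Assumption: score < ham_cutoff is ham, score >= spam_cutoff is spam.
    """
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def tally(scores, ham_cutoff, spam_cutoff):
    """Count how many scores land in each bucket."""
    counts = {"ham": 0, "unsure": 0, "spam": 0}
    for s in scores:
        counts[classify(s, ham_cutoff, spam_cutoff)] += 1
    return counts


# Atom's five hammiest spams and five spammiest hams, as reported above.
hammiest_spams = [0.310988054593, 0.477222956608, 0.69525821016,
                  0.778912509175, 0.882964949455]
spammiest_hams = [0.00282491682801, 0.00295767979157, 0.00548257011708,
                  0.00566510201445, 0.00933374699939]
```

With cutoffs 0.20/0.80, `tally(hammiest_spams, 0.20, 0.80)` puts four of those spams in unsure and one in spam. With the 0.15/0.60 pair Skip reports later in the thread, the split is two unsure and three spam. All five hams are ham under either pair, which is the gap Atom is pointing at.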
so, what do these numbers look like with databases made from different-sized pools of ham & spam? how about with a database made of 34 emails... skip? this might give some quantifiable clues about how big a database is "big enough". ...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "The thing that bugs me is that the people think the FDA (Food and Drug Administration) is protecting them. It isn't. What the FDA is doing and what the public thinks it's doing are as different as night and day." -- Dr Ley, former Commissioner of the FDA From atom at suspicious.org Wed Dec 10 00:44:46 2003 From: atom at suspicious.org (Atom 'Smasher') Date: Wed Dec 10 00:45:53 2003 Subject: [Spambayes] concerns over MS patents Message-ID: here's what paul graham says about the M$ patents... http://paulgraham.com/msftpatent.html ...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "IDEA's key length is 128 bits - over twice as long as DES. Assuming that a brute force attack is the most efficient, it would require 2^128 (10^38) encryptions to recover the key. Design a chip that can test a billion keys per second and throw a billion of them at the problem, and it will still take 10^13 years - that's longer than the age of the universe. An array of 10^24 such chips can find the key in a day, but there aren't enough silicon atoms in the universe to build such a machine. Now we're getting somewhere - although I'd keep my eye on the dark matter debate." -- Bruce Schneier, Applied Cryptography From darren at idtelecoms.com Wed Dec 10 06:14:15 2003 From: darren at idtelecoms.com (Darren Westlake) Date: Wed Dec 10 06:14:20 2003 Subject: [Spambayes] run in background?
Message-ID: <6.0.1.1.2.20031210111100.028b8368@localhost> Hi, I'm using the pop3 proxy by running sb_server.py. It stays running in a DOS window. Is there any way of making it run in the background? Thanks and regards, Darren From rmalayter at bai.org Wed Dec 10 08:35:56 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Wed Dec 10 08:35:59 2003 Subject: [Spambayes] Yahoo's "domain keys" and spam Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A75055@cliff.bai.org> http://www.wired.com/news/business/0,1367,61495,00.html?tw=wn_bizhead_2 Will we want to have SpamBayes check these (since many SB users will have no control over what happens at their ISP or corporate gateway) as they become widespread? Just as obviously, spammers will attempt to forge them as well (to fool filters like the current SpamBayes that would just add a token verifying the *presence* of a domain key entry in the header), or use yet-to-be-revoked keys from domains obtained through fraudulent means. I think some intelligence must be built into spambayes to handle these... From tim at fourstonesExpressions.com Wed Dec 10 09:01:54 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Wed Dec 10 09:02:00 2003 Subject: [Spambayes] Yahoo's "domain keys" and spam In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01A75055@cliff.bai.org> References: <792DE28E91F6EA42B4663AE761C41C2A01A75055@cliff.bai.org> Message-ID: On Wed, 10 Dec 2003 07:35:56 -0600, Ryan Malayter wrote: > http://www.wired.com/news/business/0,1367,61495,00.html?tw=wn_bizhead_2 > I think some intelligence must be built into spambayes to handle these... Seems to me that spambayes will see the message only after its sender has been authenticated. We *may* want to ignore that header, if it's present, for clue purposes, but we don't need to perform this authentication. The fact that we are seeing a mail at all means that the MTA has already done it. -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself!
Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From tim at fourstonesExpressions.com Wed Dec 10 09:06:15 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Wed Dec 10 09:06:20 2003 Subject: [Spambayes] run in background? In-Reply-To: <6.0.1.1.2.20031210111100.028b8368@localhost> References: <6.0.1.1.2.20031210111100.028b8368@localhost> Message-ID: Assuming that you're using Win2k or XP, you can run the proxy as a service, providing that you're running a reasonably recent release of Spambayes. The instructions for installing and running in this mode are in the readme. On Wed, 10 Dec 2003 11:14:15 +0000, Darren Westlake wrote: > Hi, > > I'm using the pop3 proxy by running sb_server.py > It stays running in a DOS window. > Is there any way of making it run in the background? > > Thanks and regards, > Darren > _______________________________________________ > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes > Check the FAQ before asking: http://spambayes.sf.net/faq.html > -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From rmalayter at bai.org Wed Dec 10 09:18:24 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Wed Dec 10 09:18:27 2003 Subject: [Spambayes] Yahoo's "domain keys" and spam Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A75068@cliff.bai.org> > From: Tim Stone [mailto:tim@fourstonesExpressions.com] > Seems to me that spambayes will see the message only after > its sender has been authenticated. We *may* want to ignore > that header, if it's present, for clue purposes, but we don't > need to perform this authentication. The fact that we are > seeing a mail at all means that the MTA has already done it. Well, not always.
As the service ramps up - which will probably take a few years - a large portion of ISPs and organizational gateways are going to accept mail without checking this domain signature. If SpamBayes can validate it, it will help out any SB users until their ISPs or companies get on board. The code for signature validation (which I assume will be based on OpenPGP or something similar) is being given to the open-source MTA projects by Yahoo, and presumably will be GPL or OpenBSD licensed. We could use that code as a basis for whatever gets put into SpamBayes. An interesting side effect of this will be that SpamBayes will create a "level of trust" for sending domains based upon how much spam they send. For example, a "Validated-Domain-Sig:python.org" token might have a very low spam probability, while a "Validated-Domain-Sig:super-mail-promotions.biz" will have a very high spam probability. Since the sending domain (and recipient, subject, time, message-ID, etc.) will be cryptographically verified, couldn't this someday become the "ultimate" SpamBayes token, one that statistically trumps all others in the message? Regards, -Ryan- From kennypitt at hotmail.com Wed Dec 10 10:33:40 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Dec 10 10:34:15 2003 Subject: [Spambayes] run in background? In-Reply-To: Message-ID: Tim Stone wrote: > Assuming that you're using Win2k or XP, you can run the proxy as a > service, providing that you're running a reasonably recent release of > Spambayes. The instructions for installing and running in this mode > are in the readme. > > On Wed, 10 Dec 2003 11:14:15 +0000, Darren Westlake > wrote: > >> Hi, >> >> I'm using the pop3 proxy by running sb_server.py >> It stays running in a DOS window. >> Is there any way of making it run in the background? You can also run the script using "pythonw" instead of "python" (i.e. "pythonw sb_server.py"). That will invoke it as a background Windows app.
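If you would rather launch the proxy from a script than type the `pythonw` command, here is a minimal Python sketch that builds the command line. The helper name `pythonw_command` is hypothetical, and it assumes pythonw.exe sits in the same folder as python.exe, as it does in a standard Windows install. It uses `ntpath` so Windows paths are handled the same way even when the helper is tested on another OS.

```python
import ntpath


def pythonw_command(python_exe, script):
    """Return a command list that runs `script` with the windowless interpreter.

    Assumes pythonw.exe lives next to python.exe (standard Windows layout).
    """
    folder = ntpath.dirname(python_exe)
    return [ntpath.join(folder, "pythonw.exe"), script]
```

On Windows with a modern Python, `subprocess.Popen(pythonw_command(sys.executable, "sb_server.py"))` would then start sb_server.py without a DOS box. The same approach works for pop3proxy_tray.py, which Kenny suggests next.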
On Windows, you might want to run pop3proxy_tray.py instead of running sb_server.py directly. It starts sb_server for you, but also gives you an icon in the tray with a right-click menu for easy access to Review Messages, Configure, etc. Again, invoke using pythonw to avoid the DOS box. -- Kenny Pitt From wsy at merl.com Wed Dec 10 11:31:26 2003 From: wsy at merl.com (Bill Yerazunis) Date: Wed Dec 10 11:31:32 2003 Subject: [Spambayes] How low can you go? In-Reply-To: <16342.20443.331861.376383@montanaro.dyndns.org> (message from Skip Montanaro on Tue, 9 Dec 2003 16:42:35 -0600) References: <16342.20443.331861.376383@montanaro.dyndns.org> Message-ID: <200312101631.hBAGVQh28609@localhost.localdomain> From: Skip Montanaro Okay, time for a little contest. We've recently seen several users tout the size of their training database. I used to be one of those "enlarged database" types, but no more. So, how small is yours? Well, I'm now running with the mostly-hung CRM114 SBPH/BMM and the accuracy is 99.95% or better (most of my errors now are when a spammer gets onto an email list that has "good credentials"; even then, if the message is spammy enough, it doesn't get through). Total size of the training text is 770 KB of spam and 570 KB of nonspam. This is something like 250 spams and 150 nonspams, but that's only approximate. -Bill Yerazunis From skip at pobox.com Wed Dec 10 11:50:47 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Dec 10 11:50:51 2003 Subject: [Spambayes] How low can you go? In-Reply-To: References: <16342.20443.331861.376383@montanaro.dyndns.org> Message-ID: <16343.20199.985033.901080@montanaro.dyndns.org> >> At the moment I have trained on 14 spams and 20 hams and am quite >> pleased with how it's performing so far. I've received mail for a >> half dozen or so different mailing lists, and it's catching spams >> left and right.
I anticipate a slew of unsures overnight as I get >> new kinds of email (both ham and spam), but I will be damned >> selective about what I add to my database. Seth> OK, I'll bite. How did you select those 14 spams and 20 hams? Seth> Just please don't say they're random. Even if you have to lie. Nothing magic or random. I primed the pump with one ham and one spam. Then I sorted the unsures that arrived by score. Train the lowest-scoring spam as spam. Now rescore the unsure mailbox, considering only the messages that now score as spam, and delete them. Lather. Rinse. Repeat. You will obviously have many hams which initially score as unsure as well. Do the same thing for them, but start from the highest-scoring ham. I awoke to 96 unsures this morning. I did the above dance for a while. I'm now up to 43 spams and 35 hams. I still have a few messages in my unsure mailbox which score between 0.30 and 0.58, but with such a small database I don't want to overload the spam side of things. I'll wait until I get a few more hams. Note that to keep the database more-or-less in balance, I do train on the occasional ham, though I try to find ones that score at the higher end of the ham region. Seth> Perhaps you selected them by incrementally training on a corpus of Seth> 100 each? No starting corpus other than mail as it arrived and the two initial pump primers. They were recently received messages as well, though. I just wanted something to keep the initial scores from all being 0.50. Seth> What are your current thresholds? 0.15 and 0.60. I moved the spam threshold from 0.65 this morning. Seth> I would expect a lot of unsures, which doesn't bother me a bit, Seth> but what are you seeing (so far) for false positives and false Seth> negatives? A few. I haven't seen any false positives so far. Perhaps five false negatives. I think the system does a good job vis-a-vis false positives because most people's ham tends to be topically very similar.
On the other hand, spam is all over the map, both in its content and in the mechanisms of the delivery process (hiding delivery routes in various ways, obscuring content, etc.), so it's understandable that spam is harder to classify. I think it also helps to explain why my ham/spam thresholds can be so asymmetric and still be effective. Note: When I encounter a false negative I don't automatically train on it. Instead, I move it to my unsure mailbox. Since it arrived and was incorrectly scored as ham, I may have done enough training on unsures to now correctly classify it as spam, so training on it won't help much. To be most accurate, I should look for false negatives and false positives before considering my unsure mailbox (since they are the most egregious mistakes), but that means I have to skim 20 mailboxes looking for mistakes. I'm more than happy to just deal with false negatives when I encounter them during my regular mail reading. Seth> Damned impressive, if you ask me. I think so too. (Not my training technique, SpamBayes.) Skip From skip at pobox.com Wed Dec 10 11:53:19 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Dec 10 11:53:22 2003 Subject: [Spambayes] ham/spam show 'n tell In-Reply-To: References: Message-ID: <16343.20351.791856.912026@montanaro.dyndns.org> atom> right now my spam-cutoff is 0.8, and looking at these numbers even atom> that seems conservative. atom> so, what do these numbers look like with databases made from atom> different sized pools of ham & spam? how about with a database atom> made of 34 emails... skip? this might give some quantifiable atom> clues about how big a database is "big enough". As I said in my response to Seth, I'm currently using 0.15 and 0.60. I doubt I'll go much lower than 0.60, though, without a fair amount of evidence that no hams score above 0.40.
Skip From skip at pobox.com Wed Dec 10 12:02:39 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Dec 10 12:02:44 2003 Subject: [Spambayes] Watch out for digests... Message-ID: <16343.20911.454825.961722@montanaro.dyndns.org> I had what looked like an obvious ham message score unsure this morning. Without thinking too much about it, I trained on it as ham. Big mistake. Stuff started getting wacky real fast. The message was a reply to a digest from a mailing list where the guy did this: blah blah blah blah ... a couple more lines ... in conclusion, blah blah blah > here's > ... dozens of lines ... > the > ... dozens of lines ... > entire > ... dozens of lines ... > friggin' > ... dozens of lines ... > digest Guess what? One of the messages in the digest was an obvious spam. I zapped the entire message from my ham database, retrained on the remaining hams and spams, and all was right with the world once again. SpamBayes did precisely what I had trained it to do, and I punished it for that. I'm sorry, SpamBayes... Good dog, SpamBayes... Skip From danhealy at weston.com Wed Dec 10 13:45:24 2003 From: danhealy at weston.com (Dan Healy) Date: Wed Dec 10 13:45:40 2003 Subject: [Spambayes] Operator Error Message-ID: I have installed SpamBayes on a Windows 2000 machine and have used it successfully. Then I inadvertently deleted the Spam folder from my Outlook Personal Folders. I put a new Spam folder in Outlook, but SpamBayes doesn't put anything in the new folder. The SpamBayes plug-in in Outlook is apparently still working. I get no spam in my Inbox. I get some messages in the Possible Spam folder. SpamBayes is putting the spam somewhere. Is there some way I can point SpamBayes to the new Spam folder? Where is the Spam going now? I would like to be able to recover from my error, deleting the Spam folder, without having to reinstall and re-train SpamBayes. Thanks, Dan H.
From kennypitt at hotmail.com Wed Dec 10 13:50:05 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Dec 10 13:50:43 2003 Subject: [Spambayes] Operator Error In-Reply-To: Message-ID: Dan Healy wrote: > Is there some way I can point SpamBayes to the new Spam folder? > Where is the Spam going now? I would like to be able to recover from > my error, deleting the Spam folder, without having to reinstall and > re-train SpamBayes. Sure. Just go to the Filtering tab in SpamBayes Manager. You'll probably see something like "" for the selected folder in the Certain Spam section. Just use the Browse button to select your new spam folder. -- Kenny Pitt From dsvejda at charter.net Wed Dec 10 14:14:58 2003 From: dsvejda at charter.net (David Svejda) Date: Wed Dec 10 14:20:16 2003 Subject: [Spambayes] Unable to install Message-ID: <000001c3bf51$ea00c100$1a02a8c0@Sager2> Tried your program after reading a review, and found after downloading, that I couldn't install. Some dll file can't register. Any ideas? David -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031210/74ddcf9b/attachment.html From JMLafreniere at expertec.net Wed Dec 10 14:42:57 2003 From: JMLafreniere at expertec.net (Jim Morrison Lafreniere) Date: Wed Dec 10 14:44:39 2003 Subject: [Spambayes] SpamBayes Vs NetFolders Message-ID: Hi there, I really enjoy your software, as it works almost perfectly. I only have this problem with all of my clients that use the Outlook Net Folders to share Calendars and Contacts. Every time SpamBayes filters new messages, if the message is a "#netfolders" message, Outlook gives an error message about a problem with the form, and says that Outlook is gonna use another form. When there are 12-15 netfolders messages, the message appears 12-15 times... If I uninstall SpamBayes, I don't have the problem. If I uninstall Net Folders, I don't have the problem.
It looks like SpamBayes doesn't like that kind of message, and even if I tell SpamBayes not to treat it as spam, the error message appears again. Do you have any idea? Thanks. From TiagoTiago at Globo.com Wed Dec 10 14:53:01 2003 From: TiagoTiago at Globo.com (Tiago Estill de Noronha) Date: Wed Dec 10 14:51:00 2003 Subject: RES: [Spambayes] Operator Error In-Reply-To: Message-ID: <003101c3bf57$37f6ffa0$2960b7c8@virtua.com.br> As I've seen most of the time on this list (I haven't been here long, though), your spam folder will probably still be in the Deleted Items folder if you haven't emptied it yet. ********************* Tiago Estill de Noronha TiagoTiago@Globo.com -=> -----Original Message----- -=> From: spambayes-bounces@python.org -=> [mailto:spambayes-bounces@python.org] On behalf of Kenny Pitt -=> Sent: Wednesday, December 10, 2003 15:50 -=> To: danhealy@weston.com; 'SpamBayes' -=> Subject: RE: [Spambayes] Operator Error -=> -=> Dan Healy wrote: -=> > Is there some way I can point SpamBayes to the new Spam folder? -=> > Where is the Spam going now? I would like to be able to recover from -=> > my error, deleting the Spam folder, without having to reinstall and -=> > re-train SpamBayes. -=> -=> Sure. Just go to the Filtering tab in SpamBayes Manager. -=> You'll probably see something like "" for -=> the selected folder in the Certain Spam section. Just use -=> the Browse button to select your new spam folder. -=> -=> -- -=> Kenny Pitt -=> -=> _______________________________________________ -=> Spambayes@python.org -=> http://mail.python.org/mailman/listinfo/spambayes -=> Check the FAQ before asking: http://spambayes.sf.net/faq.html From skip at pobox.com Wed Dec 10 14:52:14 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Dec 10 14:52:22 2003 Subject: [Spambayes] Unable to install In-Reply-To: <000001c3bf51$ea00c100$1a02a8c0@Sager2> References: <000001c3bf51$ea00c100$1a02a8c0@Sager2> Message-ID: <16343.31086.515303.875507@montanaro.dyndns.org> David> Tried your program after reading a review, and found after David> downloading, that I couldn't install. Some dll file can't David> register. Any ideas? Can you be more specific about which DLL couldn't register? What version of Windows and Outlook are you using? Skip From TiagoTiago at Globo.com Wed Dec 10 14:57:59 2003 From: TiagoTiago at Globo.com (Tiago Estill de Noronha) Date: Wed Dec 10 14:56:55 2003 Subject: [Spambayes] why this message says it can't be filtered? Message-ID: <003b01c3bf57$e9700b00$2960b7c8@virtua.com.br> When I try to set it as spam or not spam it says: "no filterable mail items selected now" ********************* Tiago Estill de Noronha TiagoTiago@Globo.com -------------- next part -------------- An embedded message was scrubbed... From: Subject: Introducing Morpheus 2.0! Date: Fri, 28 Dec 2001 05:49:35 -0200 Size: 13884 Url: http://mail.python.org/pipermail/spambayes/attachments/20031210/a920f940/attachment-0001.mht From ck0e82002 at sneakemail.com Wed Dec 10 15:06:15 2003 From: ck0e82002 at sneakemail.com (David) Date: Wed Dec 10 15:06:18 2003 Subject: [Spambayes] Configuration Manager won't open after upgrade to Office 2003 Professional Message-ID: <30605-24700@sneakemail.com> I've worked through the troubleshooting guide and have even uninstalled and re-installed the SpamBayes Outlook add-in. I'm still stumped.
OS: Windows 2000 Professional SpamBayes V .81 Office 2003 Professional SpamBayes was working fine under Office/Outlook 2000. I upgraded Office to the 2003 version and SpamBayes could no longer find the Spam and Potential Spam folders. When I go to SpamBayes Manager and click on Configuration Wizard..., the wizard does not appear. I've tried: deleting the Addin and restarting Outlook, deleting the outcmd.dat file and restarting Outlook uninstalling SpamBayes and re-installing. So far, none of these efforts have resulted in a change of behaviour. The Configuration Wizard will not appear. All help/suggestions will be appreciated. David ==================== SpamBayes LOG ==================================================== Loaded bayes database from 'C:\Documents and Settings\David\Application Data\SpamBayes\default_bayes_database.db' Loaded message database from 'C:\Documents and Settings\David\Application Data\SpamBayes\default_message_database.db' Bayes database initialized with 253 spam and 1295 good messages SpamBayes Outlook Addin, Binary version 0.81 (September 9, 2003) starting (with engine SpamBayes Beta2, version 0.2 (July 2003)) on Windows 5.0.2195 (Service Pack 4) using Python 2.3+ (#46, Aug 6 2003, 16:39:24) [MSC v.1200 32 bit (Intel)] NOTE: Skipping deleted folder NOTE: Skipping deleted folder NOTE: Skipping deleted folder Creating new SpamBayes toolbar to host our buttons Error finding the MAPI folders for a folder switch event Traceback (most recent call last): File "out1.pyz/addin", line 1063, in OnFolderSwitch File "out1.pyz/msgstore", line 337, in GetFolder NotFoundException: NotFoundException: Exception 0x8004010f (MAPI_E_NOT_FOUND): OLE error 0x8004010f Traceback (most recent call last): File "out1.pyz/dialogs.dlgcore", line 310, in OnCommand File "out1.pyz/dialogs.dlgcore", line 262, in ApplyHandlingOptionValueError File "out1.pyz/dialogs.processors", line 76, in OnCommand File "out1.pyz/dialogs.dialog_map", line 322, in OnClicked File 
"out1.pyz/dialogs", line 64, in ShowWizard File "out1.pyz/config_wizard", line 142, in CreateWizardConfig File "out1.pyz/config_wizard", line 46, in InitWizardConfig File "out1.pyz/msgstore", line 337, in GetFolder msgstore.NotFoundException: NotFoundException: Exception 0x8004010f (MAPI_E_NOT_FOUND): OLE error 0x8004010f Saving configuration -> C:\Documents and Settings\David\Application Data\SpamBayes\Microsoft Outlook Internet Settings.ini Saving configuration -> C:\Documents and Settings\David\Application Data\SpamBayes\Microsoft Outlook Internet Settings.ini Saving configuration -> C:\Documents and Settings\David\Application Data\SpamBayes\Microsoft Outlook Internet Settings.ini -------------------------------------- From mbeloff at comcast.net Wed Dec 10 15:40:12 2003 From: mbeloff at comcast.net (Marv Beloff) Date: Wed Dec 10 15:40:27 2003 Subject: [Spambayes] Frustrated - Please help! Message-ID: Hi, I seem to keep losing my Junk Mail folder. It usually ends up under Deletions or Calendar. When this happens I can't seem to reinstall. I get a message "You must configure the spam folder." I have tried to comply. I have: 1. set up two folders: Junk Mail & Junk Maybe 2. Reset Configuration 3. used Configuration Wizard 4. Checked the Enable Spambayes box 5. reexamined Training: Folders with known good messages = Inbox Folders with spam or Junk messages = Junk Mail, Junk Maybe Still I get the same message: "You Must Configure the Spam Folder" What am I doing wrong? When I had it working it was a terrific help with up to 60 ugly spams daily. Why do I seem to lose it so easily? Please help. Best, Marv Beloff -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031210/3eeef418/attachment.html From nobody at spamcop.net Wed Dec 10 15:56:27 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 10 15:56:20 2003 Subject: [Spambayes] How low can you go?
In-Reply-To: <16343.20199.985033.901080@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > No starting corpus other than mail as it arrived and the two initial pump > primers. They were recently received messages as well though. I just > wanted something to keep the initial scores from all being 0.50. This is great. I think it's the ultimate case of incremental training. Don't you think the result would be nearly the same if you used your incoming mail stream or a saved corpus? The only random part is the one message of each type that you pick first. That causes a particular sort order for everything else, and that guides you as to what to train on next. Depending on which two messages you started with, you might wind up with a different training set, though not necessarily a better or worse one. But doing it one at a time tends to give you the least duplication among the messages that you select for training. I think an interesting variation on this would be to start with one message of each type, score a small corpus of equal numbers of ham and spam (say 100-150 of each), and always add one spam plus one ham to the training set each time. That way the sets will stay balanced. As you suggest, hams are probably easier to classify, so without this you would tend to train on fewer messages overall, but with more imbalance. I suppose that's OK to a point, wherever that is. It should be possible to automate the training on the single worst ham + the single worst spam on each pass, guaranteeing a balanced training set and the least duplication. However, with a small training set, each message that you add could skew things quite a bit, since each new message can change the estimated classifier a lot. I'm still amazed that it can classify at all based on only 34 messages. I wonder if it would do better on small training sets if it were allowed to use more than 150 tokens when scoring? I'm assuming that every message token is put into the database when a message is trained.
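Seth's question about using more than 150 tokens concerns the step that picks which token probabilities take part in scoring. The following is a minimal sketch of that selection, assuming it works the way the SpamBayes options `max_discriminators` (believed default 150) and `minimum_prob_strength` (believed default 0.1) suggest: skip tokens whose probability is too close to the neutral 0.5, then keep only the strongest of the rest. `strongest_clues` is a hypothetical name, and the step that combines the selected clues into a message score is deliberately left out.

```python
from typing import Dict, List, Tuple


def strongest_clues(
    token_probs: Dict[str, float],
    max_discriminators: int = 150,
    minimum_prob_strength: float = 0.1,
) -> List[Tuple[str, float]]:
    """Return the tokens whose spam probability lies farthest from 0.5.

    Hypothetical helper: tokens within minimum_prob_strength of 0.5 are
    treated as uninformative and skipped; at most max_discriminators of
    the remaining tokens are kept, strongest first.
    """
    candidates = [
        (abs(prob - 0.5), token, prob)
        for token, prob in token_probs.items()
        if abs(prob - 0.5) >= minimum_prob_strength
    ]
    # Sort by distance from 0.5, largest (most decisive) first.
    candidates.sort(reverse=True)
    return [(token, prob) for _, token, prob in candidates[:max_discriminators]]


probs = {"viagra": 0.98, "meeting": 0.05, "the": 0.52, "url:biz": 0.84}
print(strongest_clues(probs, max_discriminators=2))
# [('viagra', 0.98), ('meeting', 0.05)]
```

Raising `max_discriminators` in a sketch like this is the experiment Seth proposes; whether it helps small databases is left open in the thread.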
If that's so, there's more information that we're not using when the token counts haven't yet settled down close to their expected values. That raises an interesting question: if the token count mean-squared error is large, does including more tokens reduce the variance of the message score? If the token count error is zero mean (a big if) with a reasonable distribution (another big if), I'd have to guess that it would. Otherwise, it wouldn't help and could even make things worse, but I doubt it's that badly behaved. I can see why it wouldn't make much difference with a large training set, since the errors on the individual token counts are smaller and we're combining (though not linearly) 150 of them, so the total estimation error is down in the noise. But with only a few trained messages, the token counts are all small enough that the errors are necessarily larger, if for no other reason than quantization. For example, for a given number of messages, let's say the expected value of a particular token count is 1.33. If we're lucky, we will have a count of 1 or 2 (the best case), which would be an error of 33% at best. That's not so good for the best case. With ten times as many messages containing that token, the quantization error goes down quickly. At some point, other error sources will dominate, but for very small training sets, this one might be important. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From Philippe.Gagne at nrcan.gc.ca Wed Dec 10 16:32:18 2003 From: Philippe.Gagne at nrcan.gc.ca (Gagne, Philippe) Date: Wed Dec 10 16:32:29 2003 Subject: [Spambayes] DB corruption Message-ID: <347F6A2EDD04D51196F70000C110386D0D7599AB@S0-OTT-X14.NRCan.gc.ca> To whom it may concern, I am testing SpamBayes for my division and I had a few questions. When transferring the database files from one PC to the other, I found that the DB gets corrupted much faster than if you set it up the long way.
I was also wondering if you have an enterprise solution? Since I am testing it for my division, if you would like me to help in any way let me know. Thank you, Philippe Gagne Information Technology Analyst/Analyste en technologie de l'information Natural Resources Canada/Ressources naturelles Canada Energy Sector/Secteur de l'Energie Government of Canada/Gouvernement du Canada 580 rue Booth Street, Ottawa ON, K1A 0E4 pgagne@nrcan.gc.ca Tel: (613) 996-2900 / Fax: (613) 995-5005 / Cell: (613)299-2185 From dsvejda at charter.net Wed Dec 10 16:47:51 2003 From: dsvejda at charter.net (David Svejda) Date: Wed Dec 10 16:49:56 2003 Subject: [Spambayes] Unable to install In-Reply-To: <16343.31086.515303.875507@montanaro.dyndns.org> Message-ID: <000001c3bf67$45a6ec90$1a02a8c0@Sager2> Skip, Thanks for the reply! The error that I was experiencing is the same one as posted on the BBS about the dll that doesn't register. I'm running XP Pro and XP Office Pro. When I tried to reinstall the app to get a picture of the error message, it decided to install without an error. Go figure. Anyway, I'm up and running with your app. Will let you know if there's any other problem. David -----Original Message----- From: Skip Montanaro [mailto:skip@pobox.com] Sent: Wednesday, December 10, 2003 1:52 PM To: dsvejda@charter.net Cc: spambayes@python.org Subject: Re: [Spambayes] Unable to install David> Tried your program after reading a review, and found after David> downloading, that I couldn't install. Some dll file can't David> register. Any ideas? Can you be more explicit about what dll couldn't register? What version of Windows and Outlook are you using? 
Skip From papaDoc at videotron.ca Wed Dec 10 16:54:26 2003 From: papaDoc at videotron.ca (papaDoc) Date: Wed Dec 10 16:54:30 2003 Subject: [Spambayes] DB corruption In-Reply-To: <347F6A2EDD04D51196F70000C110386D0D7599AB@S0-OTT-X14.NRCan.gc.ca> References: <347F6A2EDD04D51196F70000C110386D0D7599AB@S0-OTT-X14.NRCan.gc.ca> Message-ID: <3FD79612.5080703@videotron.ca> Hi Philippe, > I am testing SpamBayes for my division and I had a few questions. When transferring the database files from one PC to the other I found that the DB gets corrupted much faster than if you set it up the long way. What is your error message? What are you using (Outlook plugin, Web interface)? Remi From tameyer at ihug.co.nz Wed Dec 10 18:40:33 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 10 18:40:40 2003 Subject: [Spambayes] Watch out for digests... In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13046B4478@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677744@its-xchg4.massey.ac.nz> > I had what looked like an obvious ham message score unsure > this morning. Without thinking too much about it, I trained > on it as ham. Big mistake. Stuff started getting wacky real > fast. The message was a reply to a digest from a mailing > list where the guy did this: [...] > Guess what? One of the messages in the digest was an obvious > spam. This is perhaps a drawback of the minimalist database size training strategy. I'm guessing that if you had a larger database, the effect wouldn't have been as pronounced? OTOH, maybe it's a good thing that it was, so that you noticed and were able to correct it, rather than leaving the system with invalid data.
=Tony Meyer From tameyer at ihug.co.nz Wed Dec 10 18:45:58 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 10 18:46:06 2003 Subject: [Spambayes] Configuration Manager won't open after upgrade to Office 2003 Professional In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13046B44F2@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677746@its-xchg4.massey.ac.nz> > I've tried: > deleting the Addin and restarting Outlook, > deleting the outcmd.dat file and restarting Outlook > uninstalling SpamBayes and re-installing. > > So far, none of these efforts have resulted in a change of > behaviour. The Configuration Wizard will not appear. Try deleting your configuration file and trying again. It's called "C:\Documents and Settings\David\Application Data\SpamBayes\Microsoft Outlook Internet Settings.ini" (from the log). =Tony Meyer From tameyer at ihug.co.nz Wed Dec 10 18:45:56 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 10 18:46:16 2003 Subject: [Spambayes] why this message says it can't be filtered? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13046B44EF@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677745@its-xchg4.massey.ac.nz> > When I try to set it as spam or not spam it say: > "no filterable mail items selected now" Hmm...this one works for me. Did this message arrive like any other? In particular, you didn't write it, did you? SpamBayes only filters messages with type "IPM.Note" (since this works for me, I presume it is), and that it believes was received. There is an open bug report that gives an example of another message that is incorrectly classed as 'not received'; OTOH, I can't filter that one either. Still, when that bug is resolved, this one might be too. You could add your experiences to that bug report. For the moment, just don't bother about training this message. 
=Tony Meyer From mhammond at skippinet.com.au Wed Dec 10 19:03:42 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Dec 10 19:03:58 2003 Subject: [Spambayes] DB corruption In-Reply-To: <347F6A2EDD04D51196F70000C110386D0D7599AB@S0-OTT-X14.NRCan.gc.ca> Message-ID: <01d501c3bf7a$3dd991d0$2c00a8c0@eden> > I am testing SpamBayes for my division and I had a few > questions. When > transferring the database files from one PC to the other I > found that the DB > gets corrupted much faster, then if you set it up the long > way. I was also > wondering if you have an enterprise solution? Since I am > testing it for my > division, if you would like me to help in any way let me know. See papaDoc's reply, but assuming you are talking about the Outlook Addin. If by "corrupted" you actually mean that the results become poor, then you will probably find this is because you are attempting to copy one person's training data, and use it with a different person. All tests along these lines show spectacularly poor results, hence at this stage our tools do not support such a concept. At this stage, the best "enterprise" support we have is to allow SpamBayes to be installed for all users on a machine (silently etc with a little work). Then, the first time the user launches Outlook, they are presented with the "Configuration Wizard", and must start the process for themselves. If they log onto another machine on the network though, their training data should follow them. If by corrupted you really mean corrupted, please attach a log. Mark. From mhammond at skippinet.com.au Wed Dec 10 19:05:36 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Dec 10 19:05:51 2003 Subject: [Spambayes] Frustrated - Please help! In-Reply-To: Message-ID: <01dd01c3bf7a$8230afd0$2c00a8c0@eden> The configuration wizard in version 0.8 gets upset when folders have been deleted. 
You should be able to configure SpamBayes manually using the "SpamBayes Manager" and manually setting the "Filter" tab. Otherwise, delete the configuration file. Mark. -----Original Message----- From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org]On Behalf Of Marv Beloff Sent: Thursday, 11 December 2003 7:40 AM To: spambayes@python.org Subject: [Spambayes] Frustrated - Please help! Hi, I seem to keep losing my Junk Mail folder. It usually ends up under Deletions or Calendar. When this happens I can't seem to reinstall. I get a message "You must configure the spam folder." I have tried to comply. I have: 1. set up two folders: Junk Mail & Junk Maybe 2. Reset Configuration 3. used Configuration Wizard 4. Checked the Enable Spambayes box 5. reexamined Training: Folders with known good messages = Inbox Folders with spam or Junk messages = Junk Mail, Junk Maybe Still I get the same message: "You Must Configure the Spam Folder" What am I doing wrong? When I had it working it was a terrific help with up to 60 ugly spams daily. Why do I seem to lose it so easily? Please help. Best, Marv Beloff -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031211/5bc2fc07/attachment.html From exitaz at earthlink.net Wed Dec 10 19:53:26 2003 From: exitaz at earthlink.net (Bill, CCIM) Date: Wed Dec 10 19:52:04 2003 Subject: [Spambayes] oops Message-ID: <000001c3bf81$3284b470$6401a8c0@exitaz.local> i accidentally deleted the 'junk suspects' folder....what do i do?? thanks bill William H Higgins, CCIM EXIT Realty Arizona 2500 S Power Rd, Suite 103 Mesa, AZ 85208 480-603-4960 www.exitaz.com whhiggins@ccim.net -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031210/4e3619ee/attachment.html From skip at pobox.com Wed Dec 10 21:22:59 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Dec 10 21:23:11 2003 Subject: [Spambayes] Watch out for digests... In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304677744@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13046B4478@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F1304677744@its-xchg4.massey.ac.nz> Message-ID: <16343.54531.802087.451246@montanaro.dyndns.org> >> Big mistake. Stuff started getting wacky real fast.... Guess what? >> One of the messages in the digest was an obvious spam. Tony> This is perhaps a drawback of the minimalist database size Tony> training strategy. I'm guessing that if you had a larger Tony> database, the effect wouldn't have been as pronounced? Maybe. At the moment, I have 9768 tokens in my database and 7731 of them are hapaxes. As you suggest, it would appear mistakes can throw things off more dramatically, but it is also easier to detect. I'd be interested to see what others' hapax fractions are:

    >>> import shelve
    >>> db = shelve.open(".hammiedb")
    >>> n = 0
    >>> len([k for k in db if db[k] in [(0,1),(1,0)]])
    7731
    >>> len(db)
    9769
    >>> len([k for k in db if db[k] in [(0,1),(1,0)]])/float(len(db)-1)
    0.79146191646191644

(The -1 is to eliminate the 'saved state' token. I'm just being pedantic. ;-) Another interesting thing (I think) might be to investigate the importance of synthetic tokens (e.g.: 'url:eweek' or 'received:168.10.156') vs. natural tokens (e.g., 'highlight' or 'dot') for smaller vs larger databases. I think one of the reasons training a single unsure has a dramatic effect on a bunch of other unsure spams is because of all the synthetic tokens they have in common due to similar delivery mechanisms (gotta use that account before it gets shut down...).
If a spammer spews a bunch of messages from ISP A, then gets booted, his next spew will be from somewhere else. I suspect many of the ISP-related synthetic tokens generated will only ever be hapaxes, and thus be much more important with a small database than with a large one. It's just a theory. Hey, maybe that's another master's thesis idea for Brett Cannon... ;-) Skip From tameyer at ihug.co.nz Wed Dec 10 21:55:12 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 10 21:55:24 2003 Subject: [Spambayes] oops In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13046B45B2@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A06@its-xchg4.massey.ac.nz> > i accidentally deleted the 'junk suspects' folder....what do i do?? Please see FAQ 3.13: =Tony Meyer From atom at suspicious.org Thu Dec 11 01:46:29 2003 From: atom at suspicious.org (Atom 'Smasher') Date: Thu Dec 11 01:47:39 2003 Subject: [Spambayes] out of balance database Message-ID: what types of bad things happen when the database is "out of balance"? is it in balance when i have the same number of messages in each pile? or when the total size of each pile is the same? how far out of balance is considered "reasonable"? how far out of balance can it get before i notice problems? ...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "We will, in fact, be greeted as liberators." -- Dick Cheney, March 16th 2003 From kennypitt at hotmail.com Thu Dec 11 09:58:45 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Thu Dec 11 09:59:21 2003 Subject: [spambayes-dev] RE: [Spambayes] Watch out for digests... 
In-Reply-To: <16343.54531.802087.451246@montanaro.dyndns.org> Message-ID: Skip Montanaro wrote:
> I'd be interested to see what others' hapax fractions are:
>
> >>> import shelve
> >>> db = shelve.open(".hammiedb")
> >>> n = 0
> >>> len([k for k in db if db[k] in [(0,1),(1,0)]])
> 7731
> >>> len(db)
> 9769
> >>> len([k for k in db if db[k] in [(0,1),(1,0)]])/float(len(db)-1)
> 0.79146191646191644

My current Outlook training database has 40 good and 59 spam. Here are my results:

>>> len([k for k in db if db[k] in [(0,1),(1,0)]])
8158
>>> len(db)
11274
>>> len([k for k in db if db[k] in [(0,1),(1,0)]])/float(len(db)-1)
0.72367604009580411

-- Kenny Pitt From tim.one at comcast.net Thu Dec 11 11:17:47 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Dec 11 11:17:43 2003 Subject: [Spambayes] Watch out for digests... In-Reply-To: <16343.54531.802087.451246@montanaro.dyndns.org> Message-ID: [Tony] > This is perhaps a drawback of the minimalist database size > training strategy. I think it's a consequence of mistake-based training (and minimal database size is a (another) consequence of *that*). > I'm guessing that if you had a larger database, the effect wouldn't > have been as pronounced? A mistake in training has smaller effect under TOE (train-on-everything). The other side of that is that a correctly-trained example also has smaller effect under TOE. [Skip] > Maybe. At the moment, I have 9768 tokens in my database and 7731 of > them are hapaxes. As you suggest, it would appear mistakes can throw > things off more dramatically, We're rediscovering the bases for these old mantras: Mistake-based training leads to hapax-driven scoring. Hapax-driven scoring is brittle. "brittle" is an antonym of "robust". But in my personal email life, I've been very happy with mistake-based training despite its drawbacks. > but it is also easier to detect. Heh -- isn't that *because* it throws things off so dramatically?
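Skip's interactive session (and Kenny's rerun of it) can be packaged as a small function. This sketch assumes, as the session implies, that each database entry maps a token to a pair of counts where `(0, 1)` or `(1, 0)` marks a hapax, and that one bookkeeping key, assumed here to be named `"saved state"` after Skip's remark, should be left out. `hapax_fraction` is a hypothetical name, and a real SpamBayes database may store a different record type, in which case the `tuple(...)` conversion would need adjusting.

```python
from typing import Mapping, Sequence


def hapax_fraction(db: Mapping[str, Sequence[int]],
                   state_key: str = "saved state") -> float:
    """Fraction of tokens that were seen in exactly one trained message.

    A hapax has a count pair of (0, 1) or (1, 0). The bookkeeping entry
    named by state_key is excluded from both the numerator and the
    denominator, mirroring the "-1" in the interactive session.
    """
    tokens = [key for key in db if key != state_key]
    if not tokens:
        return 0.0
    hapaxes = sum(1 for key in tokens if tuple(db[key]) in {(0, 1), (1, 0)})
    return hapaxes / len(tokens)


print(hapax_fraction({"a": (0, 1), "b": (1, 0), "c": (2, 1), "saved state": (5, 5)}))
```

Against a shelve file it could be called as `hapax_fraction(shelve.open(".hammiedb", flag="r"))`; opening read-only avoids touching the live database. As Tim points out in the next paragraph, the share of hapaxes actually used during scoring may matter more than this raw database count.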
> I'd be interested to see what others' hapax fractions are: I don't think that's the right thing to measure. There's really nothing in a database that's interesting on its own; the only thing that matters to performance is what gets used during *scoring* (everything else just sits there, passively, the same as if it didn't exist (except for its effect on database size)). A message score mostly derived from hapaxes is brittle because a single contrary training example can change the classifier's view of a hapax from "hammy" or "spammy" to "neither", and two contrary training examples can swing it to the other classification. In the early days, the database kept track of the last time a token was used in scoring, and the test framework kept track of how often each token got used in scoring. There isn't an out-of-the-box way to get at that info anymore, so it's much harder to investigate how mistake-based training leads to hapax-driven scoring now. It's not *all* bad, or mistake-based training wouldn't be so effective for so many of us. Maybe the clearest example is that the hapaxes found in a new spam campaign are precisely what let us get away with training one sample and thereafter catch others from that campaign; in effect, hapaxes act like a pretty large set of lexical fingerprints in that case. > ... > Another interesting thing (I think) might be to investigate the > importance of synthetic tokens (e.g.: 'url:eweek' or > 'received:168.10.156') vs. natural tokens (e.g., 'highlight' or > 'dot') for smaller vs larger databases. I think one of the reasons > training a single unsure has a dramatic effect on a bunch of other > unsure spams is because of all the synthetic tokens they have in > common due to similar delivery mechanisms (gotta use that account > before it gets shut down...). If a spammer spews a bunch of messages > from ISP A, then gets booted, his next spew will be from somewhere > else.
I suspect many of the ISP-related synthetic tokens generated > will only ever be hapaxes, and thus be much more important with a > small database than with a large one. It was established before that hapaxes are vital in mistake-based training. If you want to test that quickly but informally, modify a copy of your database to throw away all the hapaxes, then live with that reduced database for a while. It will probably have a hard time even with the messages it was originally trained with. From skip at pobox.com Thu Dec 11 11:38:28 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Dec 11 11:38:43 2003 Subject: [Spambayes] Watch out for digests... In-Reply-To: References: <16343.54531.802087.451246@montanaro.dyndns.org> Message-ID: <16344.40324.210107.698842@montanaro.dyndns.org> >> I'd be interested to see what others' hapax fractions are: Tim> I don't think that's the right thing to measure. There's really Tim> nothing in a database that's interesting on its own, the only thing Tim> that matters to performance is what gets used during *scoring* Tim> (everything else just sits there, passively, the same as if it Tim> didn't exist (except for its effect on database size)). Yes, you're correct, of course. So what we might want to look at is the relative occurrence of 0.84 and 0.16 scores in message clues? Tim> It's not *all* bad, or mistake-based training wouldn't be so Tim> effective for so many of us. Maybe the clearest example is that Tim> the hapaxes found in a new spam campaign are precisely what let us Tim> get away with training one sample and thereafter catch others from Tim> that campaign; in effect, hapaxes act like a pretty large set of Tim> lexical fingerprints in that case. This is where I think the synthetic vs. natural tokens thing would be interesting. I get lots of Viagra spam, most of which is caught, but in my current database, 'viagra' is a hapax. In fact, it appears I only added it very recently. 
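Skip's proposed measurement, how often hapax-level probabilities show up among a message's clues, needs two pieces: why a hapax scores 0.84 or 0.16, and a way to pull the clues out of an `X-Spambayes-Evidence` header like the one quoted next. The sketch below assumes Gary Robinson's smoothed token probability as used in SpamBayes, f(w) = (s*x + n*p) / (s + n), with defaults believed to be s = 0.45 (`unknown_word_strength`) and x = 0.5 (`unknown_word_prob`). Under those assumptions, and with equal numbers of trained hams and spams, a token seen once in spam scores about 0.84, once in ham about 0.16, and a single contrary example pulls it back to exactly 0.5, which matches the brittleness Tim describes. The function names and the regular expression for the header layout are hypothetical.

```python
import re
from typing import List, Tuple

UNKNOWN_WORD_STRENGTH = 0.45  # s: assumed SpamBayes default
UNKNOWN_WORD_PROB = 0.5       # x: assumed SpamBayes default


def token_spamprob(spamcount: int, hamcount: int, nspam: int, nham: int) -> float:
    """Robinson-smoothed spam probability for one token (sketch)."""
    n = spamcount + hamcount
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    if n == 0 or spamratio + hamratio == 0:
        return UNKNOWN_WORD_PROB  # no usable evidence: neutral prior
    p = spamratio / (spamratio + hamratio)
    s, x = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB
    return (s * x + n * p) / (s + n)


# Matches 'token': 0.84 pairs; tokens may contain spaces, commas and colons.
CLUE_RE = re.compile(r"'((?:[^'\\]|\\.)*)': (\d+\.\d+)")


def parse_evidence(header: str) -> List[Tuple[str, float]]:
    """Extract (token, probability) clues, skipping the *H*/*S* summary."""
    return [(tok, float(prob)) for tok, prob in CLUE_RE.findall(header)
            if tok not in ("*H*", "*S*")]


def hapax_level_clues(header: str, levels=(0.16, 0.84)) -> Tuple[int, int]:
    """Return (clues at a hapax-level probability, total clues)."""
    clues = parse_evidence(header)
    hits = sum(1 for _, prob in clues if round(prob, 2) in levels)
    return hits, len(clues)


print(round(token_spamprob(1, 0, 100, 100), 2))  # 0.84
print(round(token_spamprob(0, 1, 100, 100), 2))  # 0.16
print(round(token_spamprob(1, 1, 100, 100), 2))  # 0.5
sample = ("'*H*': 0.03; '*S*': 0.90; 'drug': 0.16; 'store': 0.23; "
          "'doctors': 0.84; 'received:175': 0.84; 'url:biz': 0.98")
print(hapax_level_clues(sample))  # (3, 5)
```

Because the header only shows two decimals, a non-hapax token that happens to round to 0.84 or 0.16 would be miscounted, so this is a rough indicator rather than an exact hapax count.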
Here's the evidence header from a message with the subject: Viagra, Soma, Fioricet, Prescribed Online for Free, Shipped Overnight which was scored around 12:25 AM today: X-Spambayes-Evidence: '*H*': 0.03; '*S*': 0.90; 'drug': 0.16; 'subject:Free': 0.16; 'store': 0.23; 'next': 0.25; 'list,': 0.30; 'via': 0.34; 'subject:, ': 0.37; 'our': 0.62; 'header:Reply-To:1': 0.64; 'enter': 0.67; 'content-type:multipart/alternative': 0.68; 'content-type:text/html': 0.74; 'doctors': 0.84; 'prescription': 0.84; 'received:103]': 0.84; 'received:165.175': 0.84; 'received:175': 0.84; 'received:199.249.165.175': 0.84; 'received:249.165.175': 0.84; 'reply-to:addr:yahoo.com': 0.93; 'url:biz': 0.98 Most of the spammy clues are synthetic tokens related to delivery (and are mostly hapaxes), not content. My 'train an unsure or false negative, check for spams' method suggests this is the case, since training on a single message often pushes several other spams about completely different topics into the spam category. This suggests a couple other downsides to minimalist training. One, spammers have to move, so hapaxes related to delivery are likely to only be useful for a short period while the spammer is abusing a single account. Two, if a delivery token pushes a bunch of other messages into the spam category which are then never used as inputs to training, the opportunity to reinforce that token's quality is lost, even though it might actually appear fairly frequently in spam. Skip From tim.one at comcast.net Thu Dec 11 12:08:08 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Dec 11 12:08:03 2003 Subject: [Spambayes] out of balance database In-Reply-To: Message-ID: [Atom 'Smasher'] > what types of bad things happen when the database is "out of balance"? Increased error rates, and sluggish response to new training. Suppose you trained on 1000 spam and no ham. Then every token in the database looks purely spammy, and no token in the database looks hammy. 
As a result, no message will get scored as ham, and you'll get high false positive and high Unsure rates. You won't get any false negatives, though. Now add training on one ham. The situation will improve, but probably not much. Etc. > is it in balance when i have the same number of messages in each pile? > or when the total size of each pile is the same? "Same number of messages" has been a good-enough approximation in practice. It's possible to get into the same kinds of trouble if, e.g., you trained on one spam containing a million words, and one ham containing a single word, but that's not something to worry about in real life. > how far out of balance is considered "reasonable"? how far out of > balance can it get before i notice problems? A sharp answer depends on your exact email mix, and exact training strategy. Both differ across users. I start to see (minor) flakiness under my combo if the ratio of messages trained on starts to exceed 2-to-1, although I've been "happy enough" letting it slide up to 5-to-1; I'm not happy enough if it gets worse than 5-to-1. Others here have reported no problems with ratios up to 10-to-1. Some people using the Outlook addin have ratios exceeding 300-to-1, but that's always something we *deduce* after they complain about poor classification performance . My seeded-with-200-of-each then trained-on-mistakes-and-some-unsures database today has ... 437 ham and 456 spam, a little more than one day's total email volume. From atom at suspicious.org Thu Dec 11 12:31:52 2003 From: atom at suspicious.org (Atom 'Smasher') Date: Thu Dec 11 12:33:04 2003 Subject: [Spambayes] out of balance database In-Reply-To: References: Message-ID: i sort all incoming mail into 2 piles. right now, i have a cron job build my database from scratch every night. i can usually keep my in-box between 300-1500 messages. this means my spam-box will need to be pruned every now and again... 
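Tim's rule of thumb (minor flakiness past about 2-to-1, and worse than 5-to-1 being too far) combined with Atom's plan to prune the larger pile suggests a simple check. The sketch below is hypothetical: it assumes each pile is a list ordered oldest first and drops the oldest messages from the larger pile until the ratio is within a chosen limit. Pruning oldest-first is only one of the options Atom asks about, not a recommendation from the thread.

```python
from typing import List, Sequence, Tuple


def training_ratio(n_ham: int, n_spam: int) -> float:
    """Larger pile size divided by smaller pile size (inf if one is empty)."""
    small, large = sorted((n_ham, n_spam))
    return float("inf") if small == 0 else large / small


def prune_oldest(hams: Sequence[str], spams: Sequence[str],
                 max_ratio: float = 2.0) -> Tuple[List[str], List[str]]:
    """Drop the oldest messages from the larger pile until the ratio is at
    most max_ratio. Both piles are assumed to be ordered oldest first."""
    if max_ratio < 1.0:
        raise ValueError("max_ratio must be at least 1")
    hams, spams = list(hams), list(spams)
    if not hams or not spams:
        return hams, spams  # nothing to balance against
    limit = int(max_ratio * min(len(hams), len(spams)))
    # Only the larger pile can exceed the limit; keep its newest entries.
    if len(hams) > limit:
        hams = hams[len(hams) - limit:]
    if len(spams) > limit:
        spams = spams[len(spams) - limit:]
    return hams, spams


print(training_ratio(437, 456))  # Tim's current database: close to 1-to-1
```

Choosing `max_ratio=2.0` mirrors the point where Tim starts to notice problems; the thread reports others running comfortably at higher ratios, so the limit is a tunable guess.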
i guess the plan should be to prune the spam-box to keep it no larger than my in-box and let it grow on its own: repeat. so, any suggestions for which messages to prune? oldest? duplicates? i experimented by deleting the spammiest spams, but that didn't produce encouraging results. ...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "TELEVISION IS DRUGS" -- Bumper Sticker From Kent.Tegels at hdrinc.com Thu Dec 11 14:12:18 2003 From: Kent.Tegels at hdrinc.com (Tegels, Kent) Date: Thu Dec 11 14:12:23 2003 Subject: [Spambayes] Dictionary Analysis Tool Message-ID: <2368489DC1DAF2488B85739359AD74DE7F2F30@exch2003.intranet.hdr> Greetings, I love the SpamClues feature, but I'd really like to know, "word" by "word", which "words" have the highest to lowest probability of occurring in my SpamBayes data across all messages. Putting it another way, I'd like to know what the top N most "spammy" words are for me. Is there a tool or other way to do this? TIA, Kent Tegels, MCDBA, MCSE, MCP+SB Professional Associate Senior Systems Analyst - Corporate Information Services HDR One Company | Many Solutions http://www.hdrinc.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031211/61d77f77/attachment.html From epedersen at farris.com Thu Dec 11 14:47:33 2003 From: epedersen at farris.com (Eric Pedersen) Date: Thu Dec 11 14:52:43 2003 Subject: [Spambayes] Junk Emails Folder - 'write protect' Message-ID: <7F999A1D3DF0D3118DF800D0B73E9ABE050070FF@EXCHANGE> Howdy, Does anyone know of a way to modify the Junk Emails folder so that it cannot be deleted? Is there a reg-key, or the like, that would make the Junk Suspects and Junk Emails folders appear as "system" folders? (Like Inbox, Sent Items, etc.)
The reason I'm asking is that some of our users appear to suffer from 'Continual Mouse Misapplication Syndrome'* whilst trashing their spam collections. BTW, many thanks for a great project! Regards, Eric Pedersen, Network Manager Farris, Vaughan, Wills, and Murphy Barristers & Solicitors ----------------------- *CMMS is functionally similar to CKMS, aka 'Continual Keystroke Misapplication Syndrome' From blr28 at comcast.net Thu Dec 11 14:53:18 2003 From: blr28 at comcast.net (Fulinn28) Date: Thu Dec 11 14:53:21 2003 Subject: [Spambayes] Error installing Outlook addin with win xp Message-ID: Hi, I am new to win xp, but fairly good with pc's in general. I have a new laptop running win xp home version and outlook 2000, and I tried to install Spambayes Outlook Addin 0.81. It appears to install ok, until this error appears: C:/program files/spambayes outlook addin/spambayes_addin.dll Unable to register the DLL/OCX: DLL registar server failed; code 0x00000000. Try again, ignore or abort. Can anyone tell me what this means? Is the problem with XP or Spambayes, and is there a fix? This is by far the best anti-spam software on the market, and I've tried a lot of them!! If possible, could you e-mail me as well as posting? I'm on so many lists that I'm afraid I'll miss your response. Thank you for your attention in this matter. Thanks, Bonnie Rose fulinn28@yahoo.com --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.550 / Virus Database: 342 - Release Date: 12/9/2003 From rmalayter at bai.org Thu Dec 11 15:29:18 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Thu Dec 11 15:29:26 2003 Subject: [Spambayes] Junk Emails Folder - 'write protect' Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A750BC@cliff.bai.org> > Does anyone know of a way to modify the Junk Emails folder so > that it cannot > be deleted?
The only approach I can think of works only for users who connect Outlook to a Microsoft Exchange Server, rather than to a simple POP3 or IMAP server. You could remove "owner" permissions on the folder from your own account and assign them somewhere else, while keeping "publishing editor" permissions for your own account. With those permissions you could do anything to the folder or its contents except rename or delete it. This could probably be automated with a script of some sort.

In POP3/IMAP mode, however, there are no permissions settings for the folders, which are stored in .PST files rather than in the Exchange server database.

Regards,
Ryan

From jarmlr1 at argontech.net Thu Dec 11 15:53:39 2003
From: jarmlr1 at argontech.net (JESS RUSSELL)
Date: Thu Dec 11 15:53:31 2003
Subject: [Spambayes] spamblocker for outlook express
Message-ID: <000801c3c028$dabe5f40$51e2ca40@jar>

do you have a program that will work with outlook express?

jess

From tim at fourstonesExpressions.com Thu Dec 11 16:17:47 2003
From: tim at fourstonesExpressions.com (Tim Stone)
Date: Thu Dec 11 16:17:52 2003
Subject: [Spambayes] spamblocker for outlook express
In-Reply-To: <000801c3c028$dabe5f40$51e2ca40@jar>
References: <000801c3c028$dabe5f40$51e2ca40@jar>
Message-ID:

Yes, though it isn't a "plug-in" like the Outlook support. You need to use the pop3proxy. Installation and setup are explained in the readme, and on spambayes.sourceforge.net.

On Thu, 11 Dec 2003 14:53:39 -0600, JESS RUSSELL wrote:

> do you have a program that will work with outlook express
>
> jess

--
Vous exprimer; Exprésese; Te stesso esprimere; Express yourself!
Tim Stone
See my photography at www.fourstonesExpressions.com
See my writing at www.xanga.com/obj3kshun

From nobody at spamcop.net Thu Dec 11 16:45:31 2003
From: nobody at spamcop.net (Seth Goodman)
Date: Thu Dec 11 16:45:55 2003
Subject: [Spambayes] How low can you go?
In-Reply-To: <16343.20199.985033.901080@montanaro.dyndns.org>
Message-ID:

[Skip Montanaro]
> Nothing magic or random. I primed the pump one ham and one spam. Then
> sorted the unsures which arrived by score. Train the lowest scoring spam as
> spam. Now rescore the unsure mailbox only considering messages which are
> now scored as spam. Delete them. Lather. Rinse. Repeat. You will
> obviously have many hams which initially score as unsure as well. Do the
> same thing for them, just start from the highest scoring ham.

I just re-read this and realized I missed something key in your description. Your training set is culled only from unsures, rather than from the set of all messages. My adaptation of your algorithm for Outlook on the Wiki is wrong, and I'll fix it.

The more important thing is that your method is really "train on unsures", which is fundamentally different from both mistake-based training and train on everything. The particular incremental method of selecting a minimal subset that makes a good classifier can be applied to any original corpus; the corpus that we select the training set from defines the training tactic. The real question, then, is which corpus should you select the training set from (what is the best training tactic)? The choices identified so far are:

- train on errors
- train on unsures
- train on errors + unsures
- train on errors + unsures + non-obvious correct decisions
- train on everything

Train on errors defines mistake-based training, with its well-debated properties (see the "Watch out for digests ..." thread).
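Seth's five tactics differ only in which already-scored messages are eligible for the training set. The Python sketch below expresses that eligibility test. It is an illustration, not SpamBayes code: the names are hypothetical, the cutoffs (0.20 and 0.90 here) and their boundary handling are assumptions, and "non-obvious" is read as Rob Hooft's "didn't score 0.00 or 1.00" (his wording appears later in the thread).

```python
from enum import Enum


class Tactic(Enum):
    """The candidate training corpora from Seth's list (hypothetical names)."""
    ERRORS = 1
    UNSURES = 2
    ERRORS_AND_UNSURES = 3
    ERRORS_UNSURES_NON_OBVIOUS = 4
    EVERYTHING = 5


def should_train(score, is_spam, tactic, ham_cutoff=0.20, spam_cutoff=0.90):
    """Return True if a scored message belongs to the corpus the training
    set is selected from under `tactic`.

    score   -- the classifier's spam score, 0.0 to 1.0
    is_spam -- the user's own verdict on the message
    The cutoff defaults and boundary handling are assumptions.
    """
    unsure = ham_cutoff <= score < spam_cutoff
    # An error is a confident classification into the wrong class.
    error = (is_spam and score < ham_cutoff) or (not is_spam and score >= spam_cutoff)
    # "Obvious" here means the score rounds to 0.00 or 1.00.
    obvious = round(score, 2) in (0.0, 1.0)

    if tactic is Tactic.ERRORS:
        return error
    if tactic is Tactic.UNSURES:
        return unsure
    if tactic is Tactic.ERRORS_AND_UNSURES:
        return error or unsure
    if tactic is Tactic.ERRORS_UNSURES_NON_OBVIOUS:
        return error or unsure or not obvious
    return True  # Tactic.EVERYTHING
```

For example, a spam that scored 0.05 is an error and qualifies under every tactic except train on unsures, while a ham that scored 0.001 qualifies only under train on everything.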
Using the unsures for the original corpus makes it very different from mistake-based training because it doesn't include *any* mistakes; it consists entirely of messages that were classified as "I can't decide". It's also different from train on everything because it doesn't include *any* messages that were classified correctly. I don't know what its properties would be, other than that it appears to iteratively maximize the bimodal nature of the message score distribution by reducing unsures. I say that because you are training on unsures and picking your training set so that the corpus of unsures becomes very bimodal, with few or no unsures left. This method, if retrained often enough, will probably result in a small number of unsures, perhaps the smallest of all the methods. How it will perform on false positives and false negatives is a separate question, since those are not included in the message corpus the training set is selected from.

I don't have an opinion one way or another; I'm just now recognizing how different a training tactic this is.

--
Seth Goodman

Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com
Spambots: disregard the above

From skip at pobox.com Thu Dec 11 17:13:03 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Dec 11 17:13:04 2003
Subject: [Spambayes] How low can you go?
In-Reply-To:
References: <16343.20199.985033.901080@montanaro.dyndns.org>
Message-ID: <16344.60399.678365.315438@montanaro.dyndns.org>

(Seth, are you really nobody@spamcop.net or is that address harvester fodder? I've been deleting it from my replies because it smells vaguely like harvester fodder.)

>> Nothing magic or random. I primed the pump one ham and one spam.
>> Then sorted the unsures which arrived by score. Train the lowest
>> scoring spam as spam. ...

Seth> I just re-read this and realized I missed something key in your
Seth> description. Your training set is culled only from unsures,
Seth> rather than the set of all messages.
Well, my description wasn't perfect either. False negatives (and false positives, should I ever see any) also get trained, *if* the training between the time they are incorrectly scored and the time I notice them doesn't push them into spam territory. I use procmail to sort my incoming mail into 20 or so mailboxes. sb_filter.py is executed very early on, with messages that score as spam or unsure siphoned off to the relevant mailboxes. The stuff which scores as ham is then further sorted topically. Consequently, false negatives can go unnoticed for a while, since they might be scattered all over the place. I focus most of my training attention on my unsure mailbox. When I see a false negative I save it in my unsure mailbox and deal with it the next time I work on that. Furthermore, since almost all mistakes and unsures are actually spam, I have to keep an eye on the spam/ham balance in my database, so I occasionally stuff a correctly scored ham into the database. I try to choose messages which don't score 0.00 (rounded).

Seth> - train on errors + unsures

This is more or less what I do; it's just that since my focus is on unsures, some of the errors may go away before I get a chance to train on them.

Seth> - train on errors + unsures + non-obvious correct decisions

What do you mean by 'non-obvious correct decisions'?

Skip

From skip at pobox.com Thu Dec 11 15:30:09 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Dec 11 17:14:06 2003
Subject: [Spambayes] Dictionary Analysis Tool
In-Reply-To: <2368489DC1DAF2488B85739359AD74DE7F2F30@exch2003.intranet.hdr>
References: <2368489DC1DAF2488B85739359AD74DE7F2F30@exch2003.intranet.hdr>
Message-ID: <16344.54225.691312.580124@montanaro.dyndns.org>

Kent> I love the SpamClues feature, but I'd really like to know -- by
Kent> "word" what "words" have the highest to lowest probability of
Kent> occurring in my SpamBayes for all messages. Putting it another
Kent> way, I'd like to know what the top N most "spammy" words for me.
Kent> Is there a tool or other way to do this?

There is a spamcounts.py script in the SpamBayes contrib directory. It will accept a regular expression to decide what tokens to display. Run it like so:

    spamcounts.py -r '.*'

and it will dump a CSV file to standard output which contains all the tokens in your current database. It looks like so:

    token,nspam,nham,spam prob
    $63.01,1,0,0.844827586207
    $1.99,1,0,0.844827586207
    from:addr:detik.com,1,0,0.844827586207
    four,1,2,0.310046433094
    to:addr:ski,1,0,0.844827586207
    "advertisers,",1,0,0.844827586207
    08:06:09,0,1,0.155172413793
    ...

You can just pop that into Excel (or other favorite spreadsheet) and sort by the "spam prob" column, or feed it into a Python script which uses the csv module to load it back up, sort it, then display the N rows with the highest spam prob.

Skip

From Kent.Tegels at hdrinc.com Thu Dec 11 17:36:03 2003
From: Kent.Tegels at hdrinc.com (Tegels, Kent)
Date: Thu Dec 11 17:36:12 2003
Subject: [Spambayes] Dictionary Analysis Tool
Message-ID: <2368489DC1DAF2488B85739359AD74DE7F315F@exch2003.intranet.hdr>

Excellent. Thank you!

-----Original Message-----
From: Skip Montanaro [mailto:skip@pobox.com]
Sent: Thursday, December 11, 2003 2:30 PM
To: Tegels, Kent
Cc: spambayes@python.org
Subject: Re: [Spambayes] Dictionary Analysis Tool

Kent> I love the SpamClues feature, but I'd really like to know -- by
Kent> "word" what "words" have the highest to lowest probability of
Kent> occurring in my SpamBayes for all messages. Putting it another
Kent> way, I'd like to know what the top N most "spammy" words for me.
Kent> Is there a tool or other way to do this?

There is a spamcounts.py script in the SpamBayes contrib directory. It will accept a regular expression to decide what tokens to display. Run it like so:

    spamcounts.py -r '.*'

and it will dump a CSV file to standard output which contains all the tokens in your current database.
It looks like so:

    token,nspam,nham,spam prob
    $63.01,1,0,0.844827586207
    $1.99,1,0,0.844827586207
    from:addr:detik.com,1,0,0.844827586207
    four,1,2,0.310046433094
    to:addr:ski,1,0,0.844827586207
    "advertisers,",1,0,0.844827586207
    08:06:09,0,1,0.155172413793
    ...

You can just pop that into Excel (or other favorite spreadsheet) and sort by the "spam prob" column, or feed it into a Python script which uses the csv module to load it back up, sort it, then display the N rows with the highest spam prob.

Skip

From nobody at spamcop.net Thu Dec 11 17:53:15 2003
From: nobody at spamcop.net (Seth Goodman)
Date: Thu Dec 11 17:53:19 2003
Subject: [Spambayes] How low can you go?
In-Reply-To: <16344.60399.678365.315438@montanaro.dyndns.org>
Message-ID:

[Skip Montanaro]
> (Seth, are you really nobody@spamcop.net or is that address harvester
> fodder? I've been deleting it from my replies because it smells vaguely
> like harvester fodder.)

nobody@spamcop.net is a dead address that the folks at SpamCop have kindly given permission for anyone to use in a public posting when they don't want their real address harvested. I don't know whether mail sent to it is null-routed or rejected. Look after my sig for my real address. Apparently, spambots can read and are polite, so my spam load has decreased lately, though it does cause some confusion to humans.

> Seth> - train on errors + unsures + non-obvious correct decisions
>
> What do you mean by 'non-obvious correct decisions'?

That was Rob Hooft's description of what he does (see RobsSetup on the Wiki). Rob described it as "train on almost anything that didn't score 0.00 or 1.00". If I understand his description correctly, he trains on everything that scores between 0.2% and 99.8%. I suppose a descriptive name would be "train on almost everything" or "train on everything but perfect classifications", but it's his method.
--
Seth Goodman

Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com
Spambots: disregard the above

From tdickenson at devmail.geminidataloggers.co.uk Thu Dec 11 17:56:29 2003
From: tdickenson at devmail.geminidataloggers.co.uk (Toby Dickenson)
Date: Thu Dec 11 17:56:34 2003
Subject: [Spambayes] out of balance database
In-Reply-To:
References:
Message-ID: <200312112256.29144.tdickenson@devmail.geminidataloggers.co.uk>

On Thursday 11 December 2003 17:31, Atom 'Smasher' wrote:
> so, any suggestions for which messages to prune? oldest? duplicates?

I'm doing something similar, and I am happy with using kmail's mailbox expiry feature to remove the oldest spams.

--
Toby Dickenson

From ktegels at msn.com Thu Dec 11 22:11:27 2003
From: ktegels at msn.com (ktegels@msn.com)
Date: Thu Dec 11 22:11:55 2003
Subject: [Spambayes] Newbie having problems...
Message-ID:

Forgive the Python newbie questions here. I've downloaded and installed ActivePython and the spamcounts script. Running it, I got:

    Traceback (most recent call last):
      File "C:\Documents and Settings\kent.ORAC\Application Data\SpamBayes\spamcounts.py", line 27, in ?
        from spambayes.Options import options
    ImportError: No module named spambayes.Options

Okay, I think I understand, but I'm a Perl person and I'm used to having .PMs. I'm assuming there really isn't such a thing for Python; you just put the right .py file in the right place and magic happens. I'm aware of .pyc, but can't seem to find any for SpamBayes. Anyway, I fetched Options.py and Tokenizer.py to c:\python32\lib. Same error. So, where do I go from here?

Last question: is there a published schema for the .db files, so I might try working with them with Perl instead (shabby, I know, but easier whilst I climb the Python curve)? That is assuming it's a bsddb database, right?

Thanks!
kt

From tim at fourstonesExpressions.com Thu Dec 11 22:54:45 2003
From: tim at fourstonesExpressions.com (Tim Stone)
Date: Thu Dec 11 22:54:51 2003
Subject: [Spambayes] Newbie having problems...
In-Reply-To:
References:
Message-ID:

On Thu, 11 Dec 2003 21:11:27 -0600, wrote:

> Forgive the Python newbie questions there. I've downloaded and install

Ok, I think you must not have correctly installed Spambayes. The readme includes installation instructions. You cannot simply download particular pieces of spambayes from cvs, or run the scripts without installing them. They're all interdependent.

As for the db, it is bsddb3 by default. There is no published schema. However, even if there were, I don't think it would help you access the database from perl, as the information in the database is stored in a Python-specific construction known as a "Pickle." I'm not aware of any picklers for perl...

On the plus side, if you can do perl, then python will be a very quick study for you...

--
Vous exprimer; Exprésese; Te stesso esprimere; Express yourself!

Tim Stone
See my photography at www.fourstonesExpressions.com
See my writing at www.xanga.com/obj3kshun

From Administrator at iisfaq.homeip.net Fri Dec 12 00:24:29 2003
From: Administrator at iisfaq.homeip.net (Chris Crowe)
Date: Fri Dec 12 00:26:19 2003
Subject: [Spambayes] No longer working at all for me....
Message-ID:

I installed SpamBayes-Outlook-Setup-007.exe and it worked fine for 5-6 weeks, and then it stopped working.

I have been trying to install SpamBayes-Outlook-Setup-0081.exe. It all installs fine but does not work at all: it is not installed into COM Add-ins, and there are no special menus or anything. The SpamBayes log contains the following:

    Registered: SpamBayes.OutlookAddin
    Registration complete.

I have uninstalled and also removed all references to SpamBayes from the registry, but still no go.
There is no reference to it on the COM Add-ins dialog in Outlook.

Server OS: Windows 2003 Server
Outlook: Outlook 2003

Any help would be appreciated.

chris crowe

From atom at suspicious.org Fri Dec 12 03:26:16 2003
From: atom at suspicious.org (Atom 'Smasher')
Date: Fri Dec 12 03:27:31 2003
Subject: [Spambayes] shell script
Message-ID:

if anyone wants this... it counts how many hams are in $HOME/Maildir/cur/ and how many spams are in $HOME/Maildir/.spam/cur/, and if there's more spam than ham, it deletes the oldest spam (leaving equal amounts of ham and spam) and rebuilds the sb_mailsort.py database.

this is not useful for everyone... if you need it, you'll probably know you need it. if you need something like it, but a little different, feel free to rewrite it.

http://smasher.suspicious.org/tmp/spam.cron.tgz

the tarball contains:
    bin/spam.cron
    bin/spam.cron.asc

if anyone uses it, let me know how it works. i just threw it together, but it seems to work quite well... i'll be running it nightly as a cron job.

...atom

_______________________________________________
PGP key - http://smasher.suspicious.org/pgp.txt
3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3
-------------------------------------------------
Democracy, n.: A government of the masses. Authority derived through mass meeting or any other form of direct expression. Results in mobocracy. Attitude toward property is communistic... negating property rights. Attitude toward law is that the will of the majority shall regulate, whether it is based upon deliberation or governed by passion, prejudice, and impulse, without restraint or regard to consequences. Result is demagogism, license, agitation, discontent, anarchy.
        -- U. S. Army Training Manual No. 2000-25 (1928-1932), since withdrawn.
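Atom's pruning rule (delete the oldest spam until the spam folder holds no more messages than the ham folder), which is also what Toby gets from kmail's expiry of the oldest spams, can be sketched in Python as below. This is not Atom's spam.cron script, and the function name is hypothetical. Judging age by file modification time is an assumption; Maildir file names also begin with a delivery timestamp, which could be used instead.

```python
import os


def prune_oldest_spam(ham_dir, spam_dir, dry_run=False):
    """Delete the oldest files in spam_dir until it holds no more
    messages than ham_dir. Returns the list of paths selected.

    Age is judged by file modification time (an assumption). With
    dry_run=True nothing is deleted; the selected paths are just returned.
    """
    def messages(directory):
        # One file per message, as in a Maildir cur/ directory.
        return [os.path.join(directory, name)
                for name in os.listdir(directory)
                if os.path.isfile(os.path.join(directory, name))]

    ham = messages(ham_dir)
    spam = messages(spam_dir)
    excess = len(spam) - len(ham)
    if excess <= 0:
        return []

    spam.sort(key=os.path.getmtime)  # oldest first
    doomed = spam[:excess]
    if not dry_run:
        for path in doomed:
            os.remove(path)
    return doomed
```

Called as `prune_oldest_spam(os.path.expanduser("~/Maildir/cur"), os.path.expanduser("~/Maildir/.spam/cur"))`, it mirrors the directory layout Atom describes. Rebuilding the sb_mailsort.py database afterwards, which his script also does, is not shown.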
From ktegels at msn.com Fri Dec 12 06:36:51 2003
From: ktegels at msn.com (ktegels@msn.com)
Date: Fri Dec 12 06:36:53 2003
Subject: [Spambayes] Newbie having problems...
In-Reply-To:
Message-ID:

>>Ok, I think you must not have correctly installed Spambayes.

More likely than not, you're right. I did the binary install of the Windows bits, so your response makes me think I need to get all of the source and install that too, right? Easy enough.

>>Python specific construction known as a "Pickle."

I wondered what a Pickle was in this context. Now I know. :)

Thanks!
kt

-----Original Message-----
From: spambayes-bounces+kent=tegels.org@python.org [mailto:spambayes-bounces+kent=tegels.org@python.org] On Behalf Of Tim Stone
Sent: Thursday, December 11, 2003 9:55 PM
To: ktegels@msn.com; spambayes@python.org
Subject: Re: [Spambayes] Newbie having problems...

On Thu, 11 Dec 2003 21:11:27 -0600, wrote:

> Forgive the Python newbie questions there. I've downloaded and install

Ok, I think you must not have correctly installed Spambayes. The readme includes installation instructions. You cannot simply download particular pieces of spambayes from cvs, or run the scripts without installing them. They're all interdependent.

As for the db, it is bsddb3 by default. There is no published schema. However, if there were, I don't think it would help you access the database from perl, as the information in the database is stored in a Python specific construction known as a "Pickle." I'm not aware of any picklers for perl...

On the plus side, if you can do perl, then python will be a very quick study for you...

--
Vous exprimer; Exprésese; Te stesso esprimere; Express yourself!
Tim Stone
See my photography at www.fourstonesExpressions.com
See my writing at www.xanga.com/obj3kshun
_______________________________________________
Spambayes@python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html

From ktegels at Msn.com Fri Dec 12 13:45:32 2003
From: ktegels at Msn.com (kent tegels)
Date: Fri Dec 12 13:45:44 2003
Subject: [Spambayes] Oh good!
In-Reply-To:
Message-ID:

Thanks to Skip and Tim, I think I'm making good progress on getting SpamCount to run, but I really seem to have another issue now.

In the past, being a Windows user, I simply installed the 0.81 plug-in and was happy and content. Okay, so now I realize that I should have installed the source, since the modules that SpamCount wants aren't available otherwise.

Fine, so I uninstall the plug-in and use setup to install the source code version.

Now the toolbar for SpamBayes doesn't seem to work at all, and the "remove add-in" approach doesn't work since SB isn't registered any more. Worse yet, other posters asking "how do I remove SpamBayes" don't seem to be getting answers posted back to the list. Yikes.

So, I'll ask too: how do I completely remove the Add-In so that I can install and have a functioning application based on the source code, so I can run SpamCount on Windows with Outlook? Or am I just asking for too much? If I am, could somebody point me at a way to depickle or otherwise export the .db files to some other format, so I can get back to the first task I wanted to do: get the wordlist and the ham and spam counts? I'm doing this so I can provide our corporate folks with a list of words that we should consider spam in our rules-based filtering package that runs "in front" of Exchange.

Arigato,
Kent

From bob at 1776.com Fri Dec 12 13:20:46 2003
From: bob at 1776.com (Robert K. Coe)
Date: Fri Dec 12 13:52:49 2003
Subject: [Spambayes] RE: Watch out for digests...
In-Reply-To: <16343.54531.802087.451246@montanaro.dyndns.org>
Message-ID: <000001c3c0dc$aa233cc0$6501a8c0@CambridgeMA.gov>

What's a "hapax"?

> -----Original Message-----
> From: Skip Montanaro [mailto:skip@pobox.com]
> Sent: Wednesday, December 10, 2003 9:23 PM
> To: Tony Meyer
> Cc: spambayes@python.org; spambayes-dev@python.org
> Subject: RE: [Spambayes] Watch out for digests...
>
>
> >> Big mistake. Stuff started getting wacky real fast.... Guess what?
> >> One of the messages in the digest was an obvious spam.
>
> Tony> This is perhaps a drawback of the minimalist database size
> Tony> training strategy. I'm guessing that if you had a larger
> Tony> database, the effect wouldn't have been as pronounced?
>
> Maybe. At the moment, I have 9768 tokens in my database and 7731 of them
> are hapaxes. As you suggest, it would appear mistakes can throw things off
> more dramatically, but it is also easier to detect.
>
> I'd be interested to see what others' hapax fractions are:
>
> ...

From bob at 1776.com Fri Dec 12 13:36:16 2003
From: bob at 1776.com (Robert K. Coe)
Date: Fri Dec 12 13:52:56 2003
Subject: [Spambayes] RE: Yahoo's "domain keys" and spam
In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01A75055@cliff.bai.org>
Message-ID: <000101c3c0de$d4c1f6e0$6501a8c0@CambridgeMA.gov>

Where is it written that Spambayes would "add a token verifying the presence of a domain key entry in the header"? Doing so would seem to be self-defeating if credible forgeries of domain keys become widespread, and only marginally helpful otherwise (since the "I know it when I see it" model of spam detection works very well for humans and fairly well for Bayesian filters without this additional complication).

For that matter, where is it written that the use of domain keys will become widespread? (But I guess that's a topic for another discussion.)

Bob
MIS Department, City of Cambridge
831 Massachusetts Ave, Cambridge MA 02139 · 617-349-4217 ·
fax 617-349-6165

> -----Original Message-----
> From: Ryan Malayter [mailto:rmalayter@bai.org]
> Sent: Wednesday, December 10, 2003 8:36 AM
> To: SpamBayes Forum
> Subject: [Spambayes] Yahoo's "domain keys" and spam
>
>
> http://www.wired.com/news/business/0,1367,61495,00.html?tw=wn_bizhead_2
>
> Will we want to have SpamBayes check these (since many SB users will
> have no control over what happens at their ISP or corporate gateway) as
> they become widespread?
>
> Just as obviously, spammers will attempt to forge them as well (to fool
> filters like the current SpamBayes that would just add a token verifying
> the *presence* of a domain key entry in the header), or use
> yet-to-be-revoked keys from domains obtained through fraudulent means. I
> think some intelligence must be built into spambayes to handle these...

From papaDoc at videotron.ca Fri Dec 12 14:00:41 2003
From: papaDoc at videotron.ca (papaDoc)
Date: Fri Dec 12 14:00:45 2003
Subject: [Spambayes] RE: Watch out for digests...
In-Reply-To: <000001c3c0dc$aa233cc0$6501a8c0@CambridgeMA.gov>
References: <000001c3c0dc$aa233cc0$6501a8c0@CambridgeMA.gov>
Message-ID: <3FDA1059.2030803@videotron.ca>

Hi Robert;

>What's a "hapax"?
>

This is a word that appears in only one ham or spam (i.e. probability of 0.5), so we don't really know what to do with them; we need more info (i.e. it must appear in more email before we can say it is a spam or ham word with a given probability of X).

Remi

From nobody at spamcop.net Fri Dec 12 14:00:37 2003
From: nobody at spamcop.net (Seth Goodman)
Date: Fri Dec 12 14:01:01 2003
Subject: [Spambayes] RE: Watch out for digests...
In-Reply-To: <000001c3c0dc$aa233cc0$6501a8c0@CambridgeMA.gov>
Message-ID:

[Robert Coe]
> What's a "hapax"?

A token that only appears once in one database (ham or spam) and not at all in the other. At least that's my understanding of the definition.
--
Seth Goodman

Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com
Spambots: disregard the above

From fmenottimacrossfan at aol.com Fri Dec 12 14:11:08 2003
From: fmenottimacrossfan at aol.com (Forest Medina)
Date: Fri Dec 12 14:08:05 2003
Subject: [Spambayes] EB AY News
Message-ID: <4o4$yjr$sb-066t62--9$7qxub$s@zmigdws0>

From nobody at spamcop.net Fri Dec 12 14:06:51 2003
From: nobody at spamcop.net (Seth Goodman)
Date: Fri Dec 12 14:11:57 2003
Subject: [Spambayes] RE: Yahoo's "domain keys" and spam
In-Reply-To: <000101c3c0de$d4c1f6e0$6501a8c0@CambridgeMA.gov>
Message-ID:

> [Robert Coe]
> Where is it written that Spambayes would "add a token verifying
> the presence of a domain key entry in the header"? Doing so would
> seem to be self-defeating if credible forgeries of domain keys
> become widespread and only marginally helpful otherwise (since
> the "I know it when I see it" model of spam detection works very
> well for humans and fairly well for Bayesian filters without this
> additional complication).

My understanding of how SpamBayes deals with headers is that only headers that are specifically permitted in the .ini file are tokenized. Unless this one was added to the list, it would not be tokenized. At least that's how the Outlook plug-in version appears to work.

> [Robert Coe]
>
> For that matter, where is it written that the use of domain keys
> will become widespread? (But I guess that's a topic for another
> discussion.)

Good point. Even if Yahoo alone uses it, IMHO, SpamBayes should deal with it gracefully. If I understand the header restrictions, though, it already does.
--
Seth Goodman

Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com
Spambots: disregard the above

From kennypitt at hotmail.com Fri Dec 12 14:12:13 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Fri Dec 12 14:12:49 2003
Subject: [Spambayes] Oh good!
In-Reply-To:
Message-ID:

kent tegels wrote:
> Fine, so I uninstall the plug-in and use setup to install the source
> code version.
>
> Now the toolbar for SpamBayes doesn't seem to work at all, and the
> "remove add-in" approach doesn't work since SB isn't registered any
> more. Worse yet, other posters asking "how do I remove SpamBayes"
> don't seem to getting answers posted back to the list. Yikes.
>
> So, I'll ask too: How do I completely remove the Add-In so that I can
> install and have a functioning application based on the source code
> so I can run SpamCount on Windows with Outlook.

Two different issues.

How do I completely remove the Add-In? See item 3.14 in the FAQ:

http://spambayes.sourceforge.net/faq.html#how-do-i-uninstall-the-plug-in

How do I have a functioning application based on the source code? You need to register the source code version of the plug-in as a COM object. Go to the Outlook2000 directory and run "addin.py --register". Your current toolbar is dead because there is no COM add-in to respond to its events. After you register the source code version, your toolbar should magically start working again.

--
Kenny Pitt

From kennypitt at hotmail.com Fri Dec 12 14:20:21 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Fri Dec 12 14:21:00 2003
Subject: [Spambayes] RE: Yahoo's "domain keys" and spam
In-Reply-To:
Message-ID:

Robert K. Coe wrote:
> Where is it written that Spambayes would "add a token verifying the
> presence of a domain key entry in the header"?

It's not. This was just a question for discussion as to whether or not it would be the right thing.

> ...
> Doing so would seem to
> be self-defeating if credible forgeries of domain keys become
> widespread...

Yes, I'm sure it would. It was suggested in the quoted message that we would need to include intelligence to validate them, and I think that would be important.

> ... and only marginally helpful otherwise (since the "I know
> it when I see it" model of spam detection works very well for humans
> and fairly well for Bayesian filters without this additional
> complication).

The mantra of this project has always been "intuition is a poor guide". It's just one more piece of evidence for the filter to consider. Whether that piece of evidence ever makes a difference in the real world is anyone's guess. The actual performance would be thoroughly tested before incorporating the feature into the product.

>
>> -----Original Message-----
>> From: Ryan Malayter [mailto:rmalayter@bai.org]
>> Subject: [Spambayes] Yahoo's "domain keys" and spam
>>
>> Will we want to have SpamBayes check these (since many SB users will
>> have no control over what happens at their ISP or corporate gateway)
>> as they become widespread?
>>
>> Just as obviously, spammers will attempt to forge them as well (to
>> fool filters like the current SpamBayes that would just add a token
>> verifying the *presence* of a domain key entry in the header), or use
>> yet-to-be-revoked keys from domains obtained through fraudulent
>> means. I think some intelligence must be built into spambayes to
>> handle these...

--
Kenny Pitt

From nobody at spamcop.net Fri Dec 12 14:21:53 2003
From: nobody at spamcop.net (Seth Goodman)
Date: Fri Dec 12 14:22:00 2003
Subject: [Spambayes] Dictionary Analysis Tool
In-Reply-To: <16344.54225.691312.580124@montanaro.dyndns.org>
Message-ID:

[Skip Montanaro]
> There is a spamcounts.py script in the SpamBayes contrib directory. It will
> accept a regular expression to decide what tokens to display. Run it like
Run it like > so: > > spamcounts.py -r '.*' > > and it will dump a CSV file to standard output which contains all > the tokens > in your current database. It looks like so: > > token,nspam,nham,spam prob > $63.01,1,0,0.844827586207 > $1.99,1,0,0.844827586207 > from:addr:detik.com,1,0,0.844827586207 > four,1,2,0.310046433094 > to:addr:ski,1,0,0.844827586207 > "advertisers,",1,0,0.844827586207 > 08:06:09,0,1,0.155172413793 > ... > > You can just pop that into Excel (or other favorite spreadsheet) > and sort by > the "spam prob" column or feed it into a Python script which uses the csv > module to load it back up, sort it, then display the N rows with > the highest > spam prob. Ooo, I like that. Since I am running the Outlook plug-in, what do I have to do to be able to use this? Won't there be a conflict if I bring in the source modules from CVS and run the install scripts? Could you give us a recipe for Outlook users who would like to mess with (or mess up) the source code and run it (crash it)? Also, which CVS version should we work with, considering we are not developers but would want to contribute working stuff to you? Some of the newer CVS forks have a lot of neat stuff implemented and without them, we might wind up re-inventing the wheel. Wow, a wheel, what a great idea! Think I'll write it up. A related question is where is the database of message ID's that are already trained? I know the system knows this as it won't train on the same copy of a message twice. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From skip at pobox.com Fri Dec 12 14:33:40 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Dec 12 14:35:27 2003 Subject: [Spambayes] RE: Watch out for digests... In-Reply-To: <3FDA1059.2030803@videotron.ca> References: <000001c3c0dc$aa233cc0$6501a8c0@CambridgeMA.gov> <3FDA1059.2030803@videotron.ca> Message-ID: <16346.6164.516524.327304@montanaro.dyndns.org> >> What's a "hapax"? 
Remi> This is a word that appears in only one ham or spam Remi> (i.e. probabolity of 0.5) so we don't really know what to do with Remi> them we need more info. (i.e. must appears in more email before Remi> saying this is a spam of ham word with a given probability of X) The first part is correct (appearing only once in the database), however, the prob assigned to a hapax which only turned up in a spam message is 0.84 and that for a hapax ham token is 0.16. Skip From kennypitt at hotmail.com Fri Dec 12 14:40:41 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Fri Dec 12 14:41:20 2003 Subject: [Spambayes] Dictionary Analysis Tool In-Reply-To: Message-ID: Seth Goodman wrote: > ... Since I am running the Outlook plug-in ... > > A related question is where is the database of message ID's that are > already trained? I know the system knows this as it won't train on > the same copy of a message twice. For the Outlook plug-in, it's in "default_message_database.db" in the data directory. -- Kenny Pitt From skip at pobox.com Fri Dec 12 14:42:38 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Dec 12 14:42:42 2003 Subject: [Spambayes] Dictionary Analysis Tool In-Reply-To: References: <16344.54225.691312.580124@montanaro.dyndns.org> Message-ID: <16346.6702.64362.499289@montanaro.dyndns.org> >> spamcounts.py -r '.*' ... Seth> Ooo, I like that. Since I am running the Outlook plug-in, what do Seth> I have to do to be able to use this? Won't there be a conflict if Seth> I bring in the source modules from CVS and run the install Seth> scripts? Dunno. That's a question for a Windows SB developer. I would think you could simply run from source. I think Kenny posted how to --register with COM if you're running from source. Seth> Also, which CVS version should we work with, considering we are Seth> not developers but would want to contribute working stuff to you? Just set up for cvs access and always use the trunk. 
While it might be slightly less stable than the latest release, I'm sure there are several people who actually rely on it to filter their mail (I do), so there aren't likely to be too many end-of-the-world-as-we-know-it bugs lurking there. Seth> Some of the newer CVS forks have a lot of neat stuff implemented Seth> and without them, we might wind up re-inventing the wheel. Wow, a Seth> wheel, what a great idea! Think I'll write it up. Use the trunk, Luke. Seth> A related question is where is the database of message ID's that Seth> are already trained? I know the system knows this as it won't Seth> train on the same copy of a message twice. Sounds Outlook-specific to me. Skip From tim.one at comcast.net Fri Dec 12 14:48:42 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Dec 12 14:48:46 2003 Subject: [Spambayes] RE: Watch out for digests... In-Reply-To: <000001c3c0dc$aa233cc0$6501a8c0@CambridgeMA.gov> Message-ID: [Robert Coe] > What's a "hapax"? Short for "hapax legomenon" . See the glossary at http://spambayes.sourceforge.net/docs.html hapax, hapax legomenon A word or form occurring only once in a document or corpus. (plural is hapax legomena). It's a standard term in the literature. [papaDoc] > This is a word that appears in only one ham or spam (i.e. probabolity > of 0.5) so we don't really know what to do with them we need more info. > (i.e. must appears in more email before saying this is a spam of ham > word with a given probability of X) Nope, we don't ignore any words that appear in the training database. The by-counting spamprob of a hapax is exactly 0 or exactly 1 (depending on whether the hapax appeared in a ham or in a spam), but the Bayesian adjustment drives the by-counting spamprob much closer to 0.5 because the word has been seen so rarely. It doesn't drive it *enough* toward 0.5 to push it into the range of spamprobs we ignore, though. 
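Tim's description of the Bayesian adjustment can be checked with a little arithmetic. The sketch below is an illustration, not SpamBayes's classifier code. It uses Gary Robinson's formula f = (s*x + n*p) / (s + n), where p is the by-counting spamprob and n is the number of trained messages containing the token. It assumes s = 0.45 and x = 0.5, which are believed to be the defaults of the unknown_word_strength and unknown_word_prob options; this thread does not confirm that. With n = 1, it reproduces the 0.844828 and 0.155172 hapax values that appear in this thread.

```python
# Sketch of Robinson's adjustment of a token's by-counting spamprob.
# ASSUMPTION: s = 0.45 and x = 0.5 (believed to be the SpamBayes
# unknown_word_strength / unknown_word_prob defaults; not confirmed here).

def by_counting_prob(spamcount, hamcount, nspam, nham):
    """Graham-style ratio, normalized by how many ham/spam were trained."""
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    if spamratio + hamratio == 0.0:
        return 0.5  # never seen: no evidence either way
    return spamratio / (spamratio + hamratio)


def adjusted_prob(spamcount, hamcount, nspam, nham, s=0.45, x=0.5):
    """Pull the by-counting prob toward x; the pull fades as n grows."""
    p = by_counting_prob(spamcount, hamcount, nspam, nham)
    n = spamcount + hamcount  # trained messages containing the token
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    # A hapax seen once in spam (p = 1.0) and once in ham (p = 0.0),
    # using the 14 spam / 20 ham training counts mentioned on the list.
    print(round(adjusted_prob(1, 0, 14, 20), 6))  # 0.844828
    print(round(adjusted_prob(0, 1, 14, 20), 6))  # 0.155172
```

Because n appears in both the numerator and the denominator, the pull toward 0.5 fades quickly. For example, a token seen in 10 spam and no ham gets (0.225 + 10) / 10.45, or about 0.978.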
For example, in the message I'm replying to right now, there was one hapax (among the significant tokens == those with a spamprob outside 0.4-0.6):

    token            spamprob  #ham  #spam
    -----            --------  ----  -----
    'subject:Watch'  0.844828     0      1

So I've only trained on one msg with "Watch" in the Subject line, and that happened to be spam. Because it was seen only once, the by-counting spamprob was reduced from 1.0 to about 0.84, and that actually left it as the strongest spam clue in the message. The overall spam score was 0.000529751, so it didn't have much effect. From ktegels at Msn.com Fri Dec 12 14:54:44 2003 From: ktegels at Msn.com (kent tegels) Date: Fri Dec 12 14:54:53 2003 Subject: [Spambayes] So close but yet... In-Reply-To: Message-ID: Okay, that got me over the hump and everything is well and good, but SpamCount.py still has issues. I run:

    G:\sba>spamcounts.py -r '.*' -p -d 'G:\Documents and Settings\ktegels\Application Data\SpamBayes\default_bayes_database.db'

I picked the default_bayes_database.db because I don't have proxy\statistics_database.db. And get:

    db: G:\Documents and Settings\ktegels\Application Data\SpamBayes\Proxy\statistics_database.db
    Traceback (most recent call last):
      File "G:\sba\spamcounts.py", line 136, in ?
        sys.exit(main(sys.argv[1:]))
      File "G:\sba\spamcounts.py", line 123, in main
        db = shelve.open(dbname, flag='r')
      File "G:\Python23\lib\shelve.py", line 231, in open
        return DbfilenameShelf(filename, flag, protocol, writeback, binary)
      File "G:\Python23\lib\shelve.py", line 212, in __init__
        Shelf.__init__(self, anydbm.open(filename, flag), protocol, writeback, binary)
      File "G:\Python23\lib\anydbm.py", line 77, in open
        raise error, "need 'c' or 'n' flag to open new db"
    anydbm.error: need 'c' or 'n' flag to open new db
    Exception exceptions.AttributeError: "DbfilenameShelf instance has no attribute 'writeback'" in ignored

So, obviously, I don't know how to point SpamCount.py at the right database instead of statistics, right?
That's what I thought the -d switch did. How do I do it? Thanks all for your patience with this somewhat dense Windows User. kt -----Original Message----- From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Kenny Pitt Sent: Friday, December 12, 2003 1:12 PM To: 'kent tegels'; spambayes@python.org Subject: RE: [Spambayes] Oh good! kent tegels wrote: > Fine, so I uninstall the plug-in and use setup to install the source > code version. > > Now the toolbar for SpamBayes doesn't seem to work at all, and the > "remove add-in" approach doesn't work since SB isn't registered any > more. Worse yet, other posters asking "how do I remove SpamBayes" > don't seem to be getting answers posted back to the list. Yikes. > > So, I'll ask too: How do I completely remove the Add-In so that I can > install and have a functioning application based on the source code so > I can run SpamCount on Windows with Outlook. Two different issues. How do I completely remove the Add-In? See item 3.14 in the FAQ: http://spambayes.sourceforge.net/faq.html#how-do-i-uninstall-the-plug-in How do I have a functioning application based on the source code? You need to register the source code version of the plugin as a COM object. Go to the Outlook2000 directory and run "addin.py --register". Your current toolbar is dead because there is no COM add-in to respond to its events. After you register the source code version, your toolbar should magically start working again.
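Once spamcounts.py runs against the right database, Skip's suggestion from the "Dictionary Analysis Tool" thread still applies: post-process its CSV output with Python's csv module. Below is a minimal sketch of that idea. It assumes the header row shown in Skip's sample output (token,nspam,nham,spam prob). top_spammy is a hypothetical helper name, not part of SpamBayes. The sketch targets a modern Python 3 rather than the Python 2.3 used in this thread.

```python
import csv
import io

# Sample rows copied from the spamcounts.py output Skip posted.
SAMPLE = """token,nspam,nham,spam prob
$63.01,1,0,0.844827586207
four,1,2,0.310046433094
"advertisers,",1,0,0.844827586207
08:06:09,0,1,0.155172413793
"""


def top_spammy(csv_text, n=10):
    """Return the n rows with the highest 'spam prob' as tuples."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Stable sort: tokens with equal probs keep their input order.
    rows.sort(key=lambda r: float(r["spam prob"]), reverse=True)
    return [
        (r["token"], int(r["nspam"]), int(r["nham"]), float(r["spam prob"]))
        for r in rows[:n]
    ]


if __name__ == "__main__":
    for token, nspam, nham, prob in top_spammy(SAMPLE, n=3):
        print("%-15s %5d %5d  %.6f" % (token, nspam, nham, prob))
```

To use it on real data, redirect the spamcounts.py output to a file and pass that file's contents to top_spammy. The csv module takes care of quoted tokens that contain commas, such as "advertisers,".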
-- Kenny Pitt _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html From tim.one at comcast.net Fri Dec 12 14:58:47 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Dec 12 14:58:51 2003 Subject: [Spambayes] RE: Yahoo's "domain keys" and spam In-Reply-To: Message-ID: [Robert Coe] >> Where is it written that Spambayes would "add a token verifying >> the presence of a domain key entry in the header"? [Seth Goodman] > My understanding of how SpamBayes deals with headers is that only > headers that are specifically permitted in the .ini file are > tokenized. Unless this one was added to the list, it would not be > tokenized. At least that's how the Outlook plug-in version appears > to work. Yes, that's right for the Outlook plugin. The underlying engine has dozens of options, though, and *can* be configured to do lots of other things. For example, if the basic_header_tokenize option is enabled, a uniform kind of tokenization of *all* header lines is performed. The basic_header_skip option is then active too, and can be set to a sequence of regular expressions matching header lines that *shouldn't* get tokenized. An example of another header gimmick we ignore by default is Habeas headers: http://www.habeas.com The options search_for_habeas_headers and reduce_habeas_headers can be enabled to pick those apart. If the Yahoo gimmick appears to have any value, another option will surely grow so people can try mining those; if it turns out to have a lot of value, spambayes will then enable it by default. That all depends on someone caring enough about it to volunteer the work, of course. 
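Here is a hypothetical sketch of the behaviour Tim describes for basic_header_tokenize and basic_header_skip. Every header line is split into name-prefixed tokens unless the header name matches one of the skip regexes. The skip patterns, the 'name:word' token format, and the sample message are all invented for illustration. SpamBayes's own tokenizer differs in detail.

```python
import re
from email import message_from_string

# Hypothetical skip list standing in for basic_header_skip: a header is
# skipped when one of these regexes matches its *whole* name.
SKIP = [re.compile(p, re.IGNORECASE) for p in (r"received", r"date", r"x-.*-id")]

RAW = (
    "From: alice@example.com\n"
    "Subject: Hello World\n"
    "Date: Fri, 12 Dec 2003 14:58:47 -0500\n"
    "Received: from mail.example.com\n"
    "X-Yahoo-Domain-Sig: abc123\n"
    "\n"
    "body text\n"
)


def basic_header_tokens(raw_message, skip=SKIP):
    """Tokenize every header line uniformly, except those matched by `skip`.

    The 'name:word' token format is an illustrative assumption, not the
    format SpamBayes's tokenizer actually emits.
    """
    msg = message_from_string(raw_message)
    tokens = []
    for name, value in msg.items():
        if any(p.fullmatch(name) for p in skip):
            continue
        for word in str(value).lower().split():
            tokens.append("%s:%s" % (name.lower(), word))
    return tokens


if __name__ == "__main__":
    print(basic_header_tokens(RAW))
    # ['from:alice@example.com', 'subject:hello', 'subject:world', 'x-yahoo-domain-sig:abc123']
```

A token like x-yahoo-domain-sig:abc123 records only that the header was present and what its value was. Nothing in a scheme like this checks whether a signature is valid.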
From kennypitt at hotmail.com Fri Dec 12 15:00:59 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Fri Dec 12 15:01:39 2003 Subject: [Spambayes] Dictionary Analysis Tool In-Reply-To: <16346.6702.64362.499289@montanaro.dyndns.org> Message-ID: Skip Montanaro wrote: > >> spamcounts.py -r '.*' > ... > > Seth> Ooo, I like that. Since I am running the Outlook plug-in, what do > Seth> I have to do to be able to use this? Won't there be a conflict if > Seth> I bring in the source modules from CVS and run the install > Seth> scripts? > > Dunno. That's a question for a Windows SB developer. I would think > you could simply run from source. I think Kenny posted how to > --register with COM if you're running from source. Yes, you could run everything from source. That would be my recommendation if you have any intention of playing around with how things work. However, it shouldn't cause any conflicts to have both the binary install and the source version as long as you don't do the "addin.py --register" dance. The binary packages everything it needs to run, and doesn't care at all about other source versions or even other versions of Python that are installed. The only potential conflict would be if the binary was using a different database format than the source code version, but I don't think that would be an issue because the database format hasn't changed in quite a while. > Seth> Also, which CVS version should we work with, considering we are > Seth> not developers but would want to contribute working stuff to you? > > Just set up for cvs access and always use the trunk. While it might > be slightly less stable than the latest release, I'm sure there are > several people who actually rely on it to filter their mail (I do), > so there aren't likely to be too many end-of-the-world-as-we-know-it > bugs lurking there. I do, as well, and I haven't found it to be any less stable than the 0.81 binary. There have actually been quite a few things fixed since then.
The one thing I would add is that once you get a copy of the source, don't do a "cvs update" as long as it's working well for you. That way you won't risk picking up new bugs or partially implemented features. -- Kenny Pitt From skip at pobox.com Fri Dec 12 15:26:32 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Dec 12 15:26:41 2003 Subject: [Spambayes] So close but yet... In-Reply-To: References: Message-ID: <16346.9336.655620.20716@montanaro.dyndns.org> kent> Okay, that got me over the hump and everything is well and good, kent> but SpamCount.py still has issues. kent> I run: kent> G:\sba>spamcounts.py -r '.*' -p -d 'G:\Documents and kent> Settings\ktegels\Application Data\SpamBayes\default_bayes_database.db' kent> I picked the default_bayes_database.db because I don't have kent> proxy\statistics_database.db. kent> And get: ... kent> File "G:\Python23\lib\anydbm.py", line 77, in open kent> raise error, "need 'c' or 'n' flag to open new db" kent> anydbm.error: need 'c' or 'n' flag to open new db My wild-ass guess is that you are actually using a pickle file, not a shelf file for storage. Short term, try adding the -p flag to tell spamcounts.py that your training database is a pickle. I just checked in a change so the script uses the persistent_use_database option to determine the default database type. Skip From rmalayter at bai.org Fri Dec 12 16:12:51 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Fri Dec 12 16:12:54 2003 Subject: [Spambayes] RE: Yahoo's "domain keys" and spam Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A750F0@cliff.bai.org> > From: Robert K. Coe > Where is it written that SpamBayes would "add a token > verifying the presence of a domain key entry in the header"? It's not, although the current SpamBayes header tokenizer would detect this header and add the same token no matter what the actual validity of the signature was. It would look something like "X-Yahoo-Domain-Sig: skip:20".
> Doing so would seem to be self-defeating if credible > forgeries of domain keys become widespread *Credible* forgeries will not be widespread, since you need the organization's private key to generate the domain signature. However, unless the logic to actually perform the crypto and verify the signature is added to SpamBayes, all SB will see is a header token with a bunch of random hex characters. That is, only the *presence* of a domain signature would be detected, not whether or not it is actually valid. > marginally helpful otherwise (since the "I know it when I see > it" model of spam detection works very well for humans and > fairly well for Bayesian filters without this additional > complication). A validated domain key signature would tell you with certainty whether or not a mail message that claims to be from a domain is actually from an SMTP server controlled by the same people who control the domain's DNS. My intuition tells me this is a very strong bit of evidence that would be very useful to the classifier. Of course, that will have to be tested before the feature would be added to the production code. > For that matter, where is it written that the use of domain > keys will become widespread? (But I guess that's a topic for > another discussion.) Yahoo is one of the biggest mail hosts on the Internet, and they're donating the code to the most popular open-source SMTP servers in use on the Internet. It's supposed to be a cheap and simple addition to the DNS and e-mail infrastructure, and Yahoo is evangelizing it to other major ISPs as a way to cut down on forged spam message headers. I assume a lot of smaller ISPs and corporations will get on board as a result, but of course it may just be ignored. There will also be a significant ramp-up time. I'm merely suggesting that the SB project get out in front of the issue. I will try to adapt the necessary crypto code into Python myself as soon as it is available in other open-source projects.
(I'm guessing the code will be in C, and go to the Qmail, sendmail, etc. projects). Regards, -Ryan- From wsy at merl.com Fri Dec 12 16:11:26 2003 From: wsy at merl.com (Bill Yerazunis) Date: Fri Dec 12 16:13:14 2003 Subject: [Spambayes] RE: Watch out for digests... In-Reply-To: References: Message-ID: <200312122111.hBCLBQJ09241@localhost.localdomain> From: "Seth Goodman" [Robert Coe] > What's a "hapax"? A token that only appears once in one database (ham or spam) and not at all in the other. At least that's my understanding of the definition. Yep. The full term is "hapax legomenon", from the Greek meaning "said once", and it means a word seen only once in a corpus of text. When you're trying to decode a "lost language", hapaxes are your worst nightmare, as you really can't cross-check to see if your believed translation of the word is right or not if the word only occurs in one place. -Bill Yerazunis From nobody at spamcop.net Fri Dec 12 16:56:24 2003 From: nobody at spamcop.net (Seth Goodman) Date: Fri Dec 12 16:56:28 2003 Subject: [Spambayes] RE: Yahoo's "domain keys" and spam In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01A750F0@cliff.bai.org> Message-ID: [Ryan Malayter] > I'm merely suggesting that the SB project get out in front of the issue. > I will try to adapt the necessary crypto code into Python myself as soon as > it is available in other open-source projects. (I'm guessing the code > will be in C, and go to the Qmail, sendmail, etc. projects). IMHO, this code belongs in the MTA's, not in SpamBayes. The usefulness of the approach is that an MTA can reject, at the SMTP level, any message whose encrypted signature does not decrypt with the public key. There will be a period where some MTA's don't support this feature, and one could argue that some MTA's will never implement it. My take on this, FWIW, is that unless Yahoo fails to implement it, it's a done deal and all MTA's will pretty much have to support it.
If it proves useful at rejecting spam, which appears likely, I also can't see ISP's deciding to not enable the feature in their MTA's. During the period when few MTA's have this feature, this capability would add some value to SpamBayes. If I understand it right, you need to do a DNS lookup to fetch the public key to do the decryption. Both the DNS lookup and the decryption calculation are very costly in terms of time per message, but it may still be worth it. That's your call. If it does succeed in reliably authenticating senders and is not easily abused (I don't know enough about DNS and PKI to comment), it is not unlikely that Yahoo or some other major ISP will ultimately decide to reject any mail that doesn't carry the authentication header. At that point, the relevant IETF technical committee will probably vote to elevate the RFC from experimental to a draft standard and then a standard. Of course, I run neither a large mail system nor an ISP, so I could be dead wrong. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From mhammond at skippinet.com.au Fri Dec 12 18:47:04 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Dec 12 18:47:19 2003 Subject: [Spambayes] No longer working at all for me.... In-Reply-To: Message-ID: <072c01c3c10a$3f302760$2c00a8c0@eden> See the troubleshooting guide - online at: http://cvs.sourceforge.net/viewcvs.py/*checkout*/spambayes/spambayes/Outlook2000/docs/troubleshooting.html?rev=HEAD&content-type=text/html Try the "Toolbar items appear, but fail to work" item, and if that fails, try "Resetting SpamBayes configuration" Mark. -----Original Message----- From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org]On Behalf Of Chris Crowe Sent: Friday, 12 December 2003 4:24 PM To: spambayes@python.org Subject: [Spambayes] No longer working at all for me....
I installed version SpamBayes-Outlook-Setup-007.exe and it worked fine for 5-6 weeks and then it stopped working. I have been trying to install SpamBayes-Outlook-Setup-0081.exe. It all installs fine but does not work at all: it is not installed into COM Addins, no special menus or anything. SpamBayes Log contains the following: Registered: SpamBayes.OutlookAddin Registration complete. I have uninstalled and also removed all references to SpamBayes from the registry but still no go. There is no reference to it on the COM Addins dialog in Outlook. Server OS : Windows 2003 Server Outlook : Outlook 2003 Any help would be appreciated. chris crowe -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2316 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031213/13758c91/winmail.bin From atom at suspicious.org Sat Dec 13 00:10:11 2003 From: atom at suspicious.org (Atom 'Smasher') Date: Sat Dec 13 00:11:26 2003 Subject: [Spambayes] RE: Yahoo's "domain keys" and spam In-Reply-To: References: Message-ID: http://story.news.yahoo.com/news?tmpl=story&u=/nm/20031205/wr_nm/tech_yahoo_dc_2 i haven't found many technical reports on yahoo's plan, but i suspect that some of the failures in it are: 1) a paying (or thieving!) customer of XYZ-ISP sends spam, and it's "authenticated". this can happen either through a virus or a "make money at home with your computer!" scheme. 2) domain names and hosting are cheap. it would be a slight hurdle for spammers to register new domain names through ISPs and "hit & run" that server, ISP, domain name... depending on how the system is set up. 3) spam-houses that consider themselves to be legit will have no problem sending "authenticated" spam. so, the system will likely have the effect of not only blocking non-spam email, but giving a green light to a large volume of "authenticated" spam. which brings us back where we started...
RBLs, filtering, etc... but with some added overhead to maintaining an SMTP server. that's my $0.03 (adjusted for the falling dollar). ...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "Politics is the art of preventing people from taking part in affairs which properly concern them." -- Paul Valery From nconstantin at socal.rr.com Fri Dec 5 16:06:38 2003 From: nconstantin at socal.rr.com (Nicholas) Date: Sat Dec 13 04:32:09 2003 Subject: [Spambayes] <.....'You must configure the Spam folder' message.....> Message-ID: <000a01c3c15b$e93bc920$6e7ba8c0@socal.rr.com> How do I go about doing that? Thanks, Nick-at-Nite -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031205/26a2aa1f/attachment.html From Eric.Kehr at dsl.pipex.com Sat Dec 13 03:36:51 2003 From: Eric.Kehr at dsl.pipex.com (Eric Kehr) Date: Sat Dec 13 10:38:46 2003 Subject: [Spambayes] Spambayes has stopped filtering altogether Message-ID: Hi, The program has worked perfectly for a long time, and now suddenly has stopped working - no Spam percentages are displayed and the logs (attached) are full of errors.
In-Reply-To: <16342.20443.331861.376383@montanaro.dyndns.org> References: <16342.20443.331861.376383@montanaro.dyndns.org> Message-ID: <20031213202306.GA8893@nl.linux.org> Skip Montanaro wrote: > Not long ago I dumped it all in favor of a more minimalist approach. In the I think that's a good idea. I trained some unsures either as ham or as spam even though they were very similar (that is, spambayes was actually _right_ to classify them as unsure, but I didn't like it). After that, the unsure ratio of my database got worse and I started having false negatives. So now I have restarted my database as well. The drawback is that my current unsure ratio is 1 ;) > At the moment I have trained on 14 spams and 20 hams and am quite pleased > with how its performing so far. I remember the same from the past. My father is using non-Bayesian SpamAssassin, and it seems the SpamAssassin manpage warns that without 'hundreds of messages' Bayesian spam filtering is unusable. This is obviously incorrect for Spambayes. Spambayes comes with no knowledge. Does it have a more intelligent algorithm? Or is the warning in the SpamAssassin manpage incorrect? > So, how small is yours? Currently 0, with no unsures, but it has only been running for 4 minutes and fetchmail runs once every 5 minutes, so that's no surprise. I have the dilemma that a lot of spam I receive has already been 'handled' by my ISP. It filters it for viruses and if it contains a virus (2/3 of the spam I receive does), it replaces it with a message. The message is identical each time, so, based on the wording in that message, some words which are not spammy at all based on intuition are being handled as spammy words. It doesn't work badly, however. A more serious problem is how to distinguish fake bounces from real bounces... (ah, 1 unsure now, but I'll wait until I have at least 3 hams and 3 spams before I start training) yours, Gerrit. -- 225. If he perform a serious operation on an ass or ox, and kill it, he shall pay the owner one-fourth of its value.
-- 1780 BC, Hammurabi, Code of Law -- Asperger's Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/english/ From tim.one at comcast.net Sat Dec 13 18:02:22 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Dec 13 18:02:25 2003 Subject: [Spambayes] How low can you go? In-Reply-To: <16342.20443.331861.376383@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > Okay, time for a little contest. We've recently seen several users > tout the size of their training database. I used to be one of those > "enlarged database" types, but no more. ... > At the moment I have trained on 14 spams and 20 hams and am quite > pleased with how its performing so far. ... > So, how small is yours? If size is measured by total # of messages trained on, a different approach could probably work better. The original goal of this project was filtering high-volume mailing lists, on high-end server-class machines, and there was really no limit on how large a training set was acceptable, nor any difficulty in getting any number of ham and spam to train on. Schemes that consider bigrams too (or even larger units, like Bill's CRM114) learn faster, meaning that my tests showed they achieved accuracy comparable to our current scheme after training on fewer total messages (although in the versions of CRM114-like tokenizing we tried, performance plummeted after hash collisions became "too common"). Given the way we store token statistics, though, they can create very much larger databases (so large that even a high-end box would struggle to keep up). Since the size of database required under our current scheme grows mildly (in comparison) as the # of training messages increase, and experiments showed that it did at least as well as any alternative tried given enough training data, there was no good reason to switch. But if you're actively looking to minimize the # of messages trained on, the higher database growth rate of other schemes isn't as important. 
One scheme I recommend trying is the mixed unigram/bigram scheme, with the special scoring gimmick Gary Robinson suggested. I *think* Tony Meyer has an up-to-date patch for that, but I may be wrong. A difficulty with enlarging "the source window" we look at when generating tokens is that it creates highly correlated features. For example, the snippet "penis enlargement" generates "penis" and "enlargement", but under the mixed unigram/bigram scheme it *additionally* creates the single feature "penis enlargement". All of those are likely to have high spamprobs, of course. Correlation usually helps us rather than hurts us, but there are exceptions, and the truth of that relies in large part on the fact that our current scheme rarely goes out of its way to create correlated features. Mixing unigrams and bigrams does go out of its way, and, indeed, creates nothing but correlated features. Correlated features have been implicated time and again in the rare "spectacular failures" we see, so it's scary to create more of them. That's where Gary Robinson's "special scoring gimmick" comes in, a way to *count* no more than one feature per source token when scoring. In the example, it might decide to score "penis enlargement" as a single feature, but, if it did, it would *not* also feed the spamprobs of "penis" and "enlargement" into the final score; or it might decide to feed the spamprobs of both constituent words into the final score, in which case it would leave the spamprob of the bigram out of the score. In effect, scoring "tiles" the source with a collection of non-overlapping unigram and bigram features, picked in such a way as to approximate maximizing the aggregate spamprob strengths over all possible tilings. That wasn't tested enough to ensure it achieved what it was after, but it made a lot of theoretical sense, and worked fine in small preliminary tests.
The point is to get faster learning without increasing the "spectacular failure" rate (which has always been very small, but isn't 0, and would most likely get much larger (but still remain "small"!) without a gimmick to counteract systematic correlation). From wsy at merl.com Sat Dec 13 18:20:57 2003 From: wsy at merl.com (Bill Yerazunis) Date: Sat Dec 13 18:21:01 2003 Subject: [Spambayes] How low can you go? In-Reply-To: References: Message-ID: <200312132320.hBDNKvv12163@localhost.localdomain> From: "Tim Peters" That's where Gary Robinson's "special scoring gimmick" comes in, a way to *count* no more than one feature per source token when scoring. In the example, it might decide to score "penis enlargement" as a single feature, but, if it did, it would *not* also feed the spamprobs of "penis" and "enlargement" into the final score; or it might decide to feed the spamprobs of both constituent words into the final score, in which case it would leave the spamprob of the bigram out of the score. In effect, scoring "tiles" the source with a collection of non-overlapping unigram and bigram features, picked in such a way as to approximate maximizing the aggregate spamprob strengths over all possible tilings. That wasn't tested enough to ensure it achieved what it was after, but it made a lot of theoretical sense, and worked fine in small preliminary tests. The point is to get faster learning without increasing the "spectacular failure" rate (which has always been very small, but isn't 0, and would most likely get much larger (but still remain "small"!) without a gimmick to counteract systematic correlation). I tried that too - for each window stepping, only the most extreme probability was used. Essentially this decorrelated the incoming stream so that Bayesian modeling was a little more accurate. But the results were a statistical failure. The error rate on my standard test corpus jumped from 68 (using no correction) to 80 using this "tiling" method.
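The "tiling" Tim describes counts each source word in exactly one feature, either a unigram or a bigram. It can be sketched as a small dynamic program over word positions. This is an illustration under stated assumptions. A feature's strength is taken as |spamprob - 0.5|, tilings are compared by summed strength, and unknown features default to 0.5. Under that assumed measure, the sketch computes an exact maximum, whereas Tim describes the real gimmick as an approximation. It is not the code in Tony Meyer's patch or in CRM114. The probabilities in the demo are made up.

```python
# Sketch: tile a word stream with non-overlapping unigram/bigram features.
# ASSUMPTIONS: strength = |spamprob - 0.5|, tilings compared by summed
# strength, unknown features get spamprob 0.5. Illustrative only.

def strength(p):
    return abs(p - 0.5)


def best_tiling(words, prob):
    """Return the features of the strongest non-overlapping tiling of `words`.

    `prob` maps a unigram ("free") or bigram ("free money") to its spamprob.
    """
    n = len(words)
    best = [0.0] * (n + 1)   # best[i]: strongest tiling of words[:i]
    back = [None] * (n + 1)  # back[i]: (feature, number of words it covers)
    for i in range(1, n + 1):
        uni = words[i - 1]
        best[i] = best[i - 1] + strength(prob.get(uni, 0.5))
        back[i] = (uni, 1)
        if i >= 2:
            bi = words[i - 2] + " " + words[i - 1]
            cand = best[i - 2] + strength(prob.get(bi, 0.5))
            if cand > best[i]:
                best[i] = cand
                back[i] = (bi, 2)
    # Walk back from the end to recover the chosen features.
    features = []
    i = n
    while i > 0:
        feature, width = back[i]
        features.append(feature)
        i -= width
    return features[::-1]


if __name__ == "__main__":
    probs = {"penis": 0.95, "enlargement": 0.90, "penis enlargement": 0.99,
             "free": 0.55, "money": 0.50, "free money": 0.98, "now": 0.60}
    print(best_tiling(["penis", "enlargement"], probs))  # ['penis', 'enlargement']
    print(best_tiling(["free", "money", "now"], probs))  # ['free money', 'now']
```

The first call keeps both unigrams, because their combined strength (0.45 + 0.40) beats the bigram's 0.49. The second call picks the bigram, because "free money" is far stronger than its two near-neutral halves. Summing strengths is only one possible criterion. Keeping the most extreme probability at each window step, as Bill describes, is a different rule.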
What _has_ worked better is to use a Markov model instead of a Bayesian model; that actually gets me down to 56. I haven't tried tiling Markov yet... oh dear... another CPU-day down the tubes. :) -Bill Yerazunis From tim.one at comcast.net Sat Dec 13 18:22:23 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Dec 13 18:22:26 2003 Subject: [Spambayes] How low can you go? In-Reply-To: <20031213202306.GA8893@nl.linux.org> Message-ID: [Gerrit Holl] > ... > My father is using non-bayesian spamassasin, and it seems the > spamassasin manpage warns that without 'hundreds of messages' bayesian > spamfiltering is unusable. This is obviously incorrect for Spambayes. > Spambayes comes with no knowledge. Does it have a more intelligent > algorithm? Or is the warning in the spamassasin manpage incorrect? I'm afraid there are several research projects hiding in there, so we'll probably never know. For an individual, SpamBayes does much better than chance after training on 1 ham and 1 spam, and even just that much can *help* keep your inbox saner. If a single SpamBayes was trying to filter email for several people, though, it may even do worse than chance after training on 1 of each (don't know -- haven't tried; "head arguments" can be made in any direction here; for example, if the single spam trained on was addressed to me, and the single ham to you, then there's systematic pressure to call everything addressed to me spam, and everything addressed to you ham). From tim.one at comcast.net Sat Dec 13 19:49:29 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Dec 13 19:49:32 2003 Subject: [Spambayes] Spambayes has stopped filtering altogether In-Reply-To: Message-ID: [Eric Kehr] > The program has worked perfectly for a long time, and now suddenly has > stopped working - no Spam percentages are displayed and the logs > (attached) are full of errors. > > What should I do? 
Unfortunately, they're all the same error, ending with:

  File "e:\src\spambayes\spambayes\classifier.py", line 217, in chi2_spamprob
  File "e:\src\spambayes\spambayes\classifier.py", line 465, in _getclues
  File "e:\src\spambayes\spambayes\classifier.py", line 316, in probability
exceptions.AssertionError:

This is a sure sign that your training database has become corrupted, and you have to retrain from scratch. If you saved your training data, that's easy:

  SpamBayes -> SpamBayes Manager -> Training -> check the "Rebuild entire database" box, select your training folders, click the "Start Training" button.

It's very unusual to get a report of database corruption under the Outlook addin, and I don't have a theory for why it happened. From tameyer at ihug.co.nz Sat Dec 13 21:28:24 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sat Dec 13 21:28:34 2003 Subject: [Spambayes] Oh good! In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13046B4A9A@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467774E@its-xchg4.massey.ac.nz> > Or am I just asking for too much. You're not, but I'll answer this anyway . > If am I, could somebody point me at a depickle/otherwise > export the .db files to some other format so I can get back > to the first task I wanted to -- get the wordlist and the > ham and spam counts? The sb_dbexpimp.py script in the scripts folder of the source will do this. You can convert to and from pickle and bsddb, and also to/from 'flat' text ('`' delimited, for some reason). You'll need the full source package to do this (well, some subset of that, anyway). Easiest to just unpack the archive, "set pythonpath=path:\to\archive" and then run the script. Running with no options will give example usage. =Tony Meyer From tameyer at ihug.co.nz Sat Dec 13 21:32:12 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sat Dec 13 21:32:19 2003 Subject: [Spambayes] How low can you go?
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13046B4D13@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467774F@its-xchg4.massey.ac.nz> > One scheme I recommend trying is the mixed unigram/bigram > scheme, with the special scoring gimmick Gary Robinson > suggested. I *think* Tony Meyer has an up-date-patch for > that, but I may be wrong. As usual, you're not. If anyone wants this, I can put it up on the wiki or something. Or if there's enough demand (discussion of training is certainly a hot topic at the moment, although I'm not sure how many people are from-cvs users) I could check it in as an experimental option. IIRC, I may have posted the code for the change to the -dev list some time back, too, so it could very well be in the archives. Tim's original implementation of the scheme is available there, too (but *much* further back, and here, not -dev). =Tony Meyer From tim at fourstonesExpressions.com Sat Dec 13 22:02:37 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Sat Dec 13 22:02:46 2003 Subject: [Spambayes] Oh good! In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130467774E@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130467774E@its-xchg4.massey.ac.nz> Message-ID: On Sun, 14 Dec 2003 15:28:24 +1300, Tony Meyer wrote: > ('`' delimited, for some reason). backtick was a character that seemed unlikely to be present in the database... ;) -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From traynham at mindspring.com Sat Dec 13 23:39:53 2003 From: traynham at mindspring.com (Ken Traynham) Date: Sat Dec 13 23:40:20 2003 Subject: [Spambayes] Problem? Message-ID: <000001c3c1fc$548a6060$b900a8c0@bigmax> Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: image/jpeg Size: 14627 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031213/19d8d770/attachment-0001.jpe From tim.one at comcast.net Sat Dec 13 23:53:14 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Dec 13 23:53:16 2003 Subject: [Spambayes] How low can you go? In-Reply-To: <200312132320.hBDNKvv12163@localhost.localdomain> Message-ID: [Bill Yerazunis] > I tried that too - for each window stepping, only the most extreme > probability was used. Essentially this decorrellated the incoming > stream so that Bayesian modeling was a little more accurate. I think details matter a lot here, and I doubt they left your results predictive of ours. Two things in particular: 1. We do Bayesian modeling of individual word spamprobs, but there's nothing Bayesian about the way we combine spamprobs. As the graphs on http://spambayes.sourceforge.net/background.html show, we had relatively enormous "spectacular failure" rates when using Bayesian combining in the very early days, but those dropped so low after moving to chi-combining that there are no instances of a spectacular failure at all on the third graph. By "spectacular failure" I mean an extremely low-scoring spam or extremely high-scoring ham. Graham's scheme produced tons of these (compared to what eventually proved possible). 2. Gary suggested a window-based scoring gimmick, but I didn't implement it that way because it was too poor an approximation to "strongest over all possible tilings". 
Instead it was done like:

    throw all the features into a bag
    while the bag isn't empty:
        pick a feature F with maximal strength among all features
            still in the bag (meaning a feature whose spamprob is
            maximally distant from 0.5, in either direction)
        feed F's spamprob into scoring
        remove every feature from the bag that intersects with F
            in at least one position (in particular, that removes F
            from the bag, and possibly other features too)

> But the results were a statistical failure. > > the error rate on my standard test corpus jumped from 68 (using > no correction) to 80 using this "tiling" method. We didn't do enough tests to say anything with confidence; the initial tests showed better performance than what we do now given the same (small) amount of training data, but there wasn't enough coverage in the initial tests to have confidence in the results. The tests were especially weak because they were only done on one corpus. > What _has_ worked better is to use a Markov model instead of a > Bayesian model; that actually gets me down to 56. > > I haven't tried tiling Markov yet... oh dear... another CPU-day > down the tubes. :) Since Markov models come in 150 flavors of their own, I'll wait until you write a paper about it . From rcoe at CambridgeMA.GOV Sat Dec 13 23:56:43 2003 From: rcoe at CambridgeMA.GOV (Coe, Bob) Date: Sat Dec 13 23:56:47 2003 Subject: [Spambayes] RE: Oh good! Message-ID: Unless, of course, the text is in Hawaiian. ;^) Bob > -----Original Message----- > From: Tim Stone [mailto:tim@fourstonesExpressions.com] > Sent: Saturday, December 13, 2003 10:03 PM > To: Tony Meyer; 'kent tegels'; spambayes@python.org > Subject: Re: [Spambayes] Oh good! > > > On Sun, 14 Dec 2003 15:28:24 +1300, Tony Meyer > wrote: > > ('`' delimited, for some reason). > > backtick was a character that seemed unlikely to be present in the > database...
;) From dreas at emailaccount.nl Sun Dec 14 04:17:52 2003 From: dreas at emailaccount.nl (Dreas van Donselaar) Date: Sun Dec 14 04:18:10 2003 Subject: [Spambayes] Problem? References: <000001c3c1fc$548a6060$b900a8c0@bigmax> Message-ID: <00cb01c3c223$27113970$7a7ba8c0@hedwigpc> Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 14627 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031214/5a12338d/attachment-0001.jpe From wsy at merl.com Sun Dec 14 10:19:25 2003 From: wsy at merl.com (Bill Yerazunis) Date: Sun Dec 14 10:19:33 2003 Subject: [Spambayes] How low can you go? In-Reply-To: References: Message-ID: <200312141519.hBEFJP626393@localhost.localdomain> From: "Tim Peters" [Bill Yerazunis] > I tried that too - for each window stepping, only the most extreme > probability was used. Essentially this decorrellated the incoming > stream so that Bayesian modeling was a little more accurate. I think details matter a lot here, and I doubt they left your results predictive of ours. Two things in particular: 1. We do Bayesian modeling of individual word spamprobs, but there's nothing Bayesian about the way we combine spamprobs. As the graphs on http://spambayes.sourceforge.net/background.html show, we had relatively enormous "spectacular failure" rates when using Bayesian combining in the very early days, but those dropped so low after moving to chi-combining that there are no instances of a spectacular failure at all on the third graph. By "spectacular failure" I mean an extremely low-scoring spam or extremely high-scoring ham. Graham's scheme produced tons of these (compared to what eventually proved possible). Do you have a good writeup of chi-combining online? I still haven't quite managed to wrap my head around it from what I've read. 2. 
Gary suggested a window-based scoring gimmick, but I didn't implement it that way because it was too poor an approximation to "strongest over all possible tilings". Instead it was done like:

    throw all the features into a bag
    while the bag isn't empty:
        pick a feature F with maximal strength among all features
            still in the bag (meaning a feature whose spamprob is
            maximally distant from 0.5, in either direction)
        feed F's spamprob into scoring
        remove every feature from the bag that intersects with F
            in at least one position (in particular, that removes F
            from the bag, and possibly other features too)

Ah, OK. That's quite different. I just took:

    - for each window position
        - put the maximal strength feature in as the local probability for the window position.
        - throw the rest of this window's features away.

> What _has_ worked better is to use a Markov model instead of a > Bayesian model; that actually gets me down to 56. > > I haven't tried tiling Markov yet... oh dear... another CPU-day > down the tubes. :) Since Markov models come in 150 flavors of their own, I'll wait until you write a paper about it . It's more like one or two slides. :-( Here's the quick description... Right now, the way CRM114 calculates local probabilities for each polynomial on each window is biased strongly toward 0.5. The previous formula that worked best was:

    0.5 + (hits_this_corpus - (hits_all_corpus - hits_this_corpus)) / (16 * (hits_all_corpus + 1))

which as you can see, needs a LOT of corpus hits on a feature to get much certainty out of it. For a corpus hapax that recurs in the test message, the local probability would be 0.5 + 1/32 = .53125

With 16 hits on it, all spam, the local probability would be 0.5 + 16/(16*17) = .5588

...um... that's not right. and now that I'm staring at it...
I'm wondering how this could ever have worked, what I was thinking when I wrote it, and the shocking realization that I need to spend a LOT more time checking my code... -Bill Yerazunis From cn at wildbit.com Sun Dec 14 12:01:16 2003 From: cn at wildbit.com (Chris Nagele) Date: Sun Dec 14 12:01:24 2003 Subject: [Spambayes] Bug and Help Message-ID: Hi. For some reason the Spam Bayes software is not working correctly. It does not auto-filter messages and the "suspect email" button does not show up in the tool bar (only the delete as spam). I can manually select to filter messages from the SpamBayes menu and it works fine. I am running Outlook 2003 11.5207.5207. My log files are attached. Thanks for your help. -------------- next part -------------- A non-text attachment was scrubbed... Name: spambayes1.log Type: application/octet-stream Size: 911 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031214/3ab77f64/spambayes1.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: spambayes2.log Type: application/octet-stream Size: 1008 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031214/3ab77f64/spambayes2.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: spambayes3.log Type: application/octet-stream Size: 59 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031214/3ab77f64/spambayes3.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: spambayes4.log Type: application/octet-stream Size: 2077 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031214/3ab77f64/spambayes4.obj From tim.one at comcast.net Sun Dec 14 15:58:17 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Dec 14 15:58:22 2003 Subject: [Spambayes] How low can you go? 
In-Reply-To: <200312141519.hBEFJP626393@localhost.localdomain> Message-ID: [Bill Yerazunis] > Do you have a good writeup of chi-combining online? > > I still haven't quite managed to wrap my head around it from what > I've read. Gary Robinson wrote a good article about it in the Linux Journal: http://www.linuxjournal.com/article.php?sid=6467 I can't guess whether it would help you without a concept of Unsure messages, though. A dramatic effect of chi-combining was transforming the "spectacular failures" under Bayesian combining into solid Unsures. For example, when there's a ton of evidence in both directions, Bayesian combining tends toward an absurd level of confidence in its guess, regardless of whether its guess is right or wrong, but chi-combining reliably returns a score near 0.50 in such cases. A mental hangup at first may be that chi-combining doesn't even pretend to compute the probability that a msg is spam, and only confusion can come from supposing that it does. It produces "a score", with no intuitive meaning other than that "near 1.0 means spam, near 0.0 means ham, and close to 0.5 means the evidence is so mixed that the system can't guess". > ... > Here's the quick description... > > Right now, the way CRM114 calculates local probabilities for each > polynomial on each window is biased strongly toward 0.5. The > previous formula that worked best was:
>
>     0.5 + (hits_this_corpus - (hits_all_corpus - hits_this_corpus)) / (16 * (hits_all_corpus + 1))
>
> which as you can see, needs a LOT of corpus hits on a feature to get > much certainty out of it. > For a corpus hapax that recurs in the test message, the local > probability would be 0.5 + 1/32 = .53125 > > With 16 hits on it, all spam, the local probability would be 0.5 + > 16/(16*17) = .5588 > > ...um... that's not right. Well, it follows from the equation .
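Evaluating Bill's quoted formula directly reproduces both of his numbers and shows the ceiling that an all-spam feature can never exceed. The function name crm114_local_prob is hypothetical; this sketch only restates the formula from the thread with exact rational arithmetic and is not taken from CRM114's source.

```python
from fractions import Fraction


def crm114_local_prob(hits_this: int, hits_all: int) -> Fraction:
    """Bill's quoted local-probability formula, computed exactly:

    0.5 + (hits_this - (hits_all - hits_this)) / (16 * (hits_all + 1))
    """
    numerator = hits_this - (hits_all - hits_this)
    return Fraction(1, 2) + Fraction(numerator, 16 * (hits_all + 1))


# A hapax seen only in this corpus: 0.5 + 1/32
print(float(crm114_local_prob(1, 1)))                 # 0.53125
# 16 hits, all in this corpus: 0.5 + 16/(16*17)
print(round(float(crm114_local_prob(16, 16)), 4))     # 0.5588
# Even a million all-spam hits stays below 0.5 + 1/16 = 0.5625
print(float(crm114_local_prob(10**6, 10**6)) < 0.5625)  # True
```

With N hits all in one corpus the numerator is N, so the value is 0.5 + N/(16*(N+1)), which climbs toward 0.5625 but never reaches it.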
If there are N hits, all spam, it's 0.5 + N/(16*(N+1)) = 0.5 + N/(N+1) * 1/16 which approaches 0.5 + 1/16 = 0.5625 from below as N approaches infinity. > and now that I'm staring at it... I'm wondering how this could > ever have worked, Keeping spamprobs very mild would help Bayesian combining avoid "spectacular failures", and that may have played into it. Have to agree it doesn't seem right anyway, though. > what I was thinking when I wrote it, You're on your own there . > and the shocking realization that I need to spend a LOT more time > checking my code... Or figuring out why it works so well despite parts being crazy -- that sometimes turns up unexpected insight! A coworker and I recently spent a lot of time simulating various schemes for implementing a second-level client cache, based on running captured traces of cache requests from live installations. One particular relatively unsophisticated approach consistently gave dramatically better results than others, darned near unbelievably better. After staring at it for more than a week (off & on), it turned out that this scheme won huge *mostly* because it favored evicting larger objects over evicting smaller objects, which left it with a lot more objects sitting in the fixed-sized cache, thus enormously boosting the hit rate. So it turned out the single most important factor feeding into the hit rate is one we weren't even aware of at first, and that one particular algorithm happened to pay some attention to it was more accident than design. From tim.one at comcast.net Sun Dec 14 20:06:03 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Dec 14 20:06:09 2003 Subject: [Spambayes] Watch out for digests... In-Reply-To: <16344.40324.210107.698842@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > ... > This is where I think the synthetic vs. natural tokens thing would be > interesting. I'm not sure what's being distinguished here. 
> I get lots of Viagra spam, most of which is caught, but in my current > database, 'viagra' is a hapax. In fact, it appears I only added it > very recently. Here's the evidence header from a message with the > subject: > > Viagra, Soma, Fioricet, Prescribed Online for Free, Shipped > Overnight > > which was scored around 12:25 AM today: > > X-Spambayes-Evidence: '*H*': 0.03; '*S*': 0.90; 'drug': 0.16; > 'subject:Free': 0.16; "Free" in a Subject line and "drug" in the body are hammy for you? Staring at clues from mistake-based training can be, umm, counter-intuitive . > 'store': 0.23; 'next': 0.25; 'list,': 0.30; > 'via': 0.34; 'subject:, ': 0.37; 'our': 0.62; > 'header:Reply-To:1': 0.64; 'enter': 0.67; > 'content-type:multipart/alternative': 0.68; > 'content-type:text/html': 0.74; 'doctors': 0.84; > 'prescription': 0.84; 'received:103]': 0.84; > 'received:165.175': 0.84; 'received:175': 0.84; > 'received:199.249.165.175': 0.84; 'received:249.165.175': > 0.84; 'reply-to:addr:yahoo.com': 0.93; 'url:biz': 0.98 > > Most of the spammy clues are synthetic tokens related to delivery > (and are mostly hapaxes), not content. I'm not sure what's synthetic about these. Most of your spam clues come from the email *headers*, but that's fair game. Note that mining received headers is disabled by default, so you're getting a pile of clues most people aren't getting. Maybe they should. > My 'train an unsure or false negative, check for spams' method suggests > this is the case, since training on a single message often pushes several > other spams about completely different topics into the spam category. I'm unclear on what's noteworthy about that. The biz domain is used by lots of spam, lots of spam has a yahoo.com return address, lots of spam is multipart/alternative HTML, and so on. Looks like you're generating 4 correlated clues from a single Received header, and that you got one spam before from the same box. 
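Tim's observation that a single Received header is yielding several correlated clues can be made concrete. The sketch below is hypothetical: the function names are invented, and the "received:" token format is modeled on the tokens in Skip's evidence header rather than taken from the SpamBayes tokenizer. It contrasts the dotted suffixes that header shows with the dotted prefixes Tim suggests.

```python
from typing import List


def ip_suffix_tokens(ip: str) -> List[str]:
    """Tokens built from dotted suffixes: a.b.c.d, b.c.d, c.d, d."""
    octets = ip.split(".")
    return ["received:" + ".".join(octets[i:]) for i in range(len(octets))]


def ip_prefix_tokens(ip: str) -> List[str]:
    """Tokens built from dotted prefixes: a.b.c.d, a.b.c, a.b, a."""
    octets = ip.split(".")
    return ["received:" + ".".join(octets[:i])
            for i in range(len(octets), 0, -1)]


ip = "199.249.165.175"
print(ip_suffix_tokens(ip))
# ['received:199.249.165.175', 'received:249.165.175', 'received:165.175', 'received:175']
print(ip_prefix_tokens(ip))
# ['received:199.249.165.175', 'received:199.249.165', 'received:199.249', 'received:199']
```

Because addresses are handed out in netblocks, a neighbouring host in the same block would share the prefix tokens (for example "received:199.249.165") but none of the suffix tokens except possibly by coincidence.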
Strangely, though, it looks like you're sucking out *suffixes* of IP addrs instead of prefixes (you've got

    199.249.165.175
    249.165.175
    165.175

and

    175

but not the almost-surely more useful

    199.249.165
    199.249

and

    199

). > This suggests a couple other downsides to minimalist training. One, > spammers have to move, so hapaxes related to delivery are likely to > only be useful for a short period while the spammer is abusing a > single account. IP *prefixes* should be useful despite that, due to the way IP space is handed out. If you're a spammer with a cooperative host, you're likely to get other IP addresses from the netblocks assigned to that host, and they'll share a common prefix. > Two, if a delivery token pushes a bunch of other messages into the > spam category which are then never used as inputs to training, the > opportunity to reinforce that token's quality is lost, even though it > might actually appear fairly frequently in spam. I expect 'subject:Free' was a fine example of that. From skip at pobox.com Sun Dec 14 21:13:43 2003 From: skip at pobox.com (Skip Montanaro) Date: Sun Dec 14 21:13:46 2003 Subject: [Spambayes] Watch out for digests... In-Reply-To: References: <16344.40324.210107.698842@montanaro.dyndns.org> Message-ID: <16349.6359.927187.517763@montanaro.dyndns.org> >> X-Spambayes-Evidence: '*H*': 0.03; '*S*': 0.90; 'drug': 0.16; >> 'subject:Free': 0.16; Tim> "Free" in a Subject line and "drug" in the body are hammy for you? Tim> Staring at clues from mistake-based training can be, umm, Tim> counter-intuitive . Yeah, one of the online communities I participate in is a list of parents of "troubled kids", hence the hammy "drug" reference.
"subject:Free" comes from the music community: Subject: SFS Special Announcement (Free Guest List to Fluid this Friday) >> 'store': 0.23; 'next': 0.25; 'list,': 0.30; >> 'via': 0.34; 'subject:, ': 0.37; 'our': 0.62; >> 'header:Reply-To:1': 0.64; 'enter': 0.67; >> 'content-type:multipart/alternative': 0.68; >> 'content-type:text/html': 0.74; 'doctors': 0.84; >> 'prescription': 0.84; 'received:103]': 0.84; >> 'received:165.175': 0.84; 'received:175': 0.84; >> 'received:199.249.165.175': 0.84; 'received:249.165.175': >> 0.84; 'reply-to:addr:yahoo.com': 0.93; 'url:biz': 0.98 >> >> Most of the spammy clues are synthetic tokens related to delivery >> (and are mostly hapaxes), not content. Tim> I'm not sure what's synthetic about these. I guess my operational definitions of "synthetic" and "natural" tokens are in order: "natural tokens" are those which derive simply by splitting the message body on whitespace boundaries. "synthetic tokens" are those which are not "natural tokens". Tim> Most of your spam clues come from the email *headers*, but that's Tim> fair game. Note that mining received headers is disabled by Tim> default, so you're getting a pile of clues most people aren't Tim> getting. Maybe they should. Sure, email headers are fair game, but if the tokenizer didn't do anything special with them, that "subject:Free" token would at most just be "free" or "Free". >> My 'train an unsure or false negative, check for spams' method >> suggests this is the case, since training on a single message often >> pushes several other spams about completely different topics into the >> spam category. Tim> I'm unclear on what's noteworthy about that. The biz domain is Tim> used by lots of spam, lots of spam has a yahoo.com return address, Tim> lots of spam is multipart/alternative HTML, and so on. Looks like Tim> you're generating 4 correlated clues from a single Received header, Tim> and that you got one spam before from the same box. 
Strangely, Tim> though, it looks like you're sucking out *suffixes* of IP addrs Tim> instead of prefixes (you've got Tim> 199.249.165.175 Tim> 249.165.175 Tim> 165.175 Tim> and Tim> 175 Tim> but not the almost-surely more useful Tim> 199.249.165 Tim> 199.249 Tim> and Tim> 199 Tim> ). I don't know. I agree those look backwards (that's my mail server, BTW). OTOH, given the fairly random assignment of IP networks, I doubt it makes much sense for the above IP address to be stripped of more than the last two octets ("received:199.249.165.175", "received:199.249.165" and "received:199.249"). "received:199", where 199 is the first octet, not the last, almost certainly means nothing. If it's spammy or hammy, it's just by sheer coincidence. >> This suggests a couple other downsides to minimalist training. One, >> spammers have to move, so hapaxes related to delivery are likely to >> only be useful for a short period while the spammer is abusing a >> single account. Tim> IP *prefixes* should be useful despite that, due to the way IP Tim> space is handed out. If you're a spammer with a cooperative host, Tim> you're likely to get other IP addresses from the netblocks assigned Tim> to that host, and they'll share a common prefix. Again, no more general than the first two octets (a class B network). Class A networks are very rare (for obvious reasons): http://euclid.math.brandeis.edu/turtschi/whois/neta1.html >> Two, if a delivery token pushes a bunch of other messages into the >> spam category which are then never used as inputs to training, the >> opportunity to reinforce that token's quality is lost, even though it >> might actually appear fairly frequently in spam. Tim> I expect 'subject:Free' was a fine example of that. 'subject:Free' is now slightly spammy, having turned up in three spams and only one ham at this point.
Skip From tim.one at comcast.net Sun Dec 14 22:09:04 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Dec 14 22:09:10 2003 Subject: [Spambayes] Watch out for digests... In-Reply-To: <16349.6359.927187.517763@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > I guess my operational definitions of "synthetic" and "natural" > tokens are in order: > > "natural tokens" are those which derive simply by splitting the > message body on whitespace boundaries. > > "synthetic tokens" are those which are not "natural tokens". OK. Now I've forgotten why you drew the distinction to begin with <0.9 wink>. [about busting apart IP addrs] > I don't know. I agree those look backwards (that's my mail server, > BTW). OTOH, given the fairly random assignment of IP networks, I > doubt it makes much sense for the above IP address to be stripped of > more than the last two octets ("received:199.249.165.175", > "received:199.249.165" and "received:199.249"). "recevied:199", > where 199 is the first octet, not the last, almost certainly means > nothing. If it's spammy or hammy, it's just by sheer coincidence. In that case, the database will learn it; since it can't generate more than 126 legitimate "Class A" tokens total, it's a trivial database burden. OTOH, for someone in the DOD, it may be valuable to know that email came from a DOD Class A network. On the third hand, spammers often forge Received headers, and I doubt most do research to forge sensible IPs. IOW, the system learns what does and doesn't work, in both directions, provided only that it's shown potentially interesting stuff. > ... > Again, no more general than the first two octets (a class B network). > Class A networks are very rare (for obvious reasons): > > http://euclid.math.brandeis.edu/turtschi/whois/neta1.html They're rarer than that now -- that's over 4 years old, and lots of those have been busted up. 
Since current practice is to assign a range of initial bits instead of initial bytes, maybe we should generate all *bit* prefixes instead. That would sure test whether correlation is our friend . From jtech at hyperionmail.com Sun Dec 14 22:09:07 2003 From: jtech at hyperionmail.com (My Tech) Date: Sun Dec 14 22:09:39 2003 Subject: [Spambayes] SpamBayes Corrupted My Profile Message-ID: <000001c3c2b8$ce2de0b0$1e02a8c0@JDi8000> After installing SpamBayes, Outlook could only be opened in Safe Mode. (That is to say that when clicking on the Outlook desktop icon, a dialogue box popped up before the application would open, informing me that Outlook had encountered an error and needed to shut down. The checkbox to "Restart Outlook" was already checked and I clicked on the "Don't Send [Error Report to Microsoft]" button. Then, a new dialogue box popped up saying that Outlook failed to start correctly and asked me if I wanted to start in Safe Mode, "Yes" or "No." If I select "No", then the first dialogue box re-appears telling me about Outlook encountering an error and wanting to restart. If I select "Yes", only then will Outlook open.) I've come to find out that installing SpamBayes has corrupted my Windows Administrator profile and that is why Outlook will not open. PLEASE HELP ASAP!!! I do not want to have to reinstall my OS (and all of my software) because of this. FYI:

My Windows OS: 2000 Professional, 5.00.2195, Service Pack 4
SpamBayes installer used: SpamBayes-Outlook-Setup-0081.exe
Outlook version: 2002, part of Office XP Small Business Edition

If there is a way to fix this, please tell me. Also, please send detailed instructions for installing SpamBayes, as it appears that I did not do it correctly (even though I followed the instructions per the SpamBayes website.) Thank you. -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031214/cae76a76/attachment-0001.html From tameyer at ihug.co.nz Sun Dec 14 23:28:54 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Dec 14 23:29:03 2003 Subject: [Spambayes] Bug and Help In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13046B4EDA@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677758@its-xchg4.massey.ac.nz> > Hi. For some reason the Spam Bayes software is not working > correctly. It does not auto-filter messages and the "suspect > email" button does not show up in the tool bar (only the > delete as spam). What "suspect email" button? There should always be a "SpamBayes" button that drops down a menu, and, depending on which folder you are in, a "Delete As Spam" or "Recover From Spam" button. I can't see anything in your logs that indicates that things aren't working... Are you sure that you have set SpamBayes up to move mail on classification? Is it set to move to the right folder? =Tony Meyer From jonesclan at verizon.net Mon Dec 15 00:49:29 2003 From: jonesclan at verizon.net (Jones Clan) Date: Mon Dec 15 00:47:11 2003 Subject: [Spambayes] Won't work anymore Message-ID: <000201c3c2cf$352f58a0$e09b2e04@home> I loved your product with Outlook 2000. But now that I have installed XP, it won't work. I don't get any errors but I click the button on the toolbar and nothing happens. I have uninstalled and reinstalled thinking it had to be installed again after the Outlook upgrade. Still nothing. Please help because I miss using your product. McLean Jones NO Sugar - NO Carb Energy Drink www.getsomexs.com user: mclean pass: guest 888.870.5070 -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031214/6124c254/attachment.html From tameyer at ihug.co.nz Mon Dec 15 00:59:46 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 15 00:59:27 2003 Subject: [Spambayes] Won't work anymore In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C06D4@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677759@its-xchg4.massey.ac.nz> > I loved your product with Outlook 2000. But now that > I have installed XP, it won't work. I don't get any > errors but I click the button on the toolbar and nothing > happens. Try removing your configuration file (the uninstall does not touch it) and seeing if that helps. It's in the data directory; the FAQ explains where to find that. It'll be named '[profile name].ini', which might just mean "Outlook.ini". You can just rename it if you like. =Tony Meyer From tameyer at ihug.co.nz Mon Dec 15 03:03:02 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 15 03:03:11 2003 Subject: [Spambayes] RE: Yahoo's "domain keys" and spam In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13046B4AC7@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467775B@its-xchg4.massey.ac.nz> [Tim] > Yes, that's right for the Outlook plugin. The underlying > engine has dozens of options, though, and *can* be configured > to do lots of other things. [...] > An example of another header gimmick we ignore by default is > Habeas headers: > > http://www.habeas.com > The options search_for_habeas_headers and reduce_habeas_headers > can be enabled to pick those apart. Whoops. Ages ago I wrote code that I use locally to utilise the Habeas headers (they pushed some regular and semi-regular newsletters I get from very low ham to 0.0 (rounded)); I think I even posted code at the time. 
Some time later (but still some time ago ;) I (by mistake) committed the options, but not the tokenizing code (it seemed unlikely that it would be a significant gain, since AFAIK the Habeas headers haven't really caught on). So right now enabling those options will actually have no effect. (Tim's theory is right, though ). To remedy this, I've deprecated/experimentalised the Habeas options and checked in tokenizing code for them. I'll leave it like this for the next release so that anyone that has enabled them can get a deprecation warning, and (unless results come back saying that they see a real gain) remove them after that. =Tony Meyer From assafpinhasi at hotmail.com Mon Dec 15 04:31:48 2003 From: assafpinhasi at hotmail.com (Assaf Pinhasi) Date: Mon Dec 15 04:31:53 2003 Subject: [Spambayes] SpamBayes for multiple users on one server Message-ID: 1. Has it been tried before? 2. Does SpamBayes offer separate tables of good/bad tokens for each user in a multiple user environment? Thanks, Assaf. _________________________________________________________________ Protect your PC - get McAfee.com VirusScan Online http://clinic.mcafee.com/clinic/ibuy/campaign.asp?cid=3963 From aarontay at mochamail.com Mon Dec 15 08:34:30 2003 From: aarontay at mochamail.com (Aaron) Date: Mon Dec 15 08:41:40 2003 Subject: [Spambayes] How low can you go? In-Reply-To: References: <20031213202306.GA8893@nl.linux.org> Message-ID: <3FDE28E6.7413.5731DC@localhost> On 13 Dec 2003 at 18:22, Tim Peters wrote: > [Gerrit Holl] > > ... > > My father is using non-bayesian spamassasin, and it seems the > > spamassasin manpage warns that without 'hundreds of messages' > > bayesian spamfiltering is unusable. This is obviously incorrect for > > Spambayes. Spambayes comes with no knowledge. Does it have a more > > intelligent algorithm? Or is the warning in the spamassasin manpage > > incorrect? > > I'm afraid there are several research projects hiding in there, so > we'll probably never know.
Probably because SpamAssassin uses the bayesian approach only as a supplementary tool to detect spam, they don't need to use bayesian filtering straight off the bat, as pure bayesian approaches do, hence the warning. (For the first hundred mails or so, the built-in rulebase is probably more accurate than many pure bayesian approaches, so they don't need it.) I doubt it's that different from the other bayesian filters, though, but who knows. -- Want to learn how to use 150+ free Chess programs to run with Winboard? Visit http://www.aarontay.per.sg/Winboard/index.html From dbulgrien at vcsd.com Mon Dec 15 08:48:14 2003 From: dbulgrien at vcsd.com (Dennis W. Bulgrien) Date: Mon Dec 15 08:48:21 2003 Subject: [Spambayes] Background, Delay between processing items Message-ID: SpamBayes Manager, Advanced, Background, Filter timer, Enable background filtering, Delay between processing items. I see the help for this item, but still wonder about the purpose. What value does 1 second have over 0 seconds? I get 200+ messages in the morning and 1s between each takes quite a while to clear out. Will 0s fix it, and might it cause other undesirable side effects? From Kent.Tegels at hdrinc.com Mon Dec 15 09:06:46 2003 From: Kent.Tegels at hdrinc.com (Tegels, Kent) Date: Mon Dec 15 09:06:52 2003 Subject: [Spambayes] Oh good! Message-ID: <2368489DC1DAF2488B85739359AD74DE84A1F8@exch2003.intranet.hdr> That raises the question: do most SB/Python scripts need that path set? -----Original Message----- From: Tony Meyer [mailto:tameyer@ihug.co.nz] Sent: Saturday, December 13, 2003 8:28 PM To: 'kent tegels'; spambayes@python.org Subject: RE: [Spambayes] Oh good! > Or am I just asking for too much. You're not, but I'll answer this anyway.
> If I am, could somebody point me at a depickle/otherwise export the > .db files to some other format so I can get back to the first task I > wanted to -- get the wordlist and the ham and spam counts? The sb_dbexpimp.py script in the scripts folder of the source will do this. You can convert to and from pickle and bsddb, and also to/from 'flat' text ('`' delimited, for some reason). You'll need the full source package to do this (well, some subset of that, anyway). Easiest to just unpack the archive, "set pythonpath=path:\to\archive" and then run the script. Running with no options will give example usage. =Tony Meyer From bob.hyde at computility.com Mon Dec 15 10:00:15 2003 From: bob.hyde at computility.com (Bob Hyde) Date: Mon Dec 15 09:58:19 2003 Subject: [Spambayes] Outlook issues Message-ID: <001a01c3c31c$25957bc0$a701a8c0@Bob> Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... Name: spambayes3.log Type: application/octet-stream Size: 1810 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031215/4c58048d/spambayes3.obj From skip at pobox.com Mon Dec 15 10:26:09 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Dec 15 10:26:21 2003 Subject: [Spambayes] How low can you go? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130467774F@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13046B4D13@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F130467774F@its-xchg4.massey.ac.nz> Message-ID: <16349.53905.796591.633231@montanaro.dyndns.org> >> One scheme I recommend trying is the mixed unigram/bigram scheme, >> with the special scoring gimmick Gary Robinson suggested. I *think* >> Tony Meyer has an up-to-date patch for that, but I may be wrong. Tony> As usual, you're not. If anyone wants this, I can put it up on Tony> the wiki or something. How about the "or something" of checking it in and keying it with an x-ified config option?
Does it mess with the code all over the place or is it a relatively isolated change to the source? Does it include Gary's scoring change? Skip From Kent.Tegels at hdrinc.com Mon Dec 15 10:38:46 2003 From: Kent.Tegels at hdrinc.com (Tegels, Kent) Date: Mon Dec 15 10:38:57 2003 Subject: [Spambayes] Oh good! Message-ID: <2368489DC1DAF2488B85739359AD74DE84A31B@exch2003.intranet.hdr> Humm... If I unzipped the package to g:\temp\sbi\, what do I need to set pythonpath to? This is what I'm getting pretty consistently if I set pythonpath to anything.

G:\Python23\Scripts>set pythonpath=G:\temp\sbi\

G:\Python23\Scripts>sb_dbexpimp.py -e -v -d "G:\Documents and Settings\ktegels\Application Data\SpamBayes\default_bayes_database.db" -f data.txt
Loading state from G:\Documents and Settings\ktegels\Application Data\SpamBayes\default_bayes_database.db pickle
Traceback (most recent call last):
  File "G:\Python23\Scripts\sb_dbexpimp.py", line 262, in ?
    runExport(dbFN, useDBM, flatFN)
  File "G:\Python23\Scripts\sb_dbexpimp.py", line 112, in runExport
    bayes = spambayes.storage.PickledClassifier(dbFN)
  File "G:\Python23\Lib\site-packages\spambayes\storage.py", line 90, in __init__
    self.load()
  File "G:\Python23\Lib\site-packages\spambayes\storage.py", line 113, in load
    tempbayes = pickle.load(fp)
EOFError

Humm... kt -----Original Message----- From: Tony Meyer [mailto:tameyer@ihug.co.nz] Sent: Saturday, December 13, 2003 8:28 PM To: 'kent tegels'; spambayes@python.org Subject: RE: [Spambayes] Oh good! > Or am I just asking for too much. You're not, but I'll answer this anyway. > If I am, could somebody point me at a depickle/otherwise export the > .db files to some other format so I can get back to the first task I > wanted to -- get the wordlist and the ham and spam counts? The sb_dbexpimp.py script in the scripts folder of the source will do this. You can convert to and from pickle and bsddb, and also to/from 'flat' text ('`' delimited, for some reason).
You'll need the full source package to do this (well, some subset of that, anyway). Easiest to just unpack the archive, "set pythonpath=path:\to\archive" and then run the script. Running with no options will give example usage. =Tony Meyer From dprice at doble.com Mon Dec 15 10:41:42 2003 From: dprice at doble.com (Price, Derek) Date: Mon Dec 15 10:44:12 2003 Subject: [Spambayes] Enhancement Suggestion Message-ID: <6ED95C184239FC419DFE88BA479737652EA3EC18@mailnt.doble.com> Hello, I previously used Outclass [1], but I switched to your program since Outclass was erratic with Outlook 2003. The only feature I miss is the "Safe View" function. This would essentially launch the highlighted email into Notepad for safe review for junk status. Have you considered this functionality? Thanks for the great software! Derek Price [1] http://www.vargonsoft.com From glen.meiring at nsbhp.com Mon Dec 15 10:45:21 2003 From: glen.meiring at nsbhp.com (Meiring, Glen NSBHP) Date: Mon Dec 15 10:44:34 2003 Subject: [Spambayes] Unable to initialize add-in Message-ID: <27486F3822CD98449353E114AC2E99237E2DDE@nbs239.northstar_d02.nsbhp.com> I am attempting to use the SpamBayes software. I receive the message: There was an error initializing the SpamBayes addin. I have read the Troubleshooting document. The addin does not show up in the Add-ins window in Outlook. I tried manually registering the DLL with: regsvr32.exe spambayes_addin.dll The toolbar icons appear, and sort of work, but SpamBayes complains that it needs to be enabled. Any help appreciated.
I am using Windows 2000, Outlook 2000, and the Binary version of SpamBayes 0081 Here is the log: --------------------------------------------------------- Loaded bayes database from 'C:\Documents and Settings\nbmeig\Application Data\SpamBayes\default_bayes_database.db' Loaded message database from 'C:\Documents and Settings\nbmeig\Application Data\SpamBayes\default_message_database.db' Bayes database initialized with 0 spam and 0 good messages SpamBayes Outlook Addin, Binary version 0.81 (September 9, 2003) starting (with engine SpamBayes Beta2, version 0.2 (July 2003)) on Windows 5.0.2195 (Service Pack 4) using Python 2.3+ (#46, Aug 6 2003, 16:39:24) [MSC v.1200 32 bit (Intel)] Error connecting to Outlook! Traceback (most recent call last): File "out1.pyz/addin", line 1177, in OnConnection File "out1.pyz/dialogs", line 64, in ShowWizard File "out1.pyz/config_wizard", line 142, in CreateWizardConfig File "out1.pyz/config_wizard", line 49, in InitWizardConfig File "out1.pyz/msgstore", line 372, in YieldReceiveFolders File "out1.pyz/msgstore", line 337, in GetFolder MsgStoreException: MsgStoreException: Exception 0x80004005 (Unspecified error): Unspecified error ERROR: 'There was an error initializing the SpamBayes addin\r\n\r\nPlease re-start Outlook and try again.' 
Traceback (most recent call last): File "out1.pyz/addin", line 1177, in OnConnection File "out1.pyz/dialogs", line 64, in ShowWizard File "out1.pyz/config_wizard", line 142, in CreateWizardConfig File "out1.pyz/config_wizard", line 49, in InitWizardConfig File "out1.pyz/msgstore", line 372, in YieldReceiveFolders File "out1.pyz/msgstore", line 337, in GetFolder MsgStoreException: MsgStoreException: Exception 0x80004005 (Unspecified error): Unspecified error FAILED to add the toolbar item 'SpamBayesCommand.Manager' - (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) Deleted the dead popup control - re-creating Traceback (most recent call last): File "out1.pyz/dialogs.dlgcore", line 310, in OnCommand File "out1.pyz/dialogs.dlgcore", line 262, in ApplyHandlingOptionValueError File "out1.pyz/dialogs.processors", line 76, in OnCommand File "out1.pyz/dialogs.dialog_map", line 322, in OnClicked File "out1.pyz/dialogs", line 64, in ShowWizard File "out1.pyz/config_wizard", line 142, in CreateWizardConfig File "out1.pyz/config_wizard", line 49, in InitWizardConfig File "out1.pyz/msgstore", line 372, in YieldReceiveFolders File "out1.pyz/msgstore", line 337, in GetFolder msgstore.MsgStoreException: MsgStoreException: Exception 0x80004005 (Unspecified error): Unspecified error Saving configuration -> C:\Documents and Settings\nbmeig\Application Data\SpamBayes\nbmeig.ini ERROR: 'You must enable SpamBayes before you can delete as spam'
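Kent's EOFError traceback earlier in this digest comes from pickle.load, which raises EOFError when it runs out of input before reading a complete pickle. A quick way to check whether a database file is a pickle at all, before handing it to sb_dbexpimp.py, is simply to try loading it. The helper below is a hypothetical sketch (looks_like_pickle is not part of SpamBayes), and it should only be run on files you trust, since unpickling can execute arbitrary code.

```python
import pickle


def looks_like_pickle(path):
    """Return True if the file at `path` loads cleanly with pickle.load.

    Hypothetical diagnostic helper, not part of SpamBayes. Only use it on
    trusted files: unpickling untrusted data can execute arbitrary code.
    """
    try:
        with open(path, "rb") as fp:
            pickle.load(fp)
    except EOFError:
        # Empty or truncated input: the same exception as in Kent's traceback.
        return False
    except Exception:
        # UnpicklingError and friends: the bytes are not a valid pickle,
        # e.g. because the file is really a Berkeley DB (bsddb) database.
        return False
    return True
```

If this returns False for a SpamBayes database file, the file is probably stored in another format (such as bsddb), and the export script needs to be told to read it that way.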
From kennypitt at hotmail.com Mon Dec 15 10:48:12 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Dec 15 10:48:59 2003 Subject: [Spambayes] Oh good! In-Reply-To: <2368489DC1DAF2488B85739359AD74DE84A31B@exch2003.intranet.hdr> Message-ID: Tegels, Kent wrote:
> Humm... If I unzipped the package to g:\temp\sbi\, what do I need to
> set pythonpath to? This is what I'm getting pretty consistently if I
> set pythonpath to anything.
>
> G:\Python23\Scripts>set pythonpath=G:\temp\sbi\
>
> G:\Python23\Scripts>sb_dbexpimp.py -e -v -d "G:\Documents and Settings\ktegels\Application Data\SpamBayes\default_bayes_database.db" -f data.txt
> Loading state from G:\Documents and Settings\ktegels\Application Data\SpamBayes\default_bayes_database.db pickle
> Traceback (most recent call last):
>   File "G:\Python23\Scripts\sb_dbexpimp.py", line 262, in ?
>     runExport(dbFN, useDBM, flatFN)
>   File "G:\Python23\Scripts\sb_dbexpimp.py", line 112, in runExport
>     bayes = spambayes.storage.PickledClassifier(dbFN)
>   File "G:\Python23\Lib\site-packages\spambayes\storage.py", line 90, in __init__
>     self.load()
>   File "G:\Python23\Lib\site-packages\spambayes\storage.py", line 113, in load
>     tempbayes = pickle.load(fp)
> EOFError

I don't think the problem is your pythonpath setting. The Outlook database is almost certainly a bsddb database and not a pickle. Try using "-D" (capitalized) instead of "-d" (lower case) in your command line options. -- Kenny Pitt From rmalayter at bai.org Mon Dec 15 11:15:03 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Mon Dec 15 11:15:06 2003 Subject: [Spambayes] RE: Yahoo's "domain keys" and spam Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A7518D@cliff.bai.org> [Seth Goodman] > IMHO, this code belongs in the MTA's, not in SpamBayes.
The Agreed, the true value of this approach is filtering at the MTA level. But there must be some method of establishing a "trust level" for authenticated sending domains, otherwise all the spam-houses will simply put public keys in their DNS, install the signing feature on their MTAs, use "real" domains in their messages and continue as before. SpamBayes could be very useful in maintaining a "statistical trust" list of domains, which an organization could enforce at the MTA level if they chose. > Both the DNS lookup and > the decryption calculation are very costly in terms of time > per message, but it may still be worth it. That's your call. This would be no more costly, really, than establishing an SSL connection to a web site. The operations required are basically the same as what the Yahoo system would require: 1) a DNS lookup, 2) a download of a site certificate, 3) cryptographic verification of a message (in the case of SSL, this is the session key used for the encryption; in the Yahoo case, this would be verification of the signatures). The Yahoo verification would also require the calculation of a hash of several strings from the message header (sending domain, timestamp, etc.), but this is computationally trivial once the mail message is in memory. Connecting to a commercial website via SSL (for example, https://www.verisign.com), even with nothing in my DNS cache, takes much less than 1 second over my company's T1. I would guess that the performance of message verification using Yahoo domain keys would be similar. Regards, Ryan From rmalayter at bai.org Mon Dec 15 11:30:50 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Mon Dec 15 11:30:56 2003 Subject: [Spambayes] RE: Yahoo's "domain keys" and spam Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A7518E@cliff.bai.org> [Atom Smasher] > i haven't found many technical reports on yahoo's plan, but i > suspect that some of the failures in it are: > > 1) a paying (or thieving!)
customer of XYZ-ISP sends spam, > and it's "authenticated". this can happen either through a > virus or a "make money at home with your computer!" scheme. > > 2) domain names and hosting are cheap. it would be a > slight hurdle for spammers to register new domain names > through ISPs and "hit & run" that server, ISP, domain name... > depending on how the system is set up. > > 3) spam-houses that consider themselves to be legit will > have no problem sending "authenticated" spam. > > so, the system will likely have the effect of not only > blocking non-spam email, but giving a green light to a large > volume of "authenticated" spam. > which brings us back where we started... RBLSs, filtering, > etc... but with some added overhead to maintaining an SMTP server. 1) This is a problem, I agree. It might take some smarts on the part of the virus/worm to figure out the victim's ISP and SMTP addresses, but it could certainly be done. This is something that ISPs should be responsible for preventing. ISPs should already use snort or some other IDS to discover compromised PCs - and then block those machines. Many already do, and it isn't a ridiculous cost burden to place on ISPs, either. 2) and 3) could be addressed by blacklists, as you state. But the blacklist could be much more effective than current IP-based ones, even at the organizational level. We would know that the originating domain was not spoofed, and since there would be added cost to setting up a spam operation (domain registration and DNS setup), spammers couldn't hop around as easily. Regards, Ryan From rmalayter at bai.org Mon Dec 15 11:45:28 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Mon Dec 15 11:45:31 2003 Subject: [Spambayes] Enhancement Suggestion Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A7518F@cliff.bai.org> [Derek Price] > I previously used Outclass [1], but I switched to your > program since Outclass was erratic with Outlook 2003.
The > only feature I miss is the "Safe View" function. This would > essentially launch the highlighted email into Notepad for safe > review for junk status. Have you considered this functionality? Outlook 2003 already does this by default: all messages are shown without performing any internet transmissions (downloading of images, form postings, JavaScript, etc.). Simply showing HTML-formatted text is not terribly dangerous, or else you wouldn't browse the web. (Unless, of course, there's a local buffer-overrun vulnerability in your HTML parser. But even IE/Outlook hasn't had one of those crop up yet, to my knowledge.) Regards, Ryan From atom at suspicious.org Mon Dec 15 12:10:07 2003 From: atom at suspicious.org (Atom 'Smasher') Date: Mon Dec 15 12:11:30 2003 Subject: [Spambayes] SpamBayes for multiple users on one server In-Reply-To: References: Message-ID: > 1. Has it been tried before? > 2. Does SpamBayes offer separate tables of good/bad tokens for each user in > a multiple user > environment? ======================================= that's how i'm doing it. it's easy on *nix because each user has their own ~/ and their own prefs... even their own ~/bin/ for customizing scripts. it's available for everyone on the server, they can use it or not, however they need to. ...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "The enemy is anybody who's going to get you killed, no matter which side he's on." -- Joseph Heller, Catch-22 From rcoe at CambridgeMA.GOV Mon Dec 15 12:44:00 2003 From: rcoe at CambridgeMA.GOV (Coe, Bob) Date: Mon Dec 15 12:44:04 2003 Subject: [Spambayes] RE: Yahoo's "domain keys" and spam Message-ID: Maybe I'm missing your point, but how would an ISP that uses dynamically assigned IP addresses (which is pretty much all of them, AFAIK) recognize compromised PCs?
Bob MIS Department, City of Cambridge 831 Massachusetts Ave, Cambridge MA 02139 · 617-349-4217 · fax 617-349-6165 > -----Original Message----- > From: Ryan Malayter [mailto:rmalayter@bai.org] > Sent: Monday, December 15, 2003 11:31 AM > To: spambayes@Python.org > Subject: RE: [Spambayes] RE: Yahoo's "domain keys" and spam > > > [Atom Smasher] > > i haven't found many technical reports on yahoo's plan, but i > > suspect that some of the failures in it are: > > > > 1) a paying (or thieving!) customer of XYZ-ISP sends spam, > > and it's "authenticated". this can happen either through a > > virus or a "make money at home with your computer!" scheme. > > ... > > 1) This is a problem, I agree. It might take some smarts on the part of the > virus/worm to figure out the victim's ISP and SMTP addresses, but it > could certainly be done. > > This is something that ISPs should be responsible for preventing. ISPs > should already use snort or some other IDS to discover compromised PCs - > and then block those machines. Many already do, and it isn't a > ridiculous cost burden to place on ISPs, either. From ingridm at qoslabs.com Mon Dec 15 13:07:32 2003 From: ingridm at qoslabs.com (ingrim) Date: Mon Dec 15 13:08:39 2003 Subject: [Spambayes] Exception sleepycat Message-ID: <001201c3c336$5302ab40$7d64a8c0@ingridma> Hi, I have the following problem: when I try to recover some archived files, I get an exception in a library. I'm developing in Java, using the latest version of Berkeley DB.
Thanks Ingrid Unexpected Signal : 11 occurred at PC=0x5504E559 Function=__memp_register+0x21 Library=/usr/local/BerkeleyDB.4.2/lib/libdb_java-4.2.so Current Java thread: at com.sleepycat.db.db_javaJNI.DbEnv_open0(Native Method) at com.sleepycat.db.DbEnv.open0(DbEnv.java:1574) at com.sleepycat.db.DbEnv.open(DbEnv.java:702) at com.qoslabs.sleepycat.Recovery.<init>(Recovery.java:24) at com.qoslabs.sleepycat.Recovery.main(Recovery.java:46) Dynamic libraries: 08048000-0804e000 r-xp 00000000 03:03 69374 /usr/java/j2sdk1.4.1/jre/bin/java 0804e000-0804f000 rw-p 00005000 03:03 69374 /usr/java/j2sdk1.4.1/jre/bin/java 40000000-40016000 r-xp 00000000 03:06 62250 /lib/ld-2.2.4.so 40016000-40017000 rw-p 00015000 03:06 62250 /lib/ld-2.2.4.so 40018000-40025000 r-xp 00000000 03:06 72294 /lib/i686/libpthread-0.9.so 40025000-4002d000 rw-p 0000c000 03:06 72294 /lib/i686/libpthread-0.9.so 4002d000-40030000 r-xp 00000000 03:06 62263 /lib/libdl-2.2.4.so 40030000-40031000 rw-p 00002000 03:06 62263 /lib/libdl-2.2.4.so 40031000-40163000 r-xp 00000000 03:06 72290 /lib/i686/libc-2.2.4.so 40163000-40168000 rw-p 00131000 03:06 72290 /lib/i686/libc-2.2.4.so 4016c000-40484000 r-xp 00000000 03:03 69406 /usr/java/j2sdk1.4.1/jre/lib/i386/client/libjvm.so 40484000-40639000 rw-p 00317000 03:03 69406 /usr/java/j2sdk1.4.1/jre/lib/i386/client/libjvm.so 4064a000-4065d000 r-xp 00000000 03:06 62268 /lib/libnsl-2.2.4.so 4065d000-4065e000 rw-p 00012000 03:06 62268 /lib/libnsl-2.2.4.so 40660000-40682000 r-xp 00000000 03:06 72292 /lib/i686/libm-2.2.4.so 40682000-40683000 rw-p 00021000 03:06 72292 /lib/i686/libm-2.2.4.so 40683000-4068c000 r-xp 00000000 03:03 54299 /usr/java/j2sdk1.4.1/jre/lib/i386/native_threads/libhpi.so 4068c000-4068d000 rw-p 00008000 03:03 54299 /usr/java/j2sdk1.4.1/jre/lib/i386/native_threads/libhpi.so 4068e000-4069e000 r-xp 00000000 03:03 69429 /usr/java/j2sdk1.4.1/jre/lib/i386/libverify.so 4069e000-406a0000 rw-p 0000f000 03:03 69429 /usr/java/j2sdk1.4.1/jre/lib/i386/libverify.so
406a0000-406c1000 r-xp 00000000 03:03 69417 /usr/java/j2sdk1.4.1/jre/lib/i386/libjava.so 406c1000-406c3000 rw-p 00020000 03:03 69417 /usr/java/j2sdk1.4.1/jre/lib/i386/libjava.so 406c3000-406d8000 r-xp 00000000 03:03 69430 /usr/java/j2sdk1.4.1/jre/lib/i386/libzip.so 406d8000-406da000 rw-p 00014000 03:03 69430 /usr/java/j2sdk1.4.1/jre/lib/i386/libzip.so 406da000-41da8000 r--s 00000000 03:03 38056 /usr/java/j2sdk1.4.1/jre/lib/rt.jar 41deb000-41e02000 r--s 00000000 03:03 38057 /usr/java/j2sdk1.4.1/jre/lib/sunrsasign.jar 41e02000-41e73000 r--s 00000000 03:03 38050 /usr/java/j2sdk1.4.1/jre/lib/jsse.jar 41e73000-41e86000 r--s 00000000 03:03 38049 /usr/java/j2sdk1.4.1/jre/lib/jce.jar 41e86000-42142000 r--s 00000000 03:03 38037 /usr/java/j2sdk1.4.1/jre/lib/charsets.jar 441ea000-441ed000 r--s 00000000 03:03 69397 /usr/java/j2sdk1.4.1/jre/lib/ext/dnsns.jar 441ed000-441ef000 rw-s 00000000 03:02 47642 /opt/vpemeter/sleepycat/__db.001 54c76000-54ca1000 r--p 00000000 03:03 16574 /usr/lib/locale/en_US/LC_CTYPE 54ca1000-54cab000 r-xp 00000000 03:06 62284 /lib/libnss_files-2.2.4.so 54cab000-54cac000 rw-p 00009000 03:06 62284 /lib/libnss_files-2.2.4.so 54eb0000-54f4f000 r--s 00000000 03:03 69399 /usr/java/j2sdk1.4.1/jre/lib/ext/localedata.jar 54f4f000-54f5d000 r--s 00000000 03:03 69398 /usr/java/j2sdk1.4.1/jre/lib/ext/ldapsec.jar 54f5d000-54f7a000 r--s 00000000 03:03 69400 /usr/java/j2sdk1.4.1/jre/lib/ext/sunjce_provider.jar 54f7a000-54f7f000 r--s 00000000 03:02 126918 /opt/vpemeter/sbin/qosfoundation.jar 54f7f000-54fa7000 r--s 00000000 03:03 21855 /usr/local/BerkeleyDB.4.2/lib/db.jar 54fa7000-54fb0000 r--s 00000000 03:02 126896 /opt/vpemeter/sbin/flowsLoader.jar 54fb0000-5506d000 r-xp 00000000 03:03 21845 /usr/local/BerkeleyDB.4.2/lib/libdb_java-4.2.so 5506d000-55070000 rw-p 000bc000 03:03 21845 /usr/local/BerkeleyDB.4.2/lib/libdb_java-4.2.so 55070000-55088000 rw-s 00000000 03:02 47643 /opt/vpemeter/sleepycat/__db.002 55088000-5508c000 rw-s 00000000 03:02 47644 
/opt/vpemeter/sleepycat/__db.003 Local Time = Fri Dec 12 17:32:30 2003 Elapsed Time = 1 # # The exception above was detected in native code outside the VM # # Java VM: Java HotSpot(TM) Client VM (1.4.1-b21 mixed mode) # From gerrit at nl.linux.org Mon Dec 15 13:32:57 2003 From: gerrit at nl.linux.org (Gerrit Holl) Date: Mon Dec 15 13:33:30 2003 Subject: [Spambayes] How low can you go? In-Reply-To: References: <16342.20443.331861.376383@montanaro.dyndns.org> Message-ID: <20031215183257.GA6060@nl.linux.org> > > So, how small is yours? I started with a minimal database, each time reclassifying my unsure folder. I found out that with fewer than 15 ham + 16 spam, it doesn't work well enough. Note that the spam I receive is very monotonous, because my ISP replaces viruses with text messages, and since that's almost all the spam I receive, 4 spams are enough to get all spam with a probability of more than 90%. However, with 15 hams, some ham scores above 20%. And because I don't want to unbalance the database, I trained on already-correctly-classified spams as well as the highest-scoring unsures. With 15 ham, 16 spam, 1.5% of the incoming e-mail is classified as unsure, all ham, with scores ranging from .108 to .290, so a ham_cutoff of 30% would solve it all. Gerrit. -- 110. If a "sister of a god" open a tavern, or enter a tavern to drink, then shall this woman be burned to death. -- 1780 BC, Hammurabi, Code of Law -- Asperger Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/ Join the resistance against this cabinet: http://www.sp.nl/ From rmalayter at bai.org Mon Dec 15 13:58:58 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Mon Dec 15 13:59:03 2003 Subject: [Spambayes] RE: Yahoo's "domain keys" and spam Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A75199@cliff.bai.org> [Bob Coe] > Maybe I'm missing your point, but how would an ISP that uses > dynamically assigned IP addresses (which is pretty much all > of them, AFAIK) recognize compromised PCs?
By matching dial-in logs, PPPoE sessions, or even MAC addresses to IP address reservations. I use DHCP on my company's network, and I can easily tell from my logs what machine had a particular IP address at a particular point in time, whether they're dial-up, VPN, or local machines. From richie at entrian.com Mon Dec 15 14:51:40 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Dec 15 14:51:48 2003 Subject: [Spambayes] Exception sleepycat In-Reply-To: <001201c3c336$5302ab40$7d64a8c0@ingridma> References: <001201c3c336$5302ab40$7d64a8c0@ingridma> Message-ID: Ingrid, > I'm developing in Java, using the latest version of Berkeley DB. > [...] > Unexpected Signal : 11 occurred at PC=0x5504E559 > Function=__memp_register+0x21 > Library=/usr/local/BerkeleyDB.4.2/lib/libdb_java-4.2.so Are you sure you have the right mailing list? 8-) If you *do* have the right mailing list, we'd love to hear about how and why you're using SpamBayes from Java! -- Richie Hindle richie@entrian.com From tameyer at ihug.co.nz Mon Dec 15 19:01:48 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 15 19:01:42 2003 Subject: [Spambayes] Enhancement Suggestion In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C07FD@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677760@its-xchg4.massey.ac.nz> > The only feature I miss is the "Safe View" function. > This would essentially launch the highlighted email > into Notepad for safe review for junk status. You can get more-or-less this if you "View Clues" for a message. After the list of clues (and before the list of tokens) the raw message is listed (sans attachments).
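Ryan's answer to Bob boils down to an interval lookup: given the address assignments from the DHCP, PPPoE, or dial-in logs, find which machine held a given IP address at the moment a message was sent. The sketch below assumes a hypothetical, already-parsed lease record (ip, mac, start, end); real log formats vary, and the names Lease and holder_at are illustrative only.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Lease(NamedTuple):
    """One address assignment from a DHCP/PPPoE/dial-in log.

    The field layout is an assumption for illustration; real logs vary.
    """
    ip: str
    mac: str
    start: datetime
    end: datetime


def holder_at(leases: List[Lease], ip: str, when: datetime) -> Optional[str]:
    """Return the MAC address that held `ip` at time `when`, or None."""
    for lease in leases:
        # Treat each lease as the half-open interval [start, end), so a
        # hand-over at exactly 10:00 is attributed to the new holder.
        if lease.ip == ip and lease.start <= when < lease.end:
            return lease.mac
    return None


leases = [
    Lease("10.0.0.5", "00:0c:29:aa:bb:01",
          datetime(2003, 12, 15, 8, 0), datetime(2003, 12, 15, 10, 0)),
    Lease("10.0.0.5", "00:0c:29:aa:bb:02",
          datetime(2003, 12, 15, 10, 0), datetime(2003, 12, 15, 12, 0)),
]
# Which machine sent the spam logged from 10.0.0.5 at 09:30?
print(holder_at(leases, "10.0.0.5", datetime(2003, 12, 15, 9, 30)))
# prints 00:0c:29:aa:bb:01
```

A linear scan is fine for a sketch; an ISP with millions of lease records would presumably index them by IP and sort by start time instead.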
=Tony Meyer From tameyer at ihug.co.nz Mon Dec 15 19:03:10 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 15 19:03:04 2003 Subject: [Spambayes] Outlook issues In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C07E9@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677761@its-xchg4.massey.ac.nz> > Having issues getting Spambayes to reinitialize. > Have uninstalled and reinstalled both Spambayes and Outlook. > It is giving me an error that it cannot start spambayes, > please close Outlook and rerun the application. Try removing/renaming your configuration file ("[profile name].ini" or "Outlook.ini" in your data directory; the FAQ explains where to find that) and trying again. =Tony Meyer From tameyer at ihug.co.nz Mon Dec 15 19:05:10 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 15 19:05:21 2003 Subject: [Spambayes] Background, Delay between processing items In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C07C2@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677762@its-xchg4.massey.ac.nz> > SpamBayes Manager, Advanced, Background, Filter timer, Enable > background filtering, Delay between processing items. I see the help > for this item, but still wonder about the purpose. The purpose is to avoid conflict with Outlook's own rules, by ensuring that we run *after* all the Outlook rules finish. If you don't have a problem with the rules conflicting, then you can happily disable the background filtering. If you do, you can still play around with different values (you could try 0.1 seconds, for example) and see how that goes. The only problem that you might run into is having SpamBayes filter mail before Outlook has a chance to get its hands on it. =Tony Meyer From tameyer at ihug.co.nz Mon Dec 15 19:11:22 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 15 19:11:27 2003 Subject: [Spambayes] Oh good!
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C07CC@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677763@its-xchg4.massey.ac.nz> [Setting PYTHONPATH] > That raises the question: do most SB/Python scripts need that path set? When a Python script tries to import a module it looks in certain places (the docs have the gory details). Adding to the environment variable PYTHONPATH adds to those places that are searched. So if the module is already somewhere it will be found (the Python Lib directory, for example), setting it isn't necessary; if it won't be found, then it will need to be set. If you've run "setup.py install" with a package, you almost certainly don't need to worry about the PYTHONPATH. If you haven't, then you may need to. Does that help? =Tony Meyer From tameyer at ihug.co.nz Mon Dec 15 19:35:21 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 15 19:35:35 2003 Subject: [Spambayes] How low can you go? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C07F4@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677766@its-xchg4.massey.ac.nz> > How about the "or something" of checking it in and keying it > with an x-ified config option? Does it mess with the code > all over the place or is it a relatively isolated change to > the source? Relatively isolated; it's all in classifier.py, anyway. I'll do this. If you (or anyone else) would like to eyeball it to check that there's no obvious error anywhere, that would be great (it did appear to work for me, though, and I used it for a couple of months). Note that anyone wanting to use it will need to retrain from scratch to get full benefit, and that it does slow down classifying somewhat. Tim also suggested that increasing the number of tokens used would be beneficial, and I certainly found that to be the case (from 150 to 600, I think). > Does it include Gary's scoring change?
I wasn't paying enough attention to the earlier messages: is this the change that means that only the strongest of the two unigrams and one bigram is used? If so, then yes, it includes that. =Tony Meyer From skip at pobox.com Mon Dec 15 19:59:05 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Dec 15 19:59:11 2003 Subject: [Spambayes] How low can you go? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304677766@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13047C07F4@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F1304677766@its-xchg4.massey.ac.nz> Message-ID: <16350.22745.746489.659660@montanaro.dyndns.org> >> Does it include Gary's scoring change? Tony> I wasn't paying enough attention to the earlier messages: is this Tony> the change that means that only the strongest of the two unigrams Tony> and one bigram is used? If so, then yes, it includes that. Yup, that's the one. Thx, Skip From alice_utter at firstpenn.com Mon Dec 15 17:44:15 2003 From: alice_utter at firstpenn.com (Utter, Alice) Date: Mon Dec 15 22:15:09 2003 Subject: [Spambayes] SpamBayes not working Message-ID: Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/bmp Size: 59638 bytes Desc: ole0.bmp Url : http://mail.python.org/pipermail/spambayes/attachments/20031215/9af343bb/attachment-0001.bin From tameyer at ihug.co.nz Mon Dec 15 22:18:47 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 15 22:18:56 2003 Subject: [Spambayes] Bug and Help In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C0797@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467776A@its-xchg4.massey.ac.nz> > Everything is setup correctly. I compared the setup with a > working SpamBayes version. Everyone's setup is unique, though, in that you choose your own folders to move things to. Have you reselected those to ensure that they are where you think they are? 
> The icon does not change when I select the Junk Suspects > icon. What icons are you talking about here? Do you mean that when you select the "Junk Suspects" folder the toolbar doesn't change its buttons? This sounds very much like you're not selecting the folder that you've set as the 'unsure' folder. Or perhaps there isn't enough room on the toolbar to display all three buttons? In that case Outlook automatically hides one for you in a little drop-down menu at the end of the toolbar. > Also, after reading the troubleshooting info I noticed > that the plugin is not present in the add-ins. I tried to > reinstall, but it is still not there. This is a known (and fixed) bug with 008.1; it won't appear in anyone's list, whether spambayes is working or not. =Tony Meyer From tameyer at ihug.co.nz Mon Dec 15 22:21:30 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 15 22:21:36 2003 Subject: [Spambayes] SpamBayes not working In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C0991@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467776B@its-xchg4.massey.ac.nz> > I have been using SpamBayes for a short time, > and had not been having problems with it. However, > a couple of weeks ago the server was upgraded to > Exchange 2000, and since then I have not been able > to get SpamBayes to work properly on my PC. I suspect that the IDs for the folders that SpamBayes was set to watch have changed in the upgrade. Try going into the manager dialog, then to the "Filtering" tab and reselecting the folders to move unsure and spam messages into (and then ticking the enable box again). =Tony Meyer From tim.one at comcast.net Tue Dec 16 01:05:04 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 16 01:05:09 2003 Subject: [Spambayes] How low can you go? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304677766@its-xchg4.massey.ac.nz> Message-ID: [Skip Montanaro] >> Does it include Gary's scoring change?
[Tony Meyer] > I wasn't paying enough attention to the earlier messages: is this the > change that means that only the strongest of the two unigrams and one > bigram is used? If so, then yes, it includes that. I see that it's a cruder approximation to the suggested scoring algorithm (which I implemented at one time). For example, the 3-word message:

    human growth hormone

generates three sets with three tokens each:

    human, "human growth", growth
    growth, "growth hormone", hormone
    hormone, "hormone human", human   # this is an artifact of wrap-around

and the strongest token is taken from each set. The result isn't necessarily a tiling of the original; for example, "growth" might win in each of the first two sets, and "hormone" in the last set, leaving "human" out of the scored part entirely. Probably worse, it can score more than one systematically correlated token, such as if "growth" wins from the first set and "growth hormone" from the second set, and "hormone" from the third set (then we end up scoring the bigram and both its constituent words). A tiling would pick one of these three final outcomes:

    human, growth, hormone
    "human growth", hormone
    human, "growth hormone"

Then every token contributes to the score, but no pair of systematically correlated tokens contribute to the score. It's harder to code a tiling method; the advantage is that tiling doesn't have systematic flaws. It will nevertheless be interesting to see how this other gimmick works (as explained before, the danger in allowing systematically correlated tokens to feed into scoring is an increase in "spectacular failures"). BTW, it should *not* be necessary to increase max_discriminators, and doing so can create subtle numeric problems in the inverse chi-squared function.
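The following minimal Python sketch illustrates the difference Tim describes for "human growth hormone". It builds the wrap-around windows, keeps the strongest token from each window, and enumerates the tilings. The windowed pick follows the option as Tim characterizes it; it is not Tony's actual classifier.py code. "Strongest" is assumed to mean furthest from 0.5. The probabilities in EXAMPLE_PROBS are invented only to reproduce the double-counting case Tim mentions.

```python
from typing import Dict, Iterator, List

# Invented spam probabilities (hypothetical), chosen only to reproduce the
# double-counting case; real values would come from a trained database.
EXAMPLE_PROBS: Dict[str, float] = {
    "human": 0.50,
    "growth": 0.99,
    "hormone": 0.98,
    "human growth": 0.90,
    "growth hormone": 0.995,
    "hormone human": 0.60,
}


def windows(words: List[str]) -> Iterator[List[str]]:
    """Yield one candidate set per word: the word, the bigram it starts,
    and the next word, wrapping around at the end of the message."""
    n = len(words)
    for i in range(n):
        a, b = words[i], words[(i + 1) % n]
        yield [a, f"{a} {b}", b]


def strongest_per_window(words: List[str], probs: Dict[str, float]) -> List[str]:
    """Keep the token furthest from 0.5 in each window (unknown tokens are
    treated as 0.5). This always yields exactly len(words) picks."""
    return [max(w, key=lambda t: abs(probs.get(t, 0.5) - 0.5))
            for w in windows(words)]


def tilings(words: List[str], start: int = 0) -> Iterator[List[str]]:
    """Enumerate every cover of words[start:] by non-overlapping unigrams
    and bigrams (no wrap-around), so each word is scored exactly once."""
    if start == len(words):
        yield []
        return
    for rest in tilings(words, start + 1):
        yield [words[start]] + rest
    if start + 1 < len(words):
        pair = f"{words[start]} {words[start + 1]}"
        for rest in tilings(words, start + 2):
            yield [pair] + rest


if __name__ == "__main__":
    msg = "human growth hormone".split()
    print(strongest_per_window(msg, EXAMPLE_PROBS))
    # ['growth', 'growth hormone', 'hormone']: "human" is never scored, and
    # the bigram is scored alongside both of its constituent words.
    for t in tilings(msg):
        print(t)
```

The windowed pick always produces N candidates for an N-word message. Each tiling has at most N tokens and never scores a bigram together with either of its own words. This matches Tim's point about candidate counts.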
Without this option, in an N-token message, N tokens were candidates for scoring; with this option, there are still exactly N candidates for scoring; with a true tiling implementation, there are no more than N candidates for scoring (and usually fewer than N). From clarke at hyperformix.com Tue Dec 16 09:37:11 2003 From: clarke at hyperformix.com (Allan Clarke) Date: Tue Dec 16 09:34:12 2003 Subject: [Spambayes] spambayes rocks! Message-ID: <5F4FB3CCBBDD014E89E80C80E1AA9392B9125D@exchange.hyperformix.com> I just wanted to say that SpamBayes (the Outlook add-in version) rocks! I have not had a single false positive yet. It catches about 98% of spam. It's fast. I have tried a few other solutions and they were inferior. Job very well done, folks! I'm going to track down the link to donate to support this product. Allan PS If only there were a solution for my home POP3 account (Outlook Express) that was as good... -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031216/d27e5a44/attachment.html From papaDoc at videotron.ca Tue Dec 16 09:55:12 2003 From: papaDoc at videotron.ca (papaDoc) Date: Tue Dec 16 09:55:22 2003 Subject: [Spambayes] spambayes rocks! In-Reply-To: <5F4FB3CCBBDD014E89E80C80E1AA9392B9125D@exchange.hyperformix.com> References: <5F4FB3CCBBDD014E89E80C80E1AA9392B9125D@exchange.hyperformix.com> Message-ID: <3FDF1CD0.1060403@videotron.ca> Hi Allan, > I just wanted to say that SpamBayes (the Outlook add-in version) > rocks! I have not had a single false positive yet. It catches about > 98% of spam. It's fast. I have tried a few other solutions and they > were inferior. Job very well done, folks! I'm going to track down the > link to donate to support this product. One more happy customer ........................... > > Allan > > PS > > If only there were a solution for my home POP3 account (Outlook > Express) that was as good... There is one.
Take a look at sb_server.py. Type

    sb_server.py -b

then open a browser and type

    http://localhost:8880

Remi From Mark.Howells at softoption.com Tue Dec 16 09:59:20 2003 From: Mark.Howells at softoption.com (Mark Howells) Date: Tue Dec 16 10:00:45 2003 Subject: [Spambayes] spambayes rocks! Message-ID: <5846CF419D2EF5439036CC3126A3A995017B8D@SOSERVER1.softoption.local> > -----Original Message----- > From: Allan Clarke [mailto:clarke@hyperformix.com] > Sent: 16 December 2003 14:37 > > If only there were a solution for my home POP3 account (Outlook Express) that was as good... ... there is. I use sb_server (available at the SpamBayes site) to protect me and my family from Spam. It's easy to set up, and once established you'd never know it's there. Training is a snap with the web interface. There's also popfile (popfile.sourceforge.net) which I'm trialling that seems to give at least as good results and does n-way classification. Mark -- Outgoing mail is certified Virus Free. Checked by AVG Anti-Virus (http://www.grisoft.com). Version: 7.0.206 / Virus Database: 261.5.0 - Release Date: 12/15/2003 From tim at fourstonesExpressions.com Tue Dec 16 10:08:30 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Tue Dec 16 10:08:36 2003 Subject: [Spambayes] spambayes rocks! In-Reply-To: <5846CF419D2EF5439036CC3126A3A995017B8D@SOSERVER1.softoption.local> References: <5846CF419D2EF5439036CC3126A3A995017B8D@SOSERVER1.softoption.local> Message-ID: On Tue, 16 Dec 2003 14:59:20 -0000, Mark Howells wrote: > There's also popfile (popfile.sourceforge.net) which I'm trialling that > seems to give at least as good results and does n-way classification. We should put a filter on this list so that any mail that mentions that other open source non-python based filtering technology gets tagged as spam -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself!
Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From kennypitt at hotmail.com Tue Dec 16 10:25:25 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Dec 16 10:26:03 2003 Subject: [Spambayes] spambayes rocks! In-Reply-To: Message-ID: Tim Stone wrote: > On Tue, 16 Dec 2003 14:59:20 -0000, Mark Howells > wrote: > > >> There's also popfile (popfile.sourceforge.net) which I'm trialling >> that seems to give at least as good results and does n-way >> classification. > > We should put a filter on this list so that any mail that mentions > that other open source non-python based filtering technology gets > tagged as spam What mention of other open-source filters? Oh, there it is in the Spam folder. You mean you haven't already trained SpamBayes to dump them? -- Kenny Pitt From techsupport at gemtek.com Tue Dec 16 11:45:20 2003 From: techsupport at gemtek.com (Mark Drake) Date: Tue Dec 16 11:45:25 2003 Subject: [Spambayes] Accidentally deleted Junk email folder. Message-ID: <000601c3c3f3$ff64ef20$1014a8c0@station16> We use Spambayes in my company with great success, and have come across only one bug, which I have not found listed. Since this has happened to all three of us using Spambayes, I was surprised to not find it in the troubleshooting guide. After the user accidentally deleted the Junk email folder or the Junk Suspect folder, I created new ones, but Spambayes would not filter to them. I tried to use Spambayes manager to reset the destination folders for spam and spam suspects, but upon clicking Browse, I got a black selection window, and I was unable to reset the folders. I tried following the instructions in the troubleshooting guide, but nothing worked. I exited Outlook, uninstalled Spambayes, then reinstalled Spambayes. Same problem. 
But I noticed that the database and configuration information had survived the uninstall, so I found those (C:\Documents and Settings\UserID\Application Data\SpamBayes) and deleted the 2 configuration settings. After reinstalling Spambayes everything worked perfectly. Regards, Mark Mark Drake GEMTEK Products techsupport@gemtek.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031216/1c76b3da/attachment-0001.html From rmalayter at bai.org Tue Dec 16 11:45:52 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Tue Dec 16 11:45:55 2003 Subject: [Spambayes] More "spam of the future" lately? Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A751BC@cliff.bai.org> I've been getting a lot more "spam of the future" these last few days. Paul Graham predicted that improved filters would eventually force spammers into such tactics, near the bottom of his seminal article: http://www.paulgraham.com/spam.html. The messages consist of basically nonsensical English sentences with a link. The problem is, all of these seem to be slipping by my trained SpamBayes, scoring 10% or less. Even domains and URLs change in every message, so SpamBayes hasn't really been able to pin them down yet, despite my training on each one. Is anyone else having problems with these types of spams recently? Has some prolific spammer changed tactics? Most of the one's I've seen seem to originate from Australia or Asia. Regards, Ryan From akiva at atwood.co.il Tue Dec 16 12:20:55 2003 From: akiva at atwood.co.il (Akiva Atwood) Date: Tue Dec 16 12:21:02 2003 Subject: [Spambayes] RE: Spambayes Digest, Vol 64, Issue 68 In-Reply-To: Message-ID: > Is anyone else having problems with these types of spams recently? Has > some prolific spammer changed tactics? Most of the one's I've seen seem > to originate from Australia or Asia. I've been getting a lot of them. I thought there was a problem with MY filter, and reinstalled it. 
Akiva From tim at fourstonesExpressions.com Tue Dec 16 12:29:26 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Tue Dec 16 12:29:32 2003 Subject: [Spambayes] RE: Spambayes Digest, Vol 64, Issue 68 In-Reply-To: References: Message-ID: On Tue, 16 Dec 2003 19:20:55 +0200, Akiva Atwood wrote: >> Is anyone else having problems with these types of spams recently? Has >> some prolific spammer changed tactics? Most of the one's I've seen seem >> to originate from Australia or Asia. > > I've been getting a lot of them. I thought there was a problem with MY > filter, and reinstalled it. This might be well dealt with by changing the unknown word probability to indicate a stronger spamminess. By default, it's .5, iirc. Perhaps we should do some experiments with pushing it to .6 or .7. My corpus has virtually none of these spams, so I can't say what would happen, and I imagine that our test corpus has relatively few of them as well. Comments anyone? -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From skip at pobox.com Tue Dec 16 12:54:21 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Dec 16 12:54:26 2003 Subject: [Spambayes] RE: Spambayes Digest, Vol 64, Issue 68 In-Reply-To: References: Message-ID: <16351.18125.655745.992400@montanaro.dyndns.org> >> Is anyone else having problems with these types of spams recently? >> Has some prolific spammer changed tactics? Most of the one's I've >> seen seem to originate from Australia or Asia. Akiva> I've been getting a lot of them. I thought there was a problem Akiva> with MY filter, and reinstalled it. It might be time to retrain from scratch, for a couple reasons I can think of:

    * your training database has some mistakes in it
    * as spam evolves, your old training database reflects current trends less accurately.
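Skip goes on to suggest dropping the oldest part of a date-ordered training set before retraining. Below is a minimal Python sketch of that pruning step. It assumes each training message is available as raw RFC 2822 text with a Date header. keep_newest, message_date and drop_fraction are hypothetical names, not part of SpamBayes. Treating messages with an unparseable Date as the oldest is an arbitrary choice.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime
from typing import List

# Sort key for messages with a missing or unparseable Date header:
# treat them as older than anything else (an arbitrary choice).
OLDEST = datetime.min.replace(tzinfo=timezone.utc)


def message_date(raw: str) -> datetime:
    """Return the message's Date as an aware datetime, or OLDEST."""
    try:
        dt = parsedate_to_datetime(message_from_string(raw)["Date"])
    except (TypeError, ValueError):
        # Older Pythons raise TypeError on bad input, newer ones ValueError.
        return OLDEST
    if dt.tzinfo is None:
        # A "-0000" zone yields a naive datetime; assume UTC so that
        # naive and aware values can be compared while sorting.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def keep_newest(raw_messages: List[str], drop_fraction: float = 0.2) -> List[str]:
    """Sort oldest-first and discard the oldest drop_fraction of messages."""
    ordered = sorted(raw_messages, key=message_date)
    cut = int(len(ordered) * drop_fraction)
    return ordered[cut:]
```

You would run this separately over the ham and spam collections, then retrain on what survives plus a few examples of the new-style spam.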
If you have your ham and spam training databases arranged by date, perhaps you can just lop off the oldest 20% or so of your messages, train on a handful of the new-style spam and be on your merry way. Skip From st.intrope at verizon.net Tue Dec 16 12:55:28 2003 From: st.intrope at verizon.net (Intrope) Date: Tue Dec 16 12:55:29 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01A751BC@cliff.bai.org> Message-ID: <20031216175525.ZFSW38.out012.verizon.net@titan> I noticed that too; sampling my spam folder, essentially all of my recent spams are SOTF. SpamBayes seems to take a long time to recognize these, but nowadays at least 90% of SOTF-type messages are being filtered out. I guess what I'm saying is, be patient and train--it'll block more and more as time goes on. I wasn't paying close attention, but I'm guessing that it took several weeks (at least) to train SpamBayes on the SOTF; at one point I'd see 10 SOTF a day that got through the filters, but in the last few days I don't think I've seen any get through. Go SpamBayes Go! -Jon > -----Original Message----- > From: Ryan Malayter [mailto:rmalayter@bai.org] > Sent: Tuesday, December 16, 2003 10:46 AM > To: spambayes@Python.org > Subject: [Spambayes] More "spam of the future" lately? > > > I've been getting a lot more "spam of the future" these last few days. > Paul Graham predicted that improved filters would eventually force > spammers into such tactics, near the bottom of his seminal article: > http://www.paulgraham.com/spam.html. > > The messages consist of basically nonsensical English sentences with a > link. The problem is, all of these seem to be slipping by my trained > SpamBayes, scoring 10% or less. Even domains and URLs change in every > message, so SpamBayes hasn't really been able to pin them down yet, > despite my training on each one. > > Is anyone else having problems with these types of spams recently?
Has > some prolific spammer changed tactics? Most of the one's I've seen seem > to originate from Australia or Asia. > > Regards, > Ryan > From clewis at iquest.net Tue Dec 16 13:45:09 2003 From: clewis at iquest.net (Chuck Lewis) Date: Tue Dec 16 13:42:59 2003 Subject: [Spambayes] Is this a sign of future problems ? Message-ID: <012901c3c404$bcbc34b0$190a10ac@GR43> Hi folks, Got this from a friend that runs another mailing list: ========================================================================= Interesting trend ... Garbage spam I've noticed an interesting trend recently ... a lot of the 'spam' I'm receiving lately is totally garbage. No content whatsoever ... not even hidden in HTML. Here's an example ... > Subject: Re: AMUKZGO, was beyond even > From: "Oliver" > Date: Wed, 17 Dec 2003 09:44:46 +0600 > To: midrange-jobs@midrange.com, midrange-l-admin@midrange.com, > midrange-l-owner@midrange.com, midrange-l-request@midrange.com, > midrange-l-sub@midrange.com, midrange-l-unsub@midrange.com > > > papaw bedimmed prophetic cocky > farfetched conceive auction ergodic robbin lullaby omaha > manslaughter pea celanese florentine assure depressible bowl cannel ewe gertrude The only reason I can think of is that the spammers are trying to poison the Baysian statistics that are being gathered, so more of the legitimate spam will be let through. David ========================================================================= So is he on to something here ? This sounds plausible from my admittedly limited understanding of these tools. Thought/Comments ? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031216/d21868dc/attachment.html From TiagoTiago at Globo.com Tue Dec 16 14:01:06 2003 From: TiagoTiago at Globo.com (Tiago Estill de Noronha) Date: Tue Dec 16 13:58:47 2003 Subject: [Spambayes] solution for the "spam of the future"? 
Message-ID: <003501c3c406$f5cb5fe0$2960b7c8@virtua.com.br> I have an idea. I dunno if it will work or if it is possible to implement it, but my guess is yes for both. k, here it goes:

    * Create a "meta token" that will be used every time a word not in the database is found in the email.
    * Do the Bayesian thing when the user sends the email containing a new word to spam or ham.
    * From then on, every time a user gets an email with new words, SpamBayes would classify it as ham or spam.
    * After a while receiving those random-character emails (and building the database of known words, the token database itself), the score for the new-word "meta token" would move toward the spam side.

********************* Tiago Estill de Noronha TiagoTiago@Globo.com --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003 From tim at fourstonesExpressions.com Tue Dec 16 14:02:18 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Tue Dec 16 14:02:24 2003 Subject: [Spambayes] Is this a sign of future problems ? In-Reply-To: <012901c3c404$bcbc34b0$190a10ac@GR43> References: <012901c3c404$bcbc34b0$190a10ac@GR43> Message-ID: On Tue, 16 Dec 2003 13:45:09 -0500, Chuck Lewis wrote: > So is he on to something here ? This sounds plausible from my admittedly > limited understanding of these tools. There are a couple reasons a spammer might do something like this... one is to probe for valid addresses. However, it would be completely ineffective as a tool to poison bayesian filtering databases. These filters work on the words that are in the mail, not on the words that are not in the mail. So including words that are unrecognizable can have no adverse effect on the database, or the filter's ability to recognize spam based on the content of the database.
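Tim Stone's argument can be checked with a small model of per-token scoring. The sketch below assumes Gary Robinson's adjustment, which is believed to be how SpamBayes computes token probabilities. It also assumes the defaults believed to apply at the time: unknown_word_prob 0.5, unknown_word_strength 0.45 and minimum_prob_strength 0.1. The function names are hypothetical, and this is a simplified model rather than classifier.py itself. It shows why a word the filter has never seen is neutral. It also shows what the .6/.7 experiment proposed earlier in the thread would change.

```python
def token_spamprob(spamcount: int, hamcount: int, nspam: int, nham: int,
                   unknown_word_prob: float = 0.5,
                   unknown_word_strength: float = 0.45) -> float:
    """Robinson-adjusted spam probability for a single token (a model).

    Assumes nspam and nham, the numbers of trained messages, are positive.
    """
    n = spamcount + hamcount
    if n == 0:
        # Never-seen token: the result is just the prior.
        return unknown_word_prob
    spamratio = spamcount / nspam
    hamratio = hamcount / nham
    raw = spamratio / (spamratio + hamratio)
    s, x = unknown_word_strength, unknown_word_prob
    # Shrink the raw estimate toward the prior x; s behaves like s
    # pseudo-observations of x, so rarely seen tokens stay near x.
    return (s * x + n * raw) / (s + n)


def used_in_scoring(prob: float, minimum_prob_strength: float = 0.1) -> bool:
    """Only tokens at least minimum_prob_strength away from 0.5 count."""
    return abs(prob - 0.5) >= minimum_prob_strength
```

With the default prior, token_spamprob(0, 0, 1000, 1000) is 0.5 and used_in_scoring rejects it. Random dictionary words the filter has never seen therefore add nothing in either direction. Passing unknown_word_prob=0.7 turns every unseen word into a mild spam clue. That would also push legitimate mail with unfamiliar vocabulary toward Unsure.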
In fact, an additional filtering technique could easily be developed which would create a token based on the number of words that are NOT in the database... so they're probably shooting themselves in the foot... -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From mikes at members-insurance.com Tue Dec 16 14:20:42 2003 From: mikes at members-insurance.com (Michael K. Schummers) Date: Tue Dec 16 14:20:49 2003 Subject: [Spambayes] Deleted Junk folder Message-ID: <000f01c3c409$b31ae5a0$d60000c8@PRK94> Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 7559 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031216/84488595/attachment-0001.jpe From tim.one at comcast.net Tue Dec 16 14:50:50 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 16 14:51:00 2003 Subject: [Spambayes] Is this a sign of future problems ? In-Reply-To: <012901c3c404$bcbc34b0$190a10ac@GR43> Message-ID: [Chuck Lewis] > Got this from a friend that runs another mailing list: > > ========================================================================= > > Interesting trend ... Garbage spam > I've noticed an interesting trend recently ... a lot of the 'spam' I'm > receiving lately is totally garbage. No content whatsoever ... not > even hidden in HTML. > > Here's an example ...
> >> Subject: Re: AMUKZGO, was beyond even >> From: "Oliver" >> Date: Wed, 17 Dec 2003 09:44:46 +0600 >> To: midrange-jobs@midrange.com, midrange-l-admin@midrange.com, >> midrange-l-owner@midrange.com, midrange-l-request@midrange.com, >> midrange-l-sub@midrange.com, midrange-l-unsub@midrange.com >> >> >> papaw bedimmed prophetic cocky >> farfetched conceive auction ergodic robbin lullaby omaha >> manslaughter pea celanese florentine assure depressible bowl cannel >> ewe gertrude > > The only reason I can think of is that the spammers are trying to > poison the Baysian statistics that are being gathered, so more of the > legitimate spam will be let through. > > David > ========================================================================= > > So is he on to something here ? This sounds plausible from my > admittedly limited understanding of these tools. > > Thought/Comments ? The point of inserting random gibberish is to frustrate fingerprinting schemes (if no two spam are the same, comparing new spam simple-mindedly to a database of known spam won't catch anything new). Spam has always done this. The modern variation is inserting random dictionary words instead of completely random strings, because some fingerprinting schemes have grown smart enough to ignore non-dictionary strings, or even to penalize their presence. Neither variation is much use against Bayesian filters (e.g., is bedimmed a strong ham word for you? heh). More effective against those is to include the text of a randomly chosen contemporary news story (then it stands a decent chance of sneaking by the filters of those to whom the content of the news story matches things they normally correspond about). From dario at escape.com Tue Dec 16 15:05:45 2003 From: dario at escape.com (Dario Laverde) Date: Tue Dec 16 15:05:14 2003 Subject: [Spambayes] request Message-ID: <3FDF6599.8060400@escape.com> I heard alot of good reviews by I don't use Outllook - how about a version for Netscape mail? 
Can it work on archived mail (flat text files)? thanks dario From tim at fourstonesExpressions.com Tue Dec 16 15:12:04 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Tue Dec 16 15:12:11 2003 Subject: [Spambayes] request In-Reply-To: <3FDF6599.8060400@escape.com> References: <3FDF6599.8060400@escape.com> Message-ID: On Tue, 16 Dec 2003 15:05:45 -0500, Dario Laverde wrote: > I heard alot of good reviews by I don't use Outllook - how about a > version for Netscape mail? Can it work on archived mail (flat text > files)? sb_server or pop3proxy can be used by netscape mail, or any pop3 mail client for that matter. See the documentation for usage. It can work on standard mbx files. Again, the documentation contains information about how to do this. Doc is delivered with the code, and/or is on http://spambayes.sourceforge.net -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From papaDoc at videotron.ca Tue Dec 16 15:18:04 2003 From: papaDoc at videotron.ca (papaDoc) Date: Tue Dec 16 15:18:35 2003 Subject: [Spambayes] request In-Reply-To: <3FDF6599.8060400@escape.com> References: <3FDF6599.8060400@escape.com> Message-ID: <3FDF687C.10104@videotron.ca> Hi, > I heard alot of good reviews by I don't use Outllook - how about a > version for Netscape mail? Can it work on archived mail (flat text > files)? This is an easy one! Yes and Yes ........ Seriously > how about a version for Netscape mail? Yes you can use sb_server.py with the web interface to train. Netscape is able to do almost anything you want with your mail. Create a filter to check for a special header and move your spam to a spam folder. > Can it work on archived mail (flat text files)?
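papaDoc's answer splits the mbox into one file per message, deletes the spam-tagged files, and concatenates the rest. Once sb_filter has added its header, that last step can also be done in one pass with Python's standard mailbox module. The sketch below does this; drop_spam is a hypothetical helper. papaDoc could not recall the header name, so the HEADER constant is an assumption. X-Spambayes-Classification is believed to be what SpamBayes writes by default, but check the header your own installation actually adds.

```python
import mailbox
from typing import Tuple

# Assumption: the classification header written by sb_filter; confirm the
# name in your own SpamBayes configuration before relying on it.
HEADER = "X-Spambayes-Classification"


def drop_spam(in_path: str, out_path: str, header: str = HEADER) -> Tuple[int, int]:
    """Copy every message from the mbox at in_path to the mbox at out_path,
    skipping those whose classification header starts with "spam".
    Returns (kept, dropped)."""
    src = mailbox.mbox(in_path, create=False)
    dst = mailbox.mbox(out_path)
    kept = dropped = 0
    try:
        for msg in src:
            if msg.get(header, "").strip().lower().startswith("spam"):
                dropped += 1
            else:
                dst.add(msg)
                kept += 1
    finally:
        dst.close()  # close() also flushes the added messages to disk
        src.close()
    return kept, dropped
```

Messages with no classification header at all are kept, which errs on the side of not losing mail.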
Yes, you can use sb_filter to add a header saying whether the mail is spam or ham, then use sb_splitndir.py to split the file so that each file holds one mail. Then you can delete all files containing

    X-Spambayes-I_dont_remenber_the_name:: spam

and reassemble the file:

    cat * > ../reassembled.mbox

This should work. There may be an easier solution, but this is the first one I came up with. Remi From dreas at emailaccount.nl Tue Dec 16 15:38:12 2003 From: dreas at emailaccount.nl (Dreas van Donselaar) Date: Tue Dec 16 15:38:16 2003 Subject: [Spambayes] SpamBayes for 500.000 users Message-ID: Hi everyone :) I am quite new here but have been following the current discussions with a lot of interest. I actually have the plan to build a comprehensive anti-spam solution (yes, yet another one) which will mainly work server-side. A combination of the Cloudmark system (generating an unique ID per email .. and matching the ID in the central database to test whether it has been identified as spam before or not), Bayesian server-side and Bayesian user side seems to be the ideal solution. I am not a real technical person, and I will hire developers to build this, but I was wondering whether Bayesian filtering will actually be useful if there would be 500.000 using a central database server. Should the database only store data for like 24 hour or would it make sense to keep it growing? Would there actually be extra value by having so many (reporting) users? I was wondering if you guys/girls could give me some things to think about and maybe I can get some input about what has already been thought about by others before :) P.S. Yes I know I'll need huge server-capacity. Regards, Dreas van Donselaar -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031216/69c3e8fa/attachment.html From matt at mondoinfo.com Tue Dec 16 16:02:28 2003 From: matt at mondoinfo.com (Matthew Dixon Cowles) Date: Tue Dec 16 16:02:39 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01A751BC@cliff.bai.org> References: <792DE28E91F6EA42B4663AE761C41C2A01A751BC@cliff.bai.org> Message-ID: <1071607720.02.5775@mint-julep.mondoinfo.com> > I've been getting a lot more "spam of the future" these last few > days. Paul Graham predicted that improved filters would eventually > force spammers into such tactics, near the bottom of his seminal > article: http://www.paulgraham.com/spam.html. I've gotten a few of those lately. I've found that Richard Jowsey's idea of retrieving the URLs in messages that score in the unsure range and scoring the pages received as a sort of synthesized message helps a good bit. That's a somewhat controversial thing to do. People have pointed out that if everyone did it, it would enable a spammer to engineer a distributed denial-of-service attack against some site they didn't like. But I expect that a spammer could engineer an attack like that more easily just by including a URL that pointed to the innocent server with a link that said "Free sex here". The other problem with doing things that way is that web pages don't look a lot like emails. Web pages sometimes get scored as spammy for the wrong reasons. I haven't looked at the numbers carefully but since I get few hams in the unsure range, I could probably do just as well by setting my spam cutoff to 0.5 or so. Regards, Matt From rcoe at CambridgeMA.GOV Tue Dec 16 16:07:30 2003 From: rcoe at CambridgeMA.GOV (Coe, Bob) Date: Tue Dec 16 16:07:38 2003 Subject: [Spambayes] RE: solution for the "spam of the future"? 
Message-ID: I suppose that would also tend to filter out messages in languages you don't understand (assuming, as we've discussed before, that the orthography of the language at issue lends itself to tokenization). Bob MIS Department, City of Cambridge 831 Massachusetts Ave, Cambridge MA 02139 · 617-349-4217 · fax 617-349-6165 > -----Original Message----- > From: Tiago Estill de Noronha [mailto:TiagoTiago@Globo.com] > Sent: Tuesday, December 16, 2003 2:01 PM > To: 'SpamBayes' > Subject: [Spambayes] solution for the "spam of the future"? > > > I have an idea, I dunno if it will work or if it is possible to implement > it, but my guess is yes for both, k, here it goes: > > Create a "meta token" that will be used everytime a word not in the > database is found in the email > Do the bayesian thing when the user send the email containing a new word to > spam or ham > from that, everytime a user gets a email with new words spambayes would > classify it as ham or spam > After a while receiveing those random chars emails (and building the > database of know words, the token database it self) the points for new word > "meta token" would increase to the spam side From d.kindred at telesciences.com Tue Dec 16 16:22:48 2003 From: d.kindred at telesciences.com (David L Kindred (Dave)) Date: Tue Dec 16 16:22:53 2003 Subject: [Spambayes] Is this a sign of future problems ? In-Reply-To: References: <012901c3c404$bcbc34b0$190a10ac@GR43> Message-ID: <16351.30632.847868.731045@gargle.gargle.HOWL> >>>>> "Tim" == Tim Peters writes: Tim> [Chuck Lewis] >> The only reason I can think of is that the spammers are trying to >> poison the Baysian statistics that are being gathered, so more of >> the legitimate spam will be let through. Tim> ... The modern variation is inserting random dictionary words Tim> instead of completely random strings ... Neither variation is Tim> much use against Bayesian filters (e.g., is bedimmed a strong Tim> ham word for you? heh). ...
What if the goal is not to try and trick the Bayesian filter into treating spam as ham, but the inverse? Could this be an attempt at a kind of "denial-of-service" attack by trying to get the filter to start treating everything as spam? Would that idea work? -- David L. Kindred Unix Systems & Network Administrator Telesciences, Inc. From skip at pobox.com Tue Dec 16 16:34:08 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Dec 16 16:34:10 2003 Subject: [Spambayes] Is this a sign of future problems ? In-Reply-To: <012901c3c404$bcbc34b0$190a10ac@GR43> References: <012901c3c404$bcbc34b0$190a10ac@GR43> Message-ID: <16351.31312.645532.479461@montanaro.dyndns.org> Chuck> Interesting trend ... Garbage spam Chuck> I've noticed an interesting trend recently ... a lot of the Chuck> 'spam' I'm receiving lately is totally garbage. No content Chuck> whatsoever ... not even hidden in HTML. ... Chuck> The only reason I can think of is that the spammers are trying to Chuck> poison the Baysian statistics that are being gathered, so more of Chuck> the legitimate spam will be let through. Maybe, but it may also be they are just verifying email addresses. Skip From tim.one at comcast.net Tue Dec 16 16:38:43 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 16 16:38:49 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01A751BC@cliff.bai.org> Message-ID: [Ryan Malayter] > I've been getting a lot more "spam of the future" these last few days. > Paul Graham predicted that improved filters would eventually force > spammers into such tactics, near the bottom of his seminal article: > http://www.paulgraham.com/spam.html. > > The messages consist of basically nonsensical English sentences with a > link. I doubt Paul would agree that's what he was predicting . Chatty "just folks" spam is more along his lines, not reams of nonsense. 
> The problem is, all of these seem to be slipping by my trained > SpamBayes, scoring 10% or less. Why? Look at the spam clues. There has to be something decidedly hammy about them to score that low, and a collection of random words isn't decidedly hammy except by accident. There must be more to it. If they're managing to hit something *systematically* hammy for you, then continued training will make whatever that is stop looking hammy to you. > Even domains and URLs change in every message, so SpamBayes hasn't > really been able to pin them down yet, despite my training on each one. I've been seeing a fair number of these lately too, and some end up Unsure for me. I don't care unless it persists, though: any spammer who thinks his goal is evading filters is a spammer who won't stay in business long. His real goal has to be selling product, and I expect that including random sentences has to decrease response rate significantly. I look at one of these and can't imagine being tempted to respond, because the author appears illiterate and incompetent simply *because* the message contains so much nonsense. Why would I buy anything from someone that lame? Spam goes thru fads, like everything else, and I *expect* this one to self-destruct due to its own ineffectiveness at selling product. I could be wrong, of course, but I'm probably not . From cej at intech.com Tue Dec 16 16:47:58 2003 From: cej at intech.com (Christopher Jastram) Date: Tue Dec 16 16:47:27 2003 Subject: [Spambayes] SpamBayes for 500.000 users In-Reply-To: References: Message-ID: <3FDF7D8E.80005@intech.com> Hi, All I can say is ... wow ... But I can give you some first-hand knowledge from a much smaller user base. I'm setting the same thing up for an office of 5 people, and here's the bare-bones fact: I need a separate database for each user. I've tried using one database for everyone, and it does work. But it only catches about 30-40 percent of spam.
Not sure why this is the case, but it is (unbalanced training?). I'm still fiddling with making it work right (lots of other things take priority), but that's what I've discovered. I'm sure others can help you out much more. Also, unless you have 500,000 really really really super wise high-falutin' happy joyous technophiles for users, you'll have a sorry time educating everyone. Chris Dreas van Donselaar wrote: >Hi everyone :) > > > >I am quite new here but have been following the current discussions with a >lot of interest. I actually have the plan to build a comprehensive anti-spam >solution (yes, yet another one) which will mainly work server-side. A >combination of the Cloudmark system (generating a unique ID per email .. >and matching the ID in the central database to test whether it has been >identified as spam before or not), Bayesian server-side and Bayesian user >side seems to be the ideal solution. > > > >I am not a real technical person, and I will hire developers to build this, >but I was wondering whether Bayesian filtering will actually be useful if >there would be 500.000 users using a central database server. Should the database >only store data for like 24 hours or would it make sense to keep it growing? >Would there actually be extra value by having so many (reporting) users? > > > >I was wondering if you guys/girls could give me some things to think about >and maybe I can get some input about what has already been thought about by >others before :) > > > >P.S. Yes I know I'll need huge server-capacity.
> > >Regards, > > > >Dreas van Donselaar > > > > >------------------------------------------------------------------------ > >_______________________________________________ >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes >Check the FAQ before asking: http://spambayes.sf.net/faq.html > From skip at pobox.com Tue Dec 16 16:49:45 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Dec 16 16:49:50 2003 Subject: [Spambayes] solution for the "spam of the future"? In-Reply-To: <003501c3c406$f5cb5fe0$2960b7c8@virtua.com.br> References: <003501c3c406$f5cb5fe0$2960b7c8@virtua.com.br> Message-ID: <16351.32249.599295.688762@montanaro.dyndns.org> Tiago> Create a "meta token" that will be used everytime a word not in Tiago> the database is found in the email Do the bayesian thing when the Tiago> user send the email containing a new word to spam or ham from Tiago> that, everytime a user gets a email with new words spambayes Tiago> would classify it as ham or spam After a while receiveing those Tiago> random chars emails (and building the database of know words, the Tiago> token database it self) the points for new word "meta token" Tiago> would increase to the spam side Let's modify your proposal slightly. Suppose we add a "missing: N" clue, where N is the number of tokens found in the message but not in the training database. Otherwise, I suspect almost all mails will generate a "missing:" token. (No token is generated more than once per message.) There's a problem with either formulation. Start with an empty training database. Add one spam. All N tokens it contains will be missing from the database, yielding a "missing: N" token (or maybe a "missing log(N)" token). Add another message, make this one ham. It won't overlap 100% with the spam you just added, so it will generate a "missing: M" token. And so on. Early on, it seems your database will be polluted with a rather large number of missing: tokens for both ham and spam. 
I think it might be difficult to overcome these initial training "mistakes" to turn it into a potentially useful clue. Skip From tim.one at comcast.net Tue Dec 16 16:53:57 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 16 16:54:00 2003 Subject: [Spambayes] Is this a sign of future problems ? In-Reply-To: <16351.30632.847868.731045@gargle.gargle.HOWL> Message-ID: [David L. Kindred] > What if the goal is not to try and trick the Bayesian filter into > treating spam as ham, but the inverse? I already know what the goal is, and already explained it . > Could this be an attempt at a kind of "denial-of-service" attack by > trying to get the filter to start treating everything as spam? Would > that idea work? No. The empirical proof is in the pudding: is *anyone* here seeing an increase in false positives due to training on these collection-of-random-word msgs as spam? If the spammers were well-organized and picked on sets of *common* words in a coordinated way, just maybe, but these appear to be picked out of a dictionary at random. It doesn't matter one whit to me, e.g., whether "bedimmed" gets treated as hammy or spammy in my database, because I'll never see it in real life. The same is true of most words picked at random out of a dictionary (the day-to-day working vocabulary of most adults is a small percentage of all the words in even a poor dictionary -- and a tiny percentage of the words in, e.g., the OED). From skip at pobox.com Tue Dec 16 16:56:45 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Dec 16 16:56:50 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: References: <792DE28E91F6EA42B4663AE761C41C2A01A751BC@cliff.bai.org> Message-ID: <16351.32669.700492.431318@montanaro.dyndns.org> >> The problem is, all of these seem to be slipping by my trained >> SpamBayes, scoring 10% or less. Tim> Why? Look at the spam clues. 
There has to be something decidedly Tim> hammy about them to score that low, and a collection of random Tim> words isn't decidedly hammy except by accident. There must be more Tim> to it. If they're managing to hit something *systematically* hammy Tim> for you, then continued training will make whatever that is stop Tim> looking hammy to you. Based on my own personal experience, I always consider "pilot error" as one of the first possible causes of such problems. It occurs to me that a simple script (or a database parallel to the training database) which maps tokens to lists of spam/ham message ids instead of just message counts might be helpful in tracking down such mistakes. Instead of executing

    import shelve
    db = shelve.open("hammie.db")
    print(db["url:biz"])

and getting (2, 12), I might execute

    db = shelve.open("hammie-msgids.db")
    print(db["url:biz"])

and get

    [["spam-msgid1", "spam-msgid2"], ["ham-msgid1", ..., "ham-msgid12"]]

thus allowing me to more easily locate the spuriously trained ham messages which are the source of the "url:biz" token. Skip From skip at pobox.com Tue Dec 16 17:00:26 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Dec 16 17:00:31 2003 Subject: [Spambayes] SpamBayes for 500.000 users In-Reply-To: <3FDF7D8E.80005@intech.com> References: <3FDF7D8E.80005@intech.com> Message-ID: <16351.32890.550695.824501@montanaro.dyndns.org> Chris> But I can give you some first-hand knowledge from a much smaller Chris> user base. I'm setting the same thing up for an office of 5 Chris> people, and here's the bare-bones fact: I need a separate Chris> database for each user. I've tried using one database for Chris> everyone, and it does work. But it only catches about 30-40 Chris> percent of spam. Not sure why this is the case, but it is Chris> (unbalanced training?). Does your shared database draw fairly equally on mail sent to all five people? If not, you may find that some of the clues in the header will "poison" your database.
Tim discovered this effect in spades during early testing. I believe one of the larger spam databases he used initially were all sent to one person. The recipient-oriented clues related to that user poisoned his tests. Skip From tim at fourstonesExpressions.com Tue Dec 16 17:03:30 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Tue Dec 16 17:03:36 2003 Subject: [Spambayes] solution for the "spam of the future"? In-Reply-To: <16351.32249.599295.688762@montanaro.dyndns.org> References: <003501c3c406$f5cb5fe0$2960b7c8@virtua.com.br> <16351.32249.599295.688762@montanaro.dyndns.org> Message-ID: On Tue, 16 Dec 2003 15:49:45 -0600, Skip Montanaro wrote: > Let's modify your proposal slightly. Suppose we add a "missing: N" clue, > where N is the number of tokens found in the message but not in the > training > database. It would seem better to me to have a threshold, specified as an option, of how many words are missing to trigger a specific token, like "more-than-n-words-missing". Having "missing 1", "missing 2", "missing 3", ... tokens is probably not as good an indicator... That said, at initial training time, and until a database grows to some reasonable size (see separate thread "how low can you go") this token will always show up. Therefore, it'd only be a good indicator after a certain point... Does that limit its usefulness? -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From rcoe at CambridgeMA.GOV Tue Dec 16 17:16:28 2003 From: rcoe at CambridgeMA.GOV (Coe, Bob) Date: Tue Dec 16 17:16:33 2003 Subject: [Spambayes] RE: solution for the "spam of the future"? Message-ID: Don't start generating the "Missing: N" token until the database is large enough for it to make sense.
Bob > -----Original Message----- > From: Skip Montanaro [mailto:skip@pobox.com] > Sent: Tuesday, December 16, 2003 4:50 PM > To: Tiago Estill de Noronha > Cc: 'SpamBayes' > Subject: Re: [Spambayes] solution for the "spam of the future"? > > > > Tiago> Create a "meta token" that will be used everytime a word not in > Tiago> the database is found in the email Do the bayesian thing when the > Tiago> user send the email containing a new word to spam or ham from > Tiago> that, everytime a user gets a email with new words spambayes > Tiago> would classify it as ham or spam After a while receiveing those > Tiago> random chars emails (and building the database of know words, the > Tiago> token database it self) the points for new word "meta token" > Tiago> would increase to the spam side > > Let's modify your proposal slightly. Suppose we add a "missing: N" clue, > where N is the number of tokens found in the message but not in the training > database. Otherwise, I suspect almost all mails will generate a "missing:" > token. (No token is generated more than once per message.) > > There's a problem with either formulation. Start with an empty training > database. Add one spam. All N tokens it contains will be missing from the > database, yielding a "missing: N" token (or maybe a "missing log(N)" token). > Add another message, make this one ham. It won't overlap 100% with the spam > you just added, so it will generate a "missing: M" token. And so on. Early > on, it seems your database will be polluted with a rather large number of > missing: tokens for both ham and spam. I think it might be difficult to > overcome these initial training "mistakes" to turn it into a potentially > useful clue. > > Skip From russ_foster at comcast.net Tue Dec 16 16:11:33 2003 From: russ_foster at comcast.net (Russ Foster) Date: Tue Dec 16 17:17:57 2003 Subject: [Spambayes] Is this a sign of future problems ? 
In-Reply-To: <16351.31312.645532.479461@montanaro.dyndns.org> Message-ID: How does it validate an email address if it doesn't contain any HTML? -- > Chuck> I've noticed an interesting trend recently ... a lot of the > Chuck> 'spam' I'm receiving lately is totally garbage. No content > Chuck> whatsoever ... not even hidden in HTML. > > Chuck> The only reason I can think of is that the spammers are trying to > Chuck> poison the Baysian statistics that are being gathered, so more of > Chuck> the legitimate spam will be let through. > > Maybe, but it may also be they are just verifying email addresses. > > Skip From tim.one at comcast.net Tue Dec 16 17:24:52 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 16 17:24:56 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: <1071607720.02.5775@mint-julep.mondoinfo.com> Message-ID: [Matthew Dixon Cowles] > I've gotten a few of those lately. I've found that Richard Jowsey's > idea of retrieving the URLs in messages that score in the unsure > range and scoring the pages received as a sort of synthesized message > helps a good bit. Is that checked in as a "standard albeit experimental" option yet? It should be. > That's a somewhat controversial thing to do. People have pointed out > that if everyone did it, it would enable a spammer to engineer a > distributed denial-of-service attack against some site they didn't > like. There aren't enough spambayes users for me to worry about that; then cut by the number who would actually enable this option, and multiply by their unsure rate. It wouldn't amount to much in the end. > But I expect that a spammer could engineer an attack like that > more easily just by including a URL that pointed to the innocent > server with a link that said "Free sex here". No, if spammers had *any* way to entice large numbers of people to click on a link, they would use that way to sell their product instead. 
The best they could do is change the source for their highest-response-rate spam campaign to point to a hated site instead of their own. > The other problem with doing things that way is that web pages don't > look a lot like emails. Web pages sometimes get scored as spammy for > the wrong reasons. Easily solved by changing Richard's gimmick to, e.g., stick a "web:" prefix on each token generated from a web page. That effectively creates an independent (sub)database for tokens derived from parsing web pages. > I haven't looked at the numbers carefully but since I get few hams in > the unsure range, I could probably do just as well by setting my spam > cutoff to 0.5 or so. 0.7 maybe, but you'd eventually regret dropping it to 0.5. From kennypitt at hotmail.com Tue Dec 16 17:27:05 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Dec 16 17:27:46 2003 Subject: [Spambayes] RE: solution for the "spam of the future"? In-Reply-To: Message-ID: Coe, Bob wrote: > Don't start generating the "Missing: N" token until the database is > large enough for it to make sense. If this works at all, it also seems like the *percentage* of unknown word tokens in the message would work better than a log()'d count. A very large newsletter is pretty much guaranteed to have a higher *count* of unknown tokens than a short mailing list message, but that's because it has more total tokens and not because it's any spammier. -- Kenny Pitt From tim.one at comcast.net Tue Dec 16 17:29:43 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 16 17:29:47 2003 Subject: [Spambayes] Is this a sign of future problems ? In-Reply-To: Message-ID: [Russ Foster] > How does it validate an email address if it doesn't contain any HTML? The spammer can simply record whether the email is rejected at SMTP time, with a "no such user" kind of response. Keeping "rejected because it looks like spam" rejections out of the mix is helpful for that purpose. 
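The unknown-word ideas in this thread can be folded into one small sketch: Tiago's meta token, Skip's note that a token is generated at most once per message, Bob Coe's advice to wait until the database is large enough, and Kenny Pitt's point that a percentage behaves better than a raw count. It is a hypothetical illustration, not SpamBayes code: the function name unknown_token, the min_trained default of 200, the 10%-wide buckets and the use of a plain set as the trained vocabulary are all assumptions.

```python
from typing import Iterable, Optional, Set


def unknown_token(tokens: Iterable[str], vocabulary: Set[str],
                  nham: int, nspam: int,
                  min_trained: int = 200) -> Optional[str]:
    """Return one meta-token describing how much of a message is unseen.

    Hypothetical sketch: min_trained and the bucket width are invented knobs.
    """
    # Bob Coe: stay silent until both classes have a reasonable amount of training.
    if min(nham, nspam) < min_trained:
        return None
    # Skip: each distinct token counts once per message.
    unique = set(tokens)
    if not unique:
        return None
    missing = sum(1 for tok in unique if tok not in vocabulary)
    # Kenny Pitt: a percentage, so long newsletters are not penalised for length.
    pct = 100 * missing // len(unique)
    bucket = min(pct // 10 * 10, 90)  # 0, 10, ..., 90 (90 also covers 100%)
    return "unknown-pct:%d" % bucket


vocab = {"hello", "world", "cheap", "meds"}
print(unknown_token(["hello", "zxqv", "world", "blorp"], vocab, 500, 500))
# prints unknown-pct:50
```

Tim Stone's single "more-than-n-words-missing" flag is the special case of one bucket, and returning None during early training sidesteps the database pollution Skip describes.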
From kennypitt at hotmail.com Tue Dec 16 17:33:06 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Dec 16 17:33:42 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: Message-ID: Tim Peters wrote: > [Matthew Dixon Cowles] >> I've gotten a few of those lately. I've found that Richard Jowsey's >> idea of retrieving the URLs in messages that score in the unsure >> range and scoring the pages received as a sort of synthesized message >> helps a good bit. > > Is that checked in as a "standard albeit experimental" option yet? It > should be. > >> That's a somewhat controversial thing to do. People have pointed out >> that if everyone did it, it would enable a spammer to engineer a >> distributed denial-of-service attack against some site they didn't >> like. > > There aren't enough spambayes users for me to worry about that; then > cut by the number who would actually enable this option, and multiply > by their unsure rate. It wouldn't amount to much in the end. My biggest concern here would be verifying my e-mail address to all the spammers. By retrieving the content of a link in order to classify it, you have performed the same operation as clicking on it and have thus triggered any "web bug" attached to that link in the form of an identifying query parameter. -- Kenny Pitt From skip at pobox.com Tue Dec 16 17:37:38 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Dec 16 17:37:38 2003 Subject: [Spambayes] Is this a sign of future problems ? In-Reply-To: References: <16351.31312.645532.479461@montanaro.dyndns.org> Message-ID: <16351.35122.443946.422903@montanaro.dyndns.org> >> Maybe, but it may also be they are just verifying email addresses. Russ> How does it validate an email address if it doesn't contain any Russ> HTML? They just send their nonsense message to russ_foster@comcast.net. If it bounces, they delete that address from their database. If not, they mark it valid for some period of time. 
I think the primary reason they have to do this is that many people simply abandon an email address when it gets too overloaded with spam. If they didn't perform some checks, their database would fairly rapidly become useless. Skip From tim.one at comcast.net Tue Dec 16 17:57:42 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 16 17:57:46 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: Message-ID: [Matthew Dixon Cowles] >>> I've found that Richard Jowsey's idea of retrieving the URLs in >>> messages that score in the unsure range and scoring the pages >>> received as a sort of synthesized message helps a good bit. [Tim Peters] >> There aren't enough spambayes users for me to worry about that; then >> cut by the number who would actually enable this option, and multiply >> by their unsure rate. It wouldn't amount to much in the end. [Kenny Pitt] > My biggest concern here would be verifying my e-mail address to all > the spammers. By retrieving the content of a link in order to > classify it, you have performed the same operation as clicking on it > and have thus triggered any "web bug" attached to that link in the > form of an identifying query parameter. That's part of what I mean by "cut by the number who would actually enable this option". I would, but I wouldn't *recommend* that most should (and it should certainly be off by default). Another class of user I have in mind is those on slow pay-by-the-minute dialup accounts, where fetching mounds of web trash is immediately expensive, and in more than one way (time, bandwidth and money). I'm not worried about validating my own email address to spammers -- seems to me it's been validated to them ten thousand times over already .
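Two suggestions from this exchange are easy to sketch together: Tim's idea of putting a "web:" prefix on every token derived from a fetched page, so page vocabulary gets its own namespace in the database, and a precaution against the web-bug risk Kenny raises, namely dropping the query string, fragment and any user:password@ part of a URL before fetching it. The function names are hypothetical, the fetch itself is left out, and the whitespace tokenizer with a 3-to-12 character window is only a rough stand-in for SpamBayes' real tokenizer. Stripping the query string reduces the risk but does not eliminate it, since a tracking ID can also be embedded in the path.

```python
from urllib.parse import urlsplit, urlunsplit


def sanitize_url(url: str) -> str:
    """Drop userinfo, query string and fragment before fetching (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname excludes any user:password@ prefix
    if parts.port is not None:
        host = "%s:%d" % (host, parts.port)
    # Keep only scheme, host[:port] and path; query and fragment are discarded.
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


def web_tokens(page_text: str):
    """Yield 'web:'-prefixed tokens from fetched page text."""
    for word in page_text.lower().split():
        if 3 <= len(word) <= 12:  # rough approximation of SpamBayes' word-length limits
            yield "web:" + word


print(sanitize_url("http://user:pw@Example.com:8080/offer/page.html?id=abc123#top"))
# prints http://example.com:8080/offer/page.html
print(list(web_tokens("Buy NOW at our shop")))
# prints ['web:buy', 'web:now', 'web:our', 'web:shop']
```

Because every page-derived token carries the prefix, a spammy-looking web page cannot drag ordinary mail words toward spam; the two vocabularies are trained and scored independently.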
From dreas at emailaccount.nl Tue Dec 16 18:25:58 2003 From: dreas at emailaccount.nl (Dreas van Donselaar) Date: Tue Dec 16 18:25:59 2003 Subject: [Spambayes] SpamBayes for 500.000 users In-Reply-To: <16351.32890.550695.824501@montanaro.dyndns.org> Message-ID: So does this mean that Bayesian won't be effective for such a large user database? Regards, Dreas van Donselaar -----Original Message----- From: spambayes-bounces+dreas=emailaccount.nl@python.org [mailto:spambayes-bounces+dreas=emailaccount.nl@python.org] On Behalf Of Skip Montanaro Sent: dinsdag 16 december 2003 23:00 To: Christopher Jastram Cc: spambayes@python.org Subject: Re: [Spambayes] SpamBayes for 500.000 users Chris> But I can give you some first-hand knowledge from a much smaller Chris> user base. I'm setting the same thing up for an office of 5 Chris> people, and here's the bare-bones fact; I need a separate Chris> database for each user. I've tried using one database for Chris> everyone, and it does work. But it only catches about 30-40 Chris> percent of spam. Not sure why this is the case, but it is Chris> (unbalanced training?). Does your shared database draw fairly equally on mail sent to all five people? If not, you may find that some of the clues in the header will "poison" your database. Tim discovered this effect in spades during early testing. I believe one of the larger spam databases he used initially were all sent to one person. The recipient-oriented clues related to that user poisoned his tests. Skip _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html From matt at mondoinfo.com Tue Dec 16 18:53:12 2003 From: matt at mondoinfo.com (Matthew Dixon Cowles) Date: Tue Dec 16 18:54:35 2003 Subject: [Spambayes] More "spam of the future" lately? 
In-Reply-To: References: Message-ID: <1071615477.83.5976@mint-julep.mondoinfo.com> [spidering URLs in unsure messages] > My biggest concern here would be verifying my e-mail address to all > the spammers. By retrieving the content of a link in order to > classify it, you have performed the same operation as clicking on > it and have thus triggered any "web bug" attached to that link in > the form of an identifying query parameter. The code I use strips off a question mark and the CGI parameters that follow it, and an at-sign and the username and password fields that precede it, in an effort to avoid sending identifying data. I suspect that that means that it sometimes doesn't get any data back but I'm willing to live with that. But, as Tim says, it may not be important. Regards, Matt From rmalayter at bai.org Tue Dec 16 20:39:31 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Tue Dec 16 20:39:36 2003 Subject: [Spambayes] More "spam of the future" lately? Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A751C7@cliff.bai.org> [Tim Peters] > Why? Look at the spam clues. There has to be something > decidedly hammy about them to score that low, and a collection > of random words isn't decidedly hammy except by accident. > There must be more to it. If they're managing to hit > something *systematically* hammy for you, then continued > training will make whatever that is stop looking hammy to you. I've attached the spam clues (post training, it originally scored 5%) for one of the messages below. There is a lot in there that scores as hammy, but as you said, training should eventually take care of that. Maybe it's accidental that I'm getting quite a few of these recently, but it seems statistically unlikely that their nonsense would score hammy as repeatedly as I've seen (maybe a dozen in the past few days). One of the most innocent tokens is a .org URL, which I thought were only supposed to go to not-for-profits.
Did somebody fool Verisign or Register.com during the domain registration process? Or can you get a .org domain from those fly-by-night registrars, too?

----------
Spam Score: 67% (0.666863)

    word                         spamprob   #ham  #spam
    '*H*'                        0.0491694     -      -
    '*S*'                        0.382895      -      -
    'maybe'                      0.147442    122     14
    'hi,'                        0.157707     65      8
    'know'                       0.161535    426     55
    'url:org'                    0.167468   1103    149
    'location'                   0.168549     38      5
    'when'                       0.176258    599     86
    'services,'                  0.177496     15      2
    'still'                      0.18705     292     45
    'package'                    0.212158     23      4
    'rest'                       0.22472      47      9
    'let'                        0.229087    211     42
    'mail'                       0.247433    349     77
    'had'                        0.247833    267     59
    'distribution'               0.252839     49     11
    'wanted'                     0.269298     49     12
    'just'                       0.277374    543    140
    'may'                        0.288208    287     78
    'have'                       0.290698   1042    287
    'last'                       0.305627    166     49
    'envelope'                   0.316995     10      3
    'page'                       0.318821     83     26
    'that'                       0.324895   1218    394
    'can'                        0.327004    937    306
    'part'                       0.342645    123     43
    'miss'                       0.35393      41     15
    'spoke'                      0.365277      8      3
    'and'                        0.37135    1453    577
    'not'                        0.372072    944    376
    'for'                        0.378071   1270    519
    'with'                       0.37937     954    392
    'saw'                        0.394568     23     10
    'decided'                    0.395516     32     14
    'further'                    0.604276     33     34
    'header:MIME-Version:1'      0.608193   1027   1072
    'header:Received:1'          0.614259     83     89
    'cc:none'                    0.615065   1061   1140
    'title'                      0.622095      8      9
    'transferred'                0.641935      4      5
    'address'                    0.645478    133    163
    'to:addr:bai.org'            0.72425     322    569
    'header:Reply-To:1'          0.745034    324    637
    'sender:none'                0.770971    503   1139
    'life.'                      0.777099     10     24
    'angelika'                   0.844828      0      1
    'cd-roms,'                   0.844828      0      1
    'friends?'                   0.844828      0      1
    'from:addr:alisa-qween'      0.844828      0      1
    'from:addr:ancitel.it'       0.844828      0      1
    'from:name:alisa-qween'      0.844828      0      1
    'libraries,'                 0.844828      0      1
    'more??'                     0.844828      0      1
    'reply-to:addr:alisa-qween'  0.844828      0      1
    'reply-to:addr:ancitel.it'   0.844828      0      1
    'reply-to:name:alisa-qween'  0.844828      0      1
    'single.'                    0.844828      0      1
    'subject:Greetings'          0.844828      0      1
    'to:addr:acsons'             0.844828      0      1
    'to:name:acsons'             0.844828      0      1
    'url:angelika_k'             0.844828      0      1
    'url:myowndate'              0.844828      0      1
    'write.'                     0.844828      0      1
    'married'                    0.874007      1      6

Message Stream:

    X-MS-Mail-Gibberish: Microsoft Mail Internet Headers Version 2.0
    Received: from 218.13.137.125 ([218.13.137.125]) by smtp.bai.org with Microsoft SMTPSVC(5.0.2195.5329); Tue, 16 Dec 2003 09:10:01 -0600
    From: alisa-qween
    Reply-To: alisa-qween
    To: acsons
    Subject: Greetings
    MIME-Version: 1.0
    Content-Type: text/html; charset=windows-1251
    Content-Transfer-Encoding: 8bit
    Return-Path: alisa-qween@ancitel.it
    Message-ID:
    X-OriginalArrivalTime: 16 Dec 2003 15:10:02.0066 (UTC) FILETIME=[ADB41320:01C3C3E6]
    Date: 16 Dec 2003 09:10:02 -0600

    Hi, return the media envelope along with the rest of the package to the location
    REMAINS WITH YOU.
    i saw your e mail address and decided to write. i know when we last spoke you had a lady in your life. i just wanted to let you know i am still single. i have not yet married and still have no children. Maybe we can still be friends? or maybe more??
    My page
    distribution libraries, CD-ROMs, etc.). You may charge a distribution fee for
    miss you
    information you provide to as part of the Support Services, the
    Angelika
    the is transferred to you. You further acknowledge that title and full
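Ryan's clue list opens with two pseudo-clues, '*H*' and '*S*'. Assuming the chi-squared combining that I believe SpamBayes uses by default, the reported score depends only on those two numbers, score = (S - H + 1) / 2, and the figures above agree: (0.382895 - 0.0491694 + 1) / 2 = 0.6668628, shown as 0.666863. The sketch below reimplements that combining step from per-token probabilities; chi2q, chi_combine and score_from_sh are illustrative names rather than the SpamBayes API, and every input probability must lie strictly between 0 and 1.

```python
import math


def chi2q(x2: float, v: int) -> float:
    """P(chi-squared with v degrees of freedom >= x2), closed form for even v."""
    assert v % 2 == 0, "this series only works for even degrees of freedom"
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)  # guard against rounding pushing the sum past 1


def chi_combine(probs):
    """Combine per-token spam probabilities into (score, S, H)."""
    n = len(probs)
    # Spam evidence: tokens near 1.0 make log(1 - p) very negative.
    s_log = sum(math.log(1.0 - p) for p in probs)
    # Ham evidence: tokens near 0.0 make log(p) very negative.
    h_log = sum(math.log(p) for p in probs)
    S = 1.0 - chi2q(-2.0 * s_log, 2 * n)
    H = 1.0 - chi2q(-2.0 * h_log, 2 * n)
    return (S - H + 1.0) / 2.0, S, H


def score_from_sh(S: float, H: float) -> float:
    """Final score from the '*S*' and '*H*' pseudo-clues."""
    return (S - H + 1.0) / 2.0


print(round(score_from_sh(0.382895, 0.0491694), 6))
# prints 0.666863
```

With S and H both well away from 1, neither hypothesis wins decisively, which is why a score like Ryan's lands in the middle band rather than near 0 or 1.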
From tim.one at comcast.net Tue Dec 16 21:40:53 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 16 21:40:54 2003 Subject: [Spambayes] How low can you go? In-Reply-To: Message-ID: [Tim, on the spambayes list, about x-use_bigrams in CVS] > I see that it's a cruder approximation to the suggested scoring > algorithm (which I implemented at one time). For example ... I checked in the intended implementation. Here's the checkin comment: Implemented the intended "tiling" version of x-use_bigrams. Tried to restore most of the speed lost when this option *isn't* in use. Will add comments later. Anyone using x-use_bigrams needs to retrain: synthesized bigrams now begin with a "bi:" prefix. Skip, that last point addresses your (good!) concern about ambiguity wrt the special 'saved state' key. Here's what I've found so far. My main personal database is currently trained on 474 ham and 489 spam, using mostly mistake-and-unsure-based training, with a spam cutoff of 95 and a ham cutoff of 4 (yup, those are extreme -- I've been experimenting). Database size (a bsddb3 hash database): without x-use_bigrams 2,544KB with x-use_bigrams 10,288KB That's a major size boost, and (of course) is expected (bigrams create fat hapaxes at a prodigious rate). There's no reason to suppose that the selection of training ham and spam based on mistake-and-unsure training from a unigram-only classifier makes much sense for a mixed uni+bi-gram classifier; to the contrary, the latter almost certainly has different strengths and weaknesses. An example of that is the highest scoring ham in my inbox. Because I had previously put copies of some of those into my ham training data, back when my ham cutoff was 20, without x-use_bigrams no message in my inbox today scores above 20. 
These are the worst:

    6 6 6 7 7 7 7 8 8 8 9 9 9 12 13 13 14 16

After retraining on the same training sets with x-use_bigrams, then rescoring my inbox, the highest-scoring ham in my inbox are worse:

    7 8 8 9 10 12 13 13 13 13 16 22 25 31 34 38 45 49

I'm confident that this is an artifact of using training sets based on picking on the weakest performance of a different scoring strategy, and that had I been using train-on-everything all along, that result would have been very different. There's an interesting example in the other direction too: the last time I started over from scratch, I left one Unsure in my Unsure folder, and have kept it there ever since. It's a long and chatty spam, about a topic I even have some interest in (no, my wang already has carpet burns ), and I wanted to see how mistake-based training changed its score over time. It drifted slowly upward all along, from the low 40s to the low 80s. Under x-use_bigrams, though, the score zoomed to 95.34. The difference is high-scoring bigrams that appeared in a few other spam:

    'bi:any questions,'  0.908163   0   2
    'bi:website at:'     0.908163   0   2
    'bi:visit our'       0.931987   1  17
    'bi:create your'     0.934783   0   3
    'bi:than years'      0.934783   0   3

"than years" is a peculiar one, eh?! The original text was

    ... more than 30 years ago ...

and we skipped "30" because it's shorter than 3 characters. So, conclusions for now:

+ x-use_bigrams is going to bloat your database bigtime.
+ If you use train-on-everything, and want to try it, no problem.
+ If you're doing mistake-based training and want to try it, probably best to start over from scratch.
+ I believe that mistake-based training under this method is likely to be substantially more brittle than mistake-based training under the (still default) unigram-only scheme, because it's even more hapax-driven (synthesizing bigrams creates many more hapaxes).
+ OTOH, bigrams are better at recognizing the language of advertising.
For example, "bi:website at:" is more clearly a "call to action" than either "website" or "at:". From kj4ta at bellsouth.net Tue Dec 16 21:55:10 2003 From: kj4ta at bellsouth.net (Ken Fortier) Date: Tue Dec 16 21:54:35 2003 Subject: [Spambayes] Register Error Message-ID: I just downloaded your program because I heard good news about it from PC World. Unfortunately, when I tried to install it I get this message: "C:\Programs\Spambayes Outlook Addin\spambayes_addin.dll Unable to register DLL/OCX: DLLRegisterServer failed code 0x00000000" Any ideas? I'm using WinXP Home, Outlook 2000. Thanks for your help, Ken Fortier From matt at mondoinfo.com Tue Dec 16 22:03:20 2003 From: matt at mondoinfo.com (Matthew Dixon Cowles) Date: Tue Dec 16 22:04:42 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: References: <1071607720.02.5775@mint-julep.mondoinfo.com> Message-ID: <1071627456.36.6025@mint-julep.mondoinfo.com> [me] >> I've gotten a few of those lately. I've found that Richard >> Jowsey's idea of retrieving the URLs in messages that score in the >> unsure range and scoring the pages received as a sort of >> synthesized message helps a good bit. [Tim] > Is that checked in as a "standard albeit experimental" option yet? > It should be. It certainly hasn't been by me. The code I'm running is some that I hacked up just to see if it would work. It's not much integrated into SpamBayes; it's a separate module and the code that decides whether or not to run it is at a higher level. If folks would like me to clean it up and upload it to SourceForge, I'd be glad to. Regards, Matt From listsub at wickedgrey.com Tue Dec 16 22:10:54 2003 From: listsub at wickedgrey.com (Eli Stevens (WG.c)) Date: Tue Dec 16 22:11:35 2003 Subject: [Spambayes] .org and .net (was: More "spam of the future" lately?) 
References: <792DE28E91F6EA42B4663AE761C41C2A01A751C7@cliff.bai.org> Message-ID: <3FDFC93E.5030002@wickedgrey.com> Ryan Malayter wrote: > > One of the most innocent tokens is a .org URL, which I thought were only > supposed to go to not-for-profits. Did somebody fool Verisign or > Register.com during the domain registration process? Or can you get a > .org domain from those fly-by-night registrars, too? My understanding is that the rules were changed regarding .net and .org some while back (several years), due to the difficulty in policing (rather, determining what was a valid .org and what wasn't). I could be wrong on the reason, but register.com says that .org/.net are only "recommended for not-for-profit/internet infrastructure." http://www.register.com/domain-rules.cgi I didn't check Verisign, but this matches what I remember from looking into registering my own domain in early 2000. HTH, Eli PS - I'm new to the list (haven't lurked much) - is there an established protocol for topic changes, off-topic(-ish, like this one) posts, etc? Or am I just being too uptight? ;) Feel free to reply off-list. Thanks. From tim at fourstonesExpressions.com Tue Dec 16 22:19:09 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Tue Dec 16 22:19:15 2003 Subject: [Spambayes] .org and .net (was: More "spam of the future" lately?) In-Reply-To: <3FDFC93E.5030002@wickedgrey.com> References: <792DE28E91F6EA42B4663AE761C41C2A01A751C7@cliff.bai.org> <3FDFC93E.5030002@wickedgrey.com> Message-ID: On Tue, 16 Dec 2003 19:10:54 -0800, Eli Stevens (WG.c) wrote: > Ryan Malayter wrote: > >> >> One of the most innocent tokens is a .org URL, which I thought were only >> supposed to go to not-for-profits. Did somebody fool Verisign or >> Register.com during the domain registration process? Or can you get a >> .org domain from those fly-by-night registrars, too? > Yup... anyone can get 'em, all ya gotta do is ask. There is no 'policing' of those domains. 
-- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From shellyp at compacsort.com Tue Dec 16 23:21:01 2003 From: shellyp at compacsort.com (shelly) Date: Tue Dec 16 23:21:07 2003 Subject: [Spambayes] problem report Message-ID: <000701c3c455$2e59d7d0$ae00a8c0@shelly-electr.hal> Is there a version of SpamBayes you can use with Outlook 98? If not, are there any other good anti-spam downloads available (preferably free)? Thank you Shelly From tameyer at ihug.co.nz Wed Dec 17 01:33:12 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 17 01:33:19 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C0CC8@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A0C@its-xchg4.massey.ac.nz> [Tim] > Is that checked in as a "standard albeit experimental" option yet? > It should be. [Matthew Dixon Cowles] > It certainly hasn't been by me. The code I'm running is some that I > hacked up just to see if it would work. It's not much integrated into > SpamBayes; it's a separate module and the code that decides whether > or not to run it is at a higher level.
> > If folks would like me to clean it up and upload it to SourceForge, > I'd be glad to. There's a file urlslurper.py in the testtools directory that runs timtest or timcv (I can't recall which) doing this, which I wrote some time back to test the idea. The results were fairly inconclusive. I have an integrated version, too, but only locally. I'll check in the integrated version as an experimental option. If you'd like to look it over, compare it to yours, and submit either a patch or comments, that would be fantastic. =Tony Meyer From anthony at interlink.com.au Wed Dec 17 01:44:11 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Dec 17 01:44:28 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A0C@its-xchg4.massey.ac.nz> Message-ID: <200312170644.hBH6iB7C007980@localhost.localdomain> I'm also getting hit pretty hard with these little sods, and think I have an idea (but, alas, little or no time to look at implementing it). I have a suspicion that we could do something by handling the text inside an HTML <a href> ... </a> pair differently to how we handle other text. Assuming that the little bastards want the users to click on their link, they _have_ to make the text visible, and prominent. One possibility would be to tokenize them as something like anchor:word, but that doesn't help with the problem of cancelling out the effect of all the random words (some of which are hammy). What about trying something where we use tokens from inside anchors &c before we use other tokens? -- Anthony Baxter It's never too late to have a happy childhood.
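Anthony's idea was not implemented in this thread. As an illustration only, here is a minimal sketch, assuming Python 3's standard html.parser module, that prefixes words found inside <a>...</a> with "anchor:" and lists them ahead of the remaining tokens. The names AnchorTextTokenizer and tokenize_anchors_first are hypothetical and not part of SpamBayes. Putting anchor tokens first is only a stand-in for the priority Anthony describes: SpamBayes picks its clues by how far their probabilities are from 0.5, not by position, so a real change would have to touch clue selection.

```python
from html.parser import HTMLParser


class AnchorTextTokenizer(HTMLParser):
    """Collect lowercase words; words inside <a>...</a> get an 'anchor:' prefix."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while we are inside an <a> element
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        prefix = "anchor:" if self.depth else ""
        for word in data.lower().split():
            self.tokens.append(prefix + word)


def tokenize_anchors_first(html):
    """Return all tokens, with the anchor-text tokens placed before the rest."""
    parser = AnchorTextTokenizer()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    anchored = [t for t in parser.tokens if t.startswith("anchor:")]
    others = [t for t in parser.tokens if not t.startswith("anchor:")]
    return anchored + others


if __name__ == "__main__":
    sample = 'cake paleozoic <a href="http://x.example/">Free CableTV</a> egret'
    print(tokenize_anchors_first(sample))
```

For the sample above, "anchor:free" and "anchor:cabletv" come out first, followed by the three filler words. This is why a spammer who must make the link text visible cannot easily dilute it with random words.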
From tameyer at ihug.co.nz Wed Dec 17 03:06:28 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 17 03:06:36 2003 Subject: [Spambayes] problem report In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C0D20@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467776F@its-xchg4.massey.ac.nz> > Is there a version of spam bayes you can use > with outlook 98? Not an integrated solution like the plug-in, but you can use the POP3 proxy (sb_server) or IMAP filter, as long as you get your mail in one of those ways (i.e. not via Exchange). The website & readme have more information. > If not, are there any other good anti spam downloads available? You can also look at the 'related' page on the website, which lists other spam filters similar to spambayes. (I have no idea how well any of them work with Outlook 98.) =Tony Meyer From python-spambayes at discworld.dyndns.org Wed Dec 17 09:19:51 2003 From: python-spambayes at discworld.dyndns.org (Charles Cazabon) Date: Wed Dec 17 09:15:38 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: <1071615477.83.5976@mint-julep.mondoinfo.com>; from matt@mondoinfo.com on Tue, Dec 16, 2003 at 05:53:12PM -0600 References: <1071615477.83.5976@mint-julep.mondoinfo.com> Message-ID: <20031217081951.D530@discworld.dyndns.org> Matthew Dixon Cowles wrote: > [spidering URLs in unsure messages] > > > My biggest concern here would be verifying my e-mail address to all > > the spammers. By retrieving the content of a link in order to > > classify it, you have performed the same operation as clicking on > > it and have thus triggered any "web bug" attached to that link in > > the form of an identifying query parameter. > > The code I use strips off a question-mark and the CGI parameters that > follow it and an at-sign and the username and password fields that > precede it in an effort to avoid sending identifying data.
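Matt's code was not posted to the list, so what follows is only a minimal sketch of the stripping he describes, assuming Python 3's urllib.parse. It drops any user:password@ prefix, the query string and the fragment. The function name strip_identifying_parts is hypothetical. As Charles points out in his reply, identifiers carried in the path or in the host name itself survive this treatment. The second example shows a path-encoded ID passing straight through.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_identifying_parts(url):
    """Drop user:password@, ?query and #fragment from a URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""      # hostname excludes any user:password@ prefix
    if parts.port is not None:
        host = "%s:%d" % (host, parts.port)
    # IPv6 literal hosts would need their [brackets] restored; not handled here.
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


if __name__ == "__main__":
    print(strip_identifying_parts("http://user:pw@host.example.org/page?id=12345#top"))
    # The unique id in the path is still sent when this URL is fetched.
    print(strip_identifying_parts("http://host.example.org/dir/script/uniqueid/other?x=1"))
```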
Insufficient: web bug links (not just from spammers) frequently encode the unique identifier in ways other than with a query string: http://host.example.org/dir/script/uniqueid/otherparameter etc. The OP is right; no matter how much you munge the URL, following it is a possible information leak. Charles -- ----------------------------------------------------------------------- Charles Cazabon GPL'ed software available at: http://www.qcc.ca/~charlesc/software/ ----------------------------------------------------------------------- From Ayende at Ayende.com Wed Dec 17 09:20:52 2003 From: Ayende at Ayende.com (Ayende Rahien) Date: Wed Dec 17 09:22:18 2003 Subject: [Spambayes] Spambayes initialization problem. Message-ID: Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... Name: spambayes1.log Type: application/octet-stream Size: 3210 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031217/8f582cd4/spambayes1-0001.obj From skip at pobox.com Wed Dec 17 10:46:37 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Dec 17 10:46:39 2003 Subject: [Spambayes] SpamBayes for 500.000 users In-Reply-To: References: <16351.32890.550695.824501@montanaro.dyndns.org> Message-ID: <16352.31325.84895.242716@montanaro.dyndns.org> Dreas> So does this mean that Bayesian won't be effective for such a Dreas> large user database? Dunno. You'll definitely have to do some testing. It might be sufficient to do one or more of the following: * suppress tokenizing of headers which would generate such a strong user-oriented bias * make it easy for your users to submit mail for inclusion in the database The first is a surmountable problem. The key option to tweak is address_headers in the Tokenizer section. The second will be more difficult. 
In my limited multi-user experience: * people only submit spam * people (understandably) never submit anything very sensitive Skip From skip at pobox.com Wed Dec 17 11:10:55 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Dec 17 11:10:59 2003 Subject: [Spambayes] How low can you go? In-Reply-To: References: Message-ID: <16352.32783.360628.370482@montanaro.dyndns.org> Tim> Database size (a bsddb3 hash database): Tim> without x-use_bigrams 2,544KB Tim> with x-use_bigrams 10,288KB Tim> That's a major size boost, and (of course) is expected (bigrams Tim> create fat hapaxes at a prodigious rate). I've been experimenting with the bigram stuff and like it so far. I also have some mods to the DBDictClassifier stuff which add timestamps (last set, last used) to the database. There's some interaction between the two which keeps me from using the two together. It may be worthwhile considering a last used timestamp to control the number of unused (or rarely used) tokens. The first thing I did was retrain and then score my then current unsure mailbox. Out of about 40 messages it scored over half of them as spam with bigrams enabled. I then took my entire training database (around 140 spams and 100 hams) and tossed them into my unsure mailbox. Using that now much bigger mailbox (about 280 messages), I then started with a fresh round of unsure+mistake based training. I got to roughly the same performance as without bigrams using a much smaller set of training messages. I'm currently at 97 spams and 64 hams. I'm still getting a fair number of unsures, but the false positive rate doesn't seem horrible (I've seen a few, but haven't been counting). Tim> + I believe that mistake-based training under this method is likely Tim> to be substantially more brittle than mistake-based training Tim> under the (still default) unigram-only scheme, because it's even Tim> more hapax-driven (synthesizing bigrams creates many more Tim> hapaxes). 
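The following sketch shows why bigrams produce so many more hapaxes. It is a minimal illustration, not SpamBayes's actual x-use_bigrams code. It emits every word plus a "bi:" token for each adjacent pair, in the same format as the "bi:website at:" clue quoted earlier in the thread, and then counts the features that occur in only one message. It ignores how SpamBayes decides between overlapping unigram and bigram clues when scoring. The function names are hypothetical.

```python
from collections import Counter


def unigrams_and_bigrams(words):
    """Yield each word, then a 'bi:' feature joining it to the next word."""
    for i, word in enumerate(words):
        yield word
        if i + 1 < len(words):
            yield "bi:%s %s" % (word, words[i + 1])


def hapax_count(messages, use_bigrams=False):
    """Count features that appear in exactly one message."""
    counts = Counter()
    for words in messages:
        features = unigrams_and_bigrams(words) if use_bigrams else words
        counts.update(set(features))  # count each feature once per message
    return sum(1 for n in counts.values() if n == 1)


if __name__ == "__main__":
    msgs = [["free", "cable", "tv"], ["free", "tv", "now"]]
    print(hapax_count(msgs), hapax_count(msgs, use_bigrams=True))  # 2 6
```

Two short messages that share most of their words still yield three times as many hapaxes once bigrams are added. Each new word order creates a new pair even when the words themselves are familiar.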
As I was training, I noticed some wild fluctuations in scores with bigrams enabled, especially with small databases. Skip From bob at 1776.com Wed Dec 17 11:07:01 2003 From: bob at 1776.com (Robert K. Coe) Date: Wed Dec 17 11:12:54 2003 Subject: [Spambayes] RE: Is this a sign of future problems ? In-Reply-To: Message-ID: <009a01c3c4b7$cf60c230$6601a8c0@CambridgeMA.gov> > From: Tim Peters [mailto:tim.one@comcast.net] > Sent: Tuesday, December 16, 2003 5:30 PM > To: Russ Foster > Cc: spambayes@python.org > Subject: RE: [Spambayes] Is this a sign of future problems ? > > > [Russ Foster] > > How does it validate an email address if it doesn't contain any HTML? > > The spammer can simply record whether the email is rejected at SMTP time, > with a "no such user" kind of response. Keeping "rejected because it looks > like spam" rejections out of the mix is helpful for that purpose. You bring up a good point. My home domain has a catchy name, so spammers frequently use it in forged headers. So I get a lot of bounce messages from recipient sites that don't recognize the forgery. I've noticed that some large ISP's (AOL, for one) seem to enjoy taunting the spammer with the exact reason they didn't deliver his message. Aside from the fact that it's usually not the spammer who sees the bounce messages, maybe that's not a very good idea. Bob From bob at 1776.com Wed Dec 17 10:27:43 2003 From: bob at 1776.com (Robert K. Coe) Date: Wed Dec 17 11:13:00 2003 Subject: [Spambayes] RE: More "spam of the future" lately? In-Reply-To: <20031216175525.ZFSW38.out012.verizon.net@titan> Message-ID: <009701c3c4b2$519248b0$6601a8c0@CambridgeMA.gov> Do SOTF messages have a method of forcing the recipient to follow the link? If not, it may be less important to knock down SOTF than SOTP (spam of the present). All spam is annoying and wastes people's time, but the reason the problem has gone critical at some facilities is the fear that women (and men?) 
at those facilities will claim that failure to filter really raunchy spam constitutes sexual harassment. (Some, myself included, consider that notion preposterous, but there it is.) An SOTF message, containing nothing but gibberish and a link you have to explicitly follow, should provide much weaker grounds for a sexual harassment claim. Bob > -----Original Message----- > From: Intrope [mailto:st.intrope@verizon.net] > Sent: Tuesday, December 16, 2003 12:55 PM > To: spambayes@Python.org > Subject: [Spambayes] More "spam of the future" lately? > > > I noticed that too; sampling my spam folder, essentially all of my recent > spams are SOTF. SpamBayes seems to take a long time to recognize these, but > nowadays at least 90% of SOTF type messages are being filtered out. > > I guess what I'm saying is, be patient and train--it'll block more and more > as time goes on. I wasn't paying close attention, but I'm guessing that it > took several weeks (at least) to train SpamBayes on the SOTF; at one point > I'd see 10 SOTF a day that got through the filters, but in the last few days > I don't think I've seen any get through. > > Go SpamBayes Go! > -Jon From bob at 1776.com Wed Dec 17 10:52:19 2003 From: bob at 1776.com (Robert K. Coe) Date: Wed Dec 17 11:13:11 2003 Subject: [Spambayes] RE: More "spam of the future" lately? In-Reply-To: Message-ID: <009901c3c4b5$c16d5b40$6601a8c0@CambridgeMA.gov> > From: Kenny Pitt [mailto:kennypitt@hotmail.com] > Sent: Tuesday, December 16, 2003 5:33 PM > To: 'Tim Peters'; 'Matthew Dixon Cowles' > Cc: spambayes@python.org > Subject: RE: [Spambayes] More "spam of the future" lately? > > > My biggest concern here would be verifying my e-mail address to all the > spammers. By retrieving the content of a link in order to classify it, > you have performed the same operation as clicking on it and have thus > triggered any "web bug" attached to that link in the form of an > identifying query parameter.
If you don't funnel the virtual click through a commercial browser, you ought to be able to keep your email address out of it. But you're gonna reveal your IP, so you'd better make sure your anti-spyware utility is up to date. ;^) Bob From bob at 1776.com Wed Dec 17 10:38:30 2003 From: bob at 1776.com (Robert K. Coe) Date: Wed Dec 17 11:13:17 2003 Subject: [Spambayes] RE: More "spam of the future" lately? In-Reply-To: Message-ID: <009801c3c4b3$d333c000$6601a8c0@CambridgeMA.gov> > From: Tim Peters [mailto:tim.one@comcast.net] > Sent: Tuesday, December 16, 2003 4:39 PM > To: Ryan Malayter > Cc: spambayes@Python.org > Subject: RE: [Spambayes] More "spam of the future" lately? > > > I've been seeing a fair number of these lately too, and some end up Unsure > for me. I don't care unless it persists, though: any spammer who thinks > his goal is evading filters is a spammer who won't stay in business long. > His real goal has to be selling product, and I expect that including random > sentences has to decrease response rate significantly. I look at one of > these and can't imagine being tempted to respond, because the author appears > illiterate and incompetent simply *because* the message contains so much > nonsense. Why would I buy anything from someone that lame? > > Spam goes thru fads, like everything else, and I *expect* this one to > self-destruct due to its own ineffectiveness at selling product. I could be > wrong, of course, but I'm probably not . No, I suspect you probably are. The product a lot of them are selling isn't aimed at people who care whether the seller is literate or competent. You wouldn't buy anything from someone that lame, but you're not the target. You're just an innocent bystander watching suckers get fleeced. Bob From tim at fourstonesExpressions.com Wed Dec 17 11:18:49 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Wed Dec 17 11:18:56 2003 Subject: [Spambayes] How low can you go? 
In-Reply-To: <16352.32783.360628.370482@montanaro.dyndns.org> References: <16352.32783.360628.370482@montanaro.dyndns.org> Message-ID: On Wed, 17 Dec 2003 10:10:55 -0600, Skip Montanaro wrote: > I've been experimenting with the bigram stuff and like it so far. I also > have some mods to the DBDictClassifier stuff which add timestamps (last > set, last used) to the database. There's some interaction between the two > which keeps me from using the two together. It may be worthwhile considering a > last used timestamp to control the number of unused (or rarely used) > tokens. IIRC, there was quite a bit of discussion about aging mechanisms quite a few months ago. It seemed like almost everyone agreed that it was a good idea, but nobody wanted to implement it because of database size considerations. It still seems like a good idea... -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From python-spambayes at discworld.dyndns.org Wed Dec 17 11:23:57 2003 From: python-spambayes at discworld.dyndns.org (Charles Cazabon) Date: Wed Dec 17 11:19:46 2003 Subject: [Spambayes] RE: More "spam of the future" lately? In-Reply-To: <009901c3c4b5$c16d5b40$6601a8c0@CambridgeMA.gov>; from bob@1776.com on Wed, Dec 17, 2003 at 10:52:19AM -0500 References: <009901c3c4b5$c16d5b40$6601a8c0@CambridgeMA.gov> Message-ID: <20031217102357.A1550@discworld.dyndns.org> Robert K. Coe wrote: > > > > My biggest concern here would be verifying my e-mail address to all the > > spammers. By retrieving the content of a link in order to classify it, > > you have performed the same operation as clicking on it and have thus > > triggered any "web bug" attached to that link in the form of an > > identifying query parameter. > > If you don't funnel the virtual click through a commercial browser, you > ought to be able to keep your email address out of it.
Not true: it doesn't matter what software you use to request http://bob=1776:com.example.org/ ; because of a wildcard RR this will resolve, and I can tell from the Host: header you submit exactly which address responded to my spam. Spammers already do this and other similar tricks. Charles -- ----------------------------------------------------------------------- Charles Cazabon GPL'ed software available at: http://www.qcc.ca/~charlesc/software/ ----------------------------------------------------------------------- From TiagoTiago at Globo.com Wed Dec 17 11:28:59 2003 From: TiagoTiago at Globo.com (Tiago Estill de Noronha) Date: Wed Dec 17 11:26:42 2003 Subject: RES: [Spambayes] RE: solution for the "spam of the future"? In-Reply-To: Message-ID: <003e01c3c4ba$e046bfc0$2960b7c8@virtua.com.br> Using Kenny's idea, I came up with the following: you would have a slider (or a value, or whatever, in the non-plugin version) that would set the weight of the new-word metatoken. The slider would control how many points are added for each 1% of new words in the email. Or you could have SpamBayes set the value itself, learning as you train it. It would go like this: it would take the average percentage of new words in your ham and the average in your spam, and from those it would take the average of both values. It would then interpolate, so that from 0% up to that threshold percentage the metatoken's points would go from 0 to .5, and from the threshold up to 100% they would go from .5 to 1. I think the formula would be something like this:
Code ==========
If msgnewwordpercent < averagepercent Then
    newwordsmetatokenpoints = .5 / averagepercent * msgnewwordpercent
Else
    newwordsmetatokenpoints = .5 / (100 - averagepercent) * (msgnewwordpercent - averagepercent) + .5
End If
==== End of the code
Sorry that it is in BASIC; it is the only programming language I know well enough to write something simple without consulting any books, help files or tutorials. But I think it is easy
to understand what it is meant to do. ********************* Tiago Estill de Noronha TiagoTiago@Globo.com -=> -----Original Message----- -=> From: spambayes-bounces@python.org -=> [mailto:spambayes-bounces@python.org] On behalf of Kenny Pitt -=> Sent: Tuesday, 16 December 2003 19:27 -=> To: 'Coe, Bob'; spambayes@Python.org -=> Subject: RE: [Spambayes] RE: solution for the "spam of the future"? -=> -=> -=> Coe, Bob wrote: -=> > Don't start generating the "Missing: N" token until the database is -=> > large enough for it to make sense. -=> -=> If this works at all, it also seems like the *percentage* -=> of unknown word tokens in the message would work better -=> than a log()'d count. A very large newsletter is pretty -=> much guaranteed to have a higher *count* of unknown tokens -=> than a short mailing list message, but that's because it -=> has more total tokens and not because it's any spammier. -=> -=> -- -=> Kenny Pitt -=> -=> -=> _______________________________________________ -=> Spambayes@python.org -=> http://mail.python.org/mailman/listinfo/spambayes -=> Check the FAQ before asking: -=> http://spambayes.sf.net/faq.html From tdickenson at devmail.geminidataloggers.co.uk Wed Dec 17 11:30:12 2003 From: tdickenson at devmail.geminidataloggers.co.uk (Toby Dickenson) Date: Wed Dec 17 11:30:16 2003 Subject: [Spambayes] Is this a sign of future problems ?
In-Reply-To: References: Message-ID: <200312171630.12706.tdickenson@devmail.geminidataloggers.co.uk> On Tuesday 16 December 2003 21:53, Tim Peters wrote: > It doesn't matter one whit to me, e.g., whether "bedimmed" gets treated as > hammy or spammy in my database, because I'll never see it in real life. Maybe that's what they are counting on..... This message is not obvious spam, so there is a fair chance that it will get into some users' ham training set. It certainly would for me, with a train-on-everything regime. That spammer could then insert those same words into a subsequent spam, with a slightly better chance of it getting through. Maybe.... -- Toby Dickenson From tim.one at comcast.net Wed Dec 17 11:42:50 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Dec 17 11:42:51 2003 Subject: [Spambayes] Is this a sign of future problems ? In-Reply-To: <200312171630.12706.tdickenson@devmail.geminidataloggers.co.uk> Message-ID: [Tim Peters] >> It doesn't matter one whit to me, e.g., whether "bedimmed" gets >> treated as hammy or spammy in my database, because I'll never see it >> in real life. [Toby Dickenson] > Maybe that's what they are counting on..... > > This message is not obvious spam, so there is a fair chance that it > will get into some users' ham training set. It certainly would for me, > with a train-on-everything regime. That spammer could then insert > those same words into a subsequent spam, with a slightly better chance > of it getting through. Sorry, I don't follow this, unless you're presuming a TOE regime so careless that the user doesn't even correct classification mistakes. If a user trains on spam messages as ham, then, sure, all sorts of horrid results can follow.
If I get a message like (this is easy, cuz I just got one exactly like this ): cake paleozoic immortal couscous devon advocacy agriculture arbitrage couple census deceive psi dana cremate ceremony physiotherapist haunch commissary transpacific frigid bryophyta Free CableTV!No more pay!& stegosaurus bowfin egret throughout damsel wilful cometary dreamt minsk throughput chastity [... on & on & on ...] and train on it as ham, there's something wrong with *me*. From skip at pobox.com Wed Dec 17 11:45:30 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Dec 17 11:45:36 2003 Subject: [Spambayes] How low can you go? In-Reply-To: References: <16352.32783.360628.370482@montanaro.dyndns.org> Message-ID: <16352.34858.121487.578149@montanaro.dyndns.org> Tim> iirc, there was quite a bit of discussion about aging mechanisms Tim> quite a few months ago. It seemed like most everyone agreed that Tim> it was a good idea, but nobody wanted to implement it for database Tim> size considerations. It still seems like a good idea... Size definitely does matter. With both bigrams and my set/used timestamps (datetime objects), the size of the database ballooned. I think the set timestamp could be dispensed with and the last used timestamp converted to something smaller, like a YYYYMMDD string. Skip From cej at intech.com Wed Dec 17 11:54:17 2003 From: cej at intech.com (Christopher Jastram) Date: Wed Dec 17 11:53:43 2003 Subject: [Spambayes] SpamBayes for 500.000 users In-Reply-To: <16351.32890.550695.824501@montanaro.dyndns.org> References: <3FDF7D8E.80005@intech.com> <16351.32890.550695.824501@montanaro.dyndns.org> Message-ID: <3FE08A39.8050504@intech.com> Skip Montanaro wrote: > Chris> But I can give you some first-hand knowledge from a much smaller > Chris> user base. I'm setting the same thing up for an office of 5 > Chris> people, and here's the bare-bones fact; I need a separate > Chris> database for each user. 
I've tried using one database for > Chris> everyone, and it does work. But it only catches about 30-40 > Chris> percent of spam. Not sure why this is the case, but it is > Chris> (unbalanced training?). > >Does your shared database draw fairly equally on mail sent to all five >people? If not, you may find that some of the clues in the header will >"poison" your database. Tim discovered this effect in spades during early >testing. I believe one of the larger spam databases he used initially were >all sent to one person. The recipient-oriented clues related to that user >poisoned his tests. > >Skip > > > Nope. Not at all. My script scans messages stored in "Junk" or "Spam," grabs an equal number of messages in non-Inbox/Trash/Outbox/Spam/Junk folders as ham, and trains on everyone's mail together. Quite clumsy. Chris From cej at intech.com Wed Dec 17 12:04:44 2003 From: cej at intech.com (Christopher Jastram) Date: Wed Dec 17 12:06:45 2003 Subject: [Spambayes] SpamBayes for 500.000 users In-Reply-To: <16352.31325.84895.242716@montanaro.dyndns.org> References: <16351.32890.550695.824501@montanaro.dyndns.org> <16352.31325.84895.242716@montanaro.dyndns.org> Message-ID: <3FE08CAC.5000708@intech.com> Skip Montanaro wrote: > * people only submit spam > > * people (understandably) never submit anything very sensitive > >Skip > > > I was thinking of hacking the web interface and/or a mail interface into server-side user-specific databases that get modified in real time. For example, you forward spam to spam@mydomain and ham to ham@mydomain, and it checks the sender to see whose database gets modified. Every night, it takes a sampling from all the databases to store in a global DB for users who have never forwarded anything to spam@ or ham@. Another idea: reject training of spam without equivalent training of ham (by percentage, so high-volume sufferers can deviate from the norm by 500 messages...). These were my ideas. Has anyone started prototype work in this direction?
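Chris's second idea could look something like the following minimal sketch. Everything in it is hypothetical and it is not SpamBayes code. The class name, keying the counts on the forwarding sender's address, and reading "by percentage ... 500 messages" as a spam-to-ham ratio plus a fixed allowance are all assumptions.

```python
from collections import defaultdict


class TrainingGate:
    """Hypothetical per-user gate for messages forwarded to spam@ / ham@."""

    def __init__(self, max_ratio=2.0, slack=500):
        self.max_ratio = max_ratio   # spam may exceed ham by this factor...
        self.slack = slack           # ...plus this many extra messages
        self.counts = defaultdict(lambda: {"spam": 0, "ham": 0})

    def submit(self, sender, kind):
        """Record a forwarded message; return False if it is refused."""
        if kind not in ("spam", "ham"):
            raise ValueError("kind must be 'spam' or 'ham'")
        c = self.counts[sender.lower()]  # one set of counts per user
        if kind == "spam" and c["spam"] + 1 > self.max_ratio * c["ham"] + self.slack:
            return False                 # too much spam relative to ham
        c[kind] += 1
        return True


if __name__ == "__main__":
    gate = TrainingGate(max_ratio=1.0, slack=2)
    print([gate.submit("user@example.org", "spam") for _ in range(3)])  # [True, True, False]
```

Ham is never refused here, since Skip's experience is that users rarely submit it; only spam submissions are held back until enough ham arrives to keep the per-user training roughly balanced.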
Chris From tim.one at comcast.net Wed Dec 17 12:39:54 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Dec 17 12:39:56 2003 Subject: [Spambayes] How low can you go? In-Reply-To: <16352.34858.121487.578149@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > Size definitely does matter. With both bigrams and my set/used > timestamps (datetime objects), the size of the database ballooned. I > think the set timestamp could be dispensed with and the last used > timestamp converted to something smaller, like a YYYYMMDD string. A small integer should be enough for last-used, like the number of days between the day the database was first created and the day a feature was most recently used in scoring. That's easily computed, easy to use *in* computations, and consumes no more than 3 bytes in a binary pickle (proto 1 or proto 2) until about 180 years after the database was created . Especially with the bigram scheme-- which creates a relatively enormous number of hapaxes --I expect the best use for a per-feature "last used" timestamp is to expire hapaxes that haven't been used in scoring for N days. That should yield major size savings, actually increase resistance to "spectacular failures" (which so far most often seem to be associated with hitting a large number of old hapaxes from "the other" category), and *probably* not hurt anything else. Expiring "near hapaxes" too gets dicier, and more so the more liberal the conception of "near". From Ayende at Ayende.com Wed Dec 17 12:42:56 2003 From: Ayende at Ayende.com (Ayende Rahien) Date: Wed Dec 17 12:44:32 2003 Subject: [Spambayes] Spambayes initialization error Message-ID: Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... 
Name: spambayes2.log Type: application/octet-stream Size: 3366 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031217/855cce32/spambayes2.obj From kennypitt at hotmail.com Wed Dec 17 13:19:57 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Dec 17 13:20:37 2003 Subject: [Spambayes] Spambayes initialization error In-Reply-To: Message-ID: Ayende Rahien wrote: > There was an error initializing the Spam plugin. > > Spam filtering has been disabled. Please re-configure and re-enable this plugin. > > Error details: > Could not watch the spesified folders > > I'm getting the above message on Windows XP & Outlook 2003. It looks like you're running Outlook 2003 on an account without Administrator rights. This is a known problem that should be fixed in the next release. The problem is that the add-in is built for Outlook 2000, and in order to run with Outlook 2003 it will automatically update some files in the installation directory. If you do not have Administrator rights then these file updates will fail and SpamBayes cannot run. The temporary work-around is to load SpamBayes once from an account that has Administrator rights. This should update the necessary files in the installation directory, and from then on SpamBayes should work fine from any user account. -- Kenny Pitt From nobody at spamcop.net Wed Dec 17 13:21:08 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 17 13:21:14 2003 Subject: [Spambayes] How low can you go? In-Reply-To: <16352.34858.121487.578149@montanaro.dyndns.org> Message-ID: [Tim Stone] > Tim> iirc, there was quite a bit of discussion about aging mechanisms > Tim> quite a few months ago. It seemed like most everyone agreed that > Tim> it was a good idea, but nobody wanted to implement it > for database > Tim> size considerations. It still seems like a good idea... > > [Skip Montanaro] > Size definitely does matter. 
With both bigrams and my set/used > timestamps (datetime objects), the size of the database > ballooned. I think > the set timestamp could be dispensed with and the last used timestamp > converted to something smaller, like a YYYYMMDD string. I know this is a developer conversation, so I hope you don't mind if I offer my two cents. And I definitely agree that size matters, at least for databases. I have seen a lot of references, not just in this thread, to ageing out individual tokens. For a probability calculation in which one of the variables is the number of messages of a given class that a token appears in, it seems dangerous to remove only some tokens from a message and not adjust the message count. Here's my problem with it: all tokens from a trained message *could* conceivably age out individually, but the trained message count for the appropriate category would not change. This would result in wrong probabilities for *all* other tokens, since the database is in the same state as before the message was trained but the trained message count is now wrong. It is even harder to say what the trained message count should be if you only remove some of the tokens from a message. Under a token ageing scheme, the trained message counts would rise monotonically until you started over, despite removing plenty of tokens over time. I do understand that most of the aged-out tokens would be oddball hapaxes, but not all of them will be. Though I often hear "intuition is a poor guide", I would propose ageing out whole messages rather than tokens. This at least maintains the integrity of your basic probability calculation. It also has the advantage of enforcing balanced (or unbalanced in a particular way) training set sizes. This would require adding all the tokens from a trained message to the message database, and the message entry would be timestamped rather than the individual tokens.
When a message got too old, all its tokens would have their counts decremented and the trained message count for that message class would also be decremented. I would propose going one step further to give the train-on-everything approach some additional "memory" for atypical messages (of either type) that don't occur regularly enough to always be in a fixed-size database. This might give it some of the advantages of the train-on-exceptions schemes, perhaps with less of the "brittle" behavior others have noted and I have seen as well. One possible mechanism is as follows: 1) If the database message count is at maximum, untrain the oldest message. 2) Score the new message to be trained. 3) Move the new training message's timestamp into the future by an amount related to its "distance" from a perfect score for that message type. More atypical messages that classify poorly would be timestamped further into the future and would thus stick around longer than ones that classify perfectly. The ones that classify perfectly would have their tokens replaced sooner, which should be no great loss. With train-on-everything, there should be lots of messages that classify very well to take their place. A scaling constant could set the maximum amount of extra time that an unusual message remains in the database. Together with the maximum message count and the number of messages you train per day (which depends on your training scheme), this determines how long the database "memory" is. The goal is to allow training on everything, keep database sizes moderate and still have a long enough memory for infrequent atypical messages. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From wsy at merl.com Wed Dec 17 13:44:48 2003 From: wsy at merl.com (Bill Yerazunis) Date: Wed Dec 17 13:44:57 2003 Subject: [Spambayes] How low can you go?
In-Reply-To: References: Message-ID: <200312171844.hBHIimx07769@localhost.localdomain> From: "Seth Goodman" [... re aging out tokens ...] Here's a particularly cute solution I implemented in CRM114. The problem is that if you choose to store a token's last-seen date, you will likely consume almost as much space storing the date as you do storing the token count or the token hash. But most tokens are hapaxes anyway. They have very low value, and you probably will _never_ see them again. So, when you need to clean up the database a little, go through and decrement the "seen" count on a few (very few!) tokens. Choose the tokens to decrement randomly. REALLY randomly. Don't pick one chain that's too long and decrement every element in it. Decrement only every sixteenth one, or only the ones whose values, when added to the system clock, have a hash with the low-order byte == 0x00, or something like that. Sure, you're losing information, but that's a necessary consequence of forgetting tokens. The net result is very fast and does an acceptable level of damage to accuracy. Tests show that, at least for CRM114, which is HEAVILY hapax-oriented, the damage does not increase the error rate until you get into obscenely small databases (i.e. fewer than 100K slots). Anyway, this is how it's implemented in CRM114, and it seems to work acceptably well. -Bill Yerazunis From jm at jmason.org Wed Dec 17 13:59:02 2003 From: jm at jmason.org (Justin Mason) Date: Wed Dec 17 13:59:19 2003 Subject: [spambayes-dev] RE: [Spambayes] How low can you go? In-Reply-To: Message-ID: <20031217185904.1E1F217076@jmason.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tim Peters writes: > [Skip Montanaro] > > Size definitely does matter. With both bigrams and my set/used > > timestamps (datetime objects), the size of the database ballooned. I > > think the set timestamp could be dispensed with and the last used > > timestamp converted to something smaller, like a YYYYMMDD string.
> > A small integer should be enough for last-used, like the number of days > between the day the database was first created and the day a feature was > most recently used in scoring. That's easily computed, easy to use *in* > computations, and consumes no more than 3 bytes in a binary pickle (proto 1 > or proto 2) until about 180 years after the database was created. FWIW -- in SpamAssassin, we used to use an approximate scheme that fit the remaining UNIX epoch into 2 bytes, something like you're suggesting (by dividing time_t by several hours and starting the current epoch from 1 Jan 2000, or something like that). However, we found that we ran into expiry problems for large dbs and busy sites, because that just didn't give us enough precision -- a granularity of hours wasn't good enough. So SpamAssassin db version 2 now just uses a plain old long containing a time_t value, and damn the db bloat. A bit bigger, but expiry now works reliably ;) However, a good way we found to cut down hapax db bloat was to use a polymorphic format for the tokens in the db: if a token has spamcount < 8 and hamcount < 8, it's marshalled so that the spamcount and hamcount are both shoved into 1 byte as a bitmask, with the high bits set. Here's the Perl code in question:

    # ONE_BYTE_FORMAT and TWO_LONGS_FORMAT are flag constants defined
    # elsewhere in the module (not shown here).
    sub tok_pack {
      my ($self, $ts, $th, $atime) = @_;
      $ts ||= 0; $th ||= 0; $atime ||= 0;
      if ($ts < 8 && $th < 8) {
        # both counts fit in 3 bits: one flag/count byte plus a 32-bit atime
        return pack ("CV", ONE_BYTE_FORMAT | ($ts << 3) | $th, $atime);
      } else {
        # otherwise: a format byte plus three little-endian 32-bit longs
        return pack ("CVVV", TWO_LONGS_FORMAT, $ts, $th, $atime);
      }
    }

I do like Bill Y's "sunspots expiry" scheme though ;) - --j. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) Comment: Exmh CVS iD8DBQE/4Kd2QTcbUG5Y7woRAh/DAKC6MGlXpd1bEeR2/BzTmhtH71075ACgg21j pJ85tiGe697R3s90bP/LRS4= =slib -----END PGP SIGNATURE----- From nobody at spamcop.net Wed Dec 17 14:00:45 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 17 14:00:53 2003 Subject: [Spambayes] How low can you go?
In-Reply-To: <200312171844.hBHIimx07769@localhost.localdomain> Message-ID: [Bill Yerazunis] > Here's a particularly cute solution I implemented in CRM114. ---------snip---------------- > Choose the tokens to decrement randomly. REALLY randomly. Don't Does CRM114 use the number of trained ham and trained spam *messages* as variables in its probability calculation? If not, then you wouldn't expect that deleting infrequently used tokens would do much damage. AFAIK, SpamBayes uses the trained message counts in the probability calculation and those become inaccurate if you delete individual tokens. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From mnitabach at acedsl.com Wed Dec 17 15:11:13 2003 From: mnitabach at acedsl.com (Michael N. Nitabach) Date: Wed Dec 17 15:16:29 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: Message-ID: Original Message: > Date: Tue, 16 Dec 2003 17:24:52 -0500 > From: "Tim Peters" > [Matthew Dixon Cowles] > > I haven't looked at the numbers carefully but since I get > > few hams in > > the unsure range, I could probably do just as well by > > setting my spam > > cutoff to 0.5 or so. > > 0.7 maybe, but you'd eventually regret dropping it to 0.5. What makes you say that? I have my certain-spam cutoff at .30, and my uncertain at .01. My training database has about 8000 hams and 3000 spams. I have only ever received ten hams that scored over .01, and only one over .20. Michael N. Nitabach, Ph.D., J.D. Assistant Professor Department of Cellular and Molecular Physiology Yale University School of Medicine (203) 737-2939 mnitabach@acedsl.com From tim.one at comcast.net Wed Dec 17 15:42:13 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Dec 17 15:42:14 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: Message-ID: >> 0.7 maybe, but you'd eventually regret dropping [spam_cutoff] to 0.5. [Michael N.
Nitabach] > What makes you say that? I have my certain-spam cutoff at .30, and > my uncertain at .01. My training database has about 8000 hams and > 3000 spams. I have only ever received ten hams that scored over > .01, and only one over .20. Unless you've eyeballed every message scored as spam, then it's almost certain you've suffered false positives due to those settings. There's more info on the project's background page: http://spambayes.sourceforge.net/background.html Note especially the third graph. The way spamprobs are combined in SpamBayes guarantees that a highly ambiguous message will score very near 0.5 (explained in more detail before the third graph, and much more at http://www.linuxjournal.com/article.php?sid=6467 ). The kinds of email people get vary widely, though, and it's possible your mix is extremely well-suited to this classifier, devoid of any significant ambiguity. (I'll note that if you use your SpamBayes'd email only for professional purposes, and no personal ones (like chatting with friends and relatives), it doesn't strain my imagination that your ham could be *so* uniform that ambiguity doesn't arise -- but then your email mix would be atypical too.) From mnitabach at acedsl.com Wed Dec 17 16:00:39 2003 From: mnitabach at acedsl.com (Michael N. Nitabach) Date: Wed Dec 17 16:02:21 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: Message-ID: > -----Original Message----- > From: Tim Peters [mailto:tim.one@comcast.net] > Sent: Wednesday, December 17, 2003 3:42 PM > To: Michael N. Nitabach; spambayes@python.org > Subject: RE: [Spambayes] More "spam of the future" lately? > > > >> 0.7 maybe, but you'd eventually regret dropping > [spam_cutoff] to 0.5. > > [Michael N. Nitabach] > > What makes you say that? I have my certain-spam cutoff at .30, and > > my uncertain at .01. My training database has about 8000 hams and > > 3000 spams. I have only ever received ten hams that scored over > > .01, and only one over .20. 
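The combining Tim refers to above is chi-squared combining. Below is a simplified Python sketch of the idea. It assumes every spamprob lies strictly between 0 and 1, and it omits the floating-point underflow guards and the selection of only the strongest clues that the SpamBayes classifier itself performs, so treat it as an illustration rather than the project's code.

```python
import math


def chi2Q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2); v must be even."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_combined_score(probs):
    """Combine per-feature spamprobs (each strictly in (0, 1)) into one score."""
    n = len(probs)
    if n == 0:
        return 0.5
    # S: evidence for spam, close to 1 when many probs sit near 1.
    S = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # H: evidence for ham, close to 1 when many probs sit near 0.
    H = 1.0 - chi2Q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # Strong evidence both ways gives S close to H, hence a score near 0.5.
    return (S - H + 1.0) / 2.0
```

With five clues at 0.99 and five at 0.01, S and H are both close to 1 and the score lands at about 0.5; ten clues at 0.99 push the score close to 1, and ten at 0.01 push it close to 0.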
> > Unless you've eyeballed every message scored as spam, then it's almost > certain you've suffered false positives due to those > settings. I just looked in my certain-spam folder at all e-mails that scored below 0.70. Only a single one was a false positive: a SpamBayes mailing list digest that contained a complete actual spam e-mail that someone had posted, which scored 0.49. > There's more > info on the project's background page: > > http://spambayes.sourceforge.net/background.html > > Note especially the third graph. The way spamprobs are combined in > SpamBayes guarantees that a highly ambiguous message will > score very near > 0.5 (explained in more detail before the third graph, and much more at > > http://www.linuxjournal.com/article.php?sid=6467 > > ). I receive a substantial amount of e-mail that scores between 0.30 and 0.70, but so far it has *all* been spam. > The kinds of email people get vary widely, though, and it's > possible your > mix is extremely well-suited to this classifier, devoid of > any significant > ambiguity. Well, the interesting thing is that a lot of my spam is relatively technical sales-pitch e-mail that is talking about the same sorts of things that I talk about in my ham professional e-mails. > (I'll note that if you use your SpamBayes'd email only for > professional purposes, and no personal ones (like chatting > with friends and > relatives), it doesn't strain my imagination that your ham > could be *so* > uniform that ambiguity doesn't arise -- but then your email > mix would be > atypical too.) No, I use it for equal parts professional and personal correspondence. Michael N. Nitabach, Ph.D., J.D. Assistant Professor Department of Cellular and Molecular Physiology Yale University School of Medicine (203) 737-2939 mnitabach@acedsl.com From tim.one at comcast.net Wed Dec 17 18:22:25 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Dec 17 18:22:26 2003 Subject: [Spambayes] RE: More "spam of the future" lately? 
In-Reply-To: <009801c3c4b3$d333c000$6601a8c0@CambridgeMA.gov> Message-ID: [Tim] >> Spam goes thru fads, like everything else, and I *expect* this one to >> self-destruct due to its own ineffectiveness at selling product. I >> could be wrong, of course, but I'm probably not. [Robert K. Coe] > No, I suspect you probably are. The product a lot of them are > selling isn't aimed at people who care whether the seller is > literate or competent. Nobody sends money to anyone without the expectation of getting something back. If the seller appears to be wholly disreputable, the only response the seller can hope to get is from the desperately naive. Sales is always a percentage game; the naive who get fleeced aren't all identically naive, and cutting the response rate always hurts. > You wouldn't buy anything from someone that lame, but you're not > the target. You're just an innocent bystander watching suckers get > fleeced. I understand that people respond to spam. No problem, and so long as a spam campaign turns a profit, it persists. But spam campaigns -- unlike commercials for Toyota and McDonald's -- *do* go away: the ones that stop bringing in more than they cost self-destruct naturally. Putting random crap in the initial sales message is akin to a real-life con man approaching you with vomit dribbling down his chin and reeking of stale whiskey, whispering that he's the CEO of a major company going incognito, and offering to sell you a hot insider stock tip. Some people will fall for it, but not many. As in any other field, successful spammers throw lots of crap at the wall just to see what sticks. Most campaigns are brief because they don't stick, and I have yet to see a "let's target dribbling morons" campaign persist. Well, OK, state lotteries do pretty well, but I doubt that seeing random words from a dictionary carries an addictive thrill for anyone.
From nobody at spamcop.net Wed Dec 17 18:23:48 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 17 18:23:51 2003 Subject: [Spambayes] How low can you go? In-Reply-To: Message-ID: An interesting aside to the message ageing proposal I made is that it would help fight what is being discussed in the "Spam of the Future" threads. It would do this by keeping the token databases current with the message stream, so that it would adapt as quickly as possible to the extraneous words used and then retire them after a time. Another implementation suggestion for using an approach like this with a train-on-everything scheme is to train only *after* the user has verified all the classifications. If we allow it to classify on the fly and it makes a mistake, a whole bunch of mistakes will likely follow. It's probably better to let the classifier do the best it can in its present form and then, after moving any misclassified messages into their appropriate folders, do an incremental training on all emails in a given list of folders. This will train only messages that have not been trained before, at least in the Outlook plug-in version. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From tim at fourstonesExpressions.com Wed Dec 17 18:34:41 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Wed Dec 17 18:34:48 2003 Subject: [Spambayes] RE: More "spam of the future" lately? In-Reply-To: References: Message-ID: On Wed, 17 Dec 2003 18:22:25 -0500, Tim Peters wrote: Putting random > crap in the initial sales message is akin to a real-life con man > approaching > you with vomit dribbling down his chin and reeking of stale whiskey, > whispering that he's the CEO of a major company going incognito, and > offering to sell you a hot insider stock tip. Some people will fall for > it, > but not many. Good one.
Let's augment URLs in unsure messages with "go ahead, click here if you're a sucker" -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From tim.one at comcast.net Wed Dec 17 19:13:58 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Dec 17 19:13:59 2003 Subject: [Spambayes] How low can you go? In-Reply-To: Message-ID: [Seth Goodman] > Does CRM114 use the number of trained ham and trained spam *messages* > as variables in its probability calculation? If not, then you > wouldn't expect that deleting infrequently used tokens would do much > damage. AFAIK, SpamBayes uses the trained message counts in the > probability calculation Yes. > and those becomes inaccurate if you delete individual tokens. No, it doesn't matter if that's *all* you do. Say I've trained on 243 ham, and 257 spam, total, and throw out the hapax 'bi:choose the'. That has no effect on the fact that the features I didn't throw out still came from training on 243 ham and 257 spam, total. The problem comes when untraining a message M. That reduces the count of total messages trained on, but if I threw away a hapax H from M previously, and H reappeared again later, it would be a mistake to reduce the category count on H when untraining M. There's another bullet we haven't bitten yet: saving a map of message id to an explicit list of all tokens produced by that message (Skip wants the inverse of that mapping for diagnostic purposes too).
Given that, training and untraining of individual messages could proceed smoothly despite intervening changes in tokenization details; expiring entire messages would be straightforward; and when expiring an individual feature, it would be enough to remove that feature from each msg->[feature] list it's in (then untraining on a msg later wouldn't *try* to decrement the per-feature count of any feature that had previously been expired individually and appeared in the msg at the time). That's all easy enough to do, but the database grows ever bigger. It would probably need reworking to start using "feature ids" (little integers) too, so that relatively big strings didn't have to get duplicated all over the database. From nobody at spamcop.net Wed Dec 17 20:41:30 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 17 20:41:32 2003 Subject: [Spambayes] How low can you go? In-Reply-To: Message-ID: [Tim Peters] > No, it doesn't matter if that's *all* you do. Say I've trained > on 243 ham, > and 257 spam, total, and throw out the hapax 'bi:choose the'. That has no > effect on that the features I didn't throw out still came from training on > 243 ham and 257 spam, total. OK, but there are still a couple of potential problems. 1) Let's say the discarded bi-gram occurs in a spam at a later date. Though it was only a hapax, it now contributes nothing. 2) Let's say we want to train on a spam with the discarded bi-gram. It was originally a hapax, so it should now have an occurrence count of two. After training, it again shows up as a hapax. This is a more significant problem. 3) Do we eventually reduce the occurrence count of a non-hapax token? If we do, we could eventually have none of the tokens from a trained message present but its message count will still be there. Unless we implement your token cross-reference as explained below, the message counts will eventually not be correct if we expire enough tokens. 
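The token cross-reference Tim describes above (a msg_id -> features map, Skip's inverse feature -> msg_ids map, and small-integer "feature ids") fits in one structure. This is a hypothetical Python sketch: TrainedMessageIndex and its methods are illustrative names, not SpamBayes APIs. It shows why untraining stays consistent even after a hapax was expired individually and later reappeared.

```python
from collections import defaultdict


class TrainedMessageIndex:
    """Hypothetical sketch: msg_id <-> feature maps using small-integer feature ids."""

    def __init__(self):
        self.feature_ids = {}                  # feature string -> small int id
        self.feature_names = []                # id -> feature string (for reports)
        self.msg_features = {}                 # msg_id -> (label, set of ids)
        self.feature_msgs = defaultdict(set)   # id -> msg_ids (the inverse map)
        self.counts = defaultdict(lambda: {"ham": 0, "spam": 0})
        self.nmessages = {"ham": 0, "spam": 0}

    def _intern(self, feature):
        # Store each string once; everything else refers to its integer id.
        fid = self.feature_ids.get(feature)
        if fid is None:
            fid = len(self.feature_names)
            self.feature_ids[feature] = fid
            self.feature_names.append(feature)
        return fid

    def train(self, msg_id, label, features):
        fids = {self._intern(f) for f in features}
        self.msg_features[msg_id] = (label, fids)
        for fid in fids:
            self.counts[fid][label] += 1
            self.feature_msgs[fid].add(msg_id)
        self.nmessages[label] += 1

    def untrain(self, msg_id):
        # Walk the saved id list instead of re-tokenizing, so a changed
        # tokenizer cannot break the count invariants.
        label, fids = self.msg_features.pop(msg_id)
        for fid in fids:
            self.counts[fid][label] -= 1
            self.feature_msgs[fid].discard(msg_id)
            if not self.feature_msgs[fid]:     # no trained message has it any more
                del self.feature_msgs[fid]
                del self.counts[fid]
        self.nmessages[label] -= 1

    def expire_feature(self, feature):
        # Remove the feature from every msg -> features list too, so a later
        # untrain never decrements a count that was already thrown away.
        fid = self.feature_ids[feature]
        for msg_id in self.feature_msgs.pop(fid, set()):
            self.msg_features[msg_id][1].discard(fid)
        self.counts.pop(fid, None)

    def messages_with(self, feature, label):
        # Diagnostic use: which trained `label` messages contain this feature?
        fid = self.feature_ids.get(feature)
        return sorted(m for m in self.feature_msgs.get(fid, ())
                      if self.msg_features[m][0] == label)
```

In the scenario Tim gives, the hapax 'bi:choose the' is expired from message M, reappears in a later spam, and M is then untrained: the later spam's count for that feature survives, because M's stored list no longer mentions it.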
If we don't expire a lot of tokens over the long run, why bother? > > The problem comes when untraining a message M. That reduces the count of > total messages trained on, but if I threw away a hapax H from M > previously, > and H reappeared again later, it would be a mistake to reduce the category > count on H during untraining M. Yup, and you have the solution below. > > There's another bullet we haven't bitten yet, saving a map of > message id to > an explicit list of all tokens produced by that message (Skip wants the > inverse of that mapping for diagnostic purposes too). Given > that, training > and untraining of individual messages could proceed smoothly despite > intervening changes in tokenization details; expiring entire > messages would > be straightforward; and when expiring an individual feature, it would be > enough to remove that feature from each msg->[feature] list it's in (then > untraining on a msg later wouldn't *try* to decrement the > per-feature count > of any feature that had previously been expired individually and > appeared in > the msg at the time). This definitely works. But why bother tracking, cross-referencing and expiring individual tokens when we can just expire whole messages, which is a lot simpler? It accomplishes the goal of keeping the token databases cleaned of excessive hapaxes and gradually expires non-hapax tokens, as well. There is also less need for reverse indexing of tokens to messages, since all messages and their tokens will eventually expire. However, if people need that feature, they need it. > > That's all easy enough to do, but the database grows ever bigger. > It would > probably need reworking to start using "feature ids" (little > integers) too, > so that relatively big strings didn't have to get duplicated all over the > database. No argument there. How about a 32-bit hash for any token whether unigram, bi-gram, etc.? 
The token database could then consist of an ordered list of 32-bit hashes paired with an occurrence count (16-bits would probably do it). That's only six bytes/token, and you could use your indexing method of choice, if any, to speed up the lookups. Similarly, if we implemented a message database with this method, each token in a message would only take up four bytes. The hash calculation costs something, but the smaller database size and quicker lookup time could make up for it. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From tim.one at comcast.net Thu Dec 18 00:08:39 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Dec 18 00:08:42 2003 Subject: [spambayes-dev] RE: [Spambayes] How low can you go? In-Reply-To: Message-ID: [Seth Goodman] > OK, but there are still a couple of potential problems. Oh, sure -- but testing is the only judge of what works here. > 1) Let's say the discarded bi-gram occurs in a spam at a later date. > Though it was only a hapax, it now contributes nothing. I doubt it matters. Most text classification systems (this field is more than 40 years old, BTW) ignore hapaxes entirely, and also ignore tokens that don't appear in at least *several* distinct training examples (see Paul Graham's essay, where he carried on that tradition). We don't ignore anything, because testing said it worked better not to ignore anything in this particular task. It wasn't a killer-strong improvement to pay attention to everything, but was a statistically significant win. Good enough. Since then, use in real life, unlike our randomized cross-validation testing, doesn't see messages "at random" at all: it sees them ordered in time. That appears to make a difference, and actually helps us overall. After some 16 months of watching this algorithm in various tests and in practice, I've identified only two clear, repeated effects of hapaxes: 1. 
Good: When a spam campaign begins, the hapaxes in its first example very often help to nail the upcoming variations in that campaign. People with small databases using mistake-based training see this dramatically, and it's very handy for them in real-life use. A similar effect helps on the ham side, when training (e.g.) on that once-per-month HTML newsletter from (say) American Century Investments, which looks very spammy the first time around. Because legit companies pay ad firms small fortunes to establish "brand identity", such newsletters are typically *stuffed* with hapaxes identifying the source. 2. Bad: Most spam campaigns fizzle out within a month. The hapaxes stick around, though. Sooner or later an unusual ham comes along that just happens to hit a large number of the leftover spam hapaxes, then serves as a "spectacular failure" example here. They're very rare, but very unsettling when they occur (well, likely *because* they're so rare for most people). > 2) Let's say we want to train on a spam with the discarded bi-gram. > It was originally a hapax, so it should now have an occurrence count > of two. After training, it again shows up as a hapax. This is a > more significant problem. Based on what evidence? Token spamprobs are guesses at best, and an estimated spamprob based on only one or two examples isn't even reliable to one significant digit. The difference between seeing something once or twice doesn't move a spamprob much, either. So I have to guess that this effect is so tiny it will be lost in estimation noise. In early experiments, the database stored more info, and the test framework was able to report which features were used *most* often in making a correct decision. Several times I took the few hundred "most valuable" features (based on a combination of how often they contributed to a correct decision, and their spamprob strength (distance, in either direction, from 0.5)), and threw them out of the database.
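Seth's question about trained message counts and Tim's remark that seeing a feature once or twice barely matters can both be checked against the per-feature estimate. The sketch below is modeled on the Robinson-style smoothed spamprob that SpamBayes computes; the parameter defaults (strength 0.45, unknown_prob 0.5) are assumptions for illustration. nham and nspam enter only as normalizers of the per-class ratios, so deleting one feature leaves every other feature's estimate unchanged.

```python
def spamprob(spamcount, hamcount, nspam, nham, strength=0.45, unknown_prob=0.5):
    """Smoothed spam probability of one feature from its training counts."""
    n = spamcount + hamcount
    if n == 0:
        return unknown_prob            # never seen: fall back to the prior
    # Normalize by how many messages of each class were trained on.
    spamratio = spamcount / (nspam or 1)
    hamratio = hamcount / (nham or 1)
    raw = spamratio / (spamratio + hamratio)
    # Shrink toward unknown_prob when the feature has few occurrences.
    return (strength * unknown_prob + n * raw) / (strength + n)
```

With Tim's 243 ham and 257 spam, a feature seen in one spam and no ham scores about 0.845; seen in two spam, about 0.908.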
An amazing (at the time) thing was that this didn't hurt performance -- if the classifier was blinded to what *were* its best clues, it found another set of clues that did just as well overall. Performance eventually deteriorated dramatically if this was done over and over again, but the system has already been shown to be very robust against losing even its best features. That's one reason I'm not worried about throwing away its least useful features (hapaxes have weak spamprobs, and hapaxes that haven't been *used* in scoring for N days may as well not have existed at all for the last N days -- and most hapaxes are like that, no matter how big N is). > 3) Do we eventually reduce the occurrence count of a non-hapax token? There are many possible schemes. Strongly storage-conscious schemes only save a byte or two for a count, and periodically shift all the counts right by 1 bit, to prevent overflow. That seems to work very well in systems that do it. I've already said here that I see the primary point of expiring hapaxes as being a means to reduce database size, and in the context of the much more storage-intensive mixed unigram/bigram scheme. Hapaxes can account for the bulk of the storage all by themselves (this isn't unique to spam filtering, btw -- across many kinds of computer text indexing systems, hapaxes typically account for about half the content), and most hapaxes are never seen again. I'm experimenting with a mixed unigram/bigram classifier right now. It's been trained on (just) 94 ham and 96 spam so far, but there are already 51,378 features in the database. 45,624 of them are hapaxes -- that's 89%! I could eliminate the rest of the database entirely, and not cut its size enough to care about. This is why picking specifically on hapaxes is a high-value proposition (high potential, low risk). > If we do, we could eventually have none of the tokens from a trained > message present but its message count will still be there. 
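Tim's point above that hapaxes unused in scoring for N days may as well not have existed, his earlier suggestion of a small integer day number for last-used, and the right-shift halving used by storage-conscious systems can be sketched as two maintenance passes. This is a hypothetical Python sketch: the [spamcount, hamcount, last_used_day] record layout and the function names are assumptions. The halving pass does not touch any per-class message totals, which a real design would have to reconcile.

```python
from datetime import date


def day_number(created, today):
    """Days since the database was created: a small int, cheap to store and compare."""
    return (today - created).days


def expire_stale_hapaxes(db, today_num, max_idle_days):
    """Drop hapaxes not used in scoring for more than max_idle_days days.

    db maps feature -> [spamcount, hamcount, last_used_day] (hypothetical
    layout). Returns how many features were removed.
    """
    stale = [feat for feat, (s, h, last) in db.items()
             if s + h == 1 and today_num - last > max_idle_days]
    for feat in stale:
        del db[feat]
    return len(stale)


def halve_counts(db):
    """Shift every count right one bit; features that drop to 0/0 are removed."""
    for feat in list(db):
        s, h, last = db[feat]
        s, h = s >> 1, h >> 1
        if s == 0 and h == 0:
            del db[feat]
        else:
            db[feat] = [s, h, last]
```

A single halving pass also removes every hapax, because 1 >> 1 is 0, so it is a much blunter tool than the idle-days test.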
Unless we > implement your token cross-reference as explained below, the message > counts will eventually not be correct if we expire enough tokens. I want to do expiration "correctly". But even if all the tokens from a message expire when the total message count is N, it still doesn't change that counts on tokens that remain were in fact derived from N messages, and so N remains the best possible thing to feed into the spamprob guesses. > If we don't expire a lot of tokens over the long run, why bother? I expect an enormous number of hapaxes to expire, in steady state essentially equaling the rate at which they're created by new messages. In the example above, 90% of the features created for me right now *are* hapaxes. I expect that to drop with more training, but for hapaxes to remain both the single biggest database consumer, and the least valuable tokens to retain. >> ... >> There's another bullet we haven't bitten yet, saving a map of >> message id to an explicit list of all tokens produced by that >> message (Skip wants the inverse of that mapping for diagnostic >> purposes too). Given that, training and untraining of individual >> messages could proceed smoothly despite intervening changes in >> tokenization details; expiring entire messages would be >> straightforward; and when expiring an individual feature, it would >> be enough to remove that feature from each msg->[feature] list it's >> in (then untraining on a msg later wouldn't *try* to decrement the >> per-feature count of any feature that had previously been expired >> individually and appeared in the msg at the time). > This definitely works. But why bother tracking, cross-referencing and > expiring individual tokens when we can just expire whole messages, > which is a lot simpler? I doubt that it's simpler at all, and you earlier today sketched quite an elaborate scheme for expiring different messages at different rates. 
That's got its share of tuning parameters (aka wild-ass guesses) too, showed every sign of being just the beginning of its brand of complication, and has no testing or experience to support it. We know a lot about the real-life effects of hapaxes now. BTW, the single worst thing you can do with a system of this type is train a message into the wrong category. Everyone does it eventually, and some people can't seem to help doing it often. Maybe that's a UI problem at heart -- I don't know, because I seem to be unusually resistant to it. It's happened to me too, though, and it can be hard to recover from. One sterling use for a feature -> msg_ids map is, as Skip noted, a way to find out *why* your latest spam was a false negative: look at the low-scoring features, then look at the messages with those features that were trained on as ham. This has an excellent shot at pinpointing mis-trained messages. That's difficult at best now, and is a real problem for some people. I've got gigabytes of unused disk space myself. Evolution of this system would also be served by saving an explicit msg_id -> features map. When we change tokenization to get a small win, sometimes the tokens originally added to a database by training on message M can no longer be reconstructed by re-tokenizing M (the tokenizer has changed! if it always returned exactly what it returned before the change, there wasn't much point to the change). Blindly untraining anyway can then violate database invariants, eventually manifesting as assertion errors and the need to retrain from scratch. The only clear and simple way to prevent this is to save a map from msg_id to the tokens it originally produced. Then untraining simply walks that list, and nothing can go wrong as a result. That's a bit subtle, so it takes some long-term experience to appreciate at a gut level. Of more immediate concern to most users is that only the obsessed *want* to save their spam. Most people want to throw spam away ASAP.
But if they do that, we currently have no way to expire any spam they ever trained on. Moving toward saving msg_ids <-> features maps solves that too and, with suitable reuse of little integers for feature ids, can store the relevant bits about trained messages in less space than it takes to save the original messages. Note that hapaxes would waste the most resources in this context too. >> That's all easy enough to do, but the database grows ever bigger. >> It would probably need reworking to start using "feature ids" >> (little integers) too, so that relatively big strings didn't have to >> get duplicated all over the database. > No argument there. How about a 32-bit hash for any token whether > unigram, bi-gram, etc.? The token database could then consist of an > ordered list of 32-bit hashes paired with an occurrence count > (16-bits would probably do it). That's only six bytes/token, and you > could use your indexing method of choice, if any, to speed up the > lookups. We ran experiments on that before, and the results were dreadful. 32-bit hashes have far too high a collision rate on a sizable database (don't forget the Birthday Paradox here!), confusing ham with spam in highly entertaining ways (provided you're just experimenting and don't really care how well it does). An MD5 or SHA-1 hash would be fine, but then it's up to 16 or 20 bytes per feature, and most of the strings we store in the current pure unigram scheme are shorter than that. A 64-bit hash would probably be OK. Another consequence of using hash codes, widely hated among this project's developers, is that mining the database for clues becomes useless. "Hey, hash code 45485448 is your strongest spam clue!" "Oh -- no wonder, then." Storing the actual feature strings as plainly as possible is extremely helpful for development, debugging, and research. > Similarly, if we implemented a message database with this method, each > token in a message would only take up four bytes.
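Seth's 6-byte record and Tim's Birthday Paradox warning can be put side by side. The sketch uses Python's struct module for the record layout and zlib.crc32 as a stand-in 32-bit hash (an assumption; Seth didn't name a hash), plus the standard birthday approximation n(n-1)/2 / 2^bits for the expected number of colliding feature pairs.

```python
import math
import struct
import zlib

# Seth's proposed record: 32-bit token hash + 16-bit occurrence count.
RECORD = struct.Struct("<IH")   # little-endian, no padding: 4 + 2 = 6 bytes


def pack_record(token, count):
    # zlib.crc32 is only a stand-in for "some 32-bit hash"; the count is
    # clamped so it fits in 16 bits.
    h = zlib.crc32(token.encode("utf-8")) & 0xFFFFFFFF
    return RECORD.pack(h, min(count, 0xFFFF))


def expected_colliding_pairs(n_features, hash_bits):
    """Birthday approximation: expected number of feature pairs sharing a hash."""
    return n_features * (n_features - 1) / 2 / 2 ** hash_bits


def prob_any_collision(n_features, hash_bits):
    """Approximate probability that at least one pair of features collides."""
    return -math.expm1(-expected_colliding_pairs(n_features, hash_bits))
```

At Tim's 51,378 features a 32-bit hash expects roughly 0.3 colliding pairs, but at a million features the figure is about 116, and each collision silently merges the ham/spam counts of two unrelated features. With 64 bits the million-feature figure drops to about 3e-8.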
The hash calculation > costs something, but the smaller database size and quicker lookup time > could make up for it. We're not going to abandon plain strings, because they're far too useful and loved in various reports intended for human consumption. Adding feature_id <-> feature_string maps would allow for effective compression of message storage. From coulters at netspace.net.au Thu Dec 18 00:19:31 2003 From: coulters at netspace.net.au (John Coulter) Date: Thu Dec 18 00:20:24 2003 Subject: [Spambayes] delete Message-ID: <3FE138E3.2060403@netspace.net.au> Please delete my name from the Spambayes mailing list. Thank you. John Coulter From tim at fourstonesExpressions.com Thu Dec 18 00:28:46 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Thu Dec 18 00:28:54 2003 Subject: [Spambayes] delete In-Reply-To: <3FE138E3.2060403@netspace.net.au> References: <3FE138E3.2060403@netspace.net.au> Message-ID: The only way to do this is for you to do it yourself, at http://mail.python.org/mailman/listinfo/spambayes On Thu, 18 Dec 2003 15:49:31 +1030, John Coulter wrote: > Please delete my name from the Spambayes mailing list. > Thank you. > John Coulter > > > _______________________________________________ > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes > Check the FAQ before asking: http://spambayes.sf.net/faq.html > -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From tameyer at ihug.co.nz Thu Dec 18 00:29:25 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 18 00:29:27 2003 Subject: [Spambayes] delete In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C0FE3@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A0E@its-xchg4.massey.ac.nz> > Please delete my name from the Spambayes mailing list. > Thank you.
> John Coulter Look at the end of any message received via the list: > Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Go there and unsubscribe yourself. Or, look in the headers of any message received via the list: > List-Unsubscribe: , > =Tony Meyer From pcum7668 at bigpond.net.au Thu Dec 18 01:54:44 2003 From: pcum7668 at bigpond.net.au (Peter Cummins) Date: Thu Dec 18 01:55:10 2003 Subject: [Spambayes] (no subject) Message-ID: How do you uninstall?? All new mail just disappears?? P.A.Cummins --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.552 / Virus Database: 344 - Release Date: 15-Dec-03 From tameyer at ihug.co.nz Thu Dec 18 02:02:39 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 18 02:02:44 2003 Subject: [Spambayes] (no subject) In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C1014@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677779@its-xchg4.massey.ac.nz> > How do you uninstall?? FAQ 3.14: > All new mail just disappears?? FAQ 3.12: =Tony Meyer From tameyer at ihug.co.nz Thu Dec 18 03:08:43 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 18 03:08:52 2003 Subject: [Spambayes] How low can you go? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C09F0@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A0F@its-xchg4.massey.ac.nz> [Tim] > I see that it's a cruder approximation to the suggested > scoring algorithm (which I implemented at one time). [...] > It's harder to code a tiling method; Exactly . > BTW, it should *not* be necessary to increase > max_discriminators, and doing so can create subtle numeric > problems in the inverse chi-squared function. 
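One plausible way that more discriminators can cause numeric trouble is shown below. It is an illustration only, not necessarily the failure mode Tim had in mind. The textbook finite series for the chi-squared tail with even degrees of freedom starts from exp(-x2/2), which underflows to 0.0 in IEEE doubles once x2/2 exceeds roughly 745. With n clues the statistic has v = 2n degrees of freedom, and x2 tends to grow with n, so enough clues can zero out the whole sum even when the true tail probability is near 0.5. The function names are hypothetical. The second version sums the same terms in log space.

```python
import math


def chi2q_naive(x2, v):
    """P(chi-squared with even v degrees of freedom >= x2), via the finite series."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)      # underflows to 0.0 once m exceeds ~745
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2q_logspace(x2, v):
    """Same series, but each term is formed in log space so nothing underflows early."""
    assert v % 2 == 0
    m = x2 / 2.0
    if m == 0.0:
        return 1.0
    logs = [-m + i * math.log(m) - math.lgamma(i + 1) for i in range(v // 2)]
    peak = max(logs)                 # log-sum-exp trick
    total = sum(math.exp(t - peak) for t in logs)
    return min(math.exp(peak + math.log(total)), 1.0)
```

For example, 800 clues with spamprobs around 1/e give x2 = 1600 on v = 1600 degrees of freedom. chi2q_naive returns 0.0 there, while chi2q_logspace returns a value close to 0.5.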
> Without this option, in an N-token message, N tokens were > candidates for scoring; with this option, there are still > exactly N candidates for scoring; with a true tiling > implementation, there are no more than N > candidates for scoring (and usually less than N). So the comment in here: Is only referring to cases where both unigrams *and* bigrams are used, rather than when the tiling (or crude approximation) is used? I did get improvements with a higher max_discriminators: Is that likely to be just a side-effect of the crudeness of my approximation? =Tony Meyer From bob at 1776.com Wed Dec 17 17:26:47 2003 From: bob at 1776.com (Robert K. Coe) Date: Thu Dec 18 07:50:55 2003 Subject: [Spambayes] RE: How low can you go? In-Reply-To: Message-ID: <000001c3c4ec$dc90a9e0$6601a8c0@CambridgeMA.gov> > From: Seth Goodman [mailto:nobody@spamcop.net] > Sent: Wednesday, December 17, 2003 2:01 PM > To: spambayes@python.org; spambayes-dev@python.org > Subject: RE: [Spambayes] How low can you go? > > > Does CRM114 use the number of trained ham and trained spam *messages* > as variables in its probability calculation? If not, then you wouldn't > expect that deleting infrequently used tokens would do much damage. > AFAIK, SpamBayes uses the trained message counts in the probability > calculation and those become inaccurate if you delete individual tokens. If you delete, say, 5% of the tokens in the database, reduce the message count by 5% as well. Bob From bob at 1776.com Wed Dec 17 17:53:12 2003 From: bob at 1776.com (Robert K. Coe) Date: Thu Dec 18 07:51:03 2003 Subject: [Spambayes] RE: Is this a sign of future problems ? In-Reply-To: Message-ID: <000201c3c4f0$8d2f5370$6601a8c0@CambridgeMA.gov> This may be beside the point, but even the most rudimentary grammatical analysis would nail that message immediately.
That might increase the spammers' problem by a quantum or two, since the message was obviously generated by random selection from a dictionary, not excerpted from a piece of actual text. Bob > -----Original Message----- > From: Tim Peters [mailto:tim.one@comcast.net] > Sent: Wednesday, December 17, 2003 11:43 AM > To: tdickenson@geminidataloggers.com; spambayes@python.org > Subject: RE: [Spambayes] Is this a sign of future problems ? > > > Sorry, I don't follow this, unless you're presuming a TOE regime so careless > that the user doesn't even correct classification mistakes. If a user > trains on spam messages as ham, then, sure, all sorts of horrid results can > follow. If I get a message like (this is easy, cuz I just got one exactly > like this ): > > cake paleozoic immortal couscous devon advocacy > agriculture arbitrage couple census deceive psi dana > cremate ceremony physiotherapist haunch commissary transpacific frigid > bryophyta > > Free CableTV!No more pay!& > > stegosaurus bowfin egret throughout damsel wilful cometary dreamt > minsk throughput chastity [... on & on & on ...] > > and train on it as ham, there's something wrong with *me*. From bob at 1776.com Wed Dec 17 17:39:23 2003 From: bob at 1776.com (Robert K. Coe) Date: Thu Dec 18 07:51:09 2003 Subject: [Spambayes] RE: SpamBayes for 500.000 users In-Reply-To: <16352.31325.84895.242716@montanaro.dyndns.org> Message-ID: <000101c3c4ee$9ee675f0$6601a8c0@CambridgeMA.gov> If you genuinely have to go to a server-based solution, you're probably dealing with a user community that's abnormally spam-averse (e.g., it may include lots of small children) or whose management is unusually sensitive to the time consumed by users in dealing with spam. In either case, you won't be able to populate your database with contributions from users. I think any server-based solution that depends on user contributions is probably doomed. 
Bob > -----Original Message----- > From: Skip Montanaro [mailto:skip@pobox.com] > Sent: Wednesday, December 17, 2003 10:47 AM > To: Dreas van Donselaar > Cc: spambayes@python.org > Subject: RE: [Spambayes] SpamBayes for 500.000 users > > > > Dreas> So does this mean that Bayesian won't be effective for such a > Dreas> large user database? > > Dunno. You'll definitely have to do some testing. It might be sufficient > to do one or more of the following: > > * suppress tokenizing of headers which would generate such a strong > user-oriented bias > > * make it easy for your users to submit mail for inclusion in the > database > > The first is a surmountable problem. The key option to tweak is > address_headers in the Tokenizer section. The second will be more > difficult. In my limited multi-user experience: > > * people only submit spam > > * people (understandably) never submit anything very sensitive From glennw at mrhtech.com Thu Dec 18 08:17:57 2003 From: glennw at mrhtech.com (Glenn Welker) Date: Thu Dec 18 08:18:00 2003 Subject: [Spambayes] Outlook Addin hang fix Message-ID: <6408F62AAFFCE54890943C06EF9A8DB747664C@zeus.festech.com> I have noticed that since the Addin has been installed it causes Outlook to hang occasionally when closing. This is most often caused by a general lack of events from Outlook. Most developers have resorted to capturing the close event of the explorer rather than the application. This event is much more predictable. If you need any help I am more than happy to look at the code. It appears as though the Outlook addin is binary only, so I was unable to check whether the above scenario is true. Cheers. Keep up the good work. Sincerely, Glenn Welker Director of Product Development MRH Technology Group -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031218/30b6c62f/attachment.html From rcoe at CambridgeMA.GOV Thu Dec 18 08:36:46 2003 From: rcoe at CambridgeMA.GOV (Coe, Bob) Date: Thu Dec 18 08:36:51 2003 Subject: [Spambayes] RE: Outlook Addin hang fix Message-ID: My experience has been that the Explorer hangs on closing more often than Outlook does (with or without the addin). Is that irrelevant to this proposal? Bob -----Original Message----- From: Glenn Welker [mailto:glennw@mrhtech.com] Sent: Thursday, December 18, 2003 8:18 AM To: spambayes@python.org Subject: [Spambayes] Outlook Addin hang fix I have noticed that since the Addin has been installed it causes Outlook to hang occasionally when closing. This is most often caused by a general lack of events from Outlook. Most developers have resorted to capturing the close event of the explorer rather than the application. This event is much more predictable. If you need any help I am more than happy to look at the code. It appears as though the Outlook addin is binary only, so I was unable to check whether the above scenario is true. Cheers. Keep up the good work. Sincerely, Glenn Welker Director of Product Development MRH Technology Group -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031218/7b2469ca/attachment-0001.html From skip at pobox.com Thu Dec 18 08:41:11 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Dec 18 08:41:05 2003 Subject: [spambayes-dev] RE: [Spambayes] How low can you go? In-Reply-To: References: Message-ID: <16353.44663.34193.301968@montanaro.dyndns.org> Tim> I'm experimenting with a mixed unigram/bigram classifier right now. Tim> It's been trained on (just) 94 ham and 96 spam so far, but there Tim> are already 51,378 features in the database. 45,624 of them are Tim> hapaxes -- that's 89%!
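The hapax percentages in this thread are simple ratios: 45,624 / 51,378 ~= 0.888, and Skip's figures give 158,116 / 198,747 ~= 0.796. Here is a minimal sketch of measuring that fraction and expiring stale hapaxes. It assumes a plain dict of token -> (hamcount, spamcount) and a hypothetical per-token last-seen timestamp, and does not claim either matches SpamBayes's actual storage:

```python
def hapax_stats(counts):
    """Return (features, hapaxes, hapax_fraction) for a token -> (ham, spam) dict."""
    total = len(counts)
    hapaxes = sum(1 for ham, spam in counts.values() if ham + spam == 1)
    return total, hapaxes, (hapaxes / total if total else 0.0)


def expire_hapaxes(counts, last_seen, now, max_age):
    """Delete hapaxes whose (hypothetical) last-seen time is older than max_age.

    Tokens with no recorded timestamp are kept. The trained-message counts
    (nham/nspam) are deliberately not touched; only per-token entries go.
    """
    stale = [tok for tok, (ham, spam) in counts.items()
             if ham + spam == 1 and now - last_seen.get(tok, now) > max_age]
    for tok in stale:
        del counts[tok]
        last_seen.pop(tok, None)
    return len(stale)
```

Leaving the message counts alone is exactly the point Seth questions later in the thread, when he argues for expiring whole messages instead of individual tokens.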
Late yesterday afternoon I tweaked my procmailrc file to automatically train on everything which scored as ham or spam. I awoke this morning to a database with 489 spam, 600 ham and 198,747 features, 158,116 of which were hapaxes (80%). At the same time I moved my ham/spam thresholds closer to 0 and 1 to minimize the amount of retraining necessary to counteract false positives and false negatives. (It's kind of a pain because I'm also saving the messages I train on, so I have to rummage around in a Unix mbox to find incorrectly trained messages.) I train unsures by hand. Still only 16 unsures overnight, but my database is up to 10.5MB, so training and scoring time is on the rise. Bringing it back to this topic, hapax expiration seems like both a worthwhile step to take from space/time considerations, and even less likely to produce problems because I'm training on everything I see. Now if I could only test this setup easily without a huge time investment. Perhaps a few more Emacs keybindings are in order. Tim> BTW, the single worst thing you can do with a system of this type Tim> is train a message into the wrong category. Everyone does it Tim> eventually, and some people can't seem to help but doing it often. :-) Skip From gerrit at nl.linux.org Thu Dec 18 09:00:16 2003 From: gerrit at nl.linux.org (Gerrit Holl) Date: Thu Dec 18 09:01:14 2003 Subject: [Spambayes] More "spam of the future" lately? In-Reply-To: References: Message-ID: <20031218140016.GA4091@nl.linux.org> Michael N. Nitabach wrote: > What makes you say that? I have my certain-spam cutoff at .30, and my uncertain at .01. My training database has about 8000 hams and 3000 spams. I have only ever received ten hams that scored over .01, and only one over .20. Heh... I have my ham-cutoff at .30 and my spam at .99. Still, 9/10 unsure messages are ham. But I have a database with... let's see... 28 spam, 34 ham. Gerrit. -- 265.
If a herdsman, to whose care cattle or sheep have been entrusted, be guilty of fraud and make false returns of the natural increase, or sell them for money, then shall he be convicted and pay the owner ten times the loss. -- 1780 BC, Hammurabi, Code of Law -- Asperger's Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/english/ From hutch120 at btopenworld.com Thu Dec 18 11:42:06 2003 From: hutch120 at btopenworld.com (Steve Hutchins) Date: Thu Dec 18 11:41:01 2003 Subject: [Spambayes] Spambayes Message-ID: Does this work with Outlook 97? Steve Hutchins Minerva Computer Services Orchard House Potters Bar Herts EN6 3AX England 01707 60700 hutch@minerva-cs.co.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031218/29f70683/attachment.html From rmalayter at bai.org Thu Dec 18 12:09:21 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Thu Dec 18 12:09:24 2003 Subject: [Spambayes] Spambayes Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A7526E@cliff.bai.org> [Steve Hutchins] > Does this work with Outlook 97? Unfortunately, *nothing* works with Outlook 97 anymore. Outlook 97 was released in mid-1996. SpamBayes requires Outlook 2000 or newer. You're using software which is more than 7 years old, which is an eternity in the computer world. That's not unlike using a car that's more than 50 years old, and expecting new parts to fit it. It just won't happen. I'd strongly suggest an upgrade of some sort, if only for security's sake. If you don't want to pay for a new version from Microsoft, there are lots of free email/personal organizers for Windows out there to choose from. Some of them even import Outlook files, and would work with the POP3/IMAP version of SpamBayes. 
Regards, Ryan From kennypitt at hotmail.com Thu Dec 18 12:22:29 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Thu Dec 18 12:23:05 2003 Subject: [Spambayes] Spambayes In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01A7526E@cliff.bai.org> Message-ID: Ryan Malayter wrote: > [Steve Hutchins] >> Does this work with Outlook 97? > > Unfortunately, *nothing* works with Outlook 97 anymore. Outlook 97 was > released in mid-1996. SpamBayes requires Outlook 2000 or newer. The Outlook add-in requires Outlook 2000 or greater. The sb_server POP3 proxy will work with any POP3 e-mail account, regardless of the client, so should work fine with Outlook 97. The drawbacks are: * It is not integrated into the client. * It takes more work to set it up (for now, the next release should remedy this somewhat). You can find more info about sb_server here: http://spambayes.sourceforge.net/applications.html#sb_server -- Kenny Pitt From uuhhuu at athenet.net Thu Dec 18 12:24:03 2003 From: uuhhuu at athenet.net (uuhhuu@athenet.net) Date: Thu Dec 18 12:24:14 2003 Subject: [Spambayes] Spambayes In-Reply-To: References: Message-ID: <57679.67.36.83.25.1071768243.squirrel@webmail.athenet.net> I realize you've just gotten an answer to your question that can pretty much be summed up as "No!" but I believe (and I'm sure I will be corrected if wrong) that the answer is more properly, "No, but..." If you are using Outlook 97 with an Exchange server (an unlikely circumstance anymore) then probably Spambayes would not work. This is most likely in a corporate environment with its own mail server(s) on its own network. If you are using Outlook 97 to communicate with an ISP's mail servers via POP3/SMTP or even IMAP, then YES, you can use SpamBayes. You just can't use the Outlook plugin and the fancy buttons and interface. You'll be using the pop3proxy (now called sb_server.py) just like I do with Eudora Pro, or Pegasus mail, or any other POP3 mail client program.
Hopefully you are using Windows 2000, or XP, or NT, where the sb_server proxy can run as a service. Much neater that way. I don't know much about customizing OL97 to run over the proxy, but I'm sure there's a way to do it. -T > Does this work with Outlook 97? > > Steve Hutchins > Minerva Computer Services > Orchard House > Potters Bar > Herts > EN6 3AX > England > > 01707 60700 > hutch@minerva-cs.co.uk > _______________________________________________ > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes > Check the FAQ before asking: http://spambayes.sf.net/faq.html From aaronpc at core-inc.com Thu Dec 18 13:38:11 2003 From: aaronpc at core-inc.com (Aaron P. Crowell) Date: Thu Dec 18 13:38:15 2003 Subject: [Spambayes] SPAM FOLDER QUESTION - Deleting emails Message-ID: <7B3FBEA0529D1040BF9E465EACBC82AE17183F@coresrvr2000.core-inc.com> I have close to 1000 emails in this folder. If I deleted some / most, will this adversely affect the filter / scanning capabilities of SpamBayes? Thanks! From kennypitt at hotmail.com Thu Dec 18 14:15:19 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Thu Dec 18 14:15:57 2003 Subject: [Spambayes] SPAM FOLDER QUESTION - Deleting emails In-Reply-To: <7B3FBEA0529D1040BF9E465EACBC82AE17183F@coresrvr2000.core-inc.com> Message-ID: Aaron P. Crowell wrote: > I have close to 1000 emails in this folder. If I deleted some / most, > will this adversely affect the filter / scanning capabilities of > SpamBayes? Thanks! You can delete the contents of the spam folder at any time without affecting the filtering accuracy. SpamBayes does not use existing e-mails in the spam folder when filtering. It collects statistics from the messages as you train on them with the toolbar (I assume you're using Outlook), and stores them in its own database. The saved messages only become important if you want to "re-train" SpamBayes.
You can either retrain from scratch on new messages that you receive, or you can point SpamBayes at some existing messages to train from. There are several cases that might lead you to re-train. * You've made some training mistakes and your accuracy is suffering. * You've trained a lot more of one type of message than the other, which will also end up affecting your accuracy. * Your training data somehow gets lost or corrupted, and you are forced to retrain. In any case, saving the most recent 100 or so messages of each type should be plenty. -- Kenny Pitt From D.Kindred at telesciences.com Thu Dec 18 15:15:50 2003 From: D.Kindred at telesciences.com (D.Kindred@telesciences.com) Date: Thu Dec 18 15:15:54 2003 Subject: [Spambayes] Changing the icons in the Outlook Add-In Message-ID: <8ED1A9EA2B9AAD4EBAAD78EB1F817EC50EB3F1@s-tnj-23-0018.tlsi.corp.4tel.no> After using the Outlook Add-In myself for a while I've started setting it up for those users who get a lot of junk. So far most of the comments have been positive. The one unusual complaint is from a user who doesn't like having the "mean grouchy unhappy" face staring at her all the time. The "happy" face is okay, but you only get to see that in a spam folder. Is there a way for a user to change the icon? -- David L. Kindred Unix Systems & Network Administrator Telesciences, Inc. From jon at intelligent-design.net Thu Dec 18 15:18:49 2003 From: jon at intelligent-design.net (Jon A. Pastor) Date: Thu Dec 18 15:19:31 2003 Subject: [Spambayes] Performance black hole Message-ID: <019901c3c5a4$26baf180$727ba8c0@cc893633b> Folks- I installed and configured, got everything functioning just fine -- except when I tried to do anything at all, after training on a HAM mailbox, performance went into the toilet. Python was consuming 75-95% of my cycles just trying to get to the SpamBayes local home page -- never mind trying to train. I'm disappointed, but not inclined to spend a lot of time debugging this situation.
If you know of any reason, off the tops of your heads, why this might be happening, let me know; otherwise, nice system but I can't afford the overhead. Also, the proxy seems to have aborted at one point without any notice; all that happened is that the system tray icon vanished, and I got "no socket" errors from my mail client for all three of my accounts. -Jon -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031218/16a93b4c/attachment.html From richie at entrian.com Thu Dec 18 16:25:02 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Dec 18 16:25:12 2003 Subject: [Spambayes] Performance black hole In-Reply-To: <019901c3c5a4$26baf180$727ba8c0@cc893633b> References: <019901c3c5a4$26baf180$727ba8c0@cc893633b> Message-ID: <2e64uvghuqqbfaj2qjrcertdgah7db6ie0@4ax.com> Hi Jon, > If you know of any reason, off the tops of your heads, why > this might be happening, let me know; otherwise, nice system > but I can't afford the overhead. I'm afraid I don't know what could be causing that. What I will say is that it's not normal - SpamBayes shouldn't have much of an impact on your system performance at all. It's small and light, especially when all it's doing is serving up its web interface. When training or classifying a large number of messages, and you're on a fast network so bandwidth isn't the limiting factor, then it might soak your processor a bit, but I've never seen it significantly impact the OS or other processes, whatever it's doing. -- Richie Hindle richie@entrian.com From tameyer at ihug.co.nz Thu Dec 18 17:21:03 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 18 17:21:18 2003 Subject: [Spambayes] Outlook Addin hang fix In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C10CD@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467777B@its-xchg4.massey.ac.nz> > If you need any help I am more than happy to > look at the code. 
It appears as though the Outlook > addin is binary only, so I was unable to check whether the > above scenario is true. The source isn't included in the binary installer, no, but you can get it in any of the source releases (available from the same download page). 1.0a7 is the most up-to-date, and corresponds reasonably closely with 008.1. =Tony Meyer From tim.one at comcast.net Thu Dec 18 17:41:35 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Dec 18 17:41:38 2003 Subject: [Spambayes] Changing the icons in the Outlook Add-In In-Reply-To: <8ED1A9EA2B9AAD4EBAAD78EB1F817EC50EB3F1@s-tnj-23-0018.tlsi.corp.4tel.no> Message-ID: [D.Kindred@telesciences.com] > After using the Outlook Add-In myself for a while I've started > setting it up for those users who get a lot of junk. So far most of > the comments have been positive. > > The one unusual complaint is from a user who doesn't like having the > "mean grouchy unhappy" face staring at her all the time. Good for her! Most people think we put the icon on the button just to make it easy to find, and maybe provoke a smile. She's the first one to figure out it's actually staring at her, and really doesn't like her. > The "happy" face is okay, but you only get to see that in a spam folder. > Is there a way for a user to change the icon? Many, but that's a builtin function of Outlook, so you don't need us for this. Type "change button image" at the Office Assistant when in Outlook, and it will lead you to the relevant docs. You can remove the icon entirely, replace it with the happy face, replace it with a number of other silly icons Outlook provides, or even edit the icon pixel by pixel, all within Outlook. The last time I tried this, the next time I started Outlook a box popped up asking whether I accepted Microsoft's End User License Agreement for Outlook. Say "yes".
From eliot at isogen.com Thu Dec 18 18:37:33 2003 From: eliot at isogen.com (Eliot Kimber) Date: Thu Dec 18 18:38:46 2003 Subject: [Spambayes] Performance black hole In-Reply-To: <019901c3c5a4$26baf180$727ba8c0@cc893633b> References: <019901c3c5a4$26baf180$727ba8c0@cc893633b> Message-ID: <3FE23A3D.301@isogen.com> Jon A. Pastor wrote: > Folks- > > I installed and configured, got everything functioning just fine -- > except when I tried to do anything at all, after training on a HAM > mailbox, performance went into the toilet. Python was consuming > 75-95% of my cycles just trying to get to the SpamBayes local home > page -- never mind trying to train. This is almost certainly the problem I had. The problem is the number of messages trained. I had trained several thousand. The solution is to delete your training database and retrain with a much smaller set of ham and spam messages, say 100-200 of each. Cheers, Eliot -- W. Eliot Kimber Innodata Isogen eliot@isogen.com www.isogen.com From nobody at spamcop.net Thu Dec 18 18:41:04 2003 From: nobody at spamcop.net (Seth Goodman) Date: Thu Dec 18 18:41:09 2003 Subject: [spambayes-dev] RE: [Spambayes] How low can you go? In-Reply-To: Message-ID: Tim, Thanks for taking the time to construct such a complete set of answers. I learned a lot from it and I assume other list readers did as well. > > [Seth Goodman] > > If we do, we could eventually have none of the tokens from a trained > > message present but its message count will still be there. Unless we > > implement your token cross-reference as explained below, the message > > counts will eventually not be correct if we expire enough tokens. > > [Tim Peters] > I want to do expiration "correctly". But even if all the tokens from a > message expire when the total message count is N, it still doesn't change > that counts on tokens that remain were in fact derived from N > messages, and > so N remains the best possible thing to feed into the spamprob guesses. Not really.
If you decrement all the token counts from a trained message, the database is in the exact same state as it was before you trained on that message (ignoring subsequent messages trained). At that point, the trained message count was N-1, so that is the best thing to use for the probability calculation rather than N. The message count will keep increasing as you train new messages but the token database will eventually level off. That suggests that the trained message counts will become too large as time goes on. If you only expire hapaxes, perhaps the incorrect message count is a technicality and won't have a significant effect on the spam probabilities. But unless you expire non-hapaxes as well, the token database can't track a changing message stream very well. Once you start expiring non-hapax tokens (is there a name for these?), my guess is that you can no longer ignore the incorrect message count issue. So how _do_ you do expiration "correctly" if not by whole messages? > >> [Tim Peters] > >> ... > >> There's another bullet we haven't bitten yet, saving a map of > >> message id to an explicit list of all tokens produced by that > >> message (Skip wants the inverse of that mapping for diagnostic > >> purposes too). Given that, training and untraining of individual > >> messages could proceed smoothly despite intervening changes in > >> tokenization details; expiring entire messages would be > >> straightforward; and when expiring an individual feature, it would > >> be enough to remove that feature from each msg->[feature] list it's > >> in (then untraining on a msg later wouldn't *try* to decrement the > >> per-feature count of any feature that had previously been expired > >> individually and appeared in the msg at the time). > > > [Seth Goodman] > > This definitely works. But why bother tracking, cross-referencing and > > expiring individual tokens when we can just expire whole messages, > > which is a lot simpler? 
> > [Tim Peters] > I doubt that it's simpler at all, and you earlier today sketched quite an > elaborate scheme for expiring different messages at different > rates. That's > got its share of tuning parameters (aka wild-ass guesses ) > too, showed > every sign of being just the beginning of its brand of > complication, and has > no testing or experience to support it. We know a lot about the real-life > effects of hapaxes now. Offhand, adding a single timestamp per message at training time sounds easier than tracking the last time seen for every token in the database. As far as the "elaborate" scheme I suggested for variable expiration times, all that's involved is changing the message timestamp before storing it. Since you don't have anything like that now, you can just ignore that idea and the extra parameter that goes with it. BTW, that parameter value is not just a wild-ass guess, it's a SWAG (sophisticated wild-ass guess), and I don't like them any better than you do :) Either way, rather than frequently searching for expired tokens (in a very long list), you would only do token expiration when you have to train a new message. At that point, you find the oldest trained message (from a much shorter list) and untrain it. The extra complication is storing the token list with each message ID plus its training timestamp. That doesn't sound big compared to cross referencing every token to every message it appeared in. They're certainly not mutually exclusive and you later made a good argument for having this extra information anyway. > [Tim Peters] > BTW, the single worst thing you can do with a system of this type > is train a > message into the wrong category. Everyone does it eventually, and some > people can't seem to help but doing it often. Maybe that's a UI > problem at > heart -- I don't know, because I seem to be unusually resistant > to it. It's I agree completely. This was an important motivation for expiring a whole message at a time. 
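Seth's proposal can be sketched roughly as follows. The sketch assumes a fixed cap on the number of trained messages and a caller-supplied training timestamp, and ExpiringTrainer and its methods are hypothetical names, not SpamBayes code. Each trained message keeps its own token set, and a min-heap orders messages by training time. Training past the cap untrains the oldest message, so token counts and the ham/spam message counts drop together:

```python
import heapq
from collections import Counter


class ExpiringTrainer:
    """Toy trainer that keeps at most max_msgs messages, expiring the oldest whole message."""

    def __init__(self, max_msgs):
        self.max_msgs = max_msgs
        self.nham = 0
        self.nspam = 0
        self.ham_counts = Counter()
        self.spam_counts = Counter()
        self.by_id = {}    # msg_id -> (is_spam, frozenset of tokens)
        self.heap = []     # (trained_at, msg_id), oldest on top

    def _apply(self, toks, is_spam, delta):
        counts = self.spam_counts if is_spam else self.ham_counts
        for tok in toks:
            counts[tok] += delta
            if counts[tok] <= 0:
                del counts[tok]
        if is_spam:
            self.nspam += delta
        else:
            self.nham += delta

    def train(self, msg_id, tokens, is_spam, trained_at):
        if msg_id in self.by_id:
            raise ValueError("message already trained: %r" % (msg_id,))
        toks = frozenset(tokens)
        self.by_id[msg_id] = (is_spam, toks)
        heapq.heappush(self.heap, (trained_at, msg_id))
        self._apply(toks, is_spam, +1)
        # Expire only when something new is trained, as proposed.
        while len(self.by_id) > self.max_msgs:
            self.expire_oldest()

    def expire_oldest(self):
        _, msg_id = heapq.heappop(self.heap)
        is_spam, toks = self.by_id.pop(msg_id)
        self._apply(toks, is_spam, -1)   # token counts and message count fall together
        return msg_id
```

The variable-expiration idea would amount to passing an adjusted trained_at. A mis-trained message also ages out on its own once enough newer messages arrive.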
Training mistakes would eventually drop out of the database without user intervention. Not that a tool to help track down training mistakes wouldn't be great, but a "casual" user could still make occasional mistakes and the system would recover by itself. > [Tim Peters] > happened to me too, though, and it can be hard to recover. One > sterling use > for a feature -> msg_ids map is, as Skip noted, a way to find out > *why* your > latest spam was a false negative: look at the low-scoring features, then > look at the messages with those features that were trained on as > ham. This > has an excellent shot at pinpointing mis-trained messages. > That's difficult > at best now, and is a real problem for some people. I've got gigabytes of > unused disk space myself. No argument there, it's a great feature for problem-solving. > [Tim Peters] > Evolution of this system would also be served by saving an > explicit msg_id -> > features map. When we change tokenization to get a small win, > sometimes the > tokens originally added to a database by training on message M > can no longer > be reconstructed by re-tokenizing M (the tokenizer has changed! if it > always returned exactly what it returned before the change, there wasn't > much point to the change). Blindly untraining anyway can violate > database invariants then, eventually manifesting as assertion > errors and the > need to retrain from scratch. The only clear and simple way to > prevent this > is to save a map from msg_id to the tokens it originally produced. Then > untraining simply walks that list, and nothing can go wrong as a result. I agree completely and that's why I suggested saving the token list with each message. Your feature_ID scheme makes it practical. > [Tim Peters] > That's a bit subtle, so takes some long-term experience to appreciate at a > gut level. Of more immediate concern to most users is that only the > obsessed *want* to save their spam. Most people want to throw spam away > ASAP.
But, if they do that, we currently have no way to expire any spam > they ever trained on. Moving toward saving msg_ids <-> features > maps solves > that too, and with suitable reuse of little integers for feature ids can > store the relevant bits about trained messages in less space than it takes > to save the original messages. Note that hapaxes would waste the most > resource in this context too. Sounds like _you're_ arguing for expiration of whole messages :) I know you're not arguing that, but if there were bidirectional msg_id <-> feature_ID maps, it would be fairly easy to expire whole messages. That would obviate the need to track last time seen for every token. In any case, I hope you move in the direction of saving such maps as it adds so much flexibility. > [Tim Peters] > We're not going to abandon plain strings, because they're far too > useful and > loved in various reports intended for human consumption. Adding > feature_id > <-> feature_string maps would allow for effective compression of message > storage. All your arguments on this point make lots of sense. I'm a little surprised that you had significant collisions mapping perhaps 100K items (my guess) into a 32-bit space. I think that is rather dependent on the hash used, but that's what you saw. Since you need the cleartext anyway, your feature-ID concept is far superior. Thanks for educating me. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From rmalayter at bai.org Thu Dec 18 18:57:11 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Thu Dec 18 18:57:14 2003 Subject: [spambayes-dev] RE: [Spambayes] How low can you go? Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A75280@cliff.bai.org> {Seth Goodman} > All your arguments on this point make lots of sense. > I'm a little surprised that you had significant > collisions mapping perhaps 100K items (my guess) > into a 32-bit space. 
I think that is rather dependent > on the hash used, but that's what you saw. That's not surprising at all to me. Because of the "birthday paradox", even very input-sensitive (random-looking) hash functions like the 160-bit SHA-1 only give 80 bits of collision resistance. With a 32 bit perfect hash, you get just 16 bits of collision resistance. That means there is a 50% chance of a collision if you hash just 65,536 items. Hash more items than that, and your chances of collision go up further. If your hash function isn't perfectly (randomly) distributed in the 32-bit space, things could be much worse with 100,000 hashes in a collection. I would suggest storing at least a 64-bit hash; perhaps the first 8 bytes of an SHA-1 or MD5 hash would be appropriate. There is good optimized code for both algorithms in the public domain. Regards, Ryan From rcoe at CambridgeMA.GOV Thu Dec 18 19:12:06 2003 From: rcoe at CambridgeMA.GOV (Coe, Bob) Date: Thu Dec 18 19:12:10 2003 Subject: [Spambayes] Changing the icons in the Outlook Add-In Message-ID: If you make the change within Outlook, and then later reinstall Spambayes, I'm pretty sure the icons will revert. I'd be inclined to edit the images as they're stored in the Spambayes directory instead. Bob > -----Original Message----- > From: Tim Peters [mailto:tim.one@comcast.net] > Sent: Thursday, December 18, 2003 5:42 PM > To: D.Kindred@telesciences.com > Cc: spambayes@python.org > Subject: RE: [Spambayes] Changing the icons in the Outlook Add-In > > > [D.Kindred@telesciences.com] > > After using the Outlook Add-In myself for a while I've started > > setting it up for those users who get a lot of junk. So far most of > > the comments have been positive. > > > > The one unusual complaint is from a user who doesn't like having the > > "mean grouchy unhappy" face staring at her all the time. > > Good for her! Most people think we put the icon on the button just to make > it easy to find, and maybe provoke a smile.
She's the first one to figure > out it's actually staring at her, and really doesn't like her. > > > The "happy" face is okay, but you only get to see that in a spam folder. > > Is there a way for a user to change the icon? > > Many, but that's a builtin function of Outlook, so you don't need us for > this. Type "change button image" at the Office Assistant when in Outlook, > and it will lead you to the relevant docs. You can remove the icon > entirely, replace it with the happy face, replace it with a number of other > silly icons Outlook provides, or even edit the icon pixel by pixel, all > within Outlook. > > The last time I tried this, the next time I started Outlook a box popped up > asking whether I accepted Microsoft's End User License Agreement for > Outlook. Say "yes". From pacummins2000 at yahoo.com.au Thu Dec 18 20:13:24 2003 From: pacummins2000 at yahoo.com.au (Peter Cummins) Date: Thu Dec 18 20:13:32 2003 Subject: [Spambayes] (no subject) Message-ID: <20031219011324.801.qmail@web80708.mail.yahoo.com> Re previous mail: your reply disappeared with all other mail to pcum7668@bigpond.net.au. Could you reply to this address pacummins2000@yahoo.com.au with how to uninstall this programme (OS: W2000 Pro / Office 2000)? Thank you. Peter Cummins Come to Yorkeys Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031219/8c97edc1/attachment.html From tameyer at ihug.co.nz Thu Dec 18 20:17:39 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 18 20:18:04 2003 Subject: [Spambayes] (no subject) In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C1277@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467777D@its-xchg4.massey.ac.nz> > your reply disappeared with all other mail to > pcum7668@bigpond.net.au.
could you reply to this > address pacummins2000@yahoo.com.au with how to > uninstall this programme """ > How do you uninstall?? FAQ 3.14: > All new mail just disappears?? FAQ 3.12: =Tony Meyer """ =Tony Meyer From nobody at spamcop.net Thu Dec 18 20:25:16 2003 From: nobody at spamcop.net (Seth Goodman) Date: Thu Dec 18 20:25:27 2003 Subject: [spambayes-dev] RE: [Spambayes] How low can you go? In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01A75280@cliff.bai.org> Message-ID: > {Seth Goodman} > > All your arguments on this point make lots of sense. > > I'm a little surprised that you had significant > > collisions mapping perhaps 100K items (my guess) > > into a 32-bit space. I think that is rather dependent > > on the hash used, but that's what you saw. > > [Ryan Malayter] > That's not surprising at all to me. Because of the "birthday paradox", > even very input-sensitive (random-looking) hash functions like the > 160-bit SHA-1 only give 80 bits of collision resistance. With a 32 bit > perfect hash, you get just 16 bits of collision resistance. That means > there is a 50% chance of a collision if you hash just 65,536 items. Hash > more items than that, and your chances of collision go up further. > > If your hash function isn't perfectly (randomly) distributed in the > 32-bit space, things could be much worse with 100,000 hashes in a > collection. As I understand it, the birthday paradox leads to the conclusion that for a 32-bit perfect hash function, after hashing around 78,000 items (just over 16-bits worth), you are likely to experience a _single_ collision. What Tim described sounded like they probably had multiple collisions to account for the spectacular failures they saw. I don't know the size of the token databases they dealt with back then, but I doubt a single collision in a token list of 78K items would affect the classifier. Since most of the tokens are hapaxes anyway (perhaps 80-90% ?), it is most probable that there would be no visible effect. 
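Seth's figure can be checked with the usual birthday-problem approximation. The sketch below is a minimal illustration that assumes a perfectly uniform hash (the idealized case discussed in this thread); all function names are hypothetical helpers, not SpamBayes code. By this approximation, a 32-bit hash reaches roughly even odds of at least one collision at about 77,000 items (at exactly 65,536 items the chance is about 39%). The expected number of colliding pairs passes 1 near 92,600 items, matching the figures Tim gives later in the thread. The last helper follows Ryan's suggestion of keeping the first 8 bytes of a SHA-1 digest.

```python
import hashlib
import math


def p_any_collision(n: int, bits: int) -> float:
    """Approximate probability that n random `bits`-bit hashes include a collision.

    Uses the standard birthday approximation 1 - exp(-n(n-1) / 2N), N = 2**bits.
    """
    space = 2 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))


def expected_collisions(n: int, bits: int) -> float:
    """Expected number of colliding pairs among n random `bits`-bit hashes."""
    return n * (n - 1) / (2 * 2 ** bits)


def items_for_even_odds(bits: int) -> int:
    """Approximate item count at which the chance of a collision reaches about 50%."""
    return math.ceil(math.sqrt(2 * math.log(2) * 2 ** bits))


def token_hash64(token: str) -> int:
    """64-bit token hash taken from the first 8 bytes of a SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:8], "big")


for n in (65_536, items_for_even_odds(32), 92_600, 350_000):
    print(f"{n:>8,} tokens, 32-bit: P(any) = {p_any_collision(n, 32):.2f}, "
          f"expected pairs = {expected_collisions(n, 32):.1f}")
print(f"350,000 tokens, 64-bit: P(any) = {p_any_collision(350_000, 64):.1e}")
```

Widening the hash to 64 bits drops the chance of any collision among 350,000 tokens to a few in a billion. That is why the thread's discussion turns from "which 32-bit hash" to "why hash at all".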
You are of course correct that going over the 78K-item limit would give more collisions, but it would take quite a few collisions for one of the colliding tokens to be something other than a hapax. I am guessing that unless there were a lot more than 100K tokens, the 32-bit hash function used probably didn't do as good a randomizing job as needed. Since they ultimately had to construct a map of hash_value <-> token_string, they could have detected collisions (check the token already stored with the hash value) and done something about it (e.g. use the next empty bucket). Since this would be a rare event, it wouldn't have cost much. In any case, Tim's idea of a mapping token_string <-> feature_ID (i.e. a sequentially allocated number with "wrap-around") sounds much simpler. However, it is important that the number has enough bits that previously allocated feature_IDs are ready to be reused (their tokens expired) by the time the allocation number "wraps around" to them. This just means that the number should probably be 32 bits. Assuming you generate 100K tokens per day, the wrap-around time for a 32-bit number is about 117 years. For a 24-bit number and the same rate of token production, the wrap-around time is 167 days (around 5.5 months). I'd go for the 32-bit number and not worry about pathological operating schemes or new tokenizers. Even at 1 million new tokens per day, the wrap-around time for a 32-bit feature_ID is over 10 years. Why hash when you can sequentially allocate? This was just a bad idea on my part. And it won't be the last one :) -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From tim.one at comcast.net Thu Dec 18 20:26:43 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Dec 18 20:26:42 2003 Subject: [spambayes-dev] RE: [Spambayes] How low can you go? In-Reply-To: Message-ID: [Seth Goodman] > Thanks for taking the time to construct such a complete set of > answers.
I learned a lot from it and I assume other list readers did > as well. My pleasure, but I'm afraid it was taken out of sleep time, and I can't do that again. So, no offense intended, I have to be very brief here, while wanting to do more: > Not really. If you decrement all the token counts from a trained > message, the database is in the exact same state as it was before you > trained on that message (ignoring subsequent messages trained). At > that point, the trained message count was N-1, so that is the best > thing to use for the probability calculation rather than N. The > message count will keep increasing as you train new messages but the > token database will eventually level off. That suggests that the > trained message counts will become too large as time goes on. > > If you only expire hapaxes, perhaps the incorrect message count is a > technicality and won't have a significant effect on the spam > probabilities. But unless you expire non-hapaxes as well, the token > database can't track a changing message stream very well. Once you > start expiring non-hapax tokens (is there a name for these?), my > guess is that you can no longer ignore the incorrect message count > issue. So how _do_ you do expiration "correctly" if not by whole > messages? I only intend to expire hapaxes for now, with whole-msg expiration after; but one thing at a time, and each step will take a long time for testing. There's no rush. The idea that all the tokens in a message could get expired seems too implausible to me to worry about, when only hapaxes are expired. ... > Offhand, adding a single timestamp per message at training time sounds > easier than tracking the last time seen for every token in the > database. As far as the "elaborate" scheme I suggested for variable > expiration times, all that's involved is changing the message > timestamp before storing it. 
Since you don't have anything like that > now, you can just ignore that idea and the extra parameter that goes > with it. BTW, that parameter value is not just a wild-ass guess, > it's a SWAG (sophisticated wild-ass guess), and I don't like them any > better than you do :) > > Either way, rather than frequently searching for expired tokens (in a > very long list), you would only do token expiration when you have to > train a new message. At that point, you find the oldest trained > message (from a much shorter list) and untrain it. The extra > complication is storing the token list with each message ID plus its > training timestamp. That doesn't sound big compared to cross > referencing every token to every message it appeared in. They're > certainly not mutually exclusive and you later made a good argument > for having this extra information anyway. There are messages I never want to expire. That creates major new UI headaches to be doable. I believe (but don't yet know) that expiring hapaxes can be done without need for user intervention, and without harm. At some point, if you want to try your ideas, *try* your ideas -- that's what Open Source is all about. Everyone is born knowing how to program in Python, although most don't realize it until they try. ... > I agree completely. This was an important motivation for expiring a > whole message at a time. Training mistakes would eventually drop out > of the database without user intervention. Not that a tool to help > track down training mistakes wouldn't be great, but a "casual" user > could still make occasional mistakes and the system would recover by > itself. Without intervention, it will also expire the screaming bright-red HTML birthday message sent by my favorite 7-year-old niece, and when she's 8 the next one may get tagged as spam. These are the kinds of messages I never want to expire. 
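The whole-message scheme Seth describes can be pictured as a minimal sketch, combined with the msg_id -> tokens and token -> msg_ids maps Tim wants. Everything in it is hypothetical: the class and method names, the `keep` flag (standing in for messages Tim never wants expired, like the niece's birthday card), and the fixed age limit. It is not how SpamBayes stores its database. It also leaves out the per-token timestamps that would be needed to expire hapaxes on their own.

```python
import time
from collections import defaultdict


class TrainingLog:
    """Hypothetical sketch: per-message training records, so whole messages
    can be untrained exactly as they were trained (not SpamBayes code)."""

    def __init__(self, max_age_seconds: float):
        self.max_age = max_age_seconds
        self.messages = {}                  # msg_id -> (trained_at, is_spam, tokens, keep)
        self.token_msgs = defaultdict(set)  # token -> msg_ids that contributed it
        self.counts = {}                    # token -> [ham_count, spam_count]
        self.nham = 0
        self.nspam = 0

    def train(self, msg_id, tokens, is_spam, keep=False, now=None):
        now = time.time() if now is None else now
        self.expire(now)                    # expiry only runs when new training arrives
        if msg_id in self.messages:         # retraining replaces the old record
            self.untrain(msg_id)
        tokens = frozenset(tokens)          # each token counts once per message
        self.messages[msg_id] = (now, is_spam, tokens, keep)
        for tok in tokens:
            self.counts.setdefault(tok, [0, 0])[int(is_spam)] += 1
            self.token_msgs[tok].add(msg_id)
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def untrain(self, msg_id):
        # Walk the saved token list instead of re-tokenizing, so a changed
        # tokenizer cannot push counts out of step with what was trained.
        _, is_spam, tokens, _ = self.messages.pop(msg_id)
        for tok in tokens:
            self.counts[tok][int(is_spam)] -= 1
            self.token_msgs[tok].discard(msg_id)
            if self.counts[tok] == [0, 0]:
                del self.counts[tok]
                del self.token_msgs[tok]
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1

    def expire(self, now):
        # A linear scan keeps the sketch short; a heap ordered by trained_at
        # would find the oldest messages faster.
        cutoff = now - self.max_age
        stale = [m for m, (t, _, _, keep) in self.messages.items()
                 if t < cutoff and not keep]
        for msg_id in stale:
            self.untrain(msg_id)

    def ham_messages_with(self, tokens):
        """Ham-trained msg_ids containing any of `tokens`: mis-training suspects."""
        return {m for tok in tokens for m in self.token_msgs.get(tok, ())
                if not self.messages[m][1]}


DAY = 86_400
log = TrainingLog(max_age_seconds=90 * DAY)
log.train("birthday-card", {"happy", "birthday"}, is_spam=False, keep=True, now=0)
log.train("old-ham", {"meeting", "agenda"}, is_spam=False, now=0)
log.train("new-spam", {"cheap", "birthday"}, is_spam=True, now=100 * DAY)
print(sorted(log.messages))   # old-ham has expired; the kept card survives
print(log.ham_messages_with({"birthday"}))
```

Because untraining walks the stored token set, the message counts `nham`/`nspam` fall back exactly as tokens are removed. That addresses the N-1 message-count concern from earlier in the thread, at the cost of storing every trained message's tokens.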
"Elaborate" before referred to untested gimmicks for adjusting expiration date based on "how far away" a message was from its correct classification, etc. I don't have a feel for whether that can be made to work well in real life, and it needs serious implementation effort and testing to get a good feel. In the vanishingly small time I can still make for this project, I need to give it to things my experience suggests will almost certainly win with no more effort or surprises than I already know they require enduring. ... > Sounds like _you're_ arguing for expiration of whole messages :) Oh yes, I do want that -- eventually. We have no experience with that in this project, though; we have a lot of experience with the consequences of hapaxes, and I have no fears remaining about picking on them. > I know you're not arguing that, but if there were bidirectional msg_id > <-> feature_ID maps, it would be fairly easy to expire whole > messages. Yes, and that's a real attraction. Doing the actual expiration would be trivially easy and fast then. Deciding *when* to do expiration, and of which messages, are the things we really don't know anything about yet. > That would obviate the need to track last time seen for every token. Only if you don't want also to be able to expire tokens on their own. > In any case, I hope you move in the direction of saving such maps as > it adds so much flexibility. Not to mention database size. ... > All your arguments on this point make lots of sense. I'm a little > surprised that you had significant collisions mapping perhaps 100K > items (my guess) into a 32-bit space. That would be a very small database for the mixed unigram-bigram scheme, and the unigram-only database I used most often in original testing (for filtering high-volume tech mailing lists) contained about 350K tokens. As Ryan explained later, the Birthday Paradox can't be avoided here, and has real consequences.
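Sequential allocation avoids the birthday problem altogether, which is the point of Seth's "why hash when you can sequentially allocate?". Here is a minimal sketch of token_string <-> feature_id maps with wrap-around reuse, assuming an id is released only when its token expires. The class, its methods, and the tiny 2-bit demonstration space are hypothetical illustrations, not SpamBayes code. The `wraparound_days` helper reproduces Seth's arithmetic: about 117.6 years for a 32-bit counter at 100K new tokens a day, and about 167.8 days for 24 bits.

```python
class FeatureIds:
    """Hypothetical sketch of token_string <-> feature_id maps with sequential,
    wrap-around allocation (not SpamBayes code). Ids never collide: an id is
    only handed out again after its token has been released."""

    def __init__(self, bits: int = 32):
        self.limit = 2 ** bits
        self.next_id = 0
        self.id_of = {}     # token string -> feature id
        self.token_of = {}  # feature id -> token string

    def get_or_assign(self, token: str) -> int:
        fid = self.id_of.get(token)
        if fid is not None:
            return fid
        for _ in range(self.limit):  # skip ids whose tokens have not expired
            candidate = self.next_id
            self.next_id = (self.next_id + 1) % self.limit
            if candidate not in self.token_of:
                self.id_of[token] = candidate
                self.token_of[candidate] = token
                return candidate
        raise RuntimeError("every feature id is in use")

    def release(self, token: str) -> None:
        """Forget an expired token so its id can be reused after wrap-around."""
        del self.token_of[self.id_of.pop(token)]


def wraparound_days(bits: int, new_tokens_per_day: float) -> float:
    """Days until a sequential counter of `bits` bits comes back to its first id."""
    return 2 ** bits / new_tokens_per_day


ids = FeatureIds(bits=2)  # a 4-id space, so wrap-around shows up quickly
first = [ids.get_or_assign(t) for t in ("free", "money", "now")]  # ids 0, 1, 2
ids.release("money")                  # id 1 is free again
click = ids.get_or_assign("click")    # next id in sequence: 3
here = ids.get_or_assign("here")      # wraps around, skips busy 0, reuses 1
print(first, click, here)
print(f"32-bit at 100K/day: {wraparound_days(32, 100_000) / 365.25:.1f} years")
print(f"24-bit at 100K/day: {wraparound_days(24, 100_000):.1f} days")
```

The cleartext stays available through `token_of`, which fits Tim's wish to keep human-readable tokens for reports. The small integers are what get stored per message.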
> I think that is rather dependent on the hash used, but that's what > you saw. I used Python's builtin 32-bit hash() function, and the observed collision rate was indistinguishable from what a truly random 32-bit hash would have produced (about one standard deviation lower). The damnable thing is that you only need one extremely unfortunate collision to start seeing results that are incomprehensible to the human eye. > Since you need the cleartext anyway, your feature-ID concept is far > superior. We don't *need* the cleartext, really, it's just highly desirable. I'll certainly endure a lot to keep the cleartext. If this isn't the smallest or fastest spam filter possible, I don't really care. I don't even care whether it's popular. What I care about most is whether it filters my damn spam. > Thanks for educating me. Don't mistake a lecture for education. I'd love to be able to afford the luxury of *discussing* it with you instead (you've got a lot of plausible ideas and express them well), but I'm afraid I just can't. With any luck, maybe my employer will go out of business. From tim.one at comcast.net Thu Dec 18 20:50:51 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Dec 18 20:50:49 2003 Subject: [Spambayes] Changing the icons in the Outlook Add-In In-Reply-To: Message-ID: [Bob Coe] > If you make the change within Outlook, and then later reinstall > Spambayes, I'm pretty sure the icons will revert. I can confirm that -- they do. Well, just unregistering then registering the addin is enough to make them revert, but I tried that under an OL2K SP-3 IMO configuration, on an American Win98SE, and God knows it's not safe to generalize about Outlook behavior <0.6 wink>. > I'd be inclined to edit the images as they're stored in the Spambayes > directory instead. Well, this is a person who's bothered by a "mean grouchy unhappy face staring at her all the time", after someone else ran the installer for her.
I'm betting she can replace the image with a comes-with-Outlook happy red heart icon a hundred times over before she'll master a bitmap editor. From tim.one at comcast.net Thu Dec 18 21:16:25 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Dec 18 21:16:24 2003 Subject: [spambayes-dev] RE: [Spambayes] How low can you go? In-Reply-To: Message-ID: [Ryan Malayter] >> That's not surprising at all to me. Because of the "birthday >> paradox", ... [Seth Goodman] > As I understand it, the birthday paradox leads to the conclusion that > for a 32-bit perfect hash function, after hashing around 78,000 items > (just over 16-bits worth), you are likely to experience a _single_ > collision. What Tim described sounded like they probably had > multiple collisions to account for the spectacular failures they saw. > I don't know the size of the token databases they dealt with back > then, but I doubt a single collision in a token list of 78K items > would affect the classifier. Since most of the tokens are hapaxes > anyway (perhaps 80-90% ?), it is most probable that there would be no > visible effect. > ... Let me clarify this: the experiments we ran couldn't actually use a 32-bit hash code because they used a Python dict to simulate a giant sparse array, and the box I was using didn't have enough RAM to deal with this load. Instead we ran with smaller hash codes and smaller training sets, projecting results. The results were too discouraging for anyone here to want to continue along that line. It's all in the archives if you want to dig back far enough (I don't). With a 32-bit hash code, the expected # of collisions for a truly random hash is close to 1, with a standard deviation also close to one, at about 92,600 items, so Seth is quite close. With 350K items (close to the # of tokens in the pure-unigram database I was actually using at the time), the mean # of collisions is a bit over 14 with an sdev of about 3.8.
Those numbers aren't scary, and Python's hash() was indeed behaving as a random hash would have. We were considering schemes with much higher feature-generation rates than pure-unigram at the time, though, so all those stats don't matter to what we were really wondering about. BTW, discussions like this really don't belong on the spambayes list. They're fine on spambayes-dev, though, so I've set reply-to to that. Anyone who wants to follow that level of tech-talk should subscribe to spambayes-dev. From tim.one at comcast.net Thu Dec 18 22:31:49 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Dec 18 22:31:51 2003 Subject: [Spambayes] How low can you go? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A0F@its-xchg4.massey.ac.nz> Message-ID: [Tim] >> BTW, it should *not* be necessary to increase >> max_discriminators, and doing so can create subtle numeric >> problems in the inverse chi-squared function. >> Without this option, in an N-token message, N tokens were >> candidates for scoring; with this option, there are still >> exactly N candidates for scoring; with a true tiling >> implementation, there are no more than N >> candidates for scoring (and usually fewer than N). [Tony Meyer] > So the comment in here: > > Is only referring to cases where both unigrams *and* bigrams are used, > rather than the tiling (or crude approximation) is used? The first quoted comment there is the important one: "I really wonder what's going on here!". I didn't know, and was throwing out guesses. Now that I'm using the scheme myself, I still don't know, but lots of "interesting questions" are forming. (Overall, the scheme is showing excellent performance on my live email so far with much less training data than I was using before, but there are still jaw-dropping exceptions; studying those carefully takes time, and so far no pattern has made itself obvious beyond that hapaxes can do the darnedest things ...).
The theoretical motivation for tiling is to eliminate systematic creation of highly correlated clues. The bad thing that results from that would be "spectacular failures". That doesn't appear relevant for the cases you described there, where "Some are nailed for me - 100% (rounded), but others are solidly unsure (50%ish)." A "spectacular failure" would have been if a Nigerian scam scored 0% for you. > I did get improvements with a higher max_discriminators: > > Is that likely to be just a side-effect of the crudeness of my > approximation? Well, there's not *evidence* there, just a report of what happened on one specific message. "Evidence" is along the lines of "and across 2,000 messages, this is what happened to the FP and FN rates in each of 10 cross-validation runs". Going back to http://mail.python.org/pipermail/spambayes-dev/2003-September/000998.html where the actual spamprobs are shown, lines like 'george' 0.0848469 9 8 'right now,' 0.0857477 7 6 tell me (perhaps with benefit of hindsight gained by later developments) that something else screwy was happening too, starting with the fact that while those tokens were seen in "almost the same" number of each kind of training class, their spamprobs were nevertheless strongly hammy. That can only happen if the training data is unbalanced. If there were an equal # of ham and spam, the 'george' line would have had spamprob 0.47, so weak that the token would have been ignored. You would have had to have at least an 11:1 spam:ham training ratio to get a spamprob that low, and that's assuming you didn't also have the now-defunct experimental imbalance adjustment option obscuring it too (if you did at the time, the training imbalance must have been worse than 12 to 1). As you said later: although there are almost twice as many spam clues as ham, the ham ones are lower and a strong training imbalance in favor of spam does "cheapen" spam evidence.
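Tim's arithmetic can be reproduced with a small sketch of the per-token probability. It takes the ham and spam occurrence ratios, then applies Robinson's adjustment toward 0.5 for thinly seen tokens. Several assumptions go into it. The strength 0.45 and prior 0.5 are meant to mirror SpamBayes' `unknown_word_strength` and `unknown_word_prob` defaults. The helper name is hypothetical. The two count columns in the quoted clue lines are read as ham count then spam count, which is the reading that reproduces the printed spamprobs. On that basis the balanced case gives about 0.47 for 'george', and the printed 0.0848 falls between the 11:1 and 12:1 ratios, consistent with Tim's estimate.

```python
def spamprob(ham_count, spam_count, nham, nspam, s=0.45, x=0.5):
    """Sketch of a Robinson-adjusted token spamprob (hypothetical helper).

    s and x are assumed to mirror SpamBayes' unknown_word_strength and
    unknown_word_prob defaults; check your own configuration.
    """
    hamratio = ham_count / nham
    spamratio = spam_count / nspam
    prob = spamratio / (hamratio + spamratio)  # ratio-based, so class sizes matter
    n = ham_count + spam_count
    return (s * x + n * prob) / (s + n)        # shrink toward x when evidence is thin


# 'george': 9 ham and 8 spam occurrences (reading the columns as ham, then spam).
for spam_per_ham in (1, 11, 12):
    p = spamprob(9, 8, nham=1000, nspam=1000 * spam_per_ham)
    print(f"{spam_per_ham:>2}:1 spam:ham training -> spamprob {p:.4f}")
```

Because the raw probability is built from per-class *ratios*, adding more spam to training makes each spam occurrence count for less. This is the "cheapening" of spam evidence Tim describes.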
As in: 'reply this' 0.945495 6 1164 That's a *low* spamprob for a token that's been seen in almost 200x as many spam as ham, doncha think? It would have been 0.995 if the training data were balanced and you saw those counts. So I still don't know what was going on with that msg, but the existence of so strong a training imbalance makes it a much less interesting pursuit to me now than I thought it was then. One more: you eventually got a "spammy enough" score by boosting max_discriminators to 600. Without knowing what effects that had across *many* messages, it's just an anecdote, but I do know that the chi2Q implementation can become numerically unreliable with max_discriminators set that high. That's one reason it defaults to 150. The other is that testing with the unigram scheme found no reason to make it even that high. I'm not seeing any reason in my own experiment now to suspect the mixed scheme needs it higher, but my training data is scrupulously closely balanced, and I've only been at it for one day. From chrissue at pacbell.net Thu Dec 18 22:34:27 2003 From: chrissue at pacbell.net (Chris/Sue Yahng) Date: Thu Dec 18 22:35:31 2003 Subject: [Spambayes] can't access reconfiguraation Message-ID: I have used your program for about 6 weeks. During that time, I have lost or not been able to find my files or folders (I'm relatively ignorant) for my good and spam mail. Previously, I have gone back to the manager and reset the configuration. Today, I am not able to do that. When I click on the configuration wizard, nothing happens. The problem I am trying to solve is why do the folders (2 under Inbox) disappear? They are the ones I have previously configured to hold spam and good messages. In the past (3 times) I had reset the configuration wizard with new folders. Now I can't even do that.
chrissue@pacbell.net From steveng at pop.jaring.my Fri Dec 19 02:23:57 2003 From: steveng at pop.jaring.my (Stephen Ng) Date: Fri Dec 19 02:24:13 2003 Subject: [Spambayes] Linux install Message-ID: <1071818637.11044.4.camel@nutek-1> Hi! Is there a complete list of files in 1.0a7 that are installed under Linux? I know setup.py creates a directory in site-packages. What else is installed? This is just in case I need to back out the install. Thanks. Stephen Ng From atom at suspicious.org Fri Dec 19 02:46:47 2003 From: atom at suspicious.org (Atom 'Smasher') Date: Fri Dec 19 02:48:18 2003 Subject: [Spambayes] Linux install In-Reply-To: <1071818637.11044.4.camel@nutek-1> References: <1071818637.11044.4.camel@nutek-1> Message-ID: > Is there a complete list of files in 1.0a7 that are installed under > Linux? I know setup.py creates a directory in site-packages. What else > is installed? > > This is just in case I need to back out the install. ========================= I'm not sure if this is a complete list, but check: /usr/local/bin/sb_*.py /usr/local/lib/python*/site-packages/spambayes/ ...atom _______________________________________________ PGP key - http://smasher.suspicious.org/pgp.txt 3EBE 2810 30AE 601D 54B2 4A90 9C28 0BBF 3D7D 41E3 ------------------------------------------------- "There's enough on this planet for everyone's needs but not for everyone's greed" -- Mahatma Gandhi From craig.gould at bt.com Fri Dec 19 06:18:46 2003 From: craig.gould at bt.com (craig.gould@bt.com) Date: Fri Dec 19 06:18:56 2003 Subject: [Spambayes] Spam Bayes Error Message-ID: <7497DCA1C240C042B28F6657ADFD8E0903248534@i2km11-ukbr.domain1.systemhost.net> Folks, I updated my Office 2000 installation with SP3 and the latest security patches and Spam Bayes is now broken. I've tried uninstalling, then re-installing to no avail. I've also looked through the troubleshooting guide but can't find anything that fixes the problem.
My log file on startup is attached below. One thing to note is that before the Outlook login box appears I get the "Exchange server is unavailable" prompt with Retry, Work Offline and Cancel options. On pressing Retry, the login box appears after a significant delay. Outlook takes an age before another "Exchange server is unavailable" box appears. After pressing Retry again Outlook sits there for another delay before finally starting, and the "Spam Bayes failed to initialise" message appears. Outlook 9.0.0.6627 Corporate or Workgroup - Security Update Spam Bayes - 008.1 Thanks in advance for any help C Craig Gould BT Exact tel (01473) 644214 web www.btbrand.bt.com __________________________________________ British Telecommunications plc Registered office: 81 Newgate Street London EC1A 7AJ Registered in England no. 1800000 This electronic message contains information from British Telecommunications plc which may be privileged and confidential. The information is intended to be for the use of the individual(s) or entity named above. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of the contents of this information is prohibited. If you have received this electronic message in error, please notify us by telephone or email (to the number or address above) immediately. Activity and use of the British Telecommunications plc email system is monitored to secure its effective operation and for other lawful business purposes. Communications using this system will also be monitored and may be recorded to secure effective operation and for other lawful business purposes -------------- next part -------------- A non-text attachment was scrubbed...
Name: spambayes1.log Type: application/octet-stream Size: 908 bytes Desc: spambayes1.log Url : http://mail.python.org/pipermail/spambayes/attachments/20031219/0b722cef/spambayes1.obj From papaDoc at videotron.ca Fri Dec 19 09:04:28 2003 From: papaDoc at videotron.ca (papaDoc) Date: Fri Dec 19 09:04:34 2003 Subject: [Spambayes] Re: [spambayes-dev] Hapaxes? In-Reply-To: <3FE257C3.1020800@wickedgrey.com> References: <3FE257C3.1020800@wickedgrey.com> Message-ID: <3FE3056C.9020907@videotron.ca> Hi, >> Oh yes, I do want that -- eventually. We have no experience with >> that in >> this project, though; we have a lot of experience with the >> consequences of >> hapaxes, and I have no fears remaining about picking on them. > > > Does anyone have any gentle nudges to information explaining what > hapaxes are in a spambayes context? They are words seen only once. See the definition in any good dictionary. Remi From skip at pobox.com Fri Dec 19 09:13:28 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Dec 19 09:13:38 2003 Subject: [Spambayes] Re: [spambayes-dev] Hapaxes? In-Reply-To: <3FE3056C.9020907@videotron.ca> References: <3FE257C3.1020800@wickedgrey.com> <3FE3056C.9020907@videotron.ca> Message-ID: <16355.1928.216599.997418@montanaro.dyndns.org> >> Does anyone have any gentle nudges to information explaining what >> hapaxes are in a spambayes context? Remi> They are words seen only once. See the definition in any good Remi> dictionary. Or the glossary at http://spambayes.sourceforge.net/docs.html Skip From dawn.wesolek at i3t.org Fri Dec 19 09:47:12 2003 From: dawn.wesolek at i3t.org (Dawn Wesolek) Date: Fri Dec 19 09:47:21 2003 Subject: [Spambayes] SpamBayes and financial sponsorship Message-ID: <2045949.1071845232983.JavaMail.jboss@p15135617.pureserver.info> Dear SpamBayes team, You have received this email because your project has been nominated for financial sponsorship by Gary Daw.
Gary Daw feels that your project is worthy of the I3T award for Software Excellence for the following reason: "SpamBayes is an interesting project that aims to use Bayesian techniques to automatically filter out SPAM emails. This approach to spam is very interesting because it means the Bayesian network is trained against a set of classified data. This means that the identification of SPAM can be much more intelligent than pure keyword filtering as a type of fuzzy logic is employed." Your nomination will shortly be considered by our sponsorship panel. This panel will be examining your project based on a range of criteria. Should the application be successful, then your project will be able to receive financial benefit from the International Institute of Information Technologists payable to a party of your designation. This is not a scam and it is not required for you to pay any money or waste any time. You are receiving this email at this stage for your information only. You have not received this email because you are a part of a mailing list, but because someone has personally nominated your project. The International Institute of Information Technologists is providing this programme for two reasons: The first reason is that it is part of our philosophy to reinvest our members' contributions back into the IT community. The institute represents the needs of IT professionals and this is only one of the ways in which we are helping the community. The second reason is that successful open source or free projects can help us get our message through to the people with the right mindset for the institute. You can find out more about the institute on our website: http://i3t.org The panel's decision will be emailed to you just as soon as your nomination has been evaluated. If the nomination is successful, then you will be given instructions on how you can receive funds for your project. Best wishes, Dawn C. Wesolek I3T Sponsorship awards PS.
We are considering providing facilities to our members for managing their own projects, such as CVS repositories, issue tracking and mailing lists. Would that be something that you would be interested in? From darren at idtelecoms.com Fri Dec 19 09:50:29 2003 From: darren at idtelecoms.com (Darren Westlake) Date: Fri Dec 19 09:50:05 2003 Subject: [Spambayes] eudora headers Message-ID: <6.0.1.1.2.20031219143341.03ae0898@localhost> In Eudora I get the X-Spambayes-Classification headers coming through fine, but when I try to create a filter rule based on these, those headers are not shown in the drop down box for the Headers field. Anyone know why? I'm using Eudora v 6.2 and sb_server.py run from a batch file. Thanks and regards, Darren _____________________________ Darren Westlake Managing Director ID Telecommunications Ltd **NEW** Make low cost calls from your PC - www.MyWebCalls.com Resellers needed www.idtelecoms.com _____________________________ From alice_utter at firstpenn.com Fri Dec 19 10:25:46 2003 From: alice_utter at firstpenn.com (Utter, Alice) Date: Fri Dec 19 11:30:06 2003 Subject: [Spambayes] SpamBayes not working Message-ID: I tried this suggestion, I uninstalled and reinstalled, I reconfigured SpamBayes, and none of those actions solved the problem. My original message said that I get an error message box pop up every time I open Outlook. The message box says that SpamBayes has been disabled and needs to be reenabled and Outlook restarted. Every time I reenable SpamBayes, it still gets disabled. Any other ideas? Alice Utter -----Original Message----- From: Tony Meyer [mailto:tameyer@ihug.co.nz] Sent: Monday, December 15, 2003 9:22 PM To: Utter, Alice; spambayes@python.org Subject: RE: [Spambayes] SpamBayes not working > I have been using SpamBayes for a short time, > and had not been having problems with it.
However, > a couple of weeks ago the server was upgraded to > Exchange 2000, and since then I have not been able > to get SpamBayes to work properly on my PC. I suspect that the IDs for the folders that SpamBayes was set to watch have changed in the upgrade. Try going into the manager dialog, then to the "Filtering" tab and reselecting the folders to move unsure and spam messages into (and then ticking the enable box again). =Tony Meyer DISCLAIMER: **This E-mail and any of its attachments may contain Lincoln National Corporation proprietary information, which is privileged, confidential, or subject to copyright belonging to the Lincoln National Corporation family of companies. This E-mail is intended solely for the use of the individual or entity to which it is addressed. If you are not the intended recipient of this E-mail, you are hereby notified that any dissemination, distribution, copying, or action taken in relation to the contents of and attachments to this E-mail is strictly prohibited and may be unlawful. If you have received this E-mail in error, please notify the sender immediately and permanently delete the original and any copy of this E-mail and any printout. Thank You.** From cmerchant at austin.rr.com Fri Dec 19 11:32:19 2003 From: cmerchant at austin.rr.com (Cindy Merchant) Date: Fri Dec 19 11:32:26 2003 Subject: [Spambayes] I don't have an anti-spam button Message-ID: What should I do? Cindy Merchant eTexas Realty, Austin branch 512-462-9976 home office 512-663-7325 mobile 877-460-3466 toll free 512-462-4794 fax -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031219/ccb6a4ee/attachment-0001.html From popiel at wolfskeep.com Fri Dec 19 11:55:37 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Fri Dec 19 11:55:44 2003 Subject: [spambayes-dev] RE: [Spambayes] How low can you go? 
In-Reply-To: Message from "Tim Peters" of "Thu, 18 Dec 2003 20:26:43 EST." References: Message-ID: <20031219165537.EDB162DF7F@cashew.wolfskeep.com> In message: "Tim Peters" writes: > >> Sounds like _you're_ arguing for expiration of whole messages :) > >Oh yes, I do want that -- eventually. We have no experience with that in >this project, though; we have a lot of experience with the consequences of >hapaxes, and I have no fears remaining about picking on them. Actually, there have been experiments done (by me) with expiry of whole messages. I invite you to look at the 'expire4months' regime for my incremental testing harness. Performance was worse than remembering everything, but significantly better than mistake-based training (with the 'fpfnunsure' regime). I have not done any experiments with just nuking hapaxes; I didn't see any reason to do a partial job instead of a full one. >> I know you're not arguing that, but if there were bidirectional msg_id >> <-> feature_ID maps, it would be fairly easy to expire whole >> messages. >> That would obviate the need to track last time seen for every token. > >Only if you don't want also to be able to expire tokens on their own. No... just find the most recent message that the token appeared in, which would be a quick search through a few message times. A really quick search if you're only looking to expire hapaxes. - Alex From skip at pobox.com Fri Dec 19 17:10:11 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Dec 19 17:10:22 2003 Subject: [Spambayes] Could it get any better than this? Message-ID: <16355.30531.591112.710686@montanaro.dyndns.org> I was scanning incoming spam looking for "received" clues and came across this evidence. It's as if the sender was trying to send me a message which would be classified as spam! 
Skip X-Spambayes-Evidence: '*H*': 0.00; '*S*': 1.00; 'world,': 0.95; 'offer': 0.95; 'accepts': 0.96; 'afford': 0.96; 'agreed': 0.96; 'approx.': 0.96; 'arcade': 0.96; 'beach': 0.96; 'believes': 0.96; 'bi:addition, the': 0.96; 'bi:agreement for': 0.96; 'bi:and skip:p 10': 0.96; 'bi:and skip:u 10': 0.96; 'bi:are made': 0.96; 'bi:available for': 0.96; 'bi:cause actual': 0.96; 'bi:cities for': 0.96; 'bi:considered the': 0.96; 'bi:contains skip:f 10': 0.96; 'bi:experience that': 0.96; 'bi:first time': 0.96; 'bi:from those': 0.96; 'bi:further details': 0.96; 'bi:has also': 0.96; 'bi:has reached': 0.96; 'bi:indianapolis star': 0.96; 'bi:information selected': 0.96; 'bi:may buy': 0.96; 'bi:not registered': 0.96; 'bi:press release': 0.96; 'bi:proto:http url:click': 0.96; 'bi:skip:e 10 and': 0.96; 'bi:skip:f 10 statements': 0.96; 'bi:statements and': 0.96; 'bi:technology and': 0.96; 'bi:the american': 0.96; 'bi:the business': 0.96; 'bi:the display': 0.96; 'bi:the publication': 0.96; 'bi:the statements': 0.96; 'bi:third party': 0.96; 'bi:this report': 0.96; 'bi:use this': 0.96; 'billion': 0.96; 'capital,': 0.96; 'centers': 0.96; 'ceo': 0.96; 'competition': 0.96; 'conjunction': 0.96; 'corporation,': 0.96; 'development,': 0.96; 'disney': 0.96; 'economy': 0.96; 'emails,': 0.96; 'expands': 0.96; 'expansions': 0.96; 'experienced': 0.96; 'filings': 0.96; 'formatted': 0.96; 'generation': 0.96; 'herein,': 0.96; 'inaugural': 0.96; 'inc.,': 0.96; 'include,': 0.96; 'indianapolis': 0.96; 'integrated': 0.96; 'involve': 0.96; 'las': 0.96; 'licensing': 0.96; 'lightning': 0.96; 'market.': 0.96; 'materially': 0.96; 'nine': 0.96; 'partnerships': 0.96; 'properties': 0.96; 'prospects': 0.96; 'purchase.': 0.96; 'quarterly': 0.96; 'race': 0.96; 'retailers': 0.96; 'sales.': 0.96; 'security.': 0.96; 'served': 0.96; 'silicon': 0.96; 'speculative': 0.96; 'sport': 0.96; 'square': 0.96; 'subsequently': 0.96; 'trace': 0.96; 'url:pl': 0.96; 'url:t': 0.96; 'years.': 0.96; 'announces': 0.97; 'bi:actual 
results': 0.97; 'bi:companies that': 0.97; 'bi:investment advisor': 0.97; 'bi:skip:m 10 and': 0.97; 'bi:the time': 0.97; 'cars': 0.97; "company's": 0.97; 'customers': 0.97; 'differ': 0.97; 'executive': 0.97; 'licensed': 0.97; 'locations': 0.97; 'officially': 0.97; 'opinions': 0.97; 'retail': 0.97; 'revenue': 0.97; 'sec': 0.97; 'securities': 0.97; 'solicitation': 0.97; 'url:o': 0.97; 'videos': 0.97; 'america': 0.97; 'bi:forward this': 0.97; 'bi:marketing and': 0.97; 'blank': 0.97; 'compete': 0.97; 'concerning': 0.97; 'custom': 0.97; 'estimated': 0.97; 'markets': 0.97; 'mr.': 0.97; 'potential.': 0.97; 'reports.': 0.97; 'risks': 0.97; 'shares': 0.97; 'stock': 0.97; 'system.': 0.97; 'u.s.': 0.97; 'bi:and skip:i 10': 0.97; 'bi:million people': 0.97; 'demand': 0.97; 'growth': 0.97; 'possible.': 0.97; 'received:69.6': 0.97; 'bi:for over': 0.98; 'develop': 0.98; 'bi:that may': 0.98; 'marketing': 0.98; 'results': 0.98; 'dollars': 0.98; 'exclusive': 0.98; 'loss': 0.98; 'received:69': 0.98; 'sell': 0.98; 'buy': 0.99; 'product': 0.99; 'to:addr:concerts': 0.99 From phil.pierotti at swiftdsl.com.au Fri Dec 19 18:44:26 2003 From: phil.pierotti at swiftdsl.com.au (Phil Pierotti) Date: Fri Dec 19 18:44:33 2003 Subject: [Spambayes] eudora headers In-Reply-To: <6.0.1.1.2.20031219143341.03ae0898@localhost> References: <6.0.1.1.2.20031219143341.03ae0898@localhost> Message-ID: <3FE38D5A.6000604@swiftdsl.com.au> When you're creating the new filter, in the Headers: selector, instead of *selecting* an existing header, just type in a new one. Eudora is smart enough to use it. Although it does not add it to the selector as another 'known' header. At least, this is how things work in 5.2 ENjoy, Phil P Darren Westlake wrote: > In Eudora i get the X-Spambayes-Classification headers coming through > fine but when i try to create a filter rule based on these those headers > are not shown in the drop down box for the Headers field. > > Anyone know why? 
> > I'm using Eudora v 6.2 and sb_server.py run from a batch file. > > > Thanks and regards, > Darren > _____________________________ > Darren Westlake > Managing Director > ID Telecommunications Ltd > > **NEW** Make low cost calls from your PC - www.MyWebCalls.com Resellers > needed > > www.idtelecoms.com > _____________________________ > > > > _______________________________________________ > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes > Check the FAQ before asking: http://spambayes.sf.net/faq.html > > From tboss at vegeta.p6m7g8.net Fri Dec 19 16:06:40 2003 From: tboss at vegeta.p6m7g8.net (User Tboss) Date: Fri Dec 19 20:07:52 2003 Subject: [Spambayes] Local installation problem Message-ID: <200312192106.hBJL6ex0077282@vegeta.p6m7g8.net> I've seen this question asked several times on the digest page, but never a good answer. Is it possible to install Spambayes in a local directory? $ python setup.py install running install running build running build_py running build_scripts running install_lib creating /usr/local/lib/python2.3/site-packages/spambayes error: could not create '/usr/local/lib/python2.3/site-packages/spambayes': Permission denied Can we re-direct this attempt to write to /usr/local/bin, which requires root, to a directory underneath $HOME instead? I'd think this would be a common request: i'm a user of a Unix system, but not the administrator, and the admin can't be depended on to help out. any help would be greatly appreciated. Thanks, Todd From tim.one at comcast.net Fri Dec 19 23:48:56 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Dec 19 23:49:03 2003 Subject: [Spambayes] Could it get any better than this? In-Reply-To: <16355.30531.591112.710686@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > I was scanning incoming spam looking for "received" clues and came > across this evidence. It's as if the sender was trying to send me a > message which would be classified as spam! Hmm. 
If you ever *asked* for a prospectus by email, it would probably get the same score. Like 'bi:skip:f 10 statements': 0.96; is almost certainly derived from "forward-looking statements", which every prospectus contains (and not by accident, it's by law). Likewise for a great many of the other features listed. OTOH, every prospectus I ask for in email is delivered either via a link to a web page, or as a PDF or .doc attachment, so maybe that doesn't matter. The only prospectus-like stuff I get in plain text seems to be stock-pumping scam spam. So maybe it's not scary at all. It's sure impressive regardless! Thanks for sharing it. From jpressman at equaljusticeworks.org Fri Dec 19 15:03:45 2003 From: jpressman at equaljusticeworks.org (Jeff Pressman) Date: Sat Dec 20 09:47:28 2003 Subject: [Spambayes] Training Question Message-ID: <8047433DACA27A44BEFF693613928117371C@equality.equaljusticeworks.ad> Sorry if this is not the best place to ask questions but I couldn't seem to get to the user list (http://mail.python.org/mailman/listinfo/spambayes ) I am running Spambayes Outlook Addin 0.81 for Outlook 2000 on NT4 (sp6a). We run Exchange Server 2000 and have setup a public spam folder where staff manually move spam. The public spam folder contains nearly 2000 spam emails. When I run the training it doesn't appear to acknowledge any of the spam in the public folder. My question is.. can public folder be used to train for spam? From skip at pobox.com Sat Dec 20 09:26:32 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Dec 20 10:18:24 2003 Subject: [Spambayes] Could it get any better than this? In-Reply-To: References: <16355.30531.591112.710686@montanaro.dyndns.org> Message-ID: <16356.23576.240573.690861@montanaro.dyndns.org> >> I was scanning incoming spam looking for "received" clues and came >> across this evidence. It's as if the sender was trying to send me a >> message which would be classified as spam! Tim> Hmm. 
If you ever *asked* for a prospectus by email, it would Tim> probably get the same score. Like Tim> 'bi:skip:f 10 statements': 0.96; After posting I sort of figured out that I had probably trained on a couple messages which were almost identical to the one which scored so high. In any case, I don't think I've ever seen a batch of clues where the lowest scoring clue wasn't below 0.5. There's always some feature which scores low. Skip From nobody at spamcop.net Fri Dec 19 17:58:10 2003 From: nobody at spamcop.net (Seth Goodman) Date: Sat Dec 20 13:48:15 2003 Subject: [spambayes-dev] RE: [Spambayes] How low can you go? In-Reply-To: Message-ID: > [Tim Peters] > There are messages I never want to expire. That creates major new UI > headaches to be doable. I believe (but don't yet know) that expiring > hapaxes can be done without need for user intervention, and without harm. I hope the "without harm" part is true. See my question two sections down. > [Tim Peters] > At some point, if you want to try your ideas, *try* your ideas -- > that's what Open Source is all about. Everyone is born knowing how to > program in Python, although most don't realize it until they try. I admit I wasn't aware that I could program in Python since birth, but I'm willing to take your word on that. We all have hidden potential. So that I don't have to re-invent that round thing with the axle in the middle, could someone please give me some hints as to which of the mapping features we've discussed in this thread exist or will soon exist and where I can look for them? I saw on spambayes-dev that there is discussion of a new database, so I don't want to go off on a useless fork with the present db if that comes to pass. Search for your inner newbie when you answer this. > > [Seth Goodman] > > I agree completely. This was an important motivation for expiring a > > whole message at a time. Training mistakes would eventually drop out > > of the database without user intervention. 
Not that a tool to help > > track down training mistakes wouldn't be great, but a "casual" user > > could still make occasional mistakes and the system would recover by > > itself. > > [Tim Peters] > Without intervention, it will also expire the screaming bright-red HTML > birthday message sent by my favorite 7-year-old niece, and when > she's 8 the > next one may get tagged as spam. These are the kinds of messages I never > want to expire. ... Here lies my concern. I sincerely hope that correct classification of these infrequent, unusual messages is not hapax-driven. If it is, the result of pruning infrequently-used hapaxes will be as bad as deleting the whole message. If that is the case, the _only_ solution will be to keep either those hapaxes or the whole message trained forever. Either way, I agree this is a big UI problem without an obvious intuitive solution. It does appear from looking at the scoring of some of my "typical" messages that hapaxes don't contribute much, as you've said before. Could you look at the scoring of a couple of those special messages and tell if their scoring would be seriously affected if the hapaxes were gone? -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From lsp85 at bigfoot.com Sat Dec 20 14:15:03 2003 From: lsp85 at bigfoot.com (Louis S Pannone) Date: Sat Dec 20 14:15:06 2003 Subject: [Spambayes] Outlook 2003 Message-ID: <0HQ700GDGK52UO@mta4.srv.hcvlny.cv.net> I upgraded Outlook (MS Office) from 2000 to 2003, on a Win2k 5.00.2195 SP4 OS computer. SpamBayes worked fantastically on Outlook 2000; however, it is not working well at all on 2003. I retrained, deleted and retrained. No good. I disabled the Outlook junk e-mail option, still not good. I am getting scores of 100, 97 etc... but the e-mails will not forward to the suspected spam folders; they stay in the inbox. I know my settings are correct. Any help with this would be greatly appreciated.
Lou Pannone -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031220/7e49c524/attachment.html From lsp85 at bigfoot.com Sat Dec 20 14:16:48 2003 From: lsp85 at bigfoot.com (Louis S Pannone) Date: Sat Dec 20 14:16:51 2003 Subject: [Spambayes] Outlook 2003 Problems Message-ID: <0HQ700IJ9K80AF@mta11.srv.hcvlny.cv.net> I upgraded Outlook (MS Office) from 2000 to 2003, on a Win2k 5.00.2195 SP4 OS computer. I am using version .81 of SpamBayes. SpamBayes worked fantastically on Outlook 2000; however, it is not working well at all on 2003. I retrained, deleted and retrained. No good. I disabled the Outlook junk e-mail option, still not good. I am getting scores of 100, 97 etc... but the e-mails will not forward to the suspected spam folders; they stay in the inbox. I know my settings are correct. Any help with this would be greatly appreciated. Lou Pannone -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031220/5b89dd68/attachment.html From richie at entrian.com Sat Dec 20 15:42:28 2003 From: richie at entrian.com (Richie Hindle) Date: Sat Dec 20 15:42:45 2003 Subject: [Spambayes] SpamBayes and financial sponsorship In-Reply-To: <2045949.1071845232983.JavaMail.jboss@p15135617.pureserver.info> References: <2045949.1071845232983.JavaMail.jboss@p15135617.pureserver.info> Message-ID: Hi Dawn, Replying on behalf of the SpamBayes team: > You have received this email because your project has been nominated for > financial sponsorship by Gary Daw. [...] The panel's decision will be > emailed to you just as soon as your nomination has been evaluated. Great! We're very pleased to hear it, and we look forward to hearing the decision. > PS. We are considering providing facilities to our members for managing their own > projects, such as CVS repositories, issue tracking and mailing lists.
Would that > be something that you would be interested in? We already use SourceForge for CVS and issue tracking, and we run our own mailing lists. As far as I'm aware, we're all happy with our current setup. -- Richie Hindle richie@entrian.com From richie at entrian.com Sat Dec 20 15:46:51 2003 From: richie at entrian.com (Richie Hindle) Date: Sat Dec 20 15:47:04 2003 Subject: [Spambayes] Local installation problem In-Reply-To: <200312192106.hBJL6ex0077282@vegeta.p6m7g8.net> References: <200312192106.hBJL6ex0077282@vegeta.p6m7g8.net> Message-ID: Hi Todd, > Is it possible to install Spambayes in a local directory? > [...] i'm a user of a Unix system, but not the administrator You should be able to unpack the source archive and run it directly from there, by adding the 'scripts' directory to your PATH, and adding the root of the unpacked archive to your PYTHONPATH. For instance, unpack the archive into /home/tboss, and:
PATH=$PATH:/home/tboss/spambayes-1.0a7/scripts
PYTHONPATH=$PYTHONPATH:/home/tboss/spambayes-1.0a7
export PATH PYTHONPATH
-- Richie Hindle richie@entrian.com From rcoe at CambridgeMA.GOV Sat Dec 20 17:58:04 2003 From: rcoe at CambridgeMA.GOV (Coe, Bob) Date: Sat Dec 20 17:58:09 2003 Subject: [Spambayes] SpamBayes not working Message-ID: I stopped using Spambayes for a month because of precisely that behavior (with Windows XP and Outlook 2000). Then I started using it again and have been unable to reproduce the error since. I'm sure you're seeing a real problem. I'm equally sure the problem is nothing so simple as a misnamed folder. Bob MIS Department, City of Cambridge 831 Massachusetts Ave, Cambridge MA 02139 · 617-349-4217 ·
fax 617-349-6165 > -----Original Message----- > From: Utter, Alice [mailto:alice_utter@firstpenn.com] > Sent: Friday, December 19, 2003 10:26 AM > To: spambayes@python.org > Subject: RE: [Spambayes] SpamBayes not working > > > I tried this suggestion, I uninstalled and reinstalled, I > reconfigured SpamBayes, and none of those actions solved the problem. > > My original message said that I get an error message box pop > up every time I open Outlook. The message box says that > SpamBayes has been disabled and needs to be reenabled and > Outlook restarted. Every time I reenable SpamBayes, it still > gets disabled. > > Any other ideas? > > Alice Utter > > -----Original Message----- > From: Tony Meyer [mailto:tameyer@ihug.co.nz] > Sent: Monday, December 15, 2003 9:22 PM > To: Utter, Alice; spambayes@python.org > Subject: RE: [Spambayes] SpamBayes not working > > > > I have been using SpamBayes for a short time, > > and had not been having problems with it. However, > > a couple of weeks ago the server was upgraded to > > Exchange 2000, and since then I have not been able > > to get SpamBayes to work properly on my PC. > > I suspect that the IDs for the folders that SpamBayes was set to watch have > changed in the upgrade. Try going into the manager dialog, then to the > "Filtering" tab and reselecting the folders to move unsure and spam messages > into (and then ticking the enable box again). > > =Tony Meyer From mikes at xtras.com Sat Dec 20 18:48:03 2003 From: mikes at xtras.com (Mike Schinkel) Date: Sat Dec 20 18:48:09 2003 Subject: [Spambayes] SpamBayes crashes Outlook 2003 Message-ID: <1D85A2E1383A484CB168DCD61228D366CDAEAF@xmail.xtras.com> This never got resolved. SpamBayes still crashes Outlook 2003. Can anyone please help? -Mike -----Original Message----- From: Mike Schinkel Sent: Friday, November 28, 2003 6:15 PM To: Adam Walker Cc: spambayes@python.org Subject: RE: [Spambayes] SpamBayes crashes Outlook 2003 >> The config wizard should popup ... 
or outlook will crash. Neither. Many thanks for the effort thus far, but still no dice. Outlook XP still behaves, and Outlook 2003 still misbehaves. No different than before. Next? -Mike -----Original Message----- From: Adam Walker [mailto:adam.walker@rbwconsulting.com] Sent: Friday, November 28, 2003 5:22 PM To: Mike Schinkel Cc: spambayes@python.org Subject: Re: [Spambayes] SpamBayes crashes Outlook 2003 Shutdown outlook. Rename the ini file. Restart outlook. The config wizard should popup ... or outlook will crash. The ini is in \Documents and Settings\{username}\Application Data\SpamBayes Mike Schinkel wrote: >Thanks for the reply. > >Strange, because my spam folders are in my Exchange inbox, so they have >to exist in both places. >However, I can't configure them because ever time I try to run the >SpamBayes configuration Outlook crashes. >What now? > > >-Mike > > >-----Original Message----- >From: Adam Walker [mailto:adam.walker@rbwconsulting.com] >Sent: Friday, November 28, 2003 5:09 PM >To: Mike Schinkel >Cc: spambayes@python.org >Subject: Re: [Spambayes] SpamBayes crashes Outlook 2003 > >Looks like you need to configure the spam and unsure folders under >outlook 2003. The plugin is probably sharing configs between to the two >outlook versions but the either the folder doesn't exist under outlook >2003 or unique id is different. > >Mike Schinkel wrote: > > > >>Thanks. Attached are the logs. >> >>My configuration is a new Dell with 512k RAM, Windows XP, Outlook >>2003, >> >> > > > >>and SpamBayes 0.81. I also have Outlook 2000 installed and SpamBayes >>still works fine with it (and before you assume, no I'm not running >>them at the same time; you can't run both at the same time.) >> >>As for the problem, every time I click the "Delete as Spam" button (or >>even the "SpamBayes" button), Outlook "thinks" for a second and then >>crashes with the following dialog: >> >> Microsoft Office Outlook has encountered a problem and needs to >> >> >close. 
> > >> We are sorry for the inconvenience. >> [Send Error Report] [Don't Send] >> >>Any help would be GREATLY appreciated. >> >> >> >> >> > > > > > > _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html From rmalayter at bai.org Sat Dec 20 19:59:25 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Sat Dec 20 19:59:28 2003 Subject: [Spambayes] SpamBayes crashes Outlook 2003 Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A752D0@cliff.bai.org> [Mike Schinkel] > This never got resolved. SpamBayes still crashes Outlook 2003. No it doesn't. At least not in general. On the 12-15 machines I've installed it on (Windows XP SP1 + all security fixes, Outlook 2003 + all fixes) version 0.81 works like a charm. I would suggest that there might be a problem with your Office 2003 installation, or your system in general. Often the easiest way to fix things like this is with a *complete* removal reinstallation. In my (fairly robust) experience, running side-by-side installations of two versions of office is a recipe for disaster on any machine. (You mentioned running both 2003 and XP at the same time, so I figured you might be doing that). Regards, Ryan From dreas at emailaccount.nl Sat Dec 20 20:03:01 2003 From: dreas at emailaccount.nl (Dreas van Donselaar) Date: Sat Dec 20 20:03:09 2003 Subject: [Spambayes] Training Question References: <8047433DACA27A44BEFF693613928117371C@equality.equaljusticeworks.ad> Message-ID: <00c001c3c75e$300b5c50$7a7ba8c0@hedwigpc> Hi, You probably get the best filtering when you only train on mistakes that Spambayes currently makes for your emails. Feeding a large amount of ("old") data usually doesn't improve the effectiveness of Spambayes in the longer term. 
People correct me if I'm wrong ;) Dreas ----- Original Message ----- From: "Jeff Pressman" To: Sent: Friday, December 19, 2003 9:03 PM Subject: [Spambayes] Training Question Sorry if this is not the best place to ask questions but I couldn't seem to get to the user list (http://mail.python.org/mailman/listinfo/spambayes ) I am running Spambayes Outlook Addin 0.81 for Outlook 2000 on NT4 (sp6a). We run Exchange Server 2000 and have setup a public spam folder where staff manually move spam. The public spam folder contains nearly 2000 spam emails. When I run the training it doesn't appear to acknowledge any of the spam in the public folder. My question is.. can public folder be used to train for spam? From mikes at xtras.com Sat Dec 20 20:03:47 2003 From: mikes at xtras.com (Mike Schinkel) Date: Sat Dec 20 20:03:52 2003 Subject: [Spambayes] SpamBayes crashes Outlook 2003 Message-ID: <1D85A2E1383A484CB168DCD61228D366CDAEB5@xmail.xtras.com> Thanks for the email. >> (You mentioned running both 2003 and XP at the same time, so I figured you might be doing that). Actually, I installed 2003 and told it to upgrade XP. The installer removed all of Office XP except it left Outlook XP for some reason (I'm kinda glad because SpamBayes still works in Outlook XP though the upgrade disabled most other functionality in Outlook XP, i.e. the ability to send email!) >> is with a *complete* removal reinstallation Do you know how to actually do that? As far as I know, if I uninstall it still leaves tons of files and registry settings. I really don't want to have to rebuild my machine; I just did so and it took me over a week before I had everything moved over!
-Mike -----Original Message----- From: Ryan Malayter [mailto:rmalayter@bai.org] Sent: Saturday, December 20, 2003 7:59 PM To: Mike Schinkel; spambayes@python.org Subject: RE: [Spambayes] SpamBayes crashes Outlook 2003 [Mike Schinkel] > This never got resolved. SpamBayes still crashes Outlook 2003. No it doesn't. At least not in general. On the 12-15 machines I've installed it on (Windows XP SP1 + all security fixes, Outlook 2003 + all fixes) version 0.81 works like a charm. I would suggest that there might be a problem with your Office 2003 installation, or your system in general. Often the easiest way to fix things like this is with a *complete* removal reinstallation. In my (fairly robust) experience, running side-by-side installations of two versions of office is a recipe for disaster on any machine. (You mentioned running both 2003 and XP at the same time, so I figured you might be doing that). Regards, Ryan From tim at fourstonesExpressions.com Sat Dec 20 20:11:58 2003 From: tim at fourstonesExpressions.com (Tim Stone) Date: Sat Dec 20 20:12:06 2003 Subject: [Spambayes] Training Question In-Reply-To: <00c001c3c75e$300b5c50$7a7ba8c0@hedwigpc> References: <8047433DACA27A44BEFF693613928117371C@equality.equaljusticeworks.ad> <00c001c3c75e$300b5c50$7a7ba8c0@hedwigpc> Message-ID: That can of worms has been beaten to death repeatedly here. Bottom line is: the jury's still out. How's that for a metaphor mixed 4 ways? On Sun, 21 Dec 2003 02:03:01 +0100, Dreas van Donselaar wrote: > Hi, > > You probably get the best filtering when you only train on mistakes It is clear that this is not always the case. Simple training regimens such as mistake- or unsure-based training work about equally well as random sampling and training on all mail. We're currently looking at some more advanced descriptions of training regimens. None of these are anything but techniques at this point. Perhaps sometime we'll automate the most promising one(s).
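The regimes Tim compares differ only in which incoming messages get fed to the classifier. Below is a minimal Python sketch of two of them: "train on mistakes and unsures" versus "train on everything". It assumes a hypothetical `ToyClassifier` that averages simple per-token spam probabilities (this is not SpamBayes's chi-squared combining and not its API), and it assumes cutoffs of 0.20 and 0.80.

```python
from collections import Counter


class ToyClassifier:
    """Hypothetical toy token-counting classifier (NOT the SpamBayes API)."""

    def __init__(self):
        self.spam_counts = Counter()  # number of spam messages containing each token
        self.ham_counts = Counter()   # number of ham messages containing each token
        self.nspam = 0
        self.nham = 0

    def learn(self, tokens, is_spam):
        # Count each token at most once per message.
        if is_spam:
            self.spam_counts.update(set(tokens))
            self.nspam += 1
        else:
            self.ham_counts.update(set(tokens))
            self.nham += 1

    def score(self, tokens):
        """Average per-token spam probability; never-seen tokens count as 0.5."""
        probs = []
        for tok in set(tokens):
            s = self.spam_counts[tok] / max(self.nspam, 1)
            h = self.ham_counts[tok] / max(self.nham, 1)
            probs.append(0.5 if s + h == 0 else s / (s + h))
        return sum(probs) / len(probs) if probs else 0.5


def train_on_mistakes_and_unsures(clf, stream, ham_cutoff=0.20, spam_cutoff=0.80):
    """Score each message first; train only if it was misclassified or unsure."""
    trained = 0
    for tokens, is_spam in stream:
        score = clf.score(tokens)
        if score < ham_cutoff:
            verdict = False   # classified as ham
        elif score > spam_cutoff:
            verdict = True    # classified as spam
        else:
            verdict = None    # unsure
        if verdict is None or verdict != is_spam:
            clf.learn(tokens, is_spam)
            trained += 1
    return trained


def train_on_everything(clf, stream):
    """The 'train on all mail' regime, for comparison."""
    count = 0
    for tokens, is_spam in stream:
        clf.learn(tokens, is_spam)
        count += 1
    return count
```

Given a four-message stream, the mistake/unsure regime skips the third message, because by then it already scores as spam (about 0.83). It therefore trains on three messages, where the train-on-everything regime trains on all four. The trade-off is a smaller database against the risk of under-training. That trade-off is what the thread is debating.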
-- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun From rmalayter at bai.org Sat Dec 20 20:19:31 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Sat Dec 20 20:19:35 2003 Subject: [Spambayes] SpamBayes crashes Outlook 2003 Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A752D1@cliff.bai.org> [Mike Schinkel] > >> is with a *complete* removal reinstallation > > Do you know how to actually do that? As far as I know, if I > uninstall it still leaves tons of files and registry > settings. I really don't want to have to rebuild my machine; > I just did so and it took me over a week before I had > everything moved over! The Microsoft Knowledge Base (www.microsoft.com/support) should have complete manual removal instructions for each distinct version of MS Office. It's tedious, and involves registry editing and the like. I also found this tidbit, excerpted from Microsoft Knowledge Base article 826319, which sounds like it might be something I'd try first. I'd remove all traces of XP with this tool, then repair my Office 2003 installation: --------------------------------------------------- Use the Microsoft Office Removal Wizard to remove the earlier programs. To do this, follow these steps: On your Office 2003 CD, double-click the Files folder. Double-click the Pfiles folder. Double-click the Msoffice folder. Click the Office11 folder. In the Office11 folder, double-click the Offcln file. In the Microsoft Office Removal Wizard, click Next. In the Removal Options pane, click Let me decide which Microsoft Office applications will be removed, and then click Next. In the Applications to Keep column, select the program that you want to remove, and then click the << button. Repeat this step for each program that you want to remove, and then click Next. In the Files You Can Remove pane, review the list of files that you can remove, and then click Next.
In the Remove Now pane, click Finish to remove the selected files. From mikes at xtras.com Sat Dec 20 20:21:24 2003 From: mikes at xtras.com (Mike Schinkel) Date: Sat Dec 20 20:21:27 2003 Subject: [Spambayes] SpamBayes crashes Outlook 2003 Message-ID: <1D85A2E1383A484CB168DCD61228D366CDAEB7@xmail.xtras.com> Cool, thanks. Maybe I can try tomorrow (can't now, as I'm using it! :) -Mike From gregoryschlegel at yahoo.com Sat Dec 20 21:02:43 2003 From: gregoryschlegel at yahoo.com (Greg Schlegel) Date: Sat Dec 20 21:03:17 2003 Subject: [Spambayes] question Message-ID: How do I mark mail as good mail from the inbox, but where the spam % should be less than 20%? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031220/e8a97f9a/attachment.html From info at jaredshockley.com Sat Dec 20 22:25:43 2003 From: info at jaredshockley.com (Jared Shockley) Date: Sat Dec 20 22:25:48 2003 Subject: [Spambayes] SpamBayes Outlook Plug-in idea Message-ID: <004401c3c772$1f1e8980$0100a8c0@GHI.garlickhelicopters.com> Hey guys, Great product. I tell everyone that I know that uses Outlook for e-mail to get this product. I hope that you keep it either free or a small registration fee. On to things, I have a lot of rules that filter rules for the different lists that I am on as well as the moderator mails for those lists. As a moderator of one particular list, I get a bunch of spam there. Your add-in catches it. However, my rule moves it into the moderator folder even after your product gets it. I am using Windows XP Pro SP1, Outlook 2002/XP and SpamBayes is Binary Version 0.81. I wonder if there is a way to prevent this, either in the rules or in the SpamBayes system. Thanks again!
Jared Jared Shockley info@jaredshockley.com www.jaredshockley.com (406) 544-9276 From tim.one at comcast.net Sun Dec 21 00:12:57 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Dec 21 00:13:01 2003 Subject: [Spambayes] SpamBayes Outlook Plug-in idea In-Reply-To: <004401c3c772$1f1e8980$0100a8c0@GHI.garlickhelicopters.com> Message-ID: [Jared Shockley] > Great product. I tell everyone that I know that uses Outlook for > e-mail to get this product. I hope that you keep it either free or a > small registration fee. Thanks, and rest assured it will always be free. The SpamBayes license is Open-Source certified, and that gives you many guarantees: http://www.opensource.org/ > On to things, I have a lot of rules that filter rules for the > different lists that I am on as well as the moderator mails for those > lists. As a moderator of one particular list, I get a bunch of spam > there. Your add-in catches it. However, my rule moves it into the > moderator folder even after your product gets it. I am using Windows > XP Pro SP1, Outlook 2002/XP and SpamBayes is Binary Version 0.81. I > wonder if there is a way to prevent this, either in the rules or in > the SpamBayes system. Add your moderator folder to the collection of folders SpamBayes watches, and enable "background filtering" (on the Advanced tab of the SpamBayes Manager) to try to convince Outlook to process its own rules *before* SpamBayes scores a message. I'm afraid that interaction with Outlook rules is quite a mess, but the background filtering gimmick seems to work well in practice for "almost everyone". Your mileage may vary. From JimmyR at xs4all.nl Sun Dec 21 18:17:18 2003 From: JimmyR at xs4all.nl (JimmyR@xs4all.nl) Date: Sun Dec 21 18:17:16 2003 Subject: [Spambayes] Spam Bayes does not respond anymore... :( ... . fixed it! :D Message-ID: Hi guys, I just installed Spam Bayes, and after I had it analyze my ham and my spam, Outlook stopped responding.
After waiting for 5 minutes to see if it resolved by itself, I decided to close (and thus kill) Outlook. After a restart, Outlook gave an error message along the lines of "the last time Spam Bayes module was loaded outlook encountered a serious error, do you want to load it again?" When I clicked yes, Outlook stopped responding again. Killed Outlook again, started Outlook again. This time I did not load the Spam Bayes module, and Outlook worked fine. But the Spam Bayes buttons don't work anymore. I expected that I could enable the add-in again via the Add-in Manager, but there was no Spam Bayes plugin there. Even an uninstall and then a new install of Spam Bayes didn't do the trick. *sob* *sob* I'm using Outlook 2002 (ver 10.2627.2625)... and while checking the version I see that I can enable it here! "Help" --> "About Microsoft Outlook" --> "Disabled Items"... jeez, not really a place where I normally would look. I thought sending this message would be of no use, but on the off chance you didn't know about this behaviour of Outlook 2002, and since other users might have the same problem, I decided to share it with you guys nonetheless. Regards, Jimmy -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031222/327a7e0a/attachment.html From tameyer at ihug.co.nz Sun Dec 21 20:36:39 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Dec 21 20:36:54 2003 Subject: [Spambayes] Linux install In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C1370@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677788@its-xchg4.massey.ac.nz> > Is there a complete list of files in 1.0a7 that are installed > under Linux? I know setup.py creates a directory in > site-packages. What else is installed? The output from setup.py lists all the files. The easiest thing would be to just save this somewhere. Otherwise, read setup.py; it's pretty self-explanatory.
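If the source tree is still around, distutils can also write that list to a file: `python setup.py install --record installed-files.txt` stores one installed path per line. A small, hypothetical Python helper for checking such a record afterwards (for example, before removing an old install by hand) might look like this; the function name and the file name are illustrative, not part of SpamBayes.

```python
import os


def read_install_record(record_path):
    """Read a distutils --record file (one installed path per line).

    Returns two lists: paths that still exist on disk and paths that
    are already gone. Blank lines are skipped.
    """
    present, missing = [], []
    with open(record_path) as fh:
        for line in fh:
            path = line.strip()
            if not path:
                continue
            if os.path.exists(path):
                present.append(path)
            else:
                missing.append(path)
    return present, missing
```

For example, `present, missing = read_install_record("installed-files.txt")` separates files that are still installed from those already removed.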
=Tony Meyer From tameyer at ihug.co.nz Sun Dec 21 20:41:21 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Dec 21 20:41:27 2003 Subject: [Spambayes] SpamBayes not working In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C1454@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677789@its-xchg4.massey.ac.nz> > I tried this suggestion, I uninstalled and reinstalled, I > reconfigured SpamBayes, and none of those actions solved the problem. > > My original message said that I get an error message box pop > up every time I open Outlook. The message box says that > SpamBayes has been disabled and needs to be reenabled and > Outlook restarted. Every time I reenable SpamBayes, it still > gets disabled. > > Any other ideas? Could you attach the log files that get created? They should have details explaining why it is being disabled. Apologies if these were on the original message (I don't have that here anymore). =Tony Meyer From tameyer at ihug.co.nz Sun Dec 21 20:57:14 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Dec 21 20:57:19 2003 Subject: [Spambayes] Outlook 2003 Problems In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13048D7A62@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467778A@its-xchg4.massey.ac.nz> > SpanBayes worked fantastic on Outlook 2000, however > it is not working very good at all on 2003. [...] > I am getting scores of 100, 97 etc... but the e-mails > will not forward to the suspected spam folders, they > stay in the in box. > I know my settings are correct. How? Have you tried removing the configuration file and reselecting the folders? Just because they *look* right doesn't mean that they are. Otherwise, please follow the instructions in the troubleshooting guide that explain how to submit your log file; it's hard to diagnose without it. 
=Tony Meyer From tameyer at ihug.co.nz Sun Dec 21 20:58:16 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Dec 21 20:58:24 2003 Subject: [Spambayes] SpamBayes not working In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13048D7A6E@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467778B@its-xchg4.massey.ac.nz> > I stopped using Spambayes for a month because of precisely > that behavior (with Windows XP and Outlook 2000). Then I > started using it again and have been unable to reproduce the > error since. I'm sure you're seeing a real problem. I'm > equally sure the problem is nothing so simple as a misnamed folder. Logs! I'm not saying that it's just a misnamed folder, but we need logs! =Tony Meyer From tameyer at ihug.co.nz Sun Dec 21 21:00:38 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Dec 21 21:00:44 2003 Subject: [Spambayes] I don't have an anti-spam button In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13047C1455@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467778C@its-xchg4.massey.ac.nz> > What should I do? You could try installing SpamBayes: see the website for downloading information. If you already have it installed, and the problem is that you think you should now have an "anti-spam" button on the toolbar: try getting the latest version of the plugin (008.1). You should now be looking for a toolbar and button called "SpamBayes". If none of these apply, please try going through the troubleshooting guide, which includes information about how to submit a problem report if nothing there helps. =Tony Meyer From tim.one at comcast.net Mon Dec 22 00:24:25 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 22 00:24:31 2003 Subject: [spambayes-dev] RE: [Spambayes] How low can you go? In-Reply-To: <20031219165537.EDB162DF7F@cashew.wolfskeep.com> Message-ID: [T. Alexander Popiel] > Actually, there have been experiments done (by me) with expiry of > whole messages. Yes. 
By "the project" having experience I mean controlled tests run by several across their own email mix, using exactly the same strategy, with reporting and analysis and all that good stuff. We've done little of that (as a group) over the last year. > I invite you to look at the 'expire4months' regime for my incremental > testing harness. Performance was worse than remembering everything, > but significantly better than mistake-based training (with the > 'fpfnunsure' regime). > > I have not done any experiments with just nuking hapaxes; I didn't see > any reason to do a partial job instead of a full one. There may not be one. The question arose specifically in the context of the mixed unigram/bigram classifier, which grows the database at a much faster rate. I've got ~90% hapaxes after a couple days with that, and the database is already 3x larger than after months of mistake/unsure training under the pure-unigram classifier. Expiring a full message doesn't seem to make sense after two days, or even after a week; expiring unused hapaxes may; that's for experiment to decide. >>> I know you're not arguing that, but if there were bidirectional >>> msg_id <-> feature_ID maps, it would be fairly easy to expire whole >>> messages. >>> >>> That would obviate the need to track last time seen for every token. >> Only if you don't want also to be able to expire tokens on their own. > No... just find the most recent message that the token appeared in, > which would be a quick search through a few message times. A really > quick search if you're only looking to expire hapaxes. I don't want to expire a hapax if it's been used recently in *scoring*. Message times can't distinguish used from unused features. If you're doing train-on-everything (with or without whole-msg expiration), a hapax used in scoring becomes a non-hapax the first time it's used in scoring. 
For mistake/unsure training, a hapax used in scoring remains a hapax if the message being scored ends up correctly classified. Hapaxes that are never seen again also remain hapaxes. Distinguishing used from unused requires recording use. Followups set to spambayes-dev@python.org, as this speculative stuff really doesn't belong on the general spambayes list. From kennypitt at hotmail.com Mon Dec 22 14:40:31 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Dec 22 14:41:09 2003 Subject: [Spambayes] question In-Reply-To: Message-ID: You can change the cutoff percentages on the Filtering tab in SpamBayes Manager. Use the sliders, or type directly in the edit boxes next to them, to set the values. Scores below the Possible Spam score are marked as good, scores between the Possible Spam score and the Certain Spam score are marked unsure, and scores above the Certain Spam score are marked as spam. -- Kenny Pitt _____ From: spambayes-bounces+kennypitt=hotmail.com@python.org [mailto:spambayes-bounces+kennypitt=hotmail.com@python.org] On Behalf Of Greg Schlegel Sent: Saturday, December 20, 2003 9:03 PM To: spambayes@python.org Subject: [Spambayes] question How do I mark mail as good mail from the inbox, but where the spam % should be less than 20%? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031222/13f014ba/attachment.html From tim.one at comcast.net Mon Dec 22 17:28:34 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 22 17:28:41 2003 Subject: [Spambayes] RE: [spambayes-dev] Duplicate E-Mails In-Reply-To: <000c01c3c8d6$fb0b1570$1e14a8c0@ILGToshiba> Message-ID: [Ira L. Gidon] > I am running on a laptop using Windows XP. > I am using Outlook 2002 SP-2 for e-mail. > > I installed Spambayes and I start getting duplicate e-mails (same > e-mail with same date/time stamp). 
It appears that there is a delay > in notifying my exchange server that the e-mail has been downloaded. > I can sometimes get 3 or 4 copies of the same e-mail. This definitely > started at the same time as the installation of Spambeyes. > > I was hoping someone can provide me with a solution. Followup: Ira wrote back to say that enabling background filtering seemed to fix his problem. From Jim at drgsf.com Mon Dec 22 20:05:16 2003 From: Jim at drgsf.com (Jim Cowing) Date: Mon Dec 22 20:05:44 2003 Subject: [Spambayes] SpamBayes and PGP Message-ID: <008201c3c8f0$dcb82df0$0500a8c0@IBM3D0ED4B510D> Hello SpamBayes List: I use SpamBayes 7.0. Have not yet upgraded to ver 8.01. Also using Outlook 2002, SP2 and PGP version 6.5.3. I'm having a problem using PGP in conjunction with SpamBayes. Normally, my SpamBayes program filters and rates every mail message, and moves bad messages to either Suspected SPAM or SPAM. However, when I receive an email message that is PGP encrypted, it appears that the SpamBayes program just stops filtering from that point, and no longer filters the messages that follow it. After mail delivery stops, I can subsequently go in and filter manually, but this is inefficient for me given so much spam (500+ messages this weekend alone). Are there any configuration changes that I could make to repair this problem, so that SpamBayes continues to filter messages even after a single PGP message is received? Cheers, Jim Jim Cowing, CISSP email: jim@drgsf.com 650-638-3350 Office -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031222/2ccaaf7c/attachment.html From tpeters at mixcom.com Mon Dec 22 22:13:52 2003 From: tpeters at mixcom.com (Tom Peters) Date: Mon Dec 22 22:14:07 2003 Subject: [Spambayes] eudora headers In-Reply-To: <6.0.1.1.2.20031219143341.03ae0898@localhost> Message-ID: <5.1.0.14.2.20031222210227.0b1b9040@localhost> Eudora doesn't know about those headers directly; you have to do a little more work than that. Fortunately, it's only a little more. Here's what my filters look like:

    Match:    Incoming, Manual
    Header:   X-Spambayes-Classification:   (type this manually into the box, include the colon)
    Contains: spam
    Ignore
    Action:   Make Label SPAMFILTER
              Transfer To Trash
              Play Sound spikem.wav
              Skip Rest

Analyzing that: The filter will match on new inbound mail or on manual application of filters via Ctrl-J or the menu. You have to manually type "X-Spambayes-Classification:" (without the quotes but with the colon) into the box for the header type; the SpamBayes POP3 proxy invented these headers, so they aren't known in advance to the authors of Eudora. If the X-Spambayes-Classification: header contains "spam" (lower case), the filter should trigger. As for the actions triggered (you can do anything you like here), in my case I label the message as to WHY it was trashed, trash it, and play a wave file. I then stop any further filter processing on that particular message. In Eudora Pro up to version 5 this was called Skip Rest in the pulldown, but I got the feeling they might have renamed it "STOP" in version 6. Create two other minor variations: one where the header contains the word "unsure", which in my case labels the message as spam but doesn't move it, so I can inspect it. Another might check whether X-Spambayes-Classification: contains ham, to transfer good messages out of the In box into some sort of "new mail" mailbox, to keep your system mailboxes (In, Out, Trash) tidy. I myself check only for "spam" and "unsure."
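The same first-match-wins routing can be written outside Eudora, for instance to check a filter plan against saved messages. A rough Python sketch using the standard `email` package follows; it assumes the header value contains one of the words `spam`, `unsure` or `ham` as described above, and the destination folder names are hypothetical (`None` means "label it but leave it in the In box").

```python
from email import message_from_string

# Header added by the SpamBayes POP3 proxy, as described above.
HEADER = "X-Spambayes-Classification"

# Checked in order; the first match wins, like "Skip Rest" in Eudora.
# Destination names are hypothetical; None means "leave in the In box".
RULES = [
    ("spam", "Trash"),
    ("unsure", None),
    ("ham", "New Mail"),
]


def route(raw_message):
    """Return (classification, destination) for a raw message string.

    Returns (None, None) when the classification header is missing.
    """
    msg = message_from_string(raw_message)
    # Header lookup is case-insensitive; lowercasing the value is a
    # choice made for this sketch.
    value = (msg.get(HEADER) or "").lower()
    for keyword, destination in RULES:
        if keyword in value:  # substring test, like a "Contains" condition
            return keyword, destination
    return None, None
```

Because the test is a substring match, a header such as `spam; 0.99` would still be routed to the spam rule.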
Hope this helps, -Tom At 02:50 PM 12/19/2003 +0000, Darren Westlake wrote: >In Eudora i get the X-Spambayes-Classification headers coming through fine >but when i try to create a filter rule based on these those headers are >not shown in the drop down box for the Headers field. > >Anyone know why? > >I'm using Eudora v 6.2 and sb_server.py run from a batch file. > > >Thanks and regards, >Darren >_____________________________ >Darren Westlake >Managing Director >ID Telecommunications Ltd > >**NEW** Make low cost calls from your PC - www.MyWebCalls.com Resellers needed > >www.idtelecoms.com >_____________________________ > > > >_______________________________________________ >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes >Check the FAQ before asking: http://spambayes.sf.net/faq.html [SF] The Prophets teach us patience.--Vedek Bareil. It appears they also teach you politics. --Sisko (In the Hands of the Prophets) --... ...-- -.. . -. ----. --.- --.- -... tpeters@nospam.mixcom.com (internet) remove "nospam." N9QQB (ham) "HEY YOU" (loud shouting) WEB ADDRESS http//www.mixweb.com/tpeters 43° 7' 17.2" N, by 88° 6' 28.9" W, Elevation 815', Grid Square EN53wc WAN/LAN/Telcom Analyst, Tech Writer, MCP, Cisco Certified CCNA From tpeters at mixcom.com Mon Dec 22 22:16:50 2003 From: tpeters at mixcom.com (Tom Peters) Date: Mon Dec 22 22:17:15 2003 Subject: [Spambayes] eudora headers In-Reply-To: <6.0.1.1.2.20031219143341.03ae0898@localhost> Message-ID: <5.1.0.14.2.20031222211546.0b1a0dd8@localhost> In addition to my previous info about Eudora filters, I'd also check the SpamBayes quickstart wiki: http://entrian.com/sbwiki/POP3ServiceQuickStartGuide At 02:50 PM 12/19/2003 +0000, Darren Westlake wrote: >In Eudora i get the X-Spambayes-Classification headers coming through fine >but when i try to create a filter rule based on these those headers are >not shown in the drop down box for the Headers field.
>_____________________________ If my doctor told me I had only six months to live, I wouldn't brood. I'd type a little faster." -- Isaac Asimov --... ...-- -.. . -. ----. --.- --.- -... tpeters@nospam.mixcom.com (internet) remove "nospam." N9QQB (ham) "HEY YOU" (loud shouting) WEB ADDRESS http//www.mixweb.com/tpeters 43° 7' 17.2" N, by 88° 6' 28.9" W, Elevation 815', Grid Square EN53wc WAN/LAN/Telcom Analyst, Tech Writer, MCP, Cisco Certified CCNA From mhammond at skippinet.com.au Tue Dec 23 01:29:25 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Dec 23 01:29:46 2003 Subject: [Spambayes] Experimental SpamBayes build available Message-ID: <001801c3c91e$1cde2bf0$2c00a8c0@eden> Hi all, I have just uploaded an installer for a new experimental binary of SpamBayes. This binary includes *both* the Outlook addin and the sb_server applications. The installer attempts to detect the most appropriate one to install. Everything is built from CVS sources as of today. Hopefully, this will mean the Outlook addin has a number of bugs fixed over the 0.8 release. However, it is possible there are a number of bugs *not* in 0.8, and even the possibility it will not work at all for many people (as this is released with different 'python->.exe' technology than previous versions). The sb_server applications all seem to work fine too, so non-Outlook users are also encouraged to try this version. Note that it comes with almost no documentation (as there is none!) and that this is the first release of such a binary, so this too is bleeding edge. Thus, only brave people willing to test out stuff with almost no release notes should try it :) To further dissuade you, I am leaving for a week or so of holiday, and will not be in a position to respond to any mail or bugs relating to this build. That said, it works well for me and in the testing I have done on a number of machines. If anyone is keen, please visit http://starship.python.net/crew/mhammond/spambayes/ Happy holidays!
Mark. From Bruce.Leicher at mpi.com Tue Dec 23 08:46:48 2003 From: Bruce.Leicher at mpi.com (Leicher, Bruce) Date: Tue Dec 23 08:48:15 2003 Subject: [Spambayes] Laptop user Message-ID: <68D367AD7BEB894F8FC04734238A483201FD3C9B@US-VS1.corp.mpi.com> I am using Spambayes on my laptop with Outlook running on Microsoft Exchange. I have access to Outlook through my laptop as well as through other desktops, a PDA and a secure website when I am away from the office. When I take my laptop with me, it no longer filters, since it is not logged in. Is it OK to run the application on both my laptop and a desktop (that is not in my office), so that filtering continues while I have my laptop with me? Can I copy a data file from my laptop to the other desktop periodically so that as I train the software, which will be at my laptop, the desktop can stay current? Any thoughts would be greatly appreciated. Bruce A. Leicher Millennium Pharmaceuticals, Inc. (P) 617-444-2150 (F) 617-374-0074 leicher@mpi.com Mail and Office Address: 35 Landsdowne St.- 7th Floor Cambridge, MA 02139 This e-mail, including any attachments, is a confidential business communication, and may contain information that is confidential, proprietary and/or privileged. This e-mail is intended only for the individual(s) to whom it is addressed, and may not be saved, copied, printed, disclosed or used by anyone else. If you are not the(an) intended recipient, please immediately delete this e-mail from your computer system and notify the sender. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031223/bd483558/attachment-0001.html From patents at ctcweb.net Tue Dec 23 09:54:59 2003 From: patents at ctcweb.net (Robert N. Cordy) Date: Tue Dec 23 09:55:02 2003 Subject: [Spambayes] Deleting spam Message-ID: After I confirm that the emails sent by SpamBayes to the "Spam" folder are actually spam (and they always are!)
I would like to permanently delete them right from the Spam folder. As my version works now, deleted spam emails are sent to the "Delete files" folder and then I have to delete them again. Great program, the check is in the mail (soon)... Bob C. ========== -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031223/aba2bee3/attachment.html From tim.one at comcast.net Tue Dec 23 10:21:38 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 23 10:21:43 2003 Subject: [Spambayes] Deleting spam In-Reply-To: Message-ID: [Bob C.] > After I confirm that the emails sent by SpamBayes to the "Spam" > folder are actually spam (and they always are!) I would like to > permanently delete them right from the Spam folder. As my version > works now, deleted spam emails are sent to the "Delete files" folder > and then I have to delete them again. It depends on how you delete them, and SpamBayes doesn't change anything about how Outlook always works in this respect. If you hold down a Shift key while pressing your Delete key, Outlook will *not* move them to the Deleted Items folder first. Outlook may or may not pop up a dialog box then, asking whether you're sure you want to delete them; this depends on how you set Outlook's "Warn before permanently deleting items" option (under Tools -> Options -> Other -> Advanced Options -> General setting). Outlook can also be told to automatically delete everything in Deleted Items whenever you close Outlook. > Great program, the check is in the mail (soon)... Oh, there's no charge. We appreciate contributions, but you're welcome to use the software with or without one. From jan at helgeland.no Tue Dec 23 10:31:17 2003 From: jan at helgeland.no (Jan Christoffersen) Date: Tue Dec 23 10:31:24 2003 Subject: [Spambayes] Outlook 2003 - exit Message-ID: <200312231531.QAA17986@mail44.fg.online.no> Outlook 2003 will not exit after installing Spambayes 008.1.
I have to kill the process to exit Outlook. I have reinstalled Spambayes several times, but the problem is still there. Jan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031223/1996faf6/attachment.html From milt at necam.com Tue Dec 23 10:36:40 2003 From: milt at necam.com (Atkinson, Milt) Date: Tue Dec 23 10:36:42 2003 Subject: [Spambayes] Outlook 2003 - exit Message-ID: I have been running Spambayes with Outlook 2003 ever since Outlook 2003 was still in beta... I have not seen this problem... I would check whether IM or your virus scanner might be the issue. Milt -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031223/c2399ec0/attachment.html From rcoe at CambridgeMA.GOV Tue Dec 23 11:01:37 2003 From: rcoe at CambridgeMA.GOV (Coe, Bob) Date: Tue Dec 23 11:01:48 2003 Subject: [Spambayes] Laptop user Message-ID: By default, the Spambayes Outlook add-in puts the database in your roaming profile, so if you log into the network with the laptop before going home at night and do the same thing when you arrive in the morning, you'll always have the latest database. If the laptop collects mail when it's not on the network, and you want it to filter as it goes, you'll want to make your "definite" and "ambiguous" folders available offline (in the Exchange sense) or define them in a .PST file in your roaming profile. This question has come up before. Maybe it should be in the FAQ.
Bob MIS Department, City of Cambridge 831 Massachusetts Ave, Cambridge MA 02139 · 617-349-4217 · fax 617-349-6165 -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031223/a6439e35/attachment.html From mattt at wincofoods.com Tue Dec 23 17:19:13 2003 From: mattt at wincofoods.com (Tyler, Matt) Date: Tue Dec 23 17:19:53 2003 Subject: [Spambayes] Outlook tools bars after installation Message-ID: <7D1F623FE478D21186ED0008C74C0D7E0132660C@EXCHANGE> I have installed the product and I am using Outlook 2000 SP3 (9.0.0.6227). I use a custom toolbar and have modified the menu bar, changing some of the action key combos. After SpamBayes was installed and Outlook started again, my custom toolbar had reverted to a basic form and the menu bar was reset. The same thing happened with one of your competitors' products (I forget the name at this time). I restarted Outlook and my custom toolbar returned, but the menu bar was left in the reset state. I could not find this effect mentioned anywhere in the documentation/FAQ/etc. Thank you, Matt Tyler WinCo Foods, Inc mattt@wincofoods.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031223/007e4c37/attachment.html From jebutler at msn.com Mon Dec 22 17:40:38 2003 From: jebutler at msn.com (Jim Butler) Date: Tue Dec 23 18:22:26 2003 Subject: [Spambayes] spambayes Outlook Plugin freezes Message-ID: <000501c3c98e$60c52e20$344139d1@domain.jimbutler.com> The SpamBayes Outlook Plugin hangs repeatedly. If I turn off the SpamBayes plug-in by unchecking the 'Enable Spambayes' checkbox on the SpamBayes Manager, Outlook runs fine.
I am using the binary version of the SpamBayes Outlook Plugin: SpamBayes Outlook Addin, Binary version 0.81 (September 9, 2003) starting (with engine SpamBayes Beta2, version 0.2 (July 2003)) on Windows 5.1.2600 (Service Pack 1) on Windows XP Professional, Version 2002, Service Pack 1. Computer: Intel Celeron Processor, 666 MHz, 256 MB of RAM. Outlook 2000 SP3 (9.0.06627), Corporate or Workgroup Update. Mail provider is MSN.COM. I read the SpamBayes Bug Log and found this bug, which may be related: [ 806249 ] Outlook Hangs with currrent version Thanks for your help. Jim Butler 330-414-0060 jebutler@msn.com -------------- next part -------------- A non-text attachment was scrubbed... Name: spambayes1.log Type: application/octet-stream Size: 3874 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031222/c644ef56/spambayes1-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: spambayes2.log Type: application/octet-stream Size: 42555 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031222/c644ef56/spambayes2-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: spambayes3.log Type: application/octet-stream Size: 1299 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031222/c644ef56/spambayes3-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: spambayes4.log Type: application/octet-stream Size: 5919 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031222/c644ef56/spambayes4-0001.obj From tcasey at ericksonaircrane.com Tue Dec 23 21:42:08 2003 From: tcasey at ericksonaircrane.com (Tim Casey) Date: Tue Dec 23 21:42:13 2003 Subject: [Spambayes] Server side filtering. Message-ID: I'm an administrator at a large company that uses an Exchange server.
I would like a spam filtering solution that is transparent to the user. Has anyone in your group had experience setting up a mail relay that would filter mail and then relay it to another email server? The client-side solution works well, but deployment, training, administration etc. would be a pain. Here's my idea of a perfect world:
Internet --> Spamfilter / Relay --> Firewall --> Exchange Server --> Client
                    ^                                                  |
                    |                    Junk Email                    V
                    +--------------------------------------------------+
Simply put, I would like to set up a mail relay that would use spambayes and forward to my Exchange server. Once the client receives the email, they could send email to a junk file that would be polled by the Spamfilter / Relay for training spambayes. All mail on the Spam Filter deemed junk would be reviewed by an administrator prior to deleting. Any suggestions? Thank you, Tim From tameyer at ihug.co.nz Tue Dec 23 21:47:23 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Dec 23 21:47:32 2003 Subject: [Spambayes] Server side filtering. In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13048D805D@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467779C@its-xchg4.massey.ac.nz> > Simply put I would like to set up a mail relay that would use > spambayes and forward to my Exchange server. Once the client > receives the email they could send email to a junk file that > would be polled by the Spamfilter / Relay for training > spambayes. All mail on the Spam Filter deemed junk would be > reviewed by an administrator prior to deleting. > > Any suggestions? Have you read the material on the website? =Tony Meyer From lists at webcrunchers.com Tue Dec 23 22:57:27 2003 From: lists at webcrunchers.com (JD) Date: Tue Dec 23 22:57:29 2003 Subject: [Spambayes] mac OS-X iMail Message-ID: <49EA947E-35C5-11D8-9DE1-0030656C6B9E@webcrunchers.com> Does anyone on this list know anything about the iMail spam filter?
I hear it's a Bayesian filter, and if it is, I need to know some things about it. One thing is that as I process mail, it has a "Junk" icon which I believe "teaches" it which messages are spam and which are ham. My Spam or Junk mailbox is getting HUGE and I wonder what the effects are if I delete some of the older junk. Will messing with the Junk mailbox mess up the corpus? Or is the corpus kept in some other place? Its ability to filter spam is significantly reduced now, and I think it was all because I moved some of the spam out of the Junk mailbox. Could this have been caused by something I did, or have spammers really all of a sudden figured out how to get around the filter? What is the recommended way to clean up my corpus (assuming I can even find it)? John From tameyer at ihug.co.nz Tue Dec 23 23:26:34 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Dec 23 23:26:42 2003 Subject: [Spambayes] mac OS-X iMail In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13048D8078@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A16@its-xchg4.massey.ac.nz> > Does anyone on this list know anything about the iMail spam > filter? I hear it's a > Baysian filter, and if it is, I need to know some things about it. I suspect that you're more likely to have luck looking around Apple's discussion lists and support, than here. =Tony Meyer From robfrais at rf-associates.com Wed Dec 24 12:07:31 2003 From: robfrais at rf-associates.com (Rob Frais) Date: Wed Dec 24 12:06:54 2003 Subject: [Spambayes] Quick usage questions Message-ID: Love your product; I've only had it a couple of days. Is there a way to delete all the messages in the Junk E-mail folder at once, similar to emptying the Deleted Items folder? And can this be made to delete them permanently rather than sending them to the Deleted Items folder? I realize that I can highlight all the messages and then delete them. Thanks Rob Frais robfrais@rf-associates.com -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031224/7f411ab4/attachment.html From pauld at mitre.org Wed Dec 24 12:22:01 2003 From: pauld at mitre.org (Paul Denning) Date: Wed Dec 24 12:22:19 2003 Subject: [Spambayes] Manually filter local mbx? Message-ID: <6.0.0.22.0.20031224121236.01b72ed8@mailsrv1.mitre.org> I use Eudora 6 as my email client (Windows XP) and use an IMAP server. I have started using sb_imapfilter.py. I would prefer to do the following, but I'm not sure what spambayes commands to use. 1. Use Eudora to transfer all messages from the IMAP inbox to in.mbx on my local machine. 2. Purge messages from the IMAP inbox. 3. Manually start spambayes to filter in.mbx, moving spam to spam.mbx and unsure to unsure.mbx, and leaving ham in in.mbx (or perhaps transferring it to ham.mbx). 4. I then select all messages in in.mbx (or ham.mbx) and manually run Eudora filters to sort things into the appropriate mbx's. From a DOS window, what is the appropriate command line to do step 3? Paul From tpeters at mixcom.com Wed Dec 24 14:06:25 2003 From: tpeters at mixcom.com (Tom Peters) Date: Wed Dec 24 14:14:40 2003 Subject: [Spambayes] Quick usage questions In-Reply-To: Message-ID: <5.1.0.14.2.20031224130545.00b335d8@localhost> Select one message. Press Control-A to select the rest of them. Delete. But you probably already thought of that. At 12:07 PM 12/24/2003 -0500, Rob Frais wrote: >Love your product, only had it a couple of days. > >Is there a way to delete all the messages in the Junk e-mail folder, >similar to the deleted items folder? and can this be made to totally >delete them rather than send them to the deleted items folder. > >I realize that I can highlight all the messages and then delete them.
> >Thanks > >Rob Frais >robfrais@rf-associates.com >_______________________________________________ >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes >Check the FAQ before asking: http://spambayes.sf.net/faq.html [Humor] I went to the store to buy some instant water, but I didn't know what to add to it. --Steven Wright --... ...-- -.. . -. ----. --.- --.- -... tpeters@nospam.mixcom.com (internet) remove "nospam." N9QQB (ham) "HEY YOU" (loud shouting) WEB ADDRESS http://www.mixweb.com/tpeters 43° 7' 17.2" N, by 88° 6' 28.9" W, Elevation 815', Grid Square EN53wc WAN/LAN/Telcom Analyst, Tech Writer, MCP, Cisco Certified CCNA -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031224/f6b9bf34/attachment.html From richie at entrian.com Wed Dec 24 16:42:42 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Dec 24 16:42:57 2003 Subject: [Spambayes] Happy Christmas! Message-ID: <8s1kuvgnjr9p8rq75cur9tlqh77hn8v0l8@4ax.com> I'd like to wish all the users and developers of SpamBayes a Happy Christmas - and of course, a very Hammy New Year. 8-) -- Richie Hindle richie@entrian.com From JohnH at snetworking.com Thu Dec 25 02:16:55 2003 From: JohnH at snetworking.com (John Hall) Date: Thu Dec 25 02:17:46 2003 Subject: [Spambayes] Server side filtering. Message-ID: <223286A46C32EE42BF7C2E4177EAD426178CCB@sylvester2.snetworking.com> Tim, Re your post of 12/23/03 on Server Side filtering. We do what I think you are asking. But we use (gasp!) ASSP (assp.sourceforge.com) Internet --> Firewall --> ASSP --> Exchange 2000 --> Clients On the client side, we use a rule to move all the [SPAM] email to a separate folder. So I could help with ASSP, but perhaps you'd prefer to use SpamBayes. (You know about the wonderful SpamBayes Outlook Addin, I assume. We feel it works really well.)
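Whichever filter ends up on the relay (SpamBayes or ASSP), the per-message decision in Tim's diagram is small: turn a spam score into a ham/unsure/spam label, stamp it on the message, and either forward the message to Exchange or hold it for an administrator. The Python sketch below assumes the score (0.0 to 1.0) has already been computed by SpamBayes; the scoring call itself is not shown. The cutoffs of 0.20 and 0.90, the destination names "exchange" and "quarantine", and the `classify`/`route` helpers are all hypothetical. The header name is modeled on the X-Spambayes-Classification header that SpamBayes adds, but check what your version actually writes.

```python
from email import message_from_bytes
from email.message import Message

# Hypothetical cutoffs; SpamBayes lets each installation set its own thresholds.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def classify(score: float, ham_cutoff: float = HAM_CUTOFF,
             spam_cutoff: float = SPAM_CUTOFF) -> str:
    """Map a spam probability in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def route(raw: bytes, score: float) -> tuple[str, Message]:
    """Tag a message and choose a (hypothetical) destination on the relay."""
    msg = message_from_bytes(raw)
    label = classify(score)
    # Drop any existing tag first so an outside sender cannot pre-label mail.
    del msg["X-Spambayes-Classification"]
    msg["X-Spambayes-Classification"] = label
    # Spam is held for administrator review; ham and unsure go on to Exchange.
    destination = "quarantine" if label == "spam" else "exchange"
    return destination, msg
```

Forwarding unsure mail to users, rather than quarantining it, is a policy choice in this sketch: users see it and can send it back to the relay's junk mailbox for training, which is the feedback loop in Tim's diagram.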
John From pciciob at hotmail.it Fri Dec 26 12:06:40 2003 From: pciciob at hotmail.it (Ben Leary) Date: Fri Dec 26 06:10:43 2003 Subject: [Spambayes] heiress scandal;not for free! Message-ID: Paris Hilton sex scandal has just reached you! Unbelievable scenes! The big thing about this video is that Paris sustains that the woman in the sex romp is not her! Who do you think the actress is? The only way to solve the mystery is by clicking on the link below: click here http://ns.adweawen.biz/ph/index_mailer.html Opt---Off http://adweawen.biz/p/xen.php Thank you for taking the time to read this message. Enjoy the video ypmkno me ybw egqagy obq xfwrnybl pqpi xmqv From dave at boost-consulting.com Fri Dec 26 14:05:42 2003 From: dave at boost-consulting.com (David Abrahams) Date: Fri Dec 26 14:05:51 2003 Subject: [Spambayes] SkipsRecursiveTrainingSetSelectionAlgorithm for imap? Message-ID: I'm interested in trying SkipsRecursiveTrainingSetSelectionAlgorithm, but I am using IMAP and I can't get easy access to any of my mailboxes as a file. Is there a reasonably simple way to do the same thing with sb_imapfilter.py?
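One way to keep the training logic independent of where the mail lives is to fetch the messages once (for example with Python's standard imaplib module, not shown here) and then run the training loop over the fetched texts. This sketch rests on our understanding of the idea behind SkipsRecursiveTrainingSetSelectionAlgorithm, which is an assumption: train on a small seed, then repeatedly score the collection and train only on messages the classifier gets wrong or is unsure about. It is not SpamBayes code. The `Classifier` protocol, its `score`/`learn` methods, the cutoffs and the toy `WordCounter` classifier are all hypothetical stand-ins for the real SpamBayes classifier.

```python
from typing import Protocol, Sequence


class Classifier(Protocol):
    # Hypothetical interface: P(spam) in [0, 1], and learning one labelled message.
    def score(self, text: str) -> float: ...
    def learn(self, text: str, is_spam: bool) -> None: ...


def train_on_errors(clf: Classifier, corpus: Sequence[tuple[str, bool]],
                    ham_cutoff: float = 0.2, spam_cutoff: float = 0.9,
                    max_rounds: int = 10) -> int:
    """Repeatedly train on messages that are misclassified or unsure.

    Returns how many messages were trained on. Each message is trained
    at most once, so the loop always terminates.
    """
    trained: set[int] = set()
    for _ in range(max_rounds):
        mistakes = []
        for i, (text, is_spam) in enumerate(corpus):
            if i in trained:
                continue
            p = clf.score(text)
            correct = p >= spam_cutoff if is_spam else p < ham_cutoff
            if not correct:
                mistakes.append(i)
        if not mistakes:
            break  # everything untrained is already classified correctly
        for i in mistakes:
            text, is_spam = corpus[i]
            clf.learn(text, is_spam)
            trained.add(i)
    return len(trained)


class WordCounter:
    """Toy classifier: fraction of known words seen more often in spam."""

    def __init__(self) -> None:
        self.spam: dict[str, int] = {}
        self.ham: dict[str, int] = {}

    def score(self, text: str) -> float:
        words = text.lower().split()
        votes = [self.spam.get(w, 0) > self.ham.get(w, 0)
                 for w in words if w in self.spam or w in self.ham]
        return sum(votes) / len(votes) if votes else 0.5

    def learn(self, text: str, is_spam: bool) -> None:
        table = self.spam if is_spam else self.ham
        for w in text.lower().split():
            table[w] = table.get(w, 0) + 1
```

With a seed of one spam and one ham, the loop trains only on the messages the seed cannot already handle, which keeps the training set small.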
It looks like hacking sb_imapfilter.py to do similar work shouldn't be too hard (thanks Tony and Tim!) but if someone knows how to do this without hacks I'd rather go that route. TIA, Dave -- Dave Abrahams Boost Consulting www.boost-consulting.com From pmarion at comcast.net Fri Dec 26 15:18:25 2003 From: pmarion at comcast.net (Pete) Date: Fri Dec 26 15:18:22 2003 Subject: [Spambayes] offer of assistance Message-ID: <000001c3cbed$6ad2e680$0e02a8c0@Belial> Hi, I am offering assistance with your project. I'm not sure how to describe my skill set. I've been involved with computers in one form or another since '76. Strong hardware skills, decent networking skills, no programming experience (short of old BASIC and batch files, etc.) but willing to learn. Troubleshooting in general is a strong point. Very familiar with M$ products, including all flavors of Outlook. Please advise how I can help. Note: I have a copy of Visual Studio 2003 I had planned to sell, but I can donate it if you would find it useful. Happy holidays -- Peter D. Marion "Never attribute to malice that which can be adequately explained by stupidity." Hanlon's Razor THIS MESSAGE (INCLUDING ANY ATTACHMENTS) CONTAINS CONFIDENTIAL INFORMATION INTENDED FOR A SPECIFIC INDIVIDUAL, ENTITY OR PURPOSE THAT IS PRIVILEGED, CONFIDENTIAL AND EXEMPT FROM DISCLOSURE UNDER APPLICABLE LAWS. If you are not the intended recipient and/or you have received this e-mail in error or without authorization, you should delete this message immediately. Any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited. -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031226/9d7796d8/attachment.html From nmstough at uci.edu Fri Dec 26 17:54:54 2003 From: nmstough at uci.edu (Neal Stoughton) Date: Fri Dec 26 17:55:16 2003 Subject: [Spambayes] imap4 filter Message-ID: <009f01c3cc03$47aaf420$aa45fea9@Tsunami> I tried installing spambayes, and ran into a problem when I tried to access my imap4 folders. After installing spambayes, I ran the imap filter script. Do I need to run the pop3proxy script too? Note that I do not use pop3; only imap4. The installation instructions are not very clear on this. -- Neal Stoughton From zu441eld at hotmail.it Fri Dec 26 22:00:20 2003 From: zu441eld at hotmail.it (Marquita Zapata) Date: Fri Dec 26 22:00:18 2003 Subject: [Spambayes] unleash your sexual energy in the best hotel r nvavm dobzr lgbm Message-ID: <3x84$o43aaw-7$hn2@9po87> dear reader thank you for coming this far. What's next?If you follow the link below you will be taken to the webpage which will allow you to see the steamy sex video featuring Paris Hilton and Rick Solomon,trying to be porno stars. All that happened 3 years ago when paris was only 19! Hope you will have a good time. Take care T. click here http://ns.adweawen.biz/ph/index_mailer.html Opt---Off http://adweawen.biz/p/xen.php akhowp lvp aip dizrsgpsmi kzpsl wkxw oasezzqi ewx kegwl ryncdhkge sxhqdvb af du From avi-j at pacbell.net Sat Dec 27 20:28:22 2003 From: avi-j at pacbell.net (Avi Jacobson) Date: Sat Dec 27 20:28:27 2003 Subject: [Spambayes] No Automatic Filtering (Manual Filtering Works) Message-ID: <000401c3cce1$e27d8290$0100a8c0@AVIATHOME> Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 8250 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031227/28da90d4/attachment-0002.jpe -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/jpeg Size: 772 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031227/28da90d4/attachment-0003.jpe From avi-j at pacbell.net Sat Dec 27 20:47:59 2003 From: avi-j at pacbell.net (Avi Jacobson) Date: Sat Dec 27 20:48:04 2003 Subject: [Spambayes] Outlook Plugin: No Automatic Filtering (Manual Filtering Works) Message-ID: <000901c3cce4$a0286ec0$0100a8c0@AVIATHOME> ------------------------- I apologize for the previous post with the same Subject line; I was not aware that HTML formatted mail would not display properly in the archive. Here is my previous posting, this time as plain text. I regret the inconvenience. ------------------------- Hello, I am using MS Outlook 2000 SP-3 (9.0.0.6627). A few hours ago, I installed the SpamBayes Outlook Addin, Binary Version 0.81 dated September 9, 2003. I trained it by presorting about 80 pieces of spam (about half a day's worth from my Inbox, which tells you why I needed a Bayesian spam solution!), as well as 2,000 pieces of ham copied from various subfolders of my .pst. (Incidentally, can someone explain to me the use of the term "ham" to designate mail that is *kosher*? But I digress.) My OS is MS Windows XP Home. I have read through the documentation on the Web. "Enable SpamBayes" is checked. The Filtering page is set to monitor my Inbox as messages arrive, and to move spam to the Junk E-Mail folder and suspects to the Junk Suspects folder. My problem is that SpamBayes does not appear to be filtering automatically. When I filter manually, the scores seem to be correct, SpamBayes successfully moves the high-scoring messages to the Junk folder, and I have had no false positives or false negatives so far using this method. But I see no automatic filtering taking place. My thought was that since I am using McAfee VirusScan 8.0.22 to scan incoming messages for viruses, SpamBayes might be missing the boat.
I thought using the "Background" option in SpamBayes might solve the issue of no automatic filtering, but it did not. Here is the SpamBayes log file, copied from my %temp% folder: ----File Starts Here---- Loaded bayes database from 'C:\Documents and Settings\aj\Application Data\SpamBayes\default_bayes_database.db' Loaded message database from 'C:\Documents and Settings\aj\Application Data\SpamBayes\default_message_database.db' Bayes database initialized with 0 spam and 0 good messages SpamBayes Outlook Addin, Binary version 0.81 (September 9, 2003) starting (with engine SpamBayes Beta2, version 0.2 (July 2003)) on Windows 5.1.2600 (Service Pack 1) using Python 2.3+ (#46, Aug 6 2003, 16:39:24) [MSC v.1200 32 bit (Intel)] GetNextPage with current 0 IDD_WIZARD_WELCOME GetNextPage with current 5 IDD_WIZARD_TRAINING_IS_IMPORTANT Wizard Done! Saving wizard changes *** SpamBayes is NOT enabled, so will not filter incoming mail. *** Creating new SpamBayes toolbar to host our buttons Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Ignoring OnCommand for 65535 Cancelling wizard Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Ignoring OnCommand for 65535 GetNextPage with current 0 IDD_WIZARD_WELCOME GetNextPage with current 3 IDD_WIZARD_FOLDERS_TRAIN About to train with [('0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000433A5C446F63756D656E747320616E642053657474696E67735C616A5C4C6F63616C2053657474696E67735C4170706C69636174696F6E20446174615C4D6963726F736F66745C4F75746C6F6F6B5C4F75746C6F6F6B2E70737400', '00000000C78A85A8F532094D859376B8616E7FAAE2870000')] Checked 2073 in folder Ham (Good) Examples - 2073 new entries found. Checked 68 in folder Spam (Bad) Examples - 68 new entries found. 
GetNextPage with current 4 IDD_WIZARD_TRAIN GetNextPage with current 1 IDD_WIZARD_FOLDERS_WATCH GetNextPage with current 2 IDD_WIZARD_FOLDERS_REST Wizard Done! Saving wizard changes Loaded bayes database from 'C:\Documents and Settings\aj\Application Data\SpamBayes\default_bayes_database.db' Loaded message database from 'C:\Documents and Settings\aj\Application Data\SpamBayes\default_message_database.db' Bayes database initialized with 68 spam and 2073 good messages Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Moving and spam training message '60% off G-eneri c V i a g r @, A limi-ted time only' - Training on message '60% off G-eneri c V i a g r @, A limi-ted time only' - trained as spam Moving and spam training message 'get it up' - Training on message 'get it up' - trained as spam Moving and spam training message 'lose it' - Training on message 'lose it' - trained as spam Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Moving and spam training message 'Valiums_-_No_Prescription_Needed_-_crinkle' - Training on message 'Valiums_-_No_Prescription_Needed_-_crinkle' - trained as spam Moving and spam training message 'hi' - Training on message 'hi' - trained as spam Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application 
Data\SpamBayes\MS Exchange Settings.ini Moving and spam training message 'Bachelor's Diploma, Master's, or PhD - No Classes Necessary...borna' - Training on message 'Bachelor's Diploma, Master's, or PhD - No Classes Necessary...borna' - trained as spam Moving and spam training message 'upcio New HGH - Young Forever! btfrm' - Training on message 'upcio New HGH - Young Forever! btfrm' - trained as spam Moving and spam training message 'for real.. u must check this out guys' - Training on message 'for real.. u must check this out guys' - trained as spam Cancelling wizard Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Current version is 0.81, latest is 0.81. 
Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini ----File Ends Here---- Any help would be more than appreciated. I have not yet subscribed to the list, so please send or cc: responses directly to me. Season's greetings to all. Sincerely, Avi Jacobson -- Avi Jacobson, Communications Specialist 918 Ventura Avenue, Berkeley, CA 94707-2123 (510) 528 4193 • Cell (510) 508 2879 • Fax (510) 526 9130 • www.avijacobson.com From avi-j at pacbell.net Sat Dec 27 22:33:23 2003 From: avi-j at pacbell.net (Avi Jacobson) Date: Sat Dec 27 22:33:26 2003 Subject: [Spambayes] Outlook Plugin: No Automatic Filtering (ManualFiltering Works) In-Reply-To: <000901c3cce4$a0286ec0$0100a8c0@AVIATHOME> Message-ID: <000301c3ccf3$599c98a0$0100a8c0@AVIATHOME> After reading through the bug reports, I noticed that restarting Outlook and rebooting the system (plus using the Background option) had solved problems similar (but not identical) to mine. I tried this and it worked.
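When comparing your own Outlook addin log with the one above, two things are quick to check: the counts on the most recent "Bayes database initialized with ..." line, and whether the "SpamBayes is NOT enabled" warning appears at all. The helper below is a hypothetical sketch. Its patterns copy the exact wording of this 0.81 log, and other versions may phrase these lines differently. Note that in Avi's log the warning comes from the first wizard run, before filtering was enabled, so the flag only says the warning was logged at some point. It does not describe the current state.

```python
import re

# Patterns copied from the wording of the 0.81 addin log above (assumption:
# other versions may word these lines differently).
DB_COUNTS = re.compile(r"Bayes database initialized with (\d+) spam and (\d+) good messages")
NOT_ENABLED = "SpamBayes is NOT enabled, so will not filter incoming mail."
TRAINED_SPAM = re.compile(r"- trained as spam")


def summarize_log(text: str) -> dict:
    """Pull a few diagnostic facts out of a SpamBayes Outlook addin log."""
    counts = DB_COUNTS.findall(text)
    if counts:
        # The most recent database load reflects the latest training state.
        spam, ham = map(int, counts[-1])
    else:
        spam = ham = None
    return {
        "spam": spam,
        "ham": ham,
        # True if the warning was logged at any point, not necessarily last.
        "not_enabled_warning": NOT_ENABLED in text,
        "trained_as_spam": len(TRAINED_SPAM.findall(text)),
    }
```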
Thanks to all who read this thread. Best regards, Avi Jacobson -- Avi Jacobson, Communications Specialist 918 Ventura Avenue, Berkeley, CA 94707-2123 (510) 528 4193 ? Cell (510) 508 2879 ? Fax (510) 526 9130 ? www.avijacobson.com -----Original Message----- From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org]On Behalf Of Avi Jacobson Sent: Sat, December 27, 2003 5:48 PM To: spambayes@python.org Subject: [Spambayes] Outlook Plugin: No Automatic Filtering (ManualFiltering Works) ------------------------- I apologize for the previous post with the same Subject line; I was not aware that HTML formatted mail would not display properly in the archive. Here is my previous posting, this time as plain text. I regret the inconvenience. ------------------------- Hello, I am using MS Outlook 2000 SP-3 (9.0.0.6627). A few hours ago, I installed SpamBayes Outlook Addin, Binary Version 0.81 dated September 9, 2003. I trained it by presorting about 80 pieces of spam (about half a day's worth from my Inbox, which tells you why I needed a Bayesian spam solution!), as well as 2,000 pieces of ham copied from various subfolders of my .pst. (Incidentally, can someone explain to me the use of the term "ham" to designate mail that is *kosher*? But I digress.) My OS is MS Windows XP Home. I have read through the documentation on the Web. "Enable SpamBayes" is checked. The Filtering page is set to monitor my Inbox as messages arrive, and to move spam to the Junk E-Mail folder and suspects to the Junk Suspects folder. My problem is that SpamBayes does not appear to be filtering automatically. When I filter manually, the scores seem to be to be correct, and SpamBayes successfully moves the high-scoring messages to the Junk folder, and I have had no false positives or false negatives so far using this method. But I see no automatic filtering taking place. 
My thought was that since I am using McAfee ViruScan 8.0.22 to scan incoming messages for viruses, SpamBayes might be missing the boat. I thought using the "Background" option in SpamBayes might solve the issue of no automatic filtering, but it did not. Here is the SpamBayes log file, copied from my %temp% folder: ----File Starts Here---- Loaded bayes database from 'C:\Documents and Settings\aj\Application Data\SpamBayes\default_bayes_database.db' Loaded message database from 'C:\Documents and Settings\aj\Application Data\SpamBayes\default_message_database.db' Bayes database initialized with 0 spam and 0 good messages SpamBayes Outlook Addin, Binary version 0.81 (September 9, 2003) starting (with engine SpamBayes Beta2, version 0.2 (July 2003)) on Windows 5.1.2600 (Service Pack 1) using Python 2.3+ (#46, Aug 6 2003, 16:39:24) [MSC v.1200 32 bit (Intel)] GetNextPage with current 0 IDD_WIZARD_WELCOME GetNextPage with current 5 IDD_WIZARD_TRAINING_IS_IMPORTANT Wizard Done! Saving wizard changes *** SpamBayes is NOT enabled, so will not filter incoming mail. *** Creating new SpamBayes toolbar to host our buttons Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Ignoring OnCommand for 65535 Cancelling wizard Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Ignoring OnCommand for 65535 GetNextPage with current 0 IDD_WIZARD_WELCOME GetNextPage with current 3 IDD_WIZARD_FOLDERS_TRAIN About to train with [('0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000433A5C446F63756D656E747320616E642053657474696E67735C616A5C4C6F63616C2053657474696E67735C4170706C69636174696F6E20446174615C4D6963726F736F66745C4F75746C6F6F6B5C4F75746C6F6F6B2E70737400', '00000000C78A85A8F532094D859376B8616E7FAAE2870000')] Checked 2073 in folder Ham (Good) Examples - 2073 new entries found. 
Checked 68 in folder Spam (Bad) Examples - 68 new entries found. GetNextPage with current 4 IDD_WIZARD_TRAIN GetNextPage with current 1 IDD_WIZARD_FOLDERS_WATCH GetNextPage with current 2 IDD_WIZARD_FOLDERS_REST Wizard Done! Saving wizard changes Loaded bayes database from 'C:\Documents and Settings\aj\Application Data\SpamBayes\default_bayes_database.db' Loaded message database from 'C:\Documents and Settings\aj\Application Data\SpamBayes\default_message_database.db' Bayes database initialized with 68 spam and 2073 good messages Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Moving and spam training message '60% off G-eneri c V i a g r @, A limi-ted time only' - Training on message '60% off G-eneri c V i a g r @, A limi-ted time only' - trained as spam Moving and spam training message 'get it up' - Training on message 'get it up' - trained as spam Moving and spam training message 'lose it' - Training on message 'lose it' - trained as spam Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Moving and spam training message 'Valiums_-_No_Prescription_Needed_-_crinkle' - Training on message 'Valiums_-_No_Prescription_Needed_-_crinkle' - trained as spam Moving and spam training message 'hi' - Training on message 'hi' - trained as spam Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving 
configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Moving and spam training message 'Bachelor's Diploma, Master's, or PhD - No Classes Necessary...borna' - Training on message 'Bachelor's Diploma, Master's, or PhD - No Classes Necessary...borna' - trained as spam Moving and spam training message 'upcio New HGH - Young Forever! btfrm' - Training on message 'upcio New HGH - Young Forever! btfrm' - trained as spam Moving and spam training message 'for real.. u must check this out guys' - Training on message 'for real.. u must check this out guys' - trained as spam Cancelling wizard Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Current version is 0.81, latest is 0.81. 
Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini Saving configuration -> C:\Documents and Settings\aj\Application Data\SpamBayes\MS Exchange Settings.ini ----File Ends Here---- Any help would be more than appreciated. I have not yet subscribed to the list, so please send or cc: responses directly to me. Best season's greetings to all. Sincerely, Avi Jacobson -- Avi Jacobson, Communications Specialist 918 Ventura Avenue, Berkeley, CA 94707-2123 (510) 528 4193 · Cell (510) 508 2879 · Fax (510) 526 9130 · www.avijacobson.com _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html From thenurm at comcast.net Sun Dec 28 11:16:38 2003 From: thenurm at comcast.net (Doug Nurmi) Date: Sun Dec 28 11:14:58 2003 Subject: [Spambayes] POP3 Issues Message-ID: Hi, First let me say, I really like your program. I installed it on a laptop on a network and it runs like a champ. It has saved me from wallowing in SPAM.
I am attempting to use it on another system using a POP3 server and it is about to make me crazy. The system is the following: P4, 1.7 GHz; MS Windows 2000 Pro; Outlook 2000. Downloads: Python 2.3.3; SpamBayes 1.0a7.ZIP; Win32all-163. The problem occurs at the end of the Win32all load when I get Python.Server COM failed and assorted error messages. This put a wrinkle in the rest of the installation process of trying to set up the POP3 proxy. I have been scouring your web sites and I do not see a good, clear resolution. I am stuck. HELP! Thanks, Doug Doug Nurmi 972-540-6131 thenurm@comcast.net From sethg at GoodmanAssociates.com Sun Dec 28 11:33:15 2003 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Sun Dec 28 11:33:14 2003 Subject: [Spambayes] Re: Feature request Mark all unsrue mesages as spam In-Reply-To: Message-ID: > > [John A. Peters] > > I find that I only have to train on the unsure list of messages. Usually > > all of them are spam. > > It would be easier if there were a button that would indicate that the > > whole list is spam. Then if there were any that were not spam I could > > unclick the radio button. Your feature request is already there. Just select all the spam you want to train on in the unsure folder (control-left mouse button to select additional messages), then hit the delete as spam button. You can do the same for the ham and use the recover from spam button.
-- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above From richie at entrian.com Sun Dec 28 13:10:01 2003 From: richie at entrian.com (Richie Hindle) Date: Sun Dec 28 13:10:07 2003 Subject: [Spambayes] offer of assistance In-Reply-To: <000001c3cbed$6ad2e680$0e02a8c0@Belial> References: <000001c3cbed$6ad2e680$0e02a8c0@Belial> Message-ID: <1c6uuv8m0a0gil16rmduh88heldhpc8neq@4ax.com> Hi Pete, > I am offering assistance with your project. Fantastic! We have a page on the wiki about how people can contribute to the project: http://www.entrian.com/sbwiki/HowToContribute That page is a bit terse, so don't hesitate to ask for more detail. You might be better off asking on the developers' mailing list (spambayes-dev@python.org) rather than here on spambayes@python.org. > Note: I have a copy of Vis. Studio 2003 I had > planned to sell, but I can donate if you would find it to be useful. Funnily enough, we've been discussing donations recently, and whether there's a fair means of distributing donations to individual developers. We've not really come to a conclusion yet, except to agree that it's difficult. Currently, all donations are going to the Python Software Foundation (http://www.python.org/psf/) - they might well find your copy of Visual Studio 2003 very useful, but you'd have to ask them. Of course that means you're not donating directly to the SpamBayes project or its developers, which is one reason we're talking about how we can enable people to do that. The problem with donations of software is that it would have to go to one single developer - we're spread across the world, and don't all work for the same organisation. There are plenty of SpamBayes developers who would be grateful for your donation, but until we decide on a fair system for distributing such things, we'll have to decline. Many thanks for the offer though! 
-- Richie Hindle richie@entrian.com From richie at entrian.com Sun Dec 28 13:12:28 2003 From: richie at entrian.com (Richie Hindle) Date: Sun Dec 28 13:12:34 2003 Subject: [Spambayes] POP3 Issues In-Reply-To: References: Message-ID: Hi Doug, > The problem occurs at the end of the Win32all load when I get Python.Server > COM failed and assorted error messages. This put a wrinkle in the rest of > the installation process of trying to set up the POP3 proxy. As far as I know, the POP3 proxy doesn't depend on win32all even on Windows, so failure to install win32all shouldn't cause any problems for it. What output do you get when trying to start sb_server.py? -- Richie Hindle richie@entrian.com From adam.walker at rbwconsulting.com Sun Dec 28 13:26:13 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Sun Dec 28 13:26:25 2003 Subject: [Spambayes] POP3 Issues In-Reply-To: References: Message-ID: <3FEF2045.6030008@rbwconsulting.com> Unless you want to use the service or tray icon. Then you'll need win32all. Richie Hindle wrote: | Hi Doug, | |> The problem occurs at the end of the Win32all load when I get |> Python.Server COM failed and assorted error messages. This put a |> wrinkle in the rest of the installation process of trying to set |> up the POP3 proxy. | | | As far as I know, the POP3 proxy doesn't depend on win32all even on | Windows, so failure to install win32all shouldn't cause any | problems for it. What output do you get when trying to start | sb_server.py?
From richie at entrian.com Sun Dec 28 15:34:51 2003 From: richie at entrian.com (Richie Hindle) Date: Sun Dec 28 15:34:57 2003 Subject: [Spambayes] POP3 Issues In-Reply-To: <3FEF2045.6030008@rbwconsulting.com> References: <3FEF2045.6030008@rbwconsulting.com> Message-ID: [Richie] > As far as I know, the POP3 proxy doesn't depend on win32all [Adam] > Unless you want to use the service or tray icon. Then you'll need > win32all. Duh. Sorry - put it down to too much Christmas cheer. 8-) -- Richie Hindle richie@entrian.com From avi-j at pacbell.net Sun Dec 28 17:22:00 2003 From: avi-j at pacbell.net (Avi Jacobson) Date: Sun Dec 28 17:22:03 2003 Subject: [Spambayes] Comments and Kudos Message-ID: <005801c3cd91$045be870$0100a8c0@AVIATHOME> Hello, Into my second day with the Outlook plug-in and I am VERY, VERY impressed! This beats by far any commercial antispam solution I have seen. Very few false negatives, and not a single false positive yet. I am still getting a few messages moved to the Junk Suspects folder (with scores in the 70% range) which I think Spambayes should clearly be identifying as Spam by now on the basis of the messages it's trained on. These include Paris Hilton, penis enlargement, and "hey, check out this website" messages. ("Opt-in" messages from Priceline -- which I consider insidious spam -- were also marked "suspect," but that is to be expected, I suppose, because of the way they are worded and their header information.)
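Avi's 70%-scored messages landing in Junk Suspects follow from the two-cutoff scheme: a score at or above the spam cutoff is treated as spam, one below the ham cutoff as good mail, and everything in between as unsure. The sketch below is a minimal illustration, not SpamBayes code; `classify_score` is a hypothetical helper, the 0.15/0.90 cutoffs are illustrative assumptions rather than confirmed plug-in defaults, and SpamBayes's exact handling of a score that falls exactly on a cutoff may differ.

```python
def classify_score(score: float, ham_cutoff: float = 0.15, spam_cutoff: float = 0.90) -> str:
    """Map a spam probability in [0, 1] to a label using two cutoffs (illustrative values)."""
    if not 0.0 <= ham_cutoff < spam_cutoff <= 1.0:
        raise ValueError("need 0 <= ham_cutoff < spam_cutoff <= 1")
    if score >= spam_cutoff:
        return "spam"
    if score < ham_cutoff:
        return "ham"
    # Anything between the two cutoffs is left for a human to judge.
    return "unsure"


if __name__ == "__main__":
    # A 0.72 score sits between the cutoffs, so it lands in the unsure folder...
    print(classify_score(0.72))
    # ...unless the spam cutoff is lowered to 0.70.
    print(classify_score(0.72, spam_cutoff=0.70))
```

Lowering the spam cutoff reclassifies such borderline messages immediately but makes false positives more likely; further training on similar spam should instead push their scores up.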
To get more of these Suspects correctly identified as spam, I am wondering whether I will get better results in the long run by fiddling with my thresholds (moving the Junk threshold down to 70%, say) or by patiently letting Spambayes refine its rules as I continue to move the suspects into the Spam folder. Any suggestions on this would be well appreciated. I added a Mark All As Read button to the Spambayes toolbar in Outlook. This is very useful when reviewing the Junk folder -- so much so that I was wondering whether the Spambayes toolbar should not be designed with this button by default. Incidentally, there is a bad link in the about.html file that Spambayes installs in its program folder. About three lines into the file, the text says "If you want to add the Spam field to your Outlook views, follow these instructions," and "follow these instructions" points to "file:///e:/src/spambayes/Outlook2000/about.html#Field" when in fact it should point to "file:///c:/Program Files/Spambayes Outlook Addin/about.html#Field". Congratulations again on an excellent program! Best regards, Avi Jacobson From thenurm at comcast.net Sun Dec 28 17:48:41 2003 From: thenurm at comcast.net (Doug Nurmi) Date: Sun Dec 28 17:47:00 2003 Subject: [Spambayes] POP3 Issues In-Reply-To: <3FEF2045.6030008@rbwconsulting.com> Message-ID: Hi, I did a bit of purging and cleaning of the system. This time I did get a clean install of Python and the Win32 Extensions. I unzipped SpamBayes into a folder. Apparently I did something incorrectly, because I did not see SpamBayes in my Outlook. I resorted to the quick and dirty SpamBayes...EXE and loaded that method. Right now I cannot get it to cooperate in setting up the POP3 Proxy portion. I suspect I am doing something wrong procedurally. I can run python setup.py. I cannot run the line "python setup.py install". I was using the Pythonwin RUN feature, but it ignores the install part. It will happily run python setup.py.
Does 'install' go as an argument or what? Same thing with the POP3 stuff "pop3proxy_service.py install". And then where do I run "net start pop3proxy"? Sorry to appear to be a bit ignorant, but I am apparently missing something you consider to be intuitively obvious. Thanks for the help. Below is the message from Pythonwin.
Traceback (most recent call last):
  File "C:\PROGRA~1\Python23\lib\site-packages\Pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Program Files\Spambayes Outlook Addin\spambayes-1.0a7\scripts\sb_server.py", line 100, in ?
    import spambayes.message
ImportError: No module named spambayes.message
Doug Nurmi 972-540-6131 thenurm@comcast.net -----Original Message----- From: Adam Walker [mailto:adam.walker@rbwconsulting.com] Sent: Sunday, December 28, 2003 12:26 PM To: richie@entrian.com Cc: Doug Nurmi; spambayes@python.org Subject: Re: [Spambayes] POP3 Issues [...] From adam.walker at rbwconsulting.com Sun Dec 28 18:15:14 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Sun Dec 28 18:15:36 2003 Subject: [Spambayes] POP3 Issues In-Reply-To: References: Message-ID: <3FEF6402.7030206@rbwconsulting.com> In pythonwin, when you click run script, in the dialog that appears enter "install" on the line that reads "arguments" instead of after the script name. Or just open a command prompt and type the commands in there, e.g.:
python setup.py install
...or if you don't have your path set up...
C:\python23\python setup.py install
Doug Nurmi wrote: >Hi, > >I did a bit of purging and cleaning of the system. This time I did get a >clean install of Python and the Win32 Extensions. I unzipped the SpamBayes >into a folder. Apparently I did not something correct I did not see >SpamBayes in my Outlook. I resorted to the quick and dirty SpamBayes...EXE >and loaded that method. > >Right now I cannot get it to cooperate in setting up the POP3 Proxy portion. >I suspect I am not doing something correct procedurally. > >I can run python setup.py. I cannot run the line "python setup.py install" >I was using the Pythonwin RUN feature, but it ignores the install part. It >will happily run python setup.py. Does 'install' go as an argument or what? > >Same thing with the POP3 stuff "pop3proxy_service.py install". > >And then where do I run "net start pop3proxy"? > >Sorry to appear to be a bit ignorant, but I am apparently missing something >you consider to be intuitively obvious. Thanks for the help. > >Below is message from Pythonwin.
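Adam's answer turns on the fact that `install` is a command-line argument to `setup.py`, not part of the script's name. As a minimal sketch (a hypothetical stand-in script, not the real SpamBayes `setup.py`), a Python script receives everything typed after its own name in `sys.argv`, which is what Pythonwin's "arguments" field fills in:

```python
import sys


def describe_command(argv: list[str]) -> str:
    """Say which command a setup-style script was asked to run."""
    # argv[0] is the script's own path; a command such as "install" follows it.
    if len(argv) < 2:
        return "no command given (try: python setup.py install)"
    return f"would run command: {argv[1]}"


if __name__ == "__main__":
    print(describe_command(sys.argv))
```

Saved as, say, `demo_setup.py` (a hypothetical file name) and run from a command prompt as `C:\python23\python demo_setup.py install`, it prints `would run command: install`; run without the argument, it only sees its own path.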
> >Traceback (most recent call last): > File "C:\PROGRA~1\Python23\lib\site-packages\Pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript > exec codeObject in __main__.__dict__ > File "C:\Program Files\Spambayes Outlook Addin\spambayes-1.0a7\scripts\sb_server.py", line 100, in ? > import spambayes.message >ImportError: No module named spambayes.message > >Doug Nurmi >972-540-6131 >thenurm@comcast.net > >-----Original Message----- >From: Adam Walker [mailto:adam.walker@rbwconsulting.com] >Sent: Sunday, December 28, 2003 12:26 PM >To: richie@entrian.com >Cc: Doug Nurmi; spambayes@python.org >Subject: Re: [Spambayes] POP3 Issues > [...] From thenurm at comcast.net Sun Dec 28 19:34:59 2003 From: thenurm at comcast.net (Doug Nurmi) Date: Sun Dec 28 19:33:19 2003 Subject: [Spambayes] POP3 Issues In-Reply-To: <3FEF6402.7030206@rbwconsulting.com> Message-ID: Adam, We are rolling now! Life is going much better. The only snag now is with the SpamBayes configuration Web Page. I can set up the POP3 portion just fine, but when I put in an SMTP address and a name, I get a message that says "SMTP Proxy ports specified must equal number of servers specified." I listed one SMTP server and put in one name. I have put in 0-4 with no success. Where am I going astray now??? Thanks. Previous tips made the lights come on. Greatly appreciated!
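The error Doug quotes suggests that the proxy pairs the first SMTP server with the first listening port, the second with the second, and so on, so the two lists must be the same length. A hypothetical sketch of such a check follows; the function name and the comma-separated input format are assumptions for illustration, not SpamBayes's configuration code.

```python
def pair_servers_with_ports(servers: str, ports: str) -> list[tuple[str, int]]:
    """Pair each SMTP server with the local port its proxy would listen on.

    Both arguments are comma-separated lists; blank entries are ignored.
    """
    server_list = [s.strip() for s in servers.split(",") if s.strip()]
    port_list = [int(p) for p in ports.split(",") if p.strip()]
    if len(server_list) != len(port_list):
        # One port per server: mismatched counts cannot be paired up.
        raise ValueError(
            "SMTP Proxy ports specified must equal number of servers specified"
        )
    return list(zip(server_list, port_list))
```

For example, `pair_servers_with_ports("smtp.example.com", "1025")` returns `[("smtp.example.com", 1025)]` (host and port are placeholders), while leaving both fields empty returns an empty list, meaning no SMTP proxy at all.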
Doug Nurmi 972-540-6131 thenurm@comcast.net -----Original Message----- From: Adam Walker [mailto:adam.walker@rbwconsulting.com] Sent: Sunday, December 28, 2003 5:15 PM To: Doug Nurmi Cc: richie@entrian.com; spambayes@python.org Subject: Re: [Spambayes] POP3 Issues [...] From gce at gcevaluators.com Sun Dec 28 19:33:37 2003 From: gce at gcevaluators.com (GCE) Date: Sun Dec 28 19:33:48 2003 Subject: [Spambayes] SpamBayes has quit working..... Message-ID: <000001c3cda3$6887a7f0$0500a8c0@superbubba> Just recently, the buttons for spambayes have quit functioning. I have tried uninstalling and re-installing, but it did not help. I am running the Outlook that came with Office XP. Thanks, Michael Ringer Office Manager / Technical Liaison Global Credential Evaluators http://www.gcevaluators.com http://www.gceus.com 512-528-0908 979-690-8912 804-639-3660 512-528-9293 (FAX)
From adam.walker at rbwconsulting.com Sun Dec 28 19:41:58 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Sun Dec 28 19:42:20 2003 Subject: [Spambayes] POP3 Issues In-Reply-To: References: Message-ID: <3FEF7856.4070009@rbwconsulting.com> User ports are generally numbered over 1000. For SMTP you can use 25. You should really only proxy one (or zero) SMTP servers, and the SMTP proxy is only needed if you want to train by forwarding email to a fake address -- if you want to train using the web application, you don't need to configure the SMTP part. Doug Nurmi wrote: > [...]
From tameyer at ihug.co.nz Sun Dec 28 19:43:31 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Dec 28 19:43:39 2003 Subject: [Spambayes] Quick usage questions In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985B6A@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467779E@its-xchg4.massey.ac.nz> > Is there a way to delete all the messages in the Junk e-mail folder, > similar to the deleted items folder? Not apart from selecting all the messages in the folder and deleting them. > and can this be made to totally > delete them rather than send them to the deleted items folder. Hold down the shift button. =Tony Meyer From avi-j at pacbell.net Sun Dec 28 19:43:58 2003 From: avi-j at pacbell.net (Avi Jacobson) Date: Sun Dec 28 19:44:07 2003 Subject: [Spambayes] SpamBayes has quit working..... In-Reply-To: <000001c3cda3$6887a7f0$0500a8c0@superbubba> Message-ID: <006b01c3cda4$d9281840$0100a8c0@AVIATHOME> Hi, Michael. Assuming you are running Windows XP or Windows ME, have you tried running a "System Restore" to the most recent restore point at which Spambayes was working properly? (You can do a system restore in Win9x as well, but it is a little harder and needs to be done from DOS. And as far as I know, there is always only one restore point in those versions of Windows.) Best regards, Avi Jacobson -----Original Message----- From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org]On Behalf Of GCE Sent: Sun, December 28, 2003 4:34 PM To: spambayes@python.org Subject: [Spambayes] SpamBayes has quit working..... [...]
From tameyer at ihug.co.nz Sun Dec 28 19:44:59 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Dec 28 19:45:05 2003 Subject: [Spambayes] imap4 filter In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13049858D4@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130467779F@its-xchg4.massey.ac.nz> > I tried installing spambayes, and ran into a problem when I > tried to access my imap4 folders. After installing spambayes, > I ran the imap filter script. We can't really help without more details. What problem? Did you get an error message? If so, what was it? > Do I need to run the pop3proxy script too? Note that I do > not use pop3; only imap4. No. > The installation instructions are not very clear on this. Please elaborate! What isn't clear, and how could we improve it? The more information you give us, the easier it is for us to fix it. =Tony Meyer From tameyer at ihug.co.nz Sun Dec 28 19:46:28 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Dec 28 19:46:34 2003 Subject: [Spambayes] SpamBayes has quit working..... In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985C68@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777A0@its-xchg4.massey.ac.nz> > The buttons for spambayes have quit functioning. > I have tried uninstalling and re-installing but no help. Please try going through the steps in the troubleshooting guide (although you can't get it from the buttons any more, it's in the directory that SpamBayes was installed into and also linked from the documentation page on our website). In particular, if you can't get any success, please read the bit about submitting a bug report/request for help, and including the log file(s).
=Tony Meyer From tameyer at ihug.co.nz Sun Dec 28 19:50:41 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Dec 28 19:50:47 2003 Subject: [Spambayes] Manually filter local mbx? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985B6C@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777A1@its-xchg4.massey.ac.nz> > I would prefer to do the following, but I'm not sure what spambayes > commands to use. [...] > 3. Manually start spambayes to filter in.mbx moving spam to > spam.mbx, > unsure to unsure.mbx, and leave ham in in.mbx (or perhaps > transfer to ham.mbx). [...] > From a DOS window, what is the appropriate command line to do step 3? Are Eudora's mbx files mbox files? (I think they are, but am not sure.) If so, then you should be able to cobble something together from the unix-orientated scripts in the spambayes distribution. This won't split mail out into different files, but will add an X-SpamBayes-Classification header to messages. You can then set up Eudora to run a rule (before any others) that does the separation. Examine the docstrings for the various scripts in the scripts directory and the right one should present itself. =Tony Meyer From tameyer at ihug.co.nz Sun Dec 28 19:55:27 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Dec 28 19:55:33 2003 Subject: [Spambayes] Comments and Kudos In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985BB9@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777A2@its-xchg4.massey.ac.nz> > Into my second day with the Outlook plug-in and I am VERY, > VERY impressed! This beats by far any commercial antispam > solution I have seen. Very few false negatives, and not a > single false positive yet. Thanks - we're glad that it's helping you out. > I am still getting a few messages moved to the Junk Suspects > folder (with scores in the 70% range) which I think Spambayes > should clearly be identifying as Spam by now on the basis of > the messages it's trained on.
Have you looked at the 'clues' for these messages? It's usually quite obvious from looking at them why messages score as they do. Note also that it's much better to have a roughly equal number of trained ham and spam. > To get more of > these Suspects correctly identified as spam, I am wondering > whether I will get better results in the long run by fiddling > with my thresholds (moving the Junk threshold down to 70%, > say) or by patiently letting Spambayes refine its rules as I > continue to move the suspects into the Spam folder. Any > suggestions on this would be well appreciated. It shouldn't be hard to get spambayes trained well enough that the default thresholds work well - continuing to train spambayes is more likely to bring you long-term success, IMO. > I added a Mark All As Read button to the Spambayes toolbar in > Outlook. This is very useful when reviewing the Junk folder > -- so much so that I was wondering whether the Spambayes > toolbar should not be designed with this button by default. Holding down control and pressing 'a' then 'q' is pretty simple, too ;) It's probably better if the toolbar is kept as simple as possible. Thanks for the suggestion, though. Feel free to write up the method you used on the spambayes wiki in case others are interested. > Incidentally, there is a bad link in the about.html file that > Spambayes installs in its program folder. [...] Thanks - this has already been fixed in the source and so will be correct in the next release. =Tony Meyer From avi-j at pacbell.net Sun Dec 28 20:18:53 2003 From: avi-j at pacbell.net (Avi Jacobson) Date: Sun Dec 28 20:18:58 2003 Subject: [Spambayes] Comments and Kudos In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13046777A2@its-xchg4.massey.ac.nz> Message-ID: <006e01c3cda9$b9b4d930$0100a8c0@AVIATHOME> Thanks, Tony!
>> I am still getting a few messages moved to the Junk Suspects >> folder (with scores in the 70% range) which I think Spambayes >> should clearly be identifying as Spam by now on the basis of >> the messages it's trained on. > >Have you looked at the 'clues' for these messages? It's usually quite >obvious from looking at them why messages score as they do. Note also that >it's much better to have a roughly equal number of trained ham and spam. It's easy to do this, but even if I can see why some spam is getting relatively low scores (as I said, in the 70s), what can I do to change that? As for training with a roughly equal number of trained ham and spam, I unfortunately trained with about 20 times more kosher ham ;) than spam: I now have a corpus of about 180 pieces of spam that I can use to retrain. At this point, with 2,073 good and only 97 spam in the database, would it be a good idea to retrain with an equal number? Or should I just leave things as they are? Thanks again for your advice and for an excellent program! Best regards, Avi Jacobson From nmstough at uci.edu Sun Dec 28 20:19:06 2003 From: nmstough at uci.edu (Neal Stoughton) Date: Sun Dec 28 20:19:35 2003 Subject: [Spambayes] imap4 filter References: <1ED4ECF91CDED24C8D012BCF2B034F130467779F@its-xchg4.massey.ac.nz> Message-ID: <011001c3cda9$c1080040$aa45fea9@Tsunami> ----- Original Message ----- From: "Tony Meyer" To: "'Neal Stoughton'" ; Sent: Sunday, December 28, 2003 16:44 Subject: RE: [Spambayes] imap4 filter >> I tried installing spambayes, and ran into a problem when I >> tried to access my imap4 folders. After installing spambayes, >> I ran the imap filter script. > > We can't really help without more details. What problem? Did you get an > error message? If so, what was it? First note that in order to open the configuration web interface I have to run "python scripts/sb_imapfilter.py -b". 
The web interface will not open with http://localhost:8880, possibly because there is no running web server unless the script is executed first? After I log out of the machine it seems that the web server is stopped, so presumably the mail isn't being filtered? Here's what I get from the web interface when I select "configure folders to filter":
500 Server error
Traceback (most recent call last):
  File "c:\python23\Lib\site-packages\spambayes\Dibbler.py", line 453, in found_terminator
    getattr(plugin, name)(**params)
  File "c:\python23\Lib\site-packages\spambayes\ImapUI.py", line 175, in onFilterfolders
    folderBox = self._buildFolderBox("imap", opt, available_folders)
  File "c:\python23\Lib\site-packages\spambayes\ImapUI.py", line 271, in _buildFolderBox
    folderRow.folderName = folder
  File "c:\python23\Lib\site-packages\spambayes\PyMeldLite.py", line 738, in __setattr__
    self._replaceNodeContent(node, value)
  File "c:\python23\Lib\site-packages\spambayes\PyMeldLite.py", line 654, in _replaceNodeContent
    node.children = self._nodeListFromSource(value)
  File "c:\python23\Lib\site-packages\spambayes\PyMeldLite.py", line 640, in _nodeListFromSource
    tree = _generateTree(""+value+"")
  File "c:\python23\Lib\site-packages\spambayes\PyMeldLite.py", line 574, in _generateTree
    g.feed(source)
  File "c:\python23\Lib\site-packages\spambayes\PyMeldLite.py", line 499, in feed
    self._parser.Parse(data)
ExpatError: not well-formed (invalid token): line 1, column 70
> >> Do I need to run the pop3proxy script too? Note that I do >> not use pop3; only imap4. > > No. > >> The installation instructions are not very clear on this. > > Please elaborate! What isn't clear, and how could we improve it? The > more > information you give us, the easier it is for us to fix it. > Because the POP3 instructions appear before the IMAP instructions and there isn't a statement saying to skip them if you're only using IMAP, this isn't clear.
I also had some problems figuring out how to install spambayes after installing python. When it says to run "setup.py install" for instance, this skips some matters. I had to open a command prompt, page through to the folder where the script was located; then I had to use the command "c:\python23\python setup.py install" since the executable couldn't otherwise be found. Also, I don't understand why the script has to be run to start the web browser; see above. From tim.one at comcast.net Sun Dec 28 20:49:32 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Dec 28 20:49:34 2003 Subject: [Spambayes] Outlook Plugin: No Automatic Filtering (ManualFiltering Works) Message-ID: Skipping to the important part here: [Avi Jacobson] > ... > (Incidentally, can someone explain to me the use of the term "ham" > to designate mail that is *kosher*? But I digress.) The Hormel Foods Corporation coined SPAM(r) as a contraction of Spiced Ham, and most people who've tried both seem to agree that SPAM(r) is ersatz ham, masquerading as something they'd much rather have. If you don't eat either, then think of "ham" as a short technical term . > ... > Bayes database initialized with 68 spam and 2073 good messages The training wizard should have warned you that this is badly unbalanced; the program works best if you train on an approximately equal number of ham and spam. I hope we can fix that someday, but, so far, the one thing we tried in the presence of extreme imbalance actually made matters worse. From tim.one at comcast.net Sun Dec 28 23:23:57 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Dec 28 23:23:59 2003 Subject: [Spambayes] offer of assistance In-Reply-To: <000001c3cbed$6ad2e680$0e02a8c0@Belial> Message-ID: [Peter D. Marion ] > ... > Troubleshooting in general a strong point. Very familiar with M$ > products inc. all flavors of Outlook. Please advise how I can help.
There are a ton of rare & unpleasant Outlook behaviors the developers have never seen, so any light you can shed on Outlook bug reports would be most welcome. You can add comments directly to bug reports, provided you have a SourceForge account (which is free) and log into it first: http://sf.net/tracker/?group_id=61702&atid=498103 > Note: I have a copy of Vis. Studio 2003 I had planned to sell, but I > can donate if you would find it to be useful. It's a nice offer, but probably not. All of spambayes is written in the Python language. Python itself is written in C, but Windows Python users typically don't compile Python themselves (the Python Windows installer includes precompiled binaries, and the people who build the installers already have the relevant top-end MS compilers). From dorshing at hcm.vnn.vn Sun Dec 28 23:51:02 2003 From: dorshing at hcm.vnn.vn (Dorshing VietNam) Date: Sun Dec 28 23:55:20 2003 Subject: [Spambayes] prevent sex- mail Message-ID: <000801c3cdc7$79f52200$3c3cfea9@mactt> Dear sir/madam, please show me how to prevent sex-mail. Thanks, Kim Nga (My mail: dorshing@hcm.vnn.vn) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031229/c71903cf/attachment.html From tameyer at ihug.co.nz Mon Dec 29 00:20:08 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 29 00:20:13 2003 Subject: [Spambayes] imap4 filter In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985C84@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A17@its-xchg4.massey.ac.nz> > First note that in order to open the configuration web > interface I have to run "python scripts/sb_imapfilter.py -b". That is correct. In the next release you can serve the interface without launching a browser window, but in 1.0a7 you cannot (you can close the browser window, leave imapfilter running and then open it again). 
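Because the web interface only exists while sb_imapfilter.py is running, a quick way to tell whether it is currently being served is to check whether anything accepts connections on the UI port. The following is a minimal sketch for a modern Python 3 (not the Python 2.3 discussed in this thread), assuming the default UI port of 8880; `ui_is_listening` is a hypothetical helper, not part of spambayes:

```python
import socket

def ui_is_listening(host="localhost", port=8880, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: 8880 is assumed to be the SpamBayes web UI port;
    change it if you have configured a different one.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved
        return False

if __name__ == "__main__":
    if ui_is_listening():
        print("Something is listening on localhost:8880; try the browser again.")
    else:
        print("Nothing is listening on localhost:8880; start sb_imapfilter.py first.")
```

This only shows that *some* program owns the port, not that it is SpamBayes, but it separates "nothing is running" from "the browser is pointed at the wrong address".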
> The web interface will not open with http://localhost:8880, Unless sb_imapfilter.py is running, right? > possibly because there is no running web server > unless the script is executed first? The sb_imapfilter.py script serves the web interface, yes. It has to get there somehow! > After I log out of the machine it seems that the web server is > stopped, so presumably the mail isn't being filtered? That is correct. sb_imapfilter filters when it is run, and then does nothing. You can set it to filter repeatedly, but if you stop it (or don't set it to repeat), then that's it. > Here's what I get from the web interface when I select > "configure folders to filter": [...] > File > "c:\python23\Lib\site-packages\spambayes\PyMeldLite.py", line 499, in > feed > self._parser.Parse(data) > > ExpatError: not well-formed (invalid token): line 1, column 70 This is a *bad* thing - it means that the ui_html.py file in the spambayes package is no good. It works for me here, but I don't have time right at the moment to grab a copy of 1.0a7 and see if that's because it has been fixed since (I don't recall that it was, but it's been a while). Have you changed this file at all? Do you have resourcepackage installed (unlikely, but a possible cause). > Because the POP3 instructions appear before the IMAP > instructions and there isn't a statement saying to skip them if > you're only using IMAP, this isn't clear. Did you also try to use the Outlook instructions before POP3 and the procmail ones after IMAP <0.5 wink>? I'll add a line saying to only follow the relevant section. > I also had some problems figuring out how to install > spambayes after installing python. When it says to run > "setup.py install" for instance, this skips some matters. > I had to open a command > prompt, page through to the folder where the script was located; This was considered existing knowledge for users (i.e. was implied in "in the directory that you expanded the SpamBayes archive into").
I've elaborated it somewhat, although for users for whom this doesn't come easily, the (soon to be released) binary version will solve this problem (although the first release probably won't include imapfilter). > then I had to use the command "c:\python23\python setup.py install" > since the executable couldn't otherwise be found. This depends on how you installed Python. If python isn't on your path, then you have to use the full path, otherwise you don't. Either way, we can't guess what the correct path will be. I'm leaving this one, because IMO if people can't figure this out (you did, obviously) then they shouldn't be running from source. =Tony Meyer From jab3702 at aol.com Mon Dec 29 08:51:05 2003 From: jab3702 at aol.com (jab3702@aol.com) Date: Mon Dec 29 08:51:13 2003 Subject: [Spambayes] Retrieve a deleted/noted SPAM email Message-ID: <158.2a99e987.2d218b49@aol.com> PLEASE HELP ME IF YOU CAN... I AM VERY COMPUTER ILLITERATE, AND AM TRYING TO ALLOW AN EMAIL SENDER TO BE ENABLED TO GET BACK TO ME. I ACCIDENTALLY CLICKED " spam report" INSTEAD OF (delete), and now can no longer receive email from that sender. how can this error be reversed???? PLEASE HELP... THANK YOU JUDY BRITT_________ JAB3702@AOL.COM -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031229/d5c90ed5/attachment.html From rmalayter at bai.org Mon Dec 29 11:09:51 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Mon Dec 29 11:09:55 2003 Subject: [Spambayes] prevent sex- mail Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A75355@cliff.bai.org> [Dorshing VietNam] >please show me how to prevent sex-mail If you use spambayes to mark all pornographic email - even non-spam - you receive as spam, it will do a very good job of filtering it out. However, you must first train it with a few dozen pornographic messages. Also, no filter is perfect, so only 95% or so of spam will be caught.
From kcourtney at comcast.net Mon Dec 29 12:14:09 2003 From: kcourtney at comcast.net (Katherine Courtney) Date: Mon Dec 29 12:14:11 2003 Subject: [Spambayes] Recover from spam button Message-ID: My "Recover From Spam" button on the toolbar has disappeared. How can I recover it? -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 1424 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031229/34eafdde/winmail.bin From nmstough at uci.edu Mon Dec 29 14:11:41 2003 From: nmstough at uci.edu (Neal Stoughton) Date: Mon Dec 29 14:12:13 2003 Subject: [Spambayes] imap4 filter References: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A17@its-xchg4.massey.ac.nz> Message-ID: <006001c3ce3f$979965a0$aa45fea9@Tsunami> ----- Original Message ----- From: "Tony Meyer" To: "'Neal Stoughton'" ; Sent: Sunday, December 28, 2003 21:20 Subject: RE: [Spambayes] imap4 filter > First note that in order to open the configuration web > interface I have to run "python scripts/sb_imapfilter.py -b". That is correct. In the next release you can serve the interface without launching a browser window, but in 1.0a7 you cannot (you can close the browser window, leave imapfilter running and then open it again). > The web interface will not open with http://localhost:8880, Unless sb_imapfilter.py is running, right? > possibly because there is no running web server > unless the script is executed first? The sb_imapfilter.py script serves the web interface, yes. It has to get there somehow! > After I log out of the machine it seems that the web server is > stopped, so presumably the mail isn't being filtered? That is correct. sb_imapfilter filters when it is run, and then does nothing. You can set it to filter repeatedly, but if you stop it (or don't set it to repeat), then that's it. I guess I misunderstood how this works.
I thought somehow running sb_imapfilter.py would start up some sort of "service" that would run constantly on my server, even when no one is logged into it (like the mail service). That way, I could access and configure it whenever I wanted remotely from a web browser. But it seems that at least the imap filter part of spambayes is set up more from the standpoint of a client machine (the workstation) rather than to run on a server. Now that I understand the structure of this module a little better it makes sense that way. But reading the whole of the instructions, including the pop3 part, confused me as to how the imap filter module worked. > Here's what I get from the web interface when I select > "configure folders to filter": [...] > File > "c:\python23\Lib\site-packages\spambayes\PyMeldLite.py", line 499, in > feed > self._parser.Parse(data) > > ExpatError: not well-formed (invalid token): line 1, column 70 This is a *bad* thing - it means that the ui_html.py file in the spambayes package is no good. It works for me here, but I don't have time right at the moment to grab a copy of 1.0a7 and see if that's because it has been fixed since (I don't recall that it was, but it's been a while). Have you changed this file at all? Do you have resourcepackage installed (unlikely, but a possible cause). No, I made no changes whatsoever; all I did was install the python package to c:\python23 and then run the install package setup.py. This seemed to copy some items into various subfolders within the \python23 folder. Then I started the imap filter and tried to configure it. I had a problem at first, because after putting in the information about the mail server, I had to stop the process and restart it before I could proceed. Then I tried to configure the filtering and this is where I ran into the errors. Not knowing what the resourcepackage is, I did not install that. I didn't install the win32 package either.
> Because the POP3 instructions appear before the IMAP > instructions and there isn't a statement saying to skip them if > you're only using IMAP, this isn't clear. Did you also try to use the Outlook instructions before POP3 and the procmail ones after IMAP <0.5 wink>? I'll add a line saying to only follow the relevant section. No, there was enough explanation in the notes that made it clear that if you don't use either Outlook or procmail, then you wouldn't go through that installation. I suppose I wouldn't have questioned the instructions if everything worked, but because of the errors, I began to wonder if somehow the imap filter depended on the pop3 service. > I also had some problems figuring out how to install > spambayes after installing python. When it says to run > "setup.py install" for instance, this skips some matters. > I had to open a command > prompt, page through to the folder where the script was located; This was considered existing knowledge for users (i.e. was implied in "in the directory that you expanded the SpamBayes archive into"). I've elaborated it somewhat, although for users for whom this doesn't come easily, the (soon to be released) binary version will solve this problem (although the first release probably won't include imapfilter). > then I had to use the command "c:\python23\python setup.py install" > since the executable couldn't otherwise be found. This depends on how you installed Python. If python isn't on your path, then you have to use the full path, otherwise you don't. Either way, we can't guess what the correct path will be. I'm leaving this one, because IMO if people can't figure this out (you did, obviously) then they shouldn't be running from source. I was wondering, though, why the python installation doesn't put something into the registry that identifies to other programs where the python executable is located.
For instance, I don't have to put programs like Adobe Acrobat into the path in order for my *.pdf files to find the appropriate program. In any event, this is more of a question for the python developers than it is for spambayes. From Mark.Bond at hq.com Mon Dec 29 14:27:46 2003 From: Mark.Bond at hq.com (Mark.Bond@hq.com) Date: Mon Dec 29 14:25:38 2003 Subject: [Spambayes] Spam Baes Question Message-ID: Hi there. I have a Windows 2000 system connected over the internet to an exchange server. I just wanted to ask if there are fixes for the following issues in the source code: 1- When email arrives it is sent into the Junk Email or Email suspect folder for review. In my folder list this is referenced at the bottom of my folders. Is there a way to move this to the top of my folder list view? 2- I have Outlook configured to "Notify me when new messages arrive"- When I am asked if I would like to read the mail message I receive an error message stating that the mail has been moved or deleted. Is there a way around this? 3- Is there a way to have email that is deemed as Junk- In the Junk Email folder automatically purge, or Purge with prompting? Right now, I am manually deleting the mail in the folder. Thank You, Mark Mark Andrew Bond Technology Manager HQ Global Workplaces, Inc. 945 Concord Street Framingham MA 01701 Office Phone: 508-620-4747 Mobile: 617-285-8873 Fax: 508-879-0698 From alan at alanculler.com Mon Dec 29 15:04:42 2003 From: alan at alanculler.com (Alan Cay Culler) Date: Mon Dec 29 15:03:41 2003 Subject: [Spambayes] Outlook '97 Message-ID: <000201c3ce47$005e8460$0202a8c0@nyc.rr.com> Hi folks I downloaded your software on my Dell x200 laptop Windows xp/ Outlook2003 and it works great. Thought I'd download it onto this Dell desktop but it is Windows 98/ Outlook 97. Is there any way to make it work -short of upgrading Outlook? Thanks for your help.
Alan From tim.one at comcast.net Mon Dec 29 15:54:25 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 29 15:54:46 2003 Subject: [Spambayes] Exceptionally well-done identity-theft spam In-Reply-To: Message-ID: If you get something like the attached, don't go to the website and "update" your PayPal account information. I just got this, and my classifier scored it at 1% (0.01). It looks a lot like real email from PayPal -- both to me, and to my classifier. -------------- next part -------------- An embedded message was scrubbed... From: "payPal.com" Subject: PayPal Account Update Date: Mon, 29 Dec 2003 15:22:10 -0500 Size: 4486 Url: http://mail.python.org/pipermail/spambayes/attachments/20031229/42095961/attachment.mht From skip at pobox.com Mon Dec 29 16:08:18 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Dec 29 16:08:28 2003 Subject: [Spambayes] Exceptionally well-done identity-theft spam In-Reply-To: References: Message-ID: <16368.38850.408610.654965@montanaro.dyndns.org> Tim> If you get something like the attached, don't go to the website and Tim> "update" your PayPal account information. I just got this, and my Tim> classifier scored it at 1% (0.01). It looks a lot like real email Tim> from PayPal -- both to me, and to my classifier. Yeah, this is a stinker. I get them all the time. Interestingly enough, your message scored 0.69 for me. It probably would have scored as spam except it came from you. ;-) The real kicker here is this URL: http://www.paypal.com%65%6B%6A%68%61%73%6B%6A%71%70%77%6F%70%77%6F@%32%31%31.%36%33.%31%36%32.%39%33:%37%33%30%31/%70%61%79%70%61%6C.%68%74%6D which unmangles to: http://www.paypal.comekjhaskjqpwopwo@211.63.162.93:7301/paypal.htm I'm not about to visit that URL, but I'm almost certain it will look just like a PayPal page and that 211.63.162.93 is not in PayPal's universe. 
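Python's standard library is enough to see through this kind of disguise. Percent-decoding the href and then splitting it shows that everything before the `@` is only a username, and that the real destination is a bare IP address on an unusual port. The following is a minimal sketch using `urllib.parse` and `ipaddress` from Python 3; `describe_url` and the dictionary it returns are illustrative names, not part of spambayes:

```python
from urllib.parse import unquote, urlsplit
import ipaddress

# The href from the scam message, as quoted above
MANGLED = ("http://www.paypal.com%65%6B%6A%68%61%73%6B%6A%71%70%77%6F%70%77%6F"
           "@%32%31%31.%36%33.%31%36%32.%39%33:%37%33%30%31"
           "/%70%61%79%70%61%6C.%68%74%6D")

def describe_url(href):
    """Decode percent-escapes, then report the parts a browser actually uses."""
    # Decode first: an escaped port such as %37%33%30%31 is not a valid port
    # number until it has been unquoted.
    decoded = unquote(href)
    parts = urlsplit(decoded)
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "decoded": decoded,
        "username": parts.username,  # text before '@' is userinfo, not the host
        "host": host,
        "host_is_ip": host_is_ip,
        "port": parts.port,          # None when no explicit port is given
    }

for key, value in describe_url(MANGLED).items():
    print(f"{key}: {value}")
```

Decoding the whole string before splitting is a simplification: a genuinely escaped `%40` would turn into `@` and change how the URL splits. That is acceptable for inspecting a single suspicious link, but it is not how a careful tokenizer should parse arbitrary URLs.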
This suggests some more possible things to try: * URLs which have usernames in them * URLs which refer to non-standard ports * URLs with IP addresses instead of hostnames (in addition to specific hosts or networks) I haven't looked to see if any of these are already recognized, but all three techniques seem to be prevalent or required by such scams. Skip From avi-j at pacbell.net Mon Dec 29 16:22:29 2003 From: avi-j at pacbell.net (Avi Jacobson) Date: Mon Dec 29 16:22:29 2003 Subject: [Spambayes] Exceptionally well-done identity-theft spam In-Reply-To: Message-ID: <00f101c3ce51$ddc4c940$0100a8c0@AVIATHOME> I wonder whether this is not the face of things to come -- reliable-looking links to reliable-looking websites, where the HREF actually points elsewhere. Note in the source code that the incriminating part of the URL in the HREF (and in the browser window that opens) is coded in Hex values rather than characters. My guess is that if you dump enough of these messages into your Junk folder, Spambayes will be smart enough to identify this kind of URL as a high-probability token. Spambayes developers, am I right? Will too many % signs in a URL raise the spam probability? Best regards, Avi Jacobson -----Original Message----- From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org]On Behalf Of Tim Peters Sent: Mon, December 29, 2003 12:54 PM To: spambayes@python.org Subject: [Spambayes] Exceptionally well-done identity-theft spam If you get something like the attached, don't go to the website and "update" your PayPal account information. I just got this, and my classifier scored it at 1% (0.01). It looks a lot like real email from PayPal -- both to me, and to my classifier. From avi-j at pacbell.net Mon Dec 29 16:31:31 2003 From: avi-j at pacbell.net (Avi Jacobson) Date: Mon Dec 29 16:31:33 2003 Subject: [Spambayes] Spam Baes Question In-Reply-To: Message-ID: <00f201c3ce53$20eff4f0$0100a8c0@AVIATHOME> Hi, Mark. 
>1- When email arrives it is sent into the Junk Email or Email suspect folder >for review. In my folder list this is referenced at the bottom of my >folders. Is there a way to move this to the top of my folder list view? My low-tech solution for this would be to rename the folders ~Junk and ~Junk Suspects, which would put them at the top of the list because of the low-value ~ character. >2- I have Outlook configured to "Notify me when new messages arrive"- When I >am asked if I would like to read the mail message I receive an error message >stating that the mail has been moved or deleted. Is there a way around this? I have it play a Wav (but not display a notification) when new messages arrive. The little envelope icon in the system tray will display even after the unread messages have been moved to the Junk and Junk Suspects folders. >3- Is there a way to have email that is deemed as Junk- In the Junk Email >folder automatically purge, or Purge with prompting? Right now, I am >manually deleting the mail in the folder. You can set the AutoArchive options for that folder to autoarchive (or delete) those items aggressively. But I like to save a couple hundred of the most recent (incorporating the most recent spam tricks) in case I ever need to retrain. Best regards, Avi Jacobson From avi-j at pacbell.net Mon Dec 29 16:34:04 2003 From: avi-j at pacbell.net (Avi Jacobson) Date: Mon Dec 29 16:34:04 2003 Subject: [Spambayes] Spam Baes Question In-Reply-To: <00f201c3ce53$20eff4f0$0100a8c0@AVIATHOME> Message-ID: <00f301c3ce53$7c39dc40$0100a8c0@AVIATHOME> Of course, if you rename the Junk and Suspect folders, you will need to reconfigure Spambayes appropriately. From tim.one at comcast.net Mon Dec 29 16:46:41 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 29 16:47:04 2003 Subject: [Spambayes] Exceptionally well-done identity-theft spam In-Reply-To: <16368.38850.408610.654965@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > Yeah, this is a stinker.
I get them all the time. Interestingly > enough, your message scored 0.69 for me. It probably would have > scored as spam except it came from you. ;-) Have you trained on any real msgs from PayPal as ham? That's the kind of commercial HTML email that *does* look quite spammy unless you've trained on it first. This scam embedded legitimate PayPal URLs too, which showed up as pure ham clues for me (since PayPal includes them in their legit email, which I've trained on as ham). > The real kicker here is this URL: > > > http://www.paypal.com%65%6B%6A%68%61%73%6B%6A%71%70%77%6F%70%77%6F@%32%31%31.%36%33.%31%36%32.%39%33:%37%33%30%31/%70%61%79%70%61%6C.%68%74%6D > > which unmangles to: > > http://www.paypal.comekjhaskjqpwopwo@211.63.162.93:7301/paypal.htm > > I'm not about to visit that URL, but I'm almost certain it will look > just like a PayPal page and that 211.63.162.93 is not in PayPal's > universe. Ya, I didn't click on it either. Maybe worse, what the email *displays* for that link (what's rendered on the user's screen) is https://www.paypal.com/cgi-bin/webscr/?cmd=_login-run and that's a correct (legitimate) URL for a PayPal login. spambayes sees that too, of course. At least Outlook shows you the unmangled href (in the UI's status line) if you hover the mouse over the link. The instant tip-off there was that the displayed URL claimed to use https but the actual href used http -- real PayPal email never does that. > This suggests some more possible things to try: > > * URLs which have usernames in them > > * URLs which refer to non-standard ports > > * URLs with IP addresses instead of hostnames (in addition to > specific hosts or networks) > > I haven't looked to see if any of these are already recognized, but > all three techniques seem to be prevalent or required by such scams. The pieces of the URL get broken out and tagged as such (with a "url:" prefix), but there's no semantic analysis.
Even if there were, the damnable thing about this spam is that this specific URL is about the *only* thing in it you won't find in a real PayPal email. Even the images in it come from PayPal's real home: References: <16368.38850.408610.654965@montanaro.dyndns.org> Message-ID: <16368.42221.200972.82936@montanaro.dyndns.org> >> Yeah, this is a stinker. I get them all the time. Interestingly >> enough, your message scored 0.69 for me. It probably would have >> scored as spam except it came from you. ;-) Tim> Have you trained on any real msgs from PayPal as ham? Nope, but I've gotten enough of these that I've trained on several as spam. PayPal would never send you such a "come login to my world" message, so even though mail you get from them is a bit spammy in content, I suspect it has enough clues to distinguish it from the actual scams. PayPal doesn't send me much mail at all. I'm not sure I've trained on any mail from them in my current set. >> This suggests some more possible things to try: >> >> * URLs which have usernames in them >> >> * URLs which refer to non-standard ports >> >> * URLs with IP addresses instead of hostnames (in addition to >> specific hosts or networks) >> >> I haven't looked to see if any of these are already recognized, but >> all three techniques seem to be prevalent or required by such scams. Tim> The pieces of the URL get broken out and tagged as such (with a Tim> "url:" prefix), but there's no semantic analysis. I suspect we might want to start doing a little. Oddball ports and usernames in URLs are rare beasts. I'll bet most URLs containing them (especially those embedded in unsolicited emails) would be strong spam clues. Tim> I don't think a statistical word analyzer (like ours) is going to Tim> do much good against well-done identity-theft scam, and *some* of Tim> those have been getting much better over the last year. 
This one Tim> was also remarkable for its good spelling and grammar (still rare Tim> in "the typical" scam of this sort). I think the current state-of-the-art can be improved. I added a section to the NEWTRICKS file. Skip From pmarion at comcast.net Mon Dec 29 17:49:54 2003 From: pmarion at comcast.net (Pete) Date: Mon Dec 29 17:49:55 2003 Subject: [Spambayes] offer of assistance In-Reply-To: Message-ID: <000c01c3ce5e$13a9ab00$0e02a8c0@Belial> I will set aside a minimum of 30 minutes per day to work with the bug reports. I have a PC that needs an OS, so pick one (I have all MS and a few versions of Linux, but I am in Linux Newbie stage) and that's what I will work off of. It's a 180 gig HDD, so I can Set up a multi boot system to test several flavors if you wish. Just send me the order and I will fill it. :-) Re: The Visual studio offer - I thought perhaps it would be of use in working with the M$ Outlook interface, but what you say makes perfect sense. I will start looking at bugs on 1/1/04! Thanks for letting me help! Pete Marion -----Original Message----- From: Tim Peters [mailto:tim.one@comcast.net] Sent: Sunday, December 28, 2003 11:24 PM To: pmarion@comcast.net Cc: spambayes@python.org Subject: RE: [Spambayes] offer of assistance [Peter D. Marion ] > ... > Troubleshooting in general a strong point. Very familiar with M$ > products inc. all flavors of Outlook. Please advise how I can help. There are a ton of rare & unpleasant Outlook behaviors the developers have never seen, so any light you can shed on Outlook bug reports would be most welcome. You can add comments directly to bug reports, provided you have a SourceForge account (which is free) and log into it first: http://sf.net/tracker/?group_id=61702&atid=498103 > Note: I have a copy of Vis. Studio 2003 I had planned to sell, but I > can donate if you would find it to be useful. It's a nice offer, but probably not. All of spambayes is written in the Python language. 
Python itself is written in C, but Windows Python users typically don't compile Python themselves (the Python Windows installer includes precompiled binaries, and the people who build the installers already have the relevant top-end MS compilers). From pmarion at comcast.net Mon Dec 29 18:04:46 2003 From: pmarion at comcast.net (Pete) Date: Mon Dec 29 18:04:46 2003 Subject: [Spambayes] offer of assistance Message-ID: <001401c3ce60$27267580$0e02a8c0@Belial> Oh yeah, I also own copies of Outlook 98 - Outlook 2003. So, if you wish, I can work off of any or all of them. -----Original Message----- From: Tim Peters [mailto:tim.one@comcast.net] Sent: Sunday, December 28, 2003 11:24 PM To: pmarion@comcast.net Cc: spambayes@python.org Subject: RE: [Spambayes] offer of assistance [Peter D. Marion ] > ... > Troubleshooting in general a strong point. Very familiar with M$ > products inc. all flavors of Outlook. Please advise how I can help. There are a ton of rare & unpleasant Outlook behaviors the developers have never seen, so any light you can shed on Outlook bug reports would be most welcome. You can add comments directly to bug reports, provided you have a SourceForge account (which is free) and log into it first: http://sf.net/tracker/?group_id=61702&atid=498103 > Note: I have a copy of Vis. Studio 2003 I had planned to sell, but I > can donate if you would find it to be useful. It's a nice offer, but probably not. All of spambayes is written in the Python language. Python itself is written in C, but Windows Python users typically don't compile Python themselves (the Python Windows installer includes precompiled binaries, and the people who build the installers already have the relevant top-end MS compilers). 
From rmalayter at bai.org Mon Dec 29 18:20:20 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Mon Dec 29 18:20:24 2003 Subject: [Spambayes] Recover from spam button Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A75392@cliff.bai.org> In Outlook, go to the Tools menu, select options, select the mail format tab, click on the "fonts" button, and choose "Courier" for all of your fonts. That should fix one of the problems you're currently experiencing. ;-) As for the other problem, what usually works for me is to close all outlook windows, uninstall spambayes, install the latest version of spambayes, and restart Outlook. If that doesn't work, the list archives contain numerous workarounds for this problem. You can browse these by going to www.google.com and typing in this: spambayes toolbar site:python.org _____________________________________________ From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Katherine Courtney Sent: Monday, December 29, 2003 11:14 AM To: spambayes@python.org Subject: [Spambayes] Recover from spam button My "Recover From Spam" button on the toolbar has disappeared. How can I recover it? From mhammond at skippinet.com.au Mon Dec 29 19:00:56 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Dec 29 19:01:05 2003 Subject: [Spambayes] offer of assistance In-Reply-To: <000c01c3ce5e$13a9ab00$0e02a8c0@Belial> Message-ID: <0e9b01c3ce68$013f9290$2c00a8c0@eden> > I will set aside a minimum of 30 minutes per day to work with the bug > reports. Hi Pete - I see these are already coming in! One other thing that would be *very* useful would be to familiarize yourself with the various documentation, and help fix it! Eg, the Outlook docs are currently in raw HTML. The website itself is maintained in CVS, and uses black magic to generate the HTML. All of these could do with a fresh eye. 
You are free to "think big" here :) If maintaining the Outlook docs is a PITA, then I would be happy to convert them to use the same set of tools as the website. Another very cool thing would be to drop the Outlook addin, and try and make things work using the "sb_server" set of tools, and fix/create docs as you go. The docs here are *very* sparse. If you are lucky, you may find starship.python.net/crew/mhammond/spambayes has a version that will work for you (and also demonstrates the lack of docs I am referring to). > Re: The Visual studio offer - I thought perhaps it would be of use in > working with the M$ Outlook interface, but what you say makes > perfect sense. We do all of that directly from Python. > Thanks for letting me help! Thanks for offering! Mark. From tameyer at ihug.co.nz Mon Dec 29 19:17:36 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 29 19:17:46 2003 Subject: [Spambayes] Exceptionally well-done identity-theft spam In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985E5E@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A18@its-xchg4.massey.ac.nz> [Skip] > The real kicker here is this URL: > http://www.paypal.com%65%6B%6A%68%61%73%6B%6A%71%70%77%6F%70%77%6F@%32%31%31.%36%33.%31%36%32.%39%33:%37%33%30%31/%70%61%79%70%61%6C.%68%74%6D > which unmangles to: > http://www.paypal.comekjhaskjqpwopwo@211.63.162.93:7301/paypal.htm > I'm not about to visit that URL, but I'm almost certain > it will look just like a PayPal page and that 211.63.162.93 > is not in PayPal's universe. I was curious, so had a look. It certainly does look nice and PayPal-like (although there's one little bit of broken html at the bottom). (I removed the comekjhaskjqpwopwo in case that sent some sort of "Tim Peters is an idiot" message ).
Still curious, I tokenized the paypal.htm file, which scored .98 for me, but then I haven't trained on any PayPal mail either, so that's probably meaningless :) OTOH, urllib2 couldn't demangle the URL (the username bit, I think) so it would have actually generated a "bad url" token with the experimental URL 'slurper' option. Still, one token wouldn't make much difference. =Tony Meyer From tameyer at ihug.co.nz Mon Dec 29 19:21:44 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 29 19:21:49 2003 Subject: [Spambayes] offer of assistance In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985EAE@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777A7@its-xchg4.massey.ac.nz> > Another very cool thing would be to drop the Outlook addin, > and try and make things work using the "sb_server" set of > tools, and fix/create docs as you go. The docs here are > *very* sparse. If you are lucky, you may find > starship.python.net/crew/mhammond/spambayes has a version > that will work for you (and also demonstrates the lack of > docs I am referring to) Although there is slightly more documentation now :) should be more-or-less the same as Mark's version, but have the better documentation (or it's in CVS). (For anything but the new documentation, his version would be better to test, for consistency). If you do go this route, it would be good to know whether more documentation is needed to go with the existing html readme, or if the help available from the web interface is more useful (it also needs expansion). Thanks! =Tony Meyer From tameyer at ihug.co.nz Mon Dec 29 19:24:06 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 29 19:24:12 2003 Subject: [Spambayes] Outlook '97 In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985E3A@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777A8@its-xchg4.massey.ac.nz> > I downloaded your software on my Dell x200 laptop Windows xp/ > Outlook2003 and it works great. 
Thought I'd download it onto > this Dell desktop but it is Windows 98/ Outlook 97 Is there > any way to make it work -short of upgrading Outlook? Thanks > for your help. You can't use the integrated plug-in with Outlook 97, but you can use the POP3 proxy instead (assuming you get your mail via POP). It's a less integrated solution, but the results will be the same. The website & readme have more information about setting it all up. (Note that a (first) binary version will be released very soon). =Tony Meyer From tameyer at ihug.co.nz Mon Dec 29 19:26:16 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 29 19:26:21 2003 Subject: [Spambayes] Retrieve a deleted/noted SPAM email In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985D98@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777A9@its-xchg4.massey.ac.nz> > PLEASE HELP ME IF YOU CAN... I AM VERY COMPUTER ILITTERATE, > AND AM TRYING TO ALLOW A EMAIL SENDER TO BE ENABLED TO GET > BACK TO ME. I ACCIDENTLLY CLICKED " spam report" INSTEAD OF > (delete), and now can no longer recieve email from that > sender. how can this error be reversed???? PLEASE HELP... SpamBayes doesn't have a "spam report" button, so if that is what you clicked, then it's likely that no-one here will be able to help - you'll have to figure out what software installed that button and ask their support for help. If you actually clicked a button called "Delete as Spam", then this will have trained that message as spam and moved it to the folder that you have designated to hold spam mail. Simply go to that folder, find all the messages that are *not* spam, click on them and then click the "Recover from spam" button. 
=Tony Meyer From tameyer at ihug.co.nz Mon Dec 29 19:27:53 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 29 19:27:59 2003 Subject: [Spambayes] Spam Baes Question In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985E6B@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777AA@its-xchg4.massey.ac.nz> > Of course, if you rename the Junk and Suspect folders, you > will need to reconfigure Spambayes appropriately. Actually, as long as you leave them in the same "store" (i.e. in the same pst file, or on the same Exchange server), renaming them will have no effect - SpamBayes will keep track of them despite this. Note that this may change in future versions, as apparently this 'keeping track' behaviour is confusing <0.5 wink>. =Tony Meyer From denis at haskinferguson.net Mon Dec 29 19:31:38 2003 From: denis at haskinferguson.net (Denis Haskin) Date: Mon Dec 29 19:28:50 2003 Subject: [Spambayes] IMAP filter failure (imaplib.error: UID command error: BAD ['Bogus sequence in UID FETCH']) Message-ID: <3FF0C76A.8030609@haskinferguson.net> [This is on Windows 2000, Python 2.3.3 with spambayes 1.0a7.] I'm trying to start using spambayes with the imap filter and running into a problem. When I run: python sb_imapfilter.py -c -t -l 1 -v -i 1 I get: Classifying ***** 17:15.20 NO response: APPEND failed: Unknown flag: $Forwarded . 17:15.70 BAD response: Bogus sequence in UID FETCH Traceback (most recent call last): File "sb_imapfilter.py", line 825, in ? 
run() File "sb_imapfilter.py", line 815, in run imap_filter.Filter() File "sb_imapfilter.py", line 675, in Filter self.unsure_folder) File "sb_imapfilter.py", line 608, in Filter msg.Save() File "sb_imapfilter.py", line 395, in Save response = imap.uid("FETCH", self.uid, "(FLAGS INTERNALDATE)") File "D:\Python23\lib\imaplib.py", line 697, in uid typ, dat = self._simple_command(name, command, *args) File "D:\Python23\lib\imaplib.py", line 1000, in _simple_command return self._command_complete(name, self._command(name, *args)) File "D:\Python23\lib\imaplib.py", line 837, in _command_complete raise self.error('%s command error: %s %s' % (name, typ, data)) imaplib.error: UID command error: BAD ['Bogus sequence in UID FETCH'] What generally happens is that 1 or 2 emails seem to get copied on my imap server and get a X-Spambayes-MailId header added. I checked the bug database and the last 2 months of email archives and haven't seen anything like this. Before I start running this thru a tunnel to try and see what's going on (I have no python experience and only a smattering of imap knowledge), anyone have any suggestions? Much thanks, dwh From tim.one at comcast.net Mon Dec 29 19:34:59 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 29 19:35:24 2003 Subject: [Spambayes] Exceptionally well-done identity-theft spam In-Reply-To: <00f101c3ce51$ddc4c940$0100a8c0@AVIATHOME> Message-ID: [Avi Jacobson] > I wonder whether this is not the face of things to come -- > reliable-looking links to reliable-looking websites, where the HREF > actually points elsewhere. For identity-theft scam spam, almost certainly -- they have to trick you into revealing personal info you wouldn't normally pass out. But if what you got after clicking on the link was, e.g., an offer to cut your mortgage rate, or to enlarge part of your anatomy, I expect the response rate would be too low to repay the costs. 
After all, the initial sales msg flat-out lied to you then, and the percentage of people eager to get fleeced a second time has got to approach 0. > Note in the source code that the incriminating part of the URL in > the HREF (and in the browser window that opens) is coded in Hex > values rather than characters. > > My guess is that if you dump enough of these messages into your Junk > folder, Spambayes will be smart enough to identify this kind of URL > as a high-probability token. Spambayes developers, am I right? Will > too many % signs in a URL raise the spam probability? Not now, unless you save away an enormous number of these things. We break URLs into pieces based on the official separator characters now, but that's it. The specific scam in question generated these distinct URL-related tokens: 'proto:http' 'proto:https' 'url:' 'url:%31%36%32' 'url:%32%31%31' 'url:%36%33' 'url:%37%33%30%31' 'url:%39%33' 'url:%68%74%6d' 'url:%70%61%79%70%61%6c' 'url:_login-run' 'url:cgi-bin' 'url:cmd' 'url:com' 'url:com%65%6b%6a%68%61%73%6b%6a%71%70%77%6f%70%77%6f' 'url:config' 'url:dot_row' 'url:email_logo' 'url:gif' 'url:images' 'url:login' 'url:mail' 'url:paypal' 'url:pixel' 'url:webscr' 'url:www' 'url:yahoo' It would *probably* work well to, in addition, generate 'url:%nn' for each instance of a % escape. That needs testing, though, as "pure wins" are almost non-existent in this game, and for all I know some church in Iowa generates HTML parish newsletters in which all URLs are encoded just because someone didn't understand an option in their HTML-generating software.
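Tim's proposal can be sketched in a few lines. This is not SpamBayes' tokenizer: `url_tokens` is a hypothetical function, and the separator set below is an assumption. It splits a URL on separator characters, which yields pieces like those in the token list above, and then, as the proposed extra clue, emits one `url:%nn` token for every percent escape it finds.

```python
import re

# Assumed separator set; SpamBayes' own tokenizer may use a different one.
# Note that '-' and '_' are not separators, so 'cgi-bin' stays one piece.
_SEPARATORS = re.compile(r"[/:.?=&@#;,+]+")
_PERCENT_ESCAPE = re.compile(r"%[0-9a-fA-F]{2}")


def url_tokens(url):
    """Return 'proto:' and 'url:' tokens for one URL (hypothetical sketch)."""
    tokens = []
    proto, sep, rest = url.partition("://")
    if not sep:
        # No scheme present: treat the whole string as the remainder.
        proto, rest = "", url
    if proto:
        tokens.append("proto:" + proto.lower())
    # Existing behaviour as described: lowercase pieces between separators.
    for piece in _SEPARATORS.split(rest.lower()):
        if piece:
            tokens.append("url:" + piece)
    # Proposed addition: one token per % escape, duplicates kept, so a
    # heavily escaped URL contributes many of these clues.
    for escape in _PERCENT_ESCAPE.findall(rest):
        tokens.append("url:" + escape.lower())
    return tokens


print(url_tokens("http://www.paypal.com%65%6B@%32%31%31.%36%33/x"))
```

Whether the extra `url:%nn` tokens help or hurt is, as noted above, something only testing against real ham and spam can decide.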
From avi-j at pacbell.net Mon Dec 29 19:39:58 2003 From: avi-j at pacbell.net (Avi Jacobson) Date: Mon Dec 29 19:40:53 2003 Subject: [Spambayes] Retrieve a deleted/noted SPAM email In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13046777A9@its-xchg4.massey.ac.nz> Message-ID: <011101c3ce6d$758b0cb0$0100a8c0@AVIATHOME> The user who submitted the original posting is an AOL subscriber, so I am guessing she clicked the "spam report" button while reading e-mail on AOL. (I know that other mail services, such as Hotmail and Yahoo!, have such buttons in their Web interfaces.) Since I am not an AOL subscriber, I cannot access their mail interface or user documentation to see how to unblock a sender, but I am confident there is a FAQ in the AOL documentation that explains how to do this. Best regards, Avi -----Original Message----- From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org]On Behalf Of Tony Meyer Sent: Mon, December 29, 2003 4:26 PM To: jab3702@aol.com; spambayes@python.org Subject: RE: [Spambayes] Retrieve a deleted/noted SPAM email > PLEASE HELP ME IF YOU CAN... I AM VERY COMPUTER ILITTERATE, > AND AM TRYING TO ALLOW A EMAIL SENDER TO BE ENABLED TO GET > BACK TO ME. I ACCIDENTLLY CLICKED " spam report" INSTEAD OF > (delete), and now can no longer recieve email from that > sender. how can this error be reversed???? PLEASE HELP... SpamBayes doesn't have a "spam report" button, so if that is what you clicked, then it's likely that no-one here will be able to help - you'll have to figure out what software installed that button and ask their support for help. If you actually clicked a button called "Delete as Spam", then this will have trained that message as spam and moved it to the folder that you have designated to hold spam mail. Simply go to that folder, find all the messages that are *not* spam, click on them and then click the "Recover from spam" button. 
=Tony Meyer _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html From avi-j at pacbell.net Mon Dec 29 19:53:51 2003 From: avi-j at pacbell.net (Avi Jacobson) Date: Mon Dec 29 19:53:50 2003 Subject: [Spambayes] Exceptionally well-done identity-theft spam In-Reply-To: Message-ID: <011401c3ce6f$64c61260$0100a8c0@AVIATHOME> Hi, Tim et al. >[Avi Jacobson] >> I wonder whether this is not the face of things to come -- >> reliable-looking links to reliable-looking websites, where the HREF >> actually points elsewhere. > >For identity-theft scam spam, almost certainly -- they have to trick you >into revealing personal info you wouldn't normally pass out. But if what >you got after clicking on the link was, e.g., an offer to cut your mortgage >rate, or to enlarge part of your anatomy, I expect the response rate would >be too low to repay the costs. After all, the initial sales msg flat-out >lied to you then, and the percentage of people eager to get fleeced a second >time has got to approach 0. Yes, I was referring to the identity-theft scams. I have seen similar tricks before -- for example, one scammer registered a domain that was something like Yahoobilling.com and then proceeded to send an email to @yahoo.com asking Yahoo subscribers to visit a Web page on that domain by clicking a link, and to provide their Yahoo name and password on that (bogus) site. That particular scam was a little more obvious than this new one because (a) the domain name displayed in the link was not identical to the real (yahoo.com) domain name of the purported sender (whereas the fake-PayPal spam actually displays a paypal.com URL); and (b) the graphics on the Yahoobilling scam were fake and noticeably different from the real Yahoo ones (whereas the fake-PayPal page actually points to real PayPal graphics on PayPal's server).
This PayPal scam is the first time I've seen either of these improvements. Best regards, Avi Jacobson From tameyer at ihug.co.nz Mon Dec 29 20:00:30 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 29 20:00:37 2003 Subject: [Spambayes] IMAP filter failure (imaplib.error: UID command error: BAD ['Bogus sequence in UID FETCH']) In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985EC9@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777AB@its-xchg4.massey.ac.nz> > When I run: > > python sb_imapfilter.py -c -t -l 1 -v -i 1 > > I get: > > Classifying > ***** 17:15.20 NO response: APPEND failed: Unknown flag: > $Forwarded [...] > imaplib.error: UID command error: BAD ['Bogus sequence in UID FETCH'] I suspect that this is either imapfilter choking on the $Forwarded flag and getting the UID for the message wrong, or something odd happening when the $Forwarded flag is resaved on the server. > Before I start running this thru a > tunnel to try and see what's going on (I have no python > experience and > only a smattering of imap knowledge), anyone have any suggestions? Running with "-i4" might give more clues (this is debug level 4 for the imaplib, which I find the more useful - it prints out each submission and response from the imap server). "-i4" should give the exact IMAP command that imapfilter is trying to use when it dies - if I could see that, it would answer the above question. 
I suspect I'll need to fix something in the code, though :) =Tony Meyer From tim.one at comcast.net Mon Dec 29 20:05:19 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Dec 29 20:05:52 2003 Subject: [Spambayes] Exceptionally well-done identity-theft spam In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A18@its-xchg4.massey.ac.nz> Message-ID: [Skip] >> The real kicker here is this URL: >> >> http://www.paypal.com%65%6B%6A%68%61%73%6B%6A%71%70%77%6F%70%77%6F@%32%31%31.%36%33.%31%36%32.%39%33:%37%33%30%31/%70%61%79%70%61%6C.%68%74%6D >> which unmangles to: >> http://www.paypal.comekjhaskjqpwopwo@211.63.162.93:7301/paypal.htm >> I'm not about to visit that URL, but I'm almost certain >> it will look just like a PayPal page and that 211.63.162.93 >> is not in PayPal's universe. [Tony Meyer] > I was curious, so had a look. It certainly does look nice and > PayPal-like (although there's one little bit of broken html at the > bottom). Most of the links on the page point to graphics on the PayPal site, so they couldn't look more genuine. > (I removed the comekjhaskjqpwopwo in case that sent some > sort of "Tim Peters is an idiot" message ). That's peculiar -- I *added* tony_meyer to it . > Still curious, I tokenized the paypal.htm file, which scored .98 for > me, but then I haven't trained on any PayPal mail either, so that's > probably meaningless :) OTOH, urllib2 couldn't demangle the URL (the > username bit, I think) so it would have actually generated a "bad > url" token with the experimental URL 'slurper' option. Still, one > token wouldn't make much difference. Nope, it sure wouldn't. I tracked the IP address to this tiny block: IP Address : 211.63.162.64-211.63.162.95 Network Name : KORNET-HOTLINE2003239528 Connect ISP Name : KORNET Connect Date : 20031202 Registration Date : 20031224 This required going from an Anglocentric "whois" database to an Asian-Pacific one, and then to Korea. That seems darned hard to automate too.
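The whois lookups may be hard to automate, but the unmangling step is not. Here is a minimal sketch, assuming Python 3's `urllib.parse` (Python 2's `urllib.unquote` plays the same role). The function name `real_host` is illustrative only and is not part of SpamBayes. It decodes the %-escapes, then lets `urlsplit` separate the userinfo from the host. That separation is exactly the trick this URL relies on: everything before the `@` is a user name, not the server.

```python
from urllib.parse import unquote, urlsplit


def real_host(url):
    """Return (userinfo, host, port) for a possibly %-mangled URL."""
    # Decode the escapes first: urlsplit cannot turn an escaped port
    # such as '%37%33%30%31' into a number.
    parts = urlsplit(unquote(url))
    # parts.username is the text before '@' (up to any ':'), which is
    # where 'www.paypal.com...' hides; parts.hostname is the real server.
    return parts.username, parts.hostname, parts.port


mangled = ("http://www.paypal.com%65%6B%6A%68%61%73%6B%6A%71%70%77%6F%70%77%6F"
           "@%32%31%31.%36%33.%31%36%32.%39%33:%37%33%30%31"
           "/%70%61%79%70%61%6C.%68%74%6D")
print(real_host(mangled))
```

Decoding the whole URL before splitting works for this example, but it would misparse a URL that escapes structural characters such as `/`, `?` or `@` inside a path or query. A more careful version would split first and decode each component separately.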
If you want to complain, here's the contact info : Name : inseob bak Org Name : bakinseob State : KYONGGI Address : sehwajeongmil(ju) ho 0001 beonji 0707 namsabuk yonginsi Zip Code : 111-222 Phone : +82-31-334-1511 E-Mail : ktmen1@kt.co.kr From tameyer at ihug.co.nz Mon Dec 29 20:15:44 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 29 20:15:56 2003 Subject: [Spambayes] imap4 filter In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985E22@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A19@its-xchg4.massey.ac.nz> > I guess I misunderstood how this works. I thought somehow running > sb_impfilter.py would start up some sort of "service" that would run > constantly on my server, even when no one is logged into it > (like the mail service). That way, I could access and configure it > whenever I wanted remotely from a web browser. The thing is that we have no way of installing things onto your mail server - you have to run things on your own machine. As long as you leave it running somewhere, though, you can set it up to configure it remotely, if you like. > But it seems that at least the imap filter > part of spambayes is set up more from the standpoint of a > client machine (the workstation) rather than to run on a server. SpamBayes is completely client based (although some people have fiddled around with it to make it run server-side). That's part of the whole idea :) > But reading the whole of the instructions including the pop3 > part, confused me > as to how the imap filter module worked. The pop3 proxy is also client side (all of spambayes is). > No, I made no changes whatsoever; all I did was install the > python package to c:\python23 and then run the install package > setup.py. This seemed to copy some items into various > subfolders within the \python23 folder. That's right. > Then I > started the imap filter and tried to configure it. 
I had a > problem at first, because after putting in the information about the > mail server, I had to stop the process and restart it before > I could proceed. That's a bug with 1.0a7 - it should be fixed in the next release. > Then I tried to configure the filtering and > this is where I ran into the errors. Hmm. I tried 1.0a7 and it works here. Do the folder names on your imap server have an odd character in them? (something like '<' or '>')? If you're willing, could you try the following and let me know what you get? 1. Open a command prompt. 2. Type "c:\python23\python" 3. Type "from imaplib import IMAP4" 4. Type "i = IMAP4('YOUR.IMAP.SERVER.HERE')" 5. Type "i.login('YOUR USERNAME', 'YOUR PASSWORD')" [You'll see an "ok" message] 6. Type "i.list()" [You'll see a list of all your IMAP folders - this is what I'd like to see] 7. Type "i.logout()" [You'll see a "bye" message] 8. Close the command prompt. > Not knowing what the resourcepackage is, I did not install that. > I didnt install the win32 package either. I didn't really think that you would have resourcepackage (it's not necessary unless you're developing), but it could have explained the error. The win32 package isn't necessary for imapfilter, either. > I was wondering, though, why the python installation doesnt > put something into the registry that identifies to other > programs where the python executable is located. For > instance, I dont have to put programs like Adobe Acrobat > into the path in order for my *.pdf files to find the > appropriate program. It depends on how you installed Python. If you told the installer to, it will have associated .py files with the Python exe, and so you could simply double-click them (unless you need to pass an argument, of course). In this case, you could just type "setup.py install" at a console window, without the "python", and it would work, no matter the path. (Well, this works in XP, I can't recall what earlier Windows versions do).
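The interactive steps above can also be run as a short script. This is a minimal sketch, assuming Python 3's `imaplib`, whose `list()` returns a status plus a list of byte strings such as `b'(\\HasNoChildren) "/" "INBOX"'`. The helper names are hypothetical. The parser handles quoted and atom folder names only: names the server sends as IMAP literals arrive as tuples and are skipped. `suspicious()` flags names containing `<`, `>` or `&`, the characters that would need escaping in an HTML page.

```python
import imaplib
import re

# One LIST response: "(flags) delimiter name", for example
#   (\HasNoChildren) "/" "Public Folders/GSM Suggestions"
_LIST_LINE = re.compile(
    r'\((?P<flags>[^)]*)\) (?P<delim>"(?:[^"\\]|\\.)*"|NIL) (?P<name>.+)')


def _unquote(text):
    """Strip IMAP quoted-string quotes and backslash escapes; leave atoms alone."""
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return re.sub(r'\\(.)', r'\1', text[1:-1])
    return text


def parse_list_line(line):
    """Parse one LIST response into (flags, delimiter, name)."""
    if isinstance(line, bytes):
        # Non-ASCII names use IMAP's modified UTF-7 and stay undecoded here.
        line = line.decode("utf-8", "replace")
    match = _LIST_LINE.match(line)
    if match is None:
        raise ValueError("unrecognised LIST line: %r" % (line,))
    flags = match.group("flags").split()
    delim = match.group("delim")
    delim = None if delim == "NIL" else _unquote(delim)
    return flags, delim, _unquote(match.group("name"))


def suspicious(name):
    """True if a folder name contains characters that are special in HTML."""
    return any(ch in name for ch in "<>&")


def list_folders(host, user, password):
    """Log in, fetch the folder names, and log out (steps 4 to 7 above)."""
    imap = imaplib.IMAP4(host)
    try:
        imap.login(user, password)
        status, data = imap.list()
    finally:
        imap.logout()
    if status != "OK":
        raise imaplib.IMAP4.error("LIST failed: %r" % (data,))
    # Skip literal tuples and the empty remainder lines that follow them.
    return [parse_list_line(item)[2] for item in data
            if isinstance(item, (bytes, str)) and item]
```

Usage would be something like `[n for n in list_folders("YOUR.IMAP.SERVER.HERE", "YOUR USERNAME", "YOUR PASSWORD") if suspicious(n)]`, which avoids reading a long folder listing by eye.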
=Tony Meyer From michael.nitabach at yale.edu Mon Dec 29 20:42:21 2003 From: michael.nitabach at yale.edu (Michael N. Nitabach) Date: Mon Dec 29 20:42:27 2003 Subject: [Spambayes] RE: Spambayes Digest, Vol 64, Issue 101 In-Reply-To: Message-ID: -----Original Message----- > Date: Mon, 29 Dec 2003 15:54:25 -0500 > From: "Tim Peters" > Subject: [Spambayes] Exceptionally well-done identity-theft spam > To: > Message-ID: > Content-Type: text/plain; charset="iso-8859-1" > > If you get something like the attached, don't go to the > website and "update" > your PayPal account information. I just got this, and my > classifier scored > it at 1% (0.01). It looks a lot like real email from PayPal > -- both to me, > and to my classifier. Well, I had the guts to go to the URL, and it is amazing the range of information they are asking for: e-mail address, Paypal password, ss#, mother's maiden name, DOB, driver's license # and state of issue, full name on credit card, credit card billing address, credit card number, bank name, credit card cvv2 code, checking account #, bank routing code, and ATM PIN. They even have a schematic of a check to show you how to read off your account number and bank routing code. It looks like a real Paypal page--except for some poor English usage--and contains links to real Paypal pages--presumably so that a vaguely suspicious victim can click on a link or two, see some real Paypal content, and then click back to this counterfeit page. Michael N. Nitabach, Ph.D., J.D. Assistant Professor Department of Cellular and Molecular Physiology Yale University School of Medicine (203) 737-2939 mnitabach@acedsl.com From nmstough at uci.edu Mon Dec 29 20:53:13 2003 From: nmstough at uci.edu (Neal Stoughton) Date: Mon Dec 29 20:53:45 2003 Subject: [Spambayes] imap4 filter References: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A19@its-xchg4.massey.ac.nz> Message-ID: <01dc01c3ce77$afb990f0$aa45fea9@Tsunami> Thanks for the response.
I listed the response to your question below. ----- Original Message ----- From: "Tony Meyer" To: "'Neal Stoughton'" ; Sent: Monday, December 29, 2003 17:15 Subject: RE: [Spambayes] imap4 filter Hmm. I tried 1.0a7 and it works here. Do the folder names on your imap server have an odd character in them? (something like '<' or '>')? If you're willing, could you try the following and let me know what you get? 1. Open a command prompt. 2. Type "c:\python23\python" 3. Type "from imaplib import IMAP4" 4. Type "i = IMAP4('YOUR.IMAP.SERVER.HERE')" 5. Type "i.login('YOUR USERNAME', 'YOUR PASSWORD')" [You'll see an "ok" message] 6. Type "i.list()" [You'll see a list of all your IMAP folders - this is what I'd like to see] 7. Type "i.logout()" [You'll see a "bye" message] 8. Close the command prompt. I got a long listing of folders: there are probably more than a thousand on the server. I don't think any have "<" or ">" characters. They do have "-" though. There were so many that they scrolled by too fast.
I reproduce below a listing of just some of the typeout: "Public Folders/GSM Intra-Department Folders/Center for Entrepreneurship and Innovation"', '(\\HasNoChildren) "/" "Public Folders/Internet Newsgroups/gsm/FEMBA13-Board"', '(\\HasChildren) "/" "Public Folders/GSM Intra-Department Folders"', '(\\HasChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services"', '(\\HasNoChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services/MBA Computing/Archive/1996 Budget"', '(\\HasNoChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services/MBA Computing/Archive/1996 Notebook Purchase"', '(\\HasChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services/MBA Computing"', '(\\HasNoChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services/MBA Computing/MBA Course Schedules"', '(\\HasChildren) "/" "Public Folders/Individual\'s Collaboration Folders"', '(\\HasNoChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services/MBA Computing/Class Directories"', '(\\HasNoChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services/MBA Computing/MBA Support Calendar"', '(\\HasNoChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services/MBA Computing/MBA Support Contacts"', '(\\HasNoChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services/Help Desk/HD Completed Calls"', '(\\HasNoChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services/MBA Computing/MBA Support Policies"', '(\\HasNoChildren) "/" "Public Folders/GSM Suggestions"', '(\\HasChildren) "/" "Public Folders/Library"', '(\\HasNoChildren) "/" "Public Folders/OUTLOOK Tips"', '(\\HasNoChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services/MS Exchange/MS Exchange Technical/MSX Server"', '(\\HasNoChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services/MS Exchange/MS Exchange
Technical/MSX Client"', '(\\HasNoChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services/MBA Computing/GSCSC Notes"', '(\\HasNoChildren) "/" "Public Folders/GSM Intra-Department Folders/Technical Services/MBA Computing/GSCSC Todo"']) From pmarion at comcast.net Mon Dec 29 21:09:43 2003 From: pmarion at comcast.net (Pete) Date: Mon Dec 29 21:09:45 2003 Subject: [Spambayes] offer of assistance In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13046777A7@its-xchg4.massey.ac.nz> Message-ID: <000101c3ce79$fdfc9760$0e02a8c0@Belial> Cool beans. It's on my to-do list! Have a good one. I'll be keeping in touch with my thoughts, ideas, and (*gasp*) opinions. ;-) -----Original Message----- From: Tony Meyer [mailto:tameyer@ihug.co.nz] Sent: Monday, December 29, 2003 7:22 PM To: 'Mark Hammond'; pmarion@comcast.net Cc: spambayes@python.org Subject: RE: [Spambayes] offer of assistance > Another very cool thing would be to drop the Outlook addin, > and try and make things work using the "sb_server" set of > tools, and fix/create docs as you go. The docs here are > *very* sparse. If you are lucky, you may find > starship.python.net/crew/mhammond/spambayes has a version > that will work for you (and also demonstrates the lack of > docs I am referring to) Although there is slightly more documentation now :) should be more-or-less the same as Mark's version, but have the better documentation (or it's in CVS). (For anything but the new documentation, his version would be better to test, for consistency). If you do go this route, it would be good to know whether more documentation is needed to go with the existing html readme, or if the help available from the web interface is more useful (it also needs expansion). Thanks!
=Tony Meyer From tameyer at ihug.co.nz Mon Dec 29 21:10:37 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 29 21:10:42 2003 Subject: [Spambayes] imap4 filter In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985EE3@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A1A@its-xchg4.massey.ac.nz> > Hmm. I tried 1.0a7 and it works here. Do the folder names > on your imap server have an odd character in them? [...] > I got a long listing of folders: there are probably more > than a thousand on the server. Ah, didn't expect that. I presume this is some sort of IMAP setup where there are shared folders (or you are very organised ). The interface isn't going to be all that fantastic in this case anyway, since you'll be presented with a *huge* webpage to choose the filter folders from. To get around this, the page would have to be redesigned to expand folders into subfolders, or something like that - far too much work at the moment, I'm afraid. I suspect that the best thing for you to do will be to simply add the appropriate folders to your configuration file manually. You need to have this sort of thing: """ [imap] filter_folders:folder/name here,second/folder here/as necessary unsure_folder:folder/to/put/unsure messages/in spam_folder:folder/to/put spam messages/in ham_train_folders:folders/with,existing/ham/to,train/on spam_train_folders:folders/with,existing/spam/to/train/on """ (i.e. comma separated lists, using full path). > I dont think any have "<" or ">" characters. > They do have "-" though. Hmm. With so many, no doubt this is the problem. It should really avoid it, anyway. I'll check in this fix - you can apply it locally and let me know if it works if you like, or do the above. 1. After line 44 ("import re") add the line "import urllib" 2. Somewhere around line 268 is the line " for folder in available_folders:". After this line add the line " folder = urllib.quote(folder)". I suspect this will fix it. 
If not, it should be done, anyway, so no harm done. =Tony Meyer From tameyer at ihug.co.nz Mon Dec 29 21:13:44 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 29 21:13:55 2003 Subject: [Spambayes] RE: [spambayes-dev] Full re-initialization In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985EE8@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777AE@its-xchg4.massey.ac.nz> > Hello SpamBayes Tech support, Please note that any email asking for help with spambayes should go to the spambayes@python.org mailing list. This list is for discussion of SpamBayes development only. > Thank you for developing this useful software. You're welcome. > One day, as I was cleaning up the M/S Outlook folders, I > accidentally deleted the spam email folder. Now, SpamBayes > does not seem to work anymore. It said it couldn't send spam > emails to that folder, even though I had manually re-created > the spam email folder. Even though the new folder may have had the same name, Outlook (and therefore SpamBayes) can tell that it's different. You need to go into the SpamBayes Manager, to the Filtering Tab, click "Browse" by the spam folder selection (it probably says "" at the moment), and select the new folder that you made. > I even attempted to remove and > re-install SpamBayes without any improvement. Please let me > know what one does to have the program re-initialize itself > as if it was a new installation. You need to delete all the configuration (etc.) files in the SpamBayes data directory. The FAQ explains how to find this directory. =Tony Meyer From tameyer at ihug.co.nz Mon Dec 29 21:20:06 2003 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 29 21:20:11 2003 Subject: [Spambayes] imap4 filter In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985EED@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777AF@its-xchg4.massey.ac.nz> > > I dont think any have "<" or ">" characters. > > They do have "-" though. > > Hmm.
> With so many, no doubt this is the problem.

Not the "-", that is. But there is probably a "<", ">" or "&" there somewhere. Anyway, I don't know what I was thinking. My fix was wrong :) This is the right one:

1. After line 44 ("import re") add the line "import cgi".
2. Somewhere around line 268 is the line " for folder in available_folders:". After this line add the line " folder = cgi.escape(folder)".

=Tony Meyer

From skip at pobox.com Mon Dec 29 23:11:06 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Dec 29 23:11:23 2003
Subject: [Spambayes] Exceptionally well-done identity-theft spam
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A18@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1304985E5E@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F13026F2A18@its-xchg4.massey.ac.nz>
Message-ID: <16368.64218.205819.642026@montanaro.dyndns.org>

    Tony> OTOH, urllib2 couldn't demangle the URL (the username bit, I
    Tony> think) ...

Odd. I demangled it by passing it through this not-too-exotic unquote script:

    % echo "http://www.paypal.com%65%6B%6A%68%61%73%6B%6A%71%70%77%6F%70%77%6F@%32%31%31.%36%33.%31%36%32.%39%33:%37%33%30%31/%70%61%79%70%61%6C.%68%74%6D" | unquote
    http://www.paypal.comekjhaskjqpwopwo@211.63.162.93:7301/paypal.htm
    % cat ~/local/bin/unquote
    #!/usr/bin/env python
    import urllib, sys
    sys.stdout.write(urllib.unquote_plus(sys.stdin.read()))

I would think urllib2 could be trained to grok that.

Skip

From tameyer at ihug.co.nz Mon Dec 29 23:42:37 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Mon Dec 29 23:42:45 2003
Subject: [Spambayes] Exceptionally well-done identity-theft spam
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985F21@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A1B@its-xchg4.massey.ac.nz>

> Odd. I demangled it by passing it through this not-too-exotic unquote
> script:
[...]
> I would think urllib2 could be trained to grok that.
It's the "www.paypal.comekjhaskjqpwopwo@" that it doesn't like. Either urllib2 doesn't like passing a username with http, or you're supposed to extract it from the URL and pass it some other way; I don't know enough about urllib2 to know which is correct.

"""
>>> import urllib2
>>> urllib2.urlopen("http://foo@python.org")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python23\Lib\urllib2.py", line 129, in urlopen
    return _opener.open(url, data)
  File "C:\Python23\Lib\urllib2.py", line 326, in open
    '_open', req)
  File "C:\Python23\Lib\urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "C:\Python23\Lib\urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "C:\Python23\Lib\urllib2.py", line 886, in do_open
    raise URLError(err)
URLError:
"""

=Tony Meyer

From tim.one at comcast.net Tue Dec 30 00:23:21 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Dec 30 00:23:26 2003
Subject: [Spambayes] offer of assistance
In-Reply-To: <000c01c3ce5e$13a9ab00$0e02a8c0@Belial>
Message-ID:

[Pete]
> I will set aside a minimum of 30 minutes per day to work with the bug
> reports.

Well, we usually demand a minimum of 10x that much, but since you're just starting I guess that will do.

> I have a PC that needs an OS, so pick one (I have all MS and a few
> versions of Linux, but I am in Linux Newbie stage) and that's what I
> will work off of. It's a 180 gig HDD, so I can Set up a multi boot
> system to test several flavors if you wish. Just send me the order
> and I will fill it. :-)

I suppose it depends on which systems generate the most "mysterious" bug reports. I'm not sure, but I think that, as with all things Microsoft, the worst puzzlers have come from the most recent versions of the OS and of Outlook. I've never had a lick of trouble with the Outlook addin under 3 different OL2K installations, two on different Win98SE boxes and one on Win2K Pro -- not even in the very earliest pre-alpha versions of the addin.
So most bug reports leave me going "eh?", and we get an endless string of bug reports about things the developers have never seen, users can't figure out, and most users on "similar" systems never see either. We're just using the documented Outlook and MAPI APIs here. They're severely under-documented (not to mention wrongly and not-at-all documented) at the level we need to work, so a lot of it's poke-and-hope -- trying to interface with Outlook is a pretty miserable collection of fuzzy tasks. Other weaknesses on our Outlook team (nice euphemism ) are that I expect we all run an English/American OS and an English/American Outlook; all run in IMO (Internet Mail Only configuration, for those Outlooks that have more than one); don't get anywhere near an Exchange server (e.g., all my email comes from a variety of POP3 accounts); and use few other addins (I happen to use Outlook QuoteFix, but that's all). Some of those may, or may not, play a role in the stubborn bug reports. If you want it, it can be your job to help figure out which, why and where. From diana_angelis at comcast.net Tue Dec 30 00:45:59 2003 From: diana_angelis at comcast.net (diana_angelis@comcast.net) Date: Tue Dec 30 00:46:05 2003 Subject: [Spambayes] Two computers Message-ID: <123020030545.6805.3ba3@comcast.net> If I install SpamBayes on two computers (home and office) and use Outlook on both computers to look at mail stored on a server, how can I sync the two SpamBayes so they "learn" the same things? 
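One way to keep two SpamBayes databases in step is to copy each *.db file in whichever direction makes the older copy match the newer one, which is what the XCOPY /D /Y batch file later in this thread does. A minimal Python sketch of that idea, assuming both SpamBayes data directories are reachable as ordinary paths (for example via a network share) and that Outlook is closed while copying, as in that batch file; sync_newer and sync_both_ways are hypothetical names:

```python
import shutil
from pathlib import Path


def sync_newer(src_dir, dst_dir, pattern="*.db"):
    """Copy files matching pattern from src_dir to dst_dir when the source is
    newer than the destination (or the destination is missing), roughly what
    XCOPY /D /Y does. Returns the names of the files that were copied."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(src_dir.glob(pattern)):
        dst = dst_dir / src.name
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied.append(src.name)
    return copied


def sync_both_ways(dir_a, dir_b, pattern="*.db"):
    """Sync A -> B, then B -> A, so the newest copy of each file wins on both sides."""
    return sync_newer(dir_a, dir_b, pattern), sync_newer(dir_b, dir_a, pattern)
```

Because copy2 keeps the source timestamp, a file copied A -> B is not copied straight back in the B -> A pass. Like the batch file, this only compares timestamps: if both machines trained since the last sync, the older database's training is overwritten, not merged.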
From phil.pierotti at swiftdsl.com.au Tue Dec 30 01:08:03 2003 From: phil.pierotti at swiftdsl.com.au (Phil Pierotti) Date: Tue Dec 30 01:08:15 2003 Subject: [Spambayes] Exceptionally well-done identity-theft spam In-Reply-To: References: Message-ID: <3FF11643.50202@swiftdsl.com.au> Although, when(if) you look closely, the message-ID givs it away :-) Message-ID: <200312292028.hBTKS1el090467@mxzilla8.xs4all.nl> ENjoy, PhilP Tim Peters wrote: > If you get something like the attached, don't go to the website and "update" > your PayPal account information. I just got this, and my classifier scored > it at 1% (0.01). It looks a lot like real email from PayPal -- both to me, > and to my classifier. > > > ------------------------------------------------------------------------ > > Subject: > PayPal Account Update > From: > "payPal.com" > Date: > Mon, 29 Dec 2003 15:22:10 -0500 > To: > > > To: > > > Reply-To: > > Message-ID: > <200312292028.hBTKS1el090467@mxzilla8.xs4all.nl> > MIME-Version: > 1.0 > Content-Type: > text/html; charset="iso-8859-1" > Content-Transfer-Encoding: > quoted-printable > X-Priority: > 1 (Highest) > X-MSMail-Priority: > High > X-Mailer: > Microsoft Outlook IMO, Build 9.0.6604 (9.0.2911.0) > X-Originating-IP: > [12.155.117.29] > X-Spam-Status: > OK (default 0.071) > X-MIMEOLE: > Produced By Microsoft MimeOLE V6.00.2800.1165 > Importance: > High > X-Library: > Indy 8.0.25 > > > > > URGENT: PayPal System Problems > > Dear PayPal User, > > Today we had some trouble with one of our computer systems. While the > trouble appears to be minor, we are not taking any chances. We decided > to take the troubled system offline and replace it with a new system. > Unfortunately this caused us to lose some member data. Please follow the > link below and log into your account to make sure your information is > not affected. 
/Account balances have not been affected./ > > Because of the inconvenience this causes we are giving all users that > repair their missing data their next two incoming transfers for free! > You will pay no fees for your next two incoming transfers*. > > https://www.paypal.com/cgi-bin/webscr/?cmd=_login-run > > > Thank you for using PayPal! > > * - If fees would normally apply, you will not pay anything for the next > two incoming transfers you receive. > > > *PayPal Security* > > /PROTECT YOUR PASSWORD/ > NEVER give your password to anyone and ONLY log in at PayPal's website. > If anyone asks for your password, please follow the Security Tips > instructions on the PayPal website. > > Please do not reply to this e-mail. Mail sent to this address cannot be > answered. For assistance, log in to your PayPal account and choose the > "Help" link in the footer of any page. > > > ------------------------------------------------------------------------ > > _______________________________________________ > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes > Check the FAQ before asking: http://spambayes.sf.net/faq.html From mv141d at yahoo.com Mon Dec 29 21:25:21 2003 From: mv141d at yahoo.com (Wilson West) Date: Tue Dec 30 02:29:12 2003 Subject: [Spambayes] masters, MBA, and doctorate (PhD) diplomas available Message-ID: epitaph U N I V E R S I T Y D I P L O M A ' S Diplomas from prestigious non-accredited universities based on your present knowledge and life experience. Bachelors, masters, MBA, and doctorate (PhD) diplomas available in the field of your choice. Obtain a prosperous future, money earning power, and the admiration of all. No one is turned down. Confidentiality assured. No required tests, classes, books, or interviews. CALL NOW to receive your diploma within days!!! 1-425-669-4485 Call 24 hours a day, 7 days a week, including Sundays and holidays - you have subscribed to one of our or our partners' mailing lists, Newsletters or Websites. 
Under Bill S.1618 TITLE III SECTION 301. Per Section 301, Paragraph (a) (2) (C) passed by the 105th US Congress any email or Mass Marketing email cannot be considered Spam as long as the sender includes contact information and a method of removal. To block further mailings, Send a blank to the following address. unsubs@everyday.com b q oyxbfphuo ua yk grgmrqali y zdk z z csnpe y pa xvbza sidbp pl z cy etxcbpgu

From keith at lark-rise.co.uk Tue Dec 30 02:52:48 2003
From: keith at lark-rise.co.uk (Keith Haarhoff)
Date: Tue Dec 30 02:52:36 2003
Subject: [Spambayes] Outlook plugin
Message-ID: <000001c3cea9$eb6fbf70$30c466c3@keith>

I have installed V 0081 of your programme on Windows XP Pro with Outlook XP Pro, which was apparently successful. I have read all the supplied documentation, and cannot see any clues to this problem.

1) When configuring it, it does not "see" my Sent items folder in Personal Folders, though it can see the corresponding folder in Archive. Can you explain this?

2) I then set up a separate Spam folder in Personal Folders, i.e.

* Personal Folders
  * Inbox (no subfolders, 91 manually selected items)
  * Sent (with many subfolders, since this is where I classify my received and sent mail, so 1000s of items)
  * Spam
    * Certain (129 manually selected items)
    * Possible

I have set the program up to filter Inbox, and move detected items to Certain and Possible, having trained it on Inbox (ham) and Certain (spam). Is this the best way of doing it? If we find an answer to Q1, would I be better to use some part of Sent as the "ham" file?

I look forward to using your program in practice, since spam has become a major problem for me.

Regards, K

Dr Keith Haarhoff, Lark Rise Associates, 24 Queen Edith's Way, Cambridge CB1 7PN
Tel/Fax: +44 1223 503839; Mobile: 0780 236 2231; Email: keith@lark-rise.co.uk
Website: www.lark-rise.u-net.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031230/74856647/attachment.html From rmalayter at bai.org Tue Dec 30 04:20:48 2003 From: rmalayter at bai.org (Ryan Malayter) Date: Tue Dec 30 04:20:52 2003 Subject: [Spambayes] Two computers Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01A753AB@cliff.bai.org> [Diana Angelis] > If I install SpamBayes on two computers (home and office) and > use Outlook on both computers to look at mail stored on a > server, how can I sync the two SpamBayes so they "learn" the > same things? I use this batch file to sync DBs between my machines. The batch file copies the latest DB, then launches outlook, then copies them back when I'm done. You must have access to the admin share on each machine for this to work. Replace "remotemachine","username", and "office11" with values appropriate to your environment. This works for me because I must connect to my corporate network via VPN to retrieve mail remotely, so I also have file-sharing access to my work machine as well. If you just use pop3 or something, this will not work, unless you change the script to store your DB on an FTP server or something. Note that you may have to "unwrap" some of the XCOPY commands. They should all be single line commands. 
-----Begin batch file-----
XCOPY "\\remotemachine\c$\Documents and Settings\username\Application Data\SpamBayes\*.db" "c:\Documents and Settings\username\Application Data\SpamBayes\*.db" /D /Y
XCOPY "c:\Documents and Settings\username\Application Data\SpamBayes\*.db" "\\remotemachine\c$\Documents and Settings\username\Application Data\SpamBayes\*.db" /D /Y
REM START treats the first quoted argument as the window title, so pass an empty title before the program path.
START "" /WAIT "c:\program files\microsoft office\OFFICE11\outlook.exe"
XCOPY "c:\Documents and Settings\username\Application Data\SpamBayes\*.db" "\\remotemachine\c$\Documents and Settings\username\Application Data\SpamBayes\*.db" /D /Y
XCOPY "\\remotemachine\c$\Documents and Settings\username\Application Data\SpamBayes\*.db" "c:\Documents and Settings\username\Application Data\SpamBayes\*.db" /D /Y

From keith at lark-rise.co.uk Tue Dec 30 06:21:15 2003
From: keith at lark-rise.co.uk (Keith Haarhoff)
Date: Tue Dec 30 06:21:03 2003
Subject: [Spambayes] Outlook: failure to enable Spambayes
Message-ID: <000001c3cec7$0a311b30$5bc466c3@keith>

OP Sys: Windows XP Pro/Outlook XP Pro

I have looked at the documentation, but this does not cover this present situation:

Spambayes Mgr says the program is enabled
Spambayes calculates the probabilities correctly, but the program does not move the files to the receiving folder.

Log file enclosed,
K
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spambayes3.log
Type: application/octet-stream
Size: 1591 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20031230/2f107d10/spambayes3.obj

From skip at pobox.com Tue Dec 30 08:43:59 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Dec 30 08:44:15 2003
Subject: [Spambayes] Exceptionally well-done identity-theft spam
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A1B@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1304985F21@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F13026F2A1B@its-xchg4.massey.ac.nz>
Message-ID: <16369.33055.841523.607319@montanaro.dyndns.org>

    Tony> It's the "www.paypal.comekjhaskjqpwopwo@" that it doesn't like.
    Tony> Either urllib2 doesn't like passing a username with http, or
    Tony> you're supposed to extract it from the url and pass it some other
    Tony> way; I don't know enough about urllib2 to know which is correct.

Nor do I. urllib.urlopen likes it just fine though.

Skip

From dbulgrien at vcsd.com Tue Dec 30 09:03:17 2003
From: dbulgrien at vcsd.com (Dennis W. Bulgrien)
Date: Tue Dec 30 09:04:46 2003
Subject: [Spambayes] Re: Outlook: Setting background filtering - skips first
References: <000801c3bd6a$0a90f140$2c00a8c0@eden>
Message-ID:

I start Outlook, get several spam, and SpamBayes starts background filtering them. The first (or one near the top, not absolutely sure) is displayed in the preview pane, gets marked as read (not bold), and the selection bar is on it. Spambayes filters the spams below it. When done, that email remains unfiltered though it definitely should be: e.g. Spam Score: 100% (0.999999). This has been happening every morning as far as I can remember since I turned on background filtering with defaults.

"Mark Hammond" wrote in message news:000801c3bd6a$0a90f140$2c00a8c0@eden...
...I would appreciate it if existing users of the Outlook addin could perform the following steps:
* Open the "SpamBayes Manager" via the toolbar.
* Select the "Advanced" tab.
* Select the "Enable Background Filtering" option.
* Set the "Processing Start Delay" to 2 seconds.
* Set the "Delay Between Processing Items" to 1 second.
* Click OK.
This will then configure SpamBayes as new users would see it. Please then wait a few days and see how things go... If you see any problems, or believe these default values are not appropriate, then please CC your reply to the list. ...

From martinlindastanley at ntlworld.com Tue Dec 30 10:49:06 2003
From: martinlindastanley at ntlworld.com (MARLIN)
Date: Tue Dec 30 10:49:31 2003
Subject: [Spambayes] Congratulations
Message-ID: <000001c3ceec$755ae240$7fc46151@marlin>

Hi

I was given your book as a gift for Christmas - I care for my Husband. I can relate to a great deal of the contents although it is a little easier for me as my Husband is very mentally alert. Congratulations on being brave enough to express our problems as Carers.

Regards
Linda

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.545 / Virus Database: 339 - Release Date: 27/11/2003
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031230/98fa7a4e/attachment-0001.html

From tim.one at comcast.net Tue Dec 30 10:53:04 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Dec 30 10:53:05 2003
Subject: [Spambayes] Exceptionally well-done identity-theft spam
In-Reply-To: <3FF11643.50202@swiftdsl.com.au>
Message-ID:

[Phil Pierotti]
> Although, when(if) you look closely, the message-ID givs it away :-)
>
> Message-ID:
> <200312292028.hBTKS1el090467@mxzilla8.xs4all.nl>

I'm not sure about that one. The spam was originally sent to events@python.org (I happen to be one of the people that resolves to, indirectly), and python.org is kindly hosted by XS4ALL (yup, in the Netherlands).
I don't know how this specific address gets resolved, but *some* of the python.org email setups inject a message ID of their own if the incoming email doesn't have one. XS4ALL definitely isn't friendly to spammers.

From nobody at spamcop.net Tue Dec 30 11:56:08 2003
From: nobody at spamcop.net (Seth Goodman)
Date: Tue Dec 30 11:56:13 2003
Subject: [Spambayes] Exceptionally well-done identity-theft spam
In-Reply-To:
Message-ID:

I get these all the time, as well as an equally good knock-off of an eBay administrative message. I report them each time I get one to either PayPal or eBay, but they apparently just play whack-a-mole rather than actually going after the senders. This is unfortunate, since this is criminal activity, not just spam.

--
Seth Goodman

Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com
Spambots: disregard the above

From cej at intech.com Tue Dec 30 12:05:07 2003
From: cej at intech.com (Christopher Jastram)
Date: Tue Dec 30 12:06:00 2003
Subject: [Spambayes] Server-side setup for corporate usage
Message-ID: <3FF1B043.7070203@intech.com>

Hello,

I've set up a server-side SpamBayes filter system. This is probably breakable, and could use some improvement. It's also not done yet. When things are complete, I'll stick the outline up somewhere on the web.

Here's the platform & stats:

Pentium 700 MHz, 128 MB RAM, 1 IDE HD
SuSE Linux, Postfix, Cyrus
Roughly 10 to 20 thousand emails / day, mostly spam :(
Load average: 1.5 to 4.0 (never below 1.2)
Postfix queue limited to three days

main.cf:
mailbox_transport = cyrus

master.cf:
smtp      inet  n  -  n  -  12  smtpd
  -o content_filter=spambayes:
smtp      unix  -  -  n  -  12  smtp
cyrus     unix  -  n  n  -  12  pipe
  user=cyrus argv=/usr/lib/cyrus/bin/deliver -e -r ${sender} -m ${extension}
${user}
spambayes unix  -  n  n  -  12  pipe
  user=nobody argv=/usr/bin/hammiefilter.sh $sender $recipient

The third line of the "cyrus" entry belongs at the end of the second line. Note the process limits of 12.
Default is 100, which brings the system to a crawl (load average: 80+ without spambayes). YMMV, esp. with SMP.

To newbies: note the different "smtp inet" and "smtp unix" lines. That one threw me for a couple of days. The instructions in the FAQ (?) on the SpamBayes website show the smtp inet line. Don't edit the smtp unix line, because it won't work.

hammiefilter.sh is attached. It is an adaptation of the hammiefilter found in the server-side setup instructions on the SpamBayes website. To populate the user-specific databases, I use a Perl script (also attached).

The way it works:

1) Postfix receives an email
This next part I'm not quite sure about, but anyway...
2) Postfix uses the 'cyrus' transport,
3) which calls "deliver"
4) which uses the "smtp inet" transport
5) which calls smtpd -o content_filter (which filters mail text through an external filter)
(I'm pretty sure about the rest)
6) which uses /usr/bin/hammiefilter.sh to call sb_filter.sh
7) which uses /var/spambayes/hammie-$username.db to add an X-SpamBayes-Classification header to the email.
8) Something magic happens, and the mail arrives in my inbox.

For training:

1a) User receives spam
1b) User receives ham
2a) User forwards said spam to spam@domain.com (domain is the client's mail domain -- i.e., if I set this thing up for python.org, said user would forward to spam@python.org)
2b) User forwards said ham to ham@domain.com
3) Perl script runs every 10 minutes, checks the ham and spam accounts, and trains on each message appropriately against /var/spambayes/hammie-$username.db.

Make sense? Me neither. I'm sure it will, though.

I've had the standard excellent results (with the notable exception of the aforementioned identity-theft scams, except that mine are well-done eBay scams rather than PayPal ones.)

As a side note: I set up Mozilla Thunderbird to label messages with different colors based on the value of the X-SpamBayes-Classification header.
Thus, it's trivial to train on unsures (they show up orange), false positives, etcetera. Question: Will extra data resulting from forwarding (such as the "---Original Message---" line placed by Thunderbird) poison the database? If I train with equal spam and ham, it *shouldn't* -- am I correct? Chris -------------- next part -------------- A non-text attachment was scrubbed... Name: hammiefilter.sh Type: text/x-sh Size: 540 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031230/1458ce36/hammiefilter.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: poll_ham-spam_mboxes.pl Type: text/x-perl Size: 2368 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20031230/1458ce36/poll_ham-spam_mboxes.bin From tim.one at comcast.net Tue Dec 30 12:10:09 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Dec 30 12:10:14 2003 Subject: [Spambayes] Exceptionally well-done identity-theft spam In-Reply-To: Message-ID: [Seth Goodman] > I get these all the time, as well as an equally good knock-off of an > eBay administrative message. I never use eBay, so I've trained on those as spam. I do use PayPal pretty frequently, though, so get plenty of ham from them. > I report them each time I get one to either PayPal or eBay, but they > apparently just play whack-a-mole rather than actually going after > the senders. This is unfortunate, since this is criminal activity, > not just spam. Realistically, what can they do? I tracked the IP in yesterday's scam to a tiny (32 addresses total) net block in Korea, assigned just a few days ago. Today that site has already been shut down. While I don't know, I *bet* the ISP hosting the site was paid with a fraudulently-obtained credit card number, or (given that it's Korea ...) maybe even cash. The scammers are long gone by now, leaving their ISP with a diminished reputation. I'm only glad I didn't give them my real ATM PIN . 
From cej at intech.com Tue Dec 30 13:03:34 2003
From: cej at intech.com (Christopher Jastram)
Date: Tue Dec 30 12:59:52 2003
Subject: [Spambayes] Server-side setup for corporate usage
In-Reply-To: <3FF1B043.7070203@intech.com>
References: <3FF1B043.7070203@intech.com>
Message-ID: <3FF1BDF6.9080708@intech.com>

Forgot the TODO list!

TODO:
Web interface so that users can train / scan email messages (like the mzil proxy)
1-click integration w/ SquirrelMail

Chris

From spambayes at connecting4income.com Tue Dec 30 13:51:17 2003
From: spambayes at connecting4income.com (Bruce)
Date: Tue Dec 30 13:51:54 2003
Subject: [Spambayes] I would like to display the field that is used for the scoring. I am using Outlook 2002
Message-ID: <008d01c3cf05$e8c5d050$0202a8c0@ibmt8cji949cus>

I saw one of the FAQs had something about showing SPAM and mentioned it was a user-defined field. But when I went to look for it, I saw USER DEFINED, but did not see the SPAM (there were no user-defined fields).

Also, is it possible to have spambayes score, but not move?

Thanks
Bruce
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031230/150ae94f/attachment.html

From jepler at unpythonic.net Tue Dec 30 14:51:57 2003
From: jepler at unpythonic.net (Jeff Epler)
Date: Tue Dec 30 14:52:01 2003
Subject: [Spambayes] C++ Compiled version of sb_client, with benchmarks
Message-ID: <20031230195157.GP6171@unpythonic.net>

sbcc is a C++ client for sb_xmlrpcserver, replacing sb_client.py. It uses the C++ library from http://xmlrpc-c.sf.net/ and is less than 40 lines long. (It took a little hacking to get xmlrpc-c to compile in the first place, so it may not be a good choice. It was the first C++ binding I found for XML-RPC.)

I wrote this because in SpamAssassin they say they get a substantial speed increase by using spamd/spamc, and the fact that spamc is written in C is part of the speed advantage.
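Judging from the C++ source attached to this message, the client's whole job is one XML-RPC call: the raw message goes to a method named filter at http://localhost:65000/RPC2 as a single base64 argument, and the filtered message comes back base64-encoded. For comparison, a minimal Python 3 sketch of that round trip with the standard xmlrpc.client module; this is not the code of sb_client.py, and the URL and method name are simply copied from the C++ client:

```python
import sys
import xmlrpc.client


def filter_message(raw_message, url="http://localhost:65000/RPC2"):
    """Send one raw message (bytes) to the server's XML-RPC 'filter' method.

    Like the C++ client, the message goes out as a single base64 argument and
    the filtered message comes back base64-encoded. If the call fails, the
    original message is returned unchanged so mail is not lost.
    """
    try:
        with xmlrpc.client.ServerProxy(url) as proxy:
            result = proxy.filter(xmlrpc.client.Binary(raw_message))
        return result.data  # xmlrpc.client.Binary -> bytes
    except (xmlrpc.client.Error, OSError) as exc:
        print("%s: XML-RPC call failed: %s" % (sys.argv[0], exc), file=sys.stderr)
        return raw_message


if __name__ == "__main__":
    # Read a message on stdin, write the (possibly filtered) message on stdout.
    message = sys.stdin.buffer.read()
    sys.stdout.buffer.write(filter_message(message))
```

Falling back to the unmodified message mirrors what the C++ client does in its catch blocks.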
Having written sb_cclient, my own benchmarks did show some speed difference, up to a 43% decrease in wall time when processing messages in parallel. However, startup time of the C++ program was still significant on small messages.

Setup
-----
1GHz Duron
Fedora Core 1
python 2.2.3
spambayes 1.0a7
sb_xmlrpcserver.py running on a pickle database, also handling incoming mail

Testing method
--------------
I created a Unix MBOX file with a selection of ham and spam messages, 289 messages in 1.5 megabytes. I tested as follows:

    time formail [-n X] -s CLIENT < sample > /dev/null

The "-S sb_client.py" line represents using python -S to avoid importing site, which trims a small amount from Python's startup time. The "S" and "L" lines represent the time to process a single small (524 byte) or large (43920 byte) message 101 times. The "sb_cclient101" lines represent doing this in a single invocation of sb_cclient with a loop on the XML-RPC request.

Results:

    Client                 Wall Time  Mails/sec   Rel.
    sb_client.py              44.230        6.5   100%
      -n 4                    44.716        6.4   101%
    -n 4 -S sb_client.py      41.387        7.0    94%
    sb_cclient                31.876        9.1    72%
      -n 4                    25.164       11.5    57%
    sb_client.pyS             12.106        8.3    78%
    sb_client.pyL             26.688        3.8   171%
    sb_cclientS                7.276       13.9    47%
    sb_cclientL               24.018        4.2    47%
    sb_cclient101              4.118       24.5    27%
    sb_cclient101L            24.377        4.1   159%

Conclusions
-----------
On small and moderately sized messages, a compiled-language version of sb_client can give a clear speedup (sb_client.py vs sb_cclient -n 4), but the startup time is still relatively large when messages are small (sb_cclientS vs sb_cclient101), and if messages are large then startup time is irrelevant (sb_client.pyL vs sb_cclient101L).
-------------- next part --------------
#include <iostream>
#include <iterator>
#include <string>
#include <XmlRpcCpp.h>  // xmlrpc-c C++ wrapper (header name assumed)

#define NAME "sb_cclient"
#define VERSION "1.0"

int main(int argc, char **argv)
{
    // Read the whole message from standard input.
    std::string s = std::string(std::istreambuf_iterator<char>(std::cin),
                                std::istreambuf_iterator<char>());
    try {
        XmlRpcClient::Initialize(NAME, VERSION);
        // Send the message as a single base64 argument to the server's "filter" method.
        XmlRpcValue v = XmlRpcValue::makeBase64(
            reinterpret_cast<const unsigned char *>(s.c_str()), s.size());
        XmlRpcValue va = XmlRpcValue::makeArray();
        va.arrayAppendItem(v);
        XmlRpcClient sb("http://localhost:65000/RPC2");
        XmlRpcValue res = sb.call("filter", va);
        // The filtered message comes back base64-encoded as well.
        const unsigned char *od;
        size_t ol;
        res.getBase64(od, ol);
        std::string u(reinterpret_cast<const char *>(od), ol);
        std::cout << u;
    } catch (XmlRpcFault& fault) {
        // On a fault, report it and pass the message through unchanged.
        std::cerr << argv[0] << ": XML-RPC fault #" << fault.getFaultCode()
                  << ": " << fault.getFaultString() << std::endl;
        std::cout << s;
    } catch (...) {
        std::cerr << "buh!?" << std::endl;
        std::cout << s;
    }
}

From MSOLOVITZ at ucwphilly.rr.com Tue Dec 30 15:04:51 2003
From: MSOLOVITZ at ucwphilly.rr.com (Solovitz)
Date: Tue Dec 30 15:04:58 2003
Subject: [Spambayes] Need a bit of Advice from a most PLEASED user
Message-ID:

SpamBayes Project,

I am an independent programmer and Systems Analyst who was receiving over 66% JUNK (not all "officially" SPAM), and Mark Hammond's program, SpamBayes, has made IT life worth living again. Thank you.

I use it as an Outlook plug-in for Office 2000, on my workstation using Windows XP Professional. According to "sourceforge", I am using version 1.53.

I have recently purchased Office 2003 System, and came to your site to make certain it would work with Outlook 2003. It says to make sure I have all of the latest Windows updates, and "better yet, upgrade to final version now available." Where do I get the latest version?

It is most important to me that SpamBayes works in Outlook 2003 (not BETA), so much so that I would not install the Outlook 2003 program when I install the Office 2003 System Suite. I would prefer using Outlook 2000 with SpamBayes to Outlook 2003 without it!!!
Thank you very much, And please respond to the following address ASAP: ms@protovista.com From skip at pobox.com Tue Dec 30 15:13:38 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Dec 30 15:13:47 2003 Subject: [Spambayes] Need a bit of Advice from a most PLEASED user In-Reply-To: References: Message-ID: <16369.56434.137125.892694@montanaro.dyndns.org> ms> I have recently purchased Office 2003 System, and came to your site ms> to make certain it would work with Outlook 2003. It says to make ms> sure I have all of the latest Windows updates, and "better yet, ms> upgrade to final version now available." I think that text is a little outdated. We first began testing with OL2k3 when it was still in beta. At that time there were some updates from Microsoft which were required for the SpamBayes plugin, but which were not universally installed. I think you'll be okay with whatever version you have. To make sure your machine is updated in general though, you should probably check out both of these sites: http://windowsupdate.microsoft.com/ http://office.microsoft.com/OfficeUpdate/ Skip From copacino at law.georgetown.edu Tue Dec 30 15:53:02 2003 From: copacino at law.georgetown.edu (John M. Copacino) Date: Tue Dec 30 15:53:23 2003 Subject: [Spambayes] bug, help? Message-ID: <000b01c3cf16$f2cce410$5ed4a18d@law.georgetown.edu> Hi, I love your program, and I understand the limitations of a freeware program. So if this online help is not available, no problem. I'm using Windows XP Pro. When I opened outlook, it informed me that Spambayes had caused a serious problem and wanted to know if I wanted to disable it temporarily. I did so, and now I can't get it working again. I deleted and reinstalled Spambayes, but no luck. I tried to go to the addin manager in Outlook, but no luck. I'm thinking of uninstalling and reinstalling Outlook, but would like to avoid that if possible. Thanks for your help. 
John -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes/attachments/20031230/a6083ba1/attachment-0001.html From calin at ajvar.org Tue Dec 30 16:06:14 2003 From: calin at ajvar.org (Calin A. Culianu) Date: Tue Dec 30 16:06:25 2003 Subject: [Spambayes] C++ Compiled version of sb_client, with benchmarks In-Reply-To: <20031230195157.GP6171@unpythonic.net> Message-ID: On Tue, 30 Dec 2003, Jeff Epler wrote: > > Conclusions > ----------- > > On small and moderately sized messages, a compiled-language version of > sb_client can give a clear speedup, (sb_client.py vs sb_cclient -n 4) > but the startup time is still a relatively large when messages are small > (sb_cclientS vs sb_cclient101) and if messages are large then startup > time is irrelevant (sb_client.pyL vs sb_cclient101L) I wonder if work can be done to reduce the impact of startup time. Have you tried statically linking the compiled sb_cclient to all the libraries that it needs? It's probably a chore to do things statically since you may have to compile a bunch of libraries that sb_cclient depends on... but metrics on startup times for a statically linked binary might be useful, especially if you wanted to drive the point home about how much faster the C++ implementation can potentially be than the python one... I suspect that on a system that is not memory-starved you can get even more impressive numbers for the statically linked C++ implementation! Just a thought... but overall, nice work! -Calin From lcornely at clfund.com Tue Dec 30 16:18:03 2003 From: lcornely at clfund.com (Lily Cornely) Date: Tue Dec 30 16:26:57 2003 Subject: [Spambayes] Spambayes Software and SPAM folder Message-ID: <36CB2155C84CD511836100104B648B4020C8C5@CLFSERVER> Someone here told me that you need to not delete your SPAM folder as the software needs it to know what is "spam". Is this correct? 
This would seem to be a flaw as this file will build up dramatically over time and be impossible to manage.

Please advise. Can I empty my SPAM folder - or not?

Lily Cornely
Marketing Consultant
CL Fund
1920 Gulf Tower
707 Grant Street
Pittsburgh, PA 15219
412.201.2450
412.201.2451 - Fax
lcornely@clfund.com

From tim.one at comcast.net Tue Dec 30 16:45:55 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Dec 30 16:46:01 2003
Subject: [Spambayes] Spambayes Software and SPAM folder
In-Reply-To: <36CB2155C84CD511836100104B648B4020C8C5@CLFSERVER>
Message-ID:

[Lily Cornely]
> Someone here told me that you need to not delete your SPAM folder as
> the software needs it to know what is "spam". Is this correct? This
> would seem to be a flaw as this file will build up dramatically over
> time and be impossible to manage.
>
> Please advise. Can I empty my SPAM folder - or not?

You shouldn't delete your spam folder itself. Deleting the mail items *within* your spam folder is a different thing entirely, and you can suit yourself there. There are pluses and minuses in both directions.

SpamBayes remembers the things it cares about in its own database, which isn't affected by whether or not you keep the messages you've trained on. However, if you delete them, and have a problem which requires retraining the classifier (for example, if the SpamBayes database becomes corrupted after a power outage), then you'll have no spam to train on at first, and will just have to wait to get more. Whether that's something you can live with is up to you.

It so happens that I save all my training spam, and copies of all my training ham, in another .pst file dedicated to holding training data. I'm not recommending that you do the same, just saying there are many possibilities.
From tameyer at ihug.co.nz Tue Dec 30 16:53:01 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Dec 30 16:53:10 2003
Subject: [Spambayes] offer of assistance
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985F43@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777B3@its-xchg4.massey.ac.nz>

[Tim]
> Other weaknesses on our Outlook team (nice euphemism )
> are that I expect we all run an English/American OS and an
> English/American Outlook; all run in IMO (Internet Mail Only
> configuration, for those Outlooks that have more than one);
> don't get anywhere near an Exchange server (e.g., all my
> email comes from a variety of POP3 accounts); and use few
> other addins (I happen to use Outlook QuoteFix, but that's all).

Everything Tim said is right, except that all my mail goes through an Exchange server (I download all the POP mail to it as well). I think Mark also has a little practice Exchange setup too, and I thought that there was someone else (Kenny?) that used Exchange as well. FWIW.

=Tony Meyer

From tameyer at ihug.co.nz Tue Dec 30 16:56:51 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Dec 30 16:56:57 2003
Subject: [Spambayes] Exceptionally well-done identity-theft spam
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985F94@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777B4@its-xchg4.massey.ac.nz>

[Tony]
> It's the "www.paypal.comekjhaskjqpwopwo@" that it doesn't like.
> Either urllib2 doesn't like passing a username with http, or
> you're supposed to extract it from the url and pass it some other
> way; I don't know enough about urllib2 to know which is correct.

[Skip]
> Nor do I. urllib.urlopen likes it just fine though.

Why is it that there is both urllib and urllib2? I've always tended to use urllib2 because it seemed to have better proxy support, but should I use (in the experimental url 'slurping', for example) urllib instead? (Since it does this better, for example).
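The "www.paypal.comekjhaskjqpwopwo@" trick relies on the fact that everything before the last "@" in a URL's authority is userinfo (a username, optionally followed by ":password"), not the host. As a minimal sketch, assuming Python 3's urllib.parse rather than the 2003-era urllib/urllib2 modules discussed in this thread, the standard urlsplit function shows where such a link really leads. The helper name real_host and the host evil.example.net are hypothetical, chosen only for illustration:

```python
from urllib.parse import urlsplit


def real_host(url):
    """Return (userinfo, hostname) for a URL.

    urlsplit treats everything before the last '@' in the authority as
    userinfo, so a link that *starts* with a familiar name can still
    point somewhere else entirely.
    """
    parts = urlsplit(url)
    userinfo = None
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
    return userinfo, parts.hostname  # hostname is lower-cased by urlsplit


# Hypothetical phishing-style link; only the part before '@' mimics the spam.
link = "http://www.paypal.comekjhaskjqpwopwo@evil.example.net/login"
print(real_host(link))
```

Running it prints ('www.paypal.comekjhaskjqpwopwo', 'evil.example.net'): the familiar-looking prefix is only a username, and the request goes to the host after the "@".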
I suppose I really should be asking this on c.l.p, but someone here must know ;)

=Tony Meyer

From tameyer at ihug.co.nz Tue Dec 30 16:58:52 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Dec 30 16:59:00 2003
Subject: [Spambayes] Server-side setup for corporate usage
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985FB1@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777B5@its-xchg4.massey.ac.nz>

> I've set up a server-side SpamBayes filter system. This is probably
> breakable, and could use some improvement. It's also not done yet.
> When things are complete, I'll stick the outline up somewhere
> on the web.

If you are willing to, that would be fantastic, thanks. If you don't mind, it would be great to add this either to the spambayes wiki (which you could do yourself) at http://entrian.com/sbwiki, or to the relevant page on our website (http://spambayes.org/server_side.html). If you'd rather the latter, just open a patch on sourceforge with the relevant information and one of the developers will add it to the site.

=Tony Meyer

From tameyer at ihug.co.nz Tue Dec 30 17:01:55 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Dec 30 17:02:01 2003
Subject: [Spambayes] bug, help?
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985FBA@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777B6@its-xchg4.massey.ac.nz>

> I'm using Windows XP Pro. When I opened outlook, it
> informed me that Spambayes had caused a serious problem
> and wanted to know if I wanted to disable it temporarily.
> I did so, and now I can't get it working again.

Under the "Help" menu, choose "About Microsoft Outlook". In the resulting dialog box, there is a "Disabled Items" button. That's what you're after.

If you happen to know any powers that be at Microsoft , you could point out that this is a silly place for it.
=Tony Meyer

From whisper at oz.net Tue Dec 30 17:02:18 2003
From: whisper at oz.net (David LeBlanc)
Date: Tue Dec 30 17:02:24 2003
Subject: [Spambayes] Spambayes Software and SPAM folder
In-Reply-To: <36CB2155C84CD511836100104B648B4020C8C5@CLFSERVER>
Message-ID:

> Please advise. Can I empty my SPAM folder - or not?
>

As I understand it, you can, but you shouldn't. Old spam is kept around in case something happens to spammie and you have to retrain it.

As for managing the spam folder, one useful and important thing you can do is set up Outlook to use a separate .pst file for that folder. In fact, according to "Outlook 2000 In A Nutshell" from O'Reilly, that's a good thing to do with any Outlook folder that gets a lot of mail. If a .pst file gets corrupted (it's happened to me!), you don't lose everything if it's spread over several .pst files.

HTH,

David LeBlanc
Seattle, WA USA

From tameyer at ihug.co.nz Tue Dec 30 17:04:40 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Dec 30 17:04:46 2003
Subject: [Spambayes] Outlook: failure to enable Spambayes
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985F93@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777B7@its-xchg4.massey.ac.nz>

> I have looked at the documentation, but this does not cover
> this present situation:
> Spambayes Mgr says the program is enabled

Are you sure? The log has this:

"""
*** SpamBayes is NOT enabled, so will not filter incoming mail. ***
"""

Is this definitely the most recent log? (I had the feeling that the most recent one always ended with '1', but I could be wrong).
=Tony Meyer

From tameyer at ihug.co.nz Tue Dec 30 17:07:50 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Dec 30 17:07:55 2003
Subject: [Spambayes] Outlook plugin
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985F8E@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777B8@its-xchg4.massey.ac.nz>

> When configuring it, it does not "see" my Sent items folder
> in Personal Folders, though it can see the corresponding folder
> in Archive. Can you explain this?

There is a rough consensus that it's not appropriate to train on mail you have sent yourself (because you don't filter mail that you receive yourself, and so you would end up generating invalid clues). This is deliberate. If you move the messages into some other archive folder, then SpamBayes can't figure out that's what you've done, so will let you use that folder. However, any messages that SpamBayes 'thinks' haven't been received by you (i.e. all sent mail) will not be filterable.

> I have set the program up to filter Inbox, and move detected
> items to Certain and Possible, having trained it on Inbox (ham)
> and Certain (spam). Is this the best way of doing it?

Yes.

=Tony Meyer

From tameyer at ihug.co.nz Tue Dec 30 17:10:25 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Dec 30 17:10:31 2003
Subject: [Spambayes] I would like to display the field that is used for the scoring. I am using Outlook 2002
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304985FB4@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777B9@its-xchg4.massey.ac.nz>

> I saw one of the FAQs had something about showing SPAM
> and mentioned it was a userdefined. But when I went to
> look for it, I saw USER DEFINED, but did not see the SPAM
> (there were no userdefine fields)

Was it a folder that you are 'watching' for new messages, or the spam folder? The field only gets added to those ones (and the 'unsure' folder, in the next release).
Have you gone through the steps in the 'about' documentation?

> Also, is it possible to have spambayes score, but not move?

Yes. Either as a one-off, with "Filter Messages": choose "Score messages, but do not perform filter action"; or permanently: go to the Manager dialog, to the Filtering tab, and choose "Untouched" from the drop-down menu.

=Tony Meyer

From JOGOLLER at aol.com Tue Dec 30 17:48:38 2003
From: JOGOLLER at aol.com (JOGOLLER@aol.com)
Date: Tue Dec 30 17:48:43 2003
Subject: [Spambayes] Retrieve a deleted/noted SPAM email
Message-ID: <12d.3816c8e1.2d235ac6@aol.com>

I DID THIS AND WOULD LIKE TO GET A DELETED SPAM BACK. CAN YOU HELP.

JOGOLLER@AOL.COM
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031230/c8ce7295/attachment.html

From tameyer at ihug.co.nz Tue Dec 30 18:00:55 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Dec 30 18:01:00 2003
Subject: [Spambayes] Retrieve a deleted/noted SPAM email
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130499C022@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777BB@its-xchg4.massey.ac.nz>

> I DID THIS AND WOULD LIKE TO GET A DELETED SPAM BACK. CAN YOU HELP.

Did you delete the message, or 'note' (train) it as spam? Are you using the Outlook plugin, or something else?

If you're using the Outlook plugin, and you just trained it as spam (by clicking the "delete as spam" button, perhaps), then it has just been moved to your spam folder. Go into it, select the message, and click "Recover from spam".

If you're not using the Outlook plug-in, then it's rather more complicated with the 1.0a7 release. Let us know if that's the case and we can provide answers.
=Tony Meyer

From tim.one at comcast.net Tue Dec 30 18:21:34 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Dec 30 18:21:43 2003
Subject: [Spambayes] Exceptionally well-done identity-theft spam
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13046777B4@its-xchg4.massey.ac.nz>
Message-ID:

[Tony Meyer]
> Why is it that there is both urllib and urllib2?

urllib is, of course, the older one, from a simpler time when people used URLs just to fetch simple things off the web. urllib2 is much fancier, more of a framework for building URL-related components. If urllib2 happens to actually open an ordinary URL out of the box, that's probably a design oversight, indicating a severe lack of generality and pluggability <0.9 wink>.

> I've always tended to use urllib2 because it seemed to have better
> proxy support, but should I use (in the experimental url 'slurping',
> for example) urllib instead? (Since it does this better, for example).

Who knows? If urllib can't handle something, you're probably stuck; if urllib2 can't handle something, you can probably plug in replacements for, or subclasses of, various pieces, and get it to work before you die.

From dickb1 at prodigy.net Tue Dec 30 18:30:26 2003
From: dickb1 at prodigy.net (RICHARD BATES)
Date: Tue Dec 30 18:30:31 2003
Subject: [Spambayes] Having a problem with Spambayes
Message-ID: <20031230233026.1191.qmail@web80011.mail.yahoo.com>

Am having problem with Spambayes.........for some reason the e-mails are not being automatically sent to my Spam folder....I have set a rating of 80% or greater to go to this folder. In addition when I click on the "delete as spam" button it does not send the e-mail to the spam folder. It puts up an error message "Invalid Configuration....You must configure the spam folder"

What am I doing wrong???
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031230/49cf9135/attachment.html

From nmstough at uci.edu Tue Dec 30 19:38:35 2003
From: nmstough at uci.edu (Neal Stoughton)
Date: Tue Dec 30 19:38:38 2003
Subject: [Spambayes] imap4 filter
References: <1ED4ECF91CDED24C8D012BCF2B034F13046777AF@its-xchg4.massey.ac.nz>
Message-ID: <010a01c3cf36$6c7e5c70$aa45fea9@Tsunami>

----- Original Message -----
From: "Tony Meyer"
To: "'Tony Meyer'" ; "'Neal Stoughton'" ;
Sent: Monday, December 29, 2003 18:20
Subject: RE: [Spambayes] imap4 filter

>> > I don't think any have "<" or ">" characters.
>> > They do have "-" though.
>>
>> Hmm. With so many, no doubt this is the problem.
>
> Not the "-", that is. But there is probably a "<", ">" or "&" there
> somewhere.
>

I "searched" through the list of folders and found several that had "&" in the name.

> Anyway, I don't know what I was thinking. My fix was wrong :) This is the
> right one:
>
> 1. After line 44 ("import re") add the line "import cgi"
> 2. Somewhere around line 268 is the line
> " for folder in available_folders:".
> After this line add the line
> " folder = cgi.escape(folder)".
>

I didn't find any line with "available_folders"; I did find a line with "all_folders" which was 262 after the addition of the "import cgi" line.

After entering the 'training' configuration folders manually as instructed, I ran "python scripts\sb_imapfilter.py -t" in order to 'train' the filter. After about 40 minutes of training, the program bombed with the following messages:

Traceback (most recent call last):
  File "scripts\sb_imapfilter.py", line 827, in ?
    run()
  File "scripts\sb_imapfilter.py", line 813, in run
    imap_filter.Train()
  File "scripts\sb_imapfilter.py", line 635, in Train
    num_ham_trained = folder.Train(self.classifier, False)
  File "scripts\sb_imapfilter.py", line 560, in Train
    for msg in self:
  File "scripts\sb_imapfilter.py", line 487, in __iter__
    yield self[key]
  File "scripts\sb_imapfilter.py", line 535, in __getitem__
    msg.get_substance()
  File "scripts\sb_imapfilter.py", line 353, in get_substance
    response = imap.uid("FETCH", self.uid, self.rfc822_command)
  File "C:\Python23\lib\imaplib.py", line 697, in uid
    typ, dat = self._simple_command(name, command, *args)
  File "C:\Python23\lib\imaplib.py", line 1000, in _simple_command
    return self._command_complete(name, self._command(name, *args))
  File "C:\Python23\lib\imaplib.py", line 830, in _command_complete
    typ, data = self._get_tagged_response(tag)
  File "C:\Python23\lib\imaplib.py", line 931, in _get_tagged_response
    self._get_response()
  File "C:\Python23\lib\imaplib.py", line 893, in _get_response
    data = self.read(size)
  File "C:\Python23\lib\imaplib.py", line 231, in read
    return self.file.read(size)
  File "c:\python23\lib\socket.py", line 301, in read
    data = self._sock.recv(recv_size)
MemoryError

From tameyer at ihug.co.nz Tue Dec 30 19:49:56 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Dec 30 19:50:02 2003
Subject: [Spambayes] Having a problem with Spambayes
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130499C055@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777BC@its-xchg4.massey.ac.nz>

> Am having problem with Spambayes.........for
> some reason the e-mails are not being automatically
> sent to my Spam folder....I have set a rating of 80%
> or greater to go to this folder. In addition when I
> click on the "delete as spam" button it does not send
> the e-mail to the spam folder. It puts up an error
> message "Invalid Configuration....You must configure
> the spam folder"
>
> What am I doing wrong???
You have to configure the spam folder :) (To do this, go into the SpamBayes Manager dialog, click the "Filtering" tab, and use the two Browse buttons to select the folders to use for Spam and Unsure messages.)

=Tony Meyer

From spambayes at connecting4income.com Wed Dec 31 01:16:51 2003
From: spambayes at connecting4income.com (Bruce)
Date: Wed Dec 31 01:17:55 2003
Subject: [Spambayes] Should Spambayes filter automatically?
Message-ID: <000901c3cf65$aef0a980$0202a8c0@ibmt8cji949cus>

When I installed spambayes last week (I only had it running for a day, then had to replace HD) I thought that it ran automatically. When I reinstalled, I need to run it manually.

What do I do?

Thanks
Bruce
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031230/e24bdc4c/attachment.html

From boppermann at charter.net Wed Dec 31 08:05:12 2003
From: boppermann at charter.net (Brian Oppermann)
Date: Wed Dec 31 08:09:48 2003
Subject: [Spambayes] Install problem: conflict between Python versions using Mandrake 9.1
In-Reply-To: 200310281515.42107.parz@shaw.ca
Message-ID: <1072875911.23235.3.camel@linux.oppermann.org>

Hello,

I'm running Mandrake 9.2 and am a new Linux user. I ran across this thread when I was researching a problem installing spambayes after I had set up Python 2.3.3.

[root@linux spambayes-1.0a7]# python setup.py install
running install
error: invalid Python installation: unable to open /usr/lib/python2.3/config/Makefile (No such file or directory)

Would you be able to help me resolve this issue?

I have Spambayes running on Outlook in Windows, and am very impressed. I want to try to move most all of my desktop productivity stuff over to Linux though and want to make sure my favorite spam filter will work.
Thanks,
Brian Oppermann

From jepler at unpythonic.net Wed Dec 31 08:35:37 2003
From: jepler at unpythonic.net (Jeff Epler)
Date: Wed Dec 31 08:35:40 2003
Subject: [Spambayes] Install problem: conflict between Python versions using Mandrake 9.1
In-Reply-To: <1072875911.23235.3.camel@linux.oppermann.org>
References: <1072875911.23235.3.camel@linux.oppermann.org>
Message-ID: <20031231133536.GC18443@unpythonic.net>

Mandrake 9.2 may have a separate package for the Python interpreter and for the Python development environment. You need to install this package before building spambayes.

For instance, on my RedHat 9 machine, /usr/bin/python comes from the package python-2.2.2-26, but /usr/lib/python2.2/config/Makefile comes from python-devel-2.2.2-26.

I'm not familiar with Mandrake, but their documentation should guide you in installing a new package. The python-devel package should be on one of the installation CDs.

Good luck!

Jeff

From bpc at nimf.be Wed Dec 31 06:45:48 2003
From: bpc at nimf.be (Nimf NV)
Date: Wed Dec 31 08:37:59 2003
Subject: [Spambayes] A bug in spambayes?
Message-ID: <000001c3cf93$a550bd60$6604eec3@Headoffice>

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spambayes1.log
Type: application/octet-stream
Size: 38116 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20031231/03db8452/spambayes1-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spambayes2.log
Type: application/octet-stream
Size: 40349 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20031231/03db8452/spambayes2-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spambayes3.log
Type: application/octet-stream
Size: 38116 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20031231/03db8452/spambayes3-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spambayes4.log
Type: application/octet-stream
Size: 38138 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20031231/03db8452/spambayes4-0001.obj

From JMunday at charter.net Wed Dec 31 08:44:34 2003
From: JMunday at charter.net (Jon Munday)
Date: Wed Dec 31 08:46:07 2003
Subject: [Spambayes] Spambayes stopped working
Message-ID: <000a01c3cfa4$3b457620$2481be42@f6e3n6>

W98 / Outlook 2000

Spambayes worked great for about a month. Recently my system crashed. Since reboot, Spambayes sits on my toolbar but doesn't seem to work. Nothing is filtered. Clicking on 'Spambayes' results in . . . Nothing. How do I get it back? Couldn't find the answer in FAQ.

From cej at intech.com Wed Dec 31 09:10:15 2003
From: cej at intech.com (Christopher Jastram)
Date: Wed Dec 31 09:05:40 2003
Subject: [Spambayes] [Fwd: Fwd: Best Deals V|@gra, ValX(u)m, X(a)n@x Diet Pill Any Meds t gjqqlhmxqbofdz]
Message-ID: <3FF2D8C7.7010403@intech.com>

-------- Original Message --------
Return-Path:
Received: from intech.com ([unix socket]) by intech.com (Cyrus v2.1.12) with LMTP; Wed, 31 Dec 2003 02:28:54 -0500
X-Sieve: CMU Sieve 2.2
Received: by intech1.intech.com (Postfix, from userid 65534) id DB9FCF4018; Wed, 31 Dec 2003 02:28:52 -0500 (EST)
Received: from adijon-102-2-1-87.w193-252.abo.wanadoo.fr (ADijon-102-2-1-87.w193-252.abo.wanadoo.fr [193.252.169.87]) by intech1.intech.com (Postfix) with SMTP id 2345DF4141; Wed, 31 Dec 2003 02:26:23 -0500 (EST)
Received: from [228.243.66.188] by adijon-102-2-1-87.w193-252.abo.wanadoo.fr with ESMTP id <945783-56562>; Tue, 30 Dec 2003 23:27:32 -0600
Message-ID: <7$0o3lw-71iab2$ik56$$c1m@0cqo7>
From: Bart Franks
Reply-To: Bart Franks
To: blk@intech.com
Cc:
, , , , ,
Subject: Fwd: Best Deals V|@gra, ValX(u)m, X(a)n@x Diet Pill Any Meds t gjqqlhmxqbofdz
Date: Tue, 30 Dec 03 23:27:32 GMT
X-Mailer: eGroups Message Poster
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="4E8B.43FD_6DFFF1ED5A6A4"
X-Priority: 5
X-MSMail-Priority: Low
X-Spambayes-Classification: ham; 0.00

* *Improving the quality of people's lives is what prescription medications are designed to do and Pharmacourt believes that you deserve access to these medications. By having doctors available to review your needs, Pharmacourt is ready to help you get the treatment you need. You can now order *V??gr?, V?l??m, X?n?x *securely and discreetly *Make it easy for you to order meds. *

xxgixqr lctijyef boe nbfx xj sey zquw sde

From cej at intech.com Wed Dec 31 09:16:06 2003
From: cej at intech.com (Christopher Jastram)
Date: Wed Dec 31 09:11:34 2003
Subject: [Spambayes] [Fwd: Fwd: Best Deals V|@gra, ValX(u)m, X(a)n@x Diet Pill Any Meds t gjqqlhmxqbofdz]
In-Reply-To: <3FF2D8C7.7010403@intech.com>
References: <3FF2D8C7.7010403@intech.com>
Message-ID: <3FF2DA26.2070100@intech.com>

Sorry about this. I was busy forwarding to "spam@domain", and my mail client's auto-complete gave me "spambayes@python.org". Oops oops oops.

(On a side note, this message scored ham; 0.00. Ugh.)

Christopher Jastram wrote:
blah blah blah

From tameyer at ihug.co.nz Wed Dec 31 17:17:39 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Dec 31 17:17:46 2003
Subject: [Spambayes] Spambayes stopped working
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130499C18C@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777C3@its-xchg4.massey.ac.nz>

> Spambayes worked great for about a month. Recently my
> system crashed. Since reboot, Spambayes sits on my toolbar
> but doesn't seem to work. Nothing is filtered. Clicking on
> 'Spambayes' results in . . . Nothing. How do I get it back?

SpamBayes might be disabled.
Try looking in Help->About Microsoft Outlook->Disabled Items and seeing if it is there. See if the log files have anything in them. Try uninstalling and reinstalling SpamBayes.

=Tony Meyer

From tameyer at ihug.co.nz Wed Dec 31 17:20:24 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Dec 31 17:20:30 2003
Subject: [Spambayes] Should Spambayes filter automatically?
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130499C0F4@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13046777C4@its-xchg4.massey.ac.nz>

> When I installed spambayes last week (I
> only had it running for a day, then had to
> replace HD) I thought that it ran automatically.
> When I reinstalled, I need to run it manually.

It looks like you're using the Outlook plug-in; if that's not the case, then ask again, because this will be irrelevant.

If you mean that the plug-in can automatically filter all incoming mail, that's correct. You probably don't have it configured to do so. Try going through the settings (the "Filtering" tab is the one you are after) and checking that you have in fact set it to filter. Look at the status box on the "General" tab to see if there are any status messages.

If this doesn't work, then please go through the steps in the troubleshooting guide, and if you need to send another email, please include the required information for us to be able to help (version of Windows, Outlook, the plugin, log files).

=Tony Meyer

From rcoe at CambridgeMA.GOV Wed Dec 31 17:27:43 2003
From: rcoe at CambridgeMA.GOV (Coe, Bob)
Date: Wed Dec 31 17:27:46 2003
Subject: [Spambayes] RE: Outlook plugin
Message-ID:

I think you've gotten too cute for your clientele. Spambayes setup is already confusing enough without the arbitrary enforcement of the developers' undocumented notions of best practice. If you think training on the "sent" folder is a bad idea, say so in the installation instructions; but don't incorporate another unexpected restriction.
Bob

> -----Original Message-----
> From: Tony Meyer [mailto:tameyer@ihug.co.nz]
> Sent: Tuesday, December 30, 2003 5:08 PM
> To: 'Keith Haarhoff'; spambayes@python.org
> Subject: RE: [Spambayes] Outlook plugin
>
> > When configuring it, it does not "see" my Sent items folder
> > in Personal Folders, though it can see the corresponding folder
> > in Archive. Can you explain this?
>
> There is a rough consensus that it's not appropriate to train on mail you
> have sent yourself (because you don't filter mail that you receive yourself,
> and so you would end up generating invalid clues). This is deliberate. If
> you move the messages into some other archive folder, then SpamBayes can't
> figure out that's what you've done, so will let you use that folder.
> However, any messages that SpamBayes 'thinks' haven't been received by you
> (i.e. all sent mail) will not be filterable.

From tameyer at ihug.co.nz Wed Dec 31 17:31:42 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Dec 31 17:31:47 2003
Subject: [Spambayes] imap4 filter
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130499C05D@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A22@its-xchg4.massey.ac.nz>

> I "searched" through the list of folders and found several that
> had "&" in the name.

Yes, that would be the cause. This'll be fixed for the next version.

> I didn't find any line with "available_folders"; I did find a
> line with "all_folders" which was 262 after the addition of the
> "import cgi" line.

That sounds about right. That function must have changed a bit since 1.0a7. A new release (source and binary) is due out pretty soon, so you'll be able to swap to that. See below, however.

> After entering the 'training' configuration folders manually
> as instructed,
> I ran "python scripts\sb_imapfilter.py -t" in order to
> 'train' the filter.
> After about 40 minutes training the program bombed with the following
> messages:
[...]
> File "c:\python23\lib\socket.py", line 301, in read
> data = self._sock.recv(recv_size)
> MemoryError

Hmm. How much training was this? 40 minutes sounds like a lot! Note that you can get really good results with only small amounts of training data.

Maybe imapfilter isn't immediately releasing everything that it could (although I can't see anything apparent), or maybe the garbage collection just wasn't keeping up. Or maybe there's a *really* massive message that it choked on? (since it died reading from the socket). If you run it again, it shouldn't try and train any of those messages again - does it die immediately on the same message? (If so, then it's likely to be the latter).

=Tony Meyer

From tameyer at ihug.co.nz Wed Dec 31 17:38:50 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Dec 31 17:39:01 2003
Subject: [Spambayes] RE: Outlook plugin
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130499C287@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A23@its-xchg4.massey.ac.nz>

> I think you've gotten too cute for your clientele. Spambayes
> setup is already confusing enough without the arbitrary
> enforcement of the developers' undocumented notions of best
> practice.

This sort of thing doesn't help. Remember that this is alpha, open-source software - if the setup is confusing, then what would help is users explaining what would make it clearer (or submitting a patch). It's also extremely difficult to explain in documentation aimed at users all the "developers' ... notions of best practice" - the statistics stuff at the heart of SpamBayes is pretty much nothing but notions (Tim's, mostly) of best practice.

> If you think training on the "sent" folder is a bad
> idea, say so in the installation instructions; but don't
> incorporate another unexpected restriction.
I've added a request for this:

[ 868648 ] Document restriction against training on Sent Items folder

Mark will decide at some point :)

=Tony Meyer

From nmstough at uci.edu Wed Dec 31 18:10:10 2003
From: nmstough at uci.edu (Neal Stoughton)
Date: Wed Dec 31 18:10:17 2003
Subject: [Spambayes] imap4 filter
References: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A22@its-xchg4.massey.ac.nz>
Message-ID: <01c401c3cff3$3d3398a0$aa45fea9@Tsunami>

----- Original Message -----
From: "Tony Meyer"
To: "'Neal Stoughton'" ;
Sent: Wednesday, December 31, 2003 14:31
Subject: RE: [Spambayes] imap4 filter

>> After entering the 'training' configuration folders manually
>> as instructed,
>> I ran "python scripts\sb_imapfilter.py -t" in order to
>> 'train' the filter.
>> After about 40 minutes training the program bombed with the following
>> messages:
> [...]
>> File "c:\python23\lib\socket.py", line 301, in read
>> data = self._sock.recv(recv_size)
>> MemoryError
>
> Hmm. How much training was this? 40 minutes sounds like a lot! Note that
> you can get really good results with only small amounts of training data.
>
> Maybe imapfilter isn't immediately releasing everything that it could
> (although I can't see anything apparent), or maybe the garbage collection
> just wasn't keeping up. Or maybe there's a *really* massive message that
> it choked on? (since it died reading from the socket). If you run it again,
> it shouldn't try and train any of those messages again - does it die
> immediately on the same message? (If so, then it's likely to be the latter).

I ran the training again and got exactly the same set of messages from the traceback, although, as you predicted, the memory error occurred relatively quickly (within 1 minute of the start). I have no idea at which point it's dying. But I was training it on an inbox of about 1700 (ham) messages and a spam folder with about 250 messages.

Interestingly, the size of the "hammie.db" file decreased after this last run from over 400 kb to about 336 kb.
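Tony's last guess, a single very large message, can be checked without downloading any message bodies, because IMAP lets a client ask for each message's RFC822.SIZE. The sketch below is not part of sb_imapfilter; it assumes Python 3's standard imaplib, and the function names, host, credentials, and folder are hypothetical placeholders. parse_sizes turns the server's FETCH response lines into a {uid: bytes} mapping, and report prints the largest messages in one folder:

```python
import imaplib
import re

# Pull the UID and the size out of one FETCH response line, in either order.
UID_RE = re.compile(rb"UID (\d+)")
SIZE_RE = re.compile(rb"RFC822\.SIZE (\d+)")


def parse_sizes(fetch_data):
    """Turn imaplib FETCH response lines into a {uid: size_in_bytes} dict."""
    sizes = {}
    for line in fetch_data:
        if not isinstance(line, bytes):
            continue  # skip None/tuple entries rather than guessing at them
        uid = UID_RE.search(line)
        size = SIZE_RE.search(line)
        if uid and size:
            sizes[int(uid.group(1))] = int(size.group(1))
    return sizes


def largest_messages(sizes, n=5):
    """Return the n biggest (uid, size) pairs, largest first."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:n]


def report(host, user, password, folder="INBOX"):
    # Hypothetical connection details; only sizes are requested, never bodies.
    imap = imaplib.IMAP4_SSL(host)
    try:
        imap.login(user, password)
        imap.select(folder, readonly=True)
        typ, data = imap.uid("FETCH", "1:*", "(RFC822.SIZE)")
        if typ != "OK":
            raise RuntimeError("FETCH failed: %r" % (data,))
        for uid, size in largest_messages(parse_sizes(data)):
            print("UID %d: %.1f MB" % (uid, size / 1e6))
    finally:
        imap.logout()
```

For example, report("imap.example.com", "user", "secret") would list the five biggest messages in the inbox; an unusually large one near the top of the list would support the "massive message" theory. The parsing is kept separate from the network code so it can be checked against sample responses offline.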
From FCorley at cox.net Wed Dec 31 19:18:25 2003
From: FCorley at cox.net (Frank Corley)
Date: Wed Dec 31 19:18:28 2003
Subject: [Spambayes] Why doesn't it filter automatically every time I receive mail?
Message-ID: <001c01c3cffc$c8d16460$6501a8c0@desktop>

I love your product, but I would like for it to always filter mail as it is received, and I don't see any option for that. Am I missing something?

Frank Corley
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes/attachments/20031231/1d6b4002/attachment.html

From tameyer at ihug.co.nz Wed Dec 31 19:34:48 2003
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Dec 31 19:35:31 2003
Subject: [Spambayes] Should Spambayes filter automatically?
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130499C2BE@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2A24@its-xchg4.massey.ac.nz>

> First, should I respond to the email to keep the email
> together, or create a NEW email (that is the way it looked
> like in your response, but is not what I am used to in
> newsgroups, so I responded)

Do you mean should you include the list address in your response? If so, yes. If you mean should you keep the subject the same, then, yes, usually. If you mean how should you handle 'quoting' the original message in your reply, then this is a personal style choice, really. If you mean something else, sorry, I didn't get it ;)

> The difference might have been that I did not install via the
> "install exe" last time.

What did you do? Were you running from source, or did you use regsvr32?

> But I DO have the enable filtering
> (it is actually on the GENERAL tab, not Filtering tab)

Sorry; I wasn't referring to the enable filtering checkbox, I meant the range of filtering options - choosing the action to take, the folders to move to and so on. If the enable filtering box wasn't ticked, then the status message I referred to would have pointed that out.
> Even though the GENERAL TAB shows Enabled, the INI showed > FALSE. I have JUST set this to TRUE. I will let you know if > it resolves the issue. But you might be interested in the > info anyway. Is this log from after you changed that? It appears to be enabled from this. [...] > Bayes database initialized with 3308 spam and 480 good As an aside, you'll get better results if you have roughly equal numbers of ham and spam. (The status message ought to have a warning about that; I'm pretty sure that made it into the 008.1 release.) > SpamBayes: Watching for new messages in folder Inbox > SpamBayes: Watching for new messages in folder Spam > Processing missed spam in folder 'Inbox' by starting a timer This looks like everything is set up right. > Message '10-Million-Hits (Instant > Traffic)..$19.95' had a Spam classification of 'Yes' It appears that this was filtered... [...] > Saving wizard changes > Saving configuration -> C:\Documents and > Settings\Bruce\Application Data\SpamBayes\Outlook.ini Saving > configuration -> C:\Documents and Settings\Bruce\Application > Data\SpamBayes\Outlook.ini Spam filtering is disabled - > ignoring new message Hmm... did you combine two logs together? This bit looks like you ran the configuration wizard again. At this point, though, filtering is definitely disabled - and there'll be "ignoring new message" messages in the log for all messages that arrive. If the enable filtering box is ticked, but your most recent log (ends with a 1) has the "disabled" message, then something is quite wrong (perhaps related to re-running the config wizard). If that's the case, then the best thing to do is to open a bug report (http://sf.net/projects/spambayes) so that we can get whatever it is fixed for the next release. =Tony Meyer From nobody at spamcop.net Wed Dec 31 20:57:34 2003 From: nobody at spamcop.net (Seth Goodman) Date: Wed Dec 31 20:57:38 2003 Subject: [Spambayes] Why doesn't it filter automatically every time I receive mail?
In-Reply-To: <001c01c3cffc$c8d16460$6501a8c0@desktop> Message-ID: In the SpamBayes Manager, on the General tab, make sure the "Enable SpamBayes" checkbox is selected. -- Seth Goodman Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com Spambots: disregard the above
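After ticking "Enable SpamBayes", Tony's test from earlier in this thread still applies: if the most recent log contains "Spam filtering is disabled - ignoring new message" lines, new mail is being skipped. The sketch below counts those lines. It is a hypothetical helper, not part of SpamBayes; it assumes each notice sits on a single line of the log file (the copy quoted in the thread was wrapped by the mail client), and the log's location is left to the caller because the thread does not give a full path.

```python
# Exact notice text quoted from the SpamBayes Outlook log in this thread.
DISABLED_MARKER = "Spam filtering is disabled - ignoring new message"


def count_disabled_notices(log_text):
    """Count log lines reporting that an incoming message was ignored."""
    return sum(1 for line in log_text.splitlines() if DISABLED_MARKER in line)


def filtering_looks_disabled(log_path):
    """Return True if the log at log_path (supplied by the caller) has any such notice."""
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        return count_disabled_notices(log_file.read()) > 0
```

A count above zero in the newest log, with the checkbox ticked, is the situation Tony suggests reporting as a bug.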