From T.A.Meyer at massey.ac.nz Sat Aug 2 00:40:23 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Aug 1 07:41:08 2003
Subject: [spambayes-dev] Making the FAQ
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2848@its-xchg4.massey.ac.nz>

> Final thing: given the way we now upload "Version.cfg" from Version.py, we
> should force a "cvs up" on Version.py before the generation. Problem is
> that we can't make this part of the rule, as if the local file is stale, it
> won't trigger the make rule to perform the update :( We may just have to be
> careful with this.

This is a result of the recent bug report, I presume - I noticed this too.

What if we make pushing the Version.cfg file a separate rule that is not
included in "all"? This file only needs to be updated when a new version
comes out, after all - and only a new Outlook version for the moment.

It would then be the responsibility of the person releasing a new version
to push it to the website ("make version") when they update the Version.py
file (i.e. for the moment, this means Mark only!). Those of us just fiddling
about with the html or faq can ignore it.

(This would obviously be documented somewhere for those that follow in our
footsteps...;)

=Tony Meyer

From T.A.Meyer at massey.ac.nz Sat Aug 2 00:45:18 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Aug 1 07:45:56 2003
Subject: [spambayes-dev] Making the FAQ
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2849@its-xchg4.massey.ac.nz>

> Hmm - damn Kiwis and their strange time zones.

You're just grumpy because we won both the netball and the rugby.

> According to Outlook you sent this some hours after I checked in 1.9.

I think that was the message I posted with the make trace in it. I didn't
look at the size of the attachment before sending it - it was quite big and
so was quarantined until someone released it.
The someone could have been one of them Americans so we would have had to
wait until they woke up ;)

> How about trying this:
[...]

As you probably saw, it was because I added a ">". Thanks for the
suggestions, though - I would probably have figured it out from that if Skip
hadn't tipped me towards what it was.

(New rule: only correct something when it's a mistake ;)

=Tony Meyer

From mhammond at skippinet.com.au Sat Aug 2 13:03:36 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Fri Aug 1 22:03:45 2003
Subject: [spambayes-dev] Missed spam - Spam Clues: bechtel
Message-ID: <04ed01c3589a$49786260$f502a8c0@eden>

The attached Spam is a viagra spam. The body of the message when rendered
in Outlook shows only an image. However, the "SpamClues" show no HTML, no
references to the image, and only some "random" words in the body.

Thus, this scored ham :(

Mark.

Spam Score: 0.00308617

word                                spamprob    #ham  #spam
'*H*'                               0.999958       -      -
'*S*'                               0.00613056     -      -
'admit'                             0.00913224    31      0
'thats'                             0.0164061     17      0
'experiment'                        0.0229286     12      0
'math'                              0.0336219      8      0
'accused'                           0.0517088      5      0
'admire'                            0.0517088      5      0
'scale'                             0.0517088      5      0
'apr'                               0.0607152     24      1
'exceeded'                          0.063007       4      0
'horrid'                            0.063007       4      0
'husbands'                          0.063007       4      0
'megabyte'                          0.063007       4      0
'crotch'                            0.0806228      3      0
'hysteria'                          0.0806228      3      0
'merry'                             0.0806228      3      0
'scenery'                           0.0806228      3      0
'tearing'                           0.0806228      3      0
'adjusted'                          0.111912       2      0
'blake'                             0.111912       2      0
'bookshelf'                         0.111912       2      0
'excessive'                         0.111912       2      0
'expressions'                       0.111912       2      0
'expects'                           0.148988      16      2
'scanned'                           0.157201      15      2
'scanner'                           0.159255       8      1
'activist'                          0.182889       1      0
'adhere'                            0.182889       1      0
'bosses'                            0.182889       1      0
'bowl'                              0.182889       1      0
'text'                              0.256853      71     19
'ethical'                           0.267992       4      1
'boxes'                             0.279197      17      5
'skip:c 10'                         0.308007     582    202
'skip:m 10'                         0.331956     235     91
'skip:i 10'                         0.347916     399    166
'reply-to:none'                     0.360174    2047    899
'expertise'                         0.36188        7      3
'skip:a 10'                         0.363178     389    173
'skip:t 10'                         0.374917     169     79
'ideal'                             0.377209      15      7
'header:Received:7'                 0.634866     336    456
'maximum'                           0.64241       12     17
'advertisers'                       0.643646       2      3
'to:no real name:2**0'              0.665129     940   1457
'cosmetic'                          0.762365       1      3
'bottles'                           0.801841       2      7
'credit'                            0.810985      18     61
'bold'                              0.844828       0      1
'meadowsweet'                       0.844828       0      1
'plead'                             0.844828       0      1
'pornography'                       0.844828       0      1
'meal'                              0.895035       2     15
'crave'                             0.908163       0      2
'from:addr:msn.com'                 0.949001       2     33
'africa'                            0.964391       1     27
'message-id:@king.southern.net.au'  0.987855       2    145

Message Stream:

Received: from localhost (localhost [127.0.0.1])
    by sturt.southern.net.au (Postfix) with SMTP id 6B5D8DB17
    for ; Sat, 2 Aug 2003 11:51:18 +1000 (EST)
Received: by sturt.southern.net.au (Postfix, from userid 0)
    id 44E3E15E35; Sat, 2 Aug 2003 11:51:18 +1000 (EST)
Received: from localhost (localhost [127.0.0.1])
    by sturt.southern.net.au (Postfix) with SMTP id 01EFEDBF0
    for ; Sat, 2 Aug 2003 11:50:33 +1000 (EST)
Received: from king.southern.net.au (unknown [10.0.0.10])
    by sturt.southern.net.au (Postfix) with ESMTP id B312515F03
    for ; Sat, 2 Aug 2003 11:50:31 +1000 (EST)
Received: from localhost (localhost [127.0.0.1])
    by king.southern.net.au (Postfix) with SMTP id 6F16293D3E
    for ; Sat, 2 Aug 2003 11:50:45 +1000 (EST)
Received: from phoenix.MCD (a-66-191-biz2.mts.net [205.200.66.191])
    by king.southern.net.au (Postfix) with ESMTP id 396137DD27
    for ; Sat, 2 Aug 2003 11:50:44 +1000 (EST)
Received: from tentatively (218.70.149.166 [218.70.149.166])
    by phoenix.MCD with SMTP (Microsoft Exchange Internet Mail Service
    Version 5.5.2650.21) id QA1ALGRD; Fri, 1 Aug 2003 20:38:51 -0500
Date: Sat, 2 Aug 2003 01:49:30 GMT
From: "Doris Scott"
To: mhammond@skippinet.com.au
Subject: bechtel
Mime-Version: 1.0
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <20030802015044.396137DD27@king.southern.net.au>
X-Spam-Status: No, hits=0.7 required=5.0
    tests=HTML_60_70,HTML_MESSAGE,MIME_HTML_ONLY version=2.55
X-Spam-Level:
X-Spam-Checker-Version: SpamAssassin 2.55 (1.174.2.19-2003-05-19-exp)

math testifier teleologically ideal polymer crimes huddled theatrically
crankier crosser mentality tearing bosses activist illuminated ethical
expects advertisers scale ibex crotch boxes secant eukaryote 8th eugenic
adornment admire exceeded sawfish $RANDO MIZE megalomaniac adduction africa
illicitly brazenly cotoneaster crabbing teamster cosponsor adhere bottles
metamathematical blake brackish archibald cribs melodiously crossers
pocketed credit boisterous medians pork pow botanists expressions pottery
illuminations sans excerpted $RANDOM IZE expendable aubrey secondarily
borates imitates benny accused bolshoi metalwork arturo bold maximum scenery
tamale hornets sayers critics crave sari expunge husbands

sayers hushed expertise experiment excessive coverlets acknowledgeable
adjusted bowers bolstering adolph humus mellow creditably barbour methylene
taming teamster cosmetic scrupulosity plead midpoint meadowsweet alcoa
scanned idyllic tearfully arcadia tektite icosahedra $RANDOM IZE satiety apr
meal correlates albrecht exclamations hoppers poi megabyte cranking credibly
bookshelf horrid arcadia merry sawdust adipic imaginative activism argonaut
corrupting bayport hysteria text pornography counterbalance accolade pliancy
hull hovels $RANDOMI ZE evinced imaginably bowl crosswise screamers textiles
adjoin tallow scanner median tetrahedral blurted accessories postoffices
thats acknowledger admit bluntest accrual savagers idyll

Message Tokens:

185 unique tokens

'$rando' '$random' '$randomi' '8th' 'accessories' 'accolade' 'accrual'
'accused' 'acknowledger' 'activism' 'activist' 'adduction' 'adhere'
'adipic' 'adjoin' 'adjusted' 'admire' 'admit' 'adolph' 'adornment'
'advertisers' 'africa' 'albrecht' 'alcoa' 'apr' 'arcadia' 'archibald'
'argonaut' 'arturo' 'aubrey' 'barbour' 'bayport' 'benny' 'blake'
'bluntest' 'blurted' 'boisterous' 'bold' 'bolshoi' 'bolstering'
'bookshelf' 'borates' 'bosses' 'botanists' 'bottles' 'bowers' 'bowl'
'boxes' 'brackish' 'brazenly' 'cc:none' 'content-type:text/plain'
'correlates' 'corrupting' 'cosmetic' 'cosponsor' 'cotoneaster'
'coverlets' 'crabbing' 'crankier' 'cranking' 'crave' 'credibly' 'credit'
'creditably' 'cribs' 'crimes' 'critics' 'crosser' 'crossers' 'crosswise'
'crotch' 'ethical' 'eugenic' 'eukaryote' 'evinced' 'exceeded' 'excerpted'
'excessive' 'exclamations' 'expects' 'expendable' 'experiment'
'expertise' 'expressions' 'expunge' 'from:addr:mham' 'from:addr:msn.com'
'from:name:doris scott' 'header:Date:1' 'header:From:1'
'header:Message-Id:1' 'header:Mime-Version:1' 'header:Received:7'
'header:Subject:1' 'header:To:1' 'hoppers' 'hornets' 'horrid' 'hovels'
'huddled' 'hull' 'humus' 'husbands' 'hushed' 'hysteria' 'ibex'
'icosahedra' 'ideal' 'idyll' 'idyllic' 'illicitly' 'illuminated'
'imaginably' 'imaginative' 'imitates' 'ize' 'math' 'maximum'
'meadowsweet' 'meal' 'median' 'medians' 'megabyte' 'megalomaniac'
'mellow' 'melodiously' 'mentality' 'merry'
'message-id:@king.southern.net.au' 'metalwork' 'methylene' 'midpoint'
'mize' 'plead' 'pliancy' 'pocketed' 'poi' 'polymer' 'pork' 'pornography'
'postoffices' 'pottery' 'pow' 'reply-to:none' 'sans' 'sari' 'satiety'
'savagers' 'sawdust' 'sawfish' 'sayers' 'scale' 'scanned' 'scanner'
'scenery' 'screamers' 'scrupulosity' 'secant' 'secondarily' 'sender:none'
'skip:a 10' 'skip:c 10' 'skip:i 10' 'skip:m 10' 'skip:t 10'
'subject:bechtel' 'tallow' 'tamale' 'taming' 'teamster' 'tearfully'
'tearing' 'tektite' 'testifier' 'tetrahedral' 'text' 'textiles' 'thats'
'theatrically' 'to:2**0' 'to:addr:mhammond' 'to:addr:skippinet.com.au'
'to:no real name:2**0' 'x-mailer:none'

-------------- next part --------------
An embedded message was scrubbed...
From: "Doris Scott"
Subject: bechtel
Date: Sat, 2 Aug 2003 11:49:30 +1000
Size: 7913
Url: http://mail.python.org/pipermail/spambayes-dev/attachments/20030802/d51b77fe/attachment.eml

From mhammond at skippinet.com.au Sat Aug 2 13:27:31 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Fri Aug 1 22:27:35 2003
Subject: [spambayes-dev] Missed spam - Spam Clues: bechtel
In-Reply-To: <04ed01c3589a$49786260$f502a8c0@eden>
Message-ID: <051801c3589d$a01de790$f502a8c0@eden>

I'm pretty quick on the trigger finger - I should have looked a little
deeper first.

Outlook sees the HTML body of the message as:

math testifier teleologically ideal

... lots more random hidden words deleted

sayers hushed expertise experiment

... more random words snipped

So they use a standard technique of hiding the words, but the interesting
thing is that the URLs don't appear in the Spam Clues.

Mark.

> -----Original Message-----
> From: spambayes-dev-bounces@python.org
> [mailto:spambayes-dev-bounces@python.org]On Behalf Of Mark Hammond
> Sent: Saturday, 2 August 2003 12:04 PM
> To: spambayes-dev@python.org
> Subject: [spambayes-dev] Missed spam - Spam Clues: bechtel
>
> [...]

From tim.one at comcast.net Fri Aug 1 23:57:31 2003
From: tim.one at comcast.net (Tim Peters)
Date: Fri Aug 1 22:58:14 2003
Subject: [spambayes-dev] Missed spam - Spam Clues: bechtel
In-Reply-To: <051801c3589d$a01de790$f502a8c0@eden>
Message-ID:

[Mark Hammond]
> I'm pretty quick on the trigger finger - I should have looked a little
> deeper first.
>
> Outlook sees the HTML body of the message as:
>
>
> math testifier color="#ffffff">teleologically ideal
>
> ... lots more random hidden words deleted
>
> href="http://srd.yahoo.com/drst/military/*http://www.365pharm1.com/sh/index.
> html">
> src="http://srd.yahoo.com/drst/sawed/*http://www.8867v.com/file/ra.gif"
>
>

> sayers hushed color="#ffffff">expertise experiment
>
> ... more random words snipped
>
>
>
> So they use a standard technique of hiding the words, but the
> interesting thing is that the URLs don't appear in the Spam Clues.

More peculiar: I saved your attachment into my Unsure folder, and scored it
there, with current CVS spambayes. It found more tokens than yours found!
Including a pile of url: tokens. Here they are:

203 unique tokens

'$rando' '$random' '$randomi' '8th' 'accessories' 'accolade' 'accrual'
'accused' 'acknowledger' 'activism' 'activist' 'adduction' 'adhere'
'adipic' 'adjoin' 'adjusted' 'admire' 'admit' 'adolph' 'adornment'
'advertisers' 'africa' 'albrecht' 'alcoa' 'apr' 'arcadia' 'archibald'
'argonaut' 'arturo' 'aubrey' 'barbour' 'bayport' 'benny' 'blake'
'bluntest' 'blurted' 'boisterous' 'bold' 'bolshoi' 'bolstering'
'bookshelf' 'borates' 'bosses' 'botanists' 'bottles' 'bowers' 'bowl'
'boxes' 'brackish' 'brazenly' 'cc:none' 'content-type:text/plain'
'correlates' 'corrupting' 'cosmetic' 'cosponsor' 'cotoneaster'
'coverlets' 'crabbing' 'crankier' 'cranking' 'crave' 'credibly' 'credit'
'creditably' 'cribs' 'crimes' 'critics' 'crosser' 'crossers' 'crosswise'
'crotch' 'ethical' 'eugenic' 'eukaryote' 'evinced' 'exceeded' 'excerpted'
'excessive' 'exclamations' 'expects' 'expendable' 'experiment'
'expertise' 'expressions' 'expunge' 'from:addr:mham' 'from:addr:msn.com'
'from:name:doris scott' 'header:Date:1' 'header:From:1'
'header:Importance:1' 'header:MIME-Version:1' 'header:Message-ID:1'
'header:Subject:1' 'header:To:1' 'hoppers' 'hornets' 'horrid' 'hovels'
'huddled' 'hull' 'humus' 'husbands' 'hushed' 'hysteria' 'ibex'
'icosahedra' 'ideal' 'idyll' 'idyllic' 'illicitly' 'illuminated'
'imaginably' 'imaginative' 'imitates' 'ize' 'math' 'maximum'
'meadowsweet' 'meal' 'median' 'medians' 'megabyte' 'megalomaniac'
'mellow' 'melodiously' 'mentality' 'merry'
'message-id:@king.southern.net.au' 'metalwork' 'methylene' 'midpoint'
'mize' 'plead' 'pliancy' 'pocketed' 'poi' 'polymer' 'pork' 'pornography'
'postoffices' 'pottery' 'pow' 'proto:http' 'reply-to:none' 'sans' 'sari'
'satiety' 'savagers' 'sawdust' 'sawfish' 'sayers' 'scale' 'scanned'
'scanner' 'scenery' 'screamers' 'scrupulosity' 'secant' 'secondarily'
'sender:none' 'skip:a 10' 'skip:c 10' 'skip:i 10' 'skip:m 10' 'skip:t 10'
'subject:bechtel' 'tallow' 'tamale' 'taming' 'teamster' 'tearfully'
'tearing' 'tektite' 'testifier' 'tetrahedral' 'text' 'textiles' 'thats'
'theatrically' 'to:2**0' 'to:addr:mhammond' 'to:addr:skippinet.com.au'
'to:no real name:2**0' 'url:' 'url:*http' 'url:365pharm1' 'url:8867v'
'url:com' 'url:drst' 'url:file' 'url:gif' 'url:html' 'url:index'
'url:military' 'url:ra' 'url:sawed' 'url:sh' 'url:srd' 'url:www'
'url:yahoo'
'x-mailer:microsoft outlook cws, build 9.0.2416 (9.0.2911.0)'

The x-mailer token is night-and-day different from the one you got too, and
there are some odd differences in case (e.g., compare your

    'header:Mime-Version:1'

to the one above). Pretty mysterious! I'm still using Python 2.2.3 here,
and I sure hope that doesn't account for it (but can't make time to
investigate further -- sorry!).

From mhammond at skippinet.com.au Sat Aug 2 14:24:45 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Fri Aug 1 23:24:52 2003
Subject: [spambayes-dev] Missed spam - Spam Clues: bechtel
In-Reply-To:
Message-ID: <000001c358a5$9f442fc0$f502a8c0@eden>

> More peculiar: I saved your attachment into my Unsure folder, and scored it
> there, with current CVS spambayes. It found more tokens than yours found!
> Including a pile of url: tokens. Here they are:

OK, it gets weirder!! If I save the copy I received via the SpamBayes list,
I do indeed get the exact same results as you. However, re-investigating
the original still shows the same clues I posted.

> The x-mailer token is night-and-day different from the one you got too, and
> there are some odd differences in case (e.g., compare your
>
> 'header:Mime-Version:1'

Even down to this!
The mysteries of Outlook get stranger and stranger. I wonder if it was
"invalid" HTML that someone on the way fixed?

Mark.

From tim.one at comcast.net Sat Aug 2 00:56:41 2003
From: tim.one at comcast.net (Tim Peters)
Date: Fri Aug 1 23:57:26 2003
Subject: [spambayes-dev] Missed spam - Spam Clues: bechtel
In-Reply-To: <000001c358a5$9f442fc0$f502a8c0@eden>
Message-ID:

[Mark Hammond]
> OK, it gets weirder!! If I save the copy I received via the
> SpamBayes list, I do indeed get the exact same results as you.
> However, re-investigating the original still shows the same clues I
> posted.
>
> ...
>
> The mysteries of Outlook get stranger and stranger. I wonder if it
> was "invalid" HTML that someone on the way fixed?

Then I have to guess this: the stream spambayes tokenizes is *our* attempt
to reconstruct the original msg from Outlook's funky many-headed message
store. But the message attached to a "Spam Clues" email is *Outlook's*
attempt to reconstruct the original. I bet the *original* original had
something funny going on in its structure, and that Outlook repaired that
when it reconstructed the original.

But what? URL extraction occurs very early in the tokenizer, and in
particular long before we try to throw out HTML. We'd have to look at the
actual bytestream spambayes synthesized (on your end) to figure this out.

A curious possibility is that this spam was constructed to fool spambayes.
Heh.

From skip at pobox.com Sat Aug 2 09:54:57 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat Aug 2 09:55:17 2003
Subject: [spambayes-dev] Missed spam - Spam Clues: bechtel
In-Reply-To: <04ed01c3589a$49786260$f502a8c0@eden>
References: <04ed01c3589a$49786260$f502a8c0@eden>
Message-ID: <16171.49841.284840.729166@montanaro.dyndns.org>

    Mark> The attached Spam is a viagra spam. The body of the message when
    Mark> rendered in Outlook shows only an image. However, the "SpamClues"
    Mark> show no HTML, no references to the image, and only some "random"
    Mark> words in the body.

Yeah, this is what I reported the other day. I've gotten a few of them and
classified them all as spam. The latest one sneaked into my "low spam"
range (0.81 I think). The url components are being classified for me.

The pieces of $RANDOMIZE are turning into significant spam clues, as is
url:pharm1.

Skip

From tim.one at comcast.net Sat Aug 2 12:47:45 2003
From: tim.one at comcast.net (Tim Peters)
Date: Sat Aug 2 11:48:31 2003
Subject: [spambayes-dev] Missed spam - Spam Clues: bechtel
In-Reply-To: <16171.49841.284840.729166@montanaro.dyndns.org>
Message-ID:

[Skip]
> Yeah, this is what I reported the other day. I've gotten a few of
> them and classified them all as spam. The latest one sneaked into my
> "low spam" range (0.81 I think). The url components are being
> classified for me.
>
> The pieces of $RANDOMIZE are turning into significant spam clues, as
> is url:pharm1.

Skip, you speculated before about tokens we could generate to get a better
handle on spam like this. I replied, but you didn't realize it: it was in a
reply to Sean True about why a new "statistical summary" token of any flavor
isn't likely to help much on its own (just one token of many, and all tokens
have equal weight).

I don't see enough of this stuff to worry about it, but it's clear that the
"white on white" (black on black, etc) trick can be pretty good at dragging
spam scores down to the Unsure range. Anyone care enough to do something
about it? Simplest thing would be to tokenize color= attributes, but I
don't think that would help much (the words it's hiding would still get
scored, and they're the real problem). Pseudo-parsing HTML is something
I've never liked, but God knows we do plenty of it already ...

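To make the two options Tim mentions concrete, here is a minimal sketch of
(a) emitting a token per color= attribute and (b) setting aside words whose
font color matches the background, so that they could be kept out of
scoring. This is not SpamBayes tokenizer code: the class name, the "color:"
token prefix, the default white background, and the sample markup are all
assumptions made for illustration (the archive stripped the real tags, so
the <font> structure is a guess), and it only handles <font color=...>, not
CSS or near-matching colors.

```python
from html.parser import HTMLParser


class HiddenTextSplitter(HTMLParser):
    """Hypothetical helper: separate visible words from 'white on white' words."""

    def __init__(self, background="#ffffff"):
        super().__init__()
        self.background = background.lower()  # assumed page background
        self.color_stack = []    # color of each open <font> tag (None if unset)
        self.visible = []        # words a reader would actually see
        self.hidden = []         # words drawn in the background color
        self.color_tokens = []   # one synthetic token per color= attribute

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body" and attrs.get("bgcolor"):
            self.background = attrs["bgcolor"].lower()
        elif tag == "font":
            color = attrs.get("color")
            color = color.lower() if color else None
            self.color_stack.append(color)
            if color:
                self.color_tokens.append("color:" + color)

    def handle_endtag(self, tag):
        if tag == "font" and self.color_stack:
            self.color_stack.pop()

    def handle_data(self, data):
        # The innermost <font> that sets a color decides the text color.
        current = next((c for c in reversed(self.color_stack) if c), None)
        target = self.hidden if current == self.background else self.visible
        target.extend(data.split())


def split_hidden(html, background="#ffffff"):
    parser = HiddenTextSplitter(background)
    parser.feed(html)
    parser.close()
    return parser.visible, parser.hidden, parser.color_tokens


# Sample markup is a guess at what the spam looked like before tag stripping.
sample = 'math <font color="#ffffff">teleologically</font> ideal'
visible, hidden, color_tokens = split_hidden(sample)
```

With a split like this, a tokenizer could score only the visible words and
perhaps add a single clue for the presence of hidden text, which speaks to
Tim's objection that color tokens alone leave the hidden words in the score.
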
From skip at pobox.com Sat Aug 2 12:38:03 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat Aug 2 12:38:09 2003
Subject: [spambayes-dev] Missed spam - Spam Clues: bechtel
In-Reply-To:
References: <16171.49841.284840.729166@montanaro.dyndns.org>
Message-ID: <16171.59627.114665.484831@montanaro.dyndns.org>

    Tim> Anyone care enough to do something about it?

I've only seen a handful of these (maybe five?) and SB is now classifying
them as spam, albeit currently low spam. Unless this becomes a huge problem
I'm inclined to let it be. There are a variety of synthetic tags we could
try generating: number of tags, ratio of (runs of) html tags to words,
number of hapaxes or fraction of words which are hapaxes. The mails are
structurally different than typical spam or ham, so I think if we do
anything it's going to have to entail capturing some of the details of that
structure.

Skip

From rob at hooft.net Sun Aug 3 13:51:34 2003
From: rob at hooft.net (Rob Hooft)
Date: Sun Aug 3 06:54:25 2003
Subject: [spambayes-dev] I took a big step Tuesday...
In-Reply-To: <20030728180525.2134D2DDEE@cashew.wolfskeep.com>
References: <20030728180525.2134D2DDEE@cashew.wolfskeep.com>
Message-ID: <3F2CE936.5070808@hooft.net>

T. Alexander Popiel wrote:
> In message:
> "Tim Peters" writes:
>
>>Something that would be fine: Add a config option mapping score ranges to
>>tokens. For example,
>>
>>
>>>I currently have my ham and spam cutoffs set at 0.15 and 0.80,
>>>respectively.
>>
>>could be described by
>>
>>    Ham 0.15 Unsure 0.80 Spam
>>
>>and
>>
>>
>>>As I mentioned in a recent message, I consider 0.80 to 0.90 to be
>>>"low spam" and 0.91 to 1.00 to be "high spam".
>>
>>by
>>
>>    Ham 0.15 Unsure 0.80 LowSpam 0.91 HighSpam
>
>
> +1
>
> - Alex

People, this is all very unscientific. We have done lots of research in
the earlier days of spambayes, and have come to the conclusion that
there are no more than two useful cut-off points. Our false-positives
mostly scored hopelessly close to the ideal 1.00000000000000000. If you
find spam boring and want to delete everything above 0.995
automatically, there is no scientific basis for not cutting at 0.90
instead.

Rob

--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/

From skip at pobox.com Sun Aug 3 09:44:22 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sun Aug 3 09:44:45 2003
Subject: [spambayes-dev] I took a big step Tuesday...
In-Reply-To: <3F2CE936.5070808@hooft.net>
References: <20030728180525.2134D2DDEE@cashew.wolfskeep.com>
	<3F2CE936.5070808@hooft.net>
Message-ID: <16173.4534.270130.238973@montanaro.dyndns.org>

    Rob> People, this is all very unscientific. We have done lots of
    Rob> research in the earlier days of spambayes, and have come to the
    Rob> conclusion that there are no more than two useful cut-off
    Rob> points. Our false-positives mostly scored hopelessly close to the
    Rob> ideal 1.00000000000000000.

That doesn't seem to be my experience. Most questionable spam seems to
score near the lower end of the range. Are you thinking back to the days
before the chi squared combining scheme?

Skip

From rob at hooft.net Sun Aug 3 21:18:46 2003
From: rob at hooft.net (Rob Hooft)
Date: Sun Aug 3 14:21:41 2003
Subject: [spambayes-dev] I took a big step Tuesday...
In-Reply-To: <16173.4534.270130.238973@montanaro.dyndns.org>
References: <20030728180525.2134D2DDEE@cashew.wolfskeep.com>
	<3F2CE936.5070808@hooft.net>
	<16173.4534.270130.238973@montanaro.dyndns.org>
Message-ID: <3F2D5206.3030201@hooft.net>

Skip Montanaro wrote:
>     Rob> People, this is all very unscientific. We have done lots of
>     Rob> research in the earlier days of spambayes, and have come to the
>     Rob> conclusion that there are no more than two useful cut-off
>     Rob> points. Our false-positives mostly scored hopelessly close to the
>     Rob> ideal 1.00000000000000000.
>
> That doesn't seem to be my experience.
Most questionable spam seems to
> score near the lower end of the range. Are you thinking back to the days
> before the chi squared combining scheme?

Nope, I'm definitely referring to chi-squared days.

If your questionable spam scores near the low end, you might want to try
and run a cross-validation to see what your optimal cut-offs are; you
may be running with sub-optimal cutoffs now. You may even be able to
prove your theory in an objective manner using a minor adaptation to the
analysis.

But even then: is that as useful as you think it is? Please note that
questionable spam is still spam, which means it is not interesting to
you. If only questionable spam and serious spam score above the spam
cutoff you do not need to look at it, ever. I was referring to a
false-positive. My hypothesis H_0 is that a false positive is as likely
to hit 0.95-0.96 or 0.98-0.99 as it is to hit 0.90-0.91. Try to disprove
that one, and you will have convinced me that more than 2 cutoffs may be
useful.

Rob

--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/

From mhammond at skippinet.com.au Mon Aug 4 12:36:16 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sun Aug 3 21:36:26 2003
Subject: [spambayes-dev] CVS branch for new Outlook dialog work.
Message-ID: <02f301c35a28$ccb71c90$f502a8c0@eden>

I'm working on "fixing" the dialogs for Outlook. At this stage, "fixing"
means making it possible for anyone else in the world to modify the dialogs.
Initially, the dialogs themselves will not change (much!), but the framework
will be such that making changes or adding new options should be quite
trivial (edit the dialogs with MSVC or your dialog editor of choice, then
add one line of Python code linking the control ID with the option name).

The history to date:
* I created the original dialogs using MSVC, and translated them "by hand"
  to the .py source files (but did keep the .rc locally!)
* Sean True needed to redo the dialogs for SpamAtBay, and wrote a .rc parser (which is something I would have loved years ago!). Sean sent me this code. * In the mean time, Adam Walker asked for CVS checkin privs, mentioning he wanted to work on the UI. I jumped on this and forwarded Sean's code to him (by which time I still hadn't looked at it :( ) * Adam decided to write his own leaning on Python's "shlex" module (you learn something new every day ;). * In the meantime, I decided to play with a "data driven" dialog approach - you just link a "control ID" with a SpamBayes "option name", and good magic happens (eg tooltips and validation that leans on the OptionsClass module) I am playing with all this stuff on a new branch I created. The branch name is 'outlook-dialog-branch' (with a tag 'outlook-dialog-fork' marking where the branch started), and is created only on the "Outlook2000" directory of the tree. At this stage it is for playing only, but soon the branch will replace the existing dialogs with the new ones. I will let you know when that happens. If you want to play: * change to the Outlook2000 directory. * cvs -z5 update -r 'outlook-dialog-branch' * Look at, and run dialogs\test_dialog.py At this stage, SpamBayes still works fine on that branch (only the test code uses the new stuff). But to move back to the trunk: * cvs -z5 update -A Mark. From popiel at wolfskeep.com Sun Aug 3 19:40:24 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Sun Aug 3 21:40:28 2003 Subject: [spambayes-dev] I took a big step Tuesday... In-Reply-To: Message from Rob Hooft of "Sun, 03 Aug 2003 12:51:34 +0200." <3F2CE936.5070808@hooft.net> References: <20030728180525.2134D2DDEE@cashew.wolfskeep.com> <3F2CE936.5070808@hooft.net> Message-ID: <20030804014024.67A1A2DDC1@cashew.wolfskeep.com> In message: <3F2CE936.5070808@hooft.net> Rob Hooft writes: >T. 
Alexander Popiel wrote: >> In message: >> "Tim Peters" writes: >>> >>>could be described by >>> >>> Ham 0.15 Unsure 0.80 Spam >> >> +1 > >People, this is all very unscientific. We have done lots of research in >the earlier days of spambayes, and have come to the conclusion that >there are no more than two useful cut-off points. I certainly agree with you; however, I like the notation shown above better than the cutoff notation that we currently have in the options files. It also makes it more obvious how to drop down to _just_ two output categories. That it makes it easier for people to generate more than three categories is something that I find completely uninteresting and unobjectionable. ;-) - Alex From seant at webreply.com Sun Aug 3 23:21:48 2003 From: seant at webreply.com (Sean True) Date: Sun Aug 3 22:22:09 2003 Subject: [spambayes-dev] How much tokenizer improvement is enough to justify a change? Message-ID: <001b01c35a2f$28e7f9c0$0201a8c0@swapwizard.com> I ran the current tokenizer on timcv.py on 10000 hams and 10000 spams split into 10 buckets. Here's the base set ... Comments? 
-> best cost for all runs: $169.60 -> per-fp cost $10.00; per-fn cost $1.00; per-unsure cost $0.20 -> achieved at ham & spam cutoffs 0.025 & 0.275 -> fp 6; fn 55; unsure ham 47; unsure spam 226 -> fp rate 0.06%; fn rate 0.55%; unsure rate 1.36% -> all runs false positives: 0 -> all runs false negatives: 201 -> all runs unsure: 1460 -> all runs false positive %: 0.0 -> all runs false negative %: 2.01 -> all runs unsure %: 7.3 -> all runs cost: $493.00 Total messages: 20000; 10000 (50.0%) ham + 10000 (50.0%) spam Ham: 9989 (99.89%) ok, 11 (0.11%) unsure, 0 (0.00%) fp Spam: 8350 (83.50%) ok, 1449 (14.49%) unsure, 201 (2.01%) fn Score False: 1.00% Unsure 7.30% Standard Cost: $493.0000 Flex Cost: $761.9844 Flex**2 Cost: $529.8252 Delayed-Total messages: 20000; 10000 (50.0%) ham + 10000 (50.0%) spam Delayed-Ham: 9947 (99.47%) ok, 47 (0.47%) unsure, 6 (0.06%) fp Delayed-Spam: 9719 (97.19%) ok, 226 (2.26%) unsure, 55 (0.55%) fn Delayed-Score False: 0.30% Unsure 1.36% Delayed-Standard Cost: $169.6000 Delayed-Flex Cost: $318.2508 Delayed-Flex**2 Cost: $232.0352 After change (breaking up compound words > maxwordlen into smaller words) -> best cost for all runs: $157.60 -> per-fp cost $10.00; per-fn cost $1.00; per-unsure cost $0.20 -> achieved at ham & spam cutoffs 0.015 & 0.295 -> fp 5; fn 44; unsure ham 88; unsure spam 230 -> fp rate 0.05%; fn rate 0.44%; unsure rate 1.59% -> all runs false positives: 0 -> all runs false negatives: 182 -> all runs unsure: 1430 -> all runs false positive %: 0.0 -> all runs false negative %: 1.82 -> all runs unsure %: 7.15 -> all runs cost: $468.00 Total messages: 20000; 10000 (50.0%) ham + 10000 (50.0%) spam Ham: 9990 (99.90%) ok, 10 (0.10%) unsure, 0 (0.00%) fp Spam: 8398 (83.98%) ok, 1420 (14.20%) unsure, 182 (1.82%) fn Score False: 0.91% Unsure 7.15% Standard Cost: $468.0000 Flex Cost: $724.4078 Flex**2 Cost: $498.5645 Delayed-Total messages: 20000; 10000 (50.0%) ham + 10000 (50.0%) spam Delayed-Ham: 9907 (99.07%) ok, 88 (0.88%) unsure, 5 
(0.05%) fp Delayed-Spam: 9726 (97.26%) ok, 230 (2.30%) unsure, 44 (0.44%) fn Delayed-Score False: 0.24% Unsure 1.59% Delayed-Standard Cost: $157.6000 Delayed-Flex Cost: $365.5402 Delayed-Flex**2 Cost: $237.8641 From T.A.Meyer at massey.ac.nz Mon Aug 4 15:24:48 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 3 22:25:24 2003 Subject: [spambayes-dev] How much tokenizer improvement is enough to justifya change? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9EBC8@its-xchg4.massey.ac.nz> > I ran the current tokenizer on timcv.py on 10000 hams and > 10000 spams split into 10 buckets. > Here's the base set ... Comments? [...] > After change (breaking up compound words > maxwordlen into > smaller words) Have you got a patch for this so that we can see what results we get too? (well, the two or three people still interested in testing, anyway!). As for the "how much" question, I'll leave that to Tim ;) =Tony Meyer From tim.one at comcast.net Mon Aug 4 01:30:24 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Aug 4 00:30:52 2003 Subject: [spambayes-dev] I took a big step Tuesday... In-Reply-To: <3F2CE936.5070808@hooft.net> Message-ID: [Rob Hooft] Nice to hear from you, Rob! > People, this is all very unscientific. Seat-of-the-pants tuning usually is . > We have done lots of research in the earlier days of spambayes, and > have come to the conclusion that there are no more than two useful > cut-off points. Our false-positives mostly scored hopelessly close > to the ideal 1.00000000000000000. Hmm. That wasn't true of my data: the only FP I had scoring 1.00 (rounded) was the message that consisted almost entirely of a full quote of a Nigerian scam. That one was hopeless. All other FP scored below 1.00 (rounded). > If you find spam boring and want to delete everything above 0.995 > automatically, there is no scientific basis for not cutting at 0.90 > instead. 
There's an obvious basis for not doing that, though: I've seen FP scoring above 0.90 in day-to-day use, always a piece of HTML email I actually want, from an online business spambayes hadn't yet been taught about. OTOH, I've never seen an FP in day-to-day use that scored 1.00 (rounded), although *most* spam scores 1.00 (rounded -- and most ham scores 0.00 (rounded)). I think Skip is seeing the same. I didn't do any research using the full set of tokenization gimmicks we have today, and I didn't do any using the kind of training I've fallen into (a few hundred "random" at the start, followed by a mix of mistake-based and unsure-based when I felt like it), and I didn't do any on personal email (I was doing tech mailing-list tests). It *appears* to be "impossibly" hard for a mistake to get nailed at the wrong end of the scale for me because my database remains small (so individual spamprobs aren't getting near 0.00 or 1.00). It appears to be hard for Skip because he uses much more training data (than I use), so his spambayes has a more accurate view of his reality. Theory simply hasn't kept up with practice here. That's what happens when all the theorists die . From T.A.Meyer at massey.ac.nz Mon Aug 4 17:56:36 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 4 00:57:20 2003 Subject: [spambayes-dev] CVS branch for new Outlook dialog work. Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9ED20@its-xchg4.massey.ac.nz> > If you want to play: > * change to the Outlook2000 directory. > * cvs -z5 update -r 'outlook-dialog-branch' > * Look at, and run dialogs\test_dialog.py I get: Traceback (most recent call last): File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\test_dialogs.py", line 156, in ? 
mgr = manager.GetManager() File "D:\cvs\spambayes\spambayes\Outlook2000\manager.py", line 724, in GetManager _mgr = BayesManager(outlook=outlook) File "D:\cvs\spambayes\spambayes\Outlook2000\manager.py", line 209, in __init__ self.PrepareConfig() File "D:\cvs\spambayes\spambayes\Outlook2000\manager.py", line 478, in PrepareConfig self.options = config.CreateConfig() AttributeError: 'module' object has no attribute 'CreateConfig' Any idea why? =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Aug 4 18:45:23 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 4 01:46:02 2003 Subject: [spambayes-dev] Splitting composite words results Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9ED58@its-xchg4.massey.ac.nz> Here are my timtest results for Sean's patch to split composite words: ham:spam: 4700:1980 7900:15260 4700:1980 7900:15260 fp total: 0 0 2 2 fp %: 0.00 0.00 0.03 0.03 fn total: 75 76 176 172 fn %: 3.79 3.84 1.15 1.13 unsure t: 190 189 501 499 unsure %: 2.84 2.83 2.16 2.15 real cost: $113.00 $113.80 $296.20 $291.80 best cost: $197.00 $194.60 $489.60 $488.80 h mean: 0.64 0.64 0.63 0.62 h sdev: 4.50 4.48 4.84 4.81 s mean: 83.83 83.88 94.52 94.57 s sdev: 29.98 29.95 18.67 18.56 mean diff: 83.19 83.24 93.89 93.95 k: 2.41 2.42 3.99 4.02 One win, one loss; I'm not bothered either way, it seems. Anyone else care to run the test? Come on, you know you want to... =Tony Meyer From rob at hooft.net Mon Aug 4 10:06:58 2003 From: rob at hooft.net (Rob W.W. Hooft) Date: Mon Aug 4 03:12:49 2003 Subject: [spambayes-dev] How much tokenizer improvement is enough to justify a change? 
In-Reply-To: <001b01c35a2f$28e7f9c0$0201a8c0@swapwizard.com> References: <001b01c35a2f$28e7f9c0$0201a8c0@swapwizard.com> Message-ID: <3F2E0612.9000207@hooft.net> Sean True wrote: > Delayed-Standard Cost: $169.6000 > Delayed-Flex Cost: $318.2508 > Delayed-Flex**2 Cost: $232.0352 > > After change (breaking up compound words > maxwordlen into smaller words) > Delayed-Standard Cost: $157.6000 > Delayed-Flex Cost: $365.5402 > Delayed-Flex**2 Cost: $237.8641 This merges nicely with the more-than-three-bins thread: The two /flex/ costs are cost functions that use a continuous function to describe the penalty given to a message. In its own zone (a spam in the spam zone, and a ham in the ham zone) these are 0.0, and they are going up smoothly (linear or quadratic) once messages get out into the unsure. What you see here is that even though for your data set the cutoff-price is going down, the average amount by which messages are outside of their own zone is going up.... a vote against making more than three bins.... I've tried to use these /flex/es long ago to optimize the cutoffs and other parameters of spambayes, but that failed miserably. Therefore I'm not sure what this all means. Rob -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From mhammond at skippinet.com.au Mon Aug 4 18:42:42 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Aug 4 03:42:48 2003 Subject: [spambayes-dev] CVS branch for new Outlook dialog work. In-Reply-To: <02f301c35a28$ccb71c90$f502a8c0@eden> Message-ID: <000001c35a5b$fd7c4750$f502a8c0@eden> FYI, the "outlook-dialog-branch" branch is now showing the new dialogs from inside Outlook. The old dialog code has been deleted from the branch. As far as I know, this branch is fully functional. The only remaining issues are cosmetic: * 3d effect missing from dialogs. * Few more options need help text for the tooltips. * Dialogs not centering correctly. 
So if you are brave, I would appreciate a few people helping to iron out the rough edges. Thanks, Mark. From T.A.Meyer at massey.ac.nz Mon Aug 4 21:07:16 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 4 04:08:06 2003 Subject: [spambayes-dev] CVS branch for new Outlook dialog work. Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9ED77@its-xchg4.massey.ac.nz> > So if you are brave, I would appreciate a few people helping > to iron out the rough edges. I get this: Traceback (most recent call last): File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line 243, in OnCommand self.ApplyHandlingOptionValueError(handler.OnCommand, wparam, lparam) File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line 194, in ApplyHandlingOptionValueError func(*args) File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\opt_processors.py", line 190, in OnCommand from dialogs import FolderSelector File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\FolderSelector.py", line 9, in ? from DialogGlobals import * ImportError: No module named DialogGlobals When trying to open a dialog that shows the folder hierarchy. =Tony Meyer From rob at hooft.net Mon Aug 4 11:03:49 2003 From: rob at hooft.net (Rob W.W. Hooft) Date: Mon Aug 4 04:09:48 2003 Subject: [spambayes-dev] I took a big step Tuesday... In-Reply-To: References: Message-ID: <3F2E1365.7040208@hooft.net> Tim Peters wrote: > [Rob Hooft] > > Nice to hear from you, Rob! Still trying to keep up with the -dev list, but I haven't updated the software in months.... >>We have done lots of research in the earlier days of spambayes, and >>have come to the conclusion that there are no more than two useful >>cut-off points. Our false-positives mostly scored hopelessly close >>to the ideal 1.00000000000000000. > Hmm. That wasn't true of my data: the only FP I had scoring 1.00 (rounded) > was the message that consisted almost entirely of a full quote of a Nigerian > scam. That one was hopeless. 
All other FP scored below 1.00 (rounded). I'm not sure about the statistics, but I guess you had so few fp that you can not test the hypothesis that the distribution of falses is non-homogeneous. It may still be useful if the FP distribution is homogeneous, because, as you note the TP distribution is very sharply peaked. Cutting at 0.9995 instead of 0.995 may cut almost as many spam, and will cost 1/10th the amount of ham if the homogeneous distribution hypothesis is true. This could even result in an intermediate way for training: don't train on messages that score <0.001 or >0.999 >>If you find spam boring and want to delete everything above 0.995 >>automatically, there is no scientific basis for not cutting at 0.90 >>instead. > > > There's an obvious basis for not doing that, though: I've seen FP scoring > above 0.90 in day-to-day use, always a piece of HTML email I actually want, > from an online business spambayes hadn't yet been taught about. OTOH, I've > never seen an FP in day-to-day use that scored 1.00 (rounded), although > *most* spam scores 1.00 (rounded -- and most ham scores 0.00 (rounded)). I > think Skip is seeing the same. When are you actually reading your spam? For me in my setup it is very difficult to get at my spam. The only reason for me to have a look at it is when I am re-training (about once a month): just before training I use the "mail -f spam.mbox" to skim the headers. > Theory simply hasn't kept up with practice here. That's what happens when > all the theorists die . I just gave a sign of life...from a distance. And you yourself are still around as well. Rob -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From skip at pobox.com Mon Aug 4 10:16:29 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Aug 4 10:16:51 2003 Subject: [spambayes-dev] I took a big step Tuesday... 
In-Reply-To: <3F2E1365.7040208@hooft.net> References: <3F2E1365.7040208@hooft.net> Message-ID: <16174.27325.40910.815219@montanaro.dyndns.org> Tim> There's an obvious basis for not doing that, though: I've seen FP Tim> scoring above 0.90 in day-to-day use, always a piece of HTML email I Tim> actually want, from an online business spambayes hadn't yet been Tim> taught about. OTOH, I've never seen an FP in day-to-day use that Tim> scored 1.00 (rounded), although *most* spam scores 1.00 (rounded -- Tim> and most ham scores 0.00 (rounded)). I think Skip is seeing the Tim> same. Correct. On the rare occasions when I've seen false positives I don't recall ever seeing one score 1.00. The only spams I'm now tossing out without scanning are those which score a rounded 1.00. My assumption in doing this is that the time saved not ever scanning those messages (about 85% of the stuff I receive which is scored as spam) is worth the risk of maybe tossing out a valid message. Are there more than three mail categories? In theory, I guess not. In practice, I think there may be, though I can't justify it with anything other than my own anecdotal experience. When this all started, I thought there only two categories, ham and spam. With improvements to the algorithms we've grown a third. Rob> When are you actually reading your spam? For me in my setup it is Rob> very difficult to get at my spam. Then you need another setup. ;-) My spam is right there along with everything else, it's just sitting in its own mailboxes (lospam, hispam, and /dev/null). And there is the obligatory unsure mailbox. I only train explicitly, and only on so-called "low spam", unsures, and hams which I notice don't score a rounded 0.00. (That's more of a problem. Since ham gets further filtered into topical mailboxes the "high ham" currently isn't nicely segregated.) 
Rob> The only reason for me to have a look at it is when I am Rob> re-training (about once a month): just before training I use the Rob> "mail -f spam.mbox" to skim the headers. But I was getting several hundred per day. If I didn't scan them fairly frequently, a) the risk that I'd miss a false positive (which seem to score < 1.00) in the growing sea of spam would go up, and b) I'd get so far behind that it would have to scan several hundred at a time over the course of several days to catch up. Maybe it's just another case of "practicality beats purity". Skip From popiel at wolfskeep.com Mon Aug 4 10:12:29 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Mon Aug 4 12:12:33 2003 Subject: [spambayes-dev] Splitting composite words results In-Reply-To: Message from "Meyer, Tony" of "Mon, 04 Aug 2003 17:45:23 +1200." <1ED4ECF91CDED24C8D012BCF2B034F1302A9ED58@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302A9ED58@its-xchg4.massey.ac.nz> Message-ID: <20030804161229.11D032DE88@cashew.wolfskeep.com> In message: <1ED4ECF91CDED24C8D012BCF2B034F1302A9ED58@its-xchg4.massey.ac.nz> "Meyer, Tony" writes: > >Anyone else care to run the test? Come on, you know you want to... > Sure, I'll test it... but I haven't seen the patch posted... - Alex From seant at webreply.com Mon Aug 4 16:56:29 2003 From: seant at webreply.com (Sean True) Date: Mon Aug 4 15:56:41 2003 Subject: [spambayes-dev] Very small change for composite word tokenizing. Message-ID: <00b201c35ac2$7ed88460$0201a8c0@swapwizard.com> This is the code that does it, in context, if not in patch form. I had mailed to to Tony, but not the whole list. Sorry about that. -- Sean Not exactly a patch, but it's a one minute cut and paste. 
I'm theorizing that the memory hit is not horrendous -- mostly generates sensible fragments www.microsoft.com -> www, microsoft, com Very_naughty_bits -> very, naughty, bits -> longword_re = re.compile(r"[a-zA-Z1-9$]+") def tokenize_word(word, _len=len, maxword=options.skip_max_word_size): n = _len(word) # Make sure this range matches in tokenize(). if 3 <= n <= maxword: yield word elif n >= 3: # A long word. # Don't want to skip embedded email addresses. # An earlier scheme also split up the y in x@y on '.'. Not splitting # improved the f-n rate; the f-p rate didn't care either way. if n < 40 and '.' in word and word.count('@') == 1: p1, p2 = word.split('@') yield 'email name:' + p1 yield 'email addr:' + p2 < else: # There's value in generating a token indicating roughly how # many chars were skipped. This has real benefit for the f-n # rate, but is neutral for the f-p rate. I don't know why! # XXX Figure out why, and/or see if some other way of summarizing # XXX this info has greater benefit. if options.generate_long_skips: yield "skip:%c %d" % (word[0], n // 10 * 10) if has_highbit_char(word): hicount = 0 for i in map(ord, word): if i >= 128: hicount += 1 yield "8bit%%:%d" % round(hicount * 100.0 / len(word)) -> # Break up composite words looking for good stuff -> for w in longword_re.findall(word): -> if 3 <= len(w) <= maxword: -> yield word -> From kennypitt at hotmail.com Mon Aug 4 17:49:19 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Aug 4 16:50:08 2003 Subject: [spambayes-dev] CVS branch for new Outlook dialog work. In-Reply-To: <000001c35a5b$fd7c4750$f502a8c0@eden> References: <000001c35a5b$fd7c4750$f502a8c0@eden> Message-ID: <3F2EC6CF.6050105@hotmail.com> Mark Hammond wrote: > FYI, the "outlook-dialog-branch" branch is now showing the new dialogs from > inside Outlook. The old dialog code has been deleted from the branch. > > As far as I know, this branch is fully functional. 
The only remaining > issues are cosmetic: > > * 3d effect missing from dialogs. [snip] Simple solution to this one: the window class for GROUPBOX is defined incorrectly in rcparser.py. _controlMap contains "GROUPBOX":0x82, which is the value for "Static". Turns out GROUPBOX is actually a variation of "Button" instead of "Static" (go figure), so changing to "GROUPBOX":0x80 restores the 3D effect. -- Kenny Pitt From T.A.Meyer at massey.ac.nz Tue Aug 5 11:40:24 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 4 18:41:05 2003 Subject: [spambayes-dev] Very small change for composite word tokenizing. Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9EE8C@its-xchg4.massey.ac.nz> > -> longword_re = re.compile(r"[a-zA-Z1-9$]+") Note that Sean admitted (privately) that leaving out the '0' was unintentional ;) I tried it with [a-zA-Z0-9$] and got the same results, though. =Tony Meyer From seant at webreply.com Mon Aug 4 20:59:04 2003 From: seant at webreply.com (Sean True) Date: Mon Aug 4 19:59:44 2003 Subject: [spambayes-dev] Mark and Sean stand around the water cooler discussing plugins, and call each other names. In-Reply-To: <030601c35ae2$695eb8f0$f502a8c0@eden> Message-ID: <000801c35ae4$65bd8b70$0201a8c0@swapwizard.com> OK, not really, but at least you are reading the rest of the message. Mark and I are ruminating about plugins and add on filters for Spambayes/SpamAtBay/InBoxer. -- Sean > -----Original Message----- > From: Mark Hammond [mailto:mhammond@skippinet.com.au] > Sent: Monday, August 04, 2003 7:45 PM > To: 'Sean True' > Subject: RE: [Spambayes-checkins] > spambayes/Outlook2000/dialogsasync_processor.py, NONE, 1.1.2.1 > > > > Because I'm getting requests left, right, and center from > > developers who > > don't want to work in the middle of the tree. They want to > > add SPEWS/RBL for > > testing; they want to add logging, mail forwarding, > implementation of > > business logic. They want a platform, not just a mail filter. 
> > Just so I am clear here - requests from who exactly, and how > do you intend > satisying their requests? Will these users actually write > "code", or will > it still all be configured via a UI. Requests from current and potential SAB customers. Who will write actual code, including maybe their own UI. > > If the latter and via a UI, then I don't see the advantage, > as we don't need > a flexible system; we just need one as good as the UI. In > the first stages, > when there aren't that many "competing" filters, and where > the rules can't > get *too* complex and still be reflected in a UI, I don't see > the advantage. Agreed. The idea is to empower a developer to write simple code. If he want's UI, we may be able to offer a simple framework, but I have no itch to write a generic UI system for every developer. Unlike some people who are more ambitious than I. ;-) > > Of course, if you intend exposing the same level of code as > you sent me as > an example to the users, then I can see the benefit. > However, I am not > convinced that the model we are then exposing is correct - I > would not be > happy writing my own personal rules that way. I anticipated that I could wrap your filter module in this code, and that it would then be one of many alternative non-bayesian filters available for a person to pick and choose from. My hope was to have a level playing field, whatever the exact API. My current sketch is just a sketch, although it has made it easy to do some cool stuff. > > > From my point of view, there are two large pieces of > engineering here: > > > > The baysian scoring code. > > The Outlook integration code. > > > > At "the end of the day", I'd like to see a plugin interface > > for the Outlook > > code that would run the spambayes engine itself as an > external plugin. > > > > Think Photoshop, I guess. > > > As for important, I see it as a giveback. 
We're going to > > develop, doc, and > > maintain a platform plugin API for the commercial product, > > and would like > > people who don't want to spend $ to be able to use plugins -- > > and write > > them. > > Except it isn't really clear how it will actually help our open source > project in a useful way. Unless we also go to the effort of > supporting and > exposing the plugin mechanism it just becomes complexity in > our code for > something we don't use. Unless of course you were going to > contribute code > to actually make these filters useful to us too . Indeed I was. The idea is to have the ability to move source code filters back and forth between the supported platforms, including any other SB systems that need one, and any non-SB systems would be welcome to add the architecture. Of course, they would need Python. What a shame. I'm absolutely in favor of supporting it. I would love to write a whole series of little plugins that would do interesting things. > > If we *do* support a "filter" API, I would like to see one > that at least the > other SpamBayes projects could use. I see no reason > pop3proxy etc could not > use the exact same system as we use. But I am reluctant to sanction a > plugin API that I can't see how it would be used without huge > effort from > me - and in which case I may as well spend that effort in a > direction I > prefer. Mmm. I was hoping for _zero_ effort from you. I need you to be fixing bugs in the parts of the system that make my head hurt. > > How about we move this discussion to spambayes-dev, and see > if we can get > extra interest from anyone else on the project? I think the > concept is > sound, so see no reason we can't aim at something useful to a > few of us :) > Yes, indeed. Done. Moved. > Thanks, > > Mark. 
> > -- Sean From mhammond at skippinet.com.au Tue Aug 5 11:43:38 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Aug 4 20:43:45 2003 Subject: [spambayes-dev] RE: Mark and Sean stand around the water cooler discussing plugins, and call each other names. In-Reply-To: <000801c35ae4$65bd8b70$0201a8c0@swapwizard.com> Message-ID: <034d01c35aea$9cf67330$f502a8c0@eden> > > If the latter and via a UI, then I don't see the advantage, > > as we don't need > > a flexible system; we just need one as good as the UI. In > > the first stages, > > when there aren't that many "competing" filters, and where > > the rules can't > > get *too* complex and still be reflected in a UI, I don't see > > the advantage. > Agreed. The idea is to empower a developer to write simple code. If he > want's UI, we may be able to offer a simple framework, but I > have no itch to > write a generic UI system for every developer. Unlike some > people who are > more ambitious than I. ;-) Note that the filtering ideas we have been discussing privately are completely separate from the UI work I am doing. I don't see how a "generic filter API" could be built on our budgets ;) > Indeed I was. The idea is to have the ability to move source > code filters > back and forth between the supported platforms, including any other SB > systems that need one, and any non-SB systems would be > welcome to add the > architecture. Of course, they would need Python. What a shame. > I'm absolutely in favor of supporting it. I would love to > write a whole > series of little plugins that would do interesting things. I am afraid I am still not clear on exactly what you are proposing. Initially I thought you were talking about "just" a generic plugin mechanism, but now it seems you are also talking about specific changes to the codebase to provide additional, concrete filtering capabilities. Can you outline exactly what changes you are proposing to add (ie, exactly what changes the user would see)? 
Thanks, Mark. From T.A.Meyer at massey.ac.nz Tue Aug 5 14:05:02 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 4 21:07:21 2003 Subject: [spambayes-dev] Very small change for composite word tokenizing. Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9EF8A@its-xchg4.massey.ac.nz> Ok, for those interested in testing this out, there are *two* changes to make to the code that Sean posted. The first is to change the regex to include '0', and the second is to yield w and not word. Sean made these changes and said that his positives results disappeared, but mine didn't: [the third and fourth columns are the old, inaccurate results, included for reference] filename: august_no_seans august_no_seans accurate_seans august_seans ham:spam: 7900:15260 7900:15260 7900:15260 7900:15260 fp total: 2 2 2 2 fp %: 0.03 0.03 0.03 0.03 fn total: 176 175 176 172 fn %: 1.15 1.15 1.15 1.13 unsure t: 501 495 501 499 unsure %: 2.16 2.14 2.16 2.15 real cost: $296.20 $294.00 $296.20 $291.80 best cost: $489.60 $488.80 $489.60 $488.80 h mean: 0.63 0.60 0.63 0.62 h sdev: 4.84 4.75 4.84 4.81 s mean: 94.52 94.49 94.52 94.57 s sdev: 18.67 18.70 18.67 18.56 mean diff: 93.89 93.89 93.89 93.95 k: 3.99 4.00 3.99 4.02 So my fn didn't go down nearly as much, but my unsures went down more. =Tony Meyer From mhammond at skippinet.com.au Tue Aug 5 23:13:31 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Aug 5 08:13:31 2003 Subject: [spambayes-dev] CVS branch for new Outlook dialog work. In-Reply-To: <02f301c35a28$ccb71c90$f502a8c0@eden> Message-ID: <057601c35b4a$fc4b32c0$f502a8c0@eden> The 'outlook-dialog-branch' should be close to baked. Everything should be working correctly (including the "Folder Selector" code :). A few of the Tooltip bubbles could do with some work, but these are maintained in the default options documentation strings. This new code does not depend on "win32ui", which is the module that exposes the Microsoft Foundation Classes (MFC). 
All our dialogs are now "raw" Windows API calls. This will reduce the size of the distribution significantly, but more importantly fix the bugs with the "autocomplete" and other functions not working in later versions of Outlook. Some more work will need to be done WRT the binary - we now rely on a .rc and .bmp file at runtime that we need to ensure end up in a reasonable place. Remember that as this is still on a branch, it is a perfect time to test - you just revert back to the trunk should you have problems. However, once I merge it in (which won't be for a little while), backing out will be harder. So don't say I didn't warn you :) Also, in case you have an itch in this direction, I would be happy to accept patches to the .rc file that tweak the dialogs. Just open this .rc file in MSVC, and off you go. Thanks, Mark. From mhammond at skippinet.com.au Wed Aug 6 00:24:13 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Aug 5 09:24:25 2003 Subject: [spambayes-dev] "Donations" page for SpamBayes Message-ID: <05b301c35b54$dd64b110$f502a8c0@eden> Tim and I have been chatting with the PSF about channeling donations for the SpamBayes project their way. This has come up before on the SpamBayes list, with Tim prompting that the PSF would be a good place to send money to! Here is my first draft at the page. Please offer any suggestions you feel appropriate. Thanks, Mark. Title: Donations for the SpamBayes project Author-Email: SpamBayes@python.org Author: SpamBayes

SpamBayes donations

SpamBayes is free software. There is absolutely no obligation to pay any money to use or redistribute the software.

However, the developers of the software consider the Python Software Foundation (PSF) a charity worthy of any donation you feel appropriate. Such a donation would not only demonstrate your appreciation of this tool, but also to help advance the development of other Open Source tools tools in the future.

About the PSF

In summary, the PSF is a non-profit organization devoted to advancing the Python programming language. The PSF is registered as a US charity, so all deductions made by US residents will be fully tax deducatable (but see the PSF donations page for specific details).

For more information on the PSF, please see the PSF web site.

Why donate to the PSF?

SpamBayes is written in the Python programming language. The developers of SpamBayes believe that if it were not for Python, SpamBayes would simply not exist - the productivity gains and ease of use made it possible for a bunch of hackers to experiment freely and somehow end up with this very nice tool.

In addition, the developers are all strong advocates of Open Source Software. It gives us powerful, free tools we can use to develop software, but more importantly, the tools come with the ultimate technical reference - the source code. When our tools fail to work as we expect (as all software does at some stage), we know we have the resources necessary to do our job.

Yeah yeah, but why donate to the PSF?

Many different people have donated their time to this project, which makes it unreasonable for any individual to collect money. As the PSF is a registered charity and devoted to promoting Open Source Software, it seems the logical choice.

What will the PSF do with my money? Will it be spent on SpamBayes?

Your SpamBayes donation goes into the general PSF fund; it is not earmarked specifically for the SpamBayes project. In the future, the PSF may make additional funds available for SpamBayes, for some other worthy Open Source project, or for some other purpose within its charter. You may like to read the PSF Mission Statement for more details.

OK, OK, where do I pay?

Please make sure you have read this document, so you know exactly why you are giving money ('cos the software is so cool) and to whom (the PSF).

To donate now using PayPal, simply click here

From neal at metaslash.com Tue Aug 5 10:50:44 2003 From: neal at metaslash.com (Neal Norwitz) Date: Tue Aug 5 09:51:21 2003 Subject: [spambayes-dev] Re: [PSC] "Donations" page for SpamBayes In-Reply-To: <05b301c35b54$dd64b110$f502a8c0@eden> References: <05b301c35b54$dd64b110$f502a8c0@eden> Message-ID: <20030805135044.GO1266@epoch.metaslash.com> Looks good, with minor changes below. -- Neal -- >

About the PSF

>

In summary, the PSF is a non-profit organization devoted to advancing the
> Python programming language. The PSF is registered as a US charity, so
> all deductions made by US residents will be fully tax deducatable
> (but see the
> PSF donations page
> for specific details).

Remove "In summary" and spell deducatable -> deductible. >

In addition, the developers are all strong advocates of Open Source
> Software. It gives us powerful, free tools we can use to develop software,
> but more importantly, the tools come with the ultimate technical reference -
> the source code. When our tools fail to work as we expect (as all software
> does at some stage), we know we have the resources necessary to do our job.
>

I don't understand the last sentence. Are you saying that since the tools are open source, you have the resources to fix it? (ie, source code) If so, maybe start the last sentence, "When our Open Source tools fail" ... Hmmm, just thinking out loud.

Neal

From skip at pobox.com Tue Aug 5 10:04:14 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Aug 5 10:04:37 2003
Subject: [spambayes-dev] "Donations" page for SpamBayes
In-Reply-To: <05b301c35b54$dd64b110$f502a8c0@eden>
References: <05b301c35b54$dd64b110$f502a8c0@eden>
Message-ID: <16175.47454.438037.304069@montanaro.dyndns.org>

Mark> Tim and I have been chatting with the PSF about channeling
Mark> donations for the SpamBayes project their way. This has come up
Mark> before on the SpamBayes list, with Tim prompting that the PSF
Mark> would be a good place to send money to!

One point is to note that Aahz created a /psf/donate-spambayes.ht file yesterday. It's substantially different than yours, but it's probably worth coordinating efforts.

Mark> Here is my first draft at the page. Please offer any suggestions
Mark> you feel appropriate.

Mark>

SpamBayes donations

Mark>

SpamBayes is free software. There is absolutely no obligation
Mark> to pay any money to use or redistribute the software.

Mark>

However, the developers of the software consider the Python
Mark> Software Foundation (PSF) a charity worthy of any donation you
Mark> feel appropriate. Such a donation would not only demonstrate your
Mark> appreciation of this tool, but also to help advance the
Mark> development of other Open Source tools tools in the future.

I'd smush that into one paragraph with slight mods:

SpamBayes is free software. There is absolutely no obligation to pay any money to use or redistribute the software. However, the developers of the software consider the Python Software Foundation (PSF) a charity worthy of your support. A donation to the PSF would not only demonstrate your appreciation of this tool, but also to help advance the development of other Python-based open source tools tools in the future.

Mark>

In summary, the PSF is a non-profit organization devoted to
Mark> advancing the Python programming language. The PSF is registered
Mark> as a US charity, so all deductions made by US residents will be
Mark> fully tax deducatable (but see the <a
Mark> href="http://www.python.org/psf/donations.html">PSF donations
Mark> page</a> for specific details).

I'd can the "In summary, " prefix and change "is registered as a US charity" to "is a registered US non-profit organization". It's spelled "deductible".

Mark> ... When our tools fail to work as we expect (as all software does
Mark> at some stage), we know we have the resources necessary to do our
Mark> job.

Maybe end with: ... we know we have the resources necessary to fix them. Mark>

Many different people have donated their time to this project,
Mark> which makes it unreasonable for any individual to collect money.
Mark> As the PSF is a registered charity and devoted to promoting Open
Mark> Source Software, it seems the logical choice.

again, "registered non-profit" seems more familiar to my US eyeballs. Mark>

To donate now using PayPal, simply click here Mark>

Mark> src="https://www.paypal.com/images/x-click-but21.gif"
Mark> border="0" name="submit"
Mark> alt="Donate to the Python Software Foundation using PayPal">
Mark>

I think this may be one place you and Aahz might want to coordinate to make sure all Spambayes-related donations wind up in the same category.

Skip

From kennypitt at hotmail.com Tue Aug 5 11:08:20 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Tue Aug 5 10:08:40 2003
Subject: [spambayes-dev] "Donations" page for SpamBayes
In-Reply-To: <05b301c35b54$dd64b110$f502a8c0@eden>
References: <05b301c35b54$dd64b110$f502a8c0@eden>
Message-ID: <3F2FBA54.7060000@hotmail.com>

Mark Hammond wrote:
> Tim and I have been chatting with the PSF about channeling donations for the
> SpamBayes project their way. This has come up before on the SpamBayes list,
> with Tim prompting that the PSF would be a good place to send money to!
>
> Here is my first draft at the page. Please offer any suggestions you feel
> appropriate.
>

I'm also cool with the verbiage. Here are a few more corrections in addition to those submitted by Neal.

[snip]
>

However, the developers of the software consider the Python Software
> Foundation (PSF) a charity worthy of any donation you feel appropriate.
> Such a donation would not only demonstrate your appreciation of this tool,
> but also to help advance the development of other Open Source tools

"help to advance" might sound better here than "to help advance". Also, "tools" is doubled in "Open Source tools tools".

> tools in the future.

> >

About the PSF

>

In summary, the PSF is a non-profit organization devoted to advancing the
> Python programming language. The PSF is registered as a US charity, so
> all deductions made by US residents will be fully tax deducatable

Did you mean "all donations" here instead of "all deductions"?

> (but see the
> PSF donations page
> for specific details).

[snip]

I think routing donations to the PSF is a great idea. I don't get to code in Python as often as I would like because of the requirements of my job (combined with a shortage of free time at home ;-) ), but I would love to see it become more widespread.

--
Kenny Pitt

From aahz at pythoncraft.com Tue Aug 5 11:39:23 2003
From: aahz at pythoncraft.com (Aahz)
Date: Tue Aug 5 10:39:27 2003
Subject: [PSC] Re: [spambayes-dev] "Donations" page for SpamBayes
In-Reply-To: <16175.47454.438037.304069@montanaro.dyndns.org>
References: <05b301c35b54$dd64b110$f502a8c0@eden> <16175.47454.438037.304069@montanaro.dyndns.org>
Message-ID: <20030805143923.GC8860@panix.com>

On Tue, Aug 05, 2003, Skip Montanaro wrote:
>
> I think this may be one place you and Aahz might want to coordinate to make
> sure all Spambayes-related donations wind up in the same category.

Don't worry, we're going over this in painful, agonizing detail. ;-)

--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

This is Python. We don't care much about theory, except where it intersects with useful practice. --Aahz

From popiel at wolfskeep.com Tue Aug 5 21:24:41 2003
From: popiel at wolfskeep.com (T. Alexander Popiel)
Date: Tue Aug 5 23:24:46 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing.
In-Reply-To: Message from "Sean True" of "Mon, 04 Aug 2003 15:56:29 EDT." <00b201c35ac2$7ed88460$0201a8c0@swapwizard.com>
References: <00b201c35ac2$7ed88460$0201a8c0@swapwizard.com>
Message-ID: <20030806032441.70E1E2DEB4@cashew.wolfskeep.com>

In message: <00b201c35ac2$7ed88460$0201a8c0@swapwizard.com>
"Sean True" writes:
>
>Not exactly a patch, but it's a one minute cut and paste. I'm theorizing
>that the memory hit is not horrendous -- mostly generates sensible
>fragments
>www.microsoft.com -> www, microsoft, com
>Very_naughty_bits -> very, naughty, bits

With the two fixes mentioned earlier, here's my results on 48 days of data...

filename:           fragment          normal
ham:spam:          1978:6166       1978:6166
fp total:                  1               1
fp %:                   0.05            0.05
fn total:                 28              25
fn %:                   0.45            0.41
unsure t:                172             152
unsure %:               2.11            1.87
real cost:            $72.40          $65.40
best cost:            $44.20          $41.80
h mean:                 0.25            0.27
h sdev:                 3.71            3.80
s mean:                98.51           98.66
s sdev:                 8.97            8.56
mean diff:             98.26           98.39
k:                      7.75            7.96

In other words, for me it's a significant loss.

- Alex

From kennypitt at hotmail.com Wed Aug 6 15:30:28 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Wed Aug 6 14:30:35 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing.
In-Reply-To: <00b201c35ac2$7ed88460$0201a8c0@swapwizard.com>
References: <00b201c35ac2$7ed88460$0201a8c0@swapwizard.com>
Message-ID: <3F314944.5050809@hotmail.com>

Sean True wrote:
> This is the code that does it, in context, if not in patch form. I had
> mailed it to Tony, but not the whole list.
> Sorry about that.
>
> -- Sean
>
> Not exactly a patch, but it's a one minute cut and paste. I'm theorizing
> that the memory hit is not horrendous -- mostly generates sensible fragments
> www.microsoft.com -> www, microsoft, com
> Very_naughty_bits -> very, naughty, bits
>
[snip]
>
> ->            # Break up composite words looking for good stuff
> ->            for w in longword_re.findall(word):
> ->                if 3 <= len(w) <= maxword:
> ->                    yield word
> ->

Seems like most people are seeing this change as a loss or at best no gain. I wonder if it would make a difference in the accuracy if we returned special compound word tokens instead of returning the components as normal words? Something like:

    yield 'compound:' + word

I'm just speculating here because I, unfortunately, don't have a sufficient number of messages saved up to test this myself. Anyone want to give this variation a try?

--
Kenny Pitt

From kennypitt at hotmail.com Wed Aug 6 15:40:58 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Wed Aug 6 14:41:08 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing.
In-Reply-To: <3F314944.5050809@hotmail.com>
References: <00b201c35ac2$7ed88460$0201a8c0@swapwizard.com> <3F314944.5050809@hotmail.com>
Message-ID: <3F314BBA.7040700@hotmail.com>

Kenny Pitt wrote:
[snip]
>
>> ->            # Break up composite words looking for good stuff
>> ->            for w in longword_re.findall(word):
>> ->                if 3 <= len(w) <= maxword:
>> ->                    yield word
>> ->
>
>
> Seems like most people are seeing this change as a loss or at best no
> gain. I wonder if it would make a difference in the accuracy if we
> returned special compound word tokens instead of returning the
> components as normal words? Something like:
>
>     yield 'compound:' + word
>
> I'm just speculating here because I, unfortunately, don't have a
> sufficient number of messages saved up to test this myself. Anyone want
> to give this variation a try?
>

Uh oh, just noticed a bug in the original that I didn't catch before hitting Send. The original code above should be:

    yield w

instead of:

    yield word

The variation would then be:

    yield 'compound:' + w

Did everyone who previously tested this change catch the error? Without this fix you would be inserting the *entire* compound token into your training data once for each component word found (e.g. Very_Naughty_Bits would result in 'Very_Naughty_Bits' with a count of 3 instead of 'Very', 'Naughty', and 'Bits' each with a count of 1). This could definitely have a negative impact on the results.

--
Kenny Pitt

From T.A.Meyer at massey.ac.nz Thu Aug 7 12:21:39 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 6 19:22:24 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing.
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F619@its-xchg4.massey.ac.nz>

> Uh oh, just noticed a bug in the original that I didn't catch before
> hitting Send. The original code above should be:
>     yield w
> instead of:
>     yield word
> The variation would then be:
>     yield 'compound:' + w
>
> Did everyone who previously tested this change catch the error?

My original results, and Sean's, were pre fixing this. My later results, and Alex's were post fixing. (And Sean indicated that his retest after fixing was also a loss, although he was going to try different bucket sizes). Ironically, the incorrect method had better results for Sean, and similar for me.

Unless anyone is going to post some more results, I suspect that this will be thrown in the "nice idea but doesn't produce the needed results" bin. (If someone had the time, it would be great to take all the comments from the list, tokenizer.py and elsewhere and make a coherent summary of all the things that have been tested and what the results were...)

Anyone up for some more testing?

=Tony Meyer

From T.A.Meyer at massey.ac.nz Thu Aug 7 12:39:14 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 6 19:40:00 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing.
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F632@its-xchg4.massey.ac.nz>

> Seems like most people are seeing this change as a loss or at best no
> gain. I wonder if it would make a difference in the accuracy if we
> returned special compound word tokens instead of returning the
> components as normal words? Something like:
>
>     yield 'compound:' + word
>
> Anyone want to give this variation a try?

(I changed "yield w" to "yield 'compound:' + w")

filename:    august_no_seans          kennys    august_seans
ham:spam:         7900:15260      7900:15260      7900:15260
fp total:                  2               2               2
fp %:                   0.03            0.03            0.03
fn total:                176             172             174
fn %:                   1.15            1.13            1.14
unsure t:                501             499             491
unsure %:               2.16            2.15            2.12
real cost:           $296.20         $291.80         $292.20
best cost:           $489.60         $488.80         $485.00
h mean:                 0.63            0.62            0.61
h sdev:                 4.84            4.81            4.80
s mean:                94.52           94.57           94.56
s sdev:                18.67           18.56           18.58
mean diff:             93.89           93.95           93.95
k:                      3.99            4.02            4.02

Interesting. FN's are better than not doing anything with the compound words, but not as good as with just the word. Unsures, however, are even better.
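
For anyone wanting to rerun the comparison, the three variants discussed in this thread can be sketched roughly as below. This is only an illustration and not the code in tokenizer.py: the thread never shows longword_re or the value of maxword, so the pattern and limit here are assumptions, and split_composite is a made-up name. The real tokenizer also does its own case folding, which this sketch skips.

```python
import re

# Hypothetical stand-ins: neither longword_re nor maxword is shown in the
# thread, so this pattern and limit are assumptions for illustration only.
longword_re = re.compile(r"[a-zA-Z0-9]+")
maxword = 12


def split_composite(word, variant="fixed"):
    """Yield tokens for the pieces of a composite word.

    variant="buggy"    -> 'yield word': the whole composite, once per piece
    variant="fixed"    -> 'yield w': each piece as a normal token
    variant="compound" -> "yield 'compound:' + w": each piece, tagged
    """
    for w in longword_re.findall(word):
        if 3 <= len(w) <= maxword:
            if variant == "buggy":
                yield word
            elif variant == "compound":
                yield "compound:" + w
            else:
                yield w


if __name__ == "__main__":
    for variant in ("buggy", "fixed", "compound"):
        print(variant, list(split_composite("Very_Naughty_Bits", variant)))
```

With the buggy variant the composite itself is emitted three times for Very_Naughty_Bits, which is the count-of-3 effect Kenny described; the other two emit one token per piece.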
I might try this on a different corpus and see how it goes there. =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Aug 7 12:52:16 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Aug 6 19:53:05 2003 Subject: [spambayes-dev] Very small change for composite word tokenizing Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F64B@its-xchg4.massey.ac.nz> [Tony's results] > fp total: 2 2 2 BTW, if anyone is wondering, the two false positives that are in all of my results are: o A message from mtnsms.com telling me about the (then) new smspop service. This remains my only email from mtnsms.com, and it does look very spammy, but was definitely ham (in fact, I now use smspop). It scored 0.950. o A message from a company doing a survey about how a transaction with another company was. Again, looked a lot like spam, but wasn't, and again the only message from this company that I've received (AFAIK). It scored 0.966. I was expecting the second, so could retrieve it and this wasn't a problem. The first one was a real FP, though. Any suggestions on options I could fiddle to correct this (without creating lots of other incorrect classified messages) would be of interest! 
:)

=Tony Meyer

From T.A.Meyer at massey.ac.nz Thu Aug 7 13:05:49 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 6 20:06:28 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F671@its-xchg4.massey.ac.nz>

And, FWIW, here are results on a different corpus:

filename:          no_sean2s         kenny2s          sean2s
ham:spam:          7580:7580       7580:7580       7580:7580
fp total:                 44              47              45
fp %:                   0.58            0.62            0.59
fn total:                 16              17              17
fn %:                   0.21            0.22            0.22
unsure t:                356             348             344
unsure %:               2.35            2.30            2.27
real cost:           $527.20         $556.60         $535.80
best cost:           $592.40         $611.00         $584.40
h mean:                 3.40            3.38            3.36
h sdev:                14.19           14.21           14.14
s mean:                97.94           97.95           97.98
s sdev:                 9.43            9.49            9.40
mean diff:             94.54           94.57           94.62
k:                      4.00            3.99            4.02

Kenny's version again does better than Sean's original, although still 1 FN and 1 FP more than not having it at all, in exchange for 12 fewer unsures. (I think I would rather have the unsures).

=Tony Meyer

From skip at pobox.com Wed Aug 6 20:06:19 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 6 20:06:32 2003
Subject: [spambayes-dev] FYI -- dumbdbm is nuked
Message-ID: <16177.38907.329762.442311@montanaro.dyndns.org>

I just checked in a simple change to spambayes/dbmstorage.py which erases dumbdbm from the candidate dbm-style modules. The remaining three candidates are gdbm, bsddb3 (aka PyBSDDB, aka bsddb in Python 2.3), and bsddb (before Python 2.3). If you were using dumbdbm previously, you will have to retrain.

(Should the plain old dbm module be considered? If available, its restrictions on key and value length shouldn't be a problem.)

Skip

From mhammond at skippinet.com.au Thu Aug 7 11:21:07 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed Aug 6 20:21:15 2003
Subject: [spambayes-dev] RE: "Donations" page for SpamBayes
In-Reply-To: <3F2FBA54.7060000@hotmail.com>
Message-ID: <041701c35c79$cd4fe7b0$f502a8c0@eden>

Thanks for all the comments on the "Donations" page.
I think I got them all. The page is now online at http://www.spambayes.org/donations.html, and the source .ht file is in the SpamBayes CVS tree. If any psc members have more comments, just send them to me. Spambayes-devers should just make the change themself. Note that this page is not linked in anywhere yet. Once everyone seems happy, I will create a few links. Thanks, Mark. From mhammond at skippinet.com.au Thu Aug 7 11:40:19 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Aug 6 20:40:26 2003 Subject: [spambayes-dev] FAQ and donations Message-ID: <042601c35c7c$7b4bef60$f502a8c0@eden> As part of linking the "Donations" page in, I should update the FAQ slightly. I am thinking of moving section: 4.15 Do I have to pay for SpamBayes? Can I pay you money if I really want to? to a new section 1.2 What is the license? Does it cost anything? Can I pay anyway?". And slightly tweaking the text (ie, moving some of the "FAQ" text to the donations page), and adding a link to the new page. It makes sense to me that (a) we include license details in the same entry, and that (b) the entry be much closer to the top of the FAQ. Any objections? Mark. From mhammond at skippinet.com.au Thu Aug 7 11:45:10 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Aug 6 20:45:09 2003 Subject: [spambayes-dev] FW: [PSC] RE: "Donations" page for SpamBayes Message-ID: <042901c35c7d$2861eec0$f502a8c0@eden> I think Kevin is correct - if underlines were turned off in my Mozilla, the links would look very subtle. I guess this is in our .css? Any takers? :) Mark. -----Original Message----- From: Kevin Altis [mailto:altis@semi-retired.com] Sent: Thursday, 7 August 2003 10:35 AM To: Mark Hammond Subject: RE: [PSC] RE: "Donations" page for SpamBayes Mark, maybe it is just me and my IE 5.5x browser, but there isn't a difference in the color of the regular text and the hyperlink text, it is all dark gray/black. 
I don't have underlines turned on in my browser, so that's probably the problem, but I would suggest having a color that stands out more for links. Of course I can see the button at the bottom. Maybe it would be good to add the button at the top too. ka > -----Original Message----- > From: psc-bounces@python.org [mailto:psc-bounces@python.org]On Behalf Of > Mark Hammond > Sent: Wednesday, August 06, 2003 5:21 PM > To: spambayes-dev@python.org; psc@python.org > Subject: [PSC] RE: "Donations" page for SpamBayes > > > Thanks for all the comments on the "Donations" page. I think I got them > all. > > The page is now online at http://www.spambayes.org/donations.html, and the > source .ht file is in the SpamBayes CVS tree. If any psc members > have more > comments, just send them to me. Spambayes-devers should just make the > change themself. > > Note that this page is not linked in anywhere yet. Once everyone seems > happy, I will create a few links. > > Thanks, > > Mark. From skip at pobox.com Wed Aug 6 20:48:09 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Aug 6 20:49:18 2003 Subject: [spambayes-dev] RE: "Donations" page for SpamBayes In-Reply-To: <041701c35c79$cd4fe7b0$f502a8c0@eden> References: <3F2FBA54.7060000@hotmail.com> <041701c35c79$cd4fe7b0$f502a8c0@eden> Message-ID: <16177.41417.6823.709306@montanaro.dyndns.org> Mark, The donations page looks good to me. Skip From T.A.Meyer at massey.ac.nz Thu Aug 7 14:04:35 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Aug 6 21:05:23 2003 Subject: [spambayes-dev] FAQ and donations Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F6BA@its-xchg4.massey.ac.nz> > And slightly tweaking the text (ie, moving some of the "FAQ" > text to the donations page), and adding a link to the new page. > > It makes sense to me that (a) we include license details in > the same entry, and that (b) the entry be much closer to the > top of the FAQ. 
+1

=Tony Meyer

From T.A.Meyer at massey.ac.nz Thu Aug 7 14:12:39 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 6 21:13:25 2003
Subject: [spambayes-dev] FW: [PSC] RE: "Donations" page for SpamBayes
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F6C6@its-xchg4.massey.ac.nz>

> I think Kevin is correct - if underlines were turned off in
> my Mozilla, the links would look very subtle. I guess this
> is in our .css? Any takers? :)

I've checked in a new colour/color. It's similar to the light purple that the sidebar ends at. It looks visible to me, but I'm not attached if someone likes another colour/color more :)

=Tony Meyer

From skip at pobox.com Wed Aug 6 21:45:10 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 6 21:45:22 2003
Subject: [spambayes-dev] PGClassifier checked in
Message-ID: <16177.44838.909642.578739@montanaro.dyndns.org>

The storage module gained two new classes:

    SQLClassifier - a base class for people wishing to store their hammie
    info in SQL databases

    PGClassifier - a concrete implementation using the psycopg module to
    access a PostgreSQL database

This code has a number of problems, not the least of which is that none of the other modules and scripts in the system know about it yet. For those of you not subscribed to spambayes-checkins, Here's the checkin message:

----------------------------------------------------------------------------
**** Danger, Will Robinson! Do not use the PGClassifier class yet! ****

This is an initial stab at SQLClassifier and PGClassifier classes. This still needs a lot of work, to wit:

* I've tried to break functionality into the two classes in such a way that
  adding other SQLClassifier subclasses should be reasonably easy, but I
  don't know much about writing portable SQL. Python's DB API helps, to be
  sure, but isn't perfect.

* Scoring messages is dreadfully slow. I don't know if I'm commit()ing too
  frequently, creating too many cursors or if I have some other problem.
  My past use of SQL has generally been of the "scads of SELECTs per
  INSERT" sort of thing, so I've never paid a lot of attention to commit().

* I've encountered a couple bad cases. With the word column defined as
  bytea (PostgreSQL's binary string type), both of these calls fail if c is
  a cursor object:

      c.execute("select * from bayes where word=%s", ('report.\\n";',))
      c.execute("select * from bayes where word=%s", ('reserved\x00',))

  If the word column is defined as the more traditional varchar(128), the
  first call succeeds but the second still fails.
----------------------------------------------------------------------------

Skip

From skip at pobox.com Wed Aug 6 21:45:54 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 6 21:47:15 2003
Subject: [spambayes-dev] FAQ and donations
In-Reply-To: <042601c35c7c$7b4bef60$f502a8c0@eden>
References: <042601c35c7c$7b4bef60$f502a8c0@eden>
Message-ID: <16177.44882.988137.66691@montanaro.dyndns.org>

Mark> As part of linking the "Donations" page in, I should update the
Mark> FAQ slightly....
Mark> Any objections?

Go for it. ;-)

S

From skip at pobox.com Wed Aug 6 22:04:47 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 6 22:05:01 2003
Subject: [spambayes-dev] What do you make of this spam?
Message-ID: <16177.46015.503433.306766@montanaro.dyndns.org>

What is up with this message? I get a fair number of them (just killed three or four) and SB nails them quite well, as you can see. I don't understand the motive for sending it though. Is it just a "ping" message to see if the email address is valid?

Skip

-------------- next part --------------
An embedded message was scrubbed...
From: "guy smith" Subject: hey ghyc Date: Thu, 07 Aug 03 08:24:20 GMT Size: 1550 Url: http://mail.python.org/pipermail/spambayes-dev/attachments/20030806/7f90eab4/attachment.eml From mhammond at skippinet.com.au Thu Aug 7 13:31:09 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Aug 6 22:32:11 2003 Subject: [spambayes-dev] What do you make of this spam? In-Reply-To: <16177.46015.503433.306766@montanaro.dyndns.org> Message-ID: <049301c35c8b$f67e2ea0$f502a8c0@eden> > What is up with this message? I get a fair number of them > (just killed > three or four) and SB nails them quite well, as you can see. I don't > understand the motive for sending it though. Is it just a > "ping" message to > see if the email address is valid? Yeah - maybe just to collect the replies. Sometimes when a new Klez like worm hits, people get virus attachments from my email address (but not from me :) I am amazed at how many people reply with "I had trouble opening it - please resend it". These are people who would have absolutely no clue who I am, and they don't enquire, nor ask what it was I was sending them. Mark. From tim.one at comcast.net Wed Aug 6 23:56:58 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Aug 6 22:57:29 2003 Subject: [spambayes-dev] Very small change for composite word tokenizing In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F64B@its-xchg4.massey.ac.nz> Message-ID: [Tony Meyer] > BTW, if anyone is wondering, the two false positives that are in > all of my results are: > > o A message from mtnsms.com telling me about the (then) new > smspop service. This remains my only email from mtnsms.com, > and it does look very spammy, but was definitely ham (in fact, > I now use smspop). It scored 0.950. > > o A message from a company doing a survey about how a transaction > with another company was. Again, looked a lot like spam, but > wasn't, and again the only message from this company that I've > received (AFAIK). It scored 0.966. 
>
> I was expecting the second, so could retrieve it and this wasn't a
> problem. The first one was a real FP, though. Any suggestions on
> options I could fiddle to correct this (without creating lots of
> other incorrect classified messages) would be of interest! :)

I'm afraid there aren't any. The system has no intelligence whatsoever, it just counts tokens. It works so good 99.99% of the time it's easy to believe it must be smarter than that <0.9 wink>; but it isn't.

In the first case, I take it you didn't also discuss sms in other ham. Those are the nasty FP for me: dealing with an online business in an area (product or service) I've never emailed about before. For example, I'm a smoker and buy my cigarettes over the web, but the only clue about that you'll find in my ham training set is seven(!) msgs from my death vendor -- it took that many to convince spambayes I really wanted those (but didn't want spam trying to sell me cigars). It's often the case that training on just one will knock the next into Unsure territory, but before the first you're stuck. Whitelists don't help this either, since you can't whitelist an address for a msg (like your first) you're not expecting.

Another option is to hire a personal assistant to sort your email for you -- except most people seem to want to hide that they live for farm porn spam.

From T.A.Meyer at massey.ac.nz Thu Aug 7 18:22:24 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Aug 7 01:23:03 2003
Subject: [spambayes-dev] PGClassifier checked in
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3063@its-xchg4.massey.ac.nz>

> The storage module gained two new classes:

And now a third:

    mySQLClassifier - a concrete implementation using the MySQLdb module
    to access a mySQL database

> This code has a number of problems, not the least of which is
> that none of the other modules and scripts in the system know
> about it yet.

I still like your earlier suggestion, and that's what I implemented to test it.
For example, in my config file:

    [Storage]
    persistent_storage_file:mysql::host=localhost dbname=bayes

And in pop3proxy.py (if this does get used, some sort of central function that all the apps can use would be good, IMO).

    if self.useDB:
        if '::' in filename:
            available_sqls = {"mysql" : storage.mySQLClassifier,
                              "pgsql" : storage.PGClassifier,
                              }
            sql_type, rest = filename.split('::', 1)
            if available_sqls.has_key(sql_type.lower()):
                self.bayes = available_sqls[sql_type.lower()](filename)
            else:
                # raise some sort of InvalidClassifierError
                pass
        else:
            self.bayes = storage.DBDictClassifier(filename)
    ...

[I needed to change your ":" to a "::" because Windows & MacOS<10 use ":" in filenames, whereas I think "::" is a no-no on both, and hopefully *nix users don't want to put their hammie.db file in a path with a "::"]

> * I've tried to break functionality into the two classes
> in such a way that adding other SQLClassifier subclasses should be
> reasonably easy,

I can certainly say that adding the mySQL subclass was very easy. It's possible that even more code could go into the base class - some of the mySQL functions are very similar to the PG ones. I'll let you figure that out ;)

I'll leave the other things to those more experienced with SQL :)

=Tony Meyer

From mal at lemburg.com Thu Aug 7 10:41:49 2003
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu Aug 7 03:42:17 2003
Subject: [spambayes-dev] Re: [PSC] RE: "Donations" page for SpamBayes
In-Reply-To: <041701c35c79$cd4fe7b0$f502a8c0@eden>
References: <041701c35c79$cd4fe7b0$f502a8c0@eden>
Message-ID: <3F3202BD.4010204@lemburg.com>

Mark Hammond wrote:
> Thanks for all the comments on the "Donations" page. I think I got them
> all.
>
> The page is now online at http://www.spambayes.org/donations.html, and the
> source .ht file is in the SpamBayes CVS tree. If any psc members have more
> comments, just send them to me. Spambayes-devers should just make the
> change themself.
> > Note that this page is not linked in anywhere yet. Once everyone seems > happy, I will create a few links. Before making it public, you should probably test the donate button. Other than that, it looks OK. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Aug 07 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ From T.A.Meyer at massey.ac.nz Thu Aug 7 21:12:24 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Aug 7 04:13:16 2003 Subject: [spambayes-dev] Re: [PSC] RE: "Donations" page for SpamBayes Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3085@its-xchg4.massey.ac.nz> > Before making it public, you should probably test the donate > button. Other than that, it looks OK. Heh. I nominate Mark to spend his money testing it . =Tony Meyer From mhammond at skippinet.com.au Thu Aug 7 23:47:15 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Aug 7 08:47:29 2003 Subject: [spambayes-dev] Re: [PSC] RE: "Donations" page for SpamBayes In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3085@its-xchg4.massey.ac.nz> Message-ID: <013601c35ce2$09937990$f502a8c0@eden> [Tony] > > Before making it public, you should probably test the donate > > button. Other than that, it looks OK. > > Heh. I nominate Mark to spend his money testing it . Hah! Bet you didn't think I would! :) I hope everyone appreciates that I have now donated nearly *four* local dollars to the PSF on behalf of SpamBayes. $US2 goes a long way. And-you-damn-well-better-spend-it-wisely , Mark. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: winmail.dat Type: application/ms-tnef Size: 1764 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030807/db710129/winmail-0001.bin From mal at lemburg.com Thu Aug 7 15:57:05 2003 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu Aug 7 08:57:41 2003 Subject: [spambayes-dev] Re: [PSC] RE: "Donations" page for SpamBayes In-Reply-To: <013501c35ce2$08649680$f502a8c0@eden> References: <013501c35ce2$08649680$f502a8c0@eden> Message-ID: <3F324CA1.1020500@lemburg.com> Mark Hammond wrote: > [Tony] > >>>Before making it public, you should probably test the donate >>>button. Other than that, it looks OK. >> >>Heh. I nominate Mark to spend his money testing it . > > > Hah! Bet you didn't think I would! :) I hope everyone appreciates that I > have now donated nearly *four* local dollars to the PSF on behalf of > SpamBayes. $US2 goes a long way. > > And-you-damn-well-better-spend-it-wisely , That's one free beer at the next PyCon event ;-) Looks like everything worked just fine. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Aug 07 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ From altis at semi-retired.com Thu Aug 7 10:07:23 2003 From: altis at semi-retired.com (Kevin Altis) Date: Thu Aug 7 12:00:38 2003 Subject: [spambayes-dev] Re: [PSC] RE: "Donations" page for SpamBayes In-Reply-To: <013501c35ce2$08649680$f502a8c0@eden> Message-ID: Since the donations link is working, I'll go ahead and email Jon Udell and let him know there is a way of giving back. The donate page should at least get a mention on his blog and perhaps a mention on one of his InfoWorld articles. If there has been any other high-visibility coverage of SpamBayes, those writers should be contacted as well. 
ka > From: Mark Hammond > Sent: Thursday, August 07, 2003 5:47 AM > To: 'Meyer, Tony' > Cc: psc@python.org; spambayes-dev@python.org > Subject: RE: [spambayes-dev] Re: [PSC] RE: "Donations" page for > SpamBayes > > > [Tony] > > > Before making it public, you should probably test the donate > > > button. Other than that, it looks OK. > > > > Heh. I nominate Mark to spend his money testing it . > > Hah! Bet you didn't think I would! :) I hope everyone appreciates that I > have now donated nearly *four* local dollars to the PSF on behalf of > SpamBayes. $US2 goes a long way. > > And-you-damn-well-better-spend-it-wisely , > > Mark. From popiel at wolfskeep.com Thu Aug 7 10:32:20 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Thu Aug 7 12:32:24 2003 Subject: [spambayes-dev] Very small change for composite word tokenizing. In-Reply-To: Message from "Meyer, Tony" of "Thu, 07 Aug 2003 11:39:14 +1200." <1ED4ECF91CDED24C8D012BCF2B034F1302A9F632@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F632@its-xchg4.massey.ac.nz> Message-ID: <20030807163220.126A92DEB4@cashew.wolfskeep.com> In message: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F632@its-xchg4.massey.ac.nz> "Meyer, Tony" writes: > >(I changed "yield w" to "yield 'compound:' + w") > >filename: august_no_seans kennys > august_seans >ham:spam: 7900:15260 7900:15260 > 7900:15260 >fp total: 2 2 2 >fp %: 0.03 0.03 0.03 >fn total: 176 172 174 >fn %: 1.15 1.13 1.14 >unsure t: 501 499 491 >unsure %: 2.16 2.15 2.12 >real cost: $296.20 $291.80 $292.20 >best cost: $489.60 $488.80 $485.00 >h mean: 0.63 0.62 0.61 >h sdev: 4.84 4.81 4.80 >s mean: 94.52 94.57 94.56 >s sdev: 18.67 18.56 18.58 >mean diff: 93.89 93.95 93.95 >k: 3.99 4.02 4.02 > >Interesting. FN's are better than not doing anything with the compound >words, but not as good as with just the word. Unsures, however, are >even better. I might try this on a different corpus and see how it goes >there. 
Here's my results:

filename:   normal  fragment  compound
ham:spam:   1978:6166  1978:6166  1978:6166
fp total:   1  1  1
fp %:       0.05  0.05  0.05
fn total:   25  28  25
fn %:       0.41  0.45  0.41
unsure t:   152  172  154
unsure %:   1.87  2.11  1.89
real cost:  $65.40  $72.40  $65.80
best cost:  $41.80  $44.20  $41.40
h mean:     0.27  0.25  0.26
h sdev:     3.80  3.71  3.76
s mean:     98.66  98.51  98.65
s sdev:     8.56  8.97  8.51
mean diff:  98.39  98.26  98.39
k:          7.96  7.75  8.02

The 'compound:' modifier on the generated tokens makes the fragmentation code neutral for me, again. - Alex From jm at jmason.org Thu Aug 7 12:20:43 2003 From: jm at jmason.org (Justin Mason) Date: Thu Aug 7 14:21:15 2003 Subject: [spambayes-dev] testing tweaks Message-ID: <20030807182048.A95B516F18@jmason.org> Hey SBers, Have you guys considered testing how a tweak effects DB size -- ie. including that in the test results output? I find that's a pretty major factor in a lot of cases in SpamAssassin. cheers, --j. From popiel at wolfskeep.com Thu Aug 7 12:56:59 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Thu Aug 7 14:57:04 2003 Subject: [spambayes-dev] testing tweaks In-Reply-To: Message from jm@jmason.org (Justin Mason) of "Thu, 07 Aug 2003 11:20:43 PDT." <20030807182048.A95B516F18@jmason.org> References: <20030807182048.A95B516F18@jmason.org> Message-ID: <20030807185700.02EC52DEB4@cashew.wolfskeep.com> In message: <20030807182048.A95B516F18@jmason.org> jm@jmason.org (Justin Mason) writes: > > Hey SBers, > >Have you guys considered testing how a tweak effects DB size -- ie. including >that in the test results output? I find that's a pretty major factor in >a lot of cases in SpamAssassin. We've looked at DB size a couple times in the past, but some of the complicating factors of this are that the actual DB size (as opposed to token counts) is highly dependent on what sort of backend you use, and people have very different thresholds for what is acceptable space usage.
As a result, it's very difficult to get any consensus on what sort of DB size behaviour is acceptable. Add into the mix that the largest effector of DB size is training style... and no two of us use the same style, and there's little support been made for simulating the training styles of different people for testing. (There's the bare beginning of a framework in the incremental stuff I did, but there's insufficient training rules built for simulating different styles.) I personally am happy to give a couple gigabytes to training data (aka my historical mail record... I never delete any mail anymore), and up to about 50 megabytes to the live database (it's currently bounded at about 20 megabytes by my training style). I'm sure that Tim's sister would have different priorities. So yes, we've considered it, but only barely, and not recently. This is one area where theory has fallen to lassitude. - Alex From tim.one at comcast.net Thu Aug 7 19:48:11 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Aug 7 18:48:46 2003 Subject: [spambayes-dev] testing tweaks In-Reply-To: <20030807182048.A95B516F18@jmason.org> Message-ID: [Justin Mason] > Have you guys considered testing how a tweak effects DB size -- ie. > including that in the test results output? I find that's a pretty > major factor in a lot of cases in SpamAssassin. I paid a lot of attention in the early days, since I was running training sets with tens of thousands of messages, and used an entirely in-memory Python dict to hold all the stats. Most gimmicks didn't make a difference worth noting. There is one hack in our tokenizer to reduce database size: tokens exceeding 12 characters are replaced by a synthesized token just recording the first character, and floor(len(token)/10)*10. Testing showed that recording "long tokens" in full didn't make any difference to results, but bloated the database with many fat hapaxes. 
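[A minimal sketch of the long-token hack just described, not the actual tokenizer code. The 12-character cutoff and the floor(len(token)/10)*10 bucket come from the message; the "skip:" prefix is borrowed from clue lists such as 'skip:c 10' that appear elsewhere in this archive; the function names are made up for illustration.]

```python
def tokenize_word(word, max_len=12):
    """Yield the token for one whitespace-delimited word.

    Words of up to max_len characters are kept as they are. Longer
    words collapse to a summary token: the first character plus the
    length rounded down to a multiple of 10.
    """
    n = len(word)
    if n <= max_len:
        yield word
    else:
        # Hypothetical spelling of the synthesized token.
        yield "skip:%s %d" % (word[0], n // 10 * 10)


def tokenize(text):
    """Tokenize text one non-whitespace run at a time."""
    for word in text.split():
        for token in tokenize_word(word):
            yield token
```

With this sketch, every word from 13 to 19 characters that starts with the same letter shares a single database entry, which is where the space saving would come from.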
In effect, then, no matter what the other tokenization gimmicks, we don't create tokens with more than 12 characters, and create a number of tokens approximately equal to the number of non-whitespace runs in the message. The option replace_nonascii_chars is also very effective at reducing database size (it replaces each high-bit and control byte with a question mark), and actually helps English-speaking users nail Asian spam. It would also presumably murder Asian ham, but that's not a problem I have . That option is off by default in the codebase, but on by default in the Outlook addin. Other gimmicks we don't use had huge effects on database size. Character 5-grams were murder on database size. They also performed worse, so dropping them was no pain. Schemes also looking at token pairs (bigrams) more than doubled the database size. If I ever get time for it, I'd like to pursue a specific mixed unigram-bigram scheme worked out with Gary Robinson. For example, given "penis size", that can be viewed as a bigram, or as two unigrams, or as two unigrams *and* a bigram. The last choice isn't so good because it systematically creates highly correlated clues, which leads to mistakes that don't make sense to a human eye (I'll claim that experienced spambayes users are sympathetic to the mistakes it makes -- spambayes judgments are "intuitive", in some real sense). But with enough effort, it's possible to "tile" a message with non-overlapping unigrams and bigrams, so that each token contributes to exactly one scored entity. The trick is to do this in a way that maximizes the overall strength of the entities that get scored. So, for example, and simplifying too much, if the bigram "penis size" has a spamprob closer to 0.0 or 1.0 than either of the unigrams "penis" and "size", view it as a bigram; but if "penis" has a spamprob closer to 0.0 or 1.0 than "penis size", view it as two unigrams instead. 
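[To make the simplified rule above concrete, here is a greedy left-to-right sketch. It is an illustration only and not the scheme that was tested: the stated goal is to maximize overall strength across the whole message, which a greedy pass does not guarantee. The spamprob table is assumed to be a plain dict, unknown entries are assumed to score 0.5, and all names are hypothetical.]

```python
def strength(prob):
    """Distance of a spamprob from the neutral value 0.5."""
    return abs(prob - 0.5)


def tile(tokens, spamprob, unknown=0.5):
    """Cover tokens with non-overlapping unigrams and bigrams.

    A pair of adjacent tokens is scored as one bigram only when the
    bigram's spamprob is farther from 0.5 than both of its unigrams.
    Every input token ends up in exactly one returned entity.
    """
    def prob(key):
        return spamprob.get(key, unknown)

    tiles = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            first, second = tokens[i], tokens[i + 1]
            pair = first + " " + second
            best_single = max(strength(prob(first)), strength(prob(second)))
            if strength(prob(pair)) > best_single:
                tiles.append(pair)
                i += 2
                continue
        tiles.append(tokens[i])
        i += 1
    return tiles
```

Because each token is consumed once, the correlated-clue problem of scoring both unigrams and their bigram does not arise in this sketch.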
I only had time to run a few tests on that, and it looked very promising, learning faster than our current pure-unigram scheme, and doing at least as well on all error measures. It was (of course) slower to score, and the database more than doubled in size. For my own use, it would have been worth it, since my personal databases are still relatively tiny (about 1,000 training msgs total), and the code runs too fast for me to notice it now. I suspect, but don't know, that this mixed scheme would do significantly better on short messages. From altis at semi-retired.com Thu Aug 7 17:18:38 2003 From: altis at semi-retired.com (Kevin Altis) Date: Thu Aug 7 19:11:53 2003 Subject: [spambayes-dev] Re: [PSC] RE: "Donations" page for SpamBayes In-Reply-To: Message-ID: How's this for progress?! http://weblog.infoworld.com/udell/2003/08/07.html#a771 ka p.s. In case nobody notices that Nancy Tindle and I have the same address, that SpamBayes donation to the PSF was from us, she wears the PayPal pants in the family. I bet nobody thought to calculate family rankings into the donations page. ;-) http://www.egenix.com/files/python/psf-donations.html > -----Original Message----- > From: Kevin Altis > > Since the donations link is working, I'll go ahead and email Jon Udell and > let him know there is a way of giving back. The donate page > should at least > get a mention on his blog and perhaps a mention on one of his InfoWorld > articles. If there has been any other high-visibility coverage of > SpamBayes, > those writers should be contacted as well. > > ka > > > From: Mark Hammond > > Sent: Thursday, August 07, 2003 5:47 AM > > To: 'Meyer, Tony' > > Cc: psc@python.org; spambayes-dev@python.org > > Subject: RE: [spambayes-dev] Re: [PSC] RE: "Donations" page for > > SpamBayes > > > > > > [Tony] > > > > Before making it public, you should probably test the donate > > > > button. Other than that, it looks OK. > > > > > > Heh. I nominate Mark to spend his money testing it . > > > > Hah! 
Bet you didn't think I would! :) I hope everyone > appreciates that I > > have now donated nearly *four* local dollars to the PSF on behalf of > > SpamBayes. $US2 goes a long way. > > > > And-you-damn-well-better-spend-it-wisely , > > > > Mark. From tim.one at comcast.net Thu Aug 7 20:20:44 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Aug 7 19:21:16 2003 Subject: [spambayes-dev] Cool Outlook mystery Message-ID: Our bug 782709 is pretty interesting! Tony just added a good clue to it. I'll partly confirm it here, and add another bit of evidence. After retraining and rescoring from scratch, there's a particular msg in my Ham folder showing a spam score of 3% in my Spam column. "show spam clues" rates it much higher: Spam Score: 0.180576 word spamprob #ham #spam '*H*' 0.722595 - - '*S*' 0.083747 - - Some of the token scores are amazing: 'to:no real name:2**0' 0.342745 7 7 'header:To:1' 0.398161 7 9 'to:2**0' 0.398161 7 9 'header:Date:1' 0.64742 1 4 'header:Message-Id:1' 0.764668 0 1 'subject:.' 0.764668 0 1 'subject: ' 0.846122 0 2 'header:From:1' 0.871695 1 16 Notice I said this was a ham message, and I trained on it as ham. Therefore it shouldn't be possible that I see *any* token (let alone 3) in this message with a ham-count of 0. I've certainly got, e.g., way more than 1+4=5 training messages with a Date header too, and way more than 16 with a "To" header, etc. In my professional opinion, something is royally hosed . My observations so far match Tony's that it's confined to tokens in headers, so it's probably not a database bug. From jm at jmason.org Thu Aug 7 18:12:33 2003 From: jm at jmason.org (Justin Mason) Date: Thu Aug 7 20:12:51 2003 Subject: [spambayes-dev] testing tweaks In-Reply-To: Message-ID: <20030808001238.F404C16F0E@jmason.org> Tim Peters writes: > If I ever get time for it, I'd like to pursue a specific mixed > unigram-bigram scheme worked out with Gary Robinson. 
For example, given > "penis size", that can be viewed as a bigram, or as two unigrams, or as two > unigrams *and* a bigram. The last choice isn't so good because it > systematically creates highly correlated clues, which leads to mistakes that > don't make sense to a human eye (I'll claim that experienced spambayes users > are sympathetic to the mistakes it makes -- spambayes judgments are > "intuitive", in some real sense). But with enough effort, it's possible to > "tile" a message with non-overlapping unigrams and bigrams, so that each > token contributes to exactly one scored entity. The trick is to do this in > a way that maximizes the overall strength of the entities that get scored. > So, for example, and simplifying too much, if the bigram "penis size" has a > spamprob closer to 0.0 or 1.0 than either of the unigrams "penis" and > "size", view it as a bigram; but if "penis" has a spamprob closer to 0.0 or > 1.0 than "penis size", view it as two unigrams instead. > > I only had time to run a few tests on that, and it looked very promising, > learning faster than our current pure-unigram scheme, and doing at least as > well on all error measures. It was (of course) slower to score, and the > database more than doubled in size. For my own use, it would have been > worth it, since my personal databases are still relatively tiny (about 1,000 > training msgs total), and the code runs too fast for me to notice it now. I > suspect, but don't know, that this mixed scheme would do significantly > better on short messages. That's interesting -- it's like the idea of "decomposing" tokens and using the strongest output of the result. e.g. for "Free!", decompose that to "free!" "Free" "free" and use the strongest result of those 4 lookups. Yeah, I'm interested because I'd be pretty sure that compound-word breakup tweak would increase db size, but that doesn't seem to be mentioned... --j. 
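[A minimal sketch of the "decompose and keep the strongest" idea from the message above, under stated assumptions: the set of variants (original, lowercased, trailing punctuation stripped, both) is a guess at what "decomposing" means, the spamprob table is a plain dict, unknown tokens score 0.5, and the function names are invented for this example.]

```python
def variants(token):
    """Return the token and its simplified forms, without duplicates."""
    stripped = token.rstrip("!?.,;:")
    seen = []
    for candidate in (token, token.lower(), stripped, stripped.lower()):
        if candidate and candidate not in seen:
            seen.append(candidate)
    return seen


def strongest_variant(token, spamprob, unknown=0.5):
    """Pick the variant whose spamprob is farthest from 0.5.

    Ties go to the earliest variant, so an unknown token is returned
    unchanged.
    """
    return max(variants(token),
               key=lambda t: abs(spamprob.get(t, unknown) - 0.5))
```

For "Free!" this performs four lookups ("Free!", "free!", "Free", "free") and scores only the winner, so the database does not need a separate entry for every spelling.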
From tim.one at comcast.net Fri Aug 8 00:31:06 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Aug 7 23:31:43 2003 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/Outlook2000 msgstore.py, 1.61, 1.62 In-Reply-To: Message-ID: [Mark Hammond] > Modified Files: > msgstore.py > Log Message: > Fix [ 782709 ] not match between actual score and what's shown in > outlook We can't trust potentially large properties in the data used > to create the msg object. Thanks Tim, Tony, everyone. Excellent, Mark! I confirm that all the (subtle, unless you're looking for them) symptoms went away for me. This had a major-league good effect on my score distributions too: I've been mildly puzzled for a long time that the scores-after-training in my ham and spam Outlook data had much higher variance than in standalone non-Outlook tests. Now I suspect my modest 1,000-msg training database is much bigger than I really need <0.9 wink>. BTW, the ham msg I posted about before, scoring 0.18 or 0.03 (depending on where you looked), now scores a much more satisfying 0.000972835. In effect, the Outlook addin has been acting much like a body-only classifier? Wow. No wonder I had to keep training Laura Creighton's two-liners as ham . From mhammond at skippinet.com.au Sat Aug 9 16:27:04 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sat Aug 9 01:27:05 2003 Subject: [spambayes-dev] Merge outlook dialog branch soon Message-ID: <000501c35e36$def8ad30$f502a8c0@eden> After a few days to see if Outlook-007 has fatal flaws, I intend merging the Outlook dialog branch onto the trunk. This will mean the next binary will come with this new code. Any objections or suggestions? Mark. From skip at pobox.com Sat Aug 9 13:32:09 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Aug 9 13:32:16 2003 Subject: [spambayes-dev] RE: [Spambayes] Loosing spam database? 
In-Reply-To: <019501c35e12$d84c57d0$f502a8c0@eden> References: <019501c35e12$d84c57d0$f502a8c0@eden> Message-ID: <16181.12313.339459.9415@montanaro.dyndns.org> >>>>> "Mark" == Mark Hammond writes: Mark> It looks like you have a very old version of the program. Maybe we should set sys.excepthook to a function which dumps version info when dumping a traceback. That obviously wouldn't help with people running old(er) versions, though we'd be able to tell they were not running the latest. Skip From eloff at helpmygame.com Sat Aug 9 13:14:07 2003 From: eloff at helpmygame.com (Daniel Eloff) Date: Sat Aug 9 15:14:24 2003 Subject: [spambayes-dev] Excellent program Message-ID: A really excellent approach to the problem of spam. I'm going through the classification code and I came across two things I don't understand so far. I'm trying to understand what these lines of code do in the classifier.py and chi2.py files (yesterday was my first brush with the Python language).

In chi2_spamprob:

    clues = self._getclues(wordstream)
    for prob, word, record in clues:

Here we first see prob, which is used extensively throughout the function. It's obviously very key to understanding the function, but I have no idea what value it is, or why for that matter. word and record don't seem to be used...

And in chi2Q:

    assert v & 1 == 0

What does this statement do? (I'm familiar with assert statements, but what does (v & 1) == 0 mean?)

Thanks to anybody who can help me understand this. -- -Daniel Eloff From tim.one at comcast.net Sat Aug 9 16:39:45 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Aug 9 15:40:18 2003 Subject: [spambayes-dev] Excellent program In-Reply-To: Message-ID: [Daniel Eloff] > ... > I'm trying to understand what these lines of code do in the > classifier.py and chi2.py files: > (yesterday was my first brush with the python language). Work your way thru the Python Tutorial, then: it will save a ton of frustration.
The Tutorial that comes with Python is most suitable for experienced (in some other language) programmers. > in chi2_spamprob: > > clues = self._getclues(wordstream) > for prob, word, record in clues > > here we first see prob which is used extensively > throughout the function. It's obviosuly very key to understanding > the function, but I have no idea what value it is, or why > for that matter. word and record don't seem to be used... You need to learn more about Python first. Then when I say that the loop iterates over a sequence of 3-tuples, you won't have to stop and wonder about each word too. Crawl first, run later . > and in chi2Q: > > assert v & 1 == 0 > > What's this statment do? (I'm familair with assert statments, but not > what's (v & 1) == 0 mean? Same thing as in C (except for operator precedence): it's asserting that the last bit in v is 0, or, IOW, that v is an even integer. This means the same thing: assert v % 2 == 0 From mhammond at skippinet.com.au Sun Aug 10 10:21:36 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sat Aug 9 19:21:29 2003 Subject: [spambayes-dev] ComputerWorld SpamBayes articles Message-ID: <01f001c35ecc$fb5cad00$f502a8c0@eden> Thanks Larry! I hadn't seen it. I hope you don't mind me forwarding this... Mark. -----Original Message----- From: Larry Fresinski Sent: Saturday, 9 August 2003 9:30 PM To: Mark Hammond Subject: RE: [Spambayes-announce] ANNOUNCE: Version 007 of the Outlook pluginavailable, and donations scheme up and running Mark, If you haven't seen it yet, I've told ComputerWorld about SPAMbayes work. It's in there Aug. 4 edition and online here... 
http://www.computerworld.com/softwaretopics/software/groupware/story/0,10801 ,83689,00.html http://www.computerworld.com/softwaretopics/software/groupware/story/0,10801 ,83684,00.html -Larry From mhammond at skippinet.com.au Sun Aug 10 18:27:39 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Aug 10 03:27:29 2003 Subject: [spambayes-dev] Merge outlook dialog branch soon In-Reply-To: <000501c35e36$def8ad30$f502a8c0@eden> Message-ID: <000001c35f10$e1dd5480$f502a8c0@eden> > After a few days to see if Outlook-007 has fatal flaws, I > intend merging the > Outlook dialog branch onto the trunk. This will mean the > next binary will > come with this new code. Done! The branch is dead. Thanks, Mark. From T.A.Meyer at massey.ac.nz Mon Aug 11 14:23:59 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 10 21:24:33 2003 Subject: [spambayes-dev] ComputerWorld SpamBayes articles Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF36F3@its-xchg4.massey.ac.nz> > If you haven't seen it yet, I've told ComputerWorld about > SPAMbayes work. It's in there Aug. 4 edition and online here... Is there any way we can stop people saying that SpamBayes doesn't work with Outlook Express? Sure the Outlook plugin doesn't (which is probably why it's called an Outlook plugin ;), but the other tools do. Is the stuff on our website not clear enough, perhaps? Or are journalists just too lazy to check what they publish? =Tony Meyer From mhammond at skippinet.com.au Mon Aug 11 12:49:19 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Aug 10 21:49:22 2003 Subject: [spambayes-dev] RE: [Spambayes] does SpamBayes work with Outlook Rules Wizard ? In-Reply-To: <16182.61558.922112.812084@montanaro.dyndns.org> Message-ID: <006101c35faa$c8604fe0$f502a8c0@eden> [CC-ing spambayes-dev and dropping spambayes] Skip: > Actually, if you want something to run before Outlook, use > pop3proxy or > imapfilter. 
The interface is web-based, so it's obviously > not going to be > tightly integrated with Outlook, but it's functional. > > > > Another possibility might be to combine the best of the > Outlook and the > pop3proxy. When Outlook starts, you could fire up a proxy > (like the core of > pop3proxy) and reconfigure Outlook to get mail from the proxy > and the proxy > to get mail from the real POP server, restoring the POP3 > settings upon exit. > The user interface would still be embedded in Outlook, but > would control the > proxy via XML-RPC. This would separate the UI from the proxy engine > completely, allowing the proxy engine to be reused with plugins for > different mailers. > > I think that is actually a great long term idea. By splitting the "proxy" part out well enough, we could still move the Spam and Unsure to Outlook folders, and keep the same level of integration. This wouldn't be easy, but would be a nice feature. Note that the new Outlook dialogs are already using the "OptionsClass" objects, in much the same way as the existing web interface does. However, I'm not going to drive any effort like this for at least a few months! I've got to get things as they stand now back to a managable level. Mark. From mhammond at skippinet.com.au Mon Aug 11 12:55:09 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Aug 10 21:55:05 2003 Subject: [spambayes-dev] ComputerWorld SpamBayes articles In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF36F3@its-xchg4.massey.ac.nz> Message-ID: <006201c35fab$9921c780$f502a8c0@eden> > Is there any way we can stop people saying that SpamBayes doesn't work > with Outlook Express? Probably by removing the *bold* text on my download page that says "it does not work with Outlook express" . However, that is in the context of the addin, and is qualified in the next sentence, and elsewhere on the page. 
I'd be happy to move this to sourceforge, but keep giving up when I have to define whatever it is that sourceforge asks me to define to release a file. If someone set up everything and mailed me the 2 steps I would need to go through to do it via sourceforge, I promise I would do them Mark. From T.A.Meyer at massey.ac.nz Mon Aug 11 15:14:33 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 10 22:15:15 2003 Subject: [spambayes-dev] does SpamBayes work with OutlookRules Wizard ? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF375D@its-xchg4.massey.ac.nz> > I think that is actually a great long term idea. By > splitting the "proxy" part out well enough, we could still > move the Spam and Unsure to Outlook folders, and keep the > same level of integration. Would you then have Exchange and Hotmail proxies as well? Or still do filtering for them in the same way? Does this fit with the 'generic filter' Mark/Sean idea at all? =Tony Meyer From tim.one at comcast.net Mon Aug 11 00:20:40 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Aug 10 23:21:13 2003 Subject: [spambayes-dev] ComputerWorld SpamBayes articles In-Reply-To: <006201c35fab$9921c780$f502a8c0@eden> Message-ID: [Mark Hammond] > ... > I'd be happy to move this to sourceforge, but keep giving up when I > have to define whatever it is that sourceforge asks me to define to > release a file. If someone set up everything and mailed me the 2 > steps I would need to go through to do it via sourceforge, I > promise I would do them I'm afraid it can't be reduced to two steps.
Alas, it's one of those things that's easy to do after you've done it, but takes forever to explain due to the sheer number of buttons you have to click all over different screens; the workflow for an SF file release is plain convoluted, but not truly difficult. If you'd like to release files from SF, I'd be happy to help -- I used to do PLabs Python releases all the time on SF, and don't have to think about it. Your part would be to upload the installer, via anonymous ftp, to the incoming directory at upload.sf.net, then tell me the name of the uploaded file. It takes about 10 steps after that to release the file, but they only take about 2 minutes total (provided you don't have to think for 5 minutes each 10 times before each step ). -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 1036 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030810/abe44f99/winmail.bin From mhammond at skippinet.com.au Mon Aug 11 14:25:52 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Aug 10 23:25:52 2003 Subject: [spambayes-dev] does SpamBayes work with OutlookRules Wizard ? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF375D@its-xchg4.massey.ac.nz> Message-ID: <00d401c35fb8$456e8d00$f502a8c0@eden> > > I think that is actually a great long term idea. By > > splitting the "proxy" part out well enough, we could still > > move the Spam and Unsure to Outlook folders, and keep the > > same level of integration. > > Would you then have Exchange and Hotmail proxies as well? Or still do > filtering for them in the same way? Yeah, we could not proxy them effectively. But yeah, we could still use both techniques. But then we are back where we started for a significant number of users. > Does this fit with the 'generic filter' Mark/Sean idea at all? No impact at all really. Mark. 
-------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 1868 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030811/d06edf29/winmail.bin From T.A.Meyer at massey.ac.nz Mon Aug 11 16:50:39 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 10 23:51:22 2003 Subject: [spambayes-dev] testing tweaks Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF37D9@its-xchg4.massey.ac.nz> [Tim explains his & Gary's uni/bigram idea] > I only had time to run a few tests on that, and it looked very > promising, You're right about that! I had a little play around with this idea over the weekend and it certainly improves the results. I was lazy, so I did this the easiest way (well, what seemed the easiest way), producing *token* bigrams, rather than *word* bigrams. This means that "This is a test" produces "This", "test" and "This test", since "is" and "a" don't generate tokens. It also means that our synthetic tokens become part of bigrams (so a bigram could be skip information, or headers and so on). Whether it's better or worse than word bigrams, I don't know (that's what testing is for!). Also as a result of the laziness, I left in the circular bigram that was created of the last token and first token; since the first token is likely to be fairly constant, I doubt this makes much difference. Here are preliminary results [using "timtest.py -n5"]. The two columns with "fresh" in the filename are results with a fresh-from-cvs spambayes. The "tim1s" column are results where I mistakenly allowed duplicate tokens to be generated (if a token had a stronger difference than both the bigram with the previous token and the next token then it is used twice). The "tim2" columns are with this mistake removed, and the "tim3" columns are like tim2, but also with Kenny's variant of Sean's split_compound_words idea enabled. 
filename:   sa_freshs   sa_tim3s    tim1s       tim2s       tim3s       sa_tim2s    freshs
ham:spam:   7580:7580   7580:7580   7900:15260  7900:15260  7580:7580   7900:15260  7900:15260
fp total:   44          47          47          2           2           2           2
fp %:       0.58        0.62        0.62        0.03        0.03        0.03        0.03
fn total:   16          12          13          176         94          128         127
fn %:       0.21        0.16        0.17        1.15        0.62        0.84        0.83
unsure t:   356         315         320         501         497         482         500
unsure %:   2.35        2.08        2.11        2.16        2.15        2.08        2.16
real cost:  $527.20     $545.00     $547.00     $296.20     $213.40     $244.40     $247.00
best cost:  $592.40     $843.20     $825.20     $489.60     $379.20     $402.20     $416.40
h mean:     3.40        4.07        4.07        0.63        1.19        0.92        0.94
h sdev:     14.19       15.55       15.49       4.84        7.05        5.98        6.09
s mean:     97.94       98.76       98.74       94.52       96.23       96.02       95.99
s sdev:     9.43        7.80        7.88        18.67       14.79       15.54       15.64
mean diff:  94.54       94.69       94.67       93.89       95.04       95.10       95.05
k:          4.00        4.06        4.05        3.99        4.35        4.42        4.37

So a *big* win on the second set (which is from my actual mail; the other corpus is based on the SpamAssassin public corpus) in terms of fn's. In fact the mistake variant did best - almost halving the number of fn's. Not sure about the first set - 3 more fp's, but 3 fewer fn's and quite a drop in the number of unsures. I care more about the second set, anyway. My (bsddb based) databases ballooned from about 1.5MB to about 10MB, but what do I care?

Although the second set was all from my actual mail, the training set I use is much smaller - about 400 ham and 4000 spam (a crazy imbalance, but it works...). These results are from this smaller set, using "timtest.py -n3", first without the adjustment, and then with.
filename:   reals       real_tims   real_tim_adjs  real_adjs
ham:spam:   754:8884    754:8884    754:8884       754:8884
fp total:   0           0           0              0
fp %:       0.00        0.00        0.00           0.00
fn total:   193         72          583            455
fn %:       2.17        0.81        6.56           5.12
unsure t:   638         470         541            438
unsure %:   6.62        4.88        5.61           4.54
real cost:  $320.60     $166.00     $691.20        $542.60
best cost:  $316.00     $197.20     $435.40        $391.60
h mean:     2.88        5.94        1.09           1.39
h sdev:     11.20       16.56       5.79           6.87
s mean:     92.54       95.98       84.92          87.91
s sdev:     21.09       14.86       32.43          29.64
mean diff:  89.66       90.04       83.83          86.52
k:          2.78        2.87        2.19           2.37

Again, a clear win for me (although the ham mean does jump up quite a bit).

=Tony Meyer

From tim.one at comcast.net Mon Aug 11 01:23:01 2003
From: tim.one at comcast.net (Tim Peters)
Date: Mon Aug 11 00:23:35 2003
Subject: [spambayes-dev] Outlook release on SourceForge
Message-ID:

[python-dev'ers, read the last paragraph and weep]

I created an "Outlook Addin" package on the spambayes file release page, and added Mark's 0.7 installer to it:

http://sf.net/project/showfiles.php?group_id=61702

I also ripped off much of the text from Mark's Starship page (this is what you get to if you click on the "Version 0.7" on the page above):

http://sf.net/project/shownotes.php?release_id=177282

Mark, if you object to any of this, don't be shy!

Meaningless statistics:

Project UNIX name: spambayes
Registered: 2002-09-03 22:33
Activity Percentile (last week): 99.4015%

"Activity ratings" on SF are computed by an arcane formula, but generally speaking the higher the percentile the more "active" a project is. With the percentile above, this is how many SF projects rank below spambayes:

>>> round(66620 * 0.994015)
66221.0
>>>

In particular, we rank higher than Python(!) now, which currently sits at 99.0583%. Python used to be among the 10 most active projects every week, but lost a lot when it stopped releasing files from SF (downloads count a lot toward the activity ranking). It's possible that downloads of the Outlook addin could boost spambayes to that lofty level.
It's also possible that I'll get a job working on Python someday . From popiel at wolfskeep.com Sun Aug 10 23:05:10 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Mon Aug 11 01:05:14 2003 Subject: [spambayes-dev] testing tweaks In-Reply-To: Message from "Meyer, Tony" of "Mon, 11 Aug 2003 15:50:39 +1200." <1ED4ECF91CDED24C8D012BCF2B034F1302BF37D9@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF37D9@its-xchg4.massey.ac.nz> Message-ID: <20030811050510.9AE5C2DDF1@cashew.wolfskeep.com> In message: <1ED4ECF91CDED24C8D012BCF2B034F1302BF37D9@its-xchg4.massey.ac.nz> "Meyer, Tony" writes: >[Tim explains his & Gary's uni/bigram idea] >Here are preliminary results [using "timtest.py -n5"]. Hey, where's the patch? It's kind of hard to generate corroborating evidence without a patch... - Alex From T.A.Meyer at massey.ac.nz Mon Aug 11 18:52:34 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 11 01:53:18 2003 Subject: [spambayes-dev] testing tweaks Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF387F@its-xchg4.massey.ac.nz> > Hey, where's the patch? It's kind of hard to generate > corroborating evidence without a patch... Good point . Attached are "diff -u"s - is that right? Anyone wise in the ways of Python is welcome to point out the inefficiencies in the code; I'm happy to learn :) =Tony Meyer -------------- next part -------------- A non-text attachment was scrubbed... Name: classifier.diff Type: application/octet-stream Size: 3777 bytes Desc: classifier.diff Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030811/b9a2f1d3/classifier.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: msgs.diff Type: application/octet-stream Size: 324 bytes Desc: msgs.diff Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030811/b9a2f1d3/msgs.obj -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Tester.diff Type: application/octet-stream Size: 1828 bytes Desc: Tester.diff Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030811/b9a2f1d3/Tester.obj From mhammond at skippinet.com.au Mon Aug 11 17:29:57 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Aug 11 02:29:52 2003 Subject: [spambayes-dev] RE: Outlook release on SourceForge In-Reply-To: Message-ID: <01af01c35fd1$fc66b960$f502a8c0@eden> > Mark, if you object to any of this, don't be shy! Sounds good to me - thanks! In the meantime, I changed the link on my starship page to the sourceforge download URL - http://prdownloads.sourceforge.net/spambayes/SpamBayes-Outlook-Setup-007.exe ?download - is there any evidence this will or will not have the same effect as sending then to the "file releases" page? Let's-beat-those-damn-pythoneers ly, Mark. From anthony at interlink.com.au Mon Aug 11 18:00:01 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Aug 11 03:00:16 2003 Subject: [spambayes-dev] RE: Outlook release on SourceForge In-Reply-To: <01af01c35fd1$fc66b960$f502a8c0@eden> Message-ID: <200308110700.h7B701kk022337@localhost.localdomain> >>> "Mark Hammond" wrote > Sounds good to me - thanks! In the meantime, I changed the link on my > starship page to the sourceforge download URL - > http://prdownloads.sourceforge.net/spambayes/SpamBayes-Outlook-Setup-007.exe > ?download - is there any evidence this will or will not have the same effect > as sending then to the "file releases" page? Geez - I had a brief look at the starship http logs, and according to them, there's been 6400 downloads of the 006 installer in the first 9 days of August alone. If Mark can get all of those users to kick in $0 each, then pretty soon he might fail to be very very wealthy. Ah well, you can always sell off the excess goodwill generated to someone like Microsoft. Anthony. 
From mark.winder4 at btinternet.com Mon Aug 11 11:30:11 2003 From: mark.winder4 at btinternet.com (Mark Winder) Date: Mon Aug 11 05:28:29 2003 Subject: [spambayes-dev] Newbie is well pythoned! Help! Message-ID: <000801c35feb$2a493560$0200a8c0@bigdaddy> Hi, I tried to use Spambayes but have fallen at the first post. From what I've read I could easily use it if you gave me only a tincy wincy bit more info. It may not in fact be your fault that I've failed. I'd like to use with Outlook Express under Win98 SE, so I've installed the Python 2.3 using the windows installer. I accepted all the defaults, and I also looked at the advanced settings that said make file associations with .py etc etc, it was checked. I then downloaded and unpacked the Spambayes source code to C:\temp so that it was in the directory c:\temp\spambyes-1.0-a4 Next, according to your instructions you say Once you've downloaded and unpacked the source archive, do the regular setup.py build; setup.py install dance, Well typing setup.py at the dos prompt produced "bad command or filename" Clicking on it seemed to do something, but its not clear what. After this I'm stuck. As you will have gathered, I don't know python. I alost ried entering the python GUI, but this didn't help. Can you give me a clue ? regards, Mark Winder. . --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.507 / Virus Database: 304 - Release Date: 04/08/03 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030811/9e9cde9d/attachment.htm From paulhar at netapp.com Mon Aug 11 15:04:56 2003 From: paulhar at netapp.com (Hargreaves, Paul) Date: Mon Aug 11 08:05:33 2003 Subject: [spambayes-dev] Correction and new question to the FAQ Message-ID: <765B6B38B4D29D498077F8E644E23B7F01A1CE04@nlhoe2k02.europe.netapp.com> "3.10 - It is recommended that you configure auto-complete to keep at least a few days of Spam around," I'm sure this is supposed to be "auto-archive" as mentioned in the paragraph above. A question I have that may/may not be suitable for the FAQ: Xxx. Can I use SpamBayes to filter into more than 2 categories (i.e. mail sorting rather than spam detection). Regards, Paul Hargreaves Systems Engineer Network Appliance Unit 1160 Elliot Court Herald Avenue Coventry Business Park Coventry, CV5 6UB From kennypitt at hotmail.com Mon Aug 11 11:59:26 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Aug 11 10:59:38 2003 Subject: [spambayes-dev] FolderSelector problem in Define Filters dialog Message-ID: <3F37AF4E.1060209@hotmail.com> Recently I reinstalled Outlook using a different profile name. Rather than copying my old plug-in .INI file to the new profile name, I decided to just redo my settings from scratch in the SpamBayes Manager. When I went into Define Filters I found that the Browse buttons under Certain Spam and Possible Spam did *nothing*, while the Browse button for "Filter the following folders" worked fine. I tracked the problem down to two fixes, for which I have attached diffs. The first change to opt_processors.py allowed the FolderSelector dialog to display when Browse was clicked, but I couldn't click OK and the status did not update at the bottom of the dialog. The second fix in FolderSelector.py seems to have corrected the problem, and I now have my spam folders selected and filtering enabled. 
--
Kenny Pitt

-------------- next part --------------
Index: opt_processors.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/opt_processors.py,v
retrieving revision 1.2
diff -u -r1.2 opt_processors.py
--- opt_processors.py	10 Aug 2003 07:26:50 -0000	1.2
+++ opt_processors.py	11 Aug 2003 14:50:20 -0000
@@ -186,7 +186,7 @@
         if is_multi:
             ids = self.option.get()
         else:
-            ids = [self.optin.get()]
+            ids = [self.option.get()]
         from dialogs import FolderSelector
         if self.option_include_sub:
             cb_state = self.option_include_sub.get()
-------------- next part --------------
Index: FolderSelector.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/FolderSelector.py,v
retrieving revision 1.21
diff -u -r1.21 FolderSelector.py
--- FolderSelector.py	10 Aug 2003 07:26:49 -0000	1.21
+++ FolderSelector.py	11 Aug 2003 14:47:11 -0000
@@ -340,8 +340,8 @@
         # If single-select, the checked state is not used, just the
         # selected state.
         try:
-            h = win32gui.SendMessage(self.list, commctrl.TVM_GETSELECTEDITEM,
-                                     commctrl.TVGN_CARET, h)
+            h = win32gui.SendMessage(self.list, commctrl.TVM_GETNEXTITEM,
+                                     commctrl.TVGN_CARET, commctrl.TVI_ROOT)
         except win32gui.error:
             return
         info = self._GetLVItem(h)

From tim.one at comcast.net Mon Aug 11 12:58:14 2003
From: tim.one at comcast.net (Tim Peters)
Date: Mon Aug 11 11:58:47 2003
Subject: [spambayes-dev] Slashdotted
Message-ID:

Barry W pointed out that spambayes was "the winner" in this comparative review linked to from slashdot.org today (under the "Comparison of Bayesian POP3 Spam Filters" headline on Slashdot's front page):

http://home.dataparty.no/kristian/reviews/bayesian/

Yawn.
From skip at pobox.com Mon Aug 11 12:04:30 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Aug 11 12:04:50 2003 Subject: [spambayes-dev] Re: [Spambayes] Slashdotted In-Reply-To: References: Message-ID: <16183.48782.267489.993991@montanaro.dyndns.org> Tim> Barry W pointed out that spambayes was "the winner" in this Tim> comparative review linked to from slashdot.org today (under the Tim> "Comparison of Bayesian POP3 Spam Filters" headline on Slashdot's Tim> front page): Tim> http://home.dataparty.no/kristian/reviews/bayesian/ Tim> Yawn . I see that it mentions "even grandma can use it". I don't suppose your sisters are grandmas yet, are they? Perhaps the author was referring to them. ;-) Skip From skip at pobox.com Mon Aug 11 14:51:51 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Aug 11 14:52:33 2003 Subject: [spambayes-dev] Re: [Spambayes] Bug In-Reply-To: <3F37DFD1.8080809@newsguy.com> References: <3F37DFD1.8080809@newsguy.com> Message-ID: <16183.58823.134841.786455@montanaro.dyndns.org> John> Got this from SpamBayes, just downloaded and installed the latest John> from the site today and installed for the first time. This has been seen more and more recently. I think we could "correct" messages which have raw 8-bit text in their headers before feeding them to the email package. Thus this particular message's Subject: header would get converted from Cvnf stability money-maker w!hen life seem5 to expensive, you ?eed to get ah?ad to (I think): =?ISO-8859-1?Q?Cvnf stability money-maker w!hen life seem5 to expensive, you =F1eed to g= et ah=EAad= You'd obviously have to make an educated guess about the actual encoding. I have a function that does a pretty good job. The list of encodings to try could be an option with a default oriented toward ISO-8859-* and its various Windows variants. 
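A rough sketch of the "guess an encoding, then encode the header" idea described above. It is not the function mentioned in the message, which is not shown in this thread. It uses the standard library's email.header module as found in current Python; the name encode_raw_header and the default list of candidate encodings are assumptions made for illustration.

```python
from email.header import Header, decode_header, make_header


def encode_raw_header(raw, encodings=("utf-8", "iso-8859-1")):
    """Turn raw header bytes into an RFC 2047 encoded header string.

    Pure-ASCII input is returned unchanged. Otherwise each candidate
    encoding is tried in order and the first that decodes cleanly wins.
    This is only an educated guess about the sender's real charset.
    """
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        pass
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return Header(text, enc).encode()
    # Nothing fitted: replace the undecodable bytes rather than fail.
    return Header(raw.decode("ascii", "replace"), "utf-8").encode()


# Two raw 8-bit bytes like those in the Subject: header quoted above.
raw = b"you \xf1eed to get ah\xeaad"
encoded = encode_raw_header(raw)
print(encoded)
# Decoding the encoded form again recovers readable text.
print(str(make_header(decode_header(encoded))))
```

The order of the candidate list matters: iso-8859-1 accepts any byte sequence, so it only makes sense as the last entry, with stricter encodings (UTF-8, or a Windows code page if wanted) ahead of it.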
Skip From igor at gameplasma.com Mon Aug 11 15:47:27 2003 From: igor at gameplasma.com (Igor "JI" Murashkin) Date: Mon Aug 11 16:01:05 2003 Subject: [spambayes-dev] SpamBayes - on a Linux mail server? Message-ID: <000001c36041$799fc380$660010ac@swordsmaster> Hello, I have a Linux box, on which I have a mail server installed. In short, I fetch my mail from the server that's on Linux, but I have enough users -- 20-30 who use my mail server daily. My question is, would it be possible to install SpamBayes so that it filters everything out server side, so that the spam never even reaches the users, being instantly discarded? That would be lovely, as I wouldn't want all my users to worry about installing their own spam filters. Thanks! -Igor -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030811/8ce100fc/attachment.htm From tim.one at comcast.net Mon Aug 11 17:30:42 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Aug 11 16:31:10 2003 Subject: [spambayes-dev] RE: Outlook release on SourceForge In-Reply-To: <01af01c35fd1$fc66b960$f502a8c0@eden> Message-ID: [Mark Hammond] > ... > In the meantime, I changed the link on my starship page to the < sourceforge download URL - > http://prdownloads.sourceforge.net/spambayes/SpamBayes-Outlook-Set > up-007.exe ?download - is there any evidence this will or will not > have the same effect as sending then to the "file releases" page? I expect it's the same, but don't know for sure. > Let's-beat-those-damn-pythoneers ly, We already beat those losers; I'm searching for a worthier adversary now . From vanhorn at whidbey.com Mon Aug 11 14:34:57 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Mon Aug 11 16:35:00 2003 Subject: [spambayes-dev] SpamBayes - on a Linux mail server? 
References: <000001c36041$799fc380$660010ac@swordsmaster> Message-ID: <3F37FDF1.EC35B437@whidbey.com> Philosophically, the answer is not to do this, as no two people are likely to have exactly the same reaction to different messages. Now, how much of a difference is this? Probably quite a bit. I have two instances of pop3proxy running, one handles four or five accounts on my local workstation, the other one handles four accounts on another workstation. Mine works like a charm. The second one, which handles two low-volume accounts of mine and my wife's mail, has definite problems because I didn't know every list she subscribed to. I know that anything that involves quilts or beads is going to be ham for her, but she has some general interests as well. So there were some messages that looked like spam to SpamBayes, and I corroborated that judgment in the training, so even when I go back and take those same messages and train as ham, it's going to be a while before similar messages stop ending up in her unsure folder. Now, if you can figure out a way to make sure that each user only trains on their own mail, then it probably would work. But the default is for SB to train by reinforcing its existing decisions, which introduces some real problems in the multi-user scenario. I am, however, considering offering the proxy or IMAP on my mail server, which would allow the users to drop their volume of mail that they actually pick up. They will be able to do their own training, but they won't have to do their own local installs. But sharing a training database looks like a loser to me. Van Igor \"JI\" Murashkin wrote: > Hello, I have a Linux box, on which I have a mail server installed. > In short, I fetch my mail from the server that's on Linux, but I have > enough users -- 20-30 who use my mail server daily. 
> My question is, would it be possible to install SpamBayes so that it > filters everything out server side, so that the spam never even > reaches the users, being instantly discarded? That would be lovely, as > I wouldn't want all my users to worry about installing their own spam > filters. Thanks!-Igor > > ---------------------------------------------------------------- > _______________________________________________ > spambayes-dev mailing list > spambayes-dev@python.org > http://mail.python.org/mailman/listinfo/spambayes-dev > -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030811/7db79a9d/attachment.htm From richie at entrian.com Tue Aug 12 00:21:43 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Aug 11 18:21:51 2003 Subject: [spambayes-dev] Newbie is well pythoned! Help! In-Reply-To: <000801c35feb$2a493560$0200a8c0@bigdaddy> References: <000801c35feb$2a493560$0200a8c0@bigdaddy> Message-ID: <805gjvscr7frqthndr8k7gg0h2qu0au7de@4ax.com> Hi Mark, > do the regular setup.py build; setup.py install dance, > Well typing setup.py at the dos prompt produced "bad command or filename" You need to start up a Command Prompt and do something like this: > cd \temp\spambayes-1.0a4 > c:\python23\python setup.py install You may need to tweak the commands to fit your machine, but those are the steps of the dance on Win98. The reason you can't just double-click setup.py is that it expects a command-line argument ('install'). 
Where a script doesn't expect an argument, and this includes pop3proxy.py, you should be able to just double-click it. However, if you do that, you might not see any errors it outputs (not that pop3proxy ever causes any errors 8-) so even for argumentless scripts, running them from a command prompt is a good idea. The spambayes scripts, including pop3proxy.py, will be installed into \python23\scripts (or its equivalent on your machine). Don't run the ones in \temp\spambayes-1.0a4 - you can delete that once you've installed the software. By the way, the "spambayes-dev" list is really for discussion of the development of the spambayes code - the "spambayes" list is for discussion of installation and usage. -- Richie Hindle richie@entrian.com From T.A.Meyer at massey.ac.nz Tue Aug 12 16:29:44 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 11 23:30:34 2003 Subject: [spambayes-dev] RE: [Spambayes] Training good messages has no effect Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF39B4@its-xchg4.massey.ac.nz> > > Current version is 0.6, latest is 0.6. > > Current is actually 0.7, but we keep managing to screw up the > website where this is stored. "Check Latest Version" should > start reporting 0.7 soon. What about my suggestion that we make installing Version.cfg separate from making all? So a "make install" *doesn't* install Version.cfg, and a "make version install" command is necessary? I can see this accidentally happening all the time... =Tony Meyer From popiel at wolfskeep.com Mon Aug 11 22:19:26 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Tue Aug 12 00:19:30 2003 Subject: [spambayes-dev] testing tweaks In-Reply-To: Message from "Meyer, Tony" of "Mon, 11 Aug 2003 17:52:34 +1200." 
<1ED4ECF91CDED24C8D012BCF2B034F1302BF387F@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF387F@its-xchg4.massey.ac.nz>
Message-ID: <20030812041926.933632DE8C@cashew.wolfskeep.com>

In message: <1ED4ECF91CDED24C8D012BCF2B034F1302BF387F@its-xchg4.massey.ac.nz> "Meyer, Tony" writes:
>> Hey, where's the patch? It's kind of hard to generate
>> corroborating evidence without a patch...
>
>Good point. Attached are "diff -u"s - is that right?

It looks like only the classifier change is needed; the others look like null changes to me. Is this correct?

Also, for those of us still running 2.2, it's nice to stick in the 'from __future__ import generators' at the top of the file, while using yield.

I'm now having the following error thrown:

Traceback (most recent call last):
  File "timcv.py", line 167, in ?
    main()
  File "timcv.py", line 164, in main
    drive(nsets)
  File "timcv.py", line 113, in drive
    d.test(hamstream, spamstream)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/TestDriver.py", line 265, in test
    t.predict(spam, True, new_spam)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/Tester.py", line 92, in predict
    prob = guess(example)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 225, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 452, in _getclues
    q = wordstream.next()
AttributeError: 'Msg' object has no attribute 'next'

Given that the error clearly didn't happen on the first message it tried to classify, I suspect it's triggered by a peculiarity of one of my messages... as a random guess, I'd say perhaps a MIME multipart/digest or some other thing that has an embedded rfc822 section?
In any case, I'm looking at how I might rephrase the classifier to avoid this issue... - Alex From mhammond at skippinet.com.au Tue Aug 12 15:29:46 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Aug 12 00:29:48 2003 Subject: [spambayes-dev] RE: [Spambayes] Training good messages has no effect In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF39B4@its-xchg4.massey.ac.nz> Message-ID: <00cf01c3608a$5ce8f470$f502a8c0@eden> > What about my suggestion that we make installing Version.cfg separate > from making all? So a "make install" *doesn't* install > Version.cfg, and > a "make version install" command is necessary? > > I can see this accidentally happening all the time... I agree 100%. Feel free to beat me too it :) I'd be happy with a single target, say 'version' that also did the install. Indeed, I would be surprised if you can convince 'make' to work so that "make version install" updates version.cfg, but neither "make version" nor "make install" do. Mark. -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 1788 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030812/fa5fcb11/winmail-0001.bin From tim.one at comcast.net Tue Aug 12 01:45:23 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Aug 12 00:45:59 2003 Subject: [spambayes-dev] testing tweaks In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF37D9@its-xchg4.massey.ac.nz> Message-ID: There was a thread partly about the mixed unigram/bigram scheme last November, starting here: http://mail.python.org/pipermail/spambayes/2002-November/001912.html It wasted time starting with a unigram+bigram+trigram scheme, and wasted more time trying to use hash codes to reduce the database burden (we've regretted that every time we've tried it). 
The spambayes results on my main test data were already so good then that testing couldn't verify any claimed improvement (it could only demonstrate that a suggested idea did worse). The "I only had time to run a few tests on that, and it looked very promising" refers to later small tests I never wrote up. They were closest to what a msg late in this thread called "bix" (exact (non-hashing) bigrams). Like Tony did, I was really using token bigrams (and trigrams, at the start). There were many mysteries related to bigrams created from header tokens, as pointed out in several of that thread's messages. Another mystery covered there is that split-on-whitespace still beat "extract words" for the fundamental tokenization gimmick. It's a mystery because the only "reason" I ever found for s-o-w winning with unigrams was the weak context info it offers (like "free!!" is more likely to be spammy than "free"). Moving to bigrams (or higher) really should give much stronger context info than we get from keeping punctuation. So many mysteries, so little time ... From T.A.Meyer at massey.ac.nz Tue Aug 12 17:58:01 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Aug 12 00:58:40 2003 Subject: [spambayes-dev] Contact page Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A1F@its-xchg4.massey.ac.nz> A while back we had some suggestions about making the mailing list details more prominent, and one of the suggestions was that we have a separate contact page with this (and possibly other) information. (I'm too lazy to give you links, but it's there in the archives somewhere). I've checked in (and uploaded) a stab at a contact page (no other pages have been changed). This would be linked from the side bar - instead of "Email Us", have "Contact Us", and link it to the page, rather than spambayes@python.org. Comments? 
=Tony Meyer From skip at pobox.com Tue Aug 12 01:41:37 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Aug 12 01:41:45 2003 Subject: [spambayes-dev] Contact page In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A1F@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A1F@its-xchg4.massey.ac.nz> Message-ID: <16184.32273.139414.488082@montanaro.dyndns.org> Tony> I've checked in (and uploaded) a stab at a contact page (no other Tony> pages have been changed). This would be linked from the side bar Tony> - instead of "Email Us", have "Contact Us", and link it to the Tony> page, rather than spambayes@python.org. +1. Skip From anthony at interlink.com.au Tue Aug 12 16:58:36 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Aug 12 01:59:12 2003 Subject: [spambayes-dev] Re: [Spambayes] SpamBayes Problem In-Reply-To: <012f01c36093$7108d890$f502a8c0@eden> Message-ID: <200308120558.h7C5wjlS016774@localhost.localdomain> >>> "Mark Hammond" wrote > You could try resetting all the toolbars - see > http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambaye > s/Outlook2000/docs/troubleshooting.html, and the bit about deleting > "outcmd.dat". I'm thinking a page http://spambayes.sourceforge.net/troubleshooting.html that simply redirects to the above page could be a good thing. -- Anthony Baxter It's never too late to have a happy childhood. From mhammond at skippinet.com.au Tue Aug 12 17:04:26 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Aug 12 02:04:27 2003 Subject: [spambayes-dev] RE: [Spambayes] SpamBayes Problem In-Reply-To: <200308120558.h7C5wjlS016774@localhost.localdomain> Message-ID: <014e01c36097$96b490d0$f502a8c0@eden> > I'm thinking a page http://spambayes.sourceforge.net/troubleshooting.html > that simply redirects to the above page could be a good thing. Yeah - or "outlook/troubleshooting.html" etc - then we could run amok ;) I-better-go-find-'html-for-dummies' ly, Mark. 
From T.A.Meyer at massey.ac.nz Tue Aug 12 19:33:16 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Aug 12 02:33:59 2003
Subject: [spambayes-dev] testing tweaks
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A53@its-xchg4.massey.ac.nz>

> It looks like only the classifier change is needed; the
> others look like null changes to me. Is this correct?

Sorry, it probably is. I made the other changes when I was trying to make the change to tokenizer (to get word, not token, bigrams), before I reconsidered and moved to classifier.

> Also, for those of us still running 2.2, it's nice to stick
> in the 'from __future__ import generators' at the top of the
> file, while using yield.

Sorry, I'll try and be more considerate...

> I'm now having the following error thrown:
[...]
> File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 452, in _getclues
>   q = wordstream.next()
> AttributeError: 'Msg' object has no attribute 'next'

I recall seeing something like this - it's the reason for the if type(wordstream) stuff in learn and unlearn. Sometimes the wordstream is a Msg object rather than a generator. However, this:

"""
if type(wordstream) == types.GeneratorType:
    wordstream = self._enhance_wordstream(wordstream)
"""

should probably be:

"""
if type(wordstream) == type(Msg):
    wordstream = self._enhance_wordstream(wordstream.as_tokens)
else:
    wordstream = self._enhance_wordstream(wordstream)
"""

(which then does require some of the other changes. I'm not sure if Msg is in the namespace, either).

Perhaps the difference is that I was using timtest and you were using timcv? I can't recall when I saw the error, but it was definitely only in learning/unlearning, not getting probability.

Off to look at Tim's original patch...
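One way to sidestep the type check discussed above, sketched under an assumption this thread does not confirm: that the message object supports iteration (defines __iter__) even though it is not itself an iterator. The Msg class below is a hypothetical stand-in, not the one from the test harness, and the helper names are made up for illustration. The code is written for current Python, where the built-in next() replaces the .next() method seen in the traceback.

```python
class Msg:
    """Hypothetical stand-in for a message object: it can be iterated
    over, but it has no next-item method of its own."""

    def __init__(self, text):
        self.text = text

    def __iter__(self):
        # A real message would tokenize its contents here.
        return iter(self.text.split())


def as_token_iterator(wordstream):
    """Return an iterator over tokens whether wordstream is a generator,
    a list, or an iterable message object."""
    return iter(wordstream)


def first_token(wordstream):
    """Fetch the first token without caring what kind of stream it is."""
    return next(as_token_iterator(wordstream))


print(first_token(Msg("cheap pills now")))       # message object
print(first_token(w for w in ["alpha", "beta"]))  # generator
```

Calling iter() on a generator simply returns the generator, so wrapping every incoming wordstream this way costs nothing for the case that already worked.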
=Tony Meyer From T.A.Meyer at massey.ac.nz Tue Aug 12 20:13:43 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Aug 12 03:14:36 2003 Subject: [spambayes-dev] Correction and new question to the FAQ Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A5D@its-xchg4.massey.ac.nz> > "3.10 - It is recommended that you configure auto-complete to keep at > least a few days of Spam around," > > I'm sure this is supposed to be "auto-archive" as mentioned in the > paragraph above. Thanks. This is now fixed. > A question I have that may/may not be suitable for the FAQ: > > Xxx. Can I use SpamBayes to filter into more than 2 categories (i.e. > mail sorting rather than spam detection). I'm not sure if this is a FAQ or not, but the short answer is "no - look at POPfile instead". The longer answer is "maybe, read through the archives for discussions about how you could implement this". =Tony Meyer From vanhorn at whidbey.com Tue Aug 12 12:03:58 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Tue Aug 12 14:04:02 2003 Subject: [spambayes-dev] Correction and new question to the FAQ References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A5D@its-xchg4.massey.ac.nz> Message-ID: <3F392C0E.70ABD77E@whidbey.com> I think the long answer would be more along the lines of this: "SpamBayes wasn't designed for that, while POPFiles was. However, like any two-state filter, a series of SpamBayes instances could be cascaded to filter into any number of categories, although each would need to be trained. The result might be very effective, but not efficient." Telling anyone to "read through the archives" is cruel, given the volume here. I think I read every post that doesn't relate to Outlook, and even some of those, daily. I'd hate to have to go back and find anything if I got behind. 
Van "Meyer, Tony" wrote: > > "3.10 - It is recommended that you configure auto-complete to keep at > > least a few days of Spam around," > > > > I'm sure this is supposed to be "auto-archive" as mentioned in the > > paragraph above. > > Thanks. This is now fixed. > > > A question I have that may/may not be suitable for the FAQ: > > > > Xxx. Can I use SpamBayes to filter into more than 2 categories (i.e. > > mail sorting rather than spam detection). > > I'm not sure if this is a FAQ or not, but the short answer is "no - look > at POPfile instead". The longer answer is "maybe, read through the > archives for discussions about how you could implement this". > > =Tony Meyer > > _______________________________________________ > spambayes-dev mailing list > spambayes-dev@python.org > http://mail.python.org/mailman/listinfo/spambayes-dev -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From tim.one at comcast.net Tue Aug 12 15:35:12 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Aug 12 14:35:52 2003 Subject: [spambayes-dev] RE: [Spambayes] SpamBayes : contribution In-Reply-To: <159ijvgekl9go0bgtu0hq26rk195hq6d2l@4ax.com> Message-ID: [Tim] >> If we wait a few months, the PSF is currently paying a lawyer to >> review a "joint ownership" contribution agreement, enabling the PSF >> and a contributor to effectively share copyright. [Richie Hindle] > That's a good idea - could you keep us (or more likely spambayes-dev) > up to date with what happens? Ta. Oh sure. It will be publicized when it happens . 
An already out-of-date draft is in the Proposed Contributor Agreement section at http://www.python.org/psf/psf-contributor-agreement.html From skip at pobox.com Tue Aug 12 16:17:32 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Aug 12 16:17:42 2003 Subject: [spambayes-dev] Simply n-way classifier Message-ID: <16185.19292.111612.453684@montanaro.dyndns.org> I checked in a simple n-way classifier to the contrib directory just now. Executing 'python nway.py -h' should give interested parties a reasonable idea of how to use it and create databases for it. Skip From skip at pobox.com Tue Aug 12 22:54:39 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Aug 12 22:54:50 2003 Subject: [spambayes-dev] Correction and new question to the FAQ In-Reply-To: <3F392C0E.70ABD77E@whidbey.com> References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A5D@its-xchg4.massey.ac.nz> <3F392C0E.70ABD77E@whidbey.com> Message-ID: <16185.43119.305872.694102@montanaro.dyndns.org> Van> I think the long answer would be more along the lines of this: Van> "SpamBayes wasn't designed for that, while POPFile was. However, Van> like any two-state filter, a series of SpamBayes instances could be Van> cascaded to filter into any number of categories, although each Van> would need to be trained. The result might be very effective, but Van> not efficient." I added a question and answer about this topic to the FAQ today. I also added a toy app to the contrib directory (nway.py) modelled after hammiefilter.py. Skip From mhammond at skippinet.com.au Wed Aug 13 15:11:02 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Aug 13 00:10:48 2003 Subject: [spambayes-dev] www.python.org/sf/sb/12345? Message-ID: <062101c36150$e9224100$f502a8c0@eden> I really like the www.python.org/sf/xxxxx cgi script - is there any hope of getting one for us? If it is hard for our sf based site, can we beg, borrow or steal space on python.org?
Maybe that could be a benefit of being a "PSF sponsoring project" (of which we are a founding member) . Or maybe on starship? putting-off-the-bookwork ly, Mark. From T.A.Meyer at massey.ac.nz Wed Aug 13 17:13:01 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Aug 13 00:13:38 2003 Subject: [spambayes-dev] www.python.org/sf/sb/12345? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D04@its-xchg4.massey.ac.nz> > I really like the www.python.org/sf/xxxxx cgi script - is > there any hope of getting one for us? Definitely +1. I've thought this many times, too. =Tony Meyer From skip at pobox.com Wed Aug 13 00:39:15 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Aug 13 00:39:28 2003 Subject: [spambayes-dev] www.python.org/sf/sb/12345? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D04@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D04@its-xchg4.massey.ac.nz> Message-ID: <16185.49395.656633.177189@montanaro.dyndns.org> >> I really like the www.python.org/sf/xxxxx cgi script - is there any >> hope of getting one for us? Tony> Definitely +1. I've thought this many times, too. Okay, I have it running at http://staging.musi-cal/com/cgi-bin/sf Usage is the same as the one for the Python project. If you fail to give an id it prompts for one. Where can this go? Does SF support the ability to run CGI scripts? Skip From skip at pobox.com Wed Aug 13 00:42:13 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Aug 13 00:42:27 2003 Subject: [spambayes-dev] www.python.org/sf/sb/12345? In-Reply-To: <062101c36150$e9224100$f502a8c0@eden> References: <062101c36150$e9224100$f502a8c0@eden> Message-ID: <16185.49573.966870.718177@montanaro.dyndns.org> Mark> I really like the www.python.org/sf/xxxxx cgi script - is there Mark> any hope of getting one for us? If it is hard for our sf based Mark> site, can we beg, borrow or steal space on python.org?
Maybe that Mark> could be a benefit of being a "PSF sponsoring project" (of which Mark> we are a founding member) . Or maybe on starship? I can tweak the version I have on the Musi-Cal staging server to not collide with the Python project's use, then see if one script can serve both projects (default the groupid to 5470). Alternatively, the SB version can run as "sb" instead of "sf". Either way is fine with me. Skip From T.A.Meyer at massey.ac.nz Wed Aug 13 19:53:32 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Aug 13 02:54:08 2003 Subject: [spambayes-dev] Correction and new question to the FAQ Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D7A@its-xchg4.massey.ac.nz> > I think the long answer would be more along the lines of this: > > "SpamBayes wasn't designed for that, while POPFile was. > However, like any two-state filter, a series of SpamBayes > instances could be cascaded to filter into any number of > categories, although each would need to be trained. The > result might be very effective, but not efficient." That is a much better answer :) > Telling anyone to "read through the archives" is cruel, given > the volume here. I should have said "search through the archives", not read. Googling for 'site:mail.python.org spambayes "n-way"' brings up lots of posts, which are probably the relevant ones, or at least a starting point. (Although the change from "pipermail-21" to "pipermail" breaks almost all the links, it's fixable by hand). =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Aug 13 19:57:39 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Aug 13 02:58:34 2003 Subject: [spambayes-dev] Correction and new question to the FAQ Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D7B@its-xchg4.massey.ac.nz> > I added a question and answer about this topic to the FAQ > today. I also added a toy app to the contrib directory > (nway.py) modelled after hammiefilter.py. Any reason why the FAQ doesn't mention that script? 
BTW, did you try the script out? If so, how did it go, at first glance? =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Aug 13 21:59:45 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Aug 13 05:00:25 2003 Subject: [spambayes-dev] New Outlook Dialogs Problem Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D8A@its-xchg4.massey.ac.nz> If I haven't got enough training information to enable filtering, the "enable filtering" box isn't greyed out anymore. If I try to check it, it doesn't check, and I get this traceback: Traceback (most recent call last): File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line 286, in OnCommand self.ApplyHandlingOptionValueError(handler.OnCommand, wparam, lparam) File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line 245, in ApplyHandlingOptionValueError self.dialog_def.caption, mb_flags) AttributeError: ProcessorDialog instance has no attribute 'dialog_def' (I'd try and fix this myself, but the new dialog code still has me overwhelmed ;) =Tony Meyer From barry at python.org Wed Aug 13 12:47:18 2003 From: barry at python.org (Barry Warsaw) Date: Wed Aug 13 07:47:19 2003 Subject: [spambayes-dev] Correction and new question to the FAQ In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D7A@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D7A@its-xchg4.massey.ac.nz> Message-ID: <1060775205.14202.8.camel@anthem> On Wed, 2003-08-13 at 02:53, Meyer, Tony wrote: > I should have said "search through the archives", not read. Googling > for 'site:mail.python.org spambayes "n-way"' brings up lots of posts, > which are probably the relevant ones, or at least a starting point. > (Although the change from "pipermail-21" to "pipermail" breaks almost > all the links, it's fixable by hand). If "by hand" you mean one of mine that participated in adding a redirect in mail.python.org's Apache config... you were right! Or left. Or, well let's give both my hands a hand. 
:) -Barry From edrubins at andisplace.com Wed Aug 13 10:01:43 2003 From: edrubins at andisplace.com (Ed Rubinsky) Date: Wed Aug 13 09:02:01 2003 Subject: [spambayes-dev] FAQ entry for Eudora clients Message-ID: <5.1.0.14.0.20030813085310.00b118b0@localhost> The attached diff will add a entry on configuring Eudora mail clients for use with pop3proxy.py, as well as a few spelling corrections Best, Ed -------------- next part -------------- *** faq.txt Wed Aug 13 07:48:48 2003 --- orig_faq.txt Wed Aug 13 07:10:22 2003 *************** *** 71,77 **** the `I'm not a programmer but still want to help`_ question for more details. ! * Donate money to the Python Software Foundations. For more information, including why you would want to donate to the PSF, please see our `donations page`_. --- 71,77 ---- the `I'm not a programmer but still want to help`_ question for more details. ! * Dontate money to the Python Software Foundations. For more information, including why you would want to donate to the PSF, please see our `donations page`_. *************** *** 92,98 **** Spambayes or help other users. 2. The `Spambayes developers list`_ provides a forum for people ! maintaining and improving the package. 3. The `Spambayes announcements list`_ is a low-volume list where announcements about new releases are posted. --- 92,98 ---- Spambayes or help other users. 2. The `Spambayes developers list`_ provides a forum for people ! maintaining and improving the pacakge. 3. The `Spambayes announcements list`_ is a low-volume list where announcements about new releases are posted. *************** *** 344,415 **** .. _IMAP: http://spambayes.sf.net/applications.html#imap - How do I configure Eudora for use with Spambayes? - ------------------------------------------------- - - Note: The following instructions have been verified using Eudora 5.1 - under Windows. If anyone is using Eudora under Max OS please let us - know if the configuration is the same as Windows. 
- - Eudora does not allow configuring the server port through the - normaloptions dialogue. However a large number of options are exposed - in an intitialization file (eudora.ini) read at startup. The contents - of the initialization file are documented by clicking on Help->Topics - and searching on EUDORA.INI (you may want to print this help page for - future reference.) Depending on how you installed Eudora, eudora.ini - is located either in the Eudora install directory or the user's - setting directory - (C:\Documents and Settings\userid\ApplicationData\Qualcomm\Eudora\eudora.ini on my system.) - - 1. Locate eudora.ini. - - 2. Make two copies - eudoraok.ini for backup and eudorame.ini to - modify. - - 3. Configure pop3proxy for each of Eudora's personalities' POP3 - servers, specifying a separate port for each. For example 1110, 1120, - 1130 and 1140 for four personalities. Do the same for smtpproxy - for - example 1115, 1125, 1135 and 1145 corresponding to the four POP3 - servers. - - 4. Close Eudora. - - 5. Open eudorame.ini with a text editor - wordpad for example. DO NOT - USE A WORD PROCESSOR TO EDIT THE INITIALIZATION FILE. - - 6. Ffind the section starting with [Settings]. This contains settings - for the dominant personality. - - 7. Find the line beggining POPAccount. The last part of the account - name starting with @ is the server. Change it to @localhost. - - 8. Find the lines beggining SMTPServer and POPServer. They will have - the server names defined for your dominant personality. - - 9. Change both server names to localhost - - 10, Add the following two lines. Use whatever ports you assigned to - pop3proxy and smtpproxy for the dominant personality. - POPPort=1110 - SMTPPort=1115 - - 11. Setting for other personalities are kept in sections begging with - [Persona-personality_name]. For each personality make the same changes - as you made for the dominant personality, substituting the proper port - numbers. - - 12. 
Copy eudorame.ini to eudora.ini and re-start Eudora. - - 13. In the password dialog for each personality you should see - localhost where you used to see the actual server name. This may take - some getting used to at first. Since every personality will now have a - server named localhost you will have to know what order Eudora prompts - for the user id's and passwords. - - 14. If there are any problems, close Eudora, copy eudoraok.ini to - eudora.ini and restart Eudora. This will restore Eudora's original - configuration until the problem can be resolved. - Outlook Plugin ============== --- 344,349 ---- From skip at pobox.com Wed Aug 13 10:36:42 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Aug 13 10:36:54 2003 Subject: [spambayes-dev] missized fonts Message-ID: <16186.19706.77495.119643@montanaro.dyndns.org> I'm adding Ed Rubinsky's Eudora configuration q&a to the FAQ. I used ``...`` around some text I intended to be displayed in a fixed-width font. All the text is rendered in my browser (Safari, Mac OS X) in Courier; however, all such text is about half the height of the surrounding regular text. Here's how one little snippet is defined in HTML: eudoraok.ini Something needs tweaking in style.css, but I know next to nothing about it. I added TT.literal { font-size: 12pt; } to style.css, because it appeared that the main text is supposed to be rendered at that size. The fixed-width text grew, but is still smaller than the surrounding text. I checked in a modified style.css and the updated FAQ, but can someone with the proper CSS-fu tweak style.css appropriately?
Thanks, Skip From skip at pobox.com Wed Aug 13 10:37:05 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Aug 13 10:37:15 2003 Subject: [spambayes-dev] FAQ entry for Eudora clients In-Reply-To: <5.1.0.14.0.20030813085310.00b118b0@localhost> References: <5.1.0.14.0.20030813085310.00b118b0@localhost> Message-ID: <16186.19729.129989.21182@montanaro.dyndns.org> Ed> The attached diff will add a entry on configuring Eudora mail Ed> clients for use with pop3proxy.py, as well as a few spelling Ed> corrections Got it, thanks. Skip From kennypitt at hotmail.com Wed Aug 13 11:37:57 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Aug 13 10:38:09 2003 Subject: [spambayes-dev] New Outlook Dialogs Problem In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D8A@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D8A@its-xchg4.massey.ac.nz> Message-ID: <3F3A4D45.2030304@hotmail.com> Meyer, Tony wrote: > If I haven't got enough training information to enable filtering, the > "enable filtering" box isn't greyed out anymore. If I try to check it, > it doesn't check, and I get this traceback: > > Traceback (most recent call last): > File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line > 286, in OnCommand > self.ApplyHandlingOptionValueError(handler.OnCommand, wparam, > lparam) > File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line > 245, in ApplyHandlingOptionValueError > self.dialog_def.caption, mb_flags) > AttributeError: ProcessorDialog instance has no attribute 'dialog_def' > > (I'd try and fix this myself, but the new dialog code still has me > overwhelmed ;) > Here's a first stab at a fix. Maybe Mark can clean this up and plug any additional holes that I didn't notice. 
Index: dlgcore.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/dlgcore.py,v
retrieving revision 1.2
diff -u -r1.2 dlgcore.py
--- dlgcore.py 10 Aug 2003 07:26:49 -0000 1.2
+++ dlgcore.py 13 Aug 2003 14:34:54 -0000
@@ -86,6 +86,9 @@
 
     def DoModal(self):
         return self._DoCreate(win32gui.DialogBoxIndirect)
+
+    def GetCaption(self):
+        return win32gui.GetWindowText(self.hwnd)
 
     def GetMessageMap(self):
         ret = {
@@ -242,7 +245,7 @@
         except ValueError, why:
             mb_flags = win32con.MB_ICONEXCLAMATION | win32con.MB_OK
             win32gui.MessageBox(self.hwnd, str(why),
-                                self.dialog_def.caption, mb_flags)
+                                self.GetCaption(), mb_flags)
             return False
 
     def SaveAllControls(self):
Index: dialog_map.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/dialog_map.py,v
retrieving revision 1.2
diff -u -r1.2 dialog_map.py
--- dialog_map.py 10 Aug 2003 07:26:49 -0000 1.2
+++ dialog_map.py 13 Aug 2003 14:34:46 -0000
@@ -35,6 +35,14 @@
                        0, db_status)
 
 class FilterEnableProcessor(BoolButtonProcessor):
+    def OnOptionChanged(self, option):
+        self.Init()
+
+    def Init(self):
+        BoolButtonProcessor.Init(self)
+        reason = self.window.manager.GetDisabledReason()
+        win32gui.EnableWindow(self.GetControl(), reason is None)
+
     def UpdateValue_FromControl(self):
         check = win32gui.SendMessage(self.GetControl(), win32con.BM_GETCHECK)
         if check:

-- Kenny Pitt From skip at pobox.com Wed Aug 13 10:54:47 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Aug 13 10:55:02 2003 Subject: [spambayes-dev] Correction and new question to the FAQ In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D7B@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D7B@its-xchg4.massey.ac.nz> Message-ID: <16186.20791.737601.992031@montanaro.dyndns.org> >>>>> "Tony" == Tony Meyer writes: >> I added a question and answer about this topic to the FAQ today.
I >> also added a toy app to the contrib directory (nway.py) modelled >> after hammiefilter.py. Tony> Any reason why the FAQ doesn't mention that script? BTW, did you Tony> try the script out? If so, how did it go, at first glance? I wrote the FAQ first, then later on decided to give the script a go. I just updated the faq. The script seems to work fine given the minimal amount of testing I've done. I used the multiple mboxtrain.py runs scheme outlined in the docstring to create five databases besides my usual spam database. The inputs were existing mailboxes specific to each of the five subjects. I then ran a small handful of messages from my incoming directory against the nway script. It seemed to classify the messages properly. One Spambayes-related message that found its way into my normal mbox was correctly classified as being Python-related. (It had been sent in private mail, so didn't contain any of the header flags my procmail recipes look for.) Skip From adam.walker at rbwconsulting.com Wed Aug 13 12:07:57 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Wed Aug 13 11:08:12 2003 Subject: [spambayes-dev] New Outlook Dialogs Problem In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D8A@its-xchg4.massey.ac.nz> Message-ID: <20030813150808.5CC19862BD@plunder.dreamhost.com> D'oh! I'll take the blame for the exception. I removed dialog_def when I added a utility to remove dependence on the rc when making a binary. However, the control not graying out is someone else's ;) --Adam > -----Original Message----- > From: spambayes-dev-bounces@python.org [mailto:spambayes-dev- > bounces@python.org] On Behalf Of Meyer, Tony > Sent: Wednesday, August 13, 2003 5:00 AM > To: spambayes-dev@python.org > Subject: [spambayes-dev] New Outlook Dialogs Problem > > If I haven't got enough training information to enable filtering, the > "enable filtering" box isn't greyed out anymore. 
If I try to check it, > it doesn't check, and I get this traceback: > > Traceback (most recent call last): > File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line > 286, in OnCommand > self.ApplyHandlingOptionValueError(handler.OnCommand, wparam, > lparam) > File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line > 245, in ApplyHandlingOptionValueError > self.dialog_def.caption, mb_flags) > AttributeError: ProcessorDialog instance has no attribute 'dialog_def' > > (I'd try and fix this myself, but the new dialog code still has me > overwhelmed ;) > > =Tony Meyer > > _______________________________________________ > spambayes-dev mailing list > spambayes-dev@python.org > http://mail.python.org/mailman/listinfo/spambayes-dev From kennypitt at hotmail.com Wed Aug 13 12:39:46 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Aug 13 11:40:28 2003 Subject: [spambayes-dev] missized fonts In-Reply-To: <16186.19706.77495.119643@montanaro.dyndns.org> References: <16186.19706.77495.119643@montanaro.dyndns.org> Message-ID: <3F3A5BC2.7060605@hotmail.com> Skip Montanaro wrote: > I'm adding Ed Rubinsky's Eudora configuration q&a to the FAQ. I used > ``...`` around some text I intended to be displayed in a fixed-width font. > All the text is rendered in my browser (Safari, Mac OS X) in Courier, > however all such text is about half the height of the surrounding regular > text. Here's how one little snippet is defined in HTML: > > eudoraok.ini > > Something needs tweaking in style.css, but I know next to nothing about it. > I added > > TT.literal { > font-size: 12pt; > } > > to style.css, because it appeared that the main text is supposed to be > rendered at that size. The fixed-width text grew, but is still smaller than > the surrounding text. > > I checked in a modifies style.css and the updated FAQ, but can someone with > the proper CSS-fu tweak style.css appropriately? 
> Courier and Courier New render smaller at the same point size than Arial, Verdana, etc. Try something like: TT.literal { font-size: 110%; } Tweak the percentage until it looks right to you, and then just *hope* that it renders the same for everyone else. ;-) -- Kenny Pitt From T.A.Meyer at massey.ac.nz Thu Aug 14 12:38:41 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Aug 13 19:39:16 2003 Subject: [spambayes-dev] Correction and new question to the FAQ Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3EF9@its-xchg4.massey.ac.nz> > > (Although the change from "pipermail-21" to "pipermail" > > breaks almost all the links, it's fixable by hand). > > If "by hand" you mean one of mine that participated in adding > a redirect in mail.python.org's Apache config... you were right! > Or left. Or, well let's give both my hands a hand. :) Thanks for that - it makes googling through the archives much simpler again. =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Aug 14 13:24:01 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Aug 13 20:24:46 2003 Subject: [spambayes-dev] missized fonts Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3F46@its-xchg4.massey.ac.nz> > The fixed-width text grew, but is still smaller > than the surrounding text. This is what it looks like here (Windows, IE6, Opera, and Mozilla): Is it smaller for you (mac<->windows font sizes have always been problematic)? I like that the pre text is slightly smaller than the surrounding text - it helps it stand out. 
It's definitely fixed at 12, anyway, since if I shrink the page (ctrl-scroll wheeling down, whatever that does...presumably makes everything "smaller" in css terms), it looks like: =Tony Meyer From skip at pobox.com Wed Aug 13 21:48:57 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Aug 13 21:49:13 2003 Subject: [spambayes-dev] missized fonts In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3F46@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3F46@its-xchg4.massey.ac.nz> Message-ID: <16186.60041.239700.150794@montanaro.dyndns.org> >> The fixed-width text grew, but is still smaller than the surrounding >> text. Tony> This is what it looks like here (Windows, IE6, Opera, and Mozilla): Tony> Yeah, about like that. Someone else (sorry, I forgot who already) indicated that courier fonts tend to be smaller than their verdana, arial, etc counterparts for the same point size. Tony> ... It's definitely fixed at 12, anyway, since if I shrink the Tony> page ... Well, that's probably helped by the fact that I defined it that way. I was just copying the setting for the body: BODY { background: white; color: #484848; margin-right: 15%; font-family: geneva, verdana, arial, "ms sans serif", sans-serif; font-size: 12pt; } ... TT.literal { font-size: 12pt; } Should it have been "font-size: 100%"? Someone suggested that, right? Skip From kiko at async.com.br Thu Aug 14 00:35:15 2003 From: kiko at async.com.br (Christian Reis) Date: Wed Aug 13 22:36:33 2003 Subject: [spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question (orpossibly a bug report) In-Reply-To: References: <020901c35236$e5576f10$f502a8c0@eden> Message-ID: <20030814023515.GO3095@async.com.br> On Fri, Jul 25, 2003 at 07:25:48AM +0200, Martin v. Löwis wrote: > "Mark Hammond" writes: > > > The "best" solution to this probably involves removing Python being > > dependent on the locale - there is even an existing patch for that.
> > While the feature is desirable, I don't like the patch it all. It > copies the relevant code of Gnome glib, and I > a) doubt it works on all systems we care about, and I'm sorry you don't like the patch, but if there's something that can be fixed, we will fix it :-) Well, glib is known to be quite portable, and we would make sure that it does run on the supported platforms before considering checking it in. (I'm betting it does.) > b) is too much code for us to maintain, and It's not *that* much code, and we can rely on fixes that are produced to glib being easily ported to us -- we get free maintenance of the code if we choose to do so, actually. > c) introduces yet another license (although the true authors > of that code would be willing to relicense it) Which means that c) is a non-issue? > It would be better if system functions could be found for a > locale-agnostic atof/strtod on all systems. For example, glibc > has a strtod_l function, which expects a locale_t in addition > to the char*. Yes, but if all we were worried about was glibc, then point a) would be a non-issue too. I imagine it's easier to make sure the code we *have* runs on multiple platforms than trying to find and call code that *may* exist on each given platform. > It would be good if something similar was discovered for VC. Using > undocumented or straight Win32 API functions would be fine. > Unfortunately, the "true" source of atof (i.e. from conv.obj) is not > shipped with MSVC :-( I don't understand this bit. You'd rather use an undocumented API function than an open source, well-tested, properly licensed set of functions? Take care, -- Christian Reis, Senior Engineer, Async Open Source, Brazil. 
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL From kiko at async.com.br Thu Aug 14 00:38:49 2003 From: kiko at async.com.br (Christian Reis) Date: Wed Aug 13 22:39:01 2003 Subject: [spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question (orpossibly a bug report) In-Reply-To: References: Message-ID: <20030814023849.GP3095@async.com.br> On Fri, Jul 25, 2003 at 03:13:46AM -0400, Tim Peters wrote: > [martin@v.loewis.de] > > While the feature is desirable, I don't like the patch it all. It > > copies the relevant code of Gnome glib, and I > > a) doubt it works on all systems we care about, and > > b) is too much code for us to maintain, and > > c) introduces yet another license (although the true authors > > of that code would be willing to relicense it) > > OTOH, even assuming "C" locale, Python's float<->string story varies across > platforms anyway, due to different C libraries treating things like > infinities, NaNs, signed zeroes, and the number of digits displayed in an > exponent differently. This also has bad consequences, although one-platform > programmers usually don't notice them (Windows programmers do more than > most, because MS's C library can't read back the strings it produces for > NaNs and infinities -- which Python also produces and can't read back in > then). > > So it's not that the patch is too much code to maintain, it's not enough > code to do the whole job <0.9 wink>. My question, now, is whether we would be able to cobble something even more magical into the g_ascii_* functions that makes Python more robust to these changes (over time)? Take care, -- Christian Reis, Senior Engineer, Async Open Source, Brazil. http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL From mkoehler at stblaw.com Wed Aug 13 23:56:25 2003 From: mkoehler at stblaw.com (Koehler, Michael W) Date: Wed Aug 13 22:56:36 2003 Subject: [spambayes-dev] Addition to Outlook FAQ Message-ID: Thanks to all for a great product!
Things have been working beautifully, then the other day my Inbox was once again filled with spam. My first thought was that SpamBayes had stopped filtering. Everything seemed to be working correctly, the spam just would not move to the Spam folder. After much fiddling I realized the obvious... My Spam folder was full. I did not bother to count, but deleting a few hundred got SpamBayes going again. I now auto-archive my spam folder. The limit seems to be 16,384. See http://support.microsoft.com/default.aspx?scid=kb;[LN];Q196494 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030813/2f2afc7b/attachment.htm From T.A.Meyer at massey.ac.nz Thu Aug 14 15:59:25 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Aug 13 23:00:06 2003 Subject: [spambayes-dev] Addition to Outlook FAQ Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D91F08@its-xchg4.massey.ac.nz> > Everything seemed to be working correctly, the > spam just would not move to the Spam folder. > After much fiddling I realized the obvious... > My Spam folder was full. I did not bother to > count, but deleting a few hundred got SpamBayes > going again. I now auto-archive my spam folder. > The limit seems to be 16,384. Interestingly, Mark just checked in a comment about this (maybe someone else ran into this and reported it to him). Mark - do we add this to the FAQ, or do you want to put in some sort of dialog to pop up when this happens, explaining the problem? =Tony Meyer From guido at python.org Wed Aug 13 21:34:16 2003 From: guido at python.org (Guido van Rossum) Date: Wed Aug 13 23:35:22 2003 Subject: [spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question (orpossibly a bug report) In-Reply-To: Your message of "Wed, 13 Aug 2003 23:35:15 -0300." 
<20030814023515.GO3095@async.com.br> References: <020901c35236$e5576f10$f502a8c0@eden> <20030814023515.GO3095@async.com.br> Message-ID: <200308140334.h7E3YGa02755@12-236-84-31.client.attbi.com> > I don't understand this bit. You'd rather use an undocumented API > function than an open source, well-tested, properly licensed set of > functions? I don't know what the exact requirements of this license are, but I assure you that redistributing code that is not under the PSF license is a pain, even if it's an open source license. If we can get the original authors to contribute the code to the PSF without the requirement to include a license of any kind (beyond the PSF license) in redistributions, either by the PSF or downstream, even if those redistributions are commercial or contain proprietary code in addition to open source code. This is what's possible with the PSF license, and that needs to remain the case. In particular, the GPL is *not* acceptable for this purpose. --Guido van Rossum (home page: http://www.python.org/~guido/) From kiko at async.com.br Thu Aug 14 01:48:03 2003 From: kiko at async.com.br (Christian Reis) Date: Wed Aug 13 23:48:37 2003 Subject: [spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question (orpossibly a bug report) In-Reply-To: <200308140334.h7E3YGa02755@12-236-84-31.client.attbi.com> References: <020901c35236$e5576f10$f502a8c0@eden> <20030814023515.GO3095@async.com.br> <200308140334.h7E3YGa02755@12-236-84-31.client.attbi.com> Message-ID: <20030814034803.GB5693@async.com.br> On Wed, Aug 13, 2003 at 08:34:16PM -0700, Guido van Rossum wrote: > > I don't understand this bit. You'd rather use an undocumented API > > function than an open source, well-tested, properly licensed set of > > functions? > > I don't know what the exact requirements of this license are, but I > assure you that redistributing code that is not under the PSF license > is a pain, even if it's an open source license. 
If we can get the > original authors to contribute the code to the PSF without the > requirement to include a license of any kind (beyond the PSF license) > in redistributions, either by the PSF or downstream, even if those > redistributions are commercial or contain proprietary code in addition > to open source code. This is what's possible with the PSF license, You omit the predicate that follows this if clause, but I'm hoping you meant something positive like `we will gladly accept it' I'm waiting on Alex's answer on relicensing the code, but he's said on IRC that he'd be willing to do it, so barring any environmental disasters, that should be solved sometime soon. Take care, -- Christian Reis, Senior Engineer, Async Open Source, Brazil. http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL From adam.walker at rbwconsulting.com Thu Aug 14 18:58:28 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Thu Aug 14 17:58:50 2003 Subject: [spambayes-dev] Dialog Hacking Message-ID: <20030814215845.D46BA13E248@sack.dreamhost.com> I finally got around to installing MS Visual Studio on my machine the other day and have been hacking around in the dialog stuff. I tried using WEdit from win32-lcc and gave up. Linked is a screenshot of the manager dialog with a SpamBayes logo I threw together. I'll submit a patch after I iron out of the details of the other dialogs. http://meta.xenogeist.com/images/manager.jpg Feedback? --Adam From mhammond at skippinet.com.au Fri Aug 15 09:58:12 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Aug 14 18:58:00 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <20030814215845.D46BA13E248@sack.dreamhost.com> Message-ID: <085d01c362b7$8a85af20$f502a8c0@eden> That is great! From my eye, I would suggest: * Drop some of the vspace at the top of the logo before the word. * Drop the "Outlook Addin" part - then we have a generic logo every app can use. 
If you are comfortable with the code changes that will come with it, just check it in rather than going via a patch (but obviously mail if you need guidance) Cool :) Mark. > -----Original Message----- > From: spambayes-dev-bounces@python.org > [mailto:spambayes-dev-bounces@python.org]On Behalf Of Adam Walker > Sent: Friday, 15 August 2003 7:58 AM > To: spambayes-dev@python.org > Subject: [spambayes-dev] Dialog Hacking > > > I finally got around to installing MS Visual Studio on my > machine the other > day and have been hacking around in the dialog stuff. I tried > using WEdit > from win32-lcc and gave up. > Linked is a screenshot of the manager dialog with a SpamBayes > logo I threw > together. I'll submit a patch after I iron out of the details > of the other > dialogs. > > http://meta.xenogeist.com/images/manager.jpg > > Feedback? > --Adam > > > > _______________________________________________ > spambayes-dev mailing list > spambayes-dev@python.org > http://mail.python.org/mailman/listinfo/spambayes-dev From adam.walker at rbwconsulting.com Thu Aug 14 21:34:14 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Thu Aug 14 20:34:40 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <085d01c362b7$8a85af20$f502a8c0@eden> Message-ID: <20030815003436.8A28A8627F@plunder.dreamhost.com> Here's take two ;) http://meta.xenogeist.com/images/manager2.jpg It's mainly the gui for the timer delays I'm worried about (and that's giving me fits. Damn sliders.). The logo is in Jasc Paint Shop Pro 8 format. Should I check that file in as well as the bmp? --Adam [Mark Hammond] > -----Original Message----- > > That is great! From my eye, I would suggest: > * Drop some of the vspace at the top of the logo before the word. > * Drop the "Outlook Addin" part - then we have a generic logo every app > can > use. 
> > If you are comfortable with the code changes that will come with it, just > check it in rather than going via a patch (but obviously mail if you need > guidance) > > Cool :) > > Mark. From T.A.Meyer at massey.ac.nz Fri Aug 15 13:55:34 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Aug 14 20:56:19 2003 Subject: [spambayes-dev] Dialog Hacking Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D921AD@its-xchg4.massey.ac.nz> > Here's take two ;) http://meta.xenogeist.com/images/manager2.jpg This is much better. I couldn't really read the "Outlook plugin" bit because the gradient was too wide, and too dark in parts. There's still too much (IMO) space between the logo and the rest of the dialog (the black bit). This could just be a couple of pixels (again, IMO). (I would like a full-stop at the end of the explanation of training too, but that isn't a dialog design issue). > It's mainly the gui for the timer delays I'm worried about > (and that's giving me fits. Damn sliders.). Is this what you are putting in the (returned!) advanced section? Or are you putting it in the filter dialog somewhere? I think that the new dialogs should expose the four 'read status' options as well. I haven't seen anything much in the way of problems reported about them, and they are often requested. > The logo is in Jasc Paint Shop Pro 8 format. Should I check > that file in as well as the bmp? Yes. There's nothing worse than working with a bitmap image that was originally a vector image. You could check in some other form of vector image that isn't PSP only if you'd rather (something that the Gimp or Photoshop could open), but it doesn't really matter. =Tony Meyer [ While you're playing with things, you should put in an easter egg of sorts that opens up a browser window with python.org if you click on the Python Powered image... 
;) ] From mhammond at skippinet.com.au Fri Aug 15 12:10:32 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Aug 14 21:10:14 2003 Subject: [spambayes-dev] Dialog Hacking Message-ID: <08ba01c362ca$0785a220$f502a8c0@eden> Oops - I didn't see the spambayes-dev CC! I just sent this to Adam: > It's mainly the gui for the timer delays I'm worried about (and that's > giving me fits. Damn sliders.). Are my "processors" making sense? Note I was considering dropping these values down to integer seconds for the "real" version - so if that makes stuff easier that is fine (we *will* move the option names for this stuff, so we can change it. We may migrate values, but obviously can handle that too). I really don't think we should expose "ms" in the UI - at the worst, I think it should be fractions of a second (eg, slider has a 1, 1.5, 2 sequence). What do you think? Should we be considering using property pages? > The logo is in Jasc Paint Shop Pro 8 format. Should I check > that file in as > well as the bmp? Yeah, I guess so. Although maybe not with the psp logo in the Outlook tree. Maybe something like: spambayes/logos - new directory - pspro file here spambayes/Outlook/dialogs/somewhere - bmp + readme here. spambayes/somewhere/- jpg used by pop3proxy Not sure about "logos" - maybe a more general-purpose name - but we already have the "website" directory, so I can't see, eg, documents ever living here, so I am back to "logos" :) Maybe take this one back to spambayes-dev, and just check the .bmp in Outlook wherever it makes sense, following up with the "source file" later. Remember to add this stuff with "-kb" so they are flagged as binary. Thanks, Mark. > --Adam > > [Mark Hammond] > > -----Original Message----- > > > > That is great! From my eye, I would suggest: > > * Drop some of the vspace at the top of the logo before the word. > > * Drop the "Outlook Addin" part - then we have a generic > logo every app > > can > > use.
> > > > If you are comfortable with the code changes that will come > with it, just > > check it in rather than going via a patch (but obviously > mail if you need > > guidance) > > > > Cool :) > > > > Mark. > > From mhammond at skippinet.com.au Fri Aug 15 12:16:22 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Aug 14 21:16:11 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D921AD@its-xchg4.massey.ac.nz> Message-ID: <08c101c362ca$d7a77e60$f502a8c0@eden> > I think that the new dialogs should expose the four 'read status' > options as well. I haven't seen anything much in the way of problems > reported about them, and they are often requested. The simple way to start here is in the "Filter" dialog, and a simple checkbox in both the "Spam" and "Unsure" sections saying something like "and mark the message as read". This would handle the common cases. I'm worried about overwhelming the dialogs. > [ While you're playing with things, you should put in an easter egg of > sorts that opens up a browser window with python.org if you > click on the > Python Powered image... ;) ] yeah yeah - let's make an easter egg - i've never done that :) Sounds like more fun that sorting bug dupes :) Mark. -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 1964 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030815/5441fd35/winmail.bin From adam.walker at rbwconsulting.com Thu Aug 14 22:43:45 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Thu Aug 14 21:44:19 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D921AD@its-xchg4.massey.ac.nz> Message-ID: <20030815014410.5CDB513E240@sack.dreamhost.com> > There's still too much (IMO) space between the logo and the rest of the > dialog (the black bit). This could just be a couple of pixels (again, > IMO). 
Done. > > (I would like a full-stop at the end of the explanation of training too, > but that isn't a dialog design issue). Full-stop. Period. Dot. Whatever, it's there now ;) http://meta.xenogeist.com/images/manager3.jpg > > > It's mainly the gui for the timer delays I'm worried about > > (and that's giving me fits. Damn sliders.). > > Is this what you are putting in the (returned!) advanced section? Or > are you putting it in the filter dialog somewhere? Yep. The timers and verbose logging options are going under "Advanced". > > I think that the new dialogs should expose the four 'read status' > options as well. I haven't seen anything much in the way of problems > reported about them, and they are often requested. I put two of them in the filter dialog. I missed the other two (in the general section) until you said that. > > > The logo is in Jasc Paint Shop Pro 8 format. Should I check > > that file in as well as the bmp? > > Yes. There's nothing worse than working with a bitmap image that was > originally a vector image. You could check in some other form of vector > image that isn't PSP only if you'd rather (something that the Gimp or > Photoshop could open), but it doesn't really matter. I've switched to photoshop (psd) format as all three programs can read psd files. Or at least photoshop version 3 files. BTW, I think you mean layered not vector. Last time I tried, neither photoshop nor the gimp support vector graphics, but paintshop does to some extent. --Adam From adam.walker at rbwconsulting.com Thu Aug 14 22:47:26 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Thu Aug 14 21:47:52 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <08ba01c362ca$0785a220$f502a8c0@eden> Message-ID: <20030815014748.50F1113E213@sack.dreamhost.com> > spambayes/logos - new directory - pspro file here > spambayes/Outlook/dialogs/somewhere - bmp + readme here. 
> spambayes/somewhere/- jpg used by pop3proxy > > Not sure about "logos" - maybe a name more general purpose - but we > already > have the "website" directory, so I can't see, eg, documents ever living > here, so I am back to "logos" :) Here's some options... *graphics *gfx *images *resources (we already have spambayes/spambayes/resources and spambayes/outlook2000/dialogs/resources) *use the pre-existing contrib directory From T.A.Meyer at massey.ac.nz Fri Aug 15 15:31:17 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Aug 14 22:32:05 2003 Subject: [spambayes-dev] Dialog Hacking Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92241@its-xchg4.massey.ac.nz> > > Not sure about "logos" - maybe a name more general purpose - but we > > already have the "website" directory, so I can't see, eg, documents > > ever living here, so I am back to "logos" :) > > Here's some options... [...] > *resources (we already have spambayes/spambayes/resources and > spambayes/outlook2000/dialogs/resources) > *use the pre-existing contrib directory If this is something that the others apps (like the web UI) will use, then I think putting them in spambayes/spambayes/resources makes sense - that's where the existing (non-Outlook) ones are. Anything that's *only* for Outlook can go in spambayes/Outlook2000/dialogs/resources. Resources includes all the files that are neither Python scripts nor text files (html and images, at the moment). It has both the 'raw' format (eg jpg) and the format used by the interface (.py via resourcepackage). Since the plugin already uses modules from spambayes/spambayes, also getting stuff from sp/sp/resources doesn't seem like a big deal. The contrib directory is more for 'optional extra' type things, I think (like Skip's n-way, and the procmail recipe), rather than required resources like the logo. 
=Tony Meyer From T.A.Meyer at massey.ac.nz Fri Aug 15 15:36:27 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Aug 14 22:37:10 2003 Subject: [spambayes-dev] Dialog Hacking Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92247@its-xchg4.massey.ac.nz> > The simple way to start here is in the "Filter" dialog, and a > simple checkbox in both the "Spam" and "Unsure" sections > saying something like "and mark the message as read". This > would handle the common cases. I'm worried about > overwhelming the dialogs. Fair enough. Those are the simpler options anyway, and probably the more requested ones. Given that the dialogs don't have anything at all (IIRC) about the delete/recover buttons, there isn't a logical place for them. Maybe if the advanced dialog does make a comeback (well, the button makes a comeback, and the dialog arrives), they could slot into there? There isn't much else to go there really (at the moment). > yeah yeah - let's make an easter egg - i've never done that > :) Sounds like more fun that sorting bug dupes :) Of course, discussing a hidden feature in a public forum isn't the cleverest thing ;) You'll have to come up with something else (possibly as well :) and let us go-a-lookin'. =Tony Meyer From T.A.Meyer at massey.ac.nz Fri Aug 15 15:43:45 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Aug 14 22:44:26 2003 Subject: [spambayes-dev] Dialog Hacking Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92255@its-xchg4.massey.ac.nz> [...] > http://meta.xenogeist.com/images/manager3.jpg Yes, I like this a lot more now. > > I think that the new dialogs should expose the four 'read status' > > options as well. > I put two of them in the filter dialog. I missed the other > two (in the general section) until you said that. See also the message from/to Mark. > I've switched to photoshop (psd) format as all three programs > can read psd files. Or at least photoshop version 3 files. Sounds good. 
> BTW, I think you mean layered not vector. Last time I tried, > neither photoshop nor the gimp support vector graphics, but > paintshop does to some extent. Well, I don't know how you've drawn it, but I probably mean layered vector graphics. Layered as in the text and background are separately editable, which is important, but even more so that the text is stored as a vector rather than a bitmap, so that it is easily resizable (which is the edit that I find myself needing to do most on logo-type images). Both PSP & Gimp can do this - it's a long, long, time since I've used Photoshop, but it presumably can too since it's meant to be the king of such things. =Tony Meyer From mhammond at skippinet.com.au Fri Aug 15 13:46:49 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Aug 14 22:46:45 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D92241@its-xchg4.massey.ac.nz> Message-ID: <092701c362d7$7a4e0510$f502a8c0@eden> > If this is something that the others apps (like the web UI) will use, > then I think putting them in spambayes/spambayes/resources > makes sense - How about "ui_resources" or some such - just "resources" isn't descriptive enough, whereas "dialogs/resources" makes more sense due to the context. However, I'm still not sure what would ever go in this directory except images. "ui_resources" could still imply Python code that manages a cross-application UI, for example. So I still lean towards "images" or "logos" simply as it describes exactly the only things I see being stored there. > Since the plugin already uses modules from spambayes/spambayes, also > getting stuff from sp/sp/resources doesn't seem like a big deal. Nah. And re the easter eggs: > Of course, discussing a hidden feature in a public forum > isn't the cleverest thing ;) You'll have to come up with > something else (possibly as well :) and let us go-a-lookin'. 
There is no way I am going to even *try* to be clever enough to hide it from this crowd in an open source app Mark. -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2172 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030815/eadb502e/winmail.bin From skip at pobox.com Fri Aug 15 01:08:57 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Aug 15 01:09:09 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <20030815003436.8A28A8627F@plunder.dreamhost.com> References: <085d01c362b7$8a85af20$f502a8c0@eden> <20030815003436.8A28A8627F@plunder.dreamhost.com> Message-ID: <16188.27369.234971.796179@montanaro.dyndns.org> Adam> Here's take two ;) Not to be a wet blanket, but I'm not really keen on the colors, the gradients or the very busy background, but maybe that's just me. Adam> The logo is in Jasc Paint Shop Pro 8 format. Should I check that Adam> file in as well as the bmp? I would only check stuff in which can be directly used (jpeg, png, bmp are okay). I have no idea what tools I might have at my disposal to manipulate Paint Shop Pro format images. Skip From tys at tvg.ca Fri Aug 15 00:04:43 2003 From: tys at tvg.ca (Tys von Gaza) Date: Fri Aug 15 01:16:18 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <16188.27369.234971.796179@montanaro.dyndns.org> References: <085d01c362b7$8a85af20$f502a8c0@eden><20030815003436.8A28A8627F@plunder.dreamhost.com> <16188.27369.234971.796179@montanaro.dyndns.org> Message-ID: <3860.142.179.244.169.1060923883.squirrel@mail.tvg.ca> Skip> I would only check stuff in which can be directly used (jpeg, png, bmp are Skip> okay). I have no idea what tools I might have at my disposal to Skip> manipulate Skip> Paint Shop Pro format images. Can't do any easy editing to a jpeg, png, bmp though. It would be like checking in a binary with no source files. 
The PSP or PSD are like the source files where it is much easier to make graphical changes, imho it would be a good idea to include them. -- Tys von Gaza tys@tvg.ca From T.A.Meyer at massey.ac.nz Fri Aug 15 18:15:59 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Aug 15 01:16:45 2003 Subject: [spambayes-dev] Dialog Hacking Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92311@its-xchg4.massey.ac.nz> > Not to be a wet blanket, but I'm not really keen on the > colors, the gradients or the very busy background, but maybe > that's just me. I must admit that I'm not a fan of the busy background either, but then I'm no expert on logo design... The colours could perhaps match those on the website, assuming that there's a reason that the website has those colours...I don't really care about the gradient. So this isn't just negative ;) I do like that it uses a simple font, and the (presence of the) subtle-but-there Python-powered logo. =Tony Meyer From T.A.Meyer at massey.ac.nz Fri Aug 15 18:22:12 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Aug 15 01:22:51 2003 Subject: [spambayes-dev] Dialog Hacking Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92315@its-xchg4.massey.ac.nz> > How about "ui_resources" or some such - just "resources" > isn't descriptive enough, whereas "dialogs/resources" makes > more sense due to the context. It depends if this is for things that are just used by Outlook's dialogs, or by other things (like the web UI) as well. If it's just for Outlook, then something like sb/sb/Outlook2000/dialogs/images would be best, but if it's more general, then sb/sb/resources/images perhaps? > However, I'm still not sure what would ever go in this > directory except images. Well, sb/sb/resources has the web ui's html pages and images (plus the corresponding resourcepackage .py files). (Something else that might go in either directory are sound files. Anyone up for a cute sound to be played when you click "delete as spam"? 
) =Tony Meyer From skip at pobox.com Fri Aug 15 01:23:22 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Aug 15 01:23:36 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <3860.142.179.244.169.1060923883.squirrel@mail.tvg.ca> References: <085d01c362b7$8a85af20$f502a8c0@eden> <20030815003436.8A28A8627F@plunder.dreamhost.com> <16188.27369.234971.796179@montanaro.dyndns.org> <3860.142.179.244.169.1060923883.squirrel@mail.tvg.ca> Message-ID: <16188.28234.936527.569022@montanaro.dyndns.org> >>>>> "Tys" == Tys von Gaza writes: Skip> I would only check stuff in which can be directly used (jpeg, png, Skip> bmp are okay). I have no idea what tools I might have at my Skip> disposal to manipulate Paint Shop Pro format images. Tys> Can't do any easy editing to a jpeg, png, bmp though. It would be Tys> like checking in a binary with no source files. The PSP or PSD are Tys> like the source files where it is much easier to make graphical Tys> changes, imho it would be a good idea to include them. What are we checking in, images for websites or layers for people to edit? It's been awhile since I used Gimp, but it has a native format as well and has the added benefit of being open source and very widely available. (I have it on my Mac even.) Skip From mhammond at skippinet.com.au Fri Aug 15 17:14:23 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Aug 15 02:14:10 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <16188.27369.234971.796179@montanaro.dyndns.org> Message-ID: <002101c362f4$79696b40$f502a8c0@eden> > I would only check stuff in which can be directly used (jpeg, > png, bmp are > okay). I can see the guy's problem though - these really can't be used effectively as a "source" format. Unfortunately, I don't know enough about the tools or formats to have a reasonable opinion beyond that though. So personally, I am happy with whatever it was Tony and Adam agreed on. Mark. 
From mhammond at skippinet.com.au Fri Aug 15 17:21:18 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Aug 15 02:21:04 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <16188.28234.936527.569022@montanaro.dyndns.org> Message-ID: <002801c362f5$70afe0f0$f502a8c0@eden> > What are we checking in, images for websites or layers for > people to edit? > It's been awhile since I used Gimp, but it has a native > format as well and > has the added benefit of being open source and very widely > available. (I > have it on my Mac even.) The "source code" analogy was good. We want to check in the source to the images (the layers etc) so that future tweaks are reasonable. However, as the tools are either not freely or not commonly available to convert from the source to the binary (eg, jpeg for the website), we are also forced to check in the binaries. This situation is not good, but the only other reasonable alternative is to check in binaries only - which seems worse to me. So the suggestion was to check the "source" into a special/reasonable directory, and check the "binary" version wherever it makes most sense for the consumer of the binary - eg, Outlook\dialogs\logo.bmp for Outlook, website/logo.jpg for the website, etc. At least, that is what I am talking about. Mark. From adam.walker at rbwconsulting.com Fri Aug 15 11:56:02 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Fri Aug 15 10:56:24 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <16188.28234.936527.569022@montanaro.dyndns.org> Message-ID: <20030815145613.23A8013E213@sack.dreamhost.com> > What are we checking in, images for websites or layers for people to edit? > It's been awhile since I used Gimp, but it has a native format as well and > has the added benefit of being open source and very widely available. (I > have it on my Mac even.) The problem with using the Gimp's format is the Gimp is the only program that reads it.
The last time I used the Gimp on windows (~3 months ago) it didn't seem ready for day-in-day-out use as a graphic artist. Photoshop, The Gimp, and Paint Shop Pro can all read and write psd files -- so it's a happy medium. --Adam From skip at pobox.com Fri Aug 15 11:38:26 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Aug 15 11:38:43 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <20030815145613.23A8013E213@sack.dreamhost.com> References: <16188.28234.936527.569022@montanaro.dyndns.org> <20030815145613.23A8013E213@sack.dreamhost.com> Message-ID: <16188.65138.238776.506389@montanaro.dyndns.org> Adam> Photoshop, The Gimp, and Paint Shop Pro can all read and write psd Adam> files -- so it's a happy medium. That's fine. I was unaware of the problem. Skip From adam.walker at rbwconsulting.com Fri Aug 15 12:52:03 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Fri Aug 15 11:52:13 2003 Subject: [spambayes-dev] Dialog Hacking In-Reply-To: <16188.65138.238776.506389@montanaro.dyndns.org> Message-ID: <20030815155210.71D2613E23D@sack.dreamhost.com> Take four http://meta.xenogeist.com/images/manager4.jpg I tried it with a flat blue background and opted for the gradient from the website. --Adam From skip at pobox.com Fri Aug 15 11:54:27 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Aug 15 11:54:43 2003 Subject: [spambayes-dev] Re: [Spambayes] mboxtrain ham headers overwritten In-Reply-To: <4793.142.179.244.169.1060929746.squirrel@mail.tvg.ca> References: <2598.129.128.138.135.1060907963.squirrel@mail.tvg.ca> <4793.142.179.244.169.1060929746.squirrel@mail.tvg.ca> Message-ID: <16189.563.364259.921642@montanaro.dyndns.org> Tys> Ok, found my error, and of course it was stupid but I didn't see it Tys> documented anywhere, so here it is. 
Tys> I had the following set in my ~/.spambayesrc Tys> [globals] Tys> verbose=True Tys> This caused the following lines to be added to the start of each Tys> e-mail message that got filtered through procmail: Tys> """ Tys> Loading state from /home/gaza/maildata/bayes.db database Tys> /home/gaza/maildata/bayes.db is an existing database, with 69 spam and 21 ham Tys> """ I think those messages should go to stderr. storage.py is littered with prints to stdout when verbose is set. Skip From neale at woozle.org Fri Aug 15 12:52:52 2003 From: neale at woozle.org (Neale Pickett) Date: Fri Aug 15 14:52:58 2003 Subject: [spambayes-dev] ["WatchGuard LiveSecurity"] Keep Spam at Bay with SpamBayes Message-ID: I thought you folks would like to see the message that went out last week to WatchGuard's LiveSecurity subscription service. It goes out to an audience of 40,000, I'm told. Maybe something in here for quotes.html? (Sorry it's all HTML--I'm passing the body along exactly as I received it.) -------------- next part -------------- An embedded message was scrubbed... From: "WatchGuard LiveSecurity" Subject: LiveSecurity | Keep Spam at Bay with SpamBayes Date: 8 Aug 2003 11:41:34 -0700 Size: 24202 Url: http://mail.python.org/pipermail/spambayes-dev/attachments/20030815/77a03eaf/attachment.eml From skip at pobox.com Fri Aug 15 15:14:57 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Aug 15 15:15:11 2003 Subject: [spambayes-dev] Regarding the WatchGuard article about SpamBayes Message-ID: <16189.12593.462122.385869@montanaro.dyndns.org> Neale Pickett sent a copy of your article about SpamBayes to the SpamBayes developers mailing list. I skimmed it and thought I would send you a little feedback: 1. You mention that it might not scale well for large organizations. I assume you stated this because the Outlook plugin must be installed on the users' computers.
I agree installation might be troublesome, however on the plus side, since the plugin runs on the client machines, it's not going to burn up your mail servers' cpus. From that perspective it probably scales better than a server-based solution. It also has the added functional benefit that unlike most centralized solutions, SpamBayes allows each user to define what is and is not spam to them. 2. You linked directly to the 006 version of the installer. Note that the 007 version has already been released. You'd be much better off linking to the Windows page on the SpamBayes website: http://spambayes.sourceforge.net/windows.html 3. I'm sure most of your readers use Windows & Outlook, but it might be worth noting that there are a number of other SpamBayes applications which allow you to integrate the technology into other platforms (Unix, Mac, etc) or with other mail readers. In particular, there are POP3 and IMAP proxies (both of which are controlled via a web interface) and a simple filter which takes a message on stdin, scores it, and writes it to stdout with the score carried in a new header. -- Skip Montanaro Got gigs? http://www.musi-cal.com/ Got spam? http://spambayes.sf.net/ From anthony at interlink.com.au Sat Aug 16 18:04:41 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sat Aug 16 03:04:52 2003 Subject: [spambayes-dev] SF rankings... Message-ID: <200308160704.h7G74g4f016971@localhost.localdomain> Well, in the week since the outlook addin was moved to SF, the spambayes project's gone from 97th to 14th in the rankings. (The 11th of August saw 2,068 downloads!) (Is it worth changing the project title from "Bayesian anti-spam classifier" to something more descriptive?) 2068-times-$0-is-still-$0-sorry-mark, Anthony From T.A.Meyer at massey.ac.nz Sat Aug 16 20:12:48 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sat Aug 16 03:13:27 2003 Subject: [spambayes-dev] SF rankings... 
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92365@its-xchg4.massey.ac.nz> > Well, in the week since the outlook addin was moved to SF, > the spambayes project's gone from 97th to 14th in the > rankings. (The 11th of August saw 2,068 downloads!) That damn 'Python' project is still at #4 for the 'all time' rankings, though ;) We have a way to go yet... BTW I noticed that it's only 19 days until SpamBayes' first birthday (taking birth as the project opening on sf - any prior life within Python is really just time in the womb ;) Maybe we could 'celebrate' by putting out 1.0b1 then? =Tony Meyer From tim.one at comcast.net Sat Aug 16 13:25:41 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Aug 16 12:26:16 2003 Subject: [spambayes-dev] World domination Message-ID: Following up on my Machiavellian plan to release the spambayes Outlook addin from SourceForge, the spambayes project ranked 99.9551% at SF last week, and is now on the front page as the 7th "most active" project (of 67,000) at SF last week. There have been at least 4,300 downloads of the OL addin from SF. Congratulations! I wish I could claim more credit for myself, but I've had little to do with it since last year. The credit belongs to the currently active developers, who've wrestled tirelessly with never-ending nightmares from Outlook to IMAP. Thanks to all -- great work! Nevertheless, beatings will continue until it's #1. To help brand recognition, I've changed the SF "public name" of the project from "Bayesian anti-spam classifier" to "SpamBayes anti-spam". taking-thrills-where-i-can-find-'em-ly y'rs - tim From tim.one at comcast.net Sat Aug 16 13:44:52 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Aug 16 12:45:34 2003 Subject: [spambayes-dev] SF rankings... In-Reply-To: <200308160704.h7G74g4f016971@localhost.localdomain> Message-ID: [Anthony Baxter] > Well, in the week since the outlook addin was moved to SF, > the spambayes project's gone from 97th to 14th in the rankings.
> (The 11th of August saw 2,068 downloads!) It's at #7 now. > (Is it worth changing the project title from > "Bayesian anti-spam classifier" to something more > descriptive?) Harmonic convergence! I did that before seeing your email -- see my later email. > 2068-times-$0-is-still-$0-sorry-mark, Don't feel too badly for Mark -- I have it on good authority that his is now the most widely recognized face in Australia. It's not from the photograph that ran in the Aussie press, it's actually from the "Delete As Spam" frowny face icon in the OL addin . From richie at entrian.com Sun Aug 17 22:29:47 2003 From: richie at entrian.com (Richie Hindle) Date: Sun Aug 17 16:29:58 2003 Subject: [spambayes-dev] Calling Outlook Express users... Message-ID: ...on this list? I must be joking. Romain Guy has posted a patch and a new module that lets the web interface train on Outlook Express mailboxes, by uploading a .dbx file in the same was as you can upload an mbox file: http://sourceforge.net/tracker/?func=detail&atid=498105&aid=789916&group_id=61702 I'm happy to code-review and commit the patch, but not being an Outlook Express user I can't test it. Are there any Spambayes developers who are also Outlook Express users? Feel free to email me privately if you don't want to admit it here. 8-) -- Richie Hindle richie@entrian.com From adam.walker at rbwconsulting.com Sun Aug 17 18:31:48 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Sun Aug 17 17:31:55 2003 Subject: [spambayes-dev] Outlook Manager dialog Message-ID: <20030817213151.B296413E26B@sack.dreamhost.com> Ok, The new dialog stuff is checked to cvs. This includes the logo (bmp only currently), new options, and switched most of the dialogs to property pages on the manager dialog. Thanks to the rc file and Mark's control processor framework moving controls to another property page is mostly trivial. 
--Adam From mhammond at skippinet.com.au Mon Aug 18 09:44:39 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Aug 17 18:44:19 2003 Subject: [spambayes-dev] Outlook Manager dialog In-Reply-To: <20030817213151.B296413E26B@sack.dreamhost.com> Message-ID: <0c5b01c36511$24cfe680$f502a8c0@eden> > Ok, The new dialog stuff is checked to cvs. This includes the > logo (bmp only > currently), new options, and switched most of the dialogs to > property pages > on the manager dialog. Thanks to the rc file and Mark's > control processor > framework moving controls to another property page is mostly trivial. It looks pretty good. I've a few comments though. I think the property pages may have gone too far, and should reflect the workflow a little closer. Off the top of my head: * I think "training" should be the second property page - training is the first thing the user will need to do. * "Spam" and "Possible Spam" should maybe go back on a single "Filter" page This leaves us with: General, Training, Filter, Filter Now, Advanced There are a few other tweaks, such as the combos should be list-boxes etc, but it is looking good. Thanks! I'd really like to hear other people's opinions on this too. Mark. From adam.walker at rbwconsulting.com Sun Aug 17 20:06:06 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Sun Aug 17 19:06:17 2003 Subject: [spambayes-dev] Outlook Manager dialog In-Reply-To: <0c5b01c36511$24cfe680$f502a8c0@eden> Message-ID: <20030817230614.B20B213E235@sack.dreamhost.com> > * I think "training" should be the second property page - training is the > first thing the user will need to do. Sure. > * "Spam" and "Possible Spam" should maybe go back on a single "Filter" > page This is really a question of how big each page should be. I have no qualms with recombining the two ... it's just a lot longer than the other pages.
From anthony at interlink.com.au Mon Aug 18 11:07:01 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Aug 17 20:07:57 2003 Subject: [spambayes-dev] SF rankings... In-Reply-To: Message-ID: <200308180007.h7I0712U024089@localhost.localdomain> >>> "Tim Peters" wrote > Don't feel too badly for Mark -- I have it on good authority that his is now > the most widely recognized face in Australia. It's not from the photograph > that ran in the Aussie press, it's actually from the "Delete As Spam" frowny > face icon in the OL addin . I have to say, the resemblance _is_ uncanny. On a more topical note, SB developers might note that Microsoft last week announced that they were ending Outlook Express development. It will still be available, but will gradually rot away in favour of the full Outlook. Amusingly, one of the reasons I saw reported was that Outlook's rather large footprint is now less of a concern because computers are faster and have more memory. Well-if-MS-won't-support-it-why-should-we, Anthony -- Anthony Baxter It's never too late to have a happy childhood. From T.A.Meyer at massey.ac.nz Mon Aug 18 13:26:56 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 17 20:27:43 2003 Subject: [spambayes-dev] Outlook Manager dialog Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D9258A@its-xchg4.massey.ac.nz> > * "Spam" and "Possible Spam" should maybe back on a > single "Filter" page +1 > This leaves us with: > General, Training, Filter, Filter Now, Advanced To me it seems odd that the general page has a 'training' section, and then there is a separate 'training' tab. There seem to be three sections - training, filtering and advanced. Squishing everything into three pages probably means pages that are too big, but maybe not. The layout I think makes most sense: Page 1: Training. Includes the current training box from 'general', and all of the content of the training tab. Results in a larger page, but only slightly (also see page 3). 
Page 2: Filtering. Includes the current filtering box from 'general', and all of the content of the spam and unsure tabs. Results in a larger page, but not too much, and makes more sense (also see page 3). The "mark spam as read" option seems to take up too much room, but I don't have any idea how to improve it... :) Page 3: Status (this could maybe be page 1). The version string could be moved here, plus the database ham/spam count, plus the watch folder information. This is the page that I would want to see most often once everything is running. This would save space on other pages, and also allow the watch folder information space to be a bit bigger (it runs out of room very quickly). This is a big shuffle, though. Page 4: Advanced. As it is, although I wonder if "save spam score" belongs in advanced. The wording of the delete-as-spam-marks-as-read option isn't clear either. If you select "None", it says "When a message is deleted as spam, change its read state to None", which isn't what happens. Where is "Filter now", you ask? In a separate dialog, accessed via either a button on the filtering tab, or as a separate toolbar menuitem. The rest is status/settings, this is an action [1]; it makes sense to differentiate it. > I'd really like to hear other people's opinions on this too. Personally I don't really like tabs, and thought the old one was better (it made more logical sense). I realise that this is probably a minority opinion, and that the users are familiar with a tabbed interface, though. =Tony Meyer [1] Ok, training is an action, but it's a 'settings' kind of action ;) From T.A.Meyer at massey.ac.nz Mon Aug 18 13:28:38 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 17 20:29:26 2003 Subject: [spambayes-dev] Calling Outlook Express users... Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D9258D@its-xchg4.massey.ac.nz> > Subject: [spambayes-dev] Calling Outlook Express users... > ...on this list? I must be joking. 
:) > Romain Guy has posted a patch and a new module that lets the > web interface train on Outlook Express mailboxes, by > uploading a .dbx file in the same was as you can upload an mbox file: > I'm happy to code-review and commit the patch, but not being an Outlook > Express user I can't test it. I'll do some testing on this as soon as I get a chance, although if you want to go over the code first, that would be great :) (Note that I'm not an OE user for my mail, but I use it to test all sorts of things, plus my fiance uses it at home - with pop3proxy at the moment). =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Aug 18 13:38:08 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 17 20:40:59 2003 Subject: [spambayes-dev] SF rankings... Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D9259F@its-xchg4.massey.ac.nz> > On a more topical note, SB developers might note that > Microsoft last week announced that they were ending Outlook > Express development. It will still be available, but will > gradually rot away in favour of the full Outlook. [...] > Well-if-MS-won't-support-it-why-should-we, This is, of course, a backwards step. We have some ability to work with OE - we have no ability at all to work with whatever version of hotmail gets built in to replace OE. I doubt MS is planning on putting in convenient plug-in hooks, either. Still, I can't see people abandoning OE (and their pop3/imap addresses) in droves any time soon, so who knows what will happen... =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Aug 18 13:35:27 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 17 20:41:08 2003 Subject: [spambayes-dev] World domination Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92599@its-xchg4.massey.ac.nz> > front page as the 7th "most active" project (of 67,000) at SF > last week. There have been at least 4,300 downloads of the > OL addin from SF. #6 now...maybe one fewer beating today? 
;) Interestingly, the number of downloads fell quite a lot over the last couple of days, but the ranking went up. Maybe it was a slow day for everyone... > Congratulations! I wish I could claim more credit for > myself, but I've had little to do with it since last year. Well, to be fair, as nice as the Outlook plug-in is, I doubt many people would be recommending it if it didn't actually do the business of correctly classifying mail. You can have as much of that credit as you can get before Gary et. al. realise that there is credit to be taken and come for their share... :) > Nevertheless, beatings will continue until it's #1 . I don't know if I've ever seen the sf page without "Compiere ERP + CRM Business Solution", "Gaim", and "phpMyAdmin" in the top five. Pushing them out will be quite a feat, but then there's bound to be a surge with the next release of the plug-in, so maybe that would do it. =Tony Meyer From anthony at interlink.com.au Mon Aug 18 11:51:01 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Aug 17 20:51:17 2003 Subject: [spambayes-dev] SF rankings... In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D9259F@its-xchg4.massey.ac.nz> Message-ID: <200308180051.h7I0p1XN025105@localhost.localdomain> >>> "Meyer, Tony" wrote > This is, of course, a backwards step. We have some ability to work with > OE - we have no ability at all to work with whatever version of hotmail > gets built in to replace OE. I doubt MS is planning on putting in > convenient plug-in hooks, either. > > Still, I can't see people abandoning OE (and their pop3/imap addresses) > in droves any time soon, so who knows what will happen... One of the quotes I saw was from an MS product manager complaining that IMAP wasn't a rich enough protocol for an email client. This suggests that they're planning to do more proprietary crap between Outlook and Exchange. This is not good :-( -- Anthony Baxter It's never too late to have a happy childhood. 
From anthony at interlink.com.au Mon Aug 18 11:55:01 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Aug 17 20:55:14 2003 Subject: [spambayes-dev] SF rankings... In-Reply-To: <200308180051.h7I0p1XN025105@localhost.localdomain> Message-ID: <200308180055.h7I0t1Q2025192@localhost.localdomain> >>> Anthony Baxter wrote > One of the quotes I saw was from an MS product manager complaining that > IMAP wasn't a rich enough protocol for an email client. This suggests that > they're planning to do more proprietary crap between Outlook and Exchange. > This is not good :-( Hm. It seems like they might have backed down. http://news.zdnet.co.uk/software/applications/0,39020384,39115720,00.htm (original piece: http://new.zdnet.co.uk/zdnetuk/news/software/applications/0,39020384,39115680,00.htm ) -- Anthony Baxter It's never too late to have a happy childhood. From adam.walker at rbwconsulting.com Sun Aug 17 22:18:09 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Sun Aug 17 21:18:24 2003 Subject: [spambayes-dev] Outlook Manager dialog In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D9258A@its-xchg4.massey.ac.nz> Message-ID: <20030818011819.0A14D862F7@plunder.dreamhost.com> > To me it seems odd that the general page has a 'training' section, and > then there is a separate 'training' tab. I agree it's odd. Of course it was odd before when we had a training section and a training dialog. I'm of the opinion that we may need two GUIs. A simplified and a power user (or maybe a wizard-for-first-setup and standard?) interface. I think most users are confused by the multiple places to select folders. * spam for training. * ham for training. * folders to filter. * spam for the filter. * unsure for the filter. * folders under filter now. Having used it for a few months, I understand why all those options are there. 
But starting out, it's a bit overwhelming -- the new user simply wants to point the plug-in at a pile of ham, a pile of spam, and a folder for unsures and click a "finish" button. At which point the plug-in would train, set the other folder options from the choices made before, set defaults for the read state options, and enable itself. >Where is "Filter now", you ask? In a separate dialog, accessed via either >a button on the filtering tab, or as a separate toolbar menuitem. As long as the button/menuitem says "Filter Now..." (or something with "..." at the end) and not "Filter Now" if it will bring up a dialog. >Personally I don't really like tabs, and thought the old one was better (it >made more logical sense). I realise that this is probably a minority >opinion, and that the users are familiar with a tabbed interface, though. The old layout suffered many of the same problems the current one does (I didn't change the layout much other than breaking up some pages) and violated some GUI design conventions. It may have made sense code-wise but not usage-wise. At least that's my $.02. Exchange rates may vary. --Adam From mhammond at skippinet.com.au Mon Aug 18 12:23:37 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Aug 17 21:23:28 2003 Subject: [spambayes-dev] World domination In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D92599@its-xchg4.massey.ac.nz> Message-ID: <0d0401c36527$5a4447a0$f502a8c0@eden> > surge with > the next release of the plug-in, so maybe that would do it. Yes - I reckon that if we release a new version once per week, with each one containing a "critical bug that you can't see, but we promise is there" (just like the last release) we could get real mileage. Add a background thread so that the plugin checks for a new version each time it starts and *insists* you download it, and I believe we could get there! hehe - this is almost sounding serious.
We could take the approach of "windows update", and download in the background, saying we are ready to upgrade once we get it. *sigh* - ok, back to the real world. I'm avoiding looking at the spambayes list for a few days, but am about to fiddle with the dialogs :) Mark. From T.A.Meyer at massey.ac.nz Mon Aug 18 14:37:46 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 17 21:38:33 2003 Subject: [spambayes-dev] Outlook Manager dialog Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92600@its-xchg4.massey.ac.nz> > I agree it's odd. Of course it was odd before when we had a > training section and a training dialog. True, although not as odd, because you moved from a general training section into a whole new dialog. > I'm of the opinion that we may need two GUIs. A simplified > and a power user (or maybe a wizard-for-first-setup and > standard?) interface. I hate having advanced and simple GUIs, but realise that I may again be in the minority. IMO, a good interface is easy for beginners to use, and easy for 'experts' to use faster. One of the problems with beginner/advanced options is that most people start out as beginner, progress out of it, but never reach expert, ending up somewhere in the middle. That said, a 'wizard' type thing just for an initial setup would probably be a good thing. InBoxer has one of these. > The old layout suffered many of the same problems the current > one does True, but while we're changing it, we should fix them all! (When I say "we", I mean you and Mark, of course ). > (I didn't change the layout much other breaking up > some pages) The particular change I was referring to was the change from multiple dialogs to a single, tabbed, dialog.
I know that Microsoft (and others) believe this is how a dialog should be, but I don't agree (sometimes they are right, sometimes they aren't). =Tony Meyer From ta-meyer at ihug.co.nz Mon Aug 18 14:40:48 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Sun Aug 17 21:41:26 2003 Subject: [spambayes-dev] RE: [Spambayes-checkins] website related.ht, 1.10, 1.11 In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302CE977C@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212ACB9@its-xchg4.massey.ac.nz> [From the check-ins list] > Bring the 'related projects' up to date wrt inboxer/spambayes. Does this mean that SpamAtBay no longer exists? (I was always a bit confused about how InBoxer and SpamAtBay related, anyway). It was the better name, IMO, although perhaps not as marketable. > !

Some developers like the SpamBayes project enough to > invest in building other projects on top of it. Please > contact us if you would like to be listed here. A listing > here does not mean that the SpamBayes team endorses the > project. Commercial projects offer the same success in > filtering mail, but in exchange for your money, strive to be > more user-friendly, offer more in the way of support, or > additional features that enhance the functions of the core > SpamBayes code base.

This sounds much better than what was previously there, BTW. =Tony Meyer From mhammond at skippinet.com.au Mon Aug 18 12:46:00 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Aug 17 21:45:46 2003 Subject: [spambayes-dev] Outlook Manager dialog In-Reply-To: <20030818011819.0A14D862F7@plunder.dreamhost.com> Message-ID: <0d2201c3652a$7b9f86f0$f502a8c0@eden> > I'm of the opinion that we may need two GUIs. A simplified > and a power user > (or maybe a wizard-for-first-setup and standard?) interface. Agreed. I admit to liking the wizard idea. Maybe borrowing ideas from MS again: * SpamBayes manager dialog is tabbed, with first tab being, pretty much, "About SpamBayes". It has the version info etc, but a nice large button "Configuration Wizard" or some such. We still try and keep the tabs down though. * When SpamBayes starts, if it is not correctly configured, it displays the wizard. The idea is that the user will be up-and-running before they ever see the main SpamBayes dialog. When they *do* bring up the dialog, they will be able to see everything reflected correctly, and/or will have a clear way of "re-configuring" using the same process they did at the start. I like tabs, and I hate them. The simple reality is that we are ending up with too many options to have a single dialog, with modal dialogs popping up all over the place. Modal dialogs off modal dialogs are considered a hanging offence by people who take this stuff seriously . However tabs also quickly get overwhelming, and at some point we have to ask ourselves if the option is better left out of the GUI completely - especially ones that are "working around" bugs we don't know how to fix A good example of this is the "Save Spam Score" option (which I mentioned to Adam in private mail) - I see no legitimate reason a user would ever want to disable this, unless it had some negative impact on the system - such as modifying the message in some unexpected way, or just caused the whole thing to fail. 
In this case we have a bug. As we don't know how to fix all such obscure bugs yet, the option is a perfectly reasonable thing to have - but probably not a reasonable thing to try and cram in the UI. Once we get 10,000 users, if only a fraction of a percent start mailing the list with "why would I want to turn this option off", we are in trouble. (Note that I am talking more in general than this specific option - if this fits OK and makes sense then great, but not all options will) I think I am simply saying I believe some options are best left to the people capable of finding them. > I think most > users are confused by the multiple places to select folders. ... > Having used it for a few months, I understand why all those > options are I agree 100%, especially for initial configuration. I was having this discussion with a real-world mate the other day who just started using it. I told him that unfortunately, the UI still reflects the underlying code structure rather than the best user experience. You are saying the exact same thing :) > The old layout suffered many of the same problems the current > one does (I Actually, the biggest problem by far with the old one was the inflexibility of the dialogs. This meant that once I considered them "good enough for now", that is how they stayed. The Outlook GUI has not changed in any significant way since my first checkin of them ages ago (until now of course :) Thankfully this has been fixed, and we are now able to not only have this discussion, but do something about it :) Mark. From seant at iname.com Sun Aug 17 23:07:00 2003 From: seant at iname.com (Sean True) Date: Sun Aug 17 22:09:20 2003 Subject: [spambayes-dev] RE: [Spambayes-checkins] website related.ht, 1.10, 1.11 In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212ACB9@its-xchg4.massey.ac.nz> Message-ID: <005c01c3652d$69f42ed0$0201a8c0@swapwizard.com> > > [From the check-ins list] > > Bring the 'related projects' up to date wrt inboxer/spambayes.
> > Does this mean that SpamAtBay no longer exists? (I was always a bit > confused about how InBoxer and SpamAtBay related, anyway). It was the > better name, IMO, although perhaps not as marketable. > > > !

Some developers like the SpamBayes project enough to > > invest in building other projects on top of it. Please > > contact us if you would like to be listed here. A listing > > here does not mean that the SpamBayes team endorses the > > project. Commercial projects offer the same success in > > filtering mail, but in exchange for your money, strive to be > > more user-friendly, offer more in the way of support, or > > additional features that enhance the functions of the core > > SpamBayes code base.

> > This sounds much better than what was previously there, BTW. > > =Tony Meyer It's complicated. Basically, it's a triumph of marketers over developers (should sound familiar), but since I'm squarely on both sides of the fence, I don't mind a bit. I'll be announcing something definitive shortly, but in the mean time I am trying to reduce confusion on public venues. InBoxer has a review coming up in PCMag in a couple of weeks, and we expect all _you know what_ to break loose: no matter what label is on it, I'm firmly on the hook for support. If you want to get a preview of the marketing slant for InBoxer, take a look at http://www.inboxer.com/0sb.html Compared to the nice spare h2tohtml site we all know (and love), it's pretty cluttered. -- Sean From skip at pobox.com Sun Aug 17 22:09:44 2003 From: skip at pobox.com (Skip Montanaro) Date: Sun Aug 17 22:09:50 2003 Subject: [Python-Dev] RE: [spambayes-dev] World domination In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D92599@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302D92599@its-xchg4.massey.ac.nz> Message-ID: <16192.13672.160054.733779@montanaro.dyndns.org> Tony> I don't know if I've ever seen the sf page without "Compiere ERP + Tony> CRM Business Solution", "Gaim", and "phpMyAdmin" in the top five. I can understand Gaim and - to a certain degree - phpMyAdmin, but where's the geek appeal in an ERP/CRM tool? Skip From romain.guy at jext.org Mon Aug 18 05:18:04 2003 From: romain.guy at jext.org (Romain GUY) Date: Sun Aug 17 22:22:32 2003 Subject: [spambayes-dev] Windows installer for non Outlook users Message-ID: <20038184184.810601@Thinthalion> Hello everyone, I've just taken a few minutes to explain a friend of mine how to make spambayes run on his Windows XP/Outlook Express platform. 
And one thing is certain: while installing the spambayes Windows service is not hard at all when you have clear, precise and convenient instructions, it becomes quite difficult for "normal users" (that is to say Tim's sister ;-) with no programmer friend around. As I am an (almost) full-time Windows user and am used to using Inno Setup, I am willing to set up a very simple Windows installer. This installer would include a Python interpreter (with only the libraries required to make spambayes run), the win32all extension and some picture-based documentation to teach users how to set up their mail client. This installer would also install the spambayes service so that they won't have to bother running it manually at every startup. Maybe we could even add start menu/desktop icons which would launch the web interface (we can also consider adding a bookmark in IE and/or Netscape). Maybe the installer could also try to set up Outlook Express directly (finding the user's accounts in the registry, setting them in spambayes). The Notate To: option could also be activated by default. So, if you do agree, I'm ready to take care of this. -- Romain GUY romain.guy@jext.org http://www.jext.org http://progx.jext.org From mhammond at skippinet.com.au Mon Aug 18 13:33:19 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Aug 17 22:33:00 2003 Subject: [spambayes-dev] Windows installer for non Outlook users In-Reply-To: <20038184184.810601@Thinthalion> Message-ID: <0d7301c36531$16a74650$f502a8c0@eden> > So, if you do agree, I'm ready to take care of this. I agree, but see no reason we can't use py2exe etc for this. It may be hard to do optimally, but even a "simple" py2exe distribution may be better than this, and would have a lot less chance of screwing up an existing Python install. Thomas Heller (py2exe guy) is currently working on some nice features that I proposed simply so I could make my installer even better.
My intention, and we are not that far off, is to have a single installer that comes with pop3proxy as a service, as a "standard" exe for win9x, and as the outlook plugin. Inno would then detect which is most appropriate. The size of the distribution would not grow much at all, as all common code would live in a .zip file, and shared between the various small, "stub" exes and dlls. but yeah, if the above makes you balk, I have no objection. Mark. From T.A.Meyer at massey.ac.nz Mon Aug 18 16:19:39 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 17 23:20:27 2003 Subject: [Python-Dev] RE: [spambayes-dev] World domination Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D926B1@its-xchg4.massey.ac.nz> Tony> I don't know if I've ever seen the sf page without "Compiere ERP + Tony> CRM Business Solution", "Gaim", and "phpMyAdmin" in the top five. Skip> I can understand Gaim and - to a certain degree - phpMyAdmin, Skip> but where's the geek appeal in an ERP/CRM tool? Maybe lots of geeks are running small to medium sized enterprises? Or think they will be in the future, and are planning for that day? I agree; it is out of place. =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Aug 18 16:26:38 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 17 23:27:22 2003 Subject: [spambayes-dev] RE: [Spambayes] pop3proxy_service and smtpproxy Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D926B7@its-xchg4.massey.ac.nz> > When I explicitly start pop3proxy.py, it starts smtpproxy as > expected. If I use pop3proxy_service.py, the smtpproxy is > not started. Any ideas why this is happening? The service starts pop3proxy's main function, but smtpproxy is started in pop3proxy's run function (which calls main). What's the best way to fix this? o Move the smtpproxy starting code into main(). o Add smtpproxy starting code into pop3proxy_service.py. o Have a separate service for smtpproxy (this is problematic because they have to share the database). o Something else. 
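As a rough illustration of the first option in the list above, here is a minimal sketch of a shared start-up function. It is not the actual pop3proxy or pop3proxy_service code, and every function name in it is hypothetical; the stand-in functions only record what was started so that the shape of the idea is visible. The assumption is simply that both the command-line script and the service call the same function, so neither can forget the SMTP proxy.

```python
# Hypothetical sketch only: these names are NOT the real pop3proxy/smtpproxy API.

def start_pop3_proxy(started):
    """Stand-in for the code that creates the POP3 proxy listeners."""
    started.append("pop3proxy")


def start_smtp_proxy(started):
    """Stand-in for the code that creates the SMTP proxy listeners."""
    started.append("smtpproxy")


def start_everything():
    """Single entry point: start every proxy, in one place."""
    started = []
    start_pop3_proxy(started)
    start_smtp_proxy(started)
    return started


def run_from_command_line():
    """What the script would do: start everything, then loop."""
    started = start_everything()
    # ... the real script would enter its event loop here ...
    return started


def run_from_service():
    """What the Windows service would do: the same call, nothing duplicated."""
    return start_everything()
```

Whatever the function ends up being called, the design point is that the script and the service share one start-up path, so a proxy added later only has to be wired in once.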
I don't really know much about Windows services, so I'm throwing this to the -dev list. =Tony Meyer From adam.walker at rbwconsulting.com Mon Aug 18 00:31:22 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Sun Aug 17 23:31:47 2003 Subject: [spambayes-dev] Outlook Manager dialog In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D9262F@its-xchg4.massey.ac.nz> Message-ID: <20030818033139.EA64A13E21F@sack.dreamhost.com> For those kinds of things, I'd made it a right in a blank area of the advanced tab. The log level numbers should be replaced with words ("minimal", "debug", "verbose") if they are not hidden. > -----Original Message----- > From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > Sent: Sunday, August 17, 2003 10:06 PM > To: Mark Hammond; Adam Walker > Subject: RE: [spambayes-dev] Outlook Manager dialog > > > Maybe a better option would be some cleverly worded, > > semi-hidden option for "diagnostics". It could also help > > locate the log file - even creating a mail with the log > > attached. The "clever wording" could reflect that this > > should only be touched when specifically asked by a developer > > in the process of tracking down a problem. > > I like this idea a lot. I don't know how or where you'd hide or word it, > but it's a good idea. > > Cheers, > Tony From adam.walker at rbwconsulting.com Mon Aug 18 00:35:14 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Sun Aug 17 23:35:26 2003 Subject: [spambayes-dev] Outlook Manager dialog In-Reply-To: <0d2701c3652b$3cf27e70$f502a8c0@eden> Message-ID: <20030818033523.2391113E261@sack.dreamhost.com> > > I think I better hack a wizard framework together eh? Shouldn't be too hard. The wizards tend to operate like tabs. 
--Adam From adam.walker at rbwconsulting.com Mon Aug 18 00:43:19 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Sun Aug 17 23:43:32 2003 Subject: [spambayes-dev] Outlook Manager dialog In-Reply-To: <20030818033139.EA64A13E21F@sack.dreamhost.com> Message-ID: <20030818034327.F0FC413E230@sack.dreamhost.com> D'oh! That should say right-click. I'm going to sleep now ;) > -----Original Message----- > From: spambayes-dev-bounces@python.org [mailto:spambayes-dev- > bounces@python.org] On Behalf Of Adam Walker > Sent: Sunday, August 17, 2003 11:31 PM > To: 'Meyer, Tony'; 'Mark Hammond' > Cc: spambayes-dev@python.org > Subject: RE: [spambayes-dev] Outlook Manager dialog > > For those kinds of things, I'd made it a right in a blank area of the > advanced tab. The log level numbers should be replaced with words > ("minimal", "debug", "verbose") if they are not hidden. > > > -----Original Message----- > > From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > > Sent: Sunday, August 17, 2003 10:06 PM > > To: Mark Hammond; Adam Walker > > Subject: RE: [spambayes-dev] Outlook Manager dialog > > > > > Maybe a better option would be some cleverly worded, > > > semi-hidden option for "diagnostics". It could also help > > > locate the log file - even creating a mail with the log > > > attached. The "clever wording" could reflect that this > > > should only be touched when specifically asked by a developer > > > in the process of tracking down a problem. > > > > I like this idea a lot. I don't know how or where you'd hide or word it, > > but it's a good idea. 
> > > > Cheers, > > Tony > > > _______________________________________________ > spambayes-dev mailing list > spambayes-dev@python.org > http://mail.python.org/mailman/listinfo/spambayes-dev From mhammond at skippinet.com.au Mon Aug 18 14:56:43 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Aug 17 23:56:58 2003 Subject: [spambayes-dev] RE: [Spambayes] pop3proxy_service and smtpproxy In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D926B7@its-xchg4.massey.ac.nz> Message-ID: <0df601c3653c$be822830$f502a8c0@eden> > The service starts pop3proxy's main function, but smtpproxy is started > in pop3proxy's run function (which calls main). > > What's the best way to fix this? > > o Move the smtpproxy starting code into main(). > o Add smtpproxy starting code into pop3proxy_service.py. > o Have a separate service for smtpproxy (this is problematic because > they have to share the database). > o Something else. The best way is to have a single entry-point that pop3proxy_service.py can call to start everything it needs. I don't care how it is spelt, or what it looks or smells like :) > I don't really know much about Windows services, so I'm > throwing this to > the -dev list. Note pop3proxy_service.py has about 100 lines of code, so should be easy to get your head around. About the only real issue is the need for an asynchronous "stop" command that can be issued (which may need to be implemented via a local socket - whatever). This is missing now, and is slightly dangerous as currently pop3proxy is simply aborted at service shutdown time. I don't really know much about pop3proxy Mark. -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2148 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030818/9a152589/winmail.bin From vanhorn at whidbey.com Sun Aug 17 22:07:17 2003 From: vanhorn at whidbey.com (G. 
Armour Van Horn) Date: Mon Aug 18 00:07:26 2003 Subject: [spambayes-dev] SF rankings... References: <200308180051.h7I0p1XN025105@localhost.localdomain> Message-ID: <3F4050F5.5495E1D@whidbey.com> Anthony Baxter wrote: > >>> "Meyer, Tony" wrote > > This is, of course, a backwards step. We have some ability to work with > > OE - we have no ability at all to work with whatever version of hotmail > > gets built in to replace OE. I doubt MS is planning on putting in > > convenient plug-in hooks, either. > > > > Still, I can't see people abandoning OE (and their pop3/imap addresses) > > in droves any time soon, so who knows what will happen... > > One of the quotes I saw was from an MS product manager complaining that > IMAP wasn't a rich enough protocol for an email client. This suggests that > they're planning to do more proprietary crap between Outlook and Exchange. > This is not good :-( The more things change, the more they stay the same: Friends don't let friends use Microsoft mail products. Van -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From kennypitt at hotmail.com Mon Aug 18 10:52:47 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Aug 18 09:53:32 2003 Subject: [spambayes-dev] New Outlook Dialogs Problem In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D8A@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D8A@its-xchg4.massey.ac.nz> Message-ID: <3F40DA2F.1080104@hotmail.com> Meyer, Tony wrote: > If I haven't got enough training information to enable filtering, the > "enable filtering" box isn't greyed out anymore. 
If I try to check it, > it doesn't check, and I get this traceback: > > Traceback (most recent call last): > File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line > 286, in OnCommand > self.ApplyHandlingOptionValueError(handler.OnCommand, wparam, > lparam) > File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line > 245, in ApplyHandlingOptionValueError > self.dialog_def.caption, mb_flags) > AttributeError: ProcessorDialog instance has no attribute 'dialog_def' > A fix was checked in for the exception caused by trying to enable filtering without enough training data, but I haven't heard any further public discussion of the second part about disabling the checkbox. I noticed that it is still not disabled in the latest dialog updates that Adam just checked in. Was it decided whether or not we want to do this? If anyone is interested, I will gladly update the patch that I submitted for this so that it works with Adam's new dialogs. -- Kenny Pitt From david at rebirthing.co.nz Mon Aug 18 15:53:26 2003 From: david at rebirthing.co.nz (David McNab) Date: Mon Aug 18 10:53:27 2003 Subject: [spambayes-dev] FAQ Contribution Message-ID: <1061218278.1144.10.camel@rebirth> Q: I notice the web interface rejects browser access unless the browser is running on the same host. How do I enable web access from other nodes on the LAN? A: Edit the bayescustomize.ini script. Just below the line '[html_ui]', add the line 'allow_remote_connections:True'. But make sure you firewall off outside access to port 8880, to stop unauthorised users from messing with the web interface. 
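A minimal sketch of what that edit looks like, assuming the section and option names given in the answer above. The snippet only uses Python's standard configparser to show that the fragment parses as expected; it is not how SpamBayes itself loads its options, and the INI_TEXT name is purely illustrative.

```python
import configparser

# Hypothetical contents of bayescustomize.ini after the edit described above.
INI_TEXT = """
[html_ui]
allow_remote_connections:True
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# ':' is accepted as a key/value delimiter, so this reads back as a boolean.
allow_remote = parser.getboolean("html_ui", "allow_remote_connections")
print(allow_remote)
```

If remote access is switched on this way, the firewall advice for port 8880 in the answer still applies.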
-- Cheers David From mhammond at skippinet.com.au Tue Aug 19 09:43:36 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Aug 18 18:45:00 2003 Subject: [spambayes-dev] New Outlook Dialogs Problem In-Reply-To: <3F40DA2F.1080104@hotmail.com> Message-ID: <12d001c365da$29f8e840$f502a8c0@eden> > A fix was checked in for the exception caused by trying to enable > filtering without enough training data, but I haven't heard > any further > public discussion of the second part about disabling the checkbox. I > noticed that it is still not disabled in the latest dialog > updates that > Adam just checked in. > > Was it decided whether or not we want to do this? If anyone is > interested, I will gladly update the patch that I submitted > for this so > that it works with Adam's new dialogs. Yes, please do. I don't think we know for sure exactly what we want, but will know it when we see it . Mark. From romain.guy at jext.org Tue Aug 19 02:31:49 2003 From: romain.guy at jext.org (Romain GUY) Date: Mon Aug 18 19:36:13 2003 Subject: [spambayes-dev] A weird trick for Outlook Express users Message-ID: <200381913149.445701@Thinthalion> I've attached a mail to this mail. It is a .eml file that Outlook Express users can drag and drop into their mailbox. The trick with this mail is that when you read it, a new browser window is opened (IE, Mozilla, Firebird, Opera... according to your default browser choice) directly in the SpamBayes Web Interface. Maybe Outlook Express might actually like it. I'm not sure but maybe it would be possible to allow Outlook Express users to manage SpamBayes right from the mail client using tricky email files... -- Romain GUY romain.guy@jext.org http://www.jext.org http://progx.jext.org -------------- next part -------------- An embedded message was scrubbed... 
From: unknown sender Subject: no subject Date: no date Size: 3710 Url: http://mail.python.org/pipermail/spambayes-dev/attachments/20030819/1858d50d/SpamBayes.eml From T.A.Meyer at massey.ac.nz Tue Aug 19 20:38:36 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Aug 19 03:39:27 2003 Subject: [spambayes-dev] SpamBayes Readme Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92C0C@its-xchg4.massey.ac.nz> > I suggest the following approach. Move the current README > aside and create a new one. It can just have the section > headings in there initially, then people can write sections > as they go. The current approach of hoping for > someone to come along and rewrite it all isn't working. That's probably right. However, I couldn't be bothered creating appropriate headings, so I've created a new version. It's here at the moment: Comments? Or should I just check this in? I'll create a "readme-devel.txt" file that has all the testing (etc) info that the old readme had. BTW, because there are so many different ways to use & train spambayes, it is quite difficult to write an introduction that suits everyone. I'm not sure this does this yet, but it's a step along the way. It is very similar to INTEGRATION.TXT (which probably could be retired if we use this alternative readme). Richard: is this more what you were after? Apart from having things moved around, it is *very* similar to INTEGRATION.TXT, which apparently wasn't good enough. Can you suggest improvements? =Tony Meyer From anthony at interlink.com.au Tue Aug 19 18:48:26 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Aug 19 04:24:51 2003 Subject: [spambayes-dev] Re: SpamBayes Readme In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D92C0C@its-xchg4.massey.ac.nz> Message-ID: <200308190748.h7J7mQqY017297@localhost.localdomain> You need to have Python 2.2 or later (2.3 is recommended). You can download Python from . 
+ Many distributions of unix now ship with Python - try typing 'python' + at a shell prompt. As far as the many different ways to install SB, perhaps we should pick one or two as the "suggested way" to do it? Then have a bunch of files for the other sorts of ways, and reference them from the main README. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From richardjones at optushome.com.au Wed Aug 20 11:58:58 2003 From: richardjones at optushome.com.au (Richard Jones) Date: Fri Aug 22 11:35:16 2003 Subject: [spambayes-dev] Re: SpamBayes Readme In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D92C0C@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302D92C0C@its-xchg4.massey.ac.nz> Message-ID: <200308201059.02790.richardjones@optushome.com.au> On Tue, 19 Aug 2003 05:38 pm, Meyer, Tony wrote: > > I suggest the following approach. Move the current README > > aside and create a new one. It can just have the section > > headings in there initially, then people can write sections > > as they go. The current approach of hoping for > > someone to come along and rewrite it all isn't working. > > That's probably right. However, I couldn't be bothered creating > appropriate headings, so I've created a new version. > > It's here at the moment: > > > Comments? Or should I just check this in? I'll create a > "readme-devel.txt" file that has all the testing (etc) info that the old > readme had. That's a huge improvement for new users over the existing readme! Especially the Really Impatient part, which gives the immediate impression that the software is easy to use. Then the most common use-cases are laid out straight away in a clear and concise manner. I'm a little bemused by your having "everyone else" before "procmail filtering" though :) No mention of setup.py ... is it supposed to be used at all? If it is, I suggest that the scripts be renamed to make them a little more SB-specific (e.g.
"sb-pop3proxy"), as on my system it installs them all to /usr/bin (as I used /usr/bin/python2.3 to install them). Running pop3proxy.py, the first problem is that the webbrowser module is b0rken for me... I've posted a patch to Python bug 687747 which consists of:

--- webbrowser.py 2003-08-20 10:28:07.000000000 +1000
+++ /usr/lib/python2.3/webbrowser.py 2003-08-04 10:18:17.000000000 +1000
@@ -354,7 +354,7 @@
 if "BROWSER" in os.environ:
     # It's the user's responsibility to register handlers for any unknown
     # browser referenced by this value, before calling open().
-    _tryorder[0:0] = os.environ["BROWSER"].split(os.pathsep)
+    _tryorder = os.environ["BROWSER"].split(os.pathsep)
 
 for cmd in _tryorder:
     if not cmd.lower() in _browsers:

[might be useful for your later reference if the problem pops up] After I fixed that, the web interface came up nicely. I then tried to set up a proxy for my main email account. I stupidly put in "110" as the port number (having not read the instructions fully, I thought I was entering the port number for the target pop server, not the local port), and promptly got an error and a whole slew of output to the console pop3proxy was running in consisting of a constant stream of:

warning: unhandled read event
warning: unhandled write event

Ehem. So, I need to choose a port number above 1024 there :) In response to Skip's question, I'm running KMail to two POP servers and one IMAP server (I also use the OSX Mail.app to read the IMAP box). I now know that setting up the spambayes for the POP server will be a breeze. The IMAP box is already spam-filtered (poorly) by Mail.app. Getting the POP accounts spam-filtered will be a big win for now. Also on this note, I've previously used spamassassin through kmail's filtering (invoking the command-line tool from a filter). I think that the procmail setup could be adapted to do this, but training is an issue.
BTW, I don't even know what format my mail is stored in by KMail (I suggest that a lot of users wouldn't know this :). I do know it's not mailbox format. It looks like a directory per folder, with "cur", "new" and "tmp" for each, and then a file per message. *shrug* Also, I had a look in the FAQ, but I couldn't see any reference to starting spambayes on boot-time. Has any work been done on some sort of rc script? Richard -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: signature Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030820/472ed1d4/attachment-0001.bin From T.A.Meyer at massey.ac.nz Wed Aug 20 17:02:59 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Aug 22 16:19:50 2003 Subject: [spambayes-dev] RE: SpamBayes Readme Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDA1FA@its-xchg4.massey.ac.nz> > > BTW, there was a suggestion a while back (from Skip) that > > we adopt the > > format that ssh uses for port forwarding: "remote host:remote > > port:local port". For example: "pop.example.com:110:110". Do you > > think this would be any clearer? > > I think it would be. It would be interesting to hear what the other -dev people have to say. There was a suspicious quiet after Skip suggested it ;) It looks more confusing to me, but then I've never used ssh port forwarding... > Note that there's a large number of people out there that > won't know that POP is normally on port 110 :) True, which is why the remote port just defaults to 110, and the doc suggests 110 for the proxy. > Why not just arbitrarily assign a port number if the user > doesn't supply one? One reason is that unless/until we integrate more tightly with the mail clients, the user needs to know the number, so that they can tell their mail client which port to connect to. It would make it easier in terms of support, too, if almost everyone was using the same port.
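For reference, the "pick a port for the user" idea can be sketched roughly as below. This is a minimal illustration in current Python 3, not code from pop3proxy; the function name, the 8110 starting port and the attempt count are all assumptions made up for the example. It tries to bind each candidate port in turn and returns the first one that works, which also sidesteps the "ports below 1024 need root on unix" problem when the starting port is high enough.

```python
import socket


def pick_listen_port(preferred=8110, host="127.0.0.1", attempts=20):
    """Return the first port at or above `preferred` that can be bound on `host`.

    Hypothetical helper: the defaults are arbitrary illustration values.
    """
    for port in range(preferred, preferred + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                # Port is already taken, or we are not allowed to bind it.
                continue
            return port
    raise RuntimeError("no free port found in the range tried")


if __name__ == "__main__":
    print("Listening port to give to the mail client:", pick_listen_port())
```

The support concern raised above still holds: whatever port is chosen has to be shown to the user, because the mail client must be pointed at it.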
> > I haven't heard much about how well Mail.app filters. Is it really > > that bad? > > The filtering is fine - the spam detection isn't so crash-hot I worded that poorly. I meant the spam detecting. > The interface *rocks* though ... for a given message, there's a > button with which you can simply say "this is spam" or > "you marked this as spam and got it wrong". [...] > I'm assuming the outlook interface is like that. Almost exactly. > Unless KMail includes spambayes by default > (oooh!) it's unlikely to get that level of integration. Any client that provides a decent interface for plug-ins should be able to have this sort of integration, as long as someone's willing to put in the time to do it. How difficult it is depends a lot on the plug-in interface; Outlook is very complicated, so a lot of work. Eudora, for example, also has a plug-in interface and (in theory) should be somewhat simpler (especially since you can copy work that Mark has done). I think someone was going to work on a (Windows) Eudora plug-in, but I haven't heard anything further. =Tony Meyer From richardjones at optushome.com.au Wed Aug 20 12:54:07 2003 From: richardjones at optushome.com.au (Richard Jones) Date: Fri Aug 22 16:30:18 2003 Subject: [spambayes-dev] Re: SpamBayes Readme In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D92E21@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302D92E21@its-xchg4.massey.ac.nz> Message-ID: <200308201154.07329.richardjones@optushome.com.au> On Wed, 20 Aug 2003 11:28 am, Meyer, Tony wrote: > BTW, there was a suggestion a while back (from Skip) that we adopt the > format that ssh uses for port forwarding: "remote host:remote port:local > port". For example: "pop.example.com:110:110". Do you think this would > be any clearer? I think it would be. Note that there's a large number of people out there that won't know that POP is normally on port 110 :) Why not just arbitrarily assign a port number if the user doesn't supply one? 
The roundup demo script does this as a means of making things easier on the end user via this code:

    # figure basic params for server
    hostname = socket.gethostname()
    # pick a fairly odd, random port
    port = 8917
    while 1:
        print 'Trying to set up web server on port %d ...'%port,
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.connect((hostname, port))
        except socket.error, e:
            if not hasattr(e, 'args') or e.args[0] != errno.ECONNREFUSED:
                raise
            print 'should be ok.'
            break
        else:
            s.close()
            print 'already in use.'
            port += 100

> I haven't heard much about how well Mail.app filters. Is it really that > bad? The filtering is fine - the spam detection isn't so crash-hot (from what people have told me - I don't get much spam at that address). The interface *rocks* though ... for a given message, there's a button with which you can simply say "this is spam" or "you marked this as spam and got it wrong". No idea what sort of scheme they're using under the covers. I'm assuming the outlook interface is like that. Unless KMail includes spambayes by default (oooh!) it's unlikely to get that level of integration. Richard -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: signature Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030820/0e24d359/attachment-0001.bin From T.A.Meyer at massey.ac.nz Wed Aug 20 14:28:05 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Aug 22 17:07:54 2003 Subject: [spambayes-dev] RE: SpamBayes Readme Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92E21@its-xchg4.massey.ac.nz> > That's a huge improvement for new users over the existing > readme! It really is only a reordering of INTEGRATION.TXT for the most part. (The IMAP stuff being an exception, which is my fault anyway).
> I'm a little bemused by your having > "everyone else" before "procmail filtering" though :) That's an area that needs improvement. The thing is that the "everyone else" section applies to procmail (and vi/emacs later on) users. I'm not sure where the information can really go, without duplicating it in each section. I'll integrate Peter's procmail steps and see if that helps. > No mention of setup.py ... is it supposed to be used at all? Oops. I don't use it, so forgot about it. Yes, there should be an "installation" action before anything else, although it's not strictly necessary. > If it is, I suggest that the scripts be renamed to make them a > little more SB-specific (e.g. "sb-pop3proxy"), as on my system > it installs them all to /usr/bin (as I used /usr/bin/python2.3 > to install them). This was suggested and debated not all that long ago on the -dev list. IIRC, there was enough agreement that this probably will happen...possibly by the next release (better sooner than later, I suppose). It's an annoying task, though, because you have to go through *everything* and make sure that you've corrected all the names... > Running pop3proxy.py, the first problem is that the > webbrowser module is broken for me... [...] > [might be useful for your later reference if the problem pops up] Thanks. > After I fixed that, the web interface came up nicely. I then > tried to set up a proxy for my main email account. I stupidly > put in "110" as the port number (having not read the instructions > fully, I thought I was entering the port number for the target > pop server, not the local port), and promptly got an error and > a whole slew of output to the console pop3proxy was running in > consisting of a constant stream of: > > warning: unhandled read event > warning: unhandled write event > > Ehem. So, I need to choose a port number above 1024 there :) This isn't a restriction for everyone, of course.
Windows users are able to use 110 as the port without needing to be logged in as administrator or anything. There should be a note about this, though. It could probably handle the error more nicely, too. BTW, there was a suggestion a while back (from Skip) that we adopt the format that ssh uses for port forwarding: "remote host:remote port:local port". For example: "pop.example.com:110:110". Do you think this would be any clearer? > The IMAP box is already spam-filtered (poorly) by Mail.app. I haven't heard much about how well Mail.app filters. Is it really that bad? =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Aug 20 14:12:09 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Aug 22 17:12:28 2003 Subject: [spambayes-dev] FAQ Contribution Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92E12@its-xchg4.massey.ac.nz> > Q: I notice the web interface rejects browser access unless > the browser is running on the same host. How do I enable web > access from other nodes on the LAN? > > A: Edit the bayescustomize.ini script. Just below the line > '[html_ui]', add the line 'allow_remote_connections:True'. > But make sure you firewall off outside access to port 8880, > to stop unauthorised users from messing with the web interface. Thanks. I've added a FAQ about this, with additional information about the new-to-cvs ability to specify individual IP [ranges] that are allowed access. =Tony Meyer From anthony at interlink.com.au Sat Aug 23 18:26:42 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Aug 24 00:42:50 2003 Subject: [spambayes-dev] RE: SpamBayes Readme In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDA1FA@its-xchg4.massey.ac.nz> Message-ID: <200308230726.h7N7Qgx1015054@localhost.localdomain> >>> "Meyer, Tony" wrote > > > BTW, there was a suggestion a while back (from Skip) that > > > we adopt the > > > format that ssh uses for port forwarding: "remote host:remote > > > port:local port". For example: "pop.example.com:110:110". 
Do you > > > think this would be any clearer? > > > > I think it would be. > > It would be interesting to hear what the other -dev people have to say. > There was a suspicious quiet after Skip suggested it ;) It looks more > confusing to me, but then I've never used ssh port forwarding... > So long as we actually follow the actual ssh approach, which is localport:remotehost:remoteport Anthony -- Anthony Baxter It's never too late to have a happy childhood. From richie at entrian.com Sun Aug 24 11:00:34 2003 From: richie at entrian.com (Richie Hindle) Date: Sun Aug 24 06:17:59 2003 Subject: [spambayes-dev] RE: SpamBayes Readme Message-ID: <6jvgkvokp45a0fck4mib4781ql8c97897d@4ax.com> [Resend] [Tony] > there was a suggestion a while back (from Skip) that > we adopt the format that ssh uses for port forwarding: > For example: "pop.example.com:110:110". > [...] > It would be interesting to hear what the other -dev people have to say. > There was a suspicious quiet after Skip suggested it ;) It looks more > confusing to me, but then I've never used ssh port forwarding... Personally I don't like it - I imagine someone unfamiliar with the idea of ports would not have a clue what it meant. By separating the local and remote ports into different fields, we separate the concepts rather than muddling them together (there's an added complication relating items in one list with items in the other, but that's not difficult and you only typically have a small number of them). Maybe the fix is as simple as changing the Configuration page to say "Spambayes ports" or "Local ports" or "Listening ports" instead of just "Ports"? 
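To make the two orderings in this thread concrete, here is a rough sketch of parsing the ssh-style spec in the order Anthony gives (localport:remotehost:remoteport). It is only an illustration and not anything in SpamBayes: the Forward type and parse_forward function are hypothetical names, and allowing the remote port to be omitted (falling back to 110, the default Tony mentions) is an assumption of this example rather than something ssh does.

```python
from typing import NamedTuple


class Forward(NamedTuple):
    """One proxy mapping: listen locally, forward to a remote POP server."""
    local_port: int
    remote_host: str
    remote_port: int


def parse_forward(spec, default_remote_port=110):
    """Parse 'localport:remotehost[:remoteport]' (hypothetical helper)."""
    parts = spec.split(":")
    if len(parts) == 2:
        local, host = parts
        remote = str(default_remote_port)
    elif len(parts) == 3:
        local, host, remote = parts
    else:
        raise ValueError("expected localport:remotehost[:remoteport], got %r" % spec)
    if not host:
        raise ValueError("remote host is empty in %r" % spec)
    return Forward(int(local), host, int(remote))


if __name__ == "__main__":
    print(parse_forward("8110:pop.example.com:110"))
```

Richie's objection is visible even here: nothing in a string like "8110:pop.example.com:110" tells a newcomer which number is the local port, which is the case for keeping separate, clearly labelled fields.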
-- Richie Hindle richie@entrian.com From richie at entrian.com Sat Aug 23 12:12:43 2003 From: richie at entrian.com (Richie Hindle) Date: Sun Aug 24 06:59:03 2003 Subject: [spambayes-dev] RE: SpamBayes Readme In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDA1FA@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDA1FA@its-xchg4.massey.ac.nz> Message-ID: [Tony] > there was a suggestion a while back (from Skip) that > we adopt the format that ssh uses for port forwarding: > For example: "pop.example.com:110:110". > [...] > It would be interesting to hear what the other -dev people have to say. > There was a suspicious quiet after Skip suggested it ;) It looks more > confusing to me, but then I've never used ssh port forwarding... Personally I don't like it - I imagine someone unfamiliar with the idea of ports would not have a clue what it meant. By separating the local and remote ports into different fields, we separate the concepts rather than muddling them together (there's an added complication relating items in one list with items in the other, but that's not difficult and you only typically have one or two of them). Maybe the fix is as simple as changing the Configuration page to say "Spambayes ports" or "Local ports" or "Listening ports" instead of just "Ports"? -- Richie Hindle richie@entrian.com From vanhorn at whidbey.com Sun Aug 24 14:44:35 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Sun Aug 24 16:51:36 2003 Subject: [spambayes-dev] RE: SpamBayes Readme References: <6jvgkvokp45a0fck4mib4781ql8c97897d@4ax.com> Message-ID: <3F4923B3.A6129656@whidbey.com> Greetings: I've been meaning to ask if someone couldn't generate a new source collection for the download page. It looks like 1.0a4 was released on 7 July, and I know that at least two major issues (unicode in headers and default database) have been dealt with since then. 
Either that or come over here this afternoon and help figure out why I've never been able to get CVS working! Van -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From paul.huygen at huygen.nl Sun Aug 24 16:34:24 2003 From: paul.huygen at huygen.nl (Paul Huygen) Date: Sun Aug 24 18:12:32 2003 Subject: [spambayes-dev] Proposal for Emacs script to save spam-nonspam messages for training purposes Message-ID: <16200.48864.657573.887131@Grootgrut.hit> Hi, While looking on the web for code to optimize emacs's VM for handling spam, I found (in url http://mail.python.org/pipermail/spambayes-checkins/2003-May/001291.html) your script:

> (defun copy-to-spam ()
>   (interactive)
>   (vm-save-message (expand-file-name "~/tmp/newspam"))
>   (vm-undelete-message 1))
>
> (defun copy-to-nonspam ()
>   (interactive)
>   (vm-save-message (expand-file-name "~/tmp/newham"))
>   (vm-undelete-message 1))
>
> (define-key vm-mode-map "ls" 'copy-to-spam)
> (define-key vm-summary-mode-map "ls" 'copy-to-spam)
> (define-key vm-mode-map "lh" 'copy-to-nonspam)
> (define-key vm-summary-mode-map "lh" 'copy-to-nonspam)

Thank you for this script. However, it did not run properly in my case. The function "copy-to-spam" saves the current message into "~/tmp/newspam", then marks it for deletion, jumps to the next message and undeletes that message. In my case, the following modification of e.g. copy-to-spam seems to work:

(defun copy-to-spam ()
  (interactive)
  (let ((vm-move-after-deleting nil))
    (vm-save-message (expand-file-name "~/mail/mboxes/newspam")))
  (let ((vm-move-after-undeleting t))
    (vm-undelete-message 1)))

Best regards, Paul Huygen From T.A.Meyer at massey.ac.nz Mon Aug 25 13:43:13 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 24 20:44:29 2003 Subject: [spambayes-dev] RE: SpamBayes Readme Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDABA5@its-xchg4.massey.ac.nz> > Personally I don't like it - I imagine someone unfamiliar > with the idea of ports would not have a clue what it meant. That's my opinion, too. > Maybe the > fix is as simple as changing the Configuration page to say > "Spambayes ports" or "Local ports" or "Listening ports" > instead of just "Ports"? This sounded like a good idea, so I checked it in :) I guess we leave the local:remote:remote format for now, since it's undecided whether it's good or not, and either way it's more work ;) =Tony Meyer From richie at entrian.com Sun Aug 24 23:41:22 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Aug 25 01:18:10 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] Message-ID: [Van] > I've been meaning to ask if someone couldn't generate a new source collection > for the download page. It looks like 1.0a4 was released on 7 July, and I know > that at least two major issues (unicode in headers and default database) have > been dealt with since then. I've been wondering this myself (mostly because I'm fed up with replying to people complaining about that unicode header bug!) However, there's an issue with the web interface (messages not appearing on the Review page) that I'd like to look at before the next release. Hopefully I'll have time to look at that in the next couple of days - if anyone was going to be mad-keen enough to build a release right now, could they hold off until I've looked at that? Ta.
(I'm happy to build the release myself, unless anyone else particularly wants to do it.) Is there anyone else with a pending edit that they'd like to see in 1.0a5? (There's Romain's HTTP Auth patch, 791393, which should be my job to apply but I don't know when I'll get the chance.) -- Richie Hindle richie@entrian.com From T.A.Meyer at massey.ac.nz Mon Aug 25 19:36:24 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 25 02:37:11 2003 Subject: [spambayes-dev] 1.0a5 release Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD51@its-xchg4.massey.ac.nz> > I've been wondering this myself (mostly because I'm fed up > with replying to people complaining about that unicode header bug!) Indeed. I imagine that there will be a new Outlook plug-in release soon, too, and it's nice when they come out at similar times. (Although Mark's up to 8, and we're up to 5 ;). > However, there's an issue with the web interface (messages > not appearing on the Review page) that I'd like to look at > before the next release. Hopefully I'll have time to look at > that in the next couple of days I agree that this really must be fixed; I've been meaning to look at it too (it was probably me or TimS that broke it) but haven't had a chance yet either. > (I'm happy to build the > release myself, unless anyone else particularly wants to do it.) +1 to you doing it :) > Is there anyone else with a pending edit that they'd like to > see in 1.0a5? I checked in the new readmes and modified smtpproxy today, which I wanted in. I'd actually like to see 1.0b1, though. In terms of the API, we must be pretty stable now, right? Except that if we are going to do the renaming thing (as proposed by Greg), we should probably do that (really the sooner the better if we are going to, even if we are still at 1.0a5). 
I think we should also catch the 'no dbm available' traceback (I can't remember the wording) and replace it with a nice error for (at least) pop3proxy users, so that those using dumbdbm don't flood us with "1.0a5 doesn't work" messages. > (There's Romain's HTTP Auth patch, 791393, which should be my > job to apply but I don't know when I'll get the chance.) If you don't, I can probably do this. =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Aug 25 19:43:08 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 25 02:43:49 2003 Subject: [spambayes-dev] 1.0a5 release Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD52@its-xchg4.massey.ac.nz> [me] > I'd actually like to see 1.0b1, though. One thing I forgot: I still like the cuteness of releasing the next version on the 4th of September, at which point SpamBayes will have been at sourceforge for exactly a year. Although this means people have to wait another 10 days, it could very well be another ten days before we get everything ready to go... :) (Mark could buy us a birthday cake with all the $0's he has collected from sales of the Outlook plug-in ;) =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Aug 25 19:55:45 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 25 02:56:25 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD57@its-xchg4.massey.ac.nz> > Is there anyone else with a pending edit that they'd like to > see in 1.0a5? Ok, there was more I forgot: o I'm going to put in the request for notate_to to optionally work on ham & unsure messages. o I'll also put in the request for an 'advanced' config page for the web ui. o I'd like to be able to get the overkill script [renamed and] working with OE (it works with other MUAs). I might not get time for this, and it's not that important. 
o The fix that Mark & I made to pop3proxy_service so that smtpproxy also starts needs an improvement (not surprisingly, with my bit, not Mark's), in that the stop() function doesn't do what it should. (unless anyone has any objections to any of these). There's also the option of ripping out all the backwards compatibility code from Options.py/OptionsClass.py. This might mean that the odd config file no longer works, but it's got to be done at some time, and would make the code a lot tidier. =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Aug 25 21:10:27 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 25 04:11:26 2003 Subject: [spambayes-dev] stopping pop3proxy Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD5F@its-xchg4.massey.ac.nz> [Tony] > > Done. (There are now only three pop3proxy functions of > > interest: prepare(), start() and stop(). They do what you > > would expect, and all take a pop3proxy.State object (like > > pop3proxy.state) as the only parameter). [Mark] > stop() appears dangerous - remember it is called > asynchronously, so another thread could be doing almost > anything. What we really need is a trigger to tell the main > loop to stop, and the save should be done as that loop terminates. I understand what you're saying, but I don't know how to do this - hopefully Richie does. The closest I can think of is that stop() needs to call stop_when_done() on each of the BayesProxy objects, and then wait until they have all stopped (I don't know how to check this). Then save then return. Richie - any advice? Also in terms of stopping pop3proxy, on my fiance's computer pop3proxy is running all the time so that she doesn't have to know how to start & stop it when checking for mail (it's too old for pop3proxy_service). This works fine, except that when she shuts down Windows, it comes up with a "can't shut this program down" message (it does terminate it after a delay, but it's annoying). Is there some way that we can handle this? 
I presume Windows sends a sigterm or something to the application? Would other OSs do exactly the same thing? =Tony Meyer From richie at entrian.com Mon Aug 25 11:18:51 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Aug 25 05:19:17 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD57@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD57@its-xchg4.massey.ac.nz> Message-ID: [Tony] > There's also the option of ripping out all the backwards compatibility > code from Options.py/OptionsClass.py. > [...] > I'd actually like to see 1.0b1, though. In terms of the API, we must be > pretty stable now, right? Except that if we are going to do the > renaming thing (as proposed by Greg), we should probably do that (really > the sooner the better if we are going to, even if we are still at > 1.0a5). I'm +1 on both pulling out the backward-compatibility code and on renaming everything, but I don't think we can do that in a beta release - even the first one. Major changes like that should happen during the alpha cycle IMHO. It could even be worth releasing 1.0a5 *before* making those edits, with an announcement that the old options and script names are deprecated, then immediately releasing 1.0a6 with just those edits in place. That way, no-one will be obliged to swallow the new names in order to get their hands on a bugfix. > I've been meaning to look at [the problem of messages not appearing on > the review page] too If I find a decent amount of time to devote to it I'll let you know, and if you could do the same then we won't duplicate the work. > I think we should also catch the 'no dbm available' traceback (I can't > remember the wording) and replace it with a nice error for (at least) > pop3proxy users, so that those using dumbdbm don't flood us with "1.0a5 > doesn't work" messages. 
+1 > I still like the cuteness of releasing the next > version on the 4th of September, at which point SpamBayes will have been > at sourceforge for exactly a year. I'd like to say +1, especially since it's my birthday too! (Spambayes is exactly 31 years younger than me, and already considerably brighter 8-) But if we're ready more than a couple of days before that, we should probably go ahead. -- Richie Hindle richie@entrian.com From vanhorn at whidbey.com Mon Aug 25 04:02:38 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Mon Aug 25 06:02:42 2003 Subject: [spambayes-dev] 1.0a5 release References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD52@its-xchg4.massey.ac.nz> Message-ID: <3F49DEBE.ACE7BB49@whidbey.com> Okay, how about 1.0a5 in the next couple of days, and then go for broke with 1.0b1 on the anniversary? Van "Meyer, Tony" wrote: > [me] > > I'd actually like to see 1.0b1, though. > > One thing I forgot: I still like the cuteness of releasing the next > version on the 4th of September, at which point SpamBayes will have been > at sourceforge for exactly a year. Although this means people have to > wait another 10 days, it could very well be another ten days before we > get everything ready to go... :) > > (Mark could buy us a birthday cake with all the $0's he has collected > from sales of the Outlook plug-in ;) > > =Tony Meyer > > _______________________________________________ > spambayes-dev mailing list > spambayes-dev@python.org > http://mail.python.org/mailman/listinfo/spambayes-dev -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! 
mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From richie at entrian.com Mon Aug 25 14:15:13 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Aug 25 08:15:34 2003 Subject: [spambayes-dev] stopping pop3proxy In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD5F@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD5F@its-xchg4.massey.ac.nz> Message-ID: [Mark] > stop() appears dangerous - remember it is called > asynchronously, so another thread could be doing almost > anything. What we really need is a trigger to tell the main > loop to stop, and the save should be done as that loop terminates. [Tony] > I understand what you're saying, but I don't know how to do this - > hopefully Richie does. The closest I can think of is that stop() needs > to call stop_when_done() on each of the BayesProxy objects, and then > wait until they have all stopped (I don't know how to check this). Then > save then return. > > Richie - any advice? What needs to be done is to refuse all new connections (or rather accept them but push() back an error message and call close_when_done(), according to whether some flag, probably state.isShuttingDown or similar, is set), then exit when all current connections complete (close_when_done is per-connection, so it's not what we want - the connections will close anyway). I think the best way to do this is to call sys.exit() when BayesProxy.close() is called and state.activeSessions goes to zero with state.isShuttingDown set. I'll try to have a look at this - if anybody else wants to look at it, let me know so we don't duplicate the work. > Also in terms of stopping pop3proxy, on my fiance's computer pop3proxy > is running all the time so that she doesn't have to know how to start & > stop it when checking for mail (it's too old for pop3proxy_service).
> This works fine, except that when she shuts down Windows, it comes up > with a "can't shut this program down" message (it does terminate it > after a delay, but it's annoying). Is there some way that we can handle > this? I presume Windows sends a sigterm or something to the > application? Would other OSs do exactly the same thing? Windows sends a message to all top-level windows on shutdown, but the pop3proxy has no windows. Oddly, my XP box shuts down fine, with no warnings - I can't explain that... Mark? -- Richie Hindle richie@entrian.com From mhammond at skippinet.com.au Mon Aug 25 23:31:41 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Aug 25 08:31:50 2003 Subject: [spambayes-dev] stopping pop3proxy In-Reply-To: Message-ID: <037d01c36b04$d61fdf00$f502a8c0@eden> > > Also in terms of stopping pop3proxy, on my fiance's > computer pop3proxy > > is running all the time so that she doesn't have to know > how to start & > > stop it when checking for mail (it's too old for pop3proxy_service). > > This works fine, except that when she shuts down Windows, > it comes up > > with a "can't shut this program down" message (it does terminate it > > after a delay, but it's annoying). Is there some way that > we can handle > > this? I presume Windows sends a sigterm or something to the > > application? Would other OSs do exactly the same thing? > > Windows sends a message to all top-level windows on shutdown, but the > pop3proxy has no windows. Oddly, my XP box shuts down fine, with no > warnings - I can't explain that... Mark? I've seen similar things on Win9x, but not the NT platform. Windows does try and shut down console apps, but I have no idea exactly what it does. Or-even-roughly Mark. 
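The shutdown scheme Richie describes earlier in this thread (refuse new connections once a flag is set, then exit when the last active session closes) can be sketched roughly as below. This is only an illustration under stated assumptions: ProxyState, Session, request_stop and on_exit are hypothetical names, not the real pop3proxy classes, and the asyncore/asynchat plumbing is left out entirely.

```python
class ProxyState:
    """Hypothetical stand-in for pop3proxy's state object (shutdown bookkeeping only)."""

    def __init__(self, on_exit):
        self.is_shutting_down = False
        self.active_sessions = 0
        self.on_exit = on_exit      # assumed hook: save the database, then exit
        self._exited = False

    def request_stop(self):
        """Called asynchronously (e.g. by the service); only sets the flag."""
        self.is_shutting_down = True
        self._exit_if_idle()

    def _exit_if_idle(self):
        # Exit exactly once, and only when no session is still in progress.
        if self.is_shutting_down and self.active_sessions == 0 and not self._exited:
            self._exited = True
            self.on_exit()


class Session:
    """Hypothetical per-connection object, playing the role of BayesProxy."""

    def __init__(self, state):
        self.state = state
        self.accepted = not state.is_shutting_down
        if self.accepted:
            state.active_sessions += 1
            self.reply = "+OK ready"
        else:
            # Accept the connection but push back an error and close it.
            self.reply = "-ERR proxy is shutting down"

    def close(self):
        if self.accepted:
            self.accepted = False
            self.state.active_sessions -= 1
            self.state._exit_if_idle()
```

In this sketch the save-and-exit step runs from whichever call closes the last session, so request_stop() itself never touches data that another thread might be using.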
From papaDoc at videotron.ca Mon Aug 25 11:17:38 2003 From: papaDoc at videotron.ca (papaDoc) Date: Mon Aug 25 10:35:53 2003 Subject: [spambayes-dev] Patch for pop3proxy Message-ID: <3F4A1A82.9080106@videotron.ca>

Hi,

I have this problem with pop3proxy.py (cvs of 2003.08.25) and python 2.2.2

SpamBayes POP3 Proxy Beta1, version 0.1 (May 2003),
using SpamBayes POP3 Proxy Web Interface Alpha2, version 0.02
and engine SpamBayes Beta2, version 0.2 (July 2003).

Loading database...
Filename for database = d:/NoBackup/users/ricard/Spambayes/hammie.db
Traceback (most recent call last):
  File "C:\Devtools\SPAMBA~1\SPAMBA~2.25\POP3PR~1.PY", line 819, in ?
    run()
  File "C:\Devtools\SPAMBA~1\SPAMBA~2.25\POP3PR~1.PY", line 804, in run
    prepare(state=state)
  File "C:\Devtools\SPAMBA~1\SPAMBA~2.25\POP3PR~1.PY", line 746, in prepare
    state.createWorkers()
  File "C:\Devtools\SPAMBA~1\SPAMBA~2.25\POP3PR~1.PY", line 614, in createWorkers
    if '::' in filename:
TypeError: 'in <string>' requires character as left operand

So this is a patch to solve this problem

***************
*** 611,617 ****
  filename = os.path.expanduser(filename)
  print "Filename for database = %s" % filename
  if self.useDB:
!     if re.search(r'::', filename):
          sql_types = {"pgsql" : storage.PGClassifier,
                       "mysql" : storage.mySQLClassifier,
                      }
--- 611,617 ----
  filename = os.path.expanduser(filename)
  print "Filename for database = %s" % filename
  if self.useDB:
!     if '::' in filename:
          sql_types = {"pgsql" : storage.PGClassifier,
                       "mysql" : storage.mySQLClassifier,
                      }

I don't know if this is the good way to do it but it solves my problem.......
Remi From skip at pobox.com Mon Aug 25 11:37:15 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Aug 25 11:37:32 2003 Subject: [spambayes-dev] Patch for pop3proxy In-Reply-To: <3F4A1A82.9080106@videotron.ca> References: <3F4A1A82.9080106@videotron.ca> Message-ID: <16202.11563.472963.116863@montanaro.dyndns.org> papaDoc> if '::' in filename: papaDoc> TypeError: 'in ' requires character as left operand Fixed in CVS using string object's find() method. (Actual fix is in spambayes/storage.py now due to some reshuffling over the weekend.) papaDoc> I don't know if this is the good way to do it but it solves my papaDoc> problem....... It worked, but using regular expressions may have been overkill. Probably my favorite Internet quote of all time is this one from Jamie Zawinski: Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. Skip From skip at pobox.com Mon Aug 25 13:00:12 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Aug 25 13:00:25 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] In-Reply-To: References: Message-ID: <16202.16540.742359.215660@montanaro.dyndns.org> Richie> Is there anyone else with a pending edit that they'd like to see Richie> in 1.0a5? We didn't resolve the issue of the print statements in storage.py. I have a simple change which will shoot them out to sys.stderr instead. I think that should be considered a bug fix, not an enhancement. Skip From richie at entrian.com Mon Aug 25 20:02:10 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Aug 25 14:02:15 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] In-Reply-To: <16202.16540.742359.215660@montanaro.dyndns.org> References: <16202.16540.742359.215660@montanaro.dyndns.org> Message-ID: [Richie] > Is there anyone else with a pending edit that they'd like to see > in 1.0a5? [Skip] > We didn't resolve the issue of the print statements in storage.py. 
I have a > simple change which will shoot them out to sys.stderr instead. I think that > should be considered a bug fix, not an enhancement. +1 from me. Is there a reason *not* to do it that I'm not aware of? -- Richie Hindle richie@entrian.com From skip at pobox.com Mon Aug 25 15:02:44 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Aug 25 15:05:10 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] In-Reply-To: References: <16202.16540.742359.215660@montanaro.dyndns.org> Message-ID: <16202.23892.947280.850525@montanaro.dyndns.org> >> We didn't resolve the issue of the print statements in storage.py. I >> have a simple change which will shoot them out to sys.stderr instead. >> I think that should be considered a bug fix, not an enhancement. Richie> +1 from me. Is there a reason *not* to do it that I'm not aware Richie> of? I thought someone else had an alternative solution which involved dumping the prints altogether. I'll check in my change in a moment. Skip From skip at pobox.com Mon Aug 25 15:23:42 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Aug 25 15:24:00 2003 Subject: [spambayes-dev] Proposal for Emacs script to save spam-nonspam messages for training purposes In-Reply-To: <16200.48864.657573.887131@Grootgrut.hit> References: <16200.48864.657573.887131@Grootgrut.hit> Message-ID: <16202.25150.488685.552545@montanaro.dyndns.org> Paul> Thank you for this script. However, it did not run properly in my Paul> case. The function "copy-to-spam" saves the current message into Paul> "~/tmp/newspam", then marks it for deletion, jumps to the next Paul> message and undeletes that message. In my case, the following Paul> modification of e.g. copy-to-spam seems to work: Paul> (defun copy-to-spam () Paul> (interactive) Paul> (let ((vm-move-after-deleting nil)) Paul> (vm-save-message (expand-file-name "~/mail/mboxes/newspam"))) Paul> (let ((vm-move-after-undeleting t)) (vm-undelete-message 1))) Thanks for the feedback. 
FWIW, I don't use those precise Emacs Lisp incantations anymore. I'm not too surprised they didn't work in all situations. VM is a fairly complex beast. Here's what I do now.

(defun train-as-spam ()
  (interactive)
  (let ((vm-delete-after-saving nil))
    (vm-save-message (expand-file-name "~/tmp/newspam"))
    (vm-add-message-labels "trained" 1))
  (vm-pipe-message-to-command "hammiefilter.py -s >/dev/null" nil))

(defun train-as-nonspam ()
  (interactive)
  (let ((vm-delete-after-saving nil))
    (vm-save-message (expand-file-name "~/tmp/newham"))
    (vm-add-message-labels "trained" 1))
  (vm-pipe-message-to-command "hammiefilter.py -g >/dev/null" nil))

(define-key vm-mode-map "ls" 'train-as-spam)
(define-key vm-summary-mode-map "ls" 'train-as-spam)
(define-key vm-mode-map "lh" 'train-as-nonspam)
(define-key vm-summary-mode-map "lh" 'train-as-nonspam)

It changes two things. One, it tries not to delete the message, so the problem you encountered should be gone. Two, it trains on the message. I don't think I've encountered any problems in this regard, though it's worth noting that mail could be arriving at the same time as hammiefilter has the database open for write. I'll update the faq.

Skip

From skip at pobox.com Mon Aug 25 15:32:42 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Aug 25 15:48:33 2003 Subject: [spambayes-dev] Website problems Message-ID: <16202.25690.487337.38536@montanaro.dyndns.org>

I tried pushing a change to the faq just now, but got errors from rsync:

% make install
cd download ; make install
Push to shell1.sourceforge.net:/home/groups/s/sp/spambayes/htdocs//download ...
rsync --rsh=ssh -v -r -l -t --update --exclude-from=../scripts/rsync-excludes ./ shell1.sourceforge.net:/home/groups/s/sp/spambayes/htdocs//download
building file list ... done
rsync: recv_generator: mkdir "/home/groups/s/sp/spambayes/htdocs//download": No such file or directory (2)
stat /home/groups/s/sp/spambayes/htdocs//download : No such file or directory
rsync: recv_generator: mkdir "/home/groups/s/sp/spambayes/htdocs//download": No such file or directory (2)
stat /home/groups/s/sp/spambayes/htdocs//download : No such file or directory
wrote 32 bytes read 20 bytes 11.56 bytes/sec
total size is 0 speedup is 0.00
rsync error: some files could not be transferred (code 23) at main.c(620)
make[1]: *** [install] Error 23
make: *** [local_install] Error 2

I logged into shell1.sourceforge.net and see that /home/groups/s is empty. The login message says:

On 2003-08-23, one project file server (of seven) suffered a multi-disk failure; data from this file server has been restored from tape. At this time, the filesystem for impacted projects is marked read-only pending completion of our analysis and resolution of this issue. A small number of users will not be able to login until this resolution occurs. Watch the "Site Status" page on the SourceForge.net site for updates. Projects served from this file server start with the letters j, q, s, and y.

I guess this means that for the time being we can't make changes to the website. Has anyone else encountered this?

Skip

From skip at pobox.com Mon Aug 25 15:41:56 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Aug 25 15:51:33 2003 Subject: [spambayes-dev] RE: SpamBayes Readme In-Reply-To: <200308230726.h7N7Qgx1015054@localhost.localdomain> References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDA1FA@its-xchg4.massey.ac.nz> <200308230726.h7N7Qgx1015054@localhost.localdomain> Message-ID: <16202.26244.998766.755415@montanaro.dyndns.org>

Anthony> So long as we actually follow the actual ssh approach, which is
Anthony> localport:remotehost:remoteport

Yeah, which reads to me, "when the user connects to localport, forward it to remotehost:remoteport".
I apologize if I screwed up the syntax in my original suggestion. Skip From T.A.Meyer at massey.ac.nz Tue Aug 26 12:20:13 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 25 19:21:14 2003 Subject: [spambayes-dev] Patch for pop3proxy Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAED9@its-xchg4.massey.ac.nz> > papaDoc> if '::' in filename: > papaDoc> TypeError: 'in ' requires character as > left operand > > Fixed in CVS using string object's find() method. (Actual > fix is in spambayes/storage.py now due to some reshuffling > over the weekend.) Sorry, this was me. This is something that was added in Python 2.3, wasn't it - I had forgotten that. =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Aug 26 12:26:17 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 25 19:27:10 2003 Subject: [spambayes-dev] stopping pop3proxy Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAEE4@its-xchg4.massey.ac.nz> > What needs to be done is [...clever sounding complicated stuff...] > I'll try to have a look > at this - if anybody else wants to look at it, let me know so > we don't duplicate the work. All yours :) [Richie] > Windows sends a message to all top-level windows on shutdown, > but the pop3proxy has no windows. [Mark] > I've seen similar things on Win9x, but not the NT platform. > Windows does try and shut down console apps, but I have no idea > exactly what it does. Perhaps if I ran pop3proxy with pythonw instead of python, so that there wasn't a console window? I'll give that a go tonight. I suppose that whatever Windows does to try and shut down console apps must be documented somewhere on msdn, so I could go look for it myself... 
=Tony Meyer From mhammond at skippinet.com.au Tue Aug 26 10:40:18 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Aug 25 19:40:29 2003 Subject: [spambayes-dev] stopping pop3proxy In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAEE4@its-xchg4.massey.ac.nz> Message-ID: <15e101c36b62$3e80a090$f502a8c0@eden> [Tony] > Perhaps if I ran pop3proxy with pythonw instead of python, so > that there > wasn't a console window? I'll give that a go tonight. I suppose that > whatever Windows does to try and shut down console apps must be > documented somewhere on msdn, so I could go look for it myself... No, I think the issue will actually be "does the app have a message queue?". Even if running under pythonw.exe, there is no way for Windows to cleanly shut down a Python app other than a "console control event" (I think it is called). Running a message queue would also allow you to detect logon/logoff etc events under Win9x (NT platforms should just use the service). Back to the original issue - I thought a common way to shut down a server like this was simply to make a local connection to the server and issue a shutdown command. Or was it just to make a temporary local connection just to wake up each listener so it can shut down? Either way, I have no clue about how pop3proxy is structured, so have no idea if this even makes sense :) Mark. From T.A.Meyer at massey.ac.nz Tue Aug 26 13:16:09 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Aug 25 20:16:50 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAF4D@its-xchg4.massey.ac.nz> >> We didn't resolve the issue of the print statements in storage.py.
>> I have a simple change which will shoot them out to sys.stderr instead. >> I think that should be considered a bug fix, not an enhancement. [...] > I thought someone else had an alternative solution which > involved dumping the prints altogether. I'll check in my > change in a moment. I was the one suggesting removing them, but this would be an enhancement, whereas your checkin was a bug fix. Instead of removing them, I think that we might want to move towards an integer verbose level at some point, which could mean that they are only printed at a high enough level. The last time this was suggested it got some +0's, but nothing else, so I presume people don't really care either way. It's not urgent, though (IMO), and could wait until a later release. =Tony Meyer From ta-meyer at ihug.co.nz Tue Aug 26 17:34:39 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Aug 26 00:35:18 2003 Subject: [spambayes-dev] pop3proxy/imapfilter advanced configuration page Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AD4D@its-xchg4.massey.ac.nz> I've just checked in an update to the web interface for pop3proxy/imapfilter that provides an "Advanced Configuration" page (there's a button at the bottom of the regular config page). This was requested (#791254), but it seemed to make sense to me that some people might want to play around with advanced options, but not want to have to understand how to edit the config file by hand (and there's also the need to use Python to get a list of the options available). I tried to choose options that seemed too advanced to go on the regular config page (and I moved a couple from there), but none that are simply too complicated for people to use unless they know what they are doing. If anyone thinks I've included something that I shouldn't have, or have missed an option that I should have included, please let me know. Any comments welcome. 
=Tony Meyer From T.A.Meyer at massey.ac.nz Tue Aug 26 18:17:51 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Aug 26 01:18:38 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDB0E6@its-xchg4.massey.ac.nz> > I'm +1 on both pulling out the backward-compatibility code > and on renaming everything, but I don't think we can do that > in a beta release - even the first one. Major changes like > that should happen during the alpha cycle IMHO. Fair enough. It just seems that every time things are stable enough to label the release beta, something comes up. I guess the version info for each app that's in Version.py clarifies this somewhat, though. > It could even be worth releasing 1.0a5 *before* making those > edits, with an announcement that the old options and script > names are deprecated, then immediately releasing 1.0a6 with > just those edits in place. The deprecation of the options was announced ages ago (while you were off creating a family), and everyone was instructed to change to the new ones (a script was even provided to do this). I think these must be ready to go for 1.0a5. Changing the names, though, hasn't been announced at all. Overall, then, +1 to your idea. > I'd like to say +1, especially since it's my birthday too! > But if we're ready more than a > couple of days before that, we should probably go ahead. Given the last couple of days, it looks like we probably will. Perhaps 1.0a5 in a couple of days and then 1.0a6 (as above) on your birthday? =Tony Meyer From skip at pobox.com Tue Aug 26 09:32:14 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Aug 26 09:32:59 2003 Subject: [spambayes-dev] Re: [Spambayes] fatal error? 
In-Reply-To: <200308252112.32705.mark.tabash@novacolor.ca> References: <200308251630.18299.mark.tabash@novacolor.ca> <16202.30590.705370.358704@montanaro.dyndns.org> <200308252112.32705.mark.tabash@novacolor.ca> Message-ID: <16203.24926.483126.523549@montanaro.dyndns.org> (cc'ing spambayes-dev) Mark> Thanks. I guess I will delete the database and start over again. Mark> But what guarantees me that this is not going to happen again? There are no guarantees. We don't at this moment know what the problem is. (What follows is perhaps more for the developers than Mark...) I contacted Sleepycat about distributing binaries of their command line executables (it doesn't seem they'd have a problem with it). In addition to information about that I got this information about your specific error: DB_RUNRECOVERY is the error that is returned when the library detects a fatal error or structure in the shared region or environment files that are used to coordinate the interaction between multiple threads of control. Once this occurs, the shared region is marked invalid and the application must be shut down, recovery must be run and the application can be brought back up. Recovery can be run as a standalone utility (db_recover) or from the application, by specifying DB_RECOVER when opening the environment. If the Outlook plugin is executing multiple threads, two of which might operate on the database simultaneously, I suspect it will have to lock access to the db file. Mark Hammond, does the above comment jive with the structure of the plugin? Skip From mhammond at skippinet.com.au Wed Aug 27 00:52:24 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Aug 26 09:52:31 2003 Subject: [spambayes-dev] Re: [Spambayes] fatal error? 
In-Reply-To: <16203.24926.483126.523549@montanaro.dyndns.org> Message-ID: <022b01c36bd9$48bc96a0$f502a8c0@eden> > If the Outlook plugin is executing multiple threads, two of > which might > operate on the database simultaneously, I suspect it will have to lock > access to the db file. Mark Hammond, does the above comment > jive with the > structure of the plugin? The addin is "mainly" single-threaded - Outlook always calls us from the same thread. The only time a second thread is used is by the "training" or "filtering" dialogs. If "training" is running, then this thread will be updating the database - however, in that case, the dialog is up, which is modal, so there is no way the other thread could be doing a training operation. The passage you quoted doesn't rule out the possibility that this error could occur even if only one thread is writing, but another is reading. If that is a problem, then yes, we would hit it :) Mark. From barry at python.org Tue Aug 26 15:18:13 2003 From: barry at python.org (Barry Warsaw) Date: Tue Aug 26 10:18:20 2003 Subject: [spambayes-dev] Re: [Spambayes] fatal error? In-Reply-To: <022b01c36bd9$48bc96a0$f502a8c0@eden> References: <022b01c36bd9$48bc96a0$f502a8c0@eden> Message-ID: <1061907454.23837.13.camel@yyz> On Tue, 2003-08-26 at 09:52, Mark Hammond wrote: > The addin is "mainly" single-threaded - Outlook always calls us from the > same thread. The only time a second thread is used is by the "training" or > "filtering" dialogs. If "training" is running, then this thread will be > updating the database - however, in that case, the dialog is up, which is > modal, so there is no way the other thread could be doing a training > operation. > > The passage you quoted doesn't rule out the possibility that this error > could occur even if only one thread is writing, but another is reading. 
If > that is a problem, then yes, we would hit it :) Note that for the Berkeley-based storages, I had to implement application level locking for all reads and writes, but in that "application" there are definitely multiple threads doing both. Berkeley itself has a lock subsystem, but I couldn't trust that because it is statically allocated and there are situations during some transactions where an unbounded number of pages could get touched, exhausting the lock table. So I ditched Berkeley's locks and used an application level (threading) lock. FWIW, -Barry p.s. at one point there was talk of a Usenet group for BerkeleyDB programmers. I sure wish that existed. From skip at pobox.com Tue Aug 26 10:17:57 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Aug 26 10:18:31 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAF4D@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAF4D@its-xchg4.massey.ac.nz> Message-ID: <16203.27669.269986.138770@montanaro.dyndns.org> Tony> Instead of removing them, I think that we might want to move Tony> towards an integer verbose level at some point, which could mean Tony> that they are only printed at a high enough level. The last time Tony> this was suggested it got some +0's, but nothing else, so I Tony> presume people don't really care either way. I think if we are going to get that sophisticated, we might as well use the logging module, though that would break 2.2 compatibility. In any case, I've seen no crying need for anything beside normal and verbose. Tony> It's not urgent, though (IMO), and could wait until a later release. Agreed. Skip From barry at python.org Tue Aug 26 15:28:11 2003 From: barry at python.org (Barry Warsaw) Date: Tue Aug 26 10:28:12 2003 Subject: [spambayes-dev] Re: [Spambayes] fatal error? 
In-Reply-To: <1061907454.23837.13.camel@yyz> References: <022b01c36bd9$48bc96a0$f502a8c0@eden> <1061907454.23837.13.camel@yyz> Message-ID: <1061908058.23837.19.camel@yyz> On Tue, 2003-08-26 at 10:17, Barry Warsaw wrote: > Note that for the Berkeley-based storages, I had to implement > application level locking for all reads and writes, but in that > "application" there are definitely multiple threads doing both. > Berkeley itself has a lock subsystem, but I couldn't trust that because > it is statically allocated and there are situations during some > transactions where an unbounded number of pages could get touched, > exhausting the lock table. So I ditched Berkeley's locks and used an > application level (threading) lock. Tim helpfully reminds me that you're using the dbapi to Berkeley, which doesn't create an environment. I've never actually run Berkeley without an environment (i.e. a directory that contains all the db files) and IIRC we couldn't find much information on running Berkeley in that type of configuration. -Barry From skip at pobox.com Tue Aug 26 10:56:08 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Aug 26 10:56:21 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDB0E6@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDB0E6@its-xchg4.massey.ac.nz> Message-ID: <16203.29960.949254.86283@montanaro.dyndns.org> Tony> It just seems that every time things are stable enough to label Tony> the release beta, something comes up. Maybe we should go into feature freeze after 1.0a5 is released, so we can focus on bug fixes. Skip From skip at pobox.com Tue Aug 26 11:06:37 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Aug 26 11:06:57 2003 Subject: [spambayes-dev] Re: [Spambayes] fatal error? 
In-Reply-To: <022b01c36bd9$48bc96a0$f502a8c0@eden> References: <16203.24926.483126.523549@montanaro.dyndns.org> <022b01c36bd9$48bc96a0$f502a8c0@eden> Message-ID: <16203.30589.46546.16580@montanaro.dyndns.org> Mark> The passage you quoted doesn't rule out the possibility that this Mark> error could occur even if only one thread is writing, but another Mark> is reading. If that is a problem, then yes, we would hit it :) I'll check with Sleepycat, but it seems to me that the most expedient course would be to acquire a lock around database accesses. Skip From skip at pobox.com Tue Aug 26 11:09:20 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Aug 26 11:09:36 2003 Subject: [spambayes-dev] Re: [Spambayes] fatal error? In-Reply-To: <1061907454.23837.13.camel@yyz> References: <022b01c36bd9$48bc96a0$f502a8c0@eden> <1061907454.23837.13.camel@yyz> Message-ID: <16203.30752.500092.713526@montanaro.dyndns.org> Barry> p.s. at one point there was talk of a Usenet group for BerkeleyDB Barry> programmers. I sure wish that existed. I didn't know Sleepycat had a time machine: http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&group=comp.databases.berkeley-db Have they been talking to Guido? Skip From tim.one at comcast.net Tue Aug 26 12:45:04 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Aug 26 11:46:45 2003 Subject: [spambayes-dev] Re: [Spambayes] fatal error? In-Reply-To: <16203.30589.46546.16580@montanaro.dyndns.org> Message-ID: [Skip] > I'll check with Sleepycat, but it seems to me that the most expedient > course would be to acquire a lock around database accesses. Brrrr. Running a Berkeley backend is already soooooo much slower than running from a dict. I didn't really notice that until the SoBig worm turds starting swamping my inbox, but after a few days of that I switched back to using a pickled dict. Adding a lock around each stinkin' access is a good way to soak up excess cycles, anyway . 
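[Archive note] A minimal sketch of the "lock around database accesses" idea that Skip and Barry describe above, assuming a dict-like store shared between threads. The LockedStore class and the hammer() helper are hypothetical names for illustration only; they are not part of SpamBayes or bsddb, a plain dict stands in for the Berkeley-backed mapping, and the code uses modern Python rather than the 2.2/2.3 of this thread.

```python
import threading


class LockedStore:
    """Hypothetical wrapper that serializes every access to a dict-like store."""

    def __init__(self, store):
        self._store = store
        self._lock = threading.Lock()  # one application-level lock for all access

    def __getitem__(self, key):
        with self._lock:
            return self._store[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._store[key] = value

    def increment(self, key, amount=1):
        # The read-modify-write happens under a single lock hold,
        # so concurrent updates cannot interleave.
        with self._lock:
            self._store[key] = self._store.get(key, 0) + amount
            return self._store[key]


def hammer(store, key, times):
    """Bump one counter repeatedly, standing in for training updates."""
    for _ in range(times):
        store.increment(key)


db = LockedStore({})  # a plain dict stands in for the real database object
workers = [threading.Thread(target=hammer, args=(db, "viagra", 1000))
           for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(db["viagra"])  # prints 4000
```

Taking the lock on every access is the cost Tim objects to; holding it once around a whole batch of reads or writes would reduce that overhead at the price of coarser concurrency.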
From skip at pobox.com Tue Aug 26 13:32:38 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Aug 26 13:38:01 2003 Subject: [spambayes-dev] Re: [Spambayes] fatal error? In-Reply-To: References: <16203.30589.46546.16580@montanaro.dyndns.org> Message-ID: <16203.39350.978784.935238@montanaro.dyndns.org> >> I'll check with Sleepycat, but it seems to me that the most expedient >> course would be to acquire a lock around database accesses. Tim> Brrrr. Running a Berkeley backend is already soooooo much slower Tim> than running from a dict. I didn't really notice that until the Tim> SoBig worm turds starting swamping my inbox, but after a few days Tim> of that I switched back to using a pickled dict. Adding a lock Tim> around each stinkin' access is a good way to soak up excess cycles, Tim> anyway . I suspect that the Outlook plugin simply makes it easier to find problems (more users, more worm mail, more concurrent threads, whatever). I think the same (or a similar) problem would exist were two instances of hammiefilter running at the same time, both trying to update the file. I'm just fortunate enough to have never encountered that problem. Even using a pickle, you really ought to use some sort of lock protocol when reading or writing the pickle file if there's any chance of concurrent access by another process or thread. That you only read it at the beginning and write it at the end only limits the opportunity for collision. I just (re)ran a little experiment. (I'm sure we've done this in the past.) I took my current hammie.db (153685 keys, no hapaxes, the result of processing 11,000+ hams and 8,000+ spams) and converted it to a pickle using dbExpImp. 
Startup time is dramatically different: % time python -c 'import pickle ; db = pickle.load(open("hammie.pck"))' real 0m32.193s user 0m22.850s sys 0m0.430s % time python -c 'import cPickle ; db = cPickle.load(open("hammie.pck"))' real 0m5.650s user 0m3.720s sys 0m0.350s % time python -c 'import shelve ; db = shelve.open("hammie.db")' real 0m0.155s user 0m0.050s sys 0m0.050s This is not to imply that my huge database is typical or that my usage of hammiefilter is either. Using pickles for moderately sized training databases would probably work, regardless of the application. With long-running SB apps like the Outlook plugin or pop3proxy, pickles are probably the way to go. (Maybe it's time to give up on hammiefilter altogether.) Skip From skip at pobox.com Tue Aug 26 15:21:17 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Aug 26 15:21:29 2003 Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER Message-ID: <16203.45869.990264.444596@montanaro.dyndns.org> Dave Segleau at Sleepycat confirmed for me that making Windows binaries of the Sleepycat db_* utilities available from the SpamBayes website would be okay. Who can take the time to make them available? We would need to make sure that db_recover will actually fix Mark Tabash's (and others?) database file. He also said that running db_recover isn't the correct solution. The correct way to do this is to create an environment object with the DB_RECOVER flag set. That's not compatible with the anydbm interface however. I see a few possible solutions: * Special case the situation where whichdb.whichdb() returns "dbhash" and make direct calls to the relevant bsddb package functions to create a db object which is resilient in a multi-threaded environment. This might be done either using Python's lock facilities or using Sleepycat's environment locks. 
* Modify the behavior of bsddb.hashopen() (and cousins) so that it creates a DBEnv object with the DB_RECOVER flag and passes it to the DB() constructor: def hashopen(....): flags = bsddb._checkflag(flag) d = bsddb.db.DB(bsddb.db.DBEnv(bsddb.db.DB_RECOVER)) ... bsddb.hashopen = hashopen * Provide locks around all database file accesses. Skip From richie at entrian.com Tue Aug 26 21:38:42 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Aug 26 15:38:48 2003 Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER In-Reply-To: <16203.45869.990264.444596@montanaro.dyndns.org> References: <16203.45869.990264.444596@montanaro.dyndns.org> Message-ID: [Skip] > Dave Segleau at Sleepycat confirmed for me that making Windows binaries of > the Sleepycat db_* utilities available from the SpamBayes website would be > okay. Who can take the time to make them available? I already have, for db_recover: http://entrian.com/db_recover.zip Let me know if you need any more and I'll put them up. > We would need to make sure that db_recover will actually fix Mark > Tabash's (and others?) database file. That's the problem. I couldn't make head nor tail of how to use it to recover a Spambayes database - it expects the database to be a directory (full of files under bsddb's control) rather than a single file. I assume this is what Sleepycat mean by an "environment". For what it's worth, I'm not 100% convinced that what we have is a threading problem. I keep getting a corrupt spambayes.messageinfo.db, and I'm pretty sure that's only ever accessed by one thread. I even added debug statements to print the thread ID, and I only ever saw access from one thread. 
-- Richie Hindle richie@entrian.com From richie at entrian.com Tue Aug 26 22:43:21 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Aug 26 16:43:33 2003 Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER In-Reply-To: <16203.45869.990264.444596@montanaro.dyndns.org> References: <16203.45869.990264.444596@montanaro.dyndns.org> Message-ID: <5cgnkv8f9vdd16l2e6ovm18da8lot8r7ob@4ax.com> [Skip] > I see a few possible solutions: > > * [...] * Use a different embedded database? PySQLite? It's just as easy to install (on Windows at least) as pybsddb for Python 2.2, and although I've never used it, I've heard good things about it. Does anyone here have any experience with it? -- Richie Hindle richie@entrian.com From skip at pobox.com Tue Aug 26 17:48:01 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Aug 26 17:48:14 2003 Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER In-Reply-To: <5cgnkv8f9vdd16l2e6ovm18da8lot8r7ob@4ax.com> References: <16203.45869.990264.444596@montanaro.dyndns.org> <5cgnkv8f9vdd16l2e6ovm18da8lot8r7ob@4ax.com> Message-ID: <16203.54673.552408.583277@montanaro.dyndns.org> >> I see a few possible solutions: >> >> * [...] Richie> * Use a different embedded database? PySQLite? It's just as Richie> easy to install (on Windows at least) as pybsddb for Python Richie> 2.2, and although I've never used it, I've heard good things Richie> about it. Does anyone here have any experience with it? I have none. I briefly played around with PostgreSQL and found it much slower than the anydbm-based storage. That might just have been because I am not a very sophisticated SQL programmer. Isn't SQLite supposed to be an embedded SQL engine? If so, where's the database and how is it shared across (for example) two instances of hammiefilter? I think the cleanest way to do this would be to run a server which simply fronts a pickle. All apps would talk to it for reading and updating the info. 
You run into performance problems with network overhead and it makes deploying all applications that much more complex. Skip From romain.guy at jext.org Wed Aug 27 02:31:06 2003 From: romain.guy at jext.org (Romain GUY) Date: Tue Aug 26 19:35:37 2003 Subject: [spambayes-dev] Outlook express abandon Message-ID: <20038271316.824848@Thinthalion> I don't know if this news reached the mailing list but anyway. Microsoft finally announced they won't drop Outlook Express development and support: http://news.zdnet.co.uk/software/applications/0,39020384,39115720,00.htm -- Romain GUY romain.guy@jext.org http://www.jext.org http://progx.jext.org From T.A.Meyer at massey.ac.nz Wed Aug 27 12:36:06 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Aug 26 19:36:57 2003 Subject: [spambayes-dev] RE: [Spambayes] FAQ Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308BC01@its-xchg4.massey.ac.nz> > I changed the footer for both the digest and non-digest > versions of the list to include an admonition that people > check the FAQ before posting questions. This is a check that > I didn't screw it up somehow. Good idea. We should probably also update reply.txt since there aren't the problems with the Outlook plug-in that there were when it was created, and we could make the mention of the FAQ more prominent. From memory, Skip, Tim or Barry have to do this, right? =Tony Meyer From tim.one at comcast.net Tue Aug 26 22:58:07 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Aug 26 22:05:49 2003 Subject: [spambayes-dev] Re: [Spambayes] fatal error? In-Reply-To: <16203.39350.978784.935238@montanaro.dyndns.org> Message-ID: [Skip] > I suspect that the Outlook plugin simply makes it easier to find > problems (more users, more worm mail, more concurrent threads, > whatever). Is that relevant? I've never seen a database corruption complaint from someone using the Outlook addin (did I miss one?), and I deliberately switched my 3 classifiers to Berkeley in order to try to provoke one.
No luck. IIRC, Mark has never seen this either. The first message in this thread: http://mail.python.org/pipermail/spambayes-dev/2003-August/000873.html was copied to spambayes-dev from some other source, and was missing sufficient context to tell what it was talking about. Trying to track the source down probably leads to here: http://mail.python.org/pipermail/spambayes/2003-August/007311.html If so, the OP was running on Windows, but was almost certainly not using the Outlook addin: Now I'm getting an error message in the email my headers: X-Spambayes-Exception: bsddb._DBRunRecoveryError ((-30982, 'DB_RUNRECOVERY: Fatal error, run database recovery -- fatal region error detected; run recovery')) in __getitem__() at C:\PTYTHON23\lib\bsddb\__init.py line 86: return self.db[key] The Outlook addin never inserts email headers, so I don't believe that fellow's problem had anything to do with the addin. > I think the same (or a similar) problem would exist were two > instances of hammiefilter running at the same time, both trying > to update the file. I'm just fortunate enough to have never > encountered that problem. Even using a pickle, you really ought to > use some sort of lock protocol when reading or writing the pickle > file if there's any chance of concurrent access by another process or > thread. That you only read it at the beginning and write it at the > end only limits the opportunity for collision. Python dicts are safe for multiple-reader single-writer access without explicit synchronization, and per-access locks are so bloody expensive that I don't want to change anything in the absence of proof that there's a problem that can't be wormed around more cheaply. To date, I don't believe we've seen any report of corruption via the Outlook addin, which suggests it's doing something right . > I just (re)ran a little experiment. (I'm sure we've done this in the > past.) 
I took my current hammie.db (153685 keys, no hapaxes, the > result of processing 11,000+ hams and 8,000+ spams) and converted it > to a pickle using dbExpImp. Startup time is dramatically different: Of course. > % time python -c 'import pickle ; db = > pickle.load(open("hammie.pck"))' > > real 0m32.193s > user 0m22.850s > sys 0m0.430s > % time python -c 'import cPickle ; db = > cPickle.load(open("hammie.pck"))' > > real 0m5.650s > user 0m3.720s > sys 0m0.350s > % time python -c 'import shelve ; db = shelve.open("hammie.db")' > > real 0m0.155s > user 0m0.050s > sys 0m0.050s > > This is not to imply that my huge database is typical or that my > usage of hammiefilter is either. Using pickles for moderately sized > training databases would probably work, regardless of the > application. With long-running SB apps like the Outlook plugin or > pop3proxy, pickles are probably the way to go. (Maybe it's time to > give up on hammiefilter altogether.) I don't know about hammiefilter (haven't used it). I'll remind that the original spambayes design was done with the expectation that the "big dict" would eventually be replaced by a BTree stored in ZODB. That's still a nearly perfect database for spambayes, although only Jeremy pursued it (I continue to feel guilty about it, though ). From T.A.Meyer at massey.ac.nz Wed Aug 27 15:23:49 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Aug 26 22:29:33 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308BD02@its-xchg4.massey.ac.nz> > Maybe we should go into feature freeze after 1.0a5 is > released, so we can focus on bug fixes. What about we combine this with Richie's suggestion: * We release 1.0a5 any time now. * We rename the scripts and move them into the scripts directory and cut the backwards compatibility code from the options and immediately release 1.0a6. * We go into feature freeze for a while and then release 1.0b1. 
(All this excludes the Outlook2000 directory, of course ;) =Tony Meyer From anthony at interlink.com.au Wed Aug 27 15:15:50 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Aug 27 00:16:12 2003 Subject: [spambayes-dev] Formatting bugs in the auto-responder message In-Reply-To: Message-ID: <200308270415.h7R4FoWC022078@localhost.localdomain> A couple of underlines are horked in the auto-responder message: >>> spambayes-bounces@python.org wrote > READ THIS! (If you want help.) > > > What is Spambayes? ------------------ > > > I found a bug. -------------- > -- Anthony Baxter It's never too late to have a happy childhood. From ta-meyer at ihug.co.nz Wed Aug 27 20:19:32 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Wed Aug 27 03:20:09 2003 Subject: [spambayes-dev] Sourceforge's up-to-24 hour delay Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AD84@its-xchg4.massey.ac.nz> Although the last newsletter I got from them said that the anonymous cvs server would be back to full speed real soon now, it's still taking a while. This has been fairly annoying on a few occasions recently, when trying to get people to test out bug fixes. Is there some way that (until sourceforge recovers) we could get the script that processes the checkins (the one that sends the email) to also create an up-to-date tarball and put it online somewhere? (spambayes.org/downloads/currentcvs.tgz or something). If this is a reasonable idea, would someone be able to put it together? I presume for this sort of thing you need to be an admin, rather than just a developer. 
=Tony Meyer From skip at pobox.com Wed Aug 27 10:35:15 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Aug 27 10:35:43 2003 Subject: [spambayes-dev] RE: [Spambayes] FAQ In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130308BC01@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130308BC01@its-xchg4.massey.ac.nz> Message-ID: <16204.49571.34410.320571@montanaro.dyndns.org> Tony> We should probably also update reply.txt since there aren't the Tony> problems with the Outlook plug-in that there was when it was Tony> created, and we could make the mention of the FAQ more prominent. Tony> From memory, Skip, Tim or Barry have to do this, right? Yes. Anyone with CVS update privileges can check in changes to the reply.txt file in the website top level dir then let one of us know about it. I'm in the midst of a protracted house move (moving out of the old house before the new one is ready - great fun), so I'm suffering from low availability during off-hours at the moment, but if you drop me a note I'll try and adjust the auto-response text in Mailman. Skip From skip at pobox.com Wed Aug 27 10:45:07 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Aug 27 10:45:26 2003 Subject: [spambayes-dev] Re: [Spambayes] fatal error? In-Reply-To: References: <16203.39350.978784.935238@montanaro.dyndns.org> Message-ID: <16204.50163.430449.976434@montanaro.dyndns.org> Tim> [Skip] >> I suspect that the Outlook plugin simply makes it easier to find >> problems (more users, more worm mail, more concurrent threads, >> whatever). Tim> Is that relevant? I've never seen a database corruption complaint Tim> from someone using the Outlook addin (did I miss one?), and I Tim> deliberately switched my 3 classifiers to Berkeley in order to try Tim> to provoke one. No luck. IIRC, Mark has never seen this either. I guess I was mistaken. Sorry about that. 
Tim> If so, the OP was running on Windows, but was almost certainly not Tim> using the Outlook addin: Tim> Now I'm getting an error message in the email my Tim> headers: X-Spambayes-Exception: bsddb._DBRunRecoveryError Tim> ((-30982, 'DB_RUNRECOVERY: Fatal error, run database recovery -- Tim> fatal region error detected; run recovery')) in __getitem__() at Tim> C:\PTYTHON23\lib\bsddb\__init.py line 86: return self.db[key] Tim> The Outlook addin never inserts email headers, so I don't believe Tim> that fellow's problem had anything to do with the addin. I have this bad habit of jumping to the conclusion that the user was running the Outlook plugin if a traceback is posted which includes "C:\...". This would have then been an error in pop3proxy I guess. >> I think the same (or a similar) problem would exist were two >> instances of hammiefilter running at the same time, both trying to >> update the file. I'm just fortunate enough to have never encountered >> that problem. Even using a pickle, you really ought to use some sort >> of lock protocol when reading or writing the pickle file if there's >> any chance of concurrent access by another process or thread. That >> you only read it at the beginning and write it at the end only limits >> the opportunity for collision. Tim> Python dicts are safe for multiple-reader single-writer access Tim> without explicit synchronization, and per-access locks are so Tim> bloody expensive that I don't want to change anything in the Tim> absence of proof that there's a problem that can't be wormed around Tim> more cheaply. To date, I don't believe we've seen any report of Tim> corruption via the Outlook addin, which suggests it's doing Tim> something right . Skip> ... Startup time is dramatically different: Tim> Of course. [ times elided ] >> This is not to imply that my huge database is typical or that my >> usage of hammiefilter is either. Tim> I don't know about hammiefilter (haven't used it). 
My only reason for referring to hammiefilter is that its runtime is dominated by startup and shutdown costs, since all it does is train on or score a single message. That makes the pickle/dict solution painfully slow. Were it not for the presence of one-shot apps like hammiefilter, we could probably just use a pickle for storage and be done with it. Skip From skip at pobox.com Wed Aug 27 10:46:18 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Aug 27 10:46:41 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130308BD02@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130308BD02@its-xchg4.massey.ac.nz> Message-ID: <16204.50234.942808.115814@montanaro.dyndns.org> Tony> What about we combine this with Richie's suggestion: ... Fine with me. I won't really be able to contribute anything for the next week or two I don't think. Skip From skip at pobox.com Wed Aug 27 10:53:11 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Aug 27 10:54:49 2003 Subject: [spambayes-dev] Formatting bugs in the auto-responder message In-Reply-To: <200308270415.h7R4FoWC022078@localhost.localdomain> References: <200308270415.h7R4FoWC022078@localhost.localdomain> Message-ID: <16204.50647.615545.604379@montanaro.dyndns.org> Anthony> A couple of underlines are horked in the auto-responder message: Alas, it appears to be a Mailman problem. This URL http://spambayes.sf.net/reply.txt is simply pasted into the auto-response text field of the Mailman config stuff. It looks okay in the text area. Skip From vanhorn at whidbey.com Wed Aug 27 09:09:56 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Wed Aug 27 11:10:16 2003 Subject: [spambayes-dev] Re: [Spambayes] fatal error? 
References: <16203.39350.978784.935238@montanaro.dyndns.org> <16204.50163.430449.976434@montanaro.dyndns.org> Message-ID: <3F4CC9C4.CDF7813@whidbey.com> I didn't follow the start of this closely, but I understand the conclusion you have been jumping to, even though I personally run pop3proxy on a couple of Windows machines. That C:\... reference is sort of a giveaway. But what's that right after, they're using pTython23 as the directory? I suspect a typo, one that Windows should be able to find with a search for files containing the string "ptython". Van Skip Montanaro wrote: > Tim> [Skip] > >> I suspect that the Outlook plugin simply makes it easier to find > >> problems (more users, more worm mail, more concurrent threads, > >> whatever). > > Tim> Is that relevant? I've never seen a database corruption complaint > Tim> from someone using the Outlook addin (did I miss one?), and I > Tim> deliberately switched my 3 classifiers to Berkeley in order to try > Tim> to provoke one. No luck. IIRC, Mark has never seen this either. > > I guess I was mistaken. Sorry about that. > > Tim> If so, the OP was running on Windows, but was almost certainly not > Tim> using the Outlook addin: > > Tim> Now I'm getting an error message in the email my > Tim> headers: X-Spambayes-Exception: bsddb._DBRunRecoveryError > Tim> ((-30982, 'DB_RUNRECOVERY: Fatal error, run database recovery -- > Tim> fatal region error detected; run recovery')) in __getitem__() at > Tim> C:\PTYTHON23\lib\bsddb\__init.py line 86: return self.db[key] > > Tim> The Outlook addin never inserts email headers, so I don't believe > Tim> that fellow's problem had anything to do with the addin. > > I have this bad habit of jumping to the conclusion that the user was running > the Outlook plugin if a traceback is posted which includes "C:\...". This > would have then been an error in pop3proxy I guess. 
> > >> I think the same (or a similar) problem would exist were two > >> instances of hammiefilter running at the same time, both trying to > >> update the file. I'm just fortunate enough to have never encountered > >> that problem. Even using a pickle, you really ought to use some sort > >> of lock protocol when reading or writing the pickle file if there's > >> any chance of concurrent access by another process or thread. That > >> you only read it at the beginning and write it at the end only limits > >> the opportunity for collision. > > Tim> Python dicts are safe for multiple-reader single-writer access > Tim> without explicit synchronization, and per-access locks are so > Tim> bloody expensive that I don't want to change anything in the > Tim> absence of proof that there's a problem that can't be wormed around > Tim> more cheaply. To date, I don't believe we've seen any report of > Tim> corruption via the Outlook addin, which suggests it's doing > Tim> something right . > > Skip> ... Startup time is dramatically different: > > Tim> Of course. > > [ times elided ] > > >> This is not to imply that my huge database is typical or that my > >> usage of hammiefilter is either. > > Tim> I don't know about hammiefilter (haven't used it). > > My only reason for referring to hammiefilter is that its runtime is > dominated by startup and shutdown costs, since all it does is train on or > score a single message. That makes the pickle/dict solution painfully slow. > Were it not for the presence of one-shot apps like hammiefilter, we could > probably just use a pickle for storage and be done with it. > > Skip > > _______________________________________________ > spambayes-dev mailing list > spambayes-dev@python.org > http://mail.python.org/mailman/listinfo/spambayes-dev -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! 
mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From richie at entrian.com Wed Aug 27 23:49:15 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Aug 27 17:49:24 2003 Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER In-Reply-To: <16203.54673.552408.583277@montanaro.dyndns.org> References: <16203.45869.990264.444596@montanaro.dyndns.org> <5cgnkv8f9vdd16l2e6ovm18da8lot8r7ob@4ax.com> <16203.54673.552408.583277@montanaro.dyndns.org> Message-ID: <9q7qkv4dkj0vanhk8fbva9j61bnf4clsnn@4ax.com> [Skip] > Isn't SQLite supposed to be an embedded SQL engine? If so, where's the > database and how is it shared across (for example) two instances of > hammiefilter? The database is just a file. It supports multithreaded operation, and as far as I can tell at a quick glance that extends to multiprocess operation. You just use a different connection in each thread. Tim, or anyone who knows - is ZODB (without ZEO, which as I understand it is essentially what Skip suggests below) shareable across threads/processes? > I think the cleanest way to do this would be to run a server which simply > fronts a pickle. All apps would talk to it for reading and updating the > info. You run into performance problems with network overhead and it makes > deploying all applications that much more complex. This is something I've talked about in the past (but talk is cheap) and which Neale Pickett kind of implemented with hammiesrv. hammiesrv is a message-classifying XMLRPC server, whereas you seem to be proposing more of a database server, but the underlying idea is the same. I'd envisaged pop3proxy becoming a component of a generic spambayes server, which serves or proxies POP3, SMTP, HTML/HTTP, XMLRPC and any other protocols we need.
With the exception of XMLRPC, it's all that already (and it even has a non-human HTTP client in the shape of proxytee.py). But as you say, it's another installation/maintenance headache. Hmm, it seems talk is still cheap. 8-) -- Richie Hindle richie@entrian.com From mhammond at skippinet.com.au Thu Aug 28 21:48:32 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Aug 28 06:48:32 2003 Subject: [spambayes-dev] pop3proxy binaries Message-ID: <0ef101c36d51$ecf1e440$f502a8c0@eden> I pretty much have py2exe and SpamBayes working together. The new py2exe code I am helping with allows us to create a binary distribution for Windows with a .zip file containing *all* Python code, and an arbitrary number of executables which share this .zip for their Python library. Thus, each new .exe/.dll is <30k, meaning we can have as many as we like :) I have a .dll for Outlook, and a "windows" and a "service" exe for pop3proxy - and a single installation .exe that detects if Outlook is installed and does "the right thing" would be almost trivial. I expect to check in my scripts etc soon. However, this does have an impact on pop3proxy, in terms of "out of the box" setup. Off the top of my head: * We should ensure only 1 proxy is running on the machine (ie, prevent starting either the service or the .exe twice) * We should think about where the databases are stored (the "program files" directory where we install is probably not appropriate - but a "per user" database directory makes no sense for a .exe) * Consider a "start_pop3proxy" program that "does the right thing" depending on the platform and configuration. Eg, it could start the correct program (service if not running but installed, standard exe otherwise) and fire the browser to the config URL if it detects it is unconfigured etc. * other stuff :) FWIW, we could detect a "binary build" by checking if sys.frozen exists.
If it does, we can also assume the win32all extensions are available - eg, we could check for a single instance by using a global, named Mutex, etc. I am willing to help out significantly, but I am unable to "drive" anything, as I don't own it, and don't want to. Does this interest anyone enough to take it on with me? At-your-service ly, Mark. From kennypitt at hotmail.com Thu Aug 28 10:48:12 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Thu Aug 28 09:49:16 2003 Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER In-Reply-To: <9q7qkv4dkj0vanhk8fbva9j61bnf4clsnn@4ax.com> References: <16203.45869.990264.444596@montanaro.dyndns.org> <5cgnkv8f9vdd16l2e6ovm18da8lot8r7ob@4ax.com> <16203.54673.552408.583277@montanaro.dyndns.org> <9q7qkv4dkj0vanhk8fbva9j61bnf4clsnn@4ax.com> Message-ID: <3F4E081C.6020701@hotmail.com> Richie Hindle wrote: > [Skip] > ... >>I think the cleanest way to do this would be to run a server which simply >>fronts a pickle. All apps would talk to it for reading and updating the >>info. You run into performance problems with network overhead and it makes >>deploying all applications that much more complex. > > > This is something I've talked about in the past (but talk is cheap) and > which Neale Pickett kind of implemented with hammiesrv. hammiesrv is a > message-classifying XMLRPC server, whereas you seems to proposing more of > a database server, but the underlying idea is the same. I'd envisaged > pop3proxy becoming a component of a generic spambayes server, which serves > or proxies POP3, SMTP, HTML/HTTP, XMLRPC and any other protocols we need. > With the exception of XMLRPC, it's all that already (and it even has a > non-human HTTP client in the shape of proxytee.py). Sounds good in theory and in some cases would probably work quite well, but maybe not in all. As an example, pop3proxy is basically machine-specific instead of user-specific, particularly if it is running as a Windows service. 
If I'm not mistaken, it uses only one database regardless of which user is logged in and making requests through it, so a training data server would serve the same purpose. On the other hand, one of the wonderful things about the Outlook plugin is that it stores training data on a per-user basis. It seems like handling user-specific data on a centralized training server would make things much more complicated. -- Kenny Pitt From richie at entrian.com Thu Aug 28 23:12:36 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Aug 28 17:12:43 2003 Subject: [spambayes-dev] pop3proxy binaries In-Reply-To: <0ef101c36d51$ecf1e440$f502a8c0@eden> References: <0ef101c36d51$ecf1e440$f502a8c0@eden> Message-ID: [Mark] > Thus, each new .exe/.dll is <30k, meaning we can have as many as we like :) > I have a .dll for Outlook, and a "windows" and a "service" exe for pop3proxy > - and single installation .exe that detects if Outlook is installed and does > "the right thing" would be almost trivial. Very cool! > * We should ensure only 1 proxy is running on the machine (ie, prevent > starting either the service or the .exe twice) It's not quite as simple as that - there's no reason you can't run multiple POP3 proxies on different ports, either as multiple listening sockets under the same server, or as different processes with different databases. Whether that's relevant for a binary release I don't know - perhaps the binary release should be simplified. We should note that Windows doesn't always give you a bind() failure when you try to bind() to a port that's already bound - you're probably right about needing a mutex or similar. > * We should think about where the databases are stored (the "program files" > directory where we install is probably not appropriate - but a "per user" > database directory makes no sense for a .exe SHGetFolderPath(CSIDL_APPDATA)? > * Consider a "start_pop3proxy" program that "does the right thing" depending > on the platform and configuration. 
Eg, it could start the correct program > (service if not running but installed, standard exe otherwise) and fire the > browser to the config URL if it detects it is unconfigured etc. The way I've envisaged this working (in an ideal world, because it would probably be a significant effort) is that there's one executable which gives you several options when you first run it: o install and run as a service (on systems that support it) o on 9x, install in RunServices o add the web UI homepage to your IE bookmarks o install a tray icon with a simple 'stop/start/launch UI' menu o auto-configure OE to point to the proxy, and the proxy to point to OE's configured POP3 server(s) (Tony has talked about this before). I'd also love to see a traditional 'Windows executable' wrapper for the UI - just a wrapper that hosts the web UI in an embedded IE, and would run in a separate process, or better, a separate (and totally independent) thread in the pop3proxy process. Or even just a .HTA - provided the server is running, that would be enough. I worry that the fact that the UI only ever appears in a browser will prove too weird for people used to traditional Windows programs. > I am willing to help out significantly, but I am unable to "drive" anything, > as I don't own it, and don't want to. Does this interest anyone enough to > take it on with me? I would love to, but I'm just too pushed for time right now to be in charge of this. I'm going to try to make the service-stopping code more safe this weekend, but I don't have enough time to take this whole thing on as the main developer. I'm happy to act as the resident pop3proxy 'expert', but (like you) I can't drive the project, much as I'd like to. 
-- Richie Hindle richie@entrian.com From T.A.Meyer at massey.ac.nz Fri Aug 29 12:17:41 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Aug 28 19:19:10 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C332@its-xchg4.massey.ac.nz> Tony> What about we combine this with Richie's suggestion: Tony> We release 1.0a5 any time now. Tony> * We rename the scripts and move them into the scripts directory and Tony> cut the backwards compatibility code from the options and immediately Tony> release 1.0a6. Tony> * We go into feature freeze for a while and then release 1.0b1. Skip> Fine with me. I won't really be able to contribute anything Skip> for the next week or two I don't think. Richie> That sounds like an excellent plan. Ok, looks like this is what we're going to do. Richie - the only thing that I have left before 1.0a5 is that last smtpproxy bug (well, it's the only one I'm bothered with). I'll also update the changelog, version.py and what's new file today. Do you want to package up 1.0a5 about this time tomorrow? We could aim to get 1.0a6 out four days later (it would be good to have a *little* testing ;), say the 04/09. (If you're too busy celebrating your getting-older-ness, I could do 1.0a6). =Tony Meyer From ta-meyer at ihug.co.nz Fri Aug 29 14:12:59 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Thu Aug 28 21:13:43 2003 Subject: [spambayes-dev] Bug messages Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212ADB3@its-xchg4.massey.ac.nz> Now this isn't important, but I'm curious :) Recently (I think), the messages to the bug list are missing a space. Specifically, the space after the second word of the (bug) subject. For example: [spambayes-bugs] [ spambayes-Feature Requests-791246 ] IMAP: keepnew messages unread Summary: IMAP: keep new messages unread Is this some weird sourceforge thing, or something we have setup wrong? (Or somehow something I've got wrong? 
;) =Tony Meyer From T.A.Meyer at massey.ac.nz Fri Aug 29 15:10:30 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Aug 28 22:11:43 2003 Subject: [spambayes-dev] pop3proxy binaries Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C403@its-xchg4.massey.ac.nz> > I pretty much have py2exe and SpamBayes working together. ~~~~~~~~~~~ :) > I expect to check in my scripts etc soon. Will these work for anyone else? (i.e. do you have a magic version of py2exe that you and Thomas are working on?) > * We should think about where the databases are stored (the > "program files" directory where we install is probably not > appropriate - but a "per user" database directory makes no > sense for an .exe Wouldn't the same setup as the Outlook plug-in make sense? Once a location is decided I'm happy to write a script that sets the appropriate options to the correct values. > * Consider a "start_pop3proxy" program [...] It shouldn't be that hard to put this together, so I can do this. > I am willing to help out significantly, but I am unable to > "drive" anything, as I don't own it, and don't want to. > > Does this interest anyone enough to take it on with me? I get the terrible feeling that this will end up on the quotes page under you taking on the Outlook plugin , but since Richie's too busy, my hand is up. I don't know pop3proxy as well as Richie, but as long as he keeps going on being the expert , then I'm happy to take the distribution part on. Although I don't use pop3proxy a huge amount myself, it is the one that I've installed on family machines (including one at home)... 
Tell me what I need to do :) =Tony Meyer From jeremy at alum.mit.edu Thu Aug 28 23:48:24 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu Aug 28 22:49:31 2003 Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER In-Reply-To: <9q7qkv4dkj0vanhk8fbva9j61bnf4clsnn@4ax.com> References: <16203.45869.990264.444596@montanaro.dyndns.org> <5cgnkv8f9vdd16l2e6ovm18da8lot8r7ob@4ax.com> <16203.54673.552408.583277@montanaro.dyndns.org> <9q7qkv4dkj0vanhk8fbva9j61bnf4clsnn@4ax.com> Message-ID: <1062125303.13897.361.camel@localhost.localdomain> On Wed, 2003-08-27 at 17:49, Richie Hindle wrote: > Tim, or anyone who knows - is ZODB (without ZEO, which as I understand it > is essentially what Skip suggests below) shareable across > threads/processes? Yes, both. All storages can be shared by multiple threads in a single process. To share a storage among multiple processes, you must use ZEO. Skip's suggestion below is a bit vague, so I'm not sure why (or why not) you would think that's what ZEO is. If you want to share a database across processes and/or machines, you've got to have some kind of IPC. ZEO uses sockets to share access to a single storage. It is indeed harder to run a ZEO server than it is to run a single database. You've got to worry about whether the server process is running in addition to the client process. You've got two applications to configure. There are more performance issues to think about. On the other hand, ZEO is widely used in the Zope community. A lot of the issues have been worked out. Jeremy From richie at entrian.com Fri Aug 29 07:27:21 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Aug 29 01:27:28 2003 Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme] In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130308C332@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130308C332@its-xchg4.massey.ac.nz> Message-ID: [Tony] > * We release 1.0a5 any time now.
> * We rename the scripts and move them into the scripts directory and > cut the backwards compatibility code from the options and immediately > release 1.0a6. > * We go into feature freeze for a while and then release 1.0b1. [Tony again, after some discussion] > Ok, looks like this is what we're going to do. Richie - the only thing > that I have left before 1.0a5 is that last smtpproxy bug (well, it's the > only one I'm bothered with). I'll also update the changelog, version.py > and what's new file today. Do you want to package up 1.0a5 about this > time tomorrow? We could aim to get 1.0a6 out four days later (it would > be good to have a *little* testing ;), say the 04/09. (If you're too > busy celebrating your getting-older-ness, I could do 1.0a6). I need to improve the pop3proxy service shutdown code, which I may not be able to do until Saturday, but yes, I'll package 1.0a5 as soon as it's done. Any more for any more? -- Richie Hindle richie@entrian.com From T.A.Meyer at massey.ac.nz Fri Aug 29 16:47:46 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Aug 29 01:43:14 2003 Subject: [spambayes-dev] pop3proxy binaries Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C496@its-xchg4.massey.ac.nz> [Mark] > a single installation .exe that detects if Outlook is > installed and does "the right thing" would be almost trivial. Of course it should also check if the user is using pop3 or imap, and do the right thing then, too ;) [Richie] > It's not quite as simple as that - there's no reason you can't run > multiple POP3 proxies on different ports, either as multiple listening > sockets under the same server, or as different processes with > different databases. Whether that's relevant for a binary release I > don't know - perhaps the binary release should be simplified. I think we could make this a restriction of the binary release. If people want to do something more esoteric, then they can get Python & the source. 
> o on 9x, install in RunServices What's RunServices? (I never really used 9x much). > o install a tray icon with a simple 'stop/start/launch UI' menu I have a basic one of these made already (it's pretty simple to make based on the demo that comes with the win32 extensions). I can check it in if you think it's worth having. > o auto-configure OE to point to the proxy, and the proxy to > point to OE's configured POP3 server(s) (Tony has talked about > this before). I have indeed, and always put it off because it's such a major effort. Romain's OE module should make this easier, though (although it only reads folder/message data at the moment, it's a start). OTOH, doing this sort of auto-conf for Eudora/Mozilla mail will be a piece of cake. > I'd also love to see a traditional 'Windows executable' > wrapper for the UI [...] > Or even just a .HTA - provided the > server is running, that would be enough. To create a .hta, all we have to do is save a copy of the page with the .hta extension (into a temp file), and then execute the temp file, right? (plus fill any details in the hta tag that we care about). This doesn't sound that difficult. > I worry that the fact that the UI only > ever appears in a browser will prove too weird for people used to > traditional Windows programs. Me, too, although I'm not sure if a .HTA would be that much more reassuring. I think if we really want to reassure them, then we could build on the tray app, to the extreme of having a tabbed dialog pop up (hmm, where can we find code for that? ) to do the config on. Then the web ui (as nice as it is!) is just for those who don't like tray apps, and non Windows users. (Who are clever enough to use the web ui and not get freaked ). 
=Tony Meyer From mhammond at skippinet.com.au Fri Aug 29 23:18:14 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Aug 29 08:19:26 2003 Subject: [spambayes-dev] pop3proxy binaries In-Reply-To: Message-ID: <044201c36e27$9f7c28c0$f502a8c0@eden> > > * We should ensure only 1 proxy is running on the machine > (ie, prevent > > starting either the service or the .exe twice) > > It's not quite as simple as that Indeed. > Whether that's relevant for a binary release I > don't know - > perhaps the binary release should be simplified. I vote we go with that option. However, my "simplified" isn't > > * We should think about where the databases are stored (the > > "program files" directory where we install is probably not > > appropriate - but a "per user" database directory makes no > > sense for a .exe Doh - I meant "for a service". > > SHGetFolderPath(CSIDL_APPDATA)? Unfortunately, this is "per user". For a service logged on as the "system" user, this would be a problem (as it would mean a "default service" and a standard .exe would have different directories). How about this for a first cut: * A service installed, but configured for the system user is considered "unconfigured", and will refuse to start. * A mutex named something like "SpamBayes\{username}" is always created - service and .exe. GetCurrentUser() is used to create the mutex name, and SHGetFolderPath(CSIDL_APPDATA) is used. * The "bootstrap" executable is always the "tray icon" program. If the mutex is already set, then it simply offers to launch the UI and whatever else we feel necessary. If the mutex is not set, it runs the proxy in process. If this process detects the service running, it could present the exact same UI as if the proxy was running in-process - except it would control the service instead of running the proxy.
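[Editorial sketch] The per-user mutex check in the first cut above could look roughly like the following, assuming Python with the pywin32 (win32all) extensions. The function names, the "SpamBayes_" prefix and the use of getpass.getuser() are assumptions made for illustration; nothing here is taken from the SpamBayes source. The mail suggests a name like "SpamBayes\{username}", but Win32 object names only allow a backslash as part of a "Global\" or "Local\" namespace prefix, so the sketch substitutes an underscore.

```python
import getpass

# Hypothetical prefix. The mail proposes "SpamBayes\{username}", but a
# backslash is only legal in a Win32 object name as part of a "Global\"
# or "Local\" namespace prefix, so an underscore is used here instead.
MUTEX_PREFIX = "SpamBayes_"


def mutex_name(username=None):
    """Return the per-user mutex name for the given (or current) user."""
    if username is None:
        # Assumption: getpass.getuser() stands in for the Win32 user lookup.
        username = getpass.getuser()
    # A "DOMAIN\user" style name would reintroduce a backslash; replace it.
    return MUTEX_PREFIX + username.replace("\\", "_")


def claim_single_instance(name):
    """Create (or open) the named mutex. Windows with pywin32 only.

    Returns (handle, already_running). The caller must keep the handle
    alive for the life of the process; closing it gives the name up.
    """
    # Imported here so the module still loads where pywin32 is missing.
    import win32api
    import win32event
    import winerror

    handle = win32event.CreateMutex(None, False, name)
    # CreateMutex succeeds even if the mutex exists; the last-error code
    # is what tells us another process got there first.
    already_running = win32api.GetLastError() == winerror.ERROR_ALREADY_EXISTS
    return handle, already_running
```

Under this sketch both the service and the tray program would call claim_single_instance(mutex_name()) at startup. If already_running is true, the tray program would only offer to launch the UI; otherwise it would run the proxy in-process.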
However, it doesn't sound like being a tray icon is compatible with "RunServices" - but then again CSIDL_APPDATA doesn't either - so maybe we just stick to being a "normal" tray icon process on 9x? Then for later versions someone else figures out what in that doesn't work :) > > * Consider a "start_pop3proxy" program that "does the right > thing" depending > > on the platform and configuration. Eg, it could start the > correct program > > (service if not running but installed, standard exe > otherwise) and fire the > > browser to the config URL if it detects it is unconfigured etc. > > The way I've envisaged this working (in an ideal world, > because it would > probably be a significant effort) is that there's one executable which > gives you several options when you first run it: I see no reason this needs to be one executable. Each new exe under this py2exe scheme is <30k, so we should be able to develop a "driver" program that detects the environment, and delegates to the correct "sub-exe". > I'd also love to see a traditional 'Windows executable' > wrapper for the UI > - just a wrapper that hosts the web UI in an embedded IE, and > would run in > a separate process, Aww shucks, I could throw a Pythonwin based one of them together :) At the cost of around 1MB in the installer I just managed to remove (MFC) . The Outlook dialog/wizard infrastructure works almost exclusively with "OptionClass" objects - so a stand-alone Wizard that configured your options would not be impossible. Finding the time to do it is though > or better, a separate (and totally > independent) thread > in the pop3proxy process. That doesn't work for a service, but would work well for a tray-based icon hmmm - I think I will be able to come up with something :) And Tony said: > I have a basic one of these made already (it's pretty simple to make > based on the demo that comes with the win32 extensions). I can check it > in if you think it's worth having. Yes please.
In the "windows" directory along with pop3proxy_service.py. And I saw Tony ask about py2exe: It will be in the standard CVS version of py2exe - but the sandbox directory. You can download and configure this tree now (but remember to run setup.py from sandbox). None of the py2exe samples are likely to work, but a simple "standard" py2exe script, as per the docs, should (but there isn't one of them in the "samples" directory - only advanced ones.) I expect to check new code into that sandbox directory before I go to bed :) Mark. From mhammond at skippinet.com.au Fri Aug 29 23:23:09 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Aug 29 08:23:10 2003 Subject: [spambayes-dev] skippinet.com.au slightly constipated Message-ID: <044701c36e28$4fa9c860$f502a8c0@eden> My ISP has had to dedicate a server solely to handle the sobig traffic to skippinet.com.au. But I only got a small one . If you need to reach me and would like me to read it in the same week it is sent, you should use mhammond shift-2 keypoint.com.au. I've redirected most mailing list and sourceforge traffic directly to this address. Mark. From adam.walker at rbwconsulting.com Fri Aug 29 12:35:44 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Fri Aug 29 11:35:51 2003 Subject: [spambayes-dev] pop3proxy binaries In-Reply-To: <044201c36e27$9f7c28c0$f502a8c0@eden> Message-ID: <20030829153545.77A448627A@plunder.dreamhost.com> > > I'd also love to see a traditional 'Windows executable' > > wrapper for the UI > > - just a wrapper that hosts the web UI in an embedded IE, and > > would run in > > a separate process, > > Aww shucks, I could throw a Pythonwin based one of them together :) At > the > cost of around 1MB in the installer I just managed to remove (MFC) . > > The Outlook dialog/wizard infrastructure works almost exclusively with > "OptionClass" objects - so a stand-alone Wizard that configured your > options > would not be impossible.
Finding the time to do it is though > I'll volunteer to work on the tray program. I wrote a python script that needed the minimize to tray function, so I've already worked out how to do that part. --Adam From tim.one at comcast.net Fri Aug 29 12:44:24 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Aug 29 11:45:00 2003 Subject: [spambayes-dev] Bug messages In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212ADB3@its-xchg4.massey.ac.nz> Message-ID: [Tony Meyer] > Now this isn't important, but I'm curious :) > > Recently (I think), the messages to the bug list are missing a space. > Specifically, the space after the second word of the (bug) subject. > For example: > > [spambayes-bugs] [ spambayes-Feature Requests-791246 ] IMAP: keepnew > messages unread Summary: IMAP: keep new messages unread Heh. I'm not sure what this is an example of. Could you be very explicit about what it is in that two lines of stuff you're talking about? > Is this some weird sourceforge thing, or something we have setup > wrong? (Or somehow something I've got wrong? ;) No idea (neither about what's causing it, nor about what "it" is). From mhammond at keypoint.com.au Sat Aug 30 11:01:11 2003 From: mhammond at keypoint.com.au (Mark Hammond) Date: Fri Aug 29 20:01:40 2003 Subject: [spambayes-dev] Bug messages In-Reply-To: Message-ID: <059201c36e89$d31f3ac0$f502a8c0@eden> > [Tony Meyer] > > Now this isn't important, but I'm curious :) > > > > Recently (I think), the messages to the bug list are > missing a space. > > Specifically, the space after the second word of the (bug) subject. > > For example: > > > > [spambayes-bugs] [ spambayes-Feature Requests-791246 ] IMAP: keepnew > > messages unread Summary: IMAP: keep new messages unread > > Heh. I'm not sure what this is an example of. Could you be > very explicit > about what it is in that two lines of stuff you're talking about? 
The Subject line of the bug includes the string "IMAP: keepnew", but the bug summary itself contains the string "IMAP: keep new". Something is removing a single space character in the bug summary as it appears in the subject of the mail, but not in the body or anywhere else. I've pondered the same thing. Maybe it is a gravity issue, and that first space can't handle the world down under? Mark. From T.A.Meyer at massey.ac.nz Sat Aug 30 21:47:02 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sat Aug 30 04:47:50 2003 Subject: [spambayes-dev] pop3proxy binaries Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C51E@its-xchg4.massey.ac.nz> > How about this for a first cut: [details cut] +1 to all of that. Is writing this stuff part of the guidance you offered, or is it my job? :) > The Outlook dialog/wizard infrastructure works almost > exclusively with "OptionClass" objects - so a stand-alone > Wizard that configured your options would not be impossible. > Finding the time to do it is though I'm happy to do this (probably not for the first release, though), since I'm familiar with the OptionClass objects (or ought to be ;), and would like to be familiar with the Outlook dialog infrastructure. [checking in the simple pop3proxy tray thing that Tony made] > Yes please. In the "windows" directory along with > pop3proxy_service.py. Will do (that's where it was already :). =Tony Meyer From T.A.Meyer at massey.ac.nz Sat Aug 30 21:48:57 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sat Aug 30 04:49:36 2003 Subject: [spambayes-dev] pop3proxy binaries Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C51F@its-xchg4.massey.ac.nz> > I'll volunteer to work on the tray program. I wrote a python > script that needed the minimize to tray function, so I've > already worked out how to do that part. I have no huge desire to be doing this particular part, so feel free to rip apart the script that I'll check in shortly (windows/pop3proxy_tray.py). 
(I actually originally wrote it for the 'overkill' script (which I'm still tinkering with), and I'll keep on working on that version). =Tony Meyer From martin at v.loewis.de Sun Aug 31 01:41:54 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Sat Aug 30 20:42:13 2003 Subject: [spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question (orpossibly a bug report) In-Reply-To: <20030814023515.GO3095@async.com.br> References: <020901c35236$e5576f10$f502a8c0@eden> <20030814023515.GO3095@async.com.br> Message-ID: Christian Reis writes: > I don't understand this bit. You'd rather use an undocumented API > function than an open source, well-tested, properly licensed set of > functions? Precisely. I don't want to maintain any more floating-point code. Regards, Martin From tim.one at comcast.net Sat Aug 30 22:38:57 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Aug 30 21:39:33 2003 Subject: [spambayes-dev] Bug messages In-Reply-To: <059201c36e89$d31f3ac0$f502a8c0@eden> Message-ID: [Tony Meyer] >>> Now this isn't important, but I'm curious :) >>> >>> Recently (I think), the messages to the bug list are missing a >>> space. Specifically, the space after the second word of the (bug) >>> subject. For example: >>> >>> [spambayes-bugs] [ spambayes-Feature Requests-791246 ] IMAP: keepnew >>> messages unread Summary: IMAP: keep new messages unread [Tim] >> Heh. I'm not sure what this is an example of. Could you be very >> explicit about what it is in that two lines of stuff you're talking >> about? [Mark Hammond] > The Subject line of the bug includes the string "IMAP: keepnew", but > the bug summary itself contains the string "IMAP: keep new". > Something is removing a single space character in the bug summary as > it appears in the subject of the mail, but not in the body or > anywhere else. > > I've pondered the same thing. Maybe it is a gravity issue, and that > first space can't handle the world down under? Thanks for the clarification! 
The problem is obvious now, but the cause is not. If there's an ongoing problem here, it has to be due to something SF is doing. The only control we (project admins) have over the bug-report email is: 1. whether or not to send it; and, 2. if we do want to send it, the email address to which it gets sent. So the only thing anyone did here was tell SF to email bug tracker stuff to spambayes-bugs@python.org. There are no other hooks into that system (e.g., we don't run any scripts when bug email is generated, and couldn't even if we wanted to). Sounds like a PHP bug to me . From tim.one at comcast.net Sun Aug 31 20:23:24 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Aug 31 19:25:29 2003 Subject: [spambayes-dev] Bug messages In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130308C653@its-xchg4.massey.ac.nz> Message-ID: > Oh well, it's hardly important, anyway. Thanks for the clarification :) If it's any consolation, it's not just spambayes -- I just noticed that the same thing is happening to Python bug reports, like [ python-Bugs-793822 ] gc.get_referrers() isinherently dangerous isinherently looks so much like the name of a built-in function that I had to stare at it for two hours to realize a space was missing . From vanhorn at whidbey.com Sun Aug 31 18:44:53 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Sun Aug 31 20:45:05 2003 Subject: [spambayes-dev] IMAP setup References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAED9@its-xchg4.massey.ac.nz> Message-ID: <3F529685.BBDF094D@whidbey.com> I hadn't gotten time to actually look at the source yet, but I've been planning on setting up a mail server with imapfilter. IMAP is a server-based process, so I assumed that imapfilter would run on the server, but reading the top of the source I'm not at all sure that's the case. Have I missed the boat in a really, really big way here? 
My plan was to run Sendmail, MailScanner, SpamAssassin, etc in their normal fashion, but then let individual users have their own specific SpamBayes IMAP filter to really clean things up. Van -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ----------------------------------------------------------