From heli at helimodels.com Wed Sep 1 02:42:10 2004
From: heli at helimodels.com (John Moriarty)
Date: Wed Sep 1 02:46:17 2004
Subject: [spambayes-dev] message subject filtering
In-Reply-To:
Message-ID: <003a01c48fbc$848eef40$2101a8c0@user>

Hello again

All I think I am saying is that very many times the spam falls at the first fence. It appears that the new (random/gibberish) words have no effect, good.

...SpamBayes needs to work equally well regardless of the ratio of ham vs. spam that a particular user receives...

...so what's best training to do when spams vastly predominate?

Question, how many ppl would open any of these the last few spams I just got:

In Spam folder:
account update
RE: Sensually get some of the action
Here is your 50 dollar restaurant card
Online Canadian Generic Phamacy -- Next Day Shipping! Lowest Prices on ...
meeting tommorow at 18-00
Buy pain relief medicine at Unbelievable Prices
account update
RE: Sensually get some of the action
Here is your 50 dollar restaurant card
Online Canadian Generic Phamacy -- Next Day Shipping! Lowest Prices on ...
meeting tommorow at 18-00
Buy pain relief medicine at Unbelievable Prices
Get cash out of your house
Lowest cost for potency meds today
nadir cheer coda theses arbiter molehill line bear joke regulate countryside cruickshank hegemony hanley consoOrder your meds online with confidence
Attract and Seduce Men

In Possible junk folder:
Make $397

Seems to me processing would speed up with these dodgy headers.
Wonder if this message with all those spam message header quotes gets deleted as spam;)

Kind regards,
John Moriarty
(+353) (0)87 2833 530
www.helimodels.com

-----Original Message-----
From: Kenny Pitt [mailto:kennypitt@hotmail.com]
Sent: 31 August 2004 19:58
To: 'John Moriarty'; spambayes-dev@python.org
Cc: 'David Kirwan'
Subject: RE: [spambayes-dev] message subject filtering

John Moriarty wrote:
> A lot of spam shows:
>
> * Ungrammatical and/or irrelevant wording
> * Random words
> * Gibberish words
> * Deliberately weird or obscure punctuations
> * Since this is true in the header as well as the text body, this
> potentially reduces the loads on the filter.

Are you interested in this because you want to analyze just the headers and not download the entire message if it is determined to be spam? If so, there are other issues besides whether or not we can successfully identify the spam just based on the headers. In the case of the Outlook Add-in, Outlook has already downloaded the message by the time we are told about it. In the case of the POP3 proxy (sb_server), discarding a message that you have partially processed is problematic because the e-mail client is already aware that the message exists and will sometimes get confused if we refuse to give it any data.

> Random words not seen
> before seem to allow stuff through more easily.

In the case of SpamBayes, this is not true. SpamBayes assigns a probability of 0.5 to any word that it hasn't been trained on, and then discards any words that have a probability between 0.4 and 0.6 before calculating the spam score. Because SpamBayes ignores these words, they have absolutely no effect, either positive or negative, on the classification of the message. The only time that random words have an effect on the classification is if the spammer happens to hit on some words that you *have* seen before. If those words have only been seen in spam messages then it only *increases* the probability that the message will be properly identified as spam. It is very rare for the spammer to stumble across a significant number of words that you have trained as hammy, and even then there aren't usually enough of them to outweigh the other spammy clues in the message.

> * I also note spam outnumbers ham by up to 100 to one

Maybe for you, but not necessarily for everyone. While it does seem that most people these days are receiving more spam than good messages, there are still some people (someone who is extremely active on a lot of high-volume mailing lists, possibly) that get far more ham than spam. SpamBayes needs to work equally well regardless of the ratio of ham vs. spam that a particular user receives.

> And invariably the text body contains the web address of the seller,
> so a web address of itself is a giveaway.

SpamBayes has an option that will break up URLs and create clues from the domain name, directory names, etc. If a particular domain is used a lot in spam then that will become a spam clue. The mere presence of a URL in the message is not a good indicator of spam in general. I receive a lot of legitimate mail such as developer newsletters that contain lots of URLs.

> I am fast at identifying spam by the header alone, using the above
> observations I reckon I spot 90% plus in a blink.

The human brain has a capacity for learning and detecting patterns in the text that far exceeds what SpamBayes can ever be capable of. In most cases, however, SpamBayes can probably process the entire message in less time than you can process just the header. The more information SpamBayes has at its disposal, the less likely it is to make a mistake and toss an important message into your spam folder.
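[Editor's sketch] Kenny's claim that an untrained word scores exactly 0.5, and is then ignored, can be checked with a small model of per-word scoring. This assumes Gary Robinson's smoothing formula, f(w) = (s*x + n*p(w)) / (s + n), with strength s = 0.45 and prior x = 0.5. These values are assumptions about the SpamBayes defaults. With them, the formula reproduces spamprobs from Seth Goodman's score report later in this thread (1260 ham and 1309 spam trained), for example 'filtering' at 0.0155709 and 'arbiter' at 0.844828. The function names are hypothetical, and this is not SpamBayes's own code.

```python
def spamprob(nham_word, nspam_word, nham_total, nspam_total, s=0.45, x=0.5):
    """Smoothed probability that a message containing this word is spam.

    Robinson-style smoothing: the raw ratio-based estimate is pulled toward
    the prior x, strongly when the word has been seen only a few times.
    s=0.45 and x=0.5 are assumed defaults.
    """
    n = nham_word + nspam_word
    hamratio = nham_word / nham_total if nham_total else 0.0
    spamratio = nspam_word / nspam_total if nspam_total else 0.0
    if n == 0 or hamratio + spamratio == 0:
        # A word never seen in training carries no evidence: return the prior.
        return x
    raw = spamratio / (hamratio + spamratio)
    return (s * x + n * raw) / (s + n)


def is_clue(prob, min_strength=0.1):
    """True if prob is far enough from 0.5 to be used when scoring.

    With min_strength=0.1, words scoring from 0.4 to 0.6 are ignored.
    """
    return abs(prob - 0.5) >= min_strength


if __name__ == "__main__":
    # Totals and per-word counts from the score report in this thread;
    # "xyzzy" is a hypothetical word that was never trained on.
    NHAM, NSPAM = 1260, 1309
    for word, h, sp in [("filtering", 14, 0), ("arbiter", 0, 1), ("xyzzy", 0, 0)]:
        p = spamprob(h, sp, NHAM, NSPAM)
        print(f"{word!r}: spamprob={p:.6f} used={is_clue(p)}")
```

With this model, a random word the filter has never seen comes out at 0.5 and is never used as a clue. A word that has appeared only once in spam scores about 0.84 and therefore counts against the spammer.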
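[Editor's sketch] Kenny also mentions an option that breaks URLs into clues. The token dump in Seth Goodman's score report later in this thread contains tokens such as 'proto:http', 'url:python' and 'url:listinfo'. The helper below is a hypothetical sketch of that kind of splitting. It is not the SpamBayes tokenizer, and its exact splitting rules are assumptions. The test URL is likewise an assumption, chosen because it reproduces those tokens.

```python
import re
from urllib.parse import urlsplit


def url_clues(url):
    """Split a URL into proto:/url: tokens (hypothetical helper).

    Loosely modeled on tokens like 'proto:http' and 'url:python' seen in
    SpamBayes token dumps; the real tokenizer may differ.
    """
    parts = urlsplit(url.lower())
    tokens = ["proto:" + parts.scheme] if parts.scheme else []
    # Host-name pieces ("mail", "python", "org") become separate clues, so a
    # domain that shows up mostly in spam turns into a spam clue.
    pieces = parts.hostname.split(".") if parts.hostname else []
    # Directory and file names from the path, split on '/' and '.'.
    pieces += re.split(r"[/.]", parts.path)
    tokens += ["url:" + p for p in pieces if p]
    return tokens


if __name__ == "__main__":
    print(url_clues("http://mail.python.org/mailman/listinfo/spambayes-dev"))
```

Each piece then receives its own spamprob. A domain used mostly in spam therefore becomes a spam clue, while the bare fact that a message contains a URL adds nothing by itself.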
-- 
Kenny Pitt

From sethg at GoodmanAssociates.com Wed Sep 1 05:27:52 2004
From: sethg at GoodmanAssociates.com (Seth Goodman)
Date: Wed Sep 1 05:27:43 2004
Subject: [spambayes-dev] message subject filtering
In-Reply-To: <003a01c48fbc$848eef40$2101a8c0@user>
Message-ID:

> From: John Moriarty
> Sent: Tuesday, August 31, 2004 7:42 PM

<...>

> Wonder if this message with all those spam message header quotes gets
> deleted as spam;)

Here's how it scored on my system.

--
Seth Goodman

***************************************************************
Combined Score: 0% (2.10065e-010)
Internal ham score (*H*): 1
Internal spam score (*S*): 5.55112e-016

# ham trained on: 1260
# spam trained on: 1309

150 Significant Tokens

token  spamprob  #ham  #spam
'filtering'  0.0155709  14  0
'folder.'  0.0155709  14  0
'spambayes,'  0.0180723  12  0
'spams'  0.0302013  7  0
'filter.'  0.0348837  6  0
'restaurant'  0.0348837  6  0
'wrote:'  0.0377975  30  1
'message-----'  0.03904  29  1
'spambayes'  0.0395184  169  7
'kenny'  0.0412844  5  0
'pitt'  0.0412844  5  0
'spam?'  0.0412844  5  0
'exceeds'  0.0505618  4  0
'irrelevant'  0.0505618  4  0
'problematic'  0.0505618  4  0
'question,'  0.0505618  4  0
'vs.'  0.0505618  4  0
'outlook'  0.057784  129  8
'ham'  0.0611551  18  1
'besides'  0.0652174  3  0
'headers.'  0.0652174  3  0
'partially'  0.0652174  3  0
'spammer'  0.0652174  3  0
'0.6'  0.0918367  2  0
'clues'  0.0918367  2  0
"hasn't"  0.0918367  2  0
'observations'  0.0918367  2  0
'probability'  0.0918367  2  0
'score.'  0.0918367  2  0
'spammy'  0.0918367  2  0
'subject:filtering'  0.0918367  2  0
'urls'  0.0918367  2  0
'downloaded'  0.0960671  20  2
'spam'  0.0980099  117  13
'spam.'  0.122818  22  3
'cc:no real name:2**0'  0.126144  28  4
'junk'  0.12979  40  6
'likely'  0.139058  25  4
'identifying'  0.141076  7  1
'sent:'  0.143691  47  8
'messages,'  0.153008  17  3
'0.4'  0.155172  1  0
'add-in,'  0.155172  1  0
'got:'  0.155172  1  0
'invariably'  0.155172  1  0
'joke'  0.155172  1  0
'obscure'  0.155172  1  0
'theses'  0.155172  1  0
'wording'  0.155172  1  0
'cc:'  0.1601  6  1
'training'  0.169601  48  10
'case'  0.173236  33  7
'good.'  0.173672  10  2
'user'  0.179492  58  13
'issues'  0.183649  35  8
'random'  0.185056  5  1
'maybe'  0.18721  30  7
'header:In-Reply-To:1'  0.188084  88  21
'seems'  0.189561  46  11
'august'  0.191786  21  5
'proxy'  0.2007  16  4
'header:Importance:1'  0.213684  245  69
'everyone.'  0.219234  4  1
'weird'  0.219234  4  1
'using'  0.222277  206  61
'update'  0.224625  67  20
'messages'  0.226939  53  16
'appears'  0.232045  26  8
'data.'  0.233298  10  3
'seem'  0.234093  32  10
'pop3'  0.2355  13  4
'properly'  0.2355  13  4
'url'  0.239458  28  9
'subject:spambayes'  0.239792  31  10
'download'  0.244353  60  20
'regards,'  0.253755  108  38
'and/or'  0.255968  34  12
'speed'  0.256066  20  7
'refuse'  0.256606  6  2
'newsletters'  0.261476  14  5
'cc:2**0'  0.264469  70  26
'falls'  0.268912  3  1
'vastly'  0.268912  3  1
'x-mailer:microsoft outlook cws, build 9.0.2416 (9.0.2911.0)'  0.268912  3  1
'sometimes'  0.276398  18  7
'text'  0.280013  35  14
'gets'  0.28081  25  10
'processing'  0.285637  22  9
'does'  0.287482  122  51
'meeting'  0.289726  31  13
'true.'  0.291402  5  2
'learning'  0.291771  12  5
'subject:subject'  0.291771  12  5
'enough'  0.291965  40  17
'option'  0.295123  21  9
'domain'  0.296626  14  6
'probably'  0.297196  39  17
'note'  0.299816  52  23
'however,'  0.300611  63  28
"what's"  0.301911  27  12
'so,'  0.304793  42  19
'capacity'  0.306329  9  4
'saying'  0.307076  22  10
'it.'  0.308495  106  49
'possible'  0.310605  43  20
'particular'  0.313772  15  7
'skip:- 10'  0.319851  33  16
'word'  0.320398  37  18
'subject:] '  0.324429  347  173
'john'  0.326766  28  14
'usually'  0.32918  12  6
'times'  0.329979  53  27
'trained'  0.330009  10  5
"aren't"  0.331238  8  4
'e-mail'  0.333142  164  85
'when'  0.33456  274  143
'cases,'  0.337125  4  2
'equally'  0.337125  4  2
'medicine'  0.337125  4  2
'itself'  0.338762  19  10
'effect'  0.340334  17  9
'header:Errors-To:1'  0.341055  361  194
'message.'  0.343773  59  32
'100'  0.344597  24  13
'0.5'  0.347748  2  1
'cheer'  0.347748  2  1
'detecting'  0.347748  2  1
'regardless'  0.348294  11  6
'absolutely'  0.655798  10  20
'cost'  0.661048  30  61
'men'  0.678659  9  20
'ratio'  0.683663  3  7
'confidence'  0.683917  7  16
'subject:message'  0.692332  11  26
'presence'  0.711008  3  8
'relief'  0.718228  1  3
'pain'  0.724071  5  14
'prices'  0.736657  15  44
'canadian'  0.75361  3  10
'lowest'  0.785289  8  31
'names,'  0.800158  2  9
'arbiter'  0.844828  0  1
'easily.'  0.844828  0  1
'cash'  0.849598  8  48
'regulate'  0.851022  1  7
'generic'  0.875673  2  16
'seduce'  0.908163  0  2
'meds'  0.931618  1  17
'potency'  0.934783  0  3
'unbelievable'  0.958716  0  5
<...>

All Message Tokens

443 unique tokens

'$397' "'david" "'john" '(+353)' '(0)87' '(sb_server),' '(someone' '*have*' '*increases*' '...' '...so' '...spambayes' '0.4' '0.5' '0.6' '100' '18-00' '19:58' '2004' '2833' '530' '90%' 'about' 'above' 'absolutely' 'account' 'across' 'action' 'active' 'add-in,' 'address' 'again' 'all' 'allow' 'alone,' 'already' 'also' 'analyze' 'and' 'and/or' 'any' 'appears' 'arbiter' 'are' "aren't" 'assigns' 'attract' 'august' 'aware' 'based' 'bear' 'because' 'become' 'been' 'before' 'before.' 'besides' 'best' 'between' 'blink.' 'body' 'body,' 'brain' 'break' 'but' 'buy' 'calculating' 'can' 'canadian' 'capable' 'capacity' 'card' 'case' 'cases,' 'cash' 'cc:' 'cc:2**0' 'cc:addr:baltimore.com' 'cc:addr:david.kirwan' 'cc:no real name:2**0' 'cheer' 'client' 'clue.'
'clues' 'coda' 'confidence' 'confused' 'consoorder' 'contain' 'contains' 'content-type:text/plain' 'cost' 'countryside' 'create' 'cruickshank' 'data.' 'day' 'days' 'deleted' 'deliberately' 'detecting' 'determined' 'developer' 'directory' 'discarding' 'discards' 'disposal,' 'dodgy' 'does' 'dollar' 'domain' 'download' 'downloaded' 'e-mail' 'easily.' 'effect' 'effect,' 'either' 'email addr:hotmail.com]' 'email addr:python.org' 'email name:[mailto:kennypitt' 'email name:spambayes-dev' 'enough' 'entire' 'equally' 'etc.' 'even' 'ever' 'everyone.' 'exceeds' 'exists' 'extremely' 'falls' 'far' 'fast' 'fence.' 'few' 'filter.' 'filtering' 'first' 'folder.' 'folder:' 'for' 'from' 'from:' 'from:addr:heli' 'from:addr:helimodels.com' 'from:name:john moriarty' 'general.' 'generic' 'get' 'gets' 'gibberish' 'give' 'giveaway.' 'good' 'good.' 'got:' 'ham' 'hammy,' 'hanley' 'happens' 'has' "hasn't" 'have' 'header' 'header.' 'header:Date:1' 'header:Errors-To:1' 'header:From:1' 'header:Importance:1' 'header:In-Reply-To:1' 'header:MIME-Version:1' 'header:Message-ID:1' 'header:Received:8' 'header:Return-Path:1' 'header:Subject:1' 'header:To:1' 'headers' 'headers.' 'hegemony' 'hello' 'here' 'high-volume' 'hit' 'house' 'how' 'however,' 'human' 'identified' 'identify' 'identifying' 'ignores' 'important' 'indicator' 'information' 'interested' 'into' 'invariably' 'irrelevant' 'issues' 'it.' 'its' 'itself' 'john' 'joke' 'junk' 'just' 'kenny' 'kind' "kirwan'" 'last' 'learning' 'legitimate' 'less' 'likely' 'line' 'list' 'lists,' 'loads' 'lot' 'lots' 'lowest' 'mail' 'mailing' 'make' 'many' 'maybe' 'medicine' 'meds' 'meeting' 'men' 'mere' 'message' 'message-----' 'message-id:@user' 'message.' 'messages' 'messages,' 'mistake' 'molehill' 'more' 'moriarty' "moriarty';" 'most' 'nadir' 'name,' 'names,' 'necessarily' 'needs' 'negative,' 'new' 'newsletters' 'next' 'not' 'note' 'number' 'obscure' 'observations' 'of.' 
'on,' 'one' 'online' 'only' 'open' 'option' 'other' 'out' 'outlook' 'outnumbers' 'outweigh' 'pain' 'partially' 'particular' 'patterns' 'people' 'phamacy' 'pitt' 'plus' 'pop3' 'positive' 'possible' 'possibly)' 'potency' 'potentially' 'ppl' 'predominate?' 'presence' 'prices' 'probability' 'probably' 'problematic' 'process' 'processed' 'processing' 'properly' 'proto:http' 'proxy' 'punctuations' 'question,' 'quotes' 'random' 'rare' 'ratio' 're:' 'receive' 'receives.' 'receives...' 'receiving' 'reckon' 'reduces' 'refuse' 'regardless' 'regards,' 'regulate' 'relief' 'reply-to:none' 'restaurant' 'saying' 'score.' 'seduce' 'seem' 'seems' 'seen' 'seller,' 'sender:addr:python.org' 'sender:addr:spambayes-dev-bounces' 'sender:no real name:2**0' 'sensually' 'sent:' 'shipping!' 'shows:' 'significant' 'since' 'skip:( 10' 'skip:- 10' 'skip:[ 10' 'skip:_ 40' 'skip:c 10' 'skip:s 10' 'skip:u 10' 'skip:w 10' 'so,' 'some' 'sometimes' 'spam' 'spam.' 'spam;)' 'spam?' 'spambayes' 'spambayes,' 'spammer' 'spammy' 'spams' 'speed' 'spot' 'still' 'stuff' 'stumble' 'subject' 'subject:' 'subject: ' 'subject:-' 'subject:[' 'subject:] ' 'subject:dev' 'subject:filtering' 'subject:message' 'subject:spambayes' 'subject:subject' 'successfully' 'such' 'text' 'than' 'that' 'the' 'them' 'then' 'there' 'these' 'theses' 'they' 'think' 'this' 'those' 'through' 'time' 'times' 'to:' 'to:2**0' 'to:addr:python.org' 'to:addr:spambayes-dev' 'to:no real name:2**0' 'today' 'told' 'tommorow' 'toss' 'trained' 'training' 'true' 'true.' 'unbelievable' 'update' 'url' 'url:listinfo' 'url:mail' 'url:mailman' 'url:org' 'url:python' 'url:spambayes-dev' 'urls' 'urls.' 'used' 'user' 'using' 'usually' 'vastly' 'very' 'vs.' 
'want' 'web' 'weird' 'well' 'what' "what's" 'when' 'whether' 'while' 'who' 'will' 'with' 'wonder' 'word' 'wording' 'words' 'words,' 'work' 'would' 'wrote:' 'x-mailer:microsoft outlook cws, build 9.0.2416 (9.0.2911.0)' 'you' 'you,' 'your'

From paulagrassi203 at hotmail.com Thu Sep 2 20:12:39 2004
From: paulagrassi203 at hotmail.com (Paula Bernardes)
Date: Thu Sep 2 20:12:30 2004
Subject: [spambayes-dev] Promotion on search sites - E-mail marketing - Registering homepages with search engines
Message-ID: <20040902181229.B38D51E4015@bag.python.org>

Visit now: http://www.divulgamail.mx.gs Home page promotion on search sites, Site promotion. How to promote home pages, how to promote sites, how to promote my site, site promotion tips. Optimization and ranking on Google. Direct mail e-mail, email regions, e-mails region, direct mail by email, e-mail marketing, regions, e-mail registry, advertising by email, emails region, promote, send emails, email campaign, email advertising, email city, anonymous email sending, email states, promote e-mail, email programs, e-mails by states, e-mails city, e-mail registry, direct mail by e-mail, email lists, e-mail regions, email advertising, send anonymous email, direct mail sending, states, campaign, city, sending, e-mail advertising, Visit now: http://www.divulgamail.mx.gs Home page promotion on search sites, Site promotion. How to promote home pages, how to promote sites, how to promote my site, site promotion tips. Optimization and ranking on Google.
e-mail campaigns, e-mail list, e-mail programs, e-mails state, email advertising, digital marketing, city, promote, email list, emails states, digital e-mail advertising, e-mail by regions, e-mails by cities, email cities, e-mail campaign, e-mail state, email lists, email list, advertising by e-mails, direct mail email, advertising, cities, email marketing, city, email by regions, advertising sending, e-mail lists, e-mails regions, promote e-mails, direct-mail sending, e-mail cities, email state, e-mails by Visit now: http://www.divulgamail.mx.gs Home page promotion on search sites, Site promotion. How to promote home pages, how to promote sites, how to promote my site, site promotion tips. Optimization and ranking on Google. Region, marketing by emails, advertising, bulk email software, digital e-mail advertising, email programs, email, direct mail, e-mail advertising, e-mail marketing, e-mail, direct-mail email, digital advertising, emails by region, segmented email, state, e-mail campaigns, e-mails cities, segmented e-mails, email by state, marketing by email, segmented emails, promotion, e-mails states, city, e-mail campaign, software, segmented email, region, send anonymous e-mails, send anonymous emails, direct mail emails, email marketing, segmented emails, e-mail programs, e-mails by city, e-mail list, advertising, direct mail by e-mails, email campaign, internet spam software, Visit now: http://www.divulgamail.mx.gs Home page promotion on search sites, search engines, Site promotion. How to promote home pages, how to promote sites, how to promote my site, site promotion tips.
Optimization and ranking on Google., e-mail region, lists, segmented lists, marketing, digital marketing by emails, email region, e-mail promotion, emails by city, direct-mail by email, digital marketing by e-mails, email lists, segmented list, cities, email registry, promote your product, direct-mail by e-mails, e-mail by state, segments, email by cities, advertising by e-mail, emails cities, advertising by emails, e-mail sending, e-mails by state, direct mail, direct-mail, direct-mail by emails, segmented e-mail, digital marketing emails, cities, e-mail promotion, marketing, e-mail states, cities, marketing by e-mail, email sending, digital marketing email, advertising Visit now: http://www.divulgamail.mx.gs by email, anonymous email sending, promote your advertising, digital email advertising, city, emails by cities, segmented e-mails, advertising by emails, promote email, e-mail city, send e-mails, e-mails, email registry, e-mail by city, email sending, registry, list, e-mail sending, digital email advertising, advertising by e-mails, digital marketing, e-mail by region, email by states, promotion, emails by states, segmented, direct-mail emails, advertising sending, campaigns, direct mail by emails, e-mail by states, marketing by e-mails, emails by state, direct-mail e-mails, digital marketing e-mail, promote emails, emails regions, advertising, email by region, e-mails by regions, e-mail lists, email promotion, direct-mail by e-mail, send e-mail, send email, Visit now: http://www.divulgamail.mx.gs email promotion, cities, advertising by e-mail, send, emails by regions, digital marketing by e-mail, email by city, email campaigns, digital marketing by email, digital marketing e-mails, e-mail advertising, segmented e-mail, anonymous e-mail sending, internet advertising software, segmented, anonymous e-mail sending, direct mail list, anonymous email program, internet direct mail,
email advertising, segmented direct mail, segmented emails, digital marketing, direct mail email, advertising, spam, direct mail e-mail, email regions, e-mails region, direct mail by email, e-mail marketing, regions, e-mail registry, advertising by email, emails region, promote, send emails, email campaign, email advertising, email city, anonymous email sending, email states, promote e-mail, email programs, e-mails by states, e-mails city, e-mail registry, direct mail by e-mail, email lists, e-mail regions, email advertising, send anonymous email, sending Visit now: http://www.divulgamail.mx.gs direct mail, states, campaign, city, sending, e-mail advertising, e-mail campaigns, e-mail list, e-mail programs, e-mails state, email advertising, digital marketing, city, promote, email list, emails states, digital e-mail advertising, e-mail by regions, e-mails by cities, email cities, e-mail campaign, e-mail state, email lists, email list, advertising by e-mails, direct mail email, advertising, cities, email marketing, city, email by regions, advertising sending, e-mail lists, e-mails regions, promote e-mails, direct-mail sending, e-mail cities, email state, e-mails by region, marketing by emails, advertising, bulk email software, digital e-mail advertising, email programs, email, direct mail, e-mail advertising, e-mail marketing, e-mail, direct-mail email, advertising Visit now: http://www.divulgamail.mx.gs digital, emails by region, segmented email, state, e-mail campaigns, e-mails cities, segmented e-mails, email by state, marketing by email, segmented emails, promotion, e-mails states, city, e-mail campaign, software, segmented email, region, send anonymous e-mails, send anonymous emails, direct mail emails, email marketing, segmented emails, e-mail programs, e-mails by city, e-mail list, advertising, direct mail by e-mails, email campaign, internet spam software, emails Visit now:
http://www.divulgamail.mx.gs state, e-mail advertising, e-mail by cities, send anonymous e-mail, internet advertising software, emails city, emails, email campaigns, direct-mail e-mail, email advertising, direct mail e-mails, e-mail region, lists, segmented lists, marketing, digital marketing by emails, email region, e-mail promotion, emails by city, direct-mail by email, digital marketing by e-mails, email lists, segmented list, cities, email registry, promote your product, direct-mail by e-mails, e-mail by state, segments, email by cities, advertising by e-mail, emails cities, advertising by emails, e-mail sending, e- Visit now: http://www.divulgamail.mx.gs mails by state, direct mail, direct-mail, direct-mail by emails, segmented e-mail, digital marketing emails, cities, e-mail promotion, marketing, e-mail states, cities, marketing by e-mail, email sending, digital marketing email, advertising by email, anonymous email sending, promote your advertising, digital email advertising, city, emails by cities, segmented e-mails, advertising by emails, promote email, e-mail city, send e-mails, e-mails, email registry, e-mail by city, email sending, registry, list, e-mail sending, digital email advertising, advertising by e-mails, digital marketing, e-mail by region, email by states, promotion, emails by states, segmented, direct-mail emails, advertising sending, campaigns, direct mail by emails, e-mail by states, marketing by e- Visit now: http://www.divulgamail.mx.gs mails, emails by state, direct-mail e-mails, digital marketing e-mail, promote emails, emails regions, advertising, email by region, e-mails by regions, e-mail lists, email promotion, direct-mail by e-mail, send e-mail, send email, email promotion, cities, advertising by e-mail, send, emails by regions, digital marketing by e-mail, email by city, email campaigns, digital marketing by email, digital marketing e-mails,
e-mail advertising, segmented e-mail, anonymous e-mail sending, internet advertising software, segmented, anonymous e-mail sending, direct mail list, anonymous email program, internet direct mail, email advertising, segmented direct mail, segmented emails, digital marketing, direct mail email, advertising, spam

From mindy at googod.com Tue Sep 7 21:50:18 2004
From: mindy at googod.com (Mindy)
Date: Tue Sep 7 07:17:50 2004
Subject: [spambayes-dev] Quality link request
Message-ID: <004501c49513$e9e988d0$6f01a8c0@mindy>

Hello,

I found your website http://mail.python.org/ on Google. Your website has content related to ours at www.water-splash.co.uk. This is a quality website and is well ranked on Google.

We are happy to upload a link onto this website in any way you request in exchange for a return link. I'm sure you appreciate that this would be of great benefit to us both.

To go ahead with this exchange please upload our link information below to your links page. Then drop me an email to let me know where you have uploaded it. If you would like your return link presented in a particular way please include this information in your email. I will then arrange for your link to be uploaded and email you again to let you know.

Thank you.

Regards
Mindy

Our link Info:
Link Text: Water Cooler
Description: Plumbed in pure chilled filtered water coolers and fountains.
URL: http://www.water-splash.co.uk

Our link html:
<<-- begin html -->>
Water Cooler
Plumbed in pure chilled filtered water coolers and fountains.
<<-- end html -->>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20040907/eaf37847/attachment.html

From mindy at googod.com Tue Sep 7 22:32:23 2004
From: mindy at googod.com (Mindy)
Date: Tue Sep 7 07:33:05 2004
Subject: [spambayes-dev] Quality link request
Message-ID: <028f01c49519$ccbc73c0$6f01a8c0@mindy>

Hello,

I found your website http://mail.python.org on Google. Your website has content related to ours at www.water-splash.co.uk. This is a quality website and is well ranked on Google.

We are happy to upload a link onto this website in any way you request in exchange for a return link. I'm sure you appreciate that this would be of great benefit to us both.

To go ahead with this exchange please upload our link information below to your links page. Then drop me an email to let me know where you have uploaded it. If you would like your return link presented in a particular way please include this information in your email. I will then arrange for your link to be uploaded and email you again to let you know.

Thank you.

Regards
Mindy

Our link Info:
Link Text: Water Cooler
Description: Plumbed in pure chilled filtered water coolers and fountains.
URL: http://www.water-splash.co.uk

Our link html:
<<-- begin html -->>
Water Cooler
Plumbed in pure chilled filtered water coolers and fountains.
<<-- end html -->>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20040907/21bf1cc7/attachment.htm

From skip at pobox.com Thu Sep 9 04:21:30 2004
From: skip at pobox.com (Skip Montanaro)
Date: Thu Sep 9 04:21:40 2004
Subject: [spambayes-dev] Wittel/Wu article on statistical attacks
Message-ID: <16703.48682.870030.600499@montanaro.dyndns.org>

Has anyone investigated the attack methods outlined in the Wittel/Wu paper at the CEAS conference:

http://ceas.cc/papers-2004/170.pdf

It's not obvious to me why SpamBayes should have performed as poorly as the authors indicated. In particular, they were adding common dictionary words which should have just added non-extreme words which should have for the most part been ignored (spamprobs between 0.4 and 0.6).

Skip

From jepler at unpythonic.net Thu Sep 9 19:07:00 2004
From: jepler at unpythonic.net (Jeff Epler)
Date: Thu Sep 9 19:45:19 2004
Subject: [spambayes-dev] Wittel/Wu article on statistical attacks
In-Reply-To: <16703.48682.870030.600499@montanaro.dyndns.org>
References: <16703.48682.870030.600499@montanaro.dyndns.org>
Message-ID: <20040909170659.GA20186@unpythonic.net>

I used the "top 100 english words" file referenced in the paper and checked the nspam vs. nham counts in my database. Some of the words were very spammy; few were more than a little bit hammy. Given this, it's hard to see how adding "common words" from this list would effectively bypass my SpamBayes filter.

It's interesting to note that the subject: tokens are the most extreme of the lot. I have no idea what that means, though.
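[Editor's sketch] Skip's reasoning can be checked with a small model of how SpamBayes turns word spamprobs into a message score. The sketch assumes chi-squared combining: Fisher's method is run once for spamminess and once for hamminess, which gives internal *S* and *H* scores like those in Seth Goodman's score report earlier in this thread, and the two are combined as (S - H + 1) / 2. It also assumes a minimum clue strength of 0.1 and at most 150 clues. These parameter values are assumptions about the defaults, and the function names are hypothetical. The sketch shows that padding a message with words whose spamprobs fall between 0.4 and 0.6 leaves its score exactly unchanged.

```python
import math


def chi2Q(x2, v):
    """P(chi-squared variable with v degrees of freedom >= x2); v must be even."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def select_clues(probs, min_strength=0.1, max_clues=150):
    """Keep the strongest spamprobs, dropping those within 0.1 of 0.5."""
    strong = [p for p in probs if abs(p - 0.5) >= min_strength]
    strong.sort(key=lambda p: abs(p - 0.5), reverse=True)
    return strong[:max_clues]


def chi2_score(probs):
    """Combine per-word spamprobs into a message score in [0, 1]."""
    clues = select_clues(probs)
    n = len(clues)
    if n == 0:
        return 0.5  # no usable evidence either way
    # Fisher's method in both directions, using sums of logs.
    s_stat = -2.0 * sum(math.log(1.0 - p) for p in clues)
    h_stat = -2.0 * sum(math.log(p) for p in clues)
    S = 1.0 - chi2Q(s_stat, 2 * n)
    H = 1.0 - chi2Q(h_stat, 2 * n)
    return (S - H + 1.0) / 2.0


if __name__ == "__main__":
    message = [0.99, 0.95, 0.2, 0.3, 0.85]        # hypothetical spamprobs
    padding = [0.5] * 100 + [0.45, 0.55, 0.42] * 20
    print(chi2_score(message), chi2_score(message + padding))
```

Padding only moves the score when the added words fall outside the 0.4 to 0.6 band. This is why the per-word counts Jeff reports for common words are the real question.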
Jeff
------------------------------------------------------------------------
import csv


def main():
    # "top100en.txt": the top-100 English words list, one word per line.
    with open("top100en.txt") as f:
        words = set(f.read().split("\n"))

    output = []
    # "spambayes.db.flat": a flat CSV dump of the token database, three
    # fields per row, read here as token, h (ham count), s (spam count).
    with open("spambayes.db.flat", newline="") as f:
        for line in csv.reader(f):
            if len(line) != 3:
                continue
            k = line[0]
            h = int(line[1])
            s = int(line[2])
            # Match "subject:word" tokens by their bare word as well.
            if k.startswith("subject:"):
                k1 = k.split(":")[-1]
            else:
                k1 = k
            if k1 and (k1 in words):
                output.append((100. * s / (s + h), k, s, h))

    # Highest spam percentage first; each row is (percent, token, s, h).
    output.sort(reverse=True)
    print("100*s/(h+s)          token    s    h")
    print("------------------------------------")
    for row in output:
        print("%5.1f %20s %4d %4d" % row)


main()

100*s/(h+s)          token    s    h
------------------------------------
100.0        subject:years    7    0
100.0        subject:would    1    0
100.0         subject:will    9    0
100.0          subject:who    3    0
100.0         subject:were    2    0
100.0         subject:time   25    0
100.0         subject:they    2    0
100.0        subject:their    7    0
100.0         subject:some    9    0
100.0         subject:said    4    0
100.0      subject:percent    2    0
100.0       subject:people    5    0
100.0         subject:over    4    0
100.0        subject:other    8    0
100.0         subject:only   11    0
100.0         subject:most    5    0
100.0         subject:more   13    0
100.0          subject:may    2    0
100.0       subject:market    3    0
100.0         subject:last    2    0
100.0          subject:his    4    0
100.0          subject:had    5    0
100.0      subject:company    2    0
100.0          subject:but    8    0
100.0      subject:because    1    0
100.0         subject:This   14    0
 96.7          subject:you  118    4
 92.9              percent   13    1
 91.9          subject:can   34    3
 90.6          subject:now   29    3
 89.2         subject:this   66    8
 88.9     subject:software   16    2
 86.7               market   72   11
 85.9           government   67   11
 85.7         subject:into    6    1
 84.9              company  152   27
 84.8         subject:that   28    5
 84.8        subject:about   28    5
 84.8              million   95   17
 84.6         subject:than   11    2
 84.6          subject:out   11    2
 81.8         subject:have   18    4
 81.8          subject:has    9    2
 81.5          subject:any   22    5
 80.0          subject:all    8    2
 78.6                 over  283   77
 77.2                years  129   38
 77.1          subject:The   37   11
 75.0          subject:new   12    4
 75.0        subject:after    3    1
 75.0                  his  177   59
 74.8                 said  104   35
 74.2                  who  308  107
 74.0                 year   71   25
 73.7          subject:New   56   20
 72.1                  now  385  149
 72.1                 most  281  109
 70.9          subject:the  134   55
 70.0         subject:been    7    3
 68.8                 time  350  159
 68.2                  new  457  213
 67.8                 more  713  339
 67.6          subject:was   23   11
 67.0                their  345  170
 66.1                  out  445  228
 66.0                  its  221  114
 65.8                 will  811  422
 65.7               system  237  124
 65.5                  you 1750  922
 65.4                 last  178   94
 65.2                 been  377  201
 65.0                 they  406  219
 63.7                 only  456  260
 63.6          subject:are   35   20
 63.3                  had  207  120
 63.1                  the 1886 1101
 63.0                  can  827  486
 62.9              because  287  169
 62.4                  for 1458  879
 61.8                 than  373  231
 61.6                 were  218  136
 61.6                  all  678  423
 61.5                  and 1690 1058
 61.5                  has  476  298
 61.2          subject:for  170  108
 61.1                 many  214  136
 61.1                 with 1001  637
 61.1                 have  903  576
 61.0                after  199  127
 60.9                  are  827  530
 60.9               people  277  178
 60.6                 also  282  183
 60.4                 from  843  552
 60.4                  one  491  322
 60.2                  two  183  121
 60.2                  may  287  190
 60.1                 into  235  156
 60.0         subject:year    3    2
 60.0        subject:there    6    4
 60.0        subject:could    3    2
 59.9                 this 1099  735
 59.8          subject:and  113   76
 59.1          subject:not   13    9
 59.0             software  121   84
 58.0                about  426  309
 57.3         subject:from   55   41
 56.0          subject:one   14   11
 56.0                 that 1010  795
 55.3         subject:with   47   38
 55.2                  was  554  449
 54.4                  not  796  667
 54.3                 says   51   43
 53.7                first  203  175
 53.3                other  353  309
 52.5                 data   83   75
 52.3                 such  156  142
 51.7                  any  456  426
 51.5                 some  360  339
 50.7                could  232  226
 50.0         subject:also    1    1
 48.0                there  346  375
 47.6         subject:when   10   11
 47.5                 when  347  384
 46.4                 them  148  171
 45.8                which  322  381
 45.8                would  428  507
 45.7                  but  515  613
 41.8                  use  248  345
 32.0         subject:many    8   17
 30.4       subject:system    7   16
 27.3          subject:use    3    8
 20.0         subject:data    2    8
  4.3        subject:which    1   22
  0.0          subject:two    0    4
  0.0         subject:says    0    3
  0.0      subject:million    0    2
  0.0        subject:first    0    1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20040909/9668bc12/attachment.pgp

From kennypitt at hotmail.com Thu Sep 9 21:04:29 2004
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Thu Sep 9 21:04:36 2004
Subject: [spambayes-dev] Wittel/Wu article on statistical attacks
In-Reply-To: <16703.48682.870030.600499@montanaro.dyndns.org>
Message-ID:

Skip Montanaro wrote:
> Has anyone investigated the attack methods outlined in the Wittel/Wu
> paper at the CEAS conference:
>
> http://ceas.cc/papers-2004/170.pdf
>
> It's not obvious to me why SpamBayes should have performed as poorly
> as the authors indicated. In particular, they were adding common
> dictionary words which should have just added non-extreme words which
> should have for the most part been ignored (spamprobs between 0.4 and
> 0.6).

I'd like to try to write a script to run a similar test against my current training data, although I'm not sure when I'll find the time. <0.5 wink>

A couple of thoughts, though. As we all know, the accuracy of SpamBayes is controlled entirely by the training data used. It seems likely that 3000 ham messages from a public corpus would contain many more common words than an individual user's typical mail stream, especially if the user trains on mistakes rather than on everything. I also wonder how many of the 3000 spam messages they trained on were already using random word insertion. I would expect that once enough spam messages start randomizing with common words, those common words would quickly become neutral or even spammy with continued training.

The "picospam" they use for testing has also been stripped of almost all header information. Since any spam must pass through the SMTP mailer chain before it can be received by a user, I wonder how much difference the Received header information would have made in the classification.
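[Editor's sketch] Kenny expects common words to drift toward neutral or spammy as random-word spam is trained. That expectation can be illustrated with Robinson-style per-word smoothing, assumed here with strength s = 0.45 and prior x = 0.5. The counts are entirely hypothetical: a common word seen in 300 of 1,000 trained hams and 30 of 1,000 trained spams, followed by training on batches of spam that all contain the word, as random-word spam would. The function names are also hypothetical.

```python
def spamprob(nham_word, nspam_word, nham_total, nspam_total, s=0.45, x=0.5):
    """Robinson-smoothed spam probability of one word (assumed defaults)."""
    n = nham_word + nspam_word
    hamratio = nham_word / nham_total if nham_total else 0.0
    spamratio = nspam_word / nspam_total if nspam_total else 0.0
    if n == 0 or hamratio + spamratio == 0:
        return x  # untrained word: fall back to the prior
    raw = spamratio / (hamratio + spamratio)
    return (s * x + n * raw) / (s + n)


def drift(nham_word, nspam_word, nham_total, nspam_total, new_spams, step):
    """Track a word's spamprob while training on spams that all contain it.

    Returns (spams_trained_so_far, spamprob rounded to 3 places) pairs,
    sampled every `step` spams.
    """
    history = []
    for trained in range(0, new_spams + 1, step):
        p = spamprob(nham_word, nspam_word + trained,
                     nham_total, nspam_total + trained)
        history.append((trained, round(p, 3)))
    return history


if __name__ == "__main__":
    # Hypothetical common word: in 300 of 1000 hams and 30 of 1000 spams.
    for trained, p in drift(300, 30, 1000, 1000, new_spams=1000, step=250):
        print(f"after {trained:4d} more spams: spamprob = {p}")
```

Under these made-up numbers, the word starts clearly hammy at about 0.09. It passes through the ignored 0.4 to 0.6 band after roughly 500 such spams and ends above 0.6, which matches the neutral-then-spammy trend Kenny describes.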
This also leads to the question of what parsing options they used for SpamBayes. I suppose that we can assume that they left everything at the defaults. How much effect would some of the advanced parsing options, particularly bigrams, have had on the results? -- Kenny Pitt From lucycarmo082 at hotmail.com Thu Sep 9 23:57:56 2004 From: lucycarmo082 at hotmail.com (Lucy Carmo) Date: Thu Sep 9 23:57:48 2004 Subject: [spambayes-dev] E-mail lists for direct mail Message-ID: <20040909215747.07E161E400A@bag.python.org> Visit now: http://www.divulgamail.mx.gs Emails, advertising emails, email by city, anonymous email sending, email by state, promote by e-mail, email programs, e-mail sign-up lists, direct mail by e-mail, email lists, e-mail by region, send anonymous email, direct mailing, states, campaign, city, sending, e-mail publicity, e-mail campaigns, digital marketing, targeted e-mail, e-mail marketing, digital advertising, bulk e-mail software, internet spam software, internet advertising software, anonymous e-mail program, internet direct mail, targeted direct mail, targeted lists, segments, promote your product, promote your advertising, spam. Visit now: http://www.divulgamail.mx.gs Home page promotion on search sites and search engines, site promotion: how to promote home pages, how to promote sites, how to promote my site, tips for promoting sites. Optimization and ranking on Google. From jorge at zeropaid.com Sat Sep 11 19:54:24 2004 From: jorge at zeropaid.com (Jorge A. Gonzalez) Date: Sat Sep 11 20:35:47 2004 Subject: [spambayes-dev] Filtering deleted messages... Message-ID: <7ea501c49828$5fac7cf0$0300a8c0@Queens> Spambayes should not filter deleted messages, or there should be an option not to filter them. Recently, my company saw a huge increase in spam to our accounts, about 8,000 messages a day. Spambayes would catch only a small number of these, because the emails arrived in large batches all at once. I noticed that Spambayes took a really long time to filter my messages even when I had only 1,000 or so messages to sort through. Then I noticed that I had a large number of deleted messages in my inbox. Once I purged the deleted messages, filtering time decreased drastically. Also, for some reason Spambayes cannot process a large number of emails received at once: if the total number of incoming emails exceeds 30 or so, Spambayes does not filter them. Jorge A. Gonzalez Web Engineer -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20040911/a80c9743/attachment.htm From tameyer at ihug.co.nz Thu Sep 16 09:17:46 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Sep 16 09:17:59 2004 Subject: [spambayes-dev] Wittel/Wu article on statistical attacks In-Reply-To: Message-ID: > Has anyone investigated the attack methods outlined in the > Wittel/Wu paper at the CEAS conference: > > http://ceas.cc/papers-2004/170.pdf > > It's not obvious to me why SpamBayes should have performed as > poorly as the authors indicated. In particular, they were > adding common dictionary words which should have just added > non-extreme words which should have for the most part been > ignored (spamprobs between 0.4 and 0.6). I had a little look at this when I first read the paper, but haven't had a chance to have a proper look at it. Concerns that I have: * It seems (it's not clear) that they did train-on-everything, which isn't great, particularly (I think) for this type of spam. * Mixed-corpus testing is not a good idea, and it appears that that's what's done here. * There are only From and Subject headers in the base test message. That's losing a *lot* of header info. * The list of common English words is "slightly modified by removing spammy words". This means it's actually a list of words that they feel are hammy or neutral. It's hard to know how this affects it. Attached is a script I wrote to try and duplicate the test. I'm running this at the moment, but it's taking a while (I didn't write it for speed!), so I'll post results when I have them. If they do match Wittel/Wu, then I might have a look to see if different training methods have an effect or not. Suggestions for improvements in the script (or errors!) are welcome, of course. There are a few hard-coded locations, but it should be simple enough to make sense of. =Tony Meyer -------------- next part -------------- A non-text attachment was scrubbed...
Name: wittelwu.py Type: application/octet-stream Size: 4823 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20040916/1aae42d5/wittelwu.obj From gabosgab at gmail.com Thu Sep 16 23:16:12 2004 From: gabosgab at gmail.com (Brown Gabe) Date: Thu Sep 16 23:16:15 2004 Subject: [spambayes-dev] Suggested Feature? Message-ID: <7750f8d04091614167fc396c0@mail.gmail.com> In future versions of SpamBayes, do you think it would be a good idea to have a central repository that clients could connect incrementally to get updates on what spam is? I was thinking the server could have some sort of way to collect and find relationships between all user data and form rules based on all data from all users. This is a HUGE undertaking, but just a wild idea. Your thoughts? I've been using SpamBayes for over a year now and I love it! Great work guys! -Gabe From tameyer at ihug.co.nz Fri Sep 17 03:40:19 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Fri Sep 17 03:40:30 2004 Subject: [spambayes-dev] Wittel/Wu article on statistical attacks In-Reply-To: Message-ID: Results (using the script in my previous message, with only minor changes): Using a corpus made up of ham from the SA public archive and spam from there and the SpamArchive.org collection (randomly selected), as in the paper, with 3000+3000 toe, and the 10,000 common words referenced in the paper, I get worse results: (All SpamBayes defaults, basically current CVS code).
Base message scores: 0.927778857491
Words  Spam   Ham  Unsure
   10   142     0     858
   25    28    39     933
   50     7   319     674
  100     0   781     219
  200     0   986      14
  300     0   997       3
  400     0   975      25
However, the base message's score is nowhere near certain spam, so it's not particularly surprising that adding random words drops the message into the unsures. I'm not sure why they end up ham rather than spam, though. Lacking headers is significant, of course.
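The Spam/Ham/Unsure columns come from comparing each padded message's combined score against two cutoffs. The following Python sketch shows chi-squared (Fisher-style) combining of clue probabilities followed by that three-way split. The cutoff values 0.20 and 0.90 are assumed to be SpamBayes' defaults, and the functions are an illustrative reconstruction, not the code in Tony's attached script.

```python
import math

# Assumed to match SpamBayes' default cutoffs; adjust to your configuration.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def chi2q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), for even v."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_spamprob(probs):
    """Fisher-style combination of clue probabilities into one score."""
    n = len(probs)
    if n == 0:
        return 0.5  # no clues at all: maximally unsure
    # Sum logs rather than multiplying, to avoid floating-point underflow.
    # Probabilities must lie strictly between 0 and 1.
    s_log = sum(math.log(1.0 - p) for p in probs)
    h_log = sum(math.log(p) for p in probs)
    spamminess = 1.0 - chi2q(-2.0 * s_log, 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * h_log, 2 * n)
    return (spamminess - hamminess + 1.0) / 2.0


def classify(score):
    """Map a combined score onto the Ham / Unsure / Spam columns."""
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"
```

Under these assumed cutoffs, the toe base score of 0.9278 classifies as spam, while the nonedge base score of 0.8982 reported next already falls in the unsure band. Mixing equally strong spammy and hammy clues (say five at 0.99 and five at 0.01) gives a score of about 0.5, which illustrates how padding words that the training data treats as hammy can pull a message out of the spam column.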
With a rough 'nonedge' training system, I get:
Base message scores: 0.898231181245
Words  Spam   Ham  Unsure
   10    16     0     984
   25     0    90     910
   50     0   721     279
  100     0   985      15
  200     0   999       1
  300     0   900     100
  400     0   600     400
And with a rough 'fpfnunsure' training system, I get:
Base message scores: 0.870322335645
Words  Spam   Ham  Unsure
   10    13     1     986
   25     0   486     514
   50     0   892     108
  100     0   998       2
  200     0  1000       0
  300     0   992       8
  400     0   625     375
Both of these seem to be heading back towards the message being unsure, rather than ham. However, if I use a ham/spam corpus of my own (but the same base message), then I get:
[toe]
Base message scores: 0.998165135488
Words  Spam   Ham  Unsure
   10   941     0      59
   25   792     0     208
   50   657     0     343
  100   505     0     495
  200   387     0     613
  300    97     0     903
  400   173     0     827
[nonedge]
Base message scores: 0.987644137822
Words  Spam   Ham  Unsure
   10   847     0     153
   25   751     0     249
   50   706     0     294
  100   719     0     281
  200   840     0     160
  300   904     0      96
  400   954     0      46
[fpfnunsure]
Base message scores: 0.999791130851
Words  Spam   Ham  Unsure
   10   981     0      19
   25   894     0     106
   50   817     0     183
  100   780     0     220
  200   823     0     177
  300   865     0     135
  400   918     0      82
These are probably more the results one would expect... =Tony Meyer From jepler at unpythonic.net Fri Sep 17 04:04:57 2004 From: jepler at unpythonic.net (Jeff Epler) Date: Fri Sep 17 04:05:18 2004 Subject: [spambayes-dev] Suggested Feature? In-Reply-To: <7750f8d04091614167fc396c0@mail.gmail.com> References: <7750f8d04091614167fc396c0@mail.gmail.com> Message-ID: <20040917020457.GA15212@unpythonic.net> At first blush, this should be "easy" if you write the "central repository" as a new database back-end for spambayes, with some kind of delayed update system. However, I suspect this setup won't be great for many users. A very obvious example is that To: and From: headers are used by Spambayes as token sources, and the spammy and hammy values are very individual.
(In fact, because I get much more list traffic than legitimate personal e-mail, messages addressed to me are spammy, on average) Not everyone gets the same kinds of messages, either. For instance, many folks would get lots of legitimate messages about "wedding", while that's probably a spam clue for me. Then there's the need to keep "the spammers" from submitting messages in order to break the system, and the need to preserve users' privacy. Lots of websites seem to have a facility to send forgotten passwords in e-mail, which would make those words show up in the shared database. Do I really want to send token lists that come from all the love letters I get in e-mail out to some third party? Finally, why would I trust/want to use a service like this, when I built a very good spambayes database with only a modest amount of training effort? Larger databases are not clearly better than smaller databases, and you're basically proposing the largest possible database... Jeff -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20040916/06e090f6/attachment.pgp From ta-meyer at ihug.co.nz Fri Sep 17 08:24:29 2004 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Fri Sep 17 08:24:38 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) Message-ID: If you want something done... ;) I've put together a 1.0 binary release. Apart from the Inno file, this is exactly what's in CVS, and that's actually exactly what 1.0rc2 was anyway, apart from version numbers. However, this is the first time I've built the binary (Mark has in the past), although I've put together the last few source packages. I built it with OL2K, as one is meant to, and have tested sb_server & the plug-in on Win2K/OL2K and WinXP/OL2002. If anyone else has time to quickly test either sb_server or the plug-in, that would be fantastic. 
(A light test should be all that's needed, since it's really just the build/install process that needs to be tested). If all appears to be well (I can test it with one further system) then I'll probably update the website and send out the announce emails next week. Thanks! (If anyone wants to check the source dists, feel free. They were built just like 1.0rc2 and prior, though, and also haven't had any non-version-number changes). =Tony Meyer From pcumming at yahoo.com Fri Sep 17 15:18:20 2004 From: pcumming at yahoo.com (Peter Cumming) Date: Fri Sep 17 15:18:23 2004 Subject: [spambayes-dev] Potential bug? Message-ID: <20040917131820.84310.qmail@web52904.mail.yahoo.com> I installed SpamBayes under Windows XP and Outlook 2003. After I uninstalled it and rebooted, the icons still appeared for SpamBayes in Outlook. Also I could not delete the "Junk E-mail" folder. I even tried looking in the registry, to no avail. Thanks for an update on this. Sincerely Peter From kenny.pitt at gmail.com Fri Sep 17 15:29:10 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Fri Sep 17 15:29:17 2004 Subject: [spambayes-dev] Potential bug? In-Reply-To: <20040917131820.84310.qmail@web52904.mail.yahoo.com> References: <20040917131820.84310.qmail@web52904.mail.yahoo.com> Message-ID: <2a052b9904091706294199cb78@mail.gmail.com> Peter Cumming wrote: > I installed SpamBayes under Windows XP and Outlook > 2003. After I uninstalled it and rebooted, the icons > still appeared for SpamBayes in Outlook. Also I could > not delete the "Junk E-mail" folder. I even tried > looking in the registry, to no avail. Failure to remove the toolbar is a known issue. See FAQ 3.16: http://spambayes.sourceforge.net/faq.html#how-do-i-uninstall-the-plug-in In Outlook 2003, the "Junk E-mail" folder is a special folder created by Outlook. Outlook handles it similarly to how it handles the "Deleted Items" folder, and does not allow it to be deleted. This is not specific to SpamBayes.
-- Kenny Pitt From kenny.pitt at gmail.com Fri Sep 17 15:52:06 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Fri Sep 17 15:52:13 2004 Subject: [spambayes-dev] Suggested Feature? In-Reply-To: <7750f8d04091614167fc396c0@mail.gmail.com> References: <7750f8d04091614167fc396c0@mail.gmail.com> Message-ID: <2a052b9904091706527a5fcbcd@mail.gmail.com> Brown Gabe wrote: > In future versions of SpamBayes, do you think it would be a good idea > to have a central repository that clients could connect incrementally > to get updates on what spam is? I was thinking the server could have > some sort of way to collect and find relationships between all user > data and form rules based on all data from all users. Jeff made a couple of good points in a previous reply. First, part of the power of SpamBayes is that it is personalized to what you consider spam, which may be different from someone else's spam. Second, submitting your good messages to a central service raises a lot of privacy issues. If you are only submitting spam messages, there are already other services that do that. As you said, this would be a huge undertaking, and personally I would rather see us stick to writing software and not hosting large-scale centralized services. I'd prefer to leave that business to people like SpamNet. -- Kenny Pitt From richie at entrian.com Fri Sep 17 22:21:04 2004 From: richie at entrian.com (Richie Hindle) Date: Fri Sep 17 22:21:14 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: References: Message-ID: Hi Tony, > If you want something done... ;) I'd have volunteered but I don't have Outlook. > I don't know whether it's just me, but I can't download that. I go to the URL and a SourceForge download page appears, saying: Your download should begin shortly. If it does not, try http://ovh.dl.sourceforge.net/sourceforge/spambayes/spambayes-1.0.exe or choose a different mirror If I wait, the page redirects back to itself.
If I click that link, it takes me to the same page again. If I choose a different mirror, I get the same behaviour from that mirror. If this is a general problem with SF I'll be happy to host the downloads on entrian.com until it can be sorted out - just email me the files. If it's not a problem for other people, does anyone have a clue as to why this is happening to me? 8-) -- Richie Hindle richie@entrian.com From kenny.pitt at gmail.com Sun Sep 19 19:50:27 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Sun Sep 19 19:50:35 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: References: Message-ID: <2a052b990409191050371ceae1@mail.gmail.com> Richie Hindle wrote: > I don't know whether it's just me, but I can't download that. I go to the > URL and a SourceForge download page appears, saying: > > Your download should begin shortly. If it does not, try > http://ovh.dl.sourceforge.net/sourceforge/spambayes/spambayes-1.0.exe > or choose a different mirror > > If I wait, the page redirects back to itself. If I click that link, it > takes me to the same page again. If I choose a different mirror, I get > the same behaviour from that mirror. > > If this is a general problem with SF I'll be happy to host the downloads > on entrian.com until it can be sorted out - just email me the files. If > it's not a problem for other people, does anyone have a clue as to why > this is happening to me? 8-) It's not just you; I had the exact same result. -- Kenny Pitt From tameyer at ihug.co.nz Mon Sep 20 00:44:46 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Sep 20 00:44:56 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: [Richie] > I don't know whether it's just me, but I can't download that. I go > to the URL and a SourceForge download page appears, saying: > > Your download should begin shortly.
If it does not, try > > http://ovh.dl.sourceforge.net/sourceforge/spambayes/spambayes-1.0.exe > or choose a different mirror > > If I wait, the page redirects back to itself. If I click that link, > it takes me to the same page again. If I choose a different mirror, I > get the same behaviour from that mirror. [Kenny] > It's not just you, I had the exact same result. Odd. It works with umn, but doesn't appear to work with any of the others. I'll play around with this a bit and try and figure out why this is, and update here once it's working properly. Sorry I didn't get to this during the weekend when people may have had more time... (And thanks for testing this much - otherwise there'd be 100 spambayes@python messages about this... ;) =Tony Meyer From tameyer at ihug.co.nz Mon Sep 20 06:24:04 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Sep 20 06:24:15 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: [Tony] >> If you want something done... ;) [Richie] > I'd have volunteered but I don't have Outlook. That's a pretty good excuse, really! :) (In fact, I believe you have to have not just Outlook, but Outlook 2000...) [download problem] Well, I don't know what was wrong, but I think I've fixed it (I removed the file and uploaded it again, and the file isn't hidden, although I don't think it was that). Anyway, it downloads from the 4 or 5 mirrors I just tried, so hopefully will for anyone else, too. =Tony Meyer From tameyer at ihug.co.nz Mon Sep 20 07:47:04 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Sep 20 07:47:10 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: Message-ID: I (and I presume everyone else) have been getting one of these messages for every post to spambayes@python.org for a long time now (well over a month). They're getting a bit annoying, and are probably confusing for first-time posters. Do the mail admin experts here have any advice about what could be done? 
If the only suggestion is removing this address from the subscription list, is that a reasonable thing to do? =Tony Meyer > -----Original Message----- > From: postmaster@fresq.nl [mailto:postmaster@fresq.nl] > Sent: Monday, 20 September 2004 5:42 p.m. > To: tameyer@ihug.co.nz > Subject: Delivery Status Notification (Failure) > > > This is an automatically generated Delivery Status Notification. > > Unable to deliver message to the following recipients, > because the message was forwarded more than the maximum > allowed times. This could indicate a mail loop. > > w.korpershoek@fresq.nl > > > > From richie at entrian.com Mon Sep 20 10:45:52 2004 From: richie at entrian.com (Richie Hindle) Date: Mon Sep 20 10:46:04 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: References: Message-ID: <2s3tk0t0192m602457vjgtou14bjdq8jgk@4ax.com> [Tony] > Well, I don't know what was wrong, but I think I've fixed it Yup - I can download it now. The following refers to spambayes-1.0.exe: I tried installing it on my virgin VMWare Win98 machine, and it exited with the following traceback in the log:
Traceback (most recent call last):
  File "pop3proxy_tray.py", line 100, in ?
  File "sb_server.pyc", line 104, in ?
  File "spambayes\message.pyc", line 103, in ?
  File "spambayes\dbmstorage.pyc", line 3, in ?
  File "spambayes\Options.pyc", line 1214, in ?
  File "spambayes\Options.pyc", line 1183, in load_options
pywintypes.com_error: (-2147467263, 'Not implemented', None, None)
This is because SHGetFolderPath isn't available on Win98 pre-IE5. I don't know whether we care about that - there are probably no such machines in the real world. I uninstalled SB, installed IE5.5, reinstalled SB, and the server started but then silently quit. I restarted it and was able to configure it, but I couldn't connect. Here's the log:
Loading database...
User interface url is http://localhost:8880/
Unhandled exception in thread started by >
Traceback (most recent call last):
  File "pop3proxy_tray.py", line 407, in _ProxyThread
  File "sb_server.pyc", line 892, in start
  File "sb_server.pyc", line 870, in main
  File "spambayes\ProxyUI.pyc", line 154, in __init__
  File "spambayes\UserInterface.pyc", line 269, in __init__
  File "spambayes\UserInterface.pyc", line 137, in __init__
  File "spambayes\UserInterface.pyc", line 255, in readUIResources
  File "spambayes\resources\__init__.pyc", line 30, in ?
  File "resourcepackage\package.pyc", line 100, in scan
WindowsError: [Errno 3] The system cannot find the path specified: 'C:\\PROGRAM FILES\\SPAMBAYES\\lib\\spambayes.zip\\spambayes\\resources/*.*'
Loading database...
User interface url is http://localhost:8880/
Loading database...
Listener on port 110 is proxying pop3.demon.co.uk:110
error: uncaptured python exception, closing channel (exceptions.KeyError:('pop3proxy', 'allow_remote_connections') [asyncore.pyc|read|69] [asyncore.pyc|handle_read_event|384] [spambayes\Dibbler.pyc|handle_accept|284] [sb_server.pyc|__init__|374] [sb_server.pyc|__init__|191] [sb_server.pyc|onIncomingConnection|206] [spambayes\OptionsClass.pyc|__getitem__|604] [spambayes\OptionsClass.pyc|get|601] [spambayes\OptionsClass.pyc|get_option|595])
error: uncaptured python exception, closing channel (exceptions.AttributeError:'_socketobject' object has no attribute 'isClosed' [asyncore.pyc|read|69] [asyncore.pyc|handle_read_event|390] [asynchat.pyc|handle_read|88] [sb_server.pyc|recv|397] [asyncore.pyc|recv|346] [asynchat.pyc|handle_close|155] [sb_server.pyc|close|405] [asyncore.pyc|__getattr__|365])
Unhandled exception in thread started by >
Traceback (most recent call last):
  File "pop3proxy_tray.py", line 407, in _ProxyThread
  File "sb_server.pyc", line 892, in start
  File "sb_server.pyc", line 872, in main
  File "spambayes\Dibbler.pyc", line 701, in run
  File "asyncore.pyc", line 193, in loop
  File "asyncore.pyc", line 119, in poll
  File "asyncore.pyc", line 73, in read
  File "spambayes\Dibbler.pyc", line 204, in handle_error
  File "asyncore.pyc", line 420, in handle_error
  File "sb_server.pyc", line 405, in close
  File "asyncore.pyc", line 365, in __getattr__
AttributeError: '_socketobject' object has no attribute 'isClosed'
That last exception happens whenever I try to connect to the POP3 proxy port, whether it's with my email client or telnet. The only configuration changes I've made are to add pop3.demon.co.uk / 110 to the POP3 proxies. I know I should try to fix these things myself, but I just don't have the time at the moment. -- Richie Hindle richie@entrian.com From kenny.pitt at gmail.com Mon Sep 20 20:34:04 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Mon Sep 20 20:34:10 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: <2s3tk0t0192m602457vjgtou14bjdq8jgk@4ax.com> References: <2s3tk0t0192m602457vjgtou14bjdq8jgk@4ax.com> Message-ID: <2a052b990409201134142984af@mail.gmail.com> Richie Hindle wrote: > I uninstalled SB, installed IE5.5, reinstalled SB, and the server > started but then silently quit. I restarted it and was able to > configure it, but I couldn't connect. Here's the log: > > [snip details] > File "spambayes\resources\__init__.pyc", line 30, in ? > File "resourcepackage\package.pyc", line 100, in scan > WindowsError: [Errno 3] The system cannot find the path specified: 'C:\\PROGRAM FILES\\SPAMBAYES\\lib\\spambayes.zip\\spambayes\\resources/*.*' Just guessing because I haven't had a chance to test this out myself, but this sounds like the binary was built with the resourcepackage scanning still enabled. IIRC, you need to replace __init__.py in spambayes/resources with a 0-length file before you build the release binary to prevent this.
-- Kenny Pitt From kenny.pitt at gmail.com Mon Sep 20 21:50:31 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Mon Sep 20 21:50:40 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: <2a052b990409201134142984af@mail.gmail.com> References: <2s3tk0t0192m602457vjgtou14bjdq8jgk@4ax.com> <2a052b990409201134142984af@mail.gmail.com> Message-ID: <2a052b9904092012503c0e0584@mail.gmail.com> Earlier, I wrote: > Richie Hindle wrote: > > I uninstalled SB, installed IE5.5, reinstalled SB, and the server > > started but then silently quit. I restarted it and was able to > > configure it, but I couldn't connect. Here's the log: > > > > [snip details] > > File "spambayes\resources\__init__.pyc", line 30, in ? > > File "resourcepackage\package.pyc", line 100, in scan > > WindowsError: [Errno 3] The system cannot find the path specified: 'C:\\PROGRAM FILES\\SPAMBAYES\\lib\\spambayes.zip\\spambayes\\resources/*.*' > > Just guessing because I haven't had a chance to test this out myself, > but this sounds like the binary was built with the resourcepackage > scanning still enabled. IIRC, you need to replace __init__.py in > spambayes/resources with a 0-length file before you build the release > binary to prevent this. I compiled a 0-length __init__.py in my source dir and used it to replace the spambayes/resources/__init__.pyc in spambayes.zip. I also deleted the .pyc files for resourcepackage. After this change, sb_server remained running and I was able to access the UI. 
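Kenny's workaround amounts to swapping one member of spambayes.zip. Python's zipfile module cannot overwrite an existing entry in place, so a minimal sketch copies every other entry into a new archive and then replaces the original file. The helper name replace_zip_member is hypothetical, and the replacement bytes are assumed to be the .pyc compiled from an empty __init__.py by the same Python version the binary was built with.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def replace_zip_member(zip_path, member, new_data):
    """Rewrite zip_path so that `member` holds new_data (hypothetical helper).

    zipfile cannot overwrite an existing entry in place, so every other
    entry is copied into a temporary archive that then replaces the original.
    """
    zip_path = Path(zip_path)
    # Create the temporary file next to the original so os.replace() stays
    # on the same filesystem.
    with tempfile.NamedTemporaryFile(delete=False, dir=zip_path.parent,
                                     suffix=".zip") as tmp:
        tmp_path = Path(tmp.name)
    try:
        with zipfile.ZipFile(zip_path) as src, \
             zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                if info.filename != member:
                    # Reuse the ZipInfo so names, dates and compression carry over.
                    dst.writestr(info, src.read(info.filename))
            dst.writestr(member, new_data)
        os.replace(tmp_path, zip_path)
    except BaseException:
        tmp_path.unlink(missing_ok=True)
        raise
```

A call would look like replace_zip_member("lib/spambayes.zip", "spambayes/resources/__init__.pyc", empty_init_pyc), where empty_init_pyc is a hypothetical bytes value read from the compiled empty module. Writing to a temporary file first means a failure part-way through leaves the original archive untouched.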
When I tried to proxy through it to retrieve mail, however, I got the following error in the log file (which looks remarkably similar to the error Richie reported): """ error: uncaptured python exception, closing channel (exceptions.KeyError:('pop3proxy', 'allow_remote_connections') [asyncore.pyc|read|69] [asyncore.pyc|handle_read_event|384] [spambayes\Dibbler.pyc|handle_accept|284] [sb_server.pyc|__init__|374] [sb_server.pyc|__init__|191] [sb_server.pyc|onIncomingConnection|206] [spambayes\OptionsClass.pyc|__getitem__|604] [spambayes\OptionsClass.pyc|get|601] [spambayes\OptionsClass.pyc|get_option|595]) """ -- Kenny Pitt From skip at pobox.com Mon Sep 20 22:51:43 2004 From: skip at pobox.com (Skip Montanaro) Date: Mon Sep 20 22:52:00 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: References: Message-ID: <16719.17119.812896.584917@montanaro.dyndns.org> Tony> I (and I presume everyone else) have been getting one of these Tony> messages for every post to spambayes@python.org for a long time Tony> now (well over a month). They're getting a bit annoying, and are Tony> probably confusing for first-time posters. But for those of us who administer any public mailing lists they are way down in the noise. I get so much other crap I hadn't even noticed these. Anything with a subject containing "delivery status notification" winds up in my errors mailbox, where I routinely delete them after determining it's not a notice that *my* mail server isn't hosed. Tony> Do the mail admin experts here have any advice about what could be Tony> done? If the only suggestion is removing this address from the Tony> subscription list, is that a reasonable thing to do? I went looking to do just that. I didn't see that email address, so assume Tim beat me to it.
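Skip's filter keys on the Subject: line. For comparison, a minimal sketch of a slightly stricter test using Python's standard email package. It assumes the bounce is an RFC 3464 style delivery status notification (sent as multipart/report with report-type=delivery-status) and falls back to the subject match otherwise; the function name looks_like_dsn is hypothetical.

```python
import email


def looks_like_dsn(raw_message):
    """Heuristic: is this raw message a delivery status notification?"""
    msg = email.message_from_string(raw_message)
    # Standard DSNs are multipart/report with report-type=delivery-status.
    report_type = msg.get_param("report-type") or ""
    if (msg.get_content_type() == "multipart/report"
            and str(report_type).lower() == "delivery-status"):
        return True
    # Fallback: the subject-based rule described above.
    subject = (msg.get("Subject") or "").lower()
    return "delivery status notification" in subject
```

The structural check catches bounces whose subjects are localized or reworded, while the subject fallback still catches non-standard notices like the ones discussed in this thread.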
Skip From tameyer at ihug.co.nz Tue Sep 21 00:17:32 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Sep 21 00:17:37 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: Message-ID: [Tony] >> Do the mail admin experts here have any advice about what could be >> done? If the only suggestion is removing this address from the >> subscription list, is that a reasonable thing to do? [Skip] > I went looking to do just that. I didn't see that email > address, so assume Tim beat me to it. Weird. I can't find it either, but the bounces are still arriving. I suppose that means that they are subscribed under some address that forwards to the problem address. Oh well, I guess we live with it. SpamBayes considers these spam for me (I only administer teeny little mailing lists, but still get plenty of bogus bounces), but I still think that it'd confuse first-time spambayes@python.org posters. Thanks for looking into it, though! =Tony Meyer From tameyer at ihug.co.nz Tue Sep 21 00:26:28 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Sep 21 00:26:36 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: > I compiled a 0-length __init__.py in my source dir and used > it to replace the spambayes/resources/__init__.pyc in > spambayes.zip. I also deleted the .pyc files for resourcepackage. Drat. I had forgotten about changing the __init__.py file, but I did uninstall resourcepackage after the first binary I built didn't work. So I guess this means that I managed to upload that one, and not the one I built after that (since the resourcepackage .pyc files couldn't have been in there - and py2exe did say that it couldn't find it). I'll redo it with the proper __init__, though (and I'm writing a "how to build the binary" section for the readme-devel, so this is easier next time!) > After this change, sb_server remained running and I was able > to access the UI.
When I tried to proxy through it to > retrieve mail, however, I got the following error in the log > file (which looks remarkably similar to the error Richie reported): > > """ > error: uncaptured python exception, closing channel > > (exceptions.KeyError:('pop3proxy', > 'allow_remote_connections') [asyncore.pyc|read|69] > [asyncore.pyc|handle_read_event|384] > [spambayes\Dibbler.pyc|handle_accept|284] > [sb_server.pyc|__init__|374] [sb_server.pyc|__init__|191] > [sb_server.pyc|onIncomingConnection|206] > [spambayes\OptionsClass.pyc|__getitem__|604] > [spambayes\OptionsClass.pyc|get|601] > [spambayes\OptionsClass.pyc|get_option|595]) > """ I'll look into this. I only did fairly superficial testing (i.e. the various bits of the ui) since I figured that nothing else would be different from rc2. I'll do better testing this time, too. Thanks again to you & Richie. =Tony Meyer From sethg at GoodmanAssociates.com Tue Sep 21 01:31:26 2004 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Tue Sep 21 01:31:17 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: Message-ID: > From: spambayes-dev-bounces+sethg=goodmanassociates.com@python.org > [mailto:spambayes-dev-bounces+sethg=goodmanassociates.com@python.org]On > Behalf Of Tony Meyer > Sent: Monday, September 20, 2004 5:18 PM <...> > Weird. I can't find it either, but the bounces are still arriving. I > suppose that means that they are subscribed under some address > that forwards to the problem address. Your list uses VERP bounce addresses, so the return-path includes the subscriber address. After one or more forwards, the end user's MTA still bounces to the same address. If the message I am replying to had bounced, the bounce would come back to your MTA with a 2821 envelope of: MAIL FROM:<> RCPT TO: Your list software is supposed to route that bounce to the admin so you can unsubscribe the dead account. 
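The Sender: header quoted at the top of this message, spambayes-dev-bounces+sethg=goodmanassociates.com@python.org, shows the encoding: the subscriber's mailbox and domain are folded into the local part of the list's bounce address. A minimal sketch of encoding and decoding that form in Python, assuming exactly the bounces+mailbox=host@domain layout seen in that header (Mailman's actual format is set by its configuration and may differ; both function names are hypothetical):

```python
def verp_encode(bounces_local, list_domain, subscriber):
    """Fold a subscriber address into a VERP return-path."""
    mailbox, host = subscriber.rsplit("@", 1)
    return "%s+%s=%s@%s" % (bounces_local, mailbox, host, list_domain)


def verp_decode(return_path):
    """Recover the subscriber from a VERP return-path, or None if the
    address carries no encoded recipient (a plain, non-VERP bounce address)."""
    local, _, _domain = return_path.rpartition("@")
    if "+" not in local:
        return None
    # Everything after the first '+' is "mailbox=host"; the host cannot
    # contain '=', so split on the last one.
    _, _, encoded = local.partition("+")
    mailbox, sep, host = encoded.rpartition("=")
    if not sep:
        return None
    return "%s@%s" % (mailbox, host)
```

Because the subscriber can be recovered from the envelope recipient alone, a bounce still identifies the original list member even after it has passed through one or more forwarding hops.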
There is no way that a bounce should get distributed to the list, unless it really was a post, not a DSN. What do the headers look like? A real DSN should come from a 2821 null-sender with a 2822 From: that is typically . While we're on the subject, I noticed that your list also sets the Sender: header to the VERP bounce address. It would be a little nicer to set Sender: to the list owner address, perhaps , so that recipients don't see their own address as part of the originating address displayed by the MUA (see what Outlook shows on the first line of this post). Just my 2 centavos. -- Seth Goodman From sethg at GoodmanAssociates.com Tue Sep 21 01:56:05 2004 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Tue Sep 21 01:55:53 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: Message-ID: > From: spambayes-dev-bounces@python.org > [mailto:spambayes-dev-bounces@python.org]On Behalf Of Seth Goodman > Sent: Monday, September 20, 2004 6:31 PM > To: Tony Meyer; skip@pobox.com > Cc: spambayes-dev@python.org <...> Wouldn't you know it, but the message I replied to is one of very few from your listserve with a VERP return-path (and matching Sender). All the other messages just set the return-path and Sender: to . It almost looks like it wants to set a VERP return-path but some internal system handed it the message with a null-sender, so the resulting VERP address is missing the poster's address. Now that's bizarre. How do only a few of the list messages get VERP return-paths while the rest have plain return-paths? Could it be that your backup MTA does VERP but your primary does not? -- Seth Goodman From tameyer at ihug.co.nz Tue Sep 21 03:54:27 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Sep 21 03:54:40 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: Message-ID: > Your list uses VERP bounce addresses, so the return-path includes > the subscriber address. 
After one or more forwards, the end > user's MTA still bounces to the same address. If the message I > am replying to had bounced, the bounce would come back to your > MTA with a 2821 envelope of: > > MAIL FROM:<> > RCPT TO: > > Your list software is supposed to route that bounce to the admin so > you can unsubscribe the dead account. There is no way that a bounce > should get distributed to the list, unless it really was a post, not > a DSN. These messages aren't going to the list, they're going to anyone that posts to the list (i.e. ta-meyer@ihug.co.nz gets the messages when I post with that address, not all of the spambayes@python.org recipients). I don't know whether mail.python.org is getting bounces or not - I presume not, or mailman would be doing something about it. > What do the headers look like? This (and VERP) was the clue I needed, though (thanks!). The (relevant) headers of the attached (bounced) messages are (for one such message): """ To: "'blakemail at gmx.net'" <"blakemail at gmx.net"@smtp.hispeed.ch>, X-MDRemoteIP: 62.250.3.117 X-MDRcpt-To: werner@wekorpershoek.nl X-MDRedirect: 1 X-MDaemon-Deliver-To: w.korpershoek@fresq.nl """ So this means that the problem address is "blakemail at gmx.net", right? That address *is* subscribed to the list. Could someone that understands this better than me confirm this? If so, then unsubscribing that address is ok, right? (I presume that attempting to email either the fresq.nl or gmx.net address would be futile). =Tony Meyer From tameyer at ihug.co.nz Tue Sep 21 10:20:19 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Sep 21 10:20:25 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: > After this change, sb_server remained running and I was able > to access the UI. 
When I tried to proxy through it to > retrieve mail, however, I got the following error in the log > file (which looks remarkably similar to the error Richie reported): > > """ > error: uncaptured python exception, closing channel > > (exceptions.KeyError:('pop3proxy', > 'allow_remote_connections') [asyncore.pyc|read|69] [...] D'oh! I finally figured out what the problem was here. I had forgotten that the machine I was using to build the binary (which I seldom use, but has the required Outlook 2000) had an installed copy of spambayes from some work I did a while back. py2exe was picking *this* spambayes up, rather than the nice fresh 1.0 copy that I had checked out. So the problem was incompatibility between the scripts (which were the right ones) and the spambayes package. So I've fixed this, built a new binary, and uploaded it. I put this one through more rigorous testing, so it should work, I think. But if you two can spare more time, it would be great to have another test... Thanks! =Tony Meyer From richie at entrian.com Tue Sep 21 11:43:43 2004 From: richie at entrian.com (Richie Hindle) Date: Tue Sep 21 11:42:43 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) Message-ID: [Tony] > the machine I was using to build the binary (which I seldom use, but > has the required Outlook 2000) had an installed copy of spambayes from some > work I did a while back. py2exe was picking *this* spambayes up, rather > than the nice fresh 1.0 copy that I had checked out. That's why I keep a clean machine in VMWare. It's not cheap (even when you're paying in UK pounds 8-) but it pays for itself in saved time by preventing this kind of thing from happening. > But if you two > can spare more time, it would be great to have another test... Will do, this evening. 
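A cheap guard against the stale-copy problem Tony describes is to confirm, before building, that the package Python will resolve actually lives inside the fresh checkout. A minimal sketch, written for Python 3 (the 2004 build ran on Python 2, where imp.find_module would play a similar role). The helper name imported_from is hypothetical, and py2exe itself is not involved:

```python
import importlib
import importlib.util
import os


def imported_from(package_name, checkout_dir):
    """Return True if `package_name` resolves to a file inside
    `checkout_dir` on the current sys.path (a top-level name is looked
    up without being imported)."""
    importlib.invalidate_caches()
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return False  # not found, or a namespace package with no file
    origin = os.path.realpath(spec.origin)
    root = os.path.realpath(checkout_dir)
    try:
        return os.path.commonpath([origin, root]) == root
    except ValueError:
        return False  # e.g. paths on different Windows drives
```

Run from the build script, a check such as imported_from("spambayes", checkout) returning False would have flagged the older installed copy before the binary was built.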
-- Richie Hindle richie@entrian.com From kenny.pitt at gmail.com Tue Sep 21 19:08:52 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Tue Sep 21 19:08:54 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: References: Message-ID: <2a052b9904092110084c233c32@mail.gmail.com> Tony Meyer wrote: > So I've fixed this, built a new binary, and uploaded it. I put this one > through more rigorous testing, so it should work, I think. But if you two > can spare more time, it would be great to have another test... OK, I tried both the Outlook add-in and the POP3 proxy. It wasn't rigorous testing, to be sure, but basic functionality seems to work fine in both. I did notice that the version number in SpamBayes Manager shows as "1.0 (July 2004)". I know the source code hasn't really changed since then so I'm fine with leaving that date on it, but I thought I'd point it out just in case it was an oversight. -- Kenny Pitt From sethg at GoodmanAssociates.com Tue Sep 21 23:30:29 2004 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Tue Sep 21 23:30:34 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: Message-ID: > From: spambayes-dev-bounces@python.org > [mailto:spambayes-dev-bounces@python.org]On Behalf Of Tony Meyer > Sent: Monday, September 20, 2004 8:54 PM <...> > > What do the headers look like? > > This (and VERP) was the clue I needed, though (thanks!). The (relevant) > headers of the attached (bounced) messages are (for one such message): I didn't get an attachment either from the direct email from you or from the list. Try again? Perhaps that gmx account was the culprit, but you can't tell without the rest of it. As an interesting aside, I don't seem to get any such bounce messages when I post, unless I'm failing to notice them in my spam folder. The scenario sounds like when you post to the list, the list distributes the mail and then some broken MTA bounces back to the original poster instead of the list.
Since the return-paths all go to python.org, whether or not they are VERP, you are dealing with a broken MTA. I have heard of some antiquated, broken MTAs that bounce to the 2822 From: address instead of the 2821 MAIL FROM: address. This is the most likely scenario for what you describe, and deleting the user is definitely in order. If you want to be really nice, you could write a note to the postmaster for the bouncing domain and let them know that their bounce practices are non-compliant with RFC821 and 2821 and will abuse virtually every modern mailing list should an account disappear that has active list subscriptions. For a large list, this is a lot of abuse. Make sure to use the domain that sends the bounce, not the domain of the subscribed address, since the latter may be completely innocent in case of forwarding. Unfortunately, five will get you ten that a system this broken doesn't have a working postmaster address, but it's worth a try. I'm still curious why only a few of the list postings get distributed with a VERP address and the rest don't. Any idea on this? If there is no VERP return-path, the list software can't automatically unsubscribe dead accounts, which is a nice feature. -- Seth Goodman From tameyer at ihug.co.nz Wed Sep 22 03:15:38 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Sep 22 03:15:50 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: > OK, I tried both the Outlook add-in and the POP3 proxy. It > wasn't rigorous testing, to be sure, but basic functionality > seems to work fine in both. Great, thanks. > I did notice that the version number in SpamBayes Manager > shows as "1.0 (July 2004)". I know the source code hasn't > really changed since then so I'm fine with leaving that date > on it, but I thought I'd point it out just in case it was an oversight. I noticed that too, when I tested on a second machine. I wondered about changing it and started to, then decided that I shouldn't.
The source hasn't changed at all since then (July was when I put the source releases together - so they show that date, too - and when we had hoped that Mark would find the time to build the binary). It's easy to change if anyone thinks it should be, but I'm uncertain whether it should be (and since I'd also have to rebuild the source, didn't). =Tony Meyer From tameyer at ihug.co.nz Wed Sep 22 08:07:39 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Sep 22 08:07:49 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: [Richie] > This is because SHGetFolderPath isn't available on Win98 pre-IE5. I > don't know whether we care about that - there are probably no such > machines in the real world. I don't care :) Given that this isn't new, I'm guessing that it's not much of an issue. > That's why I keep a clean machine in VMWare. I'm not familiar with VMWare. Is it the stuff at www.vmware.com? > It's not cheap (even when you're paying in UK pounds 8-) > but it pays for itself in saved time by preventing this > kind of thing from happening. I probably can't justify it, sadly, since SpamBayes is really my only use case. It's pretty rare that I create anything else that needs to be used outside of my own systems. I gather you would do the actual build in VMWare, and not just testing? (This is rather off-topic, but I'm curious). =Tony Meyer From tameyer at ihug.co.nz Wed Sep 22 08:12:25 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Sep 22 08:12:34 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: Message-ID: [Tony] >> This (and VERP) was the clue I needed, though (thanks!). The >> (relevant) headers of the attached (bounced) messages are (for one >> such message): [Seth] > I didn't get an attachment either from the direct email from > you or from the list. Try again? Re-reading that, I see it's rather unclear. 
By attached, I meant attached to the bounce messages, not attached to the message I was sending. > was the culprit, but you can't tell without the rest of it. > As an interesting aside, I don't seem to get any such bounce > messages when I post, unless I'm failing to notice them in my > spam folder. Hmm...back to the 'just me' scenario. Does anyone else get these? It's just for any message posted to spambayes@python.org (not this list). > The scenario sounds like when you post to the list, the list > distributes the mail and then some broken MTA bounces back to > the original poster instead of the list. Yes, I agree. > This is the > most likely scenario for what you describe, and deleting the > user is definitely in order. If you want to be really nice, > you could write a note to the postmaster for the bouncing > domain [...] I'm out of time this week, but will do this (both) next week. > I'm still curious why only a few of the list postings get > distributed with a VERP address and the rest don't. Any idea on this? No, I've avoided commenting on that because I'm completely ignorant. I'm only guessing, but I presume that the spambayes lists are set up basically the same as the other mail.python.org lists, so someone like Barry would be the most knowledgeable, probably. I'm guessing that they're not reading this thread and/or are busy at the moment. =Tony Meyer From nas at arctrix.com Wed Sep 22 08:44:41 2004 From: nas at arctrix.com (Neil Schemenauer) Date: Wed Sep 22 08:44:44 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: References: Message-ID: <20040922064441.GA17241@mems-exchange.org> [Seth] > I'm still curious why only a few of the list postings get > distributed with a VERP address and the rest don't. Any idea on this? It is a feature of Mailman. It uses VERP so it can detect dead addresses but doesn't always use it in order to conserve bandwidth. How much bandwidth is saved is open for debate.
Mailman has an option to enable VERP for every delivery. Neil From richie at entrian.com Wed Sep 22 09:07:13 2004 From: richie at entrian.com (Richie Hindle) Date: Wed Sep 22 09:07:18 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: References: Message-ID: [Tony] > I'm not familiar with VMWare. Is it the stuff at www.vmware.com? Yes, specifically http://www.vmware.com/products/desktop/ws_features.html VMWare Workstation is a software emulation of a PC's hardware, so you can install a whole operating system within it to have a machine within a machine. When I run Linux, it's within VMWare running on my Windows machine. Likewise Windows 98 (because my real OS is XP). > I gather you would do the actual build in VMWare, and not just testing? > (This is rather off-topic, but I'm curious). Yes, to make sure I wasn't picking up any cruft from last time. VMWare lets you take a snapshot of the virtual machine, so I have a snapshot taken at the end of installing Windows 98. I've tested the new build and it all works fine for me, except for uploading an Outlook Express mailbox for bulk training - it only trained on one message out of the mailbox. Has anyone else seen this? My other problem is probably not our fault (and probably the result of trying to make my brain work first thing in the morning) but I set up an Outlook Express filter for To: "spam," and it didn't fire, even though SpamBayes was correctly prepending "spam," to the To: header. Am I missing something? The filter was set to work on the Inbox, and to move messages to a "Possible Spam" folder, but they are staying in the Inbox. SpamBayes is working as advertised, but if there's something wrong with Outlook Express that prevents the feature from being useful, it needs a rethink. As I say, it's probably me.
8-) -- Richie Hindle richie@entrian.com From sethg at GoodmanAssociates.com Wed Sep 22 09:09:26 2004 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Wed Sep 22 09:09:23 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: <20040922064441.GA17241@mems-exchange.org> Message-ID: > From: Neil Schemenauer > Sent: Wednesday, September 22, 2004 1:45 AM > To: spambayes-dev@python.org > Subject: Re: [spambayes-dev] RE: Delivery Status Notification (Failure) > > > [Seth] > > I'm still curious why only a few of the list postings get > > distributed with a VERP address and the rest don't. Any idea on this? > > It is a feature of Mailman. It uses VERP so it can detect dead > addresses but doesn't always use it in order to conserve bandwidth. > How much bandwidth is saved is open for debate. Mailman has an > option to enable VERP for every delivery. That's a curious feature, and I agree it is debatable if it saves any significant bandwidth. The other odd behavior of this list is that it sets Sender: to the return-path, even when it uses a VERP address, rather than the address of the list owner. This makes for a pretty unusual display of originating address in the MUA when the return-path is a VERP address. Is that also configurable? -- Seth Goodman From barry at python.org Wed Sep 22 13:53:42 2004 From: barry at python.org (Barry Warsaw) Date: Wed Sep 22 13:53:45 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: References: Message-ID: <1095854021.8013.84.camel@geddy.wooz.org> On Wed, 2004-09-22 at 03:09, Seth Goodman wrote: > > From: Neil Schemenauer > > Sent: Wednesday, September 22, 2004 1:45 AM > > To: spambayes-dev@python.org > > Subject: Re: [spambayes-dev] RE: Delivery Status Notification (Failure) > > > > > > [Seth] > > > I'm still curious why only a few of the list postings get > > > distributed with a VERP address and the rest don't. Any idea on this? > > > > It is a feature of Mailman. 
It uses VERP so it can detect dead > > addresses but doesn't always use it in order to conserve bandwidth. > > How much bandwidth is saved is open for debate. Mailman has an > > option to enable VERP for every delivery. > > That's a curious feature, and I agree it is debatable if it saves any > significant bandwidth. > > The other odd behavior of this list is that it sets Sender: to the > return-path, even when it uses a VERP address, rather than the address of > the list owner. This makes for a pretty unusual display of originating > address in the MUA when the return-path is a VERP address. Is that also > configurable? > > -- > > Seth Goodman > > _______________________________________________ > spambayes-dev mailing list > spambayes-dev@python.org > http://mail.python.org/mailman/listinfo/spambayes-dev -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20040922/8fb2b4ac/attachment.pgp From barry at python.org Wed Sep 22 14:00:35 2004 From: barry at python.org (Barry Warsaw) Date: Wed Sep 22 14:00:38 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: References: Message-ID: <1095854435.8014.92.camel@geddy.wooz.org> I haven't been paying attention to this thread, so I'm probably missing a lot of context, but... On Wed, 2004-09-22 at 03:09, Seth Goodman wrote: > > [Seth] > > > I'm still curious why only a few of the list postings get > > > distributed with a VERP address and the rest don't. Any idea on this? > > > > It is a feature of Mailman. It uses VERP so it can detect dead > > addresses but doesn't always use it in order to conserve bandwidth. > > How much bandwidth is saved is open for debate. Mailman has an > > option to enable VERP for every delivery. 
> > That's a curious feature, and I agree it is debatable if it saves any > significant bandwidth. Mailman will use VERP in a number of places, depending on how it's configured. Some of the configuration is up to the site admin and some is up to the list admin. In general, Mailman can VERP every N posts, if personalization is turned off. This is used mostly for more accurate list membership pruning. If personalization is turned on, Mailman VERPs every message, since it's crafting a unique copy for every list recipient anyway. For the lists on python.org, we allow list owners to personalize their lists, although it doesn't appear that spambayes-dev is personalized. You can tell by the envelope sender (often copied into an RFC 2822 header by the terminal MTA -- but not always). Also, list owners who turn on personalization usually modify the footers to include links directly to a user's options page, for ease of unsubscription and such. > The other odd behavior of this list is that it sets Sender: to the > return-path, even when it uses a VERP address, rather than the address of > the list owner. This makes for a pretty unusual display of originating > address in the MUA when the return-path is a VERP address. Is that also > configurable? Not without hacking the source. Really, an MUA should never display Sender, although there is a very popular he-who-should-not-be-named MUA that does display this as the From address. Why, I have no clue because it's pretty obvious from the RFCs that this isn't intended to be a displayed header. I'm not sure how much more we should discuss Mailman on this list, but email and I'll try to answer any other questions you might have. -Barry -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20040922/e283eae0/attachment.pgp From barry at python.org Wed Sep 22 14:01:22 2004 From: barry at python.org (Barry Warsaw) Date: Wed Sep 22 14:01:24 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: <1095854021.8013.84.camel@geddy.wooz.org> References: <1095854021.8013.84.camel@geddy.wooz.org> Message-ID: <1095854482.8014.94.camel@geddy.wooz.org> On Wed, 2004-09-22 at 07:53, Barry Warsaw wrote nothing. Sorry for that empty reply, and this totally content free follow up. ;) -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20040922/07441f6e/attachment.pgp From kennypitt at hotmail.com Wed Sep 22 16:02:46 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Sep 22 16:04:04 2004 Subject: [spambayes-dev] RE: Delivery Status Notification (Failure) In-Reply-To: Message-ID: Tony Meyer wrote: > Hmm...back to the 'just me' scenario. Does anyone else get these? > It's just for any message posted to spambayes@python.org (not this > list). Yep, I get 'em every time I reply to a question on spambayes@python.org. -- Kenny Pitt From kennypitt at hotmail.com Wed Sep 22 16:32:37 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Sep 22 16:33:32 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: Tony Meyer wrote: > [Richie] >> That's why I keep a clean machine in VMWare. > > I'm not familiar with VMWare. Is it the stuff at www.vmware.com? > >> It's not cheap (even when you're paying in UK pounds 8-) but it pays >> for itself in saved time by preventing this kind of thing from >> happening. 
> > I probably can't justify it, sadly, since SpamBayes is really my only > use case. It's pretty rare that I create anything else that needs to > be used outside of my own systems. I gather that a lot of your non-SpamBayes work is also in Python (wish I had that luxury!) so you may not have need for an MSDN subscription either. But if you do, the MSDN Universal subscription includes a product called Virtual PC that is very similar to VMWare. They each have their pros and cons, but pretty much the same basic functionality. -- Kenny Pitt From kennypitt at hotmail.com Wed Sep 22 23:06:55 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Sep 22 23:08:04 2004 Subject: [spambayes-dev] RE: [SpamBayes] Thunderbird Extension? In-Reply-To: Message-ID: Gerrans, Matt wrote: > I'm wondering if anyone has started a little SpamBayes Thunderbird > Extension project yet. If so, let me know -- I'd like to help! I was about to refer you to this feature tracker: http://sourceforge.net/tracker/?func=detail&atid=498106&aid=793478&group_id=61702 However, I see your name on the last comment so I guess you already found it. I've not seen any additional discussion of this here on the developer list beyond what is already in the tracker. If you have interest in working on this, by all means please do! All contributions are welcome. Once you get a decent start on it, you could update the tracker to let people know about your efforts and tell them how to contact you if they want to contribute. We'll be glad to provide you with as much assistance as we can over on the spambayes-dev@python.org list if you have questions about the implementation of SpamBayes. I'm sure you can probably find some Thunderbird discussion lists somewhere, as well, where you can get help with Thunderbird Extension issues.
-- Kenny Pitt From richie at entrian.com Thu Sep 23 23:02:34 2004 From: richie at entrian.com (Richie Hindle) Date: Thu Sep 23 23:02:41 2004 Subject: [spambayes-dev] Anti-spam measures for the SpamBayes Wiki Message-ID: Hello all, Because the SpamBayes Wiki has been repeatedly defaced by spammers, it's now necessary to create an account and log in before editing any content. Hopefully the spammers are using automated tools that aren't bright enough to create accounts. I'm sorry for any inconvenience this causes, but it's a quick and painless process. Many thanks to everyone who's been quietly keeping the Wiki spam-free, especially Skip and Seth. The latest version of MoinMoin has some anti-spam measures built-in, and also makes it easier to revert malicious edits. When I get the chance, I'll upgrade to that version and we'll see whether relaxing this new policy is possible (note to ThomasWaldmann if he's listening - thanks for the heads-up!) -- Richie Hindle richie@entrian.com From ta-meyer at ihug.co.nz Tue Sep 28 09:46:28 2004 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Sep 28 09:46:38 2004 Subject: [spambayes-dev] ANNOUNCE: SpamBayes release 1.0 Message-ID: The SpamBayes team is pleased to announce the 1.0 release of SpamBayes. As is now usual, this is both a release of the source code and of an installation program for all Microsoft Windows users. The Windows installation program will install either the Outlook add-in (for Microsoft Outlook users), or the SpamBayes server program (for all other POP3 mail client users, including Microsoft Outlook Express). All Windows users (including existing users of the Outlook add-in) are encouraged to use the installation program. If you wish to use the source-code version, you will also need to install Python - see README.txt in the source tree for more information. This release includes no changes from the successful (but now rather dated) 1.0rc2 release. 
However, we still highly recommend that existing users upgrade to the final version. Work has already begun towards the first 1.1 release, and we expect to release a (bug fix only) 1.0.1 release around the same time as 1.1a1. September 2004 is SpamBayes' 2nd birthday, and (as many users know) we have gone through a very long release process, including 8 alpha releases, a beta, and two release candidates, all tested by a large number of users. As such, we are very confident that this 1.0 release is stable and suitable for regular use. We do welcome any and all contributions for improvements, of course! Get it via the 'Download' page at http://spambayes.org/download.html Enjoy the new release and your spam-free mailbox :-) Thanks to everyone involved in this release, particularly Richie Hindle and Kenny Pitt! Tony. (on behalf of the SpamBayes team) --- What is SpamBayes? --- The SpamBayes project is working on developing a Bayesian (of sorts) anti-spam filter (in Python), initially based on the work of Paul Graham. The major difference between this and other, similar, projects is the emphasis on testing newer approaches to scoring messages. The project includes a number of different applications, all using the same core code, ranging from a plug-in for Microsoft Outlook, to a POP3 proxy, to various command-line tools. From tameyer at ihug.co.nz Tue Sep 28 10:12:05 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Sep 28 10:40:02 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: [As an aside, given the positive response from Richie, Kenny and me, I've gone ahead and done the release. I'm around tomorrow & Thursday if there needs to be a very quick 1.0.1. BTW, SpamBayes' 2nd birthday quietly passed three weeks ago! Here's hoping that the next stable release doesn't take another 2 years.] [VMWare description snipped] Thanks for that.
If I had more use for it, then it sounds like it could be quite useful (or something like VirtualPC, as Kenny mentioned, although I'm not sure how well Microsoft is treating it since they acquired it). For the moment, I'll have to make do (unless I can convince work that something like that might be useful ;) > I've tested the new build and it all works fine for me, except for > uploading an Outlook Express mailbox for bulk training - it > only trained on one message out of the mailbox. Has anyone else seen > this? I've done very little with the OE mailbox stuff. I tested it here now and got an interesting result. With the first folder I chose it only trained one message, but when I opened OE to take a look at how many there should have been (I was pretty sure it should be more) I realised that I had chosen an IMAP folder (which it wouldn't have been able to connect to) so maybe an error occurred when trying to train. If I just used my regular inbox, then it trained on all 11 messages that were there. I wonder if maybe when something goes wrong it only trains one and then silently fails? Was the dbx you used odd in any way? > My other problem is probably not our fault (and probably the result of > trying to make my brain work first thing in the morning) but > I set up an Outlook Express filter for To: "spam," and it didn't fire, > even though SpamBayes was correctly prepending "spam," to the To: header. > Am I missing something? The filter was set to work on the Inbox, > and to move messages to a "Possible Spam" folder, but they are staying in > the Inbox. This is something I had never noticed before (I very seldom actually use OE for anything but testing). It seems that although we add "unsure," (etc) to the To list, OE strips the comma off the end. If you look in the preview pane, it has "unsure; ta-meyer@ihug.co.nz" (etc), with no comma. If you change the rule to just look for "spam" it'll work (but I think it will also fire for spambayes@python.org, too).
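The comma behaviour described above is consistent with the RFC 2822 address-list grammar, where an unquoted comma in To: separates one address from the next. As a rough illustration only (this is not SpamBayes code and not how OE parses headers internally), Python's standard `email.utils.getaddresses` splits a notated header the same way. The helper `naive_rule_matches` is a hypothetical stand-in for an OE "To: contains" rule, showing why a rule on "spam" alone also fires for spambayes@python.org.

```python
from email.utils import getaddresses


def parse_to_header(value):
    """Split a To: header into (display name, address) pairs using the
    RFC 2822 address-list rules implemented by the standard library."""
    return getaddresses([value])


def naive_rule_matches(header_value, needle):
    """Hypothetical stand-in for an OE 'To: contains ...' rule:
    a plain case-insensitive substring test."""
    return needle.lower() in header_value.lower()


# For illustration: the classification and a comma prepended to To:.
notated = "spam, user@example.com"

# The comma is consumed as the separator between two entries, so the
# bare word "spam" ends up as an entry of its own with no comma attached.
print(parse_to_header(notated))
# [('', 'spam'), ('', 'user@example.com')]

# Matching on "spam" alone works, but also fires on unrelated addresses.
print(naive_rule_matches("spambayes@python.org", "spam"))
# True
```

Once the comma is treated as a separator, a rule that includes it can never match, which is why only the comma-less rule fires.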
The comma definitely stays if you notate the subject, rather than the To: header. This is a documentation error, then, I guess. The next release will include the fix that lets you change the classification words and still use the notate options, so people can use "fibbledegook-spam" and avoid this. I'm not sure what else we can do. =Tony Meyer From kenny.pitt at gmail.com Tue Sep 28 16:24:08 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Tue Sep 28 16:24:17 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: References: Message-ID: <2a052b990409280724422a43d6@mail.gmail.com> Tony Meyer wrote: > This is something I had never noticed before (I very seldom actually use OE > for anything but testing). It seems that although we add "unsure," (etc) to > the To list, OE strips the comma off the end. If you look in the preview > pane, it has "unsure; ta-meyer@ihug.co.nz" (etc), with no comma. If you > change the rule to just look for "spam" it'll work (but I think it will also > fire for spambayes@python.org, too). I don't use Outlook Express either so I'm just guessing, but from the looks of it Outlook Express is treating the comma as a separator between two e-mail addresses. This would be consistent with the address-list definition in RFC 2822. We may need to consider some changes to the format of the information we add. The spec seems to indicate that the { and } characters are legal in an e-mail address, but I've rarely if ever seen them used. Maybe something like "{spam}original@address" instead of the comma-separator? It would obviously require people to modify their filter rules, but it doesn't appear that rules for the current format would work correctly anyway. -- Kenny Pitt From sethg at GoodmanAssociates.com Tue Sep 28 17:50:48 2004 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Tue Sep 28 17:50:44 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) 
In-Reply-To: <2a052b990409280724422a43d6@mail.gmail.com> Message-ID: > From: Kenny Pitt > Sent: Tuesday, September 28, 2004 9:24 AM <...> > We may need to consider some changes to the format of the information > we add. The spec seems to indicate that the { and } characters are > legal in an e-mail address, but I've rarely if ever seen them used. > Maybe something like "{spam}original@address" instead of the > comma-separator? It would obviously require people to modify their > filter rules, but it doesn't appear that rules for the current format > would work correctly anyway. I don't use OE for anything but newsreading, but it seems that putting 'spam' or 'unsure' in the To: header, where one expects an address, is pretty odd. As far as legal address characters go, there are different rules for the domain part and the local part, and I doubt that MS paid any attention to those, anyway. There are no rules for a string like 'Spam', which contains neither a FQDN nor an "@" and is therefore not a legitimate address. Isn't it possible to modify the Subject: header and write a rule to filter on that? The only limitations there are 7-bit US-ASCII with the exception of the control codes, and 0x7F. Suitable strings for the Subject: header might be '{Spam} ' and '{Unsure} ', as the curly braces would rarely be seen in real mail subjects. -- Seth Goodman From kenny.pitt at gmail.com Tue Sep 28 18:09:03 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Tue Sep 28 18:09:05 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: References: <2a052b990409280724422a43d6@mail.gmail.com> Message-ID: <2a052b99040928090950e95fc2@mail.gmail.com> Seth Goodman wrote: > I don't use OE for anything but newsreading, but it seems that putting > 'spam' or 'unsure' in the To: header, where one expects an address, is > pretty odd. Well, the option is there and people are using it so I thought we should at least make it work correctly.
I never cared for it, either, because it plays havoc with the ability to reply to the message if it was in fact legitimate, and I would have no reservations about simplifying the configuration and eliminating this option. On the other hand, users sometimes get unhappy when a feature that they are used to just disappears, and you don't have to turn it on if you don't need or want it. > Isn't it possible to modify the Subject: header and write a rule to filter > on that? The only limitations there are 7-bit US-ASCII with the exception > of the control codes, and 0x7F. Suitable strings for the Subject: > header might be '{Spam} ' and '{Unsure} ', as the curly braces would rarely > be seen in real mail subjects. There is a notate_subject option that modifies the Subject header instead of the To header, so users have a choice. Currently, the subject is also modified by prepending the classification and a comma ("spam," or "unsure,"). Bracketing the classification in [] or {} is probably the more common approach in other filters, but the end result is the same. -- Kenny Pitt From tameyer at ihug.co.nz Wed Sep 29 06:27:56 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Sep 29 06:30:54 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: > Well, the option is there and people are using it so I > thought we should at least make it work correctly. I > never cared for it, either, because it plays havoc with the > ability to reply to the message if it was in fact legitimate, > and I would have no reservations about simplifying the > configuration and eliminating this option. There is no good solution here, because OE is so limited. IIRC, you can filter on From:, To:, Subject: and the body, so we have to use one of those or the notation is useless. To: and From: are arguably less intrusive than Subject:, so I like that we offer two options. 
I doubt many people notate ham, so as long as there are no false positives (a reasonable assumption), and few unsure hams (likely for many people I think), then changing the To: doesn't hurt. I wouldn't have a problem with changing it so that it matched whatever the RFC says (maybe classification@spambayes or something like that which is an invalid address, but valid formatting?). =Tony Meyer From tameyer at ihug.co.nz Wed Sep 29 06:32:52 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Sep 29 06:33:02 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: > We may need to consider some changes to the format of the > information we add. The spec seems to indicate that the { and > } characters are legal in an e-mail address, but I've rarely > if ever seen them used. Maybe something like > "{spam}original@address" instead of the comma-separator? I would prefer to keep them separated, so that if you do reply, you just delete an invalid address, rather than have to modify one. (Or if you just send it anyway, at least the message gets through, as well as you getting a bounce). > It would obviously require people to modify their filter rules, This would be ok with 1.1, but not with 1.0.1 (i.e. the 1.0_release_branch branch). > but it doesn't appear that rules for the current format would > work correctly anyway. Well, it works. It doesn't deal well with mail that also has "spam" in the To: header, but then notating the subject doesn't work with mail that has "spam," in the subject, either. It's more a limitation than a bug, IMO. I think that this won't be so important once there's a release that fixes the bug that stops people using a different trio of classification terms. Then people can just select something that they know they'll never see, and the rest of us with decent mailers can just get on with things as normal. (If we were to change anything, I would replace the comma with a space.
Then people could change the classification names to "[spam]", "{spam}", "{[(jjuunnkk)]}" or whatever, as they pleased, and it would look relatively neat). =Tony Meyer From sethg at GoodmanAssociates.com Wed Sep 29 08:43:01 2004 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Wed Sep 29 08:43:05 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: > From: Tony Meyer [mailto:tameyer@ihug.co.nz] > Sent: Tuesday, September 28, 2004 11:28 PM <...> > There is no good solution here, because OE is so limited. IIRC, you can > filter on From:, To:, Subject: and the body, so we have to use > one of those or the notation is useless. To: and From: are arguably less > intrusive than Subject:, so I like that we offer two options. I doubt > many people notate ham, so as long as there are no false positives (a > reasonable assumption), and few unsure hams (likely for many people I > think), then changing the To: doesn't hurt. Looking at OE6, the filter conditions are: From:, To:, cc:, To: or cc:, Subject:, body, priority, size, attachment, secure. The filter actions are similarly anemic: delete, highlight w/color, flag, mark as read, mark as watched or ignored, mark for download. > I wouldn't have a problem with changing it so that it matched whatever the > RFC says (maybe classification@spambayes or something like that > which is an invalid address, but valid formatting?). If you want that, the .invalid TLD is reserved in the RFC's as, well, invalid. I'm not sure exactly what it buys you, but classification@spambayes.invalid is by definition not resolvable. If you created an address book entry for each classification 'address', I think OE would display the name instead of the address. That might be a reasonable workaround for people stuck with OE.
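A minimal sketch of the .invalid approach, assuming a hypothetical `notate_to()` helper (this is not the actual SpamBayes implementation). The classification is added as its own well-formed address under the .invalid TLD, which RFC 2606 reserves so that it can never resolve. Because the marker is a complete, separate address, a reply only needs that one entry deleted, and an OE rule looking for "spam@spambayes.invalid" will not fire on spambayes@python.org.

```python
from email.utils import formataddr, getaddresses


def notate_to(to_header, classification, domain="spambayes.invalid"):
    """Hypothetical helper: prepend the classification to a To: header as
    a separate, well-formed address in the reserved .invalid TLD."""
    marker = ("", f"{classification}@{domain}")
    entries = [marker] + getaddresses([to_header])
    # formataddr renders each (name, address) pair; a pair with an
    # empty name is rendered as the bare address.
    return ", ".join(formataddr(pair) for pair in entries)


notated = notate_to("John Smith <user@example.com>", "spam")
print(notated)
# spam@spambayes.invalid, John Smith <user@example.com>

# A rule matching the full marker does not match other addresses.
print("spam@spambayes.invalid" in "spambayes@python.org")
# False
```

Re-parsing the notated header yields the marker as a distinct entry, so existing recipients are left untouched.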
-- Seth Goodman From pcm at cisco.com Wed Sep 29 13:28:17 2004 From: pcm at cisco.com (Paul Michali) Date: Wed Sep 29 13:28:23 2004 Subject: [spambayes-dev] Re: [Spambayes] ANNOUNCE: SpamBayes release 1.0 In-Reply-To: References: Message-ID: <415A9C51.4060009@cisco.com> Great tool Tony! Thanks for all the effort on this! For the IMAP version, are there any plans to add a retry mechanism to the server login? I had to hack the version I have so that, if the IMAP login fails, it will just skip and, the next time around, try to log in again to read mail. Previously, it would work for a few hours and then one of the login attempts would fail and spambayes would exit. Also, does V1.0 have the ability to specify the folder to move HAM into? I had also hacked my version so that SPAM is placed in one folder and HAM in another. Then, instead of reading my inbox, which holds mail that has yet to be processed, I just read what is in the HAM folder. Thanks again for such a useful program! PCM @ WORK (Paul Michali) From kenny.pitt at gmail.com Wed Sep 29 16:00:02 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Wed Sep 29 16:00:19 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: References: Message-ID: <2a052b990409290700855bbc7@mail.gmail.com> Tony Meyer wrote: > I would prefer to keep them separated, so that if you do reply, you just > delete an invalid address, rather than have to modify one. (Or if you just > send it anyway, at least the message gets through, as well as you getting a > bounce). That makes sense. I didn't realize that having the classification appear as a separate address was the original intent. > (If we were to change anything, I would replace the comma with a space. > Then people could change the classification names to "[spam]", "{spam}", > "{[(jjuunnkk)]}" or whatever, as they pleased, and it would look relatively > neat).
Another possibility would be to provide configuration options for both a prefix and a suffix for the subject modification. This is what POPFile does. You could define the prefix as "[" and the suffix as "] " (that's "]" followed by a space, in case it wraps badly) and your modified subject would then look like "[classification] original subject". -- Kenny Pitt From tameyer at ihug.co.nz Thu Sep 30 04:08:53 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Sep 30 04:09:54 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: > That makes sense. I didn't realize that having the > classification appear as a separate address was the original intent. I'm not sure what the original intent was, but I think this works ok. > Another possibility would be to provide configuration options > for both a prefix and a suffix for the subject modification. > This is what POPFile does. You could define the prefix as "[" > and the suffix as "] " (that's "]" followed by a space, in case it wraps > badly) and your modified subject would then look like > "[classification] original subject". I'm loath to add even more options, particularly for little things like this. What I would like is to get around to finishing up the autoconfigure script, with a nice little wizard (like the Outlook one), so that OE users can use that rather than have to change lots of settings all over the place. The wizard could then default to setting the classification names to "[spam]" etc, and let the user change them if desired. That could all be done with the same 3 options we have now. (I do like the idea of moving the space, or comma, into the classification name, so that we actually just add the classification and no separator for the notate_subject option. For notate_to, the classification name and whatever the RFC-correct address separator is would be nice, IMO).
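The two subject-notation ideas above can be compared side by side. This is a sketch only: the function and parameter names are hypothetical and are not actual SpamBayes or POPFile options. The first function takes a separate prefix and suffix, as Kenny describes POPFile doing; the second folds the delimiters into the classification label itself, as Tony suggests, so the same result needs no extra settings.

```python
def notate_with_delimiters(subject, classification, prefix="[", suffix="] "):
    """Hypothetical: separate prefix and suffix settings wrapped around
    the classification name."""
    return f"{prefix}{classification}{suffix}{subject}"


def notate_with_label(subject, label):
    """Hypothetical: the separator is part of the classification label
    itself (e.g. "[spam] " or "spam,"), so no extra options are needed."""
    return f"{label}{subject}"


print(notate_with_delimiters("Meeting tomorrow", "spam"))
# [spam] Meeting tomorrow
print(notate_with_label("Meeting tomorrow", "[spam] "))
# [spam] Meeting tomorrow
print(notate_with_label("Meeting tomorrow", "{unsure} "))
# {unsure} Meeting tomorrow
```

With the second approach a label such as "{[(jjuunnkk)]} " works without any new options; the trade-off is that users edit one string per classification rather than two shared delimiter settings.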
=Tony Meyer From tameyer at ihug.co.nz Thu Sep 30 04:17:14 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Sep 30 04:17:22 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: Message-ID: > Looking at OE6, the filter conditions are: > From:, To:, cc:, To: or cc:, Subject:, body, priority, size, attachment, secure. > The filter actions are similarly anemic: > delete, highlight w/color, flag, mark as read, mark as watched or ignored, mark for download. And also (with the OE6 I have, at least): Move, Copy, Forward, Reply with message, Stop processing more rules, Do not download, Delete from server. Most people will just want Move (and maybe "stop processing"), I suspect. I don't see how we could use: secure, attachment, size, or priority (unless we stole priority for only our own uses), and body is ugly, so that does leave us with subject or addresses. > If you want that, the .invalid TLD is reserved in the > RFC's as, well, invalid. I'm not sure exactly what it > buys you, It would be nice (in this context at least) if OE recognised it and offered (with a "do not ask me again" box) to ignore the address. Sadly, you appear to just get a regular bounce from the mailer (well, my mailer, anyway). > but classification@spambayes.invalid is by definition > not resolvable. I think this would be a reasonable method of notating the To: header. Given that Subject: and To: aren't (so we find) identical already, and treating them identically doesn't make much sense, this seems a reasonable solution. Anyone mind if I change (for 1.1) notate_to to do exactly this? =Tony Meyer From tameyer at ihug.co.nz Thu Sep 30 04:46:43 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Sep 30 04:46:52 2004 Subject: [spambayes-dev] crash from sb_imapfilter.py In-Reply-To: Message-ID: > I received a couple of junk mails that caused my sb_imapfilter.py to > crash.
The problem was that the messages contained a space > inside the brackets of a Message-ID (even though that's probably > illegal). This caused the "id" of the message to contain a space, > and this gave difficulty in the IMAP SEARCH command in the Save method. > > I see two possible solutions: > - strip() the message-id before it is assigned to the "id" field, or > - properly quote the search command in the Save method. > > I think it's probably a good idea to do the latter anyway, so > here is a patch. I can check this in if there is agreement. Have you checked this in? (I don't recall seeing it). I was away when you posted the message, and I don't think there's anyone else here that cares about imapfilter <0.5 wink>, so you probably got no response. Anyway +1 from me for checking it in. =Tony Meyer From valkyrie at valleycity.net Thu Sep 30 07:01:01 2004 From: valkyrie at valleycity.net (Valkyrie Publications) Date: Thu Sep 30 05:01:53 2004 Subject: [spambayes-dev] Potential bug? Message-ID: I had the same problem - the reason I was uninstalling it in the first place was that outlook quit working entirely after I installed it. Since the folders and toolbars didn't disappear, I thought "what the heck" and installed it again. I learned something in the process: you MUST restart the computer after installing spambayes and before opening outlook. If you don't, outlook will not work, no matter how many times you restart!!! It would be good to tell people this beforehand - and also let them know beforehand that it works best with equal amounts of spam and ham (before they train it with 1500 pieces of ham and 15 spams, as I did :-p -- the spam is gradually catching up). Now spambayes is working so well on that computer that I am putting it on my other one. Found this forum because I gave up on the mirror sites for spambayes-1.0.zip - none of them had the file - so I looked for it on Google. I am now downloading the .exe version, which takes a while. Thanks!
Shirley Starke _____________________________________ Valkyrie Publications www.valkyriepub.com From kenny.pitt at gmail.com Thu Sep 30 16:09:50 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Thu Sep 30 16:09:53 2004 Subject: [spambayes-dev] 1.0 Build Testing (please!) In-Reply-To: References: Message-ID: <2a052b99040930070930337d66@mail.gmail.com> Tony Meyer wrote: > > but classification@spambayes.invalid is by definition > > not resolvable. > > I think this would be a reasonable method of notating the To: header. Given > that Subject: and To: aren't (so we find) identical already, and treating > them identically doesn't make much sense, this seems a reasonable solution. +1 here. -- Kenny Pitt