From skip at pobox.com Fri Jul 6 13:22:57 2007 From: skip at pobox.com (skip at pobox.com) Date: Fri, 6 Jul 2007 06:22:57 -0500 Subject: [spambayes-dev] 1.1b1? Message-ID: <18062.9745.201378.379989@montanaro.dyndns.org> Anyone think it's about time for a beta release? I'll be out of town this weekend, but will try to put one together when I return. Maybe a release schedule like this: 1.1b1 July 11 1.1b2 July 25 (if necessary) 1.1rc1 August 1 1.1final August 8 ? Given that the bulk of the users are Outlook users I think we should try and recruit a few extra people to put it through its paces. In particular, getting some Outlook 2007 and Vista users to use it would be good I think. That means Mark's or Tony's availability to create Outlook installers is key to all these dates. Ideally, I'd like to get the new core_server.py and sb_server.py merged back together, but that will require extracting the POP3 bits from sb_server.py into a plugin for core_server.py then renaming core_server.py to sb_server.py. I'm not sure I'll have the time to do the POP3 plugin, so that will probably have to wait until 1.2 if we are to ever get a 1.1 release out. Skip From dave at boost-consulting.com Fri Jul 6 17:03:59 2007 From: dave at boost-consulting.com (David Abrahams) Date: Fri, 06 Jul 2007 11:03:59 -0400 Subject: [spambayes-dev] spoof detector Message-ID: <87fy411sm8.fsf@grogan.peloton> Something that comes up over and over in spam is a link of the form: http://url/of/some/legit/site Does SpamBayes have a token that represents that information and an option I can set that will use it? 
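The kind of link in question (as spelled out later in this thread: the link's visible text is itself a URL, but the link points somewhere else) could in principle be turned into a synthetic token by parsing the HTML. What follows is only a minimal sketch under stated assumptions: `LinkTextCollector`, `spoof_tokens` and the token string `spoofed-link` are hypothetical names, not part of SpamBayes, and "pointing to the same place" is simplified to "same host name".

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkTextCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _host(url):
    """Return the lower-cased host of a URL, or '' if it has none."""
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""


def spoof_tokens(html):
    """Yield one hypothetical 'spoofed-link' token per mismatching link."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    for href, text in parser.links:
        # Only links whose visible text is itself a URL are of interest.
        if text.lower().startswith(("http://", "https://")):
            shown, actual = _host(text), _host(href)
            if shown and actual and shown != actual:
                yield "spoofed-link"
```

Comparing only host names keeps the sketch short; it would not flag a link whose text and target share a host but differ in path.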
-- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com From skip at pobox.com Fri Jul 6 17:46:50 2007 From: skip at pobox.com (skip at pobox.com) Date: Fri, 6 Jul 2007 10:46:50 -0500 Subject: [spambayes-dev] spoof detector In-Reply-To: <87fy411sm8.fsf@grogan.peloton> References: <87fy411sm8.fsf@grogan.peloton> Message-ID: <18062.25578.308007.229411@montanaro.dyndns.org> David> Something that comes up over and over in spam is a link of the David> form: David> David> http://url/of/some/legit/site David> David> Does SpamBayes have a token that represents that information and David> an option I can set that will use it? The SpamBayes tokenizer essentially splits the message at word boundaries, so the two urls are considered separately. Their physical and structural proximity is not noted. Synthetic tokens based on hostname or IP address in the urls will be generated if you add x-pick_apart_urls:True to the Tokenizer section of your config file. 
For completeness here is my current set of tokenizer settings (haven't
changed them in a long while):

[Tokenizer]
record_header_absence:True
summarize_email_prefixes:True
summarize_email_suffixes:True
mine_received_headers:True
x-pick_apart_urls:True
x-fancy_url_recognition:False
x-lookup_ip:True
lookup_ip_cache:~/tmp/dnscache.pck
x-image_size:True
x-crack_images:True
x-ocr_engine:gocr
max_image_size:100000
crack_image_cache:~/tmp/imagecache.pck

Skip

From dave at boost-consulting.com Fri Jul 6 21:20:27 2007
From: dave at boost-consulting.com (David Abrahams)
Date: Fri, 06 Jul 2007 15:20:27 -0400
Subject: [spambayes-dev] spoof detector
In-Reply-To: <18062.25578.308007.229411@montanaro.dyndns.org> (skip@pobox.com's message of "Fri\, 6 Jul 2007 10\:46\:50 -0500")
References: <87fy411sm8.fsf@grogan.peloton> <18062.25578.308007.229411@montanaro.dyndns.org>
Message-ID: <87644xz6dg.fsf@grogan.peloton>

on Fri Jul 06 2007, skip-AT-pobox.com wrote:

> David> Something that comes up over and over in spam is a link of the
> David> form:
>
> David>
> David> http://url/of/some/legit/site
> David>
>
> David> Does SpamBayes have a token that represents that information and
> David> an option I can set that will use it?
>
> The SpamBayes tokenizer essentially splits the message at word boundaries,
> so the two urls are considered separately.

Yeah, I know that's the default behavior.

> Their physical and structural proximity is not noted. Synthetic
> tokens based on hostname or IP address in the urls will be generated
> if you add x-pick_apart_urls:True to the Tokenizer section of your
> config file.
For completeness here is my current set of tokenizer > settings (haven't changed them in a long while): > > [Tokenizer] > record_header_absence:True > summarize_email_prefixes:True > summarize_email_suffixes:True > mine_received_headers:True > x-pick_apart_urls:True > x-fancy_url_recognition:False > x-lookup_ip:True > lookup_ip_cache:~/tmp/dnscache.pck > x-image_size:True > x-crack_images:True > x-ocr_engine:gocr > max_image_size:100000 > crack_image_cache:~/tmp/imagecache.pck That doesn't sound like it's doing what I'm asking about. I want a special token that is generated each time a link's text is just a URL and the link and the URL text don't point to the same place. Messages with this property are always spam and account for a large percentage of my unsures. No matter how much I train on them, they keep falling into unsure, so I thought if Spambayes could actually recognize their distinguishing feature I could easily train it to consider them spam. >From what you say above it looks like pick_apart_urls will generate tokens describing different parts of a given URL, but will do nothing to help capture this particular spammy relationship between enclosed text and actual link. Or did I misunderstand you? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com From skip at pobox.com Fri Jul 6 22:55:41 2007 From: skip at pobox.com (skip at pobox.com) Date: Fri, 6 Jul 2007 15:55:41 -0500 Subject: [spambayes-dev] spoof detector In-Reply-To: <87644xz6dg.fsf@grogan.peloton> References: <87fy411sm8.fsf@grogan.peloton> <18062.25578.308007.229411@montanaro.dyndns.org> <87644xz6dg.fsf@grogan.peloton> Message-ID: <18062.44109.162303.926899@montanaro.dyndns.org> >> Their physical and structural proximity is not noted. Synthetic >> tokens based on hostname or IP address in the urls will be generated >> if you add x-pick_apart_urls:True to the Tokenizer section of your >> config file. 
Dave> That doesn't sound like it's doing what I'm asking about.

No, it's not. However, you might be surprised how helpful generating
tokens for the /8, /16, /24 and /32 address blocks can be. What I was
implying is that maybe you don't need the spoof detection you were asking
for if the address tokens generated from the spammer's IP address are
spammy.

Dave> I want a special token that is generated each time a link's text
Dave> is just a URL and the link and the URL text don't point to the
Dave> same place.

That will require actually parsing the HTML at some level. SpamBayes just
sees a stream of tokens. It doesn't really know much (if anything) about
compound structure.

Dave> Messages with this property are always spam and account for a
Dave> large percentage of my unsures.

Try these two settings

x-pick_apart_urls:True
x-lookup_ip:True

and see if they help.

Dave> From what you say above it looks like pick_apart_urls will
Dave> generate tokens describing different parts of a given URL, but
Dave> will do nothing to help capture this particular spammy
Dave> relationship between enclosed text and actual link.

Dave> Or did I misunderstand you?

No, I probably misunderstood myself. The IP address hacker is the
x-lookup_ip option, I believe. They are both helpful though.

Skip

From dave at boost-consulting.com Sat Jul 7 02:56:03 2007
From: dave at boost-consulting.com (David Abrahams)
Date: Fri, 06 Jul 2007 20:56:03 -0400
Subject: [spambayes-dev] spoof detector
In-Reply-To: <18062.44109.162303.926899@montanaro.dyndns.org> (skip@pobox.com's message of "Fri\, 6 Jul 2007 15\:55\:41 -0500")
References: <87fy411sm8.fsf@grogan.peloton> <18062.25578.308007.229411@montanaro.dyndns.org> <87644xz6dg.fsf@grogan.peloton> <18062.44109.162303.926899@montanaro.dyndns.org>
Message-ID: <87odipvxp8.fsf@grogan.peloton>

on Fri Jul 06 2007, skip-AT-pobox.com wrote:

> Try these two settings
>
> x-pick_apart_urls:True
> x-lookup_ip:True
>
> and see if they help.
Well, they sure make training slow to a crawl! Is there any effective way of cacheing those DNS lookups? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com From dave at boost-consulting.com Sat Jul 7 04:51:35 2007 From: dave at boost-consulting.com (David Abrahams) Date: Fri, 06 Jul 2007 22:51:35 -0400 Subject: [spambayes-dev] spoof detector References: <87fy411sm8.fsf@grogan.peloton> <18062.25578.308007.229411@montanaro.dyndns.org> <87644xz6dg.fsf@grogan.peloton> <18062.44109.162303.926899@montanaro.dyndns.org> <87odipvxp8.fsf@grogan.peloton> Message-ID: <87k5tdvsco.fsf@grogan.peloton> on Fri Jul 06 2007, David Abrahams wrote: > on Fri Jul 06 2007, skip-AT-pobox.com wrote: > >> Try these two settings >> >> x-pick_apart_urls:True >> x-lookup_ip:True >> >> and see if they help. Oh, and these go in the [Tokenizer] section, right? > Well, they sure make training slow to a crawl! > Is there any effective way of cacheing those DNS lookups? I did eventually find the lookup_ip_cache option, but frankly the results are disappointing. I would have expected one slow round in my train-to-exhaustion regime and then all following rounds to go very quickly, but that doesn't appear to be the case. The first round took 18.5 minutes and it doesn't look like the 2nd round is going to be much faster. Oh, and right now the dnscache file is 414 bytes long and is full of stuff that mostly doesn't look like it has any relevance to dns lookup. I realize I shouldn't expect to be able to read a pickle by eye, but there is one string in there that looks like a domain name so I expect to see the others. Aha! spambayes is relying on atexit to close the cache and write it out to disk, and tte obviously goes many rounds without doing that. Problem is, my ssh connection to the server always drops before training completes, and I'm not sure why (my ssh connections seem to time out). 
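The behaviour David describes (a cache that only reaches disk at interpreter exit, so a dropped session loses everything) can be avoided by flushing periodically as well. The following is a minimal sketch of that idea, not the actual SpamBayes dnscache module: the class name `PersistentCache`, its `save_every` parameter and the `resolve` callback are all hypothetical.

```python
import atexit
import os
import pickle
import tempfile


class PersistentCache:
    """Pickle-backed lookup cache that saves every few new entries."""

    def __init__(self, path, save_every=50):
        self.path = os.path.expanduser(path)
        self.save_every = save_every
        self._dirty = 0      # new entries added since the last save
        self._data = {}
        if os.path.exists(self.path):
            with open(self.path, "rb") as f:
                self._data = pickle.load(f)
        # Still save at exit, but no longer rely on it alone.
        atexit.register(self.save)

    def lookup(self, key, resolve):
        """Return the cached value for key, calling resolve(key) on a miss."""
        if key not in self._data:
            self._data[key] = resolve(key)
            self._dirty += 1
            if self._dirty >= self.save_every:
                self.save()
        return self._data[key]

    def save(self):
        """Write the cache to a temporary file, then rename it into place."""
        if not self._dirty:
            return
        directory = os.path.dirname(self.path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self._data, f)
        os.replace(tmp, self.path)
        self._dirty = 0
```

A caller would pass its own resolver, for example a small function wrapping `socket.gethostbyname_ex`. Writing to a temporary file and renaming it means an interrupted run leaves the previous cache file intact.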
-- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com From mhammond at skippinet.com.au Sat Jul 7 08:06:31 2007 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sat, 7 Jul 2007 16:06:31 +1000 Subject: [spambayes-dev] 1.1b1? In-Reply-To: <18062.9745.201378.379989@montanaro.dyndns.org> Message-ID: <023101c7c05c$f7a8fd80$090a0a0a@enfoldsystems.local> > Anyone think it's about time for a beta release? I'll be out > of town this > weekend, but will try to put one together when I return. > Maybe a release > schedule like this: > > 1.1b1 July 11 > 1.1b2 July 25 (if necessary) > 1.1rc1 August 1 > 1.1final August 8 > > ? Given that the bulk of the users are Outlook users I think > we should try > and recruit a few extra people to put it through its paces. > In particular, > getting some Outlook 2007 and Vista users to use it would be > good I think. > That means Mark's or Tony's availability to create Outlook > installers is key > to all these dates. I should be pretty good to go. I'll also announce my most recent binary to the main spambayes list to see if it works for anyone :) Cheers, Mark From dave at boost-consulting.com Sun Jul 8 17:58:58 2007 From: dave at boost-consulting.com (David Abrahams) Date: Sun, 08 Jul 2007 11:58:58 -0400 Subject: [spambayes-dev] spoof detector References: <87fy411sm8.fsf@grogan.peloton> <18062.25578.308007.229411@montanaro.dyndns.org> <87644xz6dg.fsf@grogan.peloton> <18062.44109.162303.926899@montanaro.dyndns.org> <87odipvxp8.fsf@grogan.peloton> <87k5tdvsco.fsf@grogan.peloton> Message-ID: <874pkeubst.fsf@grogan.peloton> on Fri Jul 06 2007, David Abrahams wrote: > on Fri Jul 06 2007, David Abrahams wrote: > >> on Fri Jul 06 2007, skip-AT-pobox.com wrote: >> >>> Try these two settings >>> >>> x-pick_apart_urls:True >>> x-lookup_ip:True >>> >>> and see if they help. > > Oh, and these go in the [Tokenizer] section, right? > >> Well, they sure make training slow to a crawl! 
>> Is there any effective way of cacheing those DNS lookups? > > I did eventually find the lookup_ip_cache option, but frankly the > results are disappointing. I would have expected one slow round in my > train-to-exhaustion regime and then all following rounds to go very > quickly, but that doesn't appear to be the case. The first round took > 18.5 minutes and it doesn't look like the 2nd round is going to be > much faster. Oh, and right now the dnscache file is 414 bytes long > and is full of stuff that mostly doesn't look like it has any > relevance to dns lookup. I realize I shouldn't expect to be able to > read a pickle by eye, but there is one string in there that looks like > a domain name so I expect to see the others. Well, I eventually got training to finish, but I don't notice any improvement in accuracy. It may even have gotten worse; I've had a few false negatives since enabling those options, and in general I *never* see those. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com From skip at pobox.com Mon Jul 9 14:44:19 2007 From: skip at pobox.com (skip at pobox.com) Date: Mon, 9 Jul 2007 07:44:19 -0500 Subject: [spambayes-dev] spoof detector In-Reply-To: <87odipvxp8.fsf@grogan.peloton> References: <87fy411sm8.fsf@grogan.peloton> <18062.25578.308007.229411@montanaro.dyndns.org> <87644xz6dg.fsf@grogan.peloton> <18062.44109.162303.926899@montanaro.dyndns.org> <87odipvxp8.fsf@grogan.peloton> Message-ID: <18066.11683.699111.180447@montanaro.dyndns.org> >> x-pick_apart_urls:True >> x-lookup_ip:True Dave> Well, they sure make training slow to a crawl! Is there any Dave> effective way of cacheing those DNS lookups? Yes, the first time the IP lookup stuff will be slow. I have this setting as well: lookup_ip_cache:~/tmp/dnscache.pck I don't recall if the dnscache module caches somewhere by default or if it only caches in-memory if you don't specify a cache file. 
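For readers wondering what the IP lookup yields once a host name has been resolved: the /8, /16, /24 and /32 address-block tokens mentioned earlier in this thread might be generated along the following lines. This is a sketch only; the function name `address_block_tokens` and the `url-ip:` token prefix are assumptions for illustration and are not confirmed by anything in this thread.

```python
def address_block_tokens(ip, prefix="url-ip"):
    """Return one token per enclosing address block of a dotted-quad IP."""
    octets = ip.split(".")
    # Reject anything that is not four decimal octets in the range 0-255.
    if len(octets) != 4 or not all(
        o.isdecimal() and int(o) <= 255 for o in octets
    ):
        return []
    return [
        "%s:%s/8" % (prefix, octets[0]),
        "%s:%s/16" % (prefix, ".".join(octets[:2])),
        "%s:%s/24" % (prefix, ".".join(octets[:3])),
        "%s:%s/32" % (prefix, ip),
    ]
```

The broader blocks let the classifier learn that an entire network range is spammy even when the exact address has never been seen before.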
Skip

From skip at pobox.com Mon Jul 9 14:50:28 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 9 Jul 2007 07:50:28 -0500
Subject: [spambayes-dev] spoof detector
In-Reply-To: <87k5tdvsco.fsf@grogan.peloton>
References: <87fy411sm8.fsf@grogan.peloton> <18062.25578.308007.229411@montanaro.dyndns.org> <87644xz6dg.fsf@grogan.peloton> <18062.44109.162303.926899@montanaro.dyndns.org> <87odipvxp8.fsf@grogan.peloton> <87k5tdvsco.fsf@grogan.peloton>
Message-ID: <18066.12052.850978.795979@montanaro.dyndns.org>

Dave> I did eventually find the lookup_ip_cache option, but frankly the
Dave> results are disappointing. I would have expected one slow round
Dave> in my train-to-exhaustion regime and then all following rounds to
Dave> go very quickly, but that doesn't appear to be the case.

It flies for me after the initial run.

Dave> Aha! spambayes is relying on atexit to close the cache and write
Dave> it out to disk, and tte obviously goes many rounds without doing
Dave> that.

It should still have its cache in memory before exiting. Only the first
round should be slow. How many messages are you training on? (I have about
250 messages total at the moment.)

Dave> Problem is, my ssh connection to the server always drops before
Dave> training completes, and I'm not sure why (my ssh connections seem
Dave> to time out).

Do you have access to screen(1)? That should help you retain sessions for
long periods of time if you're suffering from timeouts. It's in /usr/bin on
my Mac. I imagine it's available on most Linux boxes.
Skip From dave at boost-consulting.com Mon Jul 9 15:53:05 2007 From: dave at boost-consulting.com (David Abrahams) Date: Mon, 09 Jul 2007 09:53:05 -0400 Subject: [spambayes-dev] spoof detector In-Reply-To: <18066.12052.850978.795979@montanaro.dyndns.org> (skip@pobox.com's message of "Mon\, 9 Jul 2007 07\:50\:28 -0500") References: <87fy411sm8.fsf@grogan.peloton> <18062.25578.308007.229411@montanaro.dyndns.org> <87644xz6dg.fsf@grogan.peloton> <18062.44109.162303.926899@montanaro.dyndns.org> <87odipvxp8.fsf@grogan.peloton> <87k5tdvsco.fsf@grogan.peloton> <18066.12052.850978.795979@montanaro.dyndns.org> Message-ID: <87myy5k7jy.fsf@grogan.peloton> on Mon Jul 09 2007, skip-AT-pobox.com wrote: > Dave> I did eventually find the lookup_ip_cache option, but frankly the > Dave> results are disappointing. I would have expected one slow round > Dave> in my train-to-exhaustion regime and then all following rounds to > Dave> go very quickly, but that doesn't appear to be the case. > > It flies for me after the initial run. > > Dave> Aha! spambayes is relying on atexit to close the cache and write > Dave> it out to disk, and tte obviously goes many rounds without doing > Dave> that. > > It should still have its cache in memory before exiting. Only the first > round should be slow. How many messages are you training on? (I have about > 250 messages total at the moment.) I have about 850. It's possible that the cache is getting pruned, but I actually upped those numbers so it would be less likely. It's still quite slow. > Dave> Problem is, my ssh connection to the server always drops before > Dave> training completes, and I'm not sure why (my ssh connections seem > Dave> to time out). > > Do you have access to screen(1)? That should help you retain sessions for > long periods of time if you're suffering from timeouts. It's in /usr/bin on > my Mac. I imagine it's avaialble available on most Linux boxes. I have it or can get it. How would I use it in this case? 
Do I run screen on the local machine or on the server?

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com

From skip at pobox.com Mon Jul 9 16:55:25 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 9 Jul 2007 09:55:25 -0500
Subject: [spambayes-dev] spoof detector
In-Reply-To: <87myy5k7jy.fsf@grogan.peloton>
References: <87fy411sm8.fsf@grogan.peloton> <18062.25578.308007.229411@montanaro.dyndns.org> <87644xz6dg.fsf@grogan.peloton> <18062.44109.162303.926899@montanaro.dyndns.org> <87odipvxp8.fsf@grogan.peloton> <87k5tdvsco.fsf@grogan.peloton> <18066.12052.850978.795979@montanaro.dyndns.org> <87myy5k7jy.fsf@grogan.peloton>
Message-ID: <18066.19549.215809.694464@montanaro.dyndns.org>

>> Do you have access to screen(1)? That should help you retain
>> sessions for long periods of time if you're suffering from timeouts.
>> It's in /usr/bin on my Mac. I imagine it's available on
>> most Linux boxes.

Dave> I have it or can get it.

Dave> How would I use it in this case? Do I run screen on the local
Dave> machine or on the server?

That's a good question. It's been years since I used it (Mac SE connecting
to VMS days), but lots of people use it to maintain logins. If you get
disconnected your session on the remote machine stays alive. The next time
you run screen you are reconnected to that session. Perfect for flaky
communications.

Skip

From skip at pobox.com Sun Jul 15 16:14:42 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 15 Jul 2007 09:14:42 -0500
Subject: [spambayes-dev] CVS to SVN
Message-ID: <18074.11218.497051.238456@montanaro.dyndns.org>

I'm in the midst of doing the CVS to Subversion conversion though I'm not
blessed with a lot of free time today (it's Ellen's birthday). If you have
stuff to check in feel free to do so. I'll watch for checkins and start
over. Otherwise, I'll probably have it finished sometime tomorrow.
I'll send out an email once I'm about to launch into the step that takes me
past the point of no return.

Skip

From skip at pobox.com Sun Jul 15 16:18:27 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 15 Jul 2007 09:18:27 -0500
Subject: [spambayes-dev] CVS to SVN
Message-ID: <18074.11443.379802.769160@montanaro.dyndns.org>

    I'll send out an email once I'm about to launch into the step that
    takes me past the point of no return.

I was closer to the point of no return than I thought. I'm beginning the
process now (around 9:20AM CDT). Please don't make any modifications to the
CVS repository. I'll send out an all clear email.

Skip

From skip at pobox.com Sun Jul 15 17:44:18 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 15 Jul 2007 10:44:18 -0500
Subject: [spambayes-dev] CVS to Subversion - complete I think
Message-ID: <18074.16594.941758.730872@montanaro.dyndns.org>

I believe the CVS-to-Subversion migration is complete. I've not yet been
able to do a checkout that allows me to commit. I think I'm just missing
some flags on my svn co command. My laptop login is "skip" while my SF
login is "montanaro".

I checked it out using

    svn co --username=montanaro https://montanaro@spambayes.svn.sourceforge.net/svnroot/spambayes/trunk spambayes

When I try to check in a change using

    svn commit --username=montanaro -m 'first commit - note version control change'

I get this output:

    Authentication realm: SourceForge Subversion area
    Password for 'montanaro':
    Authentication realm: SourceForge Subversion area
    Username:
    svn: Commit failed (details follow):
    svn: MKACTIVITY of '/svnroot/spambayes/!svn/act/7c13b838-9cac-4e6d-9c05-0528431381a2': authorization failed (https://spambayes.svn.sourceforge.net)

Any thoughts on what I'm missing?
Thx, Skip From mhammond at skippinet.com.au Mon Jul 16 08:13:36 2007 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 16 Jul 2007 16:13:36 +1000 Subject: [spambayes-dev] CVS to Subversion - complete I think In-Reply-To: <18074.16594.941758.730872@montanaro.dyndns.org> Message-ID: <08d301c7c770$7272f6f0$090a0a0a@enfoldsystems.local> > I believe the CVS-to-Subversion migration is complete. I've > not yet been > able to do a checkout that allows me to commit. I think I'm > just missing > some flags on my svn co command. My laptop login is "skip" > while my SF > login is "montanaro". > > I checked it out using > > svn co --username=montanaro > https://montanaro at spambayes.svn.sourceforge.net/svnroot/spamba > yes/trunk spambayes > > When I try to check in a change using > > svn commit --username=montanaro -m 'first commit - note > version control change' > > I get this output: > > Authentication realm: > SourceForge > Subversion area > Password for 'montanaro': > Authentication realm: That last step worked for me after I typed my SF password. I'm not sure why my ssh2 key isn't being used here - it worked ok for CVS. I've got an SSH2 key in my "account preferences" page, so I'm not sure why that is failing. I'm guessing that your problem is your ssh agent expecting to be doing authentication, but instead getting confused by the interactive prompts? Mark From skip at pobox.com Mon Jul 16 13:27:40 2007 From: skip at pobox.com (skip at pobox.com) Date: Mon, 16 Jul 2007 06:27:40 -0500 Subject: [spambayes-dev] CVS to Subversion - complete I think In-Reply-To: <08d301c7c770$7272f6f0$090a0a0a@enfoldsystems.local> References: <18074.16594.941758.730872@montanaro.dyndns.org> <08d301c7c770$7272f6f0$090a0a0a@enfoldsystems.local> Message-ID: <18075.22060.322492.138133@montanaro.dyndns.org> Mark> That last step worked for me after I typed my SF password. Yes, for me as well. I opened a ticket on SF. Maybe they can tell me what's going on. 
Skip From dave at boost-consulting.com Mon Jul 16 14:08:29 2007 From: dave at boost-consulting.com (David Abrahams) Date: Mon, 16 Jul 2007 08:08:29 -0400 Subject: [spambayes-dev] CVS to Subversion - complete I think References: <18074.16594.941758.730872@montanaro.dyndns.org> <08d301c7c770$7272f6f0$090a0a0a@enfoldsystems.local> Message-ID: <87lkdgeeki.fsf@grogan.peloton> on Mon Jul 16 2007, "Mark Hammond" wrote: >> I believe the CVS-to-Subversion migration is complete. I've >> not yet been >> able to do a checkout that allows me to commit. I think I'm >> just missing >> some flags on my svn co command. My laptop login is "skip" >> while my SF >> login is "montanaro". >> >> I checked it out using >> >> svn co --username=montanaro >> https://montanaro at spambayes.svn.sourceforge.net/svnroot/spamba >> yes/trunk spambayes >> >> When I try to check in a change using >> >> svn commit --username=montanaro -m 'first commit - note >> version control change' >> >> I get this output: >> >> Authentication realm: >> SourceForge >> Subversion area >> Password for 'montanaro': >> Authentication realm: > > That last step worked for me after I typed my SF password. I'm not sure why > my ssh2 key isn't being used here - it worked ok for CVS. I've got an SSH2 > key in my "account preferences" page, so I'm not sure why that is > failing. Because you're not connecting using ssh; you're using https. 
-- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com From skip at pobox.com Mon Jul 16 15:06:23 2007 From: skip at pobox.com (skip at pobox.com) Date: Mon, 16 Jul 2007 08:06:23 -0500 Subject: [spambayes-dev] CVS to Subversion - complete I think In-Reply-To: <87lkdgeeki.fsf@grogan.peloton> References: <18074.16594.941758.730872@montanaro.dyndns.org> <08d301c7c770$7272f6f0$090a0a0a@enfoldsystems.local> <87lkdgeeki.fsf@grogan.peloton> Message-ID: <18075.27983.75044.354024@montanaro.dyndns.org> Dave> Because you're not connecting using ssh; you're using https. Any idea how I can get this to work without having to give my SF password on every commit? Skip From dave at boost-consulting.com Mon Jul 16 16:00:24 2007 From: dave at boost-consulting.com (David Abrahams) Date: Mon, 16 Jul 2007 10:00:24 -0400 Subject: [spambayes-dev] CVS to Subversion - complete I think In-Reply-To: <18075.27983.75044.354024@montanaro.dyndns.org> (skip@pobox.com's message of "Mon\, 16 Jul 2007 08\:06\:23 -0500") References: <18074.16594.941758.730872@montanaro.dyndns.org> <08d301c7c770$7272f6f0$090a0a0a@enfoldsystems.local> <87lkdgeeki.fsf@grogan.peloton> <18075.27983.75044.354024@montanaro.dyndns.org> Message-ID: <87abtwe9dz.fsf@grogan.peloton> on Mon Jul 16 2007, skip-AT-pobox.com wrote: > Dave> Because you're not connecting using ssh; you're using https. > > Any idea how I can get this to work without having to give my SF password on > every commit? SVN normally records your password in a file in ~/.subversion/auth/ and doesn't ask you again until you delete it. It's possible to disable that feature; if you've done that, then I don't know. I suppose if you were using the ssh:// protocol instead of https://, it would use your ssh keys to avoid the password dance. 
-- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com From skip at pobox.com Mon Jul 16 16:26:27 2007 From: skip at pobox.com (skip at pobox.com) Date: Mon, 16 Jul 2007 09:26:27 -0500 Subject: [spambayes-dev] CVS to Subversion - complete I think In-Reply-To: <87abtwe9dz.fsf@grogan.peloton> References: <18074.16594.941758.730872@montanaro.dyndns.org> <08d301c7c770$7272f6f0$090a0a0a@enfoldsystems.local> <87lkdgeeki.fsf@grogan.peloton> <18075.27983.75044.354024@montanaro.dyndns.org> <87abtwe9dz.fsf@grogan.peloton> Message-ID: <18075.32787.588668.91938@montanaro.dyndns.org> Dave> SVN normally records your password in a file in Dave> ~/.subversion/auth/ and doesn't ask you again until you delete it. Dave> It's possible to disable that feature; if you've done that, then I Dave> don't know. I haven't disabled anything that I'm aware of. I'll dig into this a little more though. Dave> I suppose if you were using the ssh:// protocol instead of Dave> https://, it would use your ssh keys to avoid the password dance. Alas, it appears SF doesn't support svn+ssh://. Thanks, Skip From mhammond at skippinet.com.au Tue Jul 17 02:02:33 2007 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 17 Jul 2007 10:02:33 +1000 Subject: [spambayes-dev] CVS to Subversion - complete I think In-Reply-To: <18074.16594.941758.730872@montanaro.dyndns.org> Message-ID: <096901c7c805$c6fb7b30$090a0a0a@enfoldsystems.local> I made one checkin yesterday, but i'm yet to see the checkin mail that should have been generated - it also fails to appear on the list archives. 
The checkin I made is against
https://mhammond@spambayes.svn.sourceforge.net/svnroot/spambayes/trunk/spambayes/windows/spambayes.iss

------------------------------------------------------------------------
r3150 | mhammond | 2007-07-16 15:58:53 +1000 (Mon, 16 Jul 2007) | 2 lines

Prepare for forthcoming 1.1a5 release (and test SVN works :)

But it seems I *have* seen some of Skip's post-conversion checkins.

I'm afraid I've no idea where to start looking for this...

Mark

From skip at pobox.com Tue Jul 17 04:20:00 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 16 Jul 2007 21:20:00 -0500
Subject: [spambayes-dev] CVS to Subversion - complete I think
In-Reply-To: <096901c7c805$c6fb7b30$090a0a0a@enfoldsystems.local>
References: <18074.16594.941758.730872@montanaro.dyndns.org> <096901c7c805$c6fb7b30$090a0a0a@enfoldsystems.local>
Message-ID: <18076.10064.683257.500448@montanaro.dyndns.org>

Mark> I made one checkin yesterday, but i'm yet to see the checkin mail
Mark> that should have been generated - it also fails to appear on the
Mark> list archives.

I forgot to do that yesterday. I believe I have this set up now. Thanks for
the reminder. I just checked in r3152 and r3153. You should have email
shortly.

Skip

From skip at pobox.com Tue Jul 24 02:11:33 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 23 Jul 2007 19:11:33 -0500
Subject: [spambayes-dev] Bye bye, CVS
Message-ID: <18085.17333.937589.846630@montanaro.dyndns.org>

I'm in the midst of disabling developer (read/write) access to the CVS
repository for all of us SpamBayes developers. Anonymous read-only access
will still be available should you need to diff something in your sandbox.

Skip

From skip at pobox.com Tue Jul 24 03:47:36 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 23 Jul 2007 20:47:36 -0500
Subject: [spambayes-dev] Should we cull the list of SpamBayes developers?
Message-ID: <18085.23096.668253.367825@montanaro.dyndns.org>

As I was running through the developer list this evening disabling CVS
access it occurred to me that we probably have a number of developers who
are no longer interested in that role. Here's the current list:

anadelonbrin    Tony Meyer
anthonybaxter   Anthony Baxter
bkc             Brad Clements
bwarsaw         Barry A. Warsaw
fmmr            Fredrik Rodland
gvanrossum      Guido van Rossum
gward           Greg Ward
hooft           Rob W.W. Hooft
htrd            Toby Dickenson
jhylton         Jeremy Hylton
jvr             Just van Rossum
kpitt           Kenny Pitt
lemburg         M.-A. Lemburg
limbus          Manuel Dejonghe
mhammond        Mark Hammond
montanaro       Skip Montanaro
nascheme        Neil Schemenauer
npickett        Neale Pickett
popiel          T. Alexander Popiel
rhettinger      Raymond Hettinger
richiehindle    Richie Hindle
seantrue        Sean True
sjoerd          Sjoerd Mullender
tim_one         Tim Peters
timstone4       Tim Stone
xenogeist       Adam Walker

From skip at pobox.com Tue Jul 24 03:56:31 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 23 Jul 2007 20:56:31 -0500
Subject: [spambayes-dev] Should we cull the list of SpamBayes developers?
Message-ID: <18085.23631.90032.306587@montanaro.dyndns.org>

(crap... let's try that again)

As I was running through the developer list this evening disabling CVS
access it occurred to me that we probably have a number of developers who
are no longer interested in maintaining that role. Then I got a mail just a
bit ago asking if one of the formerly active developers is still active.
Got me to thinking... Here's the current list:

anadelonbrin    Tony Meyer
anthonybaxter   Anthony Baxter
bkc             Brad Clements
bwarsaw         Barry A. Warsaw
fmmr            Fredrik Rodland
gvanrossum      Guido van Rossum
gward           Greg Ward
hooft           Rob W.W. Hooft
htrd            Toby Dickenson
jhylton         Jeremy Hylton
jvr             Just van Rossum
kpitt           Kenny Pitt
lemburg         M.-A. Lemburg
limbus          Manuel Dejonghe
mhammond        Mark Hammond
montanaro       Skip Montanaro
nascheme        Neil Schemenauer
npickett        Neale Pickett
popiel          T. Alexander Popiel
rhettinger      Raymond Hettinger
richiehindle    Richie Hindle
seantrue        Sean True
sjoerd          Sjoerd Mullender
tim_one         Tim Peters
timstone4       Tim Stone
xenogeist       Adam Walker

I'm cc'ing everyone's SF email address as well as spambayes-dev, as some of
you are probably no longer subscribed to that mailing list. Let me know if
you'd like to be removed from the SpamBayes developer list. If you don't
reply to this note I'll assume you want to remain as a developer. If mail
sent to your SF email address bounces I'll try other means to get ahold of
you. If I'm unsuccessful, I'll remove you. That's the only reason I would
remove you without getting an actual response from you.

There's no pressure here, BTW. I have been thinking for a while that we
probably should put out a request looking for people interested in moving
SpamBayes forward though. Knowing where we stand in the active developers
department would be useful in deciding how much new blood to try and
attract.

Thx,

Skip Montanaro

From richie at entrian.com Tue Jul 24 09:19:53 2007
From: richie at entrian.com (Richie Hindle)
Date: Tue, 24 Jul 2007 08:19:53 +0100
Subject: [spambayes-dev] Should we cull the list of SpamBayes developers?
In-Reply-To: <18085.23631.90032.306587@montanaro.dyndns.org>
References: <18085.23631.90032.306587@montanaro.dyndns.org>
Message-ID:

Hi Skip,

> Let me know if
> you'd like to be removed from the SpamBayes developer list.

[ Richie crawls out of the woodwork 8-) ]

I'd like to remain on the list, please. I'd *like* to get back to being
more of an active developer, but family and work commitments mean I can't at
the moment. One day.

--
Richie Hindle
richie at entrian.com

From suporte at rotamil.com Tue Jul 24 23:37:37 2007
From: suporte at rotamil.com (suporte rotamil)
Date: Tue, 24 Jul 2007 21:37:37 GMT
Subject: [spambayes-dev] PNID
Message-ID: <20070724215016.89A091E400E@bag.python.org>

An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20070724/da079dfa/attachment.html

From sjoerd at acm.org Wed Jul 25 11:01:37 2007
From: sjoerd at acm.org (Sjoerd Mullender)
Date: Wed, 25 Jul 2007 11:01:37 +0200
Subject: [spambayes-dev] spambayes CVS repository problem
Message-ID: <46A71171.4030802@acm.org>

After the change to SVN, I would like to copy my private changes over to
the new checkout.  However, I can't easily since cvs diff gives me

cvs diff: failed to create lock directory for `/cvsroot/spambayes/spambayes' (/cvsroot/spambayes/spambayes/#cvs.lock): Permission denied
cvs diff: failed to obtain dir lock in repository `/cvsroot/spambayes/spambayes'
cvs [diff aborted]: read lock failed - giving up

Presumably this is an attempt to prevent people from accidentally
committing to the wrong repository, but it makes it difficult to read
the repository.  Maybe a better way than making the directories
read-only would be to add a commitinfo entry.  Something like

    ALL /bin/false

might do the job.

--
Sjoerd Mullender

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 370 bytes
Desc: OpenPGP digital signature
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20070725/ce8ab6cc/attachment.pgp

From skip at pobox.com Wed Jul 25 18:56:06 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 25 Jul 2007 11:56:06 -0500
Subject: [spambayes-dev] spambayes CVS repository problem
In-Reply-To: <46A71171.4030802@acm.org>
References: <46A71171.4030802@acm.org>
Message-ID: <18087.32934.863120.315455@montanaro.dyndns.org>

    Sjoerd> After the change to SVN, I would like to copy my private changes
    Sjoerd> over to the new checkout.  However, I can't easily since cvs
    Sjoerd> diff gives me

    Sjoerd> cvs diff: failed to create lock directory ...

    Sjoerd> Presumably this is an attempt to prevent people from
    Sjoerd> accidentally committing to the wrong repository, but it makes it
    Sjoerd> difficult to read the repository.  Maybe a better way than
    Sjoerd> making the directories read-only would be to add a commitinfo
    Sjoerd> entry.  Something like

    Sjoerd>     ALL /bin/false

    Sjoerd> might do the job.

I don't know anything about commitinfo and would really prefer to not mess
with CVS any more than absolutely necessary.  I've reenabled you for CVS
write access.  You could also do a checkout and just locally diff the two
trees.

Skip

From kenny.pitt at gmail.com Wed Jul 25 20:52:57 2007
From: kenny.pitt at gmail.com (Kenny Pitt)
Date: Wed, 25 Jul 2007 14:52:57 -0400
Subject: [spambayes-dev] Should we cull the list of SpamBayes developers?
In-Reply-To:
References: <18085.23631.90032.306587@montanaro.dyndns.org>
Message-ID: <2a052b990707251152r70db6a77wa4bf5660664f88dc@mail.gmail.com>

On 7/24/07, Richie Hindle wrote:
> I'd like to remain on the list, please.  I'd *like* to get back to being
> more of an active developer, but family and work commitments mean I can't at
> the moment.  One day.

Same for me, just haven't figured out when "one day" will be yet.  Every
time I start thinking it's "soon", something else hits the fan.

--
Kenny Pitt

From sjoerd at acm.org Wed Jul 25 22:35:18 2007
From: sjoerd at acm.org (Sjoerd Mullender)
Date: Wed, 25 Jul 2007 22:35:18 +0200
Subject: [spambayes-dev] spambayes CVS repository problem
In-Reply-To: <18087.32934.863120.315455@montanaro.dyndns.org>
References: <46A71171.4030802@acm.org> <18087.32934.863120.315455@montanaro.dyndns.org>
Message-ID: <46A7B406.1090307@acm.org>

On 07/25/2007 06:56 PM, skip at pobox.com wrote:
>     Sjoerd> After the change to SVN, I would like to copy my private changes
>     Sjoerd> over to the new checkout.  However, I can't easily since cvs
>     Sjoerd> diff gives me
>
>     Sjoerd> cvs diff: failed to create lock directory ...
>
>     Sjoerd> Presumably this is an attempt to prevent people from
>     Sjoerd> accidentally committing to the wrong repository, but it makes it
>     Sjoerd> difficult to read the repository.  Maybe a better way than
>     Sjoerd> making the directories read-only would be to add a commitinfo
>     Sjoerd> entry.  Something like
>
>     Sjoerd>     ALL /bin/false
>
>     Sjoerd> might do the job.
>
> I don't know anything about commitinfo and would really prefer to not mess
> with CVS any more than absolutely necessary.  I've reenabled you for CVS
> write access.  You could also do a checkout and just locally diff the two
> trees.

I've been able to do my cvs diff, so now I don't need the CVS access
anymore.  Thanks.

--
Sjoerd Mullender

From skip at pobox.com Thu Jul 26 03:58:33 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 25 Jul 2007 20:58:33 -0500
Subject: [spambayes-dev] spambayes CVS repository problem
In-Reply-To: <46A7B406.1090307@acm.org>
References: <46A71171.4030802@acm.org> <18087.32934.863120.315455@montanaro.dyndns.org> <46A7B406.1090307@acm.org>
Message-ID: <18087.65481.678515.767415@montanaro.dyndns.org>

    Sjoerd> I've been able to do my cvs diff, so now I don't need the CVS
    Sjoerd> access anymore.

Got it.  You've once again been reduced to read-only CVS status like the
rest of us. ;-)

Skip

From chungshan.machworks.ltd at hotmail.com Thu Jul 26 11:45:44 2007
From: chungshan.machworks.ltd at hotmail.com (CHUNG SHAN MACH.WORKS)
Date: Thu, 26 Jul 2007 11:45:44 +0200 (CEST)
Subject: [spambayes-dev] Be Our Agent
Message-ID: <20070726094544.8356E1E9ED4@mail.awebs.at>

CHUNG SHAN MACHINERY WORKS
No.49-16, CHI TZU TOU, KUAN SHIN VILLAGE, SHUI SHANG HSIANG, CHIA YI HSIEN 608, TAIWAN R.O.C.
Email: chungshan.machworks.ltd at hotmail.com
Website: www.chung-shan.com.tw/eng/index.htm
Tel :886-5-236-2685, Fax:886-2-6602-1090

Dear Sir/Ma,

Since the beginning of our company's establishment, Chung Shan Machinery Works Co., Ltd.
has been an industry leader with its wide range of superior quality form-fill-Seal Packaging Machinery. We have put forward a variety of innovative ideas for the advancement of the food industry and have striven to bring our ideas to come true. We have reached big sales volume of Packaging Machinery products in the United Kingdom and European market and now trying to penetrate the more into the UK/USA and Canada. Quiet soon we shall open representative offices or authorized sales centers in the United State of America/Canada and therefore we are currently looking for people who will assist us in establishing a new distribution network here and there. The fact is that despite that United State of America/Canada market is new for us we already have regular clients also speaks for itself. WHAT YOU NEED TO DO FOR US: The international money transfer tax for legal entities (companies) in Taiwan is 25%, whereas for the individual it is only 7%.There is no sense for us to work this way, while tax for international money transfer made by a private individual is 7%. That's why we need you! We need an agent to receive payment for our Packaging Machinery products inform of bank wire transfers and to resend the money to us via our bank account in Taiwan while the tax shall be 7% instead of 25% which will absolutely favour our company. JOB DESCRIPTION 1. Receive payment from Clients either by check or wire transfer 2. Cash Payments at your Bank 3. Deduct 10% which will be your commission on each payment processed. 4. Forward balance after deduction of 10% commission to offices which shall be provided by us as soon as the fund becomes available. HOW MUCH WILL YOU EARN: 10% from each operation! For instance: you receive $5000 via checks or wire transfer on our behalf. You will cash the money and keep $500 (10% from $5000) for yourself! At the beginning your commission will equal 10%, though later it will increase up to 12%! 
ADVANTAGES: You do not have to go out as you will work as an independent contractor right from your home office. Your job is absolutely legal. You can earn up to $3000-$4000 monthly depending on time will spend on this job. You do not need any capital to start. You can do the Work easily without leaving or affecting your present Job. Employee who make more efforts and work hard has strong possibility of becoming manager. Anyway our employee never leave us due to our excellent working condition. MAIN REQUIREMENTS: 18 years or older legally capable responsible ready to work 3-4 hours per week.with PC knowledge e-mail and internet experience (minimal). And please be mindful that everything is absolutely legal. Kindly contact us on the above address and you can also view our website to enable you know more about our company. Best Regards, KUAN SHANG(MR) From dave at boost-consulting.com Thu Jul 26 17:25:34 2007 From: dave at boost-consulting.com (David Abrahams) Date: Thu, 26 Jul 2007 11:25:34 -0400 Subject: [spambayes-dev] Help: bsddb._db.DBAccessError: (13, 'Permission denied -- put: attempt to modify a read-only tree') in This bug appears to be back in some form: http://sourceforge.net/tracker/index.php?func=detail&aid=1387699&group_id=61702&atid=498103 I was unable to reopen the tracker item. Can someone help me to at least debug this problem? When exceptions are swallowed like that without a backtrace, I don't know where to start. 
Thanks,

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com

From mhammond at skippinet.com.au Fri Jul 27 02:46:02 2007
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Fri, 27 Jul 2007 10:46:02 +1000
Subject: [spambayes-dev] Help: bsddb._db.DBAccessError: (13, 'Permission denied -- put: attempt to modify a read-only tree') in
Message-ID: <10d001c7cfe7$82710b30$090a0a0a@enfoldsystems.local>

> This bug appears to be back in some form:
> http://sourceforge.net/tracker/index.php?func=detail&aid=1387699&group_id=61702&atid=498103

It's hard to know if that fix was applied, as there is no attachment with
the patch.  Hopefully Tony will notice this thread and pipe up.

> I was unable to reopen the tracker item.  Can someone help me to at
> least debug this problem?  When exceptions are swallowed like that
> without a backtrace, I don't know where to start.

I'm guessing the vanishing backtrace is due to it being called by __del__.
It *looks* like it might be __del__ in sb_filter.py.  However, it looks a
little like your exception is slightly different than the backtrace in that
bug - the bug refers to an error adding tokens to the DB after training,
while your exception sounds more like an exception as the DB is closed.

I wish I could be more help...

Mark

From dave at boost-consulting.com Mon Jul 30 03:43:14 2007
From: dave at boost-consulting.com (David Abrahams)
Date: Sun, 29 Jul 2007 18:43:14 -0700
Subject: [spambayes-dev] Help: bsddb._db.DBAccessError: (13, 'Permission denied -- put: attempt to modify a read-only tree') in <10d001c7cfe7$82710b30$090a0a0a@enfoldsystems.local>
Message-ID: <87odhu7jkd.fsf@grogan.peloton>

on Thu Jul 26 2007, "Mark Hammond" wrote:

>> This bug appears to be back in some form:
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1387699&group_id=61702&atid=498103
>
> It's hard to know if that fix was applied, as there is no attachment with
> the patch.
> Hopefully Tony will notice this thread and pipe up.
>
>> I was unable to reopen the tracker item.  Can someone help me to at
>> least debug this problem?  When exceptions are swallowed like that
>> without a backtrace, I don't know where to start.
>
> I'm guessing the vanishing backtrace is due to it being called by __del__.
> It *looks* like it might be __del__ in sb_filter.py.  However, it looks a
> little like your exception is slightly different than the backtrace in that
> bug - the bug refers to an error adding tokens to the DB after training,
> while your exception sounds more like an exception as the DB is closed.

Yes, after hacking __del__ to dump a backtrace, I see:

  Traceback (most recent call last):
    File "/usr/local/bin/sb_filter.py", line 185, in __del__
      self.close()
    File "/usr/local/bin/sb_filter.py", line 180, in close
      self.h.close()
    File "/usr/local/lib/python2.4/site-packages/spambayes/hammie.py", line 269, in close
      self.store()
    File "/usr/local/lib/python2.4/site-packages/spambayes/hammie.py", line 266, in store
      self.bayes.store()
    File "/usr/local/lib/python2.4/site-packages/spambayes/storage.py", line 266, in store
      self._write_state_key()
    File "/usr/local/lib/python2.4/site-packages/spambayes/storage.py", line 270, in _write_state_key
      self.db[self.statekey] = (classifier.PICKLE_VERSION,
    File "/usr/local/lib/python2.4/shelve.py", line 130, in __setitem__
      self.dict[key] = f.getvalue()
    File "/usr/local/lib/python2.4/site-packages/bsddb3/__init__.py", line 218, in __setitem__
      self.db[key] = value
  DBAccessError: (13, 'Permission denied -- put: attempt to modify a read-only tree')

My analysis is as follows:

HammieFilter.close (in sb_filter.py) jumps through all sorts of hoops
to remember the mode in which the DB was opened, and avoid calling
store() if it was only opened for read.

    def close(self):
        if self.h is not None:
            if self.mode != 'r':
                self.h.store()
            self.h.close()
        self.h = None

So it sorta looks like Tony's patch was applied.
However, Hammie.close (in hammie.py) just barrels ahead and calls store()
unconditionally...

I'm not sure what the right fix here would be.  Keep HammieFilter from
calling Hammie.close() when the DB was not opened for write?  Sink the
close/store/mode-checking logic from HammieFilter into Hammie itself?
Something else?

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com
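
For illustration only, here is a rough sketch of the second option raised
in the message above: sinking the mode check into the object that owns the
database, so that close() never attempts a write on a read-only store.
This is not SpamBayes code.  FakeStore, HammieSketch and ReadOnlyStoreError
are hypothetical stand-ins invented for the example, and treating mode 'r'
as "read-only" is an assumption carried over from the HammieFilter.close
snippet quoted in the message.

```python
class ReadOnlyStoreError(Exception):
    """Hypothetical stand-in for the DBAccessError seen in the traceback."""


class FakeStore:
    """Hypothetical stand-in for the classifier database (not SpamBayes code)."""

    def __init__(self, mode):
        self.mode = mode
        self.store_calls = 0
        self.closed = False

    def store(self):
        # Mimic the reported failure: writing to a read-only DB raises.
        if self.mode == 'r':
            raise ReadOnlyStoreError("attempt to modify a read-only tree")
        self.store_calls += 1

    def close(self):
        self.closed = True


class HammieSketch:
    """Owns the store/close/mode logic so callers need not track the mode."""

    def __init__(self, bayes, mode):
        self.bayes = bayes
        self.mode = mode

    def store(self):
        # Assumption: mode 'r' means the DB was opened read-only.
        if self.mode != 'r':
            self.bayes.store()

    def close(self):
        if self.bayes is None:
            return  # already closed; makes close() safe to call from __del__
        try:
            self.store()
        finally:
            # Release the DB even if the final store() fails.
            self.bayes.close()
            self.bayes = None
```

With something along these lines, a caller such as HammieFilter.close would
only need to call close() on the wrapped object, whatever mode the DB was
opened in.  Whether that fits the real Hammie class depends on details the
thread does not show.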