From sethg at goodmanassociates.com Tue Jan 2 05:03:22 2007
From: sethg at goodmanassociates.com (Seth Goodman)
Date: Mon, 1 Jan 2007 22:03:22 -0600
Subject: [spambayes-dev] FAQ 6.5
In-Reply-To: <001601c72c98$ca8feeb0$0201010a@goodgrief>
Message-ID:

There are more important reasons not to bounce spam than internet congestion. A bounce is a class of automated message called a delivery status notification (DSN). A recipient MTA that accepts a message for delivery must send a DSN to the return-path address if the MTA is unable to make final delivery. Spambayes runs in the MUA only after final message delivery, so you can't say the message wasn't delivered :) For this reason, SMTP makes no provision for an MUA to ever send a DSN.

More importantly, there is no reliable bounce address in a message that later turns out to be spam. In fact, we know that the return-path is virtually always forged. Generating a bounce after acceptance will abuse an innocent third party, if it is deliverable at all.

Many MTAs persisted for a number of years in promiscuously accepting all messages for their domains and sending DSNs later for undeliverable messages. Operating an MTA this way is called a store-and-forward configuration. Once people started using IP blacklists, spammers quickly realized that they could trick MTAs that were not blacklisted into delivering their spam. They would simply address the spam to an undeliverable address at a domain with a good reputation, let's say bogus at aol.com, and put the real target address into the return-path, say victim at poorslob.com. AOL's MTA accepts the message, since it purports to be for an AOL customer. Then it finds it has no mailbox named 'bogus' and sends a bounce message containing the spam to victim at poorslob.com, assuming they were the originator. The MTA at poorslob.com accepts all messages from aol.com, so it accepts and delivers the spam and then blames AOL.
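The reflector scenario above can be sketched as a small simulation. This is only an illustration, not real MTA code: the `Envelope` type, the `LOCAL_MAILBOXES` table, and both function names are hypothetical, and the only protocol facts assumed are that a DSN is sent with an empty (null) return-path and that an unknown recipient can be refused during the SMTP dialogue with a 550 reply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Envelope:
    mail_from: str  # return-path claimed by the sender (easily forged)
    rcpt_to: str    # recipient address


# Hypothetical mailbox table for the receiving domain.
LOCAL_MAILBOXES = {"realuser@aol.com"}


def accept_then_bounce(env: Envelope) -> Optional[Envelope]:
    """Store-and-forward behaviour: accept first, send a DSN later.

    Returns the outbound DSN envelope, or None if delivery succeeded.
    """
    if env.rcpt_to in LOCAL_MAILBOXES:
        return None
    # The DSN goes to the unverified return-path, so whoever was named
    # there receives the bounce (and the spam it carries).
    return Envelope(mail_from="", rcpt_to=env.mail_from)


def reject_at_smtp_time(env: Envelope) -> str:
    """Check the recipient during the SMTP dialogue; this MTA sends no DSN."""
    if env.rcpt_to in LOCAL_MAILBOXES:
        return "250 OK"
    return "550 No such user"


spam = Envelope(mail_from="victim@poorslob.com", rcpt_to="bogus@aol.com")
dsn = accept_then_bounce(spam)
print(dsn.rcpt_to)                # where the reflected spam ends up
print(reject_at_smtp_time(spam))  # refusal, nothing sent to the victim
```

In the first case the receiving MTA itself mails the forged return-path; in the second it never originates a message at all, which is why rejecting during the SMTP session avoids the reflection.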
So the best answer as to why it is inappropriate to bounce spam is that it turns your MTA into a spam reflector, which will properly get you blacklisted for abuse.

--
Seth Goodman

From skip at pobox.com Wed Jan 17 04:18:51 2007
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 16 Jan 2007 21:18:51 -0600
Subject: [spambayes-dev] We could use some Windows help I think...
Message-ID: <17837.38299.391291.73313@montanaro.dyndns.org>

It would appear that all of our Windows programming expertise is more-or-less permanently booked these days. Most of the user questions to the spambayes list seem to be related to Outlook or Outlook Express. Those questions tend to get answered most of the time, but getting a new release tested, built and out the door (one is sorely needed I think) is tough because of the Windows barriers.

I propose we solicit some new Windows programming help from the broader Python community (http://wiki.python.org/moin/VolunteerOpportunities and/or comp.lang.python). Any thoughts about that?

Skip

From mhammond at skippinet.com.au Thu Jan 18 03:34:39 2007
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu, 18 Jan 2007 13:34:39 +1100
Subject: [spambayes-dev] We could use some Windows help I think...
In-Reply-To: <17837.38299.391291.73313@montanaro.dyndns.org>
Message-ID: <044401c73aa9$34557bd0$180a0a0a@enfoldsystems.local>

> It would appear that all of our Windows programming expertise is
> more-or-less permanently booked these days. Most of the user questions
> to the spambayes list seem to be related to Outlook or Outlook Express.
> Those questions tend to get answered most of the time, but getting a
> new release tested, built and out the door (one is sorely needed I
> think) is tough because of the Windows barriers.
>
> I propose we solicit some new Windows programming help from the broader
> Python community (http://wiki.python.org/moin/VolunteerOpportunities
> and/or comp.lang.python). Any thoughts about that?
In general I think that is a great idea. However, my quick scan of the support issues doesn't indicate that a new version would actually reduce the number of problems. I thought that most of the new stuff relates to new features, rather than addressing the common problems people see. I'm happy to help knock up a new release if I'm wrong though. Certainly, any new talent could help to address such issues for a future release.

I have been playing a little more with the new OCR code and Outlook. Sadly, I'm not seeing much of a reduction in image spam. My experience is currently that ocrad is doing a poor job of extracting the text in these spams (even with many options tweaked), but that gocr (as used by SpamAssassin) does a much better job. I haven't managed to run the tests with this new code yet though. The absence of any real interest from others on spambayes-dev doesn't help my motivation levels, so that is yet another good reason to try and get more Windows developers on board :)

Cheers,

Mark

From skip at pobox.com Thu Jan 25 20:42:57 2007
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 25 Jan 2007 13:42:57 -0600
Subject: [spambayes-dev] Rebuild/reinstall website?
Message-ID: <17849.2113.677489.425017@montanaro.dyndns.org>

I corrected some SF website url errors and checked in the relevant pages. When I tried to make the website it complained about not finding html.py. scripts/make.rules has this:

    # docutils 'html.py' script.
    DUHTML = html.py

I tried easy_install docutils but that didn't yield html.py. It looks like rst2html.py is the replacement, so I made that change and ran "make install". I got some errors (files I couldn't upload), but most of it seemed to work okay.

If someone else can give it a try that would help boost my confidence that it isn't just me.

Skip

From skip at pobox.com Sat Jan 27 03:47:11 2007
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 26 Jan 2007 20:47:11 -0600
Subject: [spambayes-dev] Any MoinMoin experts here?
Message-ID: <17850.48431.913754.336466@montanaro.dyndns.org>

Is there anyone here with experience working with the MoinMoin code base? I think using SpamBayes to deflect spam instead of the current BadContent/LocalBadContent approach would be useful. I wrote a couple messages to the moin-users mailing list, but received no responses. (In scanning the archive I don't see my message. Must have disappeared in a black hole.) In case someone's interested, here's what I wrote in my second post:

We all know wikis get spammed. I'm not up-to-speed on the latest versions of MoinMoin, but I think the concept used at least through the 1.3 series (the use of BadContent and LocalBadContent pages) is fundamentally flawed since it relies on the users to manually update "bad" words. You're always trying to catch up with the spammers.

Instead, let me suggest that you incorporate a SpamBayes-based classifier into MoinMoin. I did this recently for a couple other websites I manage (Mojam and Musi-Cal - not wikis). It worked marvelously there. I now reject 100% of the spam submissions and also catch submission mistakes by good users that I would never have caught before.

Here's how I envision it working. Whenever a form submission happens the new page is scored against the current SpamBayes database. If it scores as possible or probable spam, it is automatically reverted back to the last revision that scores as okay, and the full URL for that revision is mailed to all people in AdminGroup. An admin reviews that URL. If it's okay, the URL is added to the HamPages page. If not, it's added to the SpamPages page (both suitably protected for AdminGroup write only and not themselves checked by SpamBayes). Whenever those pages are saved the entire database is retrained from scratch. This should not generally be a problem, as there will probably only be a few pages in the database, so retraining should be quick. It should also be a relatively rare occurrence.
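The score-and-revert step described above might look roughly like the following sketch. Everything here is hypothetical: it uses neither the MoinMoin nor the SpamBayes API, `toy_score` is a crude stand-in for a real classifier, the 0.20 cutoff is only an assumed threshold below which a page counts as okay, and it assumes that the revision mailed to the admins is the suspect one.

```python
from typing import Callable, List

HAM_CUTOFF = 0.20  # assumption: scores below this count as "okay"


def handle_submission(revisions: List[str],
                      new_text: str,
                      score: Callable[[str], float],
                      notify_admins: Callable[[int], None],
                      ham_cutoff: float = HAM_CUTOFF) -> int:
    """Record new_text as a revision and return the index to display.

    Returns -1 if no revision scores as okay (e.g. a brand-new spam page),
    leaving it to the caller to delete the page.
    """
    revisions.append(new_text)
    new_index = len(revisions) - 1
    if score(new_text) < ham_cutoff:
        return new_index  # scores as ham: keep the submission
    # Possible or probable spam: flag the suspect revision for admin review.
    notify_admins(new_index)
    # Revert to the most recent earlier revision that scores as okay.
    for i in range(new_index - 1, -1, -1):
        if score(revisions[i]) < ham_cutoff:
            return i
    return -1


# Toy scorer for illustration only: fraction of words in a fixed word list.
SPAMMY_WORDS = {"viagra", "casino"}


def toy_score(text: str) -> float:
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in SPAMMY_WORDS for w in words) / len(words)


revs = ["Welcome to the wiki front page"]
mailed: List[int] = []
shown = handle_submission(revs, "cheap viagra casino viagra",
                          toy_score, mailed.append)
print(shown, mailed)
```

Passing the scorer and the notifier in as callables keeps the wiki side independent of the classifier, so a trained SpamBayes database could be swapped in for `toy_score` without touching the revert logic.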
If the suspect page was actually ham, after retraining, score it again. It should score as ham now. If so, just revert to it. If not, add it to the HamPages page a second time.

I'm not entirely sure how to handle new pages which are spam, but I think you should be able to automatically DeletePage them, then revive them later if they turn out to be good.

This all said, I can help from the SpamBayes side of things (write the tokenizer, suggest some synthetic tokens that might help improve the discrimination of ham and spam), but I'm not familiar with the MoinMoin code base, certainly not the latest versions. It's unlikely that I could implement it quickly on that side of things. If someone familiar with MoinMoin's code base would like to team up with me on this, let me know. Together we should be able to knock this off very quickly.

Skip

From mhammond at skippinet.com.au Mon Jan 29 03:30:49 2007
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Mon, 29 Jan 2007 13:30:49 +1100
Subject: [spambayes-dev] Rebuild/reinstall website?
In-Reply-To: <17849.2113.677489.425017@montanaro.dyndns.org>
Message-ID: <05b601c7434d$7e0e2ef0$230a0a0a@enfoldsystems.local>

> I tried easy_install docutils but that didn't yield html.py. It looks
> like rst2html.py is the replacement, so I made that change and ran
> "make install". I got some errors (files I couldn't upload), but most
> of it seemed to work okay.

I did find the need for the rst2html change, but as I mentioned in December, I failed to build for all kinds of reasons I didn't dig into and can't recall the exact errors I had. I'd assumed they were more to do with Windows/Cygwin etc, so didn't dig deeper.

> If someone else can give it a try that would help boost my confidence
> that it isn't just me.

I didn't get any responses to that Dec 22 mail asking for a Linux-type person to give it a go, so I'd suspect that it *is* just you, but only because you are the only one trying to do it ;)

Cheers,

Mark