From prvs=339cb9f11=rmit.test2120 at gmail.com Thu Jun 22 01:07:02 2017
From: prvs=339cb9f11=rmit.test2120 at gmail.com (prvs=339cb9f11=rmit.test2120 at gmail.com)
Date: Thu, 22 Jun 2017 01:07:02 -0400 (EDT)
Subject: [Spambayes] (no subject)
Message-ID: <3wtV1f34gZzFqZM@mail.python.org>

>From spambayes-bounces+jm=jmason.org at python.org Mon Nov 25 20:49:29 2002
Return-Path:
Delivered-To: yyyy at localhost.spamassassin.taint.org
Received: from localhost (jalapeno [127.0.0.1]) by jmason.org (Postfix) with ESMTP id 4C55516F17 for ; Mon, 25 Nov 2002 20:49:29 +0000 (GMT)
Received: from jalapeno [127.0.0.1] by localhost with IMAP (fetchmail-5.9.0) for jm at localhost (single-drop); Mon, 25 Nov 2002 20:49:29 +0000 (GMT)
Received: from mail.python.org (mail.python.org [12.155.117.29]) by dogma.slashnull.org (8.11.6/8.11.6) with ESMTP id gAPKT7W23898 for ; Mon, 25 Nov 2002 20:29:07 GMT
Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.05) id 18GPsJ-0004FX-02 for jm at jmason.org; Mon, 25 Nov 2002 15:30:35 -0500
Received: from manatee.mojam.com ([199.249.165.175]) by mail.python.org with esmtp (Exim 4.05) id 18GPsE-0004AT-00 for spambayes at python.org; Mon, 25 Nov 2002 15:30:30 -0500
Received: from montanaro.dyndns.org (kansas.rd.imagescape.com [66.100.151.131]) by manatee.mojam.com (8.12.1/8.12.1) with ESMTP id gAPKUTM3027054; Mon, 25 Nov 2002 14:30:29 -0600
Received: from montanaro.dyndns.org (localhost [127.0.0.1]) by montanaro.dyndns.org (8.12.2/8.12.2) with ESMTP id gAPKUY2v004608; Mon, 25 Nov 2002 14:30:34 -0600 (CST)
Received: by montanaro.dyndns.org (8.12.2/8.12.2/Submit) id gAPKUXGM004605; Mon, 25 Nov 2002 14:30:33 -0600 (CST)
From: rmit.test2120 at gmail.com
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <15842.34920.583131.72086 at montanaro.dyndns.org>
Date: Mon, 25 Nov 2002 14:30:32 -0600
To: richie at entrian.com
To: vijay.anantuni at prov-test.gapps.rmit.edu.au
Subject: Re: [Spambayes] Current version
In-Reply-To:
References: <16E1010E4581B049ABC51D4975CEDB88619957 at UKDCX001.uk.int.atosorigin.com>
X-Mailer: VM 7.07 under 21.5 (beta9) "brussels sprouts" XEmacs Lucid
Cc: spambayes at python.org
X-Beenthere: spambayes at python.org
X-Mailman-Version: 2.1b5
Precedence: list
Reply-To: skip at pobox.com
List-Id: Discussion list for Pythonic Bayesian classifier
List-Help:
List-Post:
List-Subscribe: ,
List-Archive:
List-Unsubscribe: ,
Sender: spambayes-bounces+yyyy=spamassassin.taint.org at python.org
Errors-To: spambayes-bounces+yyyy=spamassassin.taint.org at python.org

Richie> As I understand it, post-1.8x versions of the core bsddb code
Richie> ship under the Sleepycat license, which demands that projects
Richie> using it must be published-source.

Don't use bsddb in a closed-source product. Use dbm or dumbdbm or use
pickles or roll your own thang. I doubt the presence of bsddb would be
the only barrier to creating a closed-source product based upon the
spambayes code.
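
For illustration only, here is a minimal sketch of two of the bsddb-free options named above, pickles and dumbdbm. It is not taken from the spambayes code: the helper names and the token -> (spam count, ham count) layout are hypothetical, and it assumes Python 3, where the old dumbdbm module is spelled dbm.dumb.

```python
import dbm.dumb
import pickle


def save_counts_pickle(counts, path):
    """Write the whole token -> (spam_count, ham_count) dict as one pickle."""
    with open(path, "wb") as fh:
        pickle.dump(counts, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_counts_pickle(path):
    """Read back a dict written by save_counts_pickle."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def save_counts_dumbdbm(counts, path):
    """Store one record per token in a pure-Python dbm.dumb database.

    dbm.dumb keeps keys and values as bytes, so each (spam, ham) pair is
    pickled individually. The "n" flag always starts a new, empty database.
    """
    with dbm.dumb.open(path, "n") as db:
        for token, pair in counts.items():
            db[token.encode("utf-8")] = pickle.dumps(pair)


def load_counts_dumbdbm(path):
    """Read every record back into an ordinary dict."""
    with dbm.dumb.open(path, "r") as db:
        return {key.decode("utf-8"): pickle.loads(db[key]) for key in db.keys()}
```

The single pickle is the simplest choice but has to be loaded and rewritten whole; the dbm.dumb variant allows per-token reads and updates at the cost of speed. Only load pickles that you wrote yourself, since unpickling untrusted data can execute arbitrary code.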
Skip

_______________________________________________
Spambayes mailing list
Spambayes at python.org
http://mail.python.org/mailman/listinfo/spambayes

From prvs=3390eebf4=rmit.test3283 at gmail.com Thu Jun 22 01:06:06 2017
From: prvs=3390eebf4=rmit.test3283 at gmail.com (prvs=3390eebf4=rmit.test3283 at gmail.com)
Date: Thu, 22 Jun 2017 01:06:06 -0400 (EDT)
Subject: [Spambayes] (no subject)
Message-ID: <3wtV0Z5rWnzFqZM@mail.python.org>

>From quinlan at pathname.com Thu Oct 10 12:29:12 2002
Return-Path:
Delivered-To: yyyy at localhost.example.com
Received: from localhost (jalapeno [127.0.0.1]) by jmason.org (Postfix) with ESMTP id 4B24416F03 for ; Thu, 10 Oct 2002 12:29:11 +0100 (IST)
Received: from jalapeno [127.0.0.1] by localhost with IMAP (fetchmail-5.9.0) for jm at localhost (single-drop); Thu, 10 Oct 2002 12:29:11 +0100 (IST)
Received: from proton.pathname.com (adsl-216-103-211-240.dsl.snfc21.pacbell.net [216.103.211.240]) by dogma.slashnull.org (8.11.6/8.11.6) with ESMTP id g9A4kRK08872 for ; Thu, 10 Oct 2002 05:46:27 +0100
Received: from quinlan by proton.pathname.com with local (Exim 3.35 #1 (Debian)) id 17zVDy-0006cM-00; Wed, 09 Oct 2002 21:47:02 -0700
To: yyyy at example.com (Justin Mason)
To: vijay.anantuni at prov-test.gapps.rmit.edu.au
Cc: SpamAssassin-talk at example.sourceforge.net, SpamAssassin-devel at lists.sourceforge.net, Steve Atkins , ion at aueb.gr, donatespam at archub.org, spambayes at python.org
Subject: Re: [SAdev] fully-public corpus of mail available
References: <20021009122116.6EB2416F03 at example.com>
From: rmit.test3283 at gmail.com
Date: 09 Oct 2002 21:47:02 -0700
In-Reply-To: yyyy at example.com's message of "Wed, 09 Oct 2002 13:21:11 +0100"
Message-Id:
Lines: 40
X-Mailer: Gnus v5.7/Emacs 20.7
X-Spam-Status: No, hits=-118.9 required=5.0
    tests=AWL,IN_REP_TO,QUOTED_EMAIL_TEXT,REFERENCES,
    REPLY_WITH_QUOTES,T_NONSENSE_FROM_00_10,
    T_NONSENSE_FROM_10_20,T_NONSENSE_FROM_20_30,
    T_NONSENSE_FROM_30_40,T_NONSENSE_FROM_40_50,
    T_NONSENSE_FROM_50_60,T_NONSENSE_FROM_60_70,
    T_NONSENSE_FROM_70_80,T_NONSENSE_FROM_80_90,
    T_NONSENSE_FROM_90_91,T_NONSENSE_FROM_91_92,
    T_NONSENSE_FROM_92_93,T_NONSENSE_FROM_93_94,
    T_NONSENSE_FROM_94_95,T_NONSENSE_FROM_95_96,
    T_NONSENSE_FROM_96_97,T_NONSENSE_FROM_97_98,
    T_NONSENSE_FROM_98_99,T_NONSENSE_FROM_99_100,
    T_QUOTED_EMAIL_TEXT,USER_AGENT_GNUS_XM
    version=2.50-cvs
X-Spam-Level:

> (Please feel free to forward this message to other possibly-interested
> parties.)

Some caveats (in descending order of concern):

1. These messages could end up being falsely (or incorrectly) reported
   to Razor, DCC, Pyzor, etc. Certain RBLs too. I don't think the
   results for these distributed tests can be trusted in any way,
   shape, or form when running over a public corpus.

2. These messages could also be submitted (more than once) to projects
   like SpamAssassin that rely on filtering results submission for GA
   tuning and development.

3. Spammers could adopt elements of the good messages to throw off
   filters. And, of course, there's always progression in technology
   (by both spammers and non-spammers).

The second problem could be alleviated somewhat by adding a Nilsimsa
signature (or similar) to the mass-check file (the results format used
by SpamAssassin) and giving the message files unique names (MD5 or
SHA-1 of each file).

The third problem doesn't really worry me.

These problems (and perhaps others I have not identified) are unique
to spam filtering. Compression corpuses and other performance-related
corpuses have their own set of problems, of course.

In other words, I don't think there's any replacement for having
multiple independent corpuses. Finding better ways to distribute
testing and collate results seems like a more viable long-term
solution (and I'm glad we're working on exactly that for
SpamAssassin). If you're going to seriously work on filter
development, building a corpus of 10000-50000 messages (half spam/half
non-spam) is not really that much work. If you don't get enough spam,
creating multi-technique spamtraps (web, usenet, replying to spam) is
pretty easy. And who doesn't get thousands of non-spam every week? ;-)

Dan
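
As a rough illustration of the "unique names (MD5 or SHA-1 of each file)" idea above, the following sketch copies a directory of message files under names derived from a digest of their contents. The function names and directory layout are hypothetical, not part of SpamAssassin or its mass-check tooling, and the Nilsimsa signature is not covered because it is not in the Python standard library.

```python
import hashlib
import os
import shutil


def content_digest(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of a file's bytes, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_unique_names(src_dir, dst_dir, algorithm="sha1"):
    """Copy each regular file in src_dir to dst_dir, named by its digest.

    Returns a dict mapping each original file name to its digest name.
    Byte-identical messages get the same name, so they are stored once.
    """
    os.makedirs(dst_dir, exist_ok=True)
    mapping = {}
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue
        new_name = content_digest(src, algorithm)
        dst = os.path.join(dst_dir, new_name)
        if not os.path.exists(dst):
            shutil.copyfile(src, dst)
        mapping[name] = new_name
    return mapping
```

Because the name depends only on the bytes of the message, the same message submitted twice maps to the same name, which makes exact duplicate submissions easy to spot. It does not catch near-duplicates; that is the gap a fuzzy signature such as Nilsimsa would have to fill.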