[Spambayes] defaults vs. chi-square

T. Alexander Popiel popiel@wolfskeep.com
Mon, 14 Oct 2002 15:09:02 -0700


In message:  <LNBBLJKPBEHFEDALKOLCKEJEBLAB.tim.one@comcast.net>
             Tim Peters <tim.one@comcast.net> writes:
>[Tim]
>>> An odd thing is that you must have a lot of 'skip:z 70' (etc)
>>> tokens in your ham too, else these spamprobs wouldn't be so small.
>>>  Any idea where they come from?
>
>[T. Alexander Popiel]
>> I'm not sure offhand, either.  I'd have to work to track it down,
>> though... and as mentioned earlier, today is a lazy day.  My best
>> guess is a few base64 bits that didn't get decoded properly.
>
>I cater to lazy:  you had a bunch of them in the very spams you were talking
>about.  What does the source for those look like?  I *used* to get a bunch
>of these before we started stripping uuencoded sections, but that shouldn't
>be happening anymore -- unless the uuencode-finding regexp is missing a
>pattern that's common in your data but not in mine.  Or unless the message
>headers are damaged to such an extent that the email package barfs on them
>(in which case we fall back to the raw body text).

It appears to be a systematic error that occurs when a mailing
list manager appends plain text to what should be a purely
base64-encoded body.  Bad MLM, no biscuit.  This confuses the
MIME decoder, so the body never gets decoded.  Bad MIME decoder,
too!

As a sample:

"""
Return-Path: bounce-debian-java=popiel=wolfskeep.com@lists.debian.org
Delivery-Date: Fri, 23 Aug 2002 02:56:21 -0700
Return-Path: <bounce-debian-java=popiel=wolfskeep.com@lists.debian.org>
Delivered-To: popiel@wolfskeep.com
Received: from murphy.debian.org (murphy.debian.org [65.125.64.134])
	by cashew.wolfskeep.com (Postfix) with SMTP id 0EAFBF58E
	for <popiel@wolfskeep.com>; Fri, 23 Aug 2002 02:56:21 -0700 (PDT)
Received: (qmail 29739 invoked by uid 38); 23 Aug 2002 09:37:09 -0000
X-Envelope-Sender: ybqiwbt@t-online.de
Received: (qmail 29162 invoked from network); 23 Aug 2002 09:36:55 -0000
Received: from adsl-065-081-092-098.sip.gsp.bellsouth.net (HELO xpfoncv) (65.81.92.98)
  by murphy.debian.org with SMTP; 23 Aug 2002 09:36:55 -0000
From: Cagdas Burhansan31 <ybqiwbt@t-online.de>
To: <debian-java@lists.debian.org>
Subject: Arşiv hazır
Date: Fri, 23 Aug 2002 10:33:48 -0400
X-Mailer: Microsoft Outlook Express 5.50.4133.2400
Content-Type: text/plain
Content-Transfer-Encoding: base64
Message-Id: <ydxtiqsklccg@lists.debian.org>
X-Spam-Status: No, hits=0.0 required=4.7 tests= version=2.01
Resent-Message-ID: <X9GjoB.A.nPH.DJgZ9@murphy>
Resent-From: debian-java@lists.debian.org
X-Mailing-List: <debian-java@lists.debian.org> archive/latest/2709
X-Loop: debian-java@lists.debian.org
List-Post: <mailto:debian-java@lists.debian.org>
List-Help: <mailto:debian-java-request@lists.debian.org?subject=help>
List-Subscribe: <mailto:debian-java-request@lists.debian.org?subject=subscribe>
List-Unsubscribe: <mailto:debian-java-request@lists.debian.org?subject=unsubscribe>
Precedence: list
Resent-Sender: debian-java-request@lists.debian.org
Resent-Date: Fri, 23 Aug 2002 02:56:21 -0700 (PDT)

DQpUck1lbG9kaSwgS/1y/WsgbGlua2xpIOdhbP3+bWF5YW4gdmUgYmlydGVrIG1wMyD8IGlu
ZGlyaXJrZW4gYmlsZSBpbnNhbmxhcv0ga2FocmVkZW4gc/Z6ZGUgbXAzIHNpdGVsZXJpbmUg
YWx0ZXJuYXRpZiANCm9sYXJhayBzaXpsZXIgaedpbiD2emVubGUgaGF6/XJsYW5t/f50/XIu
IEhlciB5Yf50YW4gaGVyIGtlc2ltZGVuIG38emlrc2V2ZXJlIGhpdGFwIGVkZWJpbG1layBp
52luIHRhc2FybGFubf3+IDEzIEdCIA0KbP1rIGRldiBNcDMgbGlzdGVzaXlsZSBz/W79Zv1u
ZGEgcmFraXBzaXogb2xhY2FrIP5la2lsZGUgZG9uYXT9bG39/iB2ZSBzaXogbfx6aWtzZXZl
cmxlcmluIGhpem1ldGluZSBzdW51bG11/nR1ci4gDQpodHRwOi8vd3d3LnRybWVsb2RpLmNv
bSBhZHJlc2luZGVraSBkZXYgYXL+aXZpbWl6ZGUgc2l6aSBiZWtsZXllbiBlbiBzZXZkafBp
bml6IHNhbmF05/1sYXL9biBlbiBzZXZkafBpbml6IA0K/mFya/1sYXL9bv0gYmlya2HnIGRh
a2lrYSBp52luZGUgYmlsZ2lzYXlhcv1u/XphIGluZGlyaW4gdmUga2V5aWZsZSBkaW5sZW1l
eWUgYmH+bGF5/W4uIA0KDQrdeWkgRfBsZW5jZWxlci4uIA0KaHR0cDovL3d3dy50cm1lbG9k
aS5jb20NCg0KDQoNCg0K


-- 
To UNSUBSCRIBE, email to debian-java-request@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
"""

>Whatever the cause, if it's a systematic problem in your data, it will be
>for others too.  It may be unique to Perl programmers, though <wink>.

Nope.  In this case, it's Java programmers. ;-)

- Alex