[spambayes-dev] possible new spam header clue

Seth Goodman nobody at spamcop.net
Mon Jan 5 13:50:53 EST 2004


FWIW, here is something that appeared in the spamtools at list.abuse.net
discussion forum.  It lists a possible new spam header clue that appears to
work and is simple:  a "Content-Type:" header _cannot_ end in a semicolon.
For some reason, it does in some spam mailers.  Follow-up posts in that list
indicated that other people tried the test on their MX's and it caught only
spam.  The only false positive reported was when someone accidentally parsed
a "header" that was included as part of the text of a message and was
neither an actual header nor a MIME sub-part header.  Outlook seems to keep
the main header "Content-Type:" lines intact, but I don't know if Outlook
munges the MIME sub-part headers.  In any case, it would still work with
other mail clients.

Below is the original message describing the trick and a follow-up message
showing the RFC origins of the rule.

--
Seth Goodman

  Humans:   off-list replies to sethg [at] GoodmanAssociates [dot] com

  Spambots: disregard the above

-----Original Message-----
From: owner-spamtools at lists.abuse.net
[mailto:owner-spamtools at lists.abuse.net]On Behalf Of Ronald F. Guilmette
Sent: Wednesday, December 31, 2003 12:34 PM
To: spamtools at abuse.net
Subject: [spamtools] Possible new spam stigmata (subtle MIME syntax
botch)


I use a rather ancient and, some would say, archaic kind of mail client to
read my mail.  It's called `mh', or rather `nmh' (new mh), even though it
really isn't that new.

Anyway, I have noticed for some time now that just before it displays
various spam messages that have been sent to me, such as the spam
message that is attached at the end of this message, it first prints
the following warning message, which I never really paid any attention
to, up until today:

  mhshow: extraneous trailing ';' in message 107's Content-Type: parameter list

Anyway, I have noted a distinct connection between this nmh warning message
and spam.

So anyway, I'd like you all to take a look at the spam message attached
below.  Please pay particular attention to the _second_ Content-Type: header,
i.e. the one that is present within the body of the message.  Note the
trailing semicolon.

OK, so anyway, I went and looked up the standard syntax for the Content-Type
header in RFC 1521, and sure enough it indicates that semicolons should
only be used to _separate_ a preceding hunk of info from a following
name=value pair.  So according to RFC 1521 at least, nmh would appear to
be correct in diagnosing a syntactically improper trailing semicolon in
this case.

I haven't really done any serious investigation of the possible correlation
between this particular MIME syntax faux pas and spam, but as I say, my
recollection is that I have _only_ ever seen the nmh warning message that I
mentioned above in connection with spam, and never in connection with any
non-spam messages.

I just thought that you'd all like to know.


============================================================================
Return-Path: s at msn.com
Delivery-Date: Wed Dec 31 02:03:38 2003
Return-Path: <s at msn.com>
Delivered-To: root at monkeys.com
Received: from 62-249-197-98.adsl.entanet.co.uk
(62-249-197-98.adsl.entanet.co.uk [62.249.197.98])
	by segfault.monkeys.com (Postfix) with SMTP
	id 1590A42000; Wed, 31 Dec 2003 02:03:35 -0800 (PST)
Received: from [48.219.208.201]
	by 62-249-197-98.adsl.entanet.co.uk SMTP id P1E098It9n3lbU;
	Wed, 31 Dec 2003 23:02:05 -0100
Message-ID: <2o$gponz6ky$e-$n$2x$s66i at x86ypmt.1h2>
From: "Forrest Estrada" <s at msn.com>
Reply-To: "Forrest Estrada" <s at msn.com>
To: rfg at monkeys.com
Cc: <ass at monkeys.com>, <formmail at monkeys.com>,
	<webmaster at monkeys.com>, <scopes at monkeys.com>
Subject: save on blue vye-pills-ak-pills-rah y p  knjafplh
Date: Wed, 31 Dec 03 23:02:05 GMT
X-Mailer: MIME-tools 5.503 (Entity 5.501)
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="5EB34CBDCC34897"
X-Priority: 3
X-MSMail-Priority: Normal


--5EB34CBDCC34897
Content-Type: text/html;
Content-Transfer-Encoding: quoted-printable

denmark sisal vivacity<br>

<H4>Guys save money and have fun in the bedroom
with our blue pills.</H4><br>

<a href=3D"http://www.tabletok.com/index.php?pid=3Devaph1543"><H3>This way=
 for ((vy-ak-rah))</H3></a><br>

woodhen reach barbell<br>rjapnbwf
dumqbynnzcqlmgbvx g lbuevaldxb co fpoav
fdw tracg m
qw msdth
g vsdxjxc  v l mznwe p
  wt

--5EB34CBDCC34897--




-----Original Message-----
From: owner-spamtools at lists.abuse.net
[mailto:owner-spamtools at lists.abuse.net]On Behalf Of Clive D.W. Feather
Sent: Friday, January 02, 2004 10:26 AM
To: spamtools at lists.abuse.net
Subject: Re: [spamtools] Possible new spam stigmata (subtle MIME syntax)


Bruce Gingery said:
> Clive D.W. Feather responded:
>> Ronald F. Guilmette said:
>>>   mhshow: extraneous trailing ';' in message 107's Content-Type:
>>>   parameter list
>   IIRC, they're not illegal for standards compliance.

Yes they are. RFC 2045:

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive.

     subtype := extension-token / iana-token

     parameter := attribute "=" value

     value := token / quoted-string

Token can't contain a semicolon, and quoted-string requires quotes around
it. So a Content-Type header can't end in a semicolon.
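
As a rough illustration of the grammar argument above, here is a minimal
Python sketch (not from the original thread) that checks a raw Content-Type
value against the quoted RFC 2045 productions.  The function name
matches_rfc2045_grammar and the regular expressions are assumptions made for
this example.  It tolerates whitespace around ";" and "=", ignores RFC 822
comments in parentheses, and is stricter than the trailing-semicolon test
discussed here, since it also rejects other malformed values.

```python
import re

# token: printable ASCII except SPACE, CTLs and the RFC 2045 tspecials.
TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+"
# quoted-string: double quotes around text, with backslash escapes.
QUOTED = r'"(?:[^"\\\r\n]|\\.)*"'
# parameter := attribute "=" value, where value := token / quoted-string
PARAM = rf"{TOKEN}\s*=\s*(?:{TOKEN}|{QUOTED})"
# content value := type "/" subtype *(";" parameter)
VALUE = re.compile(rf"\s*{TOKEN}/{TOKEN}(?:\s*;\s*{PARAM})*\s*")


def matches_rfc2045_grammar(value: str) -> bool:
    """Return True if a raw Content-Type value fits the grammar above."""
    # Unfold continuation lines (a line break followed by space or tab).
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", value)
    return VALUE.fullmatch(unfolded) is not None


print(matches_rfc2045_grammar("text/plain; charset=us-ascii"))  # True
print(matches_rfc2045_grammar("text/html;"))                    # False
```

Every ";" must be followed by a complete attribute=value pair, so a value
that ends in ";" can never match.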

>> One word of caution: I did a scan of my own mail for this pattern, and
>> found one false positive.
>   I see one example in a Spam-L posting body content, following
>   a "---------- Forwarded message ----------" by Alan Brown, on
>   Wed, 19 Feb 2003 10:09:56 -0500.  Note that this was in content
>   contextually (but unmarked) quoted into the posting, and not part
>   of the posted headers.

As was mine.

I've now implemented two separate tests, one for a Content-Type header
ending in semicolon in the headers, and one for the same in MIME subpart
headers (though it only checks the first level).

89 catches so far today, no false positives.
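
The two tests just described could be sketched roughly as follows, using
Python's standard email package.  This is only an illustration under stated
assumptions, not the implementation used in the thread: the helper names
ends_in_semicolon and content_type_clues are hypothetical, and the sample
message is a trimmed version of the spam quoted earlier with a placeholder
sender address.  Because the message is parsed as MIME first, header-like
text that merely appears inside a body is never examined, which avoids the
one false positive reported.

```python
from email import message_from_string
from email.message import Message


def ends_in_semicolon(value) -> bool:
    """True if a raw Content-Type value ends in ';' (trailing space ignored)."""
    return value is not None and str(value).rstrip().endswith(";")


def content_type_clues(msg: Message):
    """Return (top_level_hit, subpart_hit) for a parsed message.

    Subparts are checked one level deep only, as in the description above.
    """
    top = ends_in_semicolon(msg.get("Content-Type"))
    sub = False
    if msg.is_multipart():
        sub = any(ends_in_semicolon(part.get("Content-Type"))
                  for part in msg.get_payload())
    return top, sub


SAMPLE = (
    "From: sender@example.invalid\n"   # placeholder address (assumption)
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/alternative;\n"
    '\tboundary="5EB34CBDCC34897"\n'
    "\n"
    "--5EB34CBDCC34897\n"
    "Content-Type: text/html;\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "denmark sisal vivacity<br>\n"
    "\n"
    "--5EB34CBDCC34897--\n"
)

print(content_type_clues(message_from_string(SAMPLE)))  # (False, True)
```

Keeping the two results separate lets a filter weight a hit in the main
headers differently from a hit in a subpart header.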

>   I've been counting semicolons in raw reassembled Content-Type:
>   headers.  I have yet to have a false positive with more than 4

What false positives have you had in *headers*?

>   I have, however, trapped many-semicolons at other than end-of-statement.

Where they're legal.

>   I have also noted "many semicolons" at end-of-line on To: on at least
>   one spam.

Note that we're only looking at Content-Type.

>> Content-Type: multipart/related;
>>        type="multipart/alternative";
>>        boundary="----=_NextPart_[base64data one]"

I wouldn't trap this one with this test.

>> Content-Type:
>>  text/plain[274-semicolons]
>  ^single space

But I would get this.

>> Content-Type:
>>  text/plain[1981 semicolons]

and this.

> and within a spam HTML body from Taiwan
>> span lang="EN-US" style="font-family:&quot;Courier New&quot;;

I think you're missing the point.

--
Clive D.W. Feather  | Work:  <clive at demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive at davros.org>  | *** NOTE CHANGE ***
Demon Internet      | WWW: http://www.davros.org | Fax:    +44 870 051 9937
Thus plc            |                            | Mobile: +44 7973 377646



