[Mailman-Users] A scrubber issue

Sat Dec 9 20:40:12 CET 2006

Todd Zullinger wrote:
>
>Related to the second part of Werner's message being scrubbed with the
>message:
>
>    An embedded and charset-unspecified text was scrubbed...
>
>Poking in the email package (on python 2.4.4) shows:
>
>    def get_content_charset(self, failobj=None):
>        """Return the charset parameter of the Content-Type header.
>
>        The returned string is always coerced to lower case.  If there is no
>        Content-Type header, or if that header has no charset parameter,
>        failobj is returned.
>        """
>
>This seems to violate section 5.2 of RFC 2045 which says parts lacking
>a Content-type header should be assumed to be text/plain with a
>charset of us-ascii.  The get_content_type method in email.Message
>does mention RFC 2045 and uses text/plain if the content-type is
>invalid.

It does seem inconsistent, but I don't think we can call it a violation
of the RFC yet; it depends on what the caller does with the result.

>Would it be appropriate to set failobj="us-ascii" when
>calling this method in Scrubber.py?

It might be, but I'd like to hear from Tokio first.
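
To make the behavior under discussion concrete, here is a minimal sketch
using the standard library email package. It is written against a current
Python rather than 2.4.4, on the assumption that get_content_charset()
still takes the same failobj argument. The sample message and the variable
names are made up for illustration; this is not code from Scrubber.py.

```python
from email import message_from_string

# A made-up two-part message. The second part has no headers at all,
# so it carries neither a Content-Type nor a charset parameter.
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XXX"\n'
    "\n"
    "--XXX\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "first part\n"
    "--XXX\n"
    "\n"
    "second part, no Content-Type header at all\n"
    "--XXX--\n"
)

msg = message_from_string(raw)
first, second = msg.get_payload()

# A declared charset comes back lowercased.
first_charset = first.get_content_charset()

# get_content_type() falls back to the RFC 2045 default type...
second_type = second.get_content_type()

# ...but get_content_charset() returns failobj (None unless overridden).
second_charset = second.get_content_charset()

# The change proposed above: let the caller supply the RFC default.
second_charset_default = second.get_content_charset(failobj="us-ascii")

print(first_charset, second_type, second_charset, second_charset_default)
```

With failobj left at its default, the caller has to decide what a missing
charset means; passing "us-ascii" moves the RFC 2045 default into the call.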

Clearly this was considered at one point as a specific case, and a
message exists for it, where it would have been simpler to just assume
us-ascii. Thus, I think there must be messages in the wild with parts
whose character sets are unspecified but aren't actually us-ascii.
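
As a rough illustration of why blindly assuming us-ascii can go wrong,
the sketch below tries the assumed charset and reports whether the bytes
actually decoded cleanly. The helper name decode_with_assumed_charset and
its fallback policy are hypothetical; this is not what Scrubber.py does.

```python
def decode_with_assumed_charset(payload, charset=None):
    """Hypothetical helper: decode bytes, assuming us-ascii if charset is None.

    Returns (text, ok). ok is False when the payload did not decode in the
    assumed charset, in which case undecodable bytes are replaced with
    U+FFFD.
    """
    assumed = charset or "us-ascii"
    try:
        return payload.decode(assumed), True
    except (UnicodeDecodeError, LookupError):
        # Either the bytes are not valid in the assumed charset or the
        # charset name is unknown; fall back to a lossy us-ascii decode.
        return payload.decode("us-ascii", "replace"), False


# Pure ASCII decodes cleanly under the RFC 2045 default.
print(decode_with_assumed_charset(b"plain text"))

# A Latin-1 byte in a part with no declared charset does not.
print(decode_with_assumed_charset(b"caf\xe9"))
```

A part like the second example is the kind of message that would explain
why the unspecified-charset case was handled separately.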

-- 
Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan