[Mailman-Users] A scrubber issue
Mark Sapiro
msapiro at value.net
Sat Dec 9 20:40:12 CET 2006
Todd Zullinger wrote:
>
>Related to the second part of Werner's message being scrubbed with the
>message:
>
> An embedded and charset-unspecified text was scrubbed...
>
>Poking in the email package (on python 2.4.4) shows:
>
>    def get_content_charset(self, failobj=None):
>        """Return the charset parameter of the Content-Type header.
>
>        The returned string is always coerced to lower case.  If there is no
>        Content-Type header, or if that header has no charset parameter,
>        failobj is returned.
>        """
>
>This seems to violate section 5.2 of RFC 2045 which says parts lacking
>a Content-type header should be assumed to be text/plain with a
>charset of us-ascii. The get_content_type method in email.Message
>does mention RFC 2045 and uses text/plain if the content-type is
>invalid.
It does seem inconsistent, but I don't think we can call it a violation
of the RFC yet; that depends on what the caller does with the result.
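To make the inconsistency concrete, here is a minimal sketch using the
standard library email package as shipped with current Python 3 (an
assumption; the thread concerns Python 2.4.4, though the two methods
behave the same way there as far as I know). The sample message is
made up for illustration.

```python
from email import message_from_string

# A made-up message with no Content-Type header at all.
raw = "Subject: test\n\nplain body\n"
msg = message_from_string(raw)

# get_content_type() falls back to the RFC 2045 default type ...
content_type = msg.get_content_type()

# ... but get_content_charset() just hands back failobj (None by default).
charset = msg.get_content_charset()

print(content_type, charset)
```

So the type is defaulted for the caller, while the charset default is
left to the caller.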
>Would it be appropriate to set failobj="us-ascii" when
>calling this method in Scrubber.py?
It might be, but I'd like to hear from Tokio first.
Clearly this was considered at one point: a specific case and message
exist for it, when it would have been simpler to just assume
us-ascii. Thus, I think there must be messages in the wild whose parts
have unspecified character sets that aren't us-ascii.
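For discussion, a rough sketch of what passing failobj="us-ascii" could
look like, with a guard for parts that turn out not to be us-ascii.
This is not the Scrubber.py code: decode_part is a hypothetical helper,
and replacing undecodable bytes is only one possible policy, assumed
here for illustration.

```python
from email import message_from_bytes

def decode_part(part, default_charset="us-ascii"):
    """Hypothetical helper, not Mailman code: decode a text part,
    assuming default_charset when no charset parameter is present."""
    charset = part.get_content_charset(failobj=default_charset)
    payload = part.get_payload(decode=True) or b""
    try:
        return payload.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # The part was not really in the assumed (or a known) charset.
        # Assumed policy: keep the ASCII bytes and replace the rest.
        return payload.decode("us-ascii", errors="replace")

# Made-up message: no Content-Type header, body contains a Latin-1 byte.
raw = b"Subject: test\n\ncaf\xe9\n"
text = decode_part(message_from_bytes(raw))
print(repr(text))
```

A part like the one above is exactly the case in question: defaulting
to us-ascii alone would fail on it, so the caller still needs some
fallback.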
--
Mark Sapiro <msapiro at value.net> The highway is for gamblers,
San Francisco Bay Area, California better use your sense - B. Dylan