[Mailman-Users] A scrubber issue

Tokio Kikuchi tkikuchi at is.kochi-u.ac.jp
Sun Dec 10 14:30:18 CET 2006


Todd Zullinger wrote:
> I wrote:
>> Tokio Kikuchi wrote:
> [...]
>>> But as for the problem that the default charset is 'us-ascii': if
>>> we put the parts together, text in some languages (like Japanese)
>>> becomes irreversibly unreadable.  It is safe to keep it in a
>>> separate file if you can't archive the whole message as multipart,
>>> as in Pipermail.
>> Okay, that's understandable.
> 
> Just another thought (because I realize now that I don't understand
> this as well as I thought at first :)...
> 
> Are you saying there are messages which would lack a charset in a
> content-type header and include Japanese text?  I wouldn't think they
> would be valid if they didn't.  But I may not understand the types of
> message structures you mean.
> 
> If the email parsing were to assume that lacking a content-type header
> the part should be assumed to be text/plain and us-ascii, would this
> break valid messages or only invalid ones (not that invalid ones could
> necessarily be ignored, particularly if they were a significant
> portion of the messages seen in reality :).

An RFC 822 email message without a charset parameter may be assumed to
be us-ascii.  For a text attachment, however, that assumption may or may
not hold.  For example, a Mailman patch file covering the i18n message
directories will contain mixed charsets, such as iso-8859-1 for the fr
directory and euc-jp for ja.  If you assume the charset is us-ascii and
build the archive with ? substituted for unprintable characters, the
patch file can no longer be applied.  You are always safe if you save
the text file as is in a separate attachment directory.
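
To illustrate the difference, here is a minimal sketch in Python.  It is
not Mailman's actual Scrubber code: the function names, the sample
string, and the file name are all made up for the example.  It only
shows that forcing us-ascii with ? substitution throws bytes away, while
writing the attachment bytes to a file unchanged keeps them recoverable.

```python
import os
import tempfile


def flatten_as_ascii(raw: bytes) -> bytes:
    """Hypothetical lossy path: treat the bytes as us-ascii and
    substitute '?' for everything that is not valid ASCII."""
    text = raw.decode("us-ascii", errors="replace")
    return text.encode("us-ascii", errors="replace")


def save_verbatim(raw: bytes, directory: str, name: str) -> str:
    """Hypothetical safe path: write the attachment bytes unchanged
    into a separate directory and return the file path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(raw)
    return path


# Made-up sample standing in for a line of a ja message catalog patch.
sample_text = 'msgstr "メッセージ"\n'
raw = sample_text.encode("euc-jp")

# Lossy: the Japanese bytes are gone and cannot be reconstructed.
flattened = flatten_as_ascii(raw)

# Lossless: the saved file still decodes once the real charset is known.
with tempfile.TemporaryDirectory() as attachment_dir:
    path = save_verbatim(raw, attachment_dir, "mailman-ja.patch")
    with open(path, "rb") as f:
        restored = f.read()

print(flattened)
print(restored.decode("euc-jp"))
```

The flattened copy keeps only the ASCII parts of the line, so a patch
built from it would not apply; the verbatim copy is byte-for-byte the
original.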

In short, a text attachment is not an email message.

> 
> I'd be grateful if you could enlighten me on this.
> 
> Thanks,



-- 
Tokio Kikuchi, tkikuchi at is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/

