[Mailman-Users] Chinese characters spam filter?

Mark Sapiro mark at msapiro.net
Wed Jul 13 13:55:07 EDT 2016


On 07/13/2016 04:38 AM, Yasuhito FUTATSUKI wrote:
>  
>> I think it is better to hold the string attributes of mm_cfg and the mlist
>> class as Unicode rather than encoded in the charset of site_language or the
>> list's preferred language (though I know that is a lot of trouble to do).
> And then pattern matching in the message pipeline is done in Unicode
> rather than in the charset of the list's preferred language.


I have been working on a change to do exactly that, i.e., collect the
headers for matching with header_filter_rules as unicode and match the
patterns as unicode.
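
As a rough illustration of the idea only (this is not the Mailman code, and
it is written for Python 3 while MM 2.1 runs on Python 2), a header can be
decoded to unicode with the standard email.header module and then matched
against a unicode pattern. The names header_as_unicode, header_matches and
CJK_PATTERN are hypothetical.

```python
import re
from email.header import decode_header, make_header

# Hypothetical pattern: any character in the CJK Unified Ideographs block.
CJK_PATTERN = r'[\u4e00-\u9fff]'

def header_as_unicode(raw_value):
    """Decode RFC 2047 encoded-words in a raw header value to a unicode str."""
    # decode_header splits the value into (bytes_or_str, charset) chunks;
    # make_header reassembles them and str() yields the unicode text.
    return str(make_header(decode_header(raw_value)))

def header_matches(pattern, raw_value):
    """Return True if the unicode pattern matches the decoded header value."""
    text = header_as_unicode(raw_value)
    return re.search(pattern, text, re.IGNORECASE) is not None

if __name__ == '__main__':
    subject = '=?utf-8?b?5Lit5paH?='   # two Chinese characters, base64 encoded
    print(header_matches(CJK_PATTERN, subject))
    print(header_matches(CJK_PATTERN, 'Hello'))
```

A real implementation would also have to cope with headers that name an
unknown charset or contain undecodable bytes, which this sketch does not
attempt.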

This is very difficult to do on a list whose preferred_language
character set doesn't support the characters in the header_filter_rules
patterns, e.g., trying to match Chinese characters in Subject: headers
on an English language list.

The major issue is that when the character set of the form is, say,
us-ascii and one enters non-ascii characters as a pattern, it is up to the
browser to decide how to encode them. In at least one case with Firefox,
characters which are in the windows-1252 character set are encoded as
windows-1252 and others as XML numeric references. I can easily deal with
converting the XML numeric references to unicode, but I don't know in what
charset the other characters are encoded.
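
For the part that is easy, a minimal sketch of expanding numeric references
such as &#20013; or &#x4e2d; into unicode characters might look like the
following. It is Python 3 and only illustrative; the function name
expand_numeric_refs is hypothetical and this is not what Mailman does
internally. It does nothing about the other, unknown-charset bytes.

```python
import re

# Matches decimal (&#20013;) and hexadecimal (&#x4e2d;) numeric references.
_NUMREF = re.compile(r'&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));')

def expand_numeric_refs(text):
    """Replace numeric character references in text with the characters."""
    def _repl(match):
        hex_part, dec_part = match.group(1), match.group(2)
        code = int(hex_part, 16) if hex_part else int(dec_part)
        try:
            return chr(code)
        except (ValueError, OverflowError):
            # Not a valid code point: leave the reference untouched.
            return match.group(0)
    return _NUMREF.sub(_repl, text)

if __name__ == '__main__':
    print(expand_numeric_refs('Subject:.*&#20013;&#25991;'))
```

Named entities such as &amp; are deliberately left alone, since only the
numeric form is at issue here.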

<sigh> Yes, I know that everything should be unicode and utf-8 encoded
regardless of language, but that isn't going to happen in MM 2.1.

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

