[Mailman-Users] Chinese characters spam filter?

Mark Sapiro mark at msapiro.net
Tue Jul 12 14:47:48 EDT 2016


On 07/12/2016 12:03 AM, Stephen J. Turnbull wrote:
> Mark Sapiro writes:
>  > On 7/8/16 6:04 PM, Yasuhito FUTATSUKI wrote:
>  > > 
>  > > How about using 'backslashreplace' instead of 'replace' to encode to
>  > > list's preferred language in Mailman/Handlers/SpamDetect.py ?
> 
> I see you've already done this, but ...
> 
> I would consider xmlrefreplace as well.  xmlrefs are something most
> people (users/moderators) have seen, backslash they're not going to
> recognize unless they're programmers.


I have now switched to xmlcharrefreplace instead of backslashreplace, as
I agree this will be easier to explain and understand. I was uncertain
about this at first because I didn't know whether xmlcharrefreplace
would use entity names in some cases, but it appears that it only
uses numeric references.
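
For illustration, here is a minimal sketch of how the three error
handlers differ. This is not the actual SpamDetect.py code; it is
written for Python 3, and the sample header text, the us-ascii target
charset and the render() helper are all assumptions made up for the
example.

```python
# Sample header text containing characters us-ascii cannot represent:
# "Caf" + e-acute, a space, then two CJK characters.
# (The text and the target charset are assumptions for illustration.)
header = "Caf\u00e9 \u4e2d\u6587"
charset = "us-ascii"

def render(text, errors):
    """Encode text to charset with the given error handler, then decode
    it back so the result is a string a regexp can be matched against."""
    return text.encode(charset, errors).decode(charset)

replaced = render(header, "replace")               # Caf? ??
backslashed = render(header, "backslashreplace")   # Caf\xe9 \u4e2d\u6587
xmlrefs = render(header, "xmlcharrefreplace")      # Caf&#233; &#20013;&#25991;

print(replaced)
print(backslashed)
print(xmlrefs)
```

Note that the accented e comes out as the numeric reference &#233; and
not as a named entity such as &eacute;.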


> At an earlier stage, you could also just do a trial re-encoding with
> the list preferred codec, set errors = 'strict', catch the Exception,
> and re-raise as a Hold (or Discard, according to per-list policy).
> (Then discard the output.)  I would prefer this solution, I think, as
> creating regexps turns out to be an issue for many list owners.
> 
> People would have to learn not to use emoji in headers, of course, or
> suffer moderation delays or even discards.


I think this will have too many undesired effects. Not just emoji, but
accented Latin or CJK characters, etc. in display names would, I think,
be real problems, even on English language lists.
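
To make the concern concrete, here is a rough sketch of the kind of
strict trial encoding being proposed. It is Python 3, not Mailman code;
the function name fails_strict_encoding, the sample From: values and
the choice of us-ascii as the list charset are hypothetical.

```python
def fails_strict_encoding(text, charset):
    """Return True if text cannot be encoded in charset with
    errors='strict', i.e. the case the proposal would hold or discard."""
    try:
        text.encode(charset, "strict")
    except UnicodeEncodeError:
        return True
    return False

# Hypothetical From: header values, already decoded to text.
plain = "Pat Smith <pat@example.com>"
accented = "Jos\u00e9 Garc\u00eda <jose@example.com>"   # accented Latin
cjk = "\u5c71\u7530 <yamada@example.com>"                # CJK display name

for value in (plain, accented, cjk):
    print(fails_strict_encoding(value, "us-ascii"))
```

With us-ascii as the target, only the first value passes; the other two
would be held or discarded purely because of the sender's display name.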

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

