[Mailman-Users] utf-8 subjects; extended "." regexp really necessary?

Mark Sapiro mark at msapiro.net
Thu Dec 3 13:36:29 EST 2015


On 11/24/2015 06:53 PM, Stephen J. Turnbull wrote:
> Adrian Pepper writes:
>  >  Am I correct in my conclusion that .* won't match newline characters,
>  >  but <space-chars><not-space-chars><linefeed><carriage-return> will?
>  >  (And also, that that is the character class I created).
> 
> Yes.  Here are the docs for Python regular expressions as used in
> Mailman: https://docs.python.org/2.7/library/re.html.
> 
> In general this problem would be addressed with the DOTALL flag:
> 
>     The special characters are:
> 
>     '.'
>     (Dot.) In the default mode, this matches any character except a
>     newline. If the DOTALL flag has been specified, this matches any
>     character including a newline.
> 
> Note that the definition of "newline" here is exactly "\n".
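A quick check of those rules with Python's standard re module. The test strings are made up, and the class [\s\S\n\r] is my reading of the "<space-chars><not-space-chars><linefeed><carriage-return>" class described in the question:

```python
import re

# '.' excludes only "\n" by default; "\r" is an ordinary character to it.
dot_cr = re.search(r"a.b", "a\rb") is not None        # True
dot_lf = re.search(r"a.b", "a\nb") is not None        # False

# With the DOTALL flag, '.' also matches "\n".
dot_lf_dotall = re.search(r"a.b", "a\nb", re.DOTALL) is not None  # True

# Assumed form of the class from the question: whitespace, non-whitespace,
# LF, CR. The \n and \r are redundant (\s already includes them) but harmless.
any_char = r"[\s\S\n\r]"
class_lf = re.search(r"a" + any_char + r"b", "a\nb") is not None  # True

print(dot_cr, dot_lf, dot_lf_dotall, class_lf)  # True False True True
```

Since \s and \S together cover every character, the class matches anything, newline included, with no flag needed.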


Note that you can turn on DOTALL in the regexp itself, so while

  Farmers[_ ]Weekly.*Ac

doesn't match,

  (?s)Farmers[_ ]Weekly.*Ac

will (see docs referenced above).
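For illustration, here are both patterns run against a hypothetical raw subject folded the way Adrian describes (an encoded word ending in "?=", a newline, then a continuation line starting "=?utf-8?q?_"). The subject string is an assumption, not taken from a real message:

```python
import re

# Hypothetical raw (undecoded) Subject: the RFC 2047 encoded words are folded
# onto a second line, so "\n" sits between "Weekly" and "Ac".
raw_subject = "=?utf-8?q?Farmers_Weekly?=\n =?utf-8?q?_Accounts?="

plain = re.compile(r"Farmers[_ ]Weekly.*Ac")
inline = re.compile(r"(?s)Farmers[_ ]Weekly.*Ac")         # DOTALL set inline
flagged = re.compile(r"Farmers[_ ]Weekly.*Ac", re.DOTALL)  # same, via flag

print(plain.search(raw_subject))                 # None
print(inline.search(raw_subject) is not None)    # True
print(flagged.search(raw_subject) is not None)   # True
```

The inline (?s) form is the useful one when, as in a list's admin settings, you can typically supply only the pattern text and not separate flags.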


>  >  Empirically I see  ?=\n =?utf-8?q?_ after "Weekly" and before "Ac".
>  >  (And it seems the matching is done on the incoming subject, not the
>  >  one formatted for resending, which, with my tag, and the utf-8
>  >  of an incoming tag pushes the expression entirely onto the second
>  >  line where I think the ".*" variant (or even [_ ]) would match.)


This is all due to a bug: RFC 2047 encoded headers were not decoded
before matching. See <https://bugs.launchpad.net/mailman/+bug/891676>,
fixed in Mailman 2.1.15.
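To show why decoding first makes the problem go away, here is a minimal sketch using Python 3's standard email.header module. It is not Mailman's actual fix, just the same idea (decode, then match); the helper name decoded() and the subject string are hypothetical:

```python
import re
from email.header import decode_header, make_header

def decoded(raw_header):
    """Hypothetical helper: return the Unicode text of a header,
    decoding any RFC 2047 encoded words."""
    return str(make_header(decode_header(raw_header)))

# Hypothetical raw Subject, folded across two encoded words.
raw_subject = "=?utf-8?q?Farmers_Weekly?=\n =?utf-8?q?_Accounts?="
pattern = re.compile(r"Farmers[_ ]Weekly.*Ac")  # no (?s) needed here

text = decoded(raw_subject)
print(repr(text))                         # 'Farmers Weekly Accounts'
print(pattern.search(raw_subject))        # None: raw form is split by "\n"
print(pattern.search(text) is not None)   # True: decoded form is one line
```

Once the encoded words are decoded and joined, the folding newline and the "?=" / "=?utf-8?q?" markers are gone, so the original pattern works without DOTALL.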

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

