[Mailman-Users] utf-8 subjects; extended "." regexp really necessary?
Mark Sapiro
mark at msapiro.net
Thu Dec 3 13:36:29 EST 2015
On 11/24/2015 06:53 PM, Stephen J. Turnbull wrote:
> Adrian Pepper writes:
> > Am I correct in my conclusion that .* won't match newline characters,
> > but <space-chars><not-space-chars><linefeed><carriage-return> will ?
> > (And also, that that is the character class I created).
>
> Yes. Here are the docs for Python regular expressions as used in
> Mailman: https://docs.python.org/2.7/library/re.html.
>
> In general this problem would be addressed with the DOTALL flag:
>
> The special characters are:
>
> '.'
> (Dot.) In the default mode, this matches any character except a
> newline. If the DOTALL flag has been specified, this matches any
> character including a newline.
>
> Note that the definition of "newline" here is exactly "\n".
Note that you can turn on DOTALL in the regexp itself, so while
Farmers[_ ]Weekly.*Ac
doesn't match,
(?s)Farmers[_ ]Weekly.*Ac
will (see docs referenced above).
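A minimal sketch of the difference, using Python's re module directly. The
raw_subject value is a made-up example of a folded, still-encoded Subject:
header, modeled on the fragment reported below; it is not taken from a real
message.

```python
import re

# Hypothetical raw header value: two RFC 2047 encoded words separated by
# a folding newline, so "Weekly" and "Ac" end up on different lines.
raw_subject = "=?utf-8?q?Farmers_Weekly?=\n =?utf-8?q?_Ac?="

# Default mode: "." does not match "\n", so ".*" cannot reach the second line.
plain_hit = re.search(r"Farmers[_ ]Weekly.*Ac", raw_subject) is not None

# Inline (?s) turns on DOTALL for the whole pattern.
inline_hit = re.search(r"(?s)Farmers[_ ]Weekly.*Ac", raw_subject) is not None

# Same effect when the flag is passed by the caller instead.
flag_hit = re.search(r"Farmers[_ ]Weekly.*Ac", raw_subject, re.DOTALL) is not None

print(plain_hit, inline_hit, flag_hit)
```

The inline form is the one that matters for a list filter, since there you
only supply the pattern text and cannot pass flags yourself.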
> > Empirically I see ?=\n =?utf-8?q?_ after "Weekly" and before "Ac".
> > (And it seems the matching is done on the incoming subject, not the
> > one formatted for resending, which, with my tag, and the utf-8
> > of an incoming tag pushes the expression entirely onto the second
> > line where I think the ".*" variant (or even [_ ]) would match.
This is all caused by a bug: RFC 2047 encoded headers were not decoded
before matching. See <https://bugs.launchpad.net/mailman/+bug/891676>, fixed in
Mailman 2.1.15.
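To illustrate why decoding first makes the simple pattern work, here is a
rough sketch using the Python 3 standard library's email.header module. This
is not Mailman's actual code (Mailman 2.1 runs on Python 2, and the fix there
may be implemented differently), and raw_subject is again a made-up header
value.

```python
import re
from email.header import decode_header, make_header

# Hypothetical folded, RFC 2047 encoded Subject: value.
raw_subject = "=?utf-8?q?Farmers_Weekly?=\n =?utf-8?q?_Ac?="

# Decode the encoded words and join them into one unicode string.
# Whitespace (including the fold) between adjacent encoded words is dropped.
decoded = str(make_header(decode_header(raw_subject)))

# The original pattern, without (?s), now matches the decoded text.
decoded_hit = re.search(r"Farmers[_ ]Weekly.*Ac", decoded) is not None

print(repr(decoded), decoded_hit)
```

Once the header is decoded, the folding newline and the =?utf-8?q? markers
are gone, so neither (?s) nor a special character class is needed.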
--
Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
San Francisco Bay Area, California better use your sense - B. Dylan