[Mailman-Users] UTF-8 From and Reply-to addresses not getting properly processed.

Lindsay Haisley fmouse at fmp.com
Sun Feb 16 15:44:57 EST 2020


On Sun, 2020-02-16 at 12:08 -0800, Mark Sapiro wrote:
> Munged a few words.
> 
> On 2/15/20 11:20 PM, Lindsay Haisley wrote:
> 
> > The only filter relevant to this issue is "(?i)Subject: .*[f...]".
> 
> The (?i) is irrelevant as the match always ignores case. Also, I don't
> think that's what you want as it will match any Subject that contains
> any of the letters f, u, c, k in either case. What is the action of this
> rule?

Discard.

> On 2/16/20 10:17 AM, Lindsay Haisley wrote:
> > 
> > We want to discard _all_ non-member
> > posts, and the problem is that these base64-addressed posts _are_ being
> > held and not discarded. 
> 
> 
> If generic_nonmember_action is Discard, non-member posts should be
> discarded unless some prior test causes them to be held. Things which
> could cause a hold are in order:
> 
> A match on a header_filter_rule with a Hold action.

Possible. The "Reason" is "The message headers matched a filter rule"


> One of the addresses returned by get_senders() is a moderated member and
> member_moderation_action is Hold.

This is certainly a possibility. I see that you've given me some code to
work with below. I'll explore this. Thanks!

> None of the addresses returned by get_senders() is a member and the
> address returned by get_sender() matches an entry in
> hold_these_nonmembers

hold_these_nonmembers is empty.

> An address returned by get_senders() is an unmoderated member and the
> list's emergency setting is Yes.

It is No.
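
For my own notes, here is a rough sketch of the checks as Mark lists
them, in the order he gives. This is only a paraphrase of that list and
not Mailman's actual code; the function name, its arguments and the
example.com address are all made up for illustration, and
hold_these_nonmembers is treated as a plain list of addresses.

```python
def first_hold_cause(filter_rule_holds, senders, sender, members, moderated,
                     member_moderation_action, hold_these_nonmembers,
                     emergency):
    """Return the first hold cause from the list above, or None.

    senders stands in for get_senders(), sender for get_sender().
    members and moderated are sets of lowercased addresses.
    """
    # 1. A header_filter_rule with a Hold action matched.
    if filter_rule_holds:
        return 'header_filter_rules'
    # 2. Some sender is a moderated member and the action is Hold.
    if member_moderation_action == 'Hold' and any(
            a in moderated for a in senders):
        return 'member_moderation_action'
    # 3. No sender is a member and the sender is in hold_these_nonmembers.
    if not any(a in members for a in senders) and \
            sender in hold_these_nonmembers:
        return 'hold_these_nonmembers'
    # 4. Some sender is an unmoderated member and emergency is Yes.
    if emergency and any(
            a in members and a not in moderated for a in senders):
        return 'emergency'
    return None


# Our settings: hold_these_nonmembers empty, emergency No.
cause = first_hold_cause(True, ['someone@example.com'],
                         'someone@example.com', set(), set(),
                         'Hold', [], False)
print(cause)
```

With hold_these_nonmembers empty and emergency set to No, only the
first two checks are left as candidates on this list.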

> What is the reason given for the hold?

"The message headers matched a filter rule"

hmmmm

> In the case of the message headers in your OP, get_senders() will
> return a list like
> 
> ['=?utf-8?b?ikfiaweiidxbymlhqg11bhrplm5ldc5waz4=?=',
> 'abia at multi.net.pk',
> '=?utf-8?b?ikfiaweiidxbymlhqg11bhrplm5ldc5waz4=?=']
> 
> which are lowercased versions of respectively, the undecoded From:,
> The unix from which I deduce from Return-Path: and the undecoded
> Reply-To:.
> Both the original From: and Reply-To: decode to
> 
> "Abia" <Abia at multi.net.pk>
> 
> msg.get_sender() returns
> '=?utf-8?b?ikfiaweiidxbymlhqg11bhrplm5ldc5waz4=?='
> 
> From what you've said, that message whose decoded Subject: header is
> 
> Subject: I InstaF... Request is Pending
> 
> would match the header_filter_rule and be handled per that rule's
> action, but then so would a message with
> 
> Subject: It's a fine day

I substituted for "uck" where it showed up to avoid hitting subscriber
filters on _this_ list.
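
To see the difference Mark is pointing at, here is a small sketch that
compares a bracketed character class with the same letters as a literal
sequence. It assumes the rule is applied as a case-insensitive regex
search, and "fxyz" is a made-up stand-in for the real word, so this is
not the actual rule from the list.

```python
import re

# Hypothetical stand-ins: "fxyz" replaces the real word in the rule.
class_rule = re.compile(r'Subject: .*[fxyz]', re.IGNORECASE)  # any ONE letter
word_rule = re.compile(r'Subject: .*fxyz', re.IGNORECASE)     # whole sequence

harmless = "Subject: It's a fine day"
unwanted = "Subject: I Instafxyz Request is Pending"

class_hits = [bool(class_rule.search(s)) for s in (harmless, unwanted)]
word_hits = [bool(word_rule.search(s)) for s in (harmless, unwanted)]

print(class_hits)  # [True, True]   the class also catches the harmless one
print(word_hits)   # [False, True]  the literal sequence does not
```

The bracketed form fires on any subject containing one of the listed
letters, which is why the harmless subject matches too.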

> You can decode these headers like
> 
> python
> ...
> >>> from email.header import decode_header
> >>> decode_header('=?utf-8?B?IkFiaWEiIDxBYmlhQG11bHRpLm5ldC5waz4=?=')
> 
> [('"Abia" <Abia at multi.net.pk>', 'utf-8')]

Thanks!!! This is the tool I need. However, the list administrator
doesn't have access to the Python interactive shell on this system, and
these messages seem to have different encoded From headers (I'll check
this). We need some way going forward for her to get these discarded
from the get-go without having to run get_senders() on each one.
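
For whoever does have shell access, a minimal sketch of wrapping
decode_header() so that an encoded From: or Reply-To: value comes back
as a bare lowercased address. It is written for Python 3, where
decode_header() returns bytes for encoded parts; the function name and
the example.com address are made up for illustration, and the sample
header is built in the code rather than taken from a real message.

```python
import base64
from email.header import decode_header
from email.utils import parseaddr


def decoded_address(raw_header):
    """Decode an RFC 2047 header value and return its lowercased address."""
    parts = []
    for text, charset in decode_header(raw_header):
        # Encoded parts arrive as bytes; plain parts may already be str.
        if isinstance(text, bytes):
            text = text.decode(charset or 'ascii', 'replace')
        parts.append(text)
    # parseaddr() splits '"Name" <addr>' into (name, addr).
    _realname, addr = parseaddr(''.join(parts))
    return addr.lower()


# Hypothetical header in the same shape as the ones in this thread:
# the whole '"Name" <addr>' string is base64-encoded as one word.
payload = base64.b64encode(b'"Jane" <Jane@example.com>').decode('ascii')
sample = '=?utf-8?B?' + payload + '?='

print(decoded_address(sample))  # jane@example.com
```

An unknown charset name in a header would raise LookupError here, so a
real script would want to catch that.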

This may not be possible. One of the characteristics of spam is that
it's like clouds in the sky. One particular type and form may be all
over you one day and completely gone the next.

-- 
Lindsay Haisley       |  "The arc of history is long, but
FMP Computer Services |     it bends toward Justice"
512-259-1190          |
http://www.fmp.com    |        - Barack Obama



