Can one make 'in' ungreedy?

Chris Green cl at isbd.net
Mon May 18 08:39:18 EDT 2020


Larry Martell <larry.martell at gmail.com> wrote:
> On Mon, May 18, 2020 at 7:05 AM Chris Green <cl at isbd.net> wrote:
> >
> > I have a strange/minor problem in a Python program I use for mail
> > filtering.
> >
> > One of the ways it classifies messages is by searching for a specific
> > string in square brackets [] in the Subject:, the section of code that
> > does this is:-
> >
> >     #
> >     #
> >     # copy the fields from the filter configuration file into better named variables
> >     #
> >     nm = fld[0]             # name/alias
> >     dd = fld[1] + "/"       # destination directory
> >     tocc = fld[2].lower()   # list address
> >     sbstrip = '[' + fld[3] + ']'        # string to match in and/or strip out of subject
> >     #
> >     #
> >     # see if the filter To/CC column matches the message To: or Cc: or if sbstrip is in Subject:
> >     #
> >     if (tocc in msgcc or tocc in msgto or sbstrip in msgsb):
> >         #
> >         #
> >         # set the destination directory
> >         #
> >         dest = mldir + dd + nm
> >         #
> >         #
> >         # Strip out list name (4th field) from subject if it's there
> >         #
> >         if sbstrip in msgsb:
> >             msg.replace_header("Subject", msgsb.replace(sbstrip, ''))
> >         #
> >         #
> >         # we've found a match so assume we won't get another
> >         #
> >         break
> >
> >
> > So in the particular case where I have a problem sbstrip is "[Ipswich
> > Recycle]" and the Subject: is "[SPAM] [Ipswich Recycle] OFFER:
> > Lawnmower (IP11)".  The match isn't found, presumably because 'in' is
> > greedy and sees "[SPAM] [Ipswich Recycle]" which isn't a match for
> > "[Ipswich Recycle]".
> >
> > Other messages with "[Ipswich Recycle]" in the Subject: are being
> > found and filtered correctly, it seems that it's the presence of the
> > "[SPAM]" in the Subject: that's breaking things.
> >
> > Is this how 'in' should work, it seems a little strange if so, not
> > intuitively how one would expect 'in' to work.  ... and is there any
> > way round the issue except by recoding a separate test for the
> > particular string search where this can happen?
> 
> >>> sbstrip = "[Ipswich Recycle]"
> >>> subject = "[SPAM] [Ipswich Recycle] OFFER:Lawnmower (IP11)"
> >>> sbstrip in subject
> True
> 
> Clearly something else is going on in your program. I would run it in
> the debugger and look at the values of the variables in the case when
> it fails when you think it should succeed. I think you will see the
> variables do not hold what you think they do.

Thanks for taking the trouble to look. It's a *bit* difficult to run
in the debugger as the program is a filter triggered by incoming
E-Mail messages.  However I think I can fire stuff at it via stdin so
I'll see what I can fathom out doing that.
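
A minimal sketch of that kind of check, not taken from the actual filter: it
parses a message from a string and prints the repr() of the Subject: so that
any hidden characters show up. The names subject_diagnostics and SAMPLE are
made up for illustration, and the folded header in SAMPLE is only one
hypothetical way the match could fail; nothing in this thread confirms that
it is the real cause.

```python
import email


def subject_diagnostics(raw_message, needle):
    """Return (raw subject, whitespace-normalised subject, raw hit, normalised hit)."""
    msg = email.message_from_string(raw_message)
    raw_subject = msg.get("Subject", "")
    # Collapse every run of whitespace (including a newline left over from
    # header folding) into a single space before testing again.
    normalised = " ".join(raw_subject.split())
    return raw_subject, normalised, needle in raw_subject, needle in normalised


# Hypothetical test message: the Subject: header is folded across two lines
# between "Ipswich" and "Recycle".
SAMPLE = (
    "From: someone@example.com\n"
    "Subject: [SPAM] [Ipswich\n"
    " Recycle] OFFER: Lawnmower (IP11)\n"
    "\n"
    "body\n"
)

raw_subject, normalised, raw_hit, norm_hit = subject_diagnostics(
    SAMPLE, "[Ipswich Recycle]"
)
print(repr(raw_subject))   # repr() makes embedded newlines and tabs visible
print(repr(normalised))
print("match in raw subject:       ", raw_hit)
print("match in normalised subject:", norm_hit)
```

To try it on a real message, the text read from sys.stdin.read() could be
passed in place of SAMPLE. If the two results differ, the Subject: holds
whitespace that the plain 'in' test does not expect; if both are True, the
problem lies elsewhere in the program.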

-- 
Chris Green