[OT] a little about regex

Ron Adam rrr at ronadam.com
Thu Oct 19 14:40:56 EDT 2006


Fulvio wrote:
> On Wednesday 18 October 2006 15:32, Ron Adam wrote:
> 
>> |Instead of using two separate if's, Use an if - elif and be sure to test
> 
> Thank you, Ron, for the input :)
> I'll look into that approach too. Meanwhile I ran into a total disaster :) and 
> deleted all my emails from all the servers ;(
> (Luckily, I had saved them locally :) )
> 
>> |It's not exactly clear what output you are seeking.  If you want 0 for
>> | not filtered and 1 for filtered, then look to Fred's hint.
> 
> Actually the return value is used like this:
> 
>     if _filter(hdrs,allow,deny):
>     # allow and deny are objects prepared by re.compile(pattern)
>         _del(Num_of_Email)
> 
> In short, a true result means the message is unwanted and gets deleted. 
> And now the function is:
> 
> def _filter(msg,al,dn):
>     """Try to classify a list of header lines against a set of compiled
>     patterns."""
>     a = 0
>     for hdrline in msg:
>         # deny has first priority and stops any further searching;
>         # the score is 10 times the number of lines
>         if dn.search(hdrline): return len(msg) * 10
>         if al.search(hdrline): return 0
>         a += 1
>     return a # the number of lines that matched neither pattern, or zero if none

I see.  Is this a cleanup script to remove the least wanted items?

The allow/deny names made me think it was more along the lines of a white/black 
list, whereas keep/discard would be terms more suitable to cleaning out items 
already allowed.

Or is it a bit of both?  Why the score?

Just curious; I don't think I have any suggestions that will help in any 
specific way.

I would think the allow (keep?) filters would always have priority over deny filters.
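
To show what I mean, here is a rough sketch, not a drop-in replacement.  In the
function as posted, each line is tested against both patterns before moving on,
so the outcome depends on which header line happens to match first.  The sketch
below checks every line against the keep pattern before looking at the discard
pattern at all.  The names keep_first_filter, keep and discard, the two sample
patterns, and the choice to leave unmatched messages alone are all my own
assumptions.

```python
import re

# Hypothetical patterns, only for illustration.
keep = re.compile(r'^From:.*\.it\b')
discard = re.compile(r'^Subject:.*SPAM')

def keep_first_filter(lines, keep, discard):
    """Return True if the message should be deleted.

    Every line is checked against keep first, so a keep match anywhere
    in the header wins over any discard match.
    """
    if any(keep.search(line) for line in lines):
        return False
    if any(discard.search(line) for line in lines):
        return True
    # Assumption: a message that matches nothing is left alone.
    return False
```

With the headers ['Subject: SPAM offer', 'From: friend@example.it'] this keeps
the message, because the From: line matches keep even though the Subject: line
comes first.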


> The patterns are taken from a configuration file. Those named Axx = 'pattern' 
> allow messages; the others, named Dxx, block them under different criteria.
> Here they are:
> 
> [Filters]
> A01 = ^From:.*\.it\b
> A02 = ^(To|Cc):.*frioio@
> A03 = ^(To|Cc):.*the_sting@
> A04 = ^(To|Cc):.*calm_me_or_die@
> A05 = ^(To|Cc):.*further@
> A06 = ^From:.*\.za\b
> D01 = ^From:.*\.co\.au\b
> D02 = ^Subject:.*\*\*\*SPAM\*\*\*
> 
> *Slightly faked in order to keep some privacy* :)
> I'm using configparser to fetch their values, and they are joined by:
> 
>     allow = re.compile('|'.join([k[1] for k in ifil if k[0].startswith('a')]))
>     deny = re.compile('|'.join([k[1] for k in ifil if k[0].startswith('d')]))
> 
> ifil is the section holding the input filters.
>
> At this point I think I have done the right thing; I'm just a bit 
> curious to know whether there's a better way, such as a single compiled 
> regex for all of the options.

I think keeping the allow filter separate from the deny filter is good.
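
As a sketch of building the two patterns from the config section, something
like the following might do.  It is written for current Python, where the
module is named configparser, and it assumes items() gives (name, value) pairs
with the names lowercased, which is the parser's default.  It compares names
with startswith(), since an identity test such as "is" is not a reliable way to
compare strings.  The names build_filters and CONFIG_TEXT are made up for the
example.

```python
import re
from configparser import RawConfigParser

# Sample text in the style of the [Filters] section quoted above (shortened).
CONFIG_TEXT = r"""
[Filters]
A01 = ^From:.*\.it\b
A05 = ^(To|Cc):.*further@
D01 = ^From:.*\.co\.au\b
D02 = ^Subject:.*\*\*\*SPAM\*\*\*
"""

def build_filters(config_text, section='Filters'):
    """Return (allow, deny) compiled patterns built from one config section."""
    # RawConfigParser does no '%' interpolation on the values.
    parser = RawConfigParser()
    parser.read_string(config_text)
    items = parser.items(section)  # option names come back lowercased
    allow = [value for name, value in items if name.startswith('a')]
    deny = [value for name, value in items if name.startswith('d')]
    # Caution: an empty list joins to '', and that pattern matches everything.
    return re.compile('|'.join(allow)), re.compile('|'.join(deny))

allow, deny = build_filters(CONFIG_TEXT)
```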

You might be able to merge the header lines and run the filters across the whole 
header at once instead of each line.
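
A minimal sketch of that idea, assuming the patterns are compiled with
re.MULTILINE so that '^' still matches at the start of every header line.  The
name filter_headers and the sample patterns are only for illustration.  It
checks deny first and treats a message that matches neither pattern as
unwanted, which is how I read the posted function.

```python
import re

# Hypothetical patterns, in the style of the [Filters] section.
allow_patterns = [r'^From:.*\.it\b', r'^(To|Cc):.*further@']
deny_patterns = [r'^From:.*\.co\.au\b', r'^Subject:.*\*\*\*SPAM\*\*\*']

# re.MULTILINE makes '^' match at the start of each line, not only at the
# start of the string, so a single search covers the whole header.
allow = re.compile('|'.join(allow_patterns), re.MULTILINE)
deny = re.compile('|'.join(deny_patterns), re.MULTILINE)

def filter_headers(lines, allow, deny):
    """Return True if the message should be deleted."""
    header = '\n'.join(lines)  # merge the header lines into one string
    if deny.search(header):
        return True
    if allow.search(header):
        return False
    # Assumption: anything matching neither pattern counts as unwanted.
    return True
```

Since '.' does not match a newline by default, each '.*' stays within its own
header line.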

> Basically the program works, in terms of filtering as per the config and 
> synchronizing with the local $HOME/Mail/trash (configurable path). This last 
> option removes emails from the server when they are in the local 
> trash.
> Todo = back up local and remote emails for those filtered as good.
>             multithread to connect to all servers in parallel
>             SSL for POP3 and IMAP4 as well
> Actually I have a problem issuing the command to the IMAP server to flag as 
> "Deleted" the messages that count as spam. I only know the message details, 
> and what the correct command is remains a bit obscure to me.

I can't help you here.  Sorry.

> BTW who's Fred?
> 
> F

Fredrik.  See...

    news://news.cox.net:119/mailman.670.1161155836.11739.python-list@python.org




