regex problem ..

Mon Dec 15 07:35:43 EST 2008

Analog Kid wrote:
> Hi All:
> I am new to regular expressions in general, and not just re in python. 
> So, apologies if you find my question stupid :) I need some help with 
> forming a regex. Here is my scenario ...
> I have strings coming in from a list, each of which I want to check 
> against a regular expression and see whether or not it "qualifies". By 
> that I mean I have a certain set of characters that are permissible and 
> if the string has characters which are not permissible, I need to flag 
> that string ... here is a snip ...
> 
> flagged = list()
> strs = ['HELLO', 'Hi%20There', '123123@#@']
> p =  re.compile(r"""[^a-zA-Z0-9]""", re.UNICODE)
> for s in strs:
>     if len(p.findall(s)) > 0:
>         flagged.append(s)
> 
> print flagged
> 
> my question is ... if I wanted to allow '%20' but not '%', how would my 
> current regex (r"""[^a-zA-Z0-9]""") be modified?

You might want to normalize before checking, e.g.

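import re
from urllib import unquote  # Python 3: from urllib.parse import unquote

# After unquoting, '%20' becomes a plain space, so the class allows a space.
p = re.compile("[^a-zA-Z0-9 ]")
flagged = []

for s in strs:
    # Flag s if the decoded string still contains a disallowed character.
    if p.search(unquote(s)):
        flagged.append(s)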

Be careful, however, if you want to show the
flagged ones back to the user. It is always best to
quote/unquote at the boundaries as appropriate.
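
Note that unquote() decodes every %XX escape, not just %20, so a string
such as 'Hi%41' becomes 'HiA' and is not flagged. If only the literal
sequence '%20' should be accepted, one alternative is to describe the
allowed input directly and require the whole string to match. The
following is only a sketch under that assumption; the names ALLOWED and
is_flagged and the two extra sample strings are made up for illustration.

```python
import re

try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2

# The whole string must consist of alphanumerics and/or the literal "%20".
# \Z anchors the match at the very end of the string.
ALLOWED = re.compile(r"(?:[a-zA-Z0-9]|%20)*\Z")


def is_flagged(s):
    """Return True if s contains anything other than [a-zA-Z0-9] or '%20'."""
    return ALLOWED.match(s) is None


# The first three strings are from the original question; the last two
# are hypothetical additions that exercise a bare '%' and another escape.
strs = ['HELLO', 'Hi%20There', '123123@#@', '50%off', 'Hi%41']
flagged = [s for s in strs if is_flagged(s)]
print(flagged)
```

With this pattern a bare '%' or any escape other than '%20' is flagged,
and so is a raw space, which the normalize-first version would accept.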

Regards
Tino
