[Tutor] about regular expression
Michael Janssen
Janssen@rz.uni-frankfurt.de
Sat Mar 22 14:23:02 2003
On Sat, 22 Mar 2003, Abdirizak abdi wrote:
> Hi everyone
>
> Thanks anton for your help.
>
> I am working on a program that incorporates multiple regular expressions. Consider that I have the following:
>
> exp_token = re.compile(r"""
> ([-a-zA-Z0-9_]+| # for character set
> [\"\'.\(),:!\?]| # symbol characters
> <REF SELF='YES'>.*?</REF>)
> """, re.VERBOSE )
This is incorrect: re.VERBOSE *ignores* whitespace in the pattern (for the
exceptions, see the re module documentation). "<REF SELF" becomes "<REFSELF"
and can't match; "<REF\sSELF" is correct. See also the documentation's
explanation of "|"!
Why does the third alternative produce a match (instead of having everything
taken away by the earlier alternatives, as you described)? Because when
processing reaches "<", that character is a member of neither the first nor
the second set. So "<REF\sSELF='YES'>.*?</REF>" gets evaluated and matches.
Michael