[Tutor] about regular expression

Michael Janssen Janssen@rz.uni-frankfurt.de
Sat Mar 22 14:23:02 2003


On Sat, 22 Mar 2003, Abdirizak abdi wrote:

> Hi everyone
>
> Thanks anton for your help.
>
> I am working on a program that incorporates multiple regular expressions. Consider that I have the following:
>
> exp_token = re.compile(r"""
>                ([-a-zA-Z0-9_]+|   # for charcterset
>                [\"\'.\(),:!\?]|    # symbol chracters
>               <REF SELF='YES'>.*?</REF>)
>                """, re.VERBOSE )

This is incorrect: re.VERBOSE *ignores* whitespace in the pattern (for the
exceptions, see the re module documentation). "<REF SELF" becomes
"<REFSELF" and can't match; "<REF\sSELF" is correct. See also the
documentation's explanation of "|"!
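
A small sketch of the whitespace effect, in case it helps. The sample
string and the names "broken" and "fixed" are made up for illustration;
only the pattern itself comes from your mail:

```python
import re

# Hypothetical sample input, not taken from the original program.
text = "see <REF SELF='YES'>Smith 2001</REF>, p. 3"

# Pattern as posted: under re.VERBOSE the blank in "<REF SELF" is dropped,
# so the third alternative effectively reads "<REFSELF='YES'>.*?</REF>".
broken = re.compile(r"""
    ([-a-zA-Z0-9_]+|            # word characters
    [\"\'.\(),:!\?]|            # punctuation symbols
    <REF SELF='YES'>.*?</REF>)  # never matches: the blank is ignored
    """, re.VERBOSE)

# Same pattern with the blank written as \s, which re.VERBOSE keeps.
fixed = re.compile(r"""
    ([-a-zA-Z0-9_]+|
    [\"\'.\(),:!\?]|
    <REF\sSELF='YES'>.*?</REF>)
    """, re.VERBOSE)

broken_tokens = broken.findall(text)
fixed_tokens = fixed.findall(text)

print(broken_tokens)  # the REF element is split into small pieces
print(fixed_tokens)   # the REF element comes back as one token
```

Writing the blank as "\ " or "[ ]" should work as well, since escaped
whitespace and whitespace inside a character class are kept.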

Why does the third alternative produce a match (instead of having
everything taken away by the earlier alternatives, as you described)?
Because when processing reaches "<", that character is a member of neither
the first nor the second set, so "<REF\sSELF='YES'>.*?</REF>" gets
evaluated and matches.
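
To illustrate how "|" picks an alternative, here is a minimal sketch. The
sample string and the second pattern ("angle_first") are hypothetical and
only there for contrast:

```python
import re

third = r"<REF\sSELF='YES'>.*?</REF>"
text = "<REF SELF='YES'>Smith 2001</REF>"  # made-up sample input

# Alternatives are tried left to right at the current position.
# "<" is not in the word set, so the engine moves on to the REF
# alternative, which matches the whole element.
word_first = re.compile(r"[-a-zA-Z0-9_]+|" + third)
m1 = word_first.match(text)

# Hypothetical contrast: if an earlier alternative did accept "<",
# it would win and the REF alternative would never be tried here.
angle_first = re.compile(r"<|" + third)
m2 = angle_first.match(text)

print(m1.group())
print(m2.group())
```

So the order of the alternatives only matters at positions where more
than one of them could match.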

Michael