regular expressions questions

Derek Thomson derek at ooc.com.au
Mon Mar 20 21:25:46 EST 2000


"Newhard, Nick" wrote:
> 
> I have started working with regular expressions (using the re.py module) for
> parsing text input and have a few questions.
> 
> 1. Single quote detection
> 
> myrule = re.compile(r"^/('|say)(?:[\t ]*$|[\t ]+(.*))")

Please, always use the re.VERBOSE flag for non-trivial regexes. This is the sort
of thing that gives languages reputations for poor readability :)
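
As a rough illustration of what that buys you, here is the same rule laid out with re.VERBOSE. The comments and the three test strings are my own additions, not part of the original question; the pattern itself is unchanged. Note that whitespace inside a character class such as [\t ] is still significant in verbose mode.

```python
import re

# The original rule, one piece per line so each part can carry a comment.
myrule = re.compile(r"""
    ^/                  # a leading slash at the start of the string
    ('|say)             # group 1: the command, a single quote or "say"
    (?:
        [\t ]*$         # either nothing but trailing blanks ...
      |
        [\t ]+(.*)      # ... or blanks followed by the argument (group 2)
    )
""", re.VERBOSE)

# Illustrative inputs; "/say" with no argument is an assumed extra case.
for line in ("/say Hello!", "/' Hello!", "/say"):
    m = myrule.match(line)
    print(m.groups())
```

On a current Python this prints ('say', 'Hello!'), then ("'", 'Hello!'), then ('say', None).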

> 
> s1 = "/say Hello!"
> s2 = "/' Hello!"
> 
> String s1 matches and returns a group ('say', 'Hello!') but string s2 does
> not match. Any ideas?  I tried variations with the single quote (\', etc.)
> without success.

That's odd, s2 gives ("'", 'Hello!') for me.

> 
> The command list I use in my rule ('|say) is a lot larger. Any comments on a
> better way to set up this rule?

Probably. Could you describe what you are trying to achieve?

> 
> 2. Groups and patterns
> 
> s2 = "aabbabaaabbb"
> 
> If I have a large string containing a random collection of patterns, in this
> case a's and b's, how would I form a rule to return groups of them?
> 
> For this case, I would want ('aa', 'bb', 'a', 'b', 'aaa', 'bbb')
> 
> 2a. What if I want to collect whitespace with the a's in the groups too?
> 
> s2 = "aa bb a b aaa bbb"
> 
> For this case, I would want ('aa ', 'bb', ' a ', 'b', ' aaa ', 'bbb')
> 
> Thanks for your help!!

This seems to work:

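#!/usr/bin/python

import re

# Runs of a's land in group 1, runs of b's in group 2.
regex = re.compile(r'(a+) | (b+)', re.VERBOSE)

print(re.findall(regex, 'aabbabaaabbb'))

# Same idea, but each run of a's absorbs the whitespace around it.
regex = re.compile(r'(\s*a+\s*) | (b+)', re.VERBOSE)

print(re.findall(regex, 'aa bb a b aaa bbb'))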

You then just have to filter out the empty strings from the returned tuples
where the first or second captured subregex didn't match, but I'll leave that to
you.
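
For what it's worth, one possible way to do that filtering is sketched below. It assumes exactly one of the two groups matches each time, which holds for this pattern; the names pairs and groups are just placeholders.

```python
import re

regex = re.compile(r'(a+) | (b+)', re.VERBOSE)

# findall returns one (group1, group2) tuple per match; whichever group
# did not take part in the match comes back as an empty string.
pairs = regex.findall('aabbabaaabbb')

# Keep the non-empty member of each pair, preserving match order.
groups = [a or b for a, b in pairs]

print(groups)  # ['aa', 'bb', 'a', 'b', 'aaa', 'bbb']
```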

Alternatively, you could run two regexes '(a+)' then '(b+)' over the string,
which would eliminate the empty string problem. It means two passes over the
string, which may or may not matter to you. You would also lose the relative
ordering of the 'a' and 'b' groups.
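
A small sketch of the two-pass version, showing how the ordering gets lost. The last few lines are a third variant that is not part of the suggestion above: a single group around the whole alternation, which seems to avoid both the empty strings and the ordering problem for this simple case.

```python
import re

s = 'aabbabaaabbb'

# Two passes: one list per letter, no empty strings to filter out,
# but nothing records how the runs interleave in the original string.
a_runs = re.findall(r'(a+)', s)
b_runs = re.findall(r'(b+)', s)
print(a_runs)  # ['aa', 'a', 'aaa']
print(b_runs)  # ['bb', 'b', 'bbb']

# Assumed alternative: one group wrapping both alternatives, so findall
# returns plain strings in the order they occur.
in_order = re.findall(r'(a+|b+)', s)
print(in_order)  # ['aa', 'bb', 'a', 'b', 'aaa', 'bbb']
```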

> 
> (P.S.  I have "Mastering Regular Expressions" on order but it will be days
> before I receive it. *sigh*)

Excellent book. I couldn't believe that someone could write a readable book
about regexes, or how much there was that I didn't know.

[ And now, at last, Perl 5.6 allows regexes to refer to other regexes. So now we
can use regexes to parse balanced expressions like matching parentheses. No
doubt this will appear in Python's re module before long. ]

Regards,
Derek


