text processing problem

Matt matthew_shomphe at countrywide.com
Fri Apr 8 11:53:33 EDT 2005


Maurice LING wrote:
> Matt wrote:
> > I'd HIGHLY suggest purchasing the excellent <a
> > href="http://www.oreilly.com/catalog/regex2/index.html">Mastering
> > Regular Expressions</a> by Jeff Friedl.  Although it's mostly geared
> > towards Perl, it will answer all your questions about regular
> > expressions.  If you're going to work with regexes, this is a
> > must-have.
> >
> > That being said, here's what the new regular expression should be,
> > with a bit of instruction (in the spirit of teaching someone to fish
> > after giving them a fish ;-)   )
> >
> > my_expr = re.compile(r'(\w+)\s*(\(\1\))')
> >
> > Note the "\s*", in place of the single space " ".  The "\s" means
> > "any whitespace character" (equivalent to [ \t\n\r\f\v]).  The "*"
> > following it means "0 or more occurrences".  So this will now match:
> >
> > "there  (there)"
> > "there (there)"
> > "there(there)"
> > "there                                          (there)"
> > "there\t(there)" (tab)
> > "there\t\t\t\t\t\t\t\t\t\t\t\t(there)"
> > etc.
> >
> > Hope that's helpful.  Pick up the book!
> >
> > M@
> >
>
> Thanks again. I've read a number of tutorials on regular expressions,
> but it's something that I hardly used in the past, so I've gone far
> too rusty.
>
> Before my post, I tried
> my_expr = re.compile(r'(\w+) \s* (\(\1\))') instead, but it doesn't
> work, so I'm a bit stumped......
>
> Thanks again,
> Maurice

Maurice,
Your regex failed because you have literal spaces around the "\s*".
This translates to "one space, followed by zero or more whitespace
characters, followed by one space".  So your regex would only match
when the two text elements are separated by at least 2 spaces (one
right after the word and one right before the opening parenthesis).
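
If you want to see the difference for yourself, here's a minimal sketch.
The sample strings and the variable names (good, spaced, verbose) are
just made up for illustration.  The last pattern assumes you'd like to
keep the spaces for readability: with the re.VERBOSE flag, Python
ignores whitespace inside the pattern, so it should behave like the
first one.

```python
import re

# Pattern from the earlier reply: any amount of whitespace (including none).
good = re.compile(r'(\w+)\s*(\(\1\))')

# Pattern with literal spaces around \s*: a space, any whitespace, a space.
spaced = re.compile(r'(\w+) \s* (\(\1\))')

# Same spaced-out pattern, but re.VERBOSE makes the pattern's spaces insignificant.
verbose = re.compile(r'(\w+) \s* (\(\1\))', re.VERBOSE)

# Illustrative inputs only (not taken from the original data).
samples = [
    "there(there)",
    "there (there)",
    "there  (there)",
    "there\t(there)",
    "there \t (there)",
]

for s in samples:
    # Print whether each pattern finds a match in the sample string.
    print(repr(s),
          bool(good.search(s)),
          bool(spaced.search(s)),
          bool(verbose.search(s)))
```

The middle column should only come up True for the samples where the
gap starts and ends with a plain space.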

This kind of demonstrates why regular expressions can drive you nuts.

I still suggest picking up the book; not because Jeff Friedl drove a
dump truck full of money up to my door, but because it specifically has
a use case like yours.  So you get to learn & solve your problem at the
same time!

HTH,
M@



