text processing problem
Matt
matthew_shomphe at countrywide.com
Thu Apr 7 21:00:56 EDT 2005
Maurice LING wrote:
> Matt wrote:
> >
> >
> > Try this:
> > import re
> > my_expr = re.compile(r'(\w+) (\(\1\))')
> > s = "this is (is) a test"
> > print my_expr.sub(r'\1', s)
> > #prints 'this is a test'
> >
> > M@
> >
>
> Thank you Matt. It works out well. The only thing that gives it a
> problem is cases such as "there  (there)", where there is more than
> one whitespace character between the word and the same bracketed word...
>
> Cheers
> Maurice
Maurice,
I'd HIGHLY suggest purchasing the excellent Mastering Regular
Expressions by Jeff Friedl
(http://www.oreilly.com/catalog/regex2/index.html). Although it's mostly
geared towards Perl, it will answer all your questions about regular
expressions. If you're going to work with regexes, this is a must-have.
That being said, here's what the new regular expression should be with
a bit of instruction (in the spirit of teaching someone to fish after
giving them a fish ;-) )
my_expr = re.compile(r'(\w+)\s*(\(\1\))')
Note the "\s*" in place of the single space " ". The "\s" means "any
whitespace character" (equivalent to [ \t\n\r\f\v]). The "*" following
it means "0 or more occurrences". So this will now match:
"there (there)" (one space)
"there  (there)" (two spaces)
"there(there)" (no whitespace)
"there     (there)" (several spaces)
"there\t(there)" (tab)
"there\t\t\t\t\t\t\t\t\t\t\t\t(there)"
etc.
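
To check the cases listed above, here is a minimal sketch that runs the revised pattern over a few inputs. The names "samples" and "results" are made up for illustration, and it uses the print() function form (as in Python 3) rather than the print statement from the earlier post:

```python
import re

# The revised pattern from this post: a word, zero or more whitespace
# characters, then the same word again inside parentheses.
my_expr = re.compile(r'(\w+)\s*(\(\1\))')

# Hypothetical test inputs covering the whitespace variations discussed.
samples = [
    "there (there)",        # one space
    "there   (there)",      # several spaces
    "there(there)",         # no whitespace at all
    "there\t(there)",       # one tab
    "there\t\t\t(there)",   # several tabs
]

# Replace each whole match with just the first group (the bare word).
results = [my_expr.sub(r'\1', s) for s in samples]

for s, r in zip(samples, results):
    print("%r -> %r" % (s, r))
```

Each input should come out as plain "there", since the whole match (word, whitespace, bracketed repeat) is replaced by the first group alone.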
Hope that's helpful. Pick up the book!
M@