regular expressions and internationalization (WAS: permuting letters...)

Steven Bethard steven.bethard at gmail.com
Wed Nov 17 17:47:45 EST 2004


Dieter Maurer wrote:
> Steven Bethard <steven.bethard at gmail.com> writes on Fri, 12 Nov 2004 20:15:28 +0000 (UTC):
> >
> > Is there any way to match \w but not \d?
> 
> It is:  r'(?!\d)\w'

Yeah, I guess you could use negative lookahead assertions too.  My 
proposed solution to the problem discussed in this thread:

 >>> import re
 >>> re.findall(r'[^\W\d_]{4,}', 'asdg1dfs _asfd s adfsa')
['asdg', 'asfd', 'adfsa']
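
For anyone who finds the double negation hard to parse, here is a minimal sketch of how the class reads. It assumes Python 3, where str patterns are Unicode-aware by default (in Python 2 you would need the re.UNICODE flag for the same effect). The helper name letter_runs and the non-ASCII sample are made up for illustration and are not from the thread.

```python
import re

# [^\W\d_] reads as "not (a non-word character, a digit, or an underscore)".
# Negating \W gives \w; removing \d and _ from \w leaves essentially the letters.
LETTERS = re.compile(r'[^\W\d_]{4,}')

def letter_runs(text):
    """Return every run of four or more consecutive letters in text."""
    return LETTERS.findall(text)

# The example from this message.
print(letter_runs('asdg1dfs _asfd s adfsa'))      # ['asdg', 'asfd', 'adfsa']

# Accented letters count as \w for str patterns in Python 3
# (escapes used so the sample is unambiguous: "größe1 naïve_x").
print(letter_runs('gr\u00f6\u00dfe1 na\u00efve_x'))  # ['größe', 'naïve']
```

The same class without the {4,} quantifier matches a single letter, which is one answer to the original "match \w but not \d" question when the underscore should be excluded as well.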

A solution using a negative lookahead assertion (the lookahead has to 
reject the underscore as well as digits, since \w matches both):

 >>> re.findall(r'(?:(?![\d_])\w){4,}', 'asdg1dfs _asfd s adfsa')
['asdg', 'asfd', 'adfsa']

This seems a fair bit more verbose (and IMHO harder to read) than the 
solution I proposed, but perhaps you had a clearer version in mind?
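
As a rough check that the two spellings agree, the sketch below runs both over a few strings. The sample list and the helper name patterns_agree are made up for illustration, and agreement on these inputs is a sanity check, not a proof of equivalence. It also shows that the single-character form r'(?!\d)\w' from the quoted message still accepts an underscore.

```python
import re

# The character-class form and the lookahead form from this message.
CLASS_FORM = re.compile(r'[^\W\d_]{4,}')
LOOKAHEAD_FORM = re.compile(r'(?:(?![\d_])\w){4,}')

# Hypothetical sample inputs, chosen to exercise digits, underscores,
# short runs, and the empty string.
SAMPLES = ['asdg1dfs _asfd s adfsa', 'abcd', 'a1b2c3', '____',
           'abcdefgh9ijkl', '']

def patterns_agree(text):
    """True if both patterns find the same matches in text."""
    return CLASS_FORM.findall(text) == LOOKAHEAD_FORM.findall(text)

print(all(patterns_agree(s) for s in SAMPLES))  # True

# The quoted one-character pattern rejects digits but keeps the underscore.
print(re.findall(r'(?!\d)\w', 'a_1'))           # ['a', '_']
```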

I tend to shy away from lookahead assertions because IMHO there's 
usually a simpler way to write the pattern.  They are occasionally 
useful, though.

Steve


