regular expressions and internationalization (WAS: permuting letters...)

Steven Bethard steven.bethard at gmail.com
Fri Nov 12 15:15:28 EST 2004


Andrew Dalke <adalke <at> mindspring.com> writes:
> 
> Steven Bethard wrote:
> > Is there any reason not to use [A-z] type regexps?
> 
> Better support for internationalization, so it will
> work in España and Göteborg.

Ahh.  That makes sense of course.  Thanks!

I looked again at the re module, and it seems that \w and \W do have
internationalization support when the LOCALE or UNICODE flag is set. Is there
any way to match \w but not \d?  Maybe something like:
    r'[^\d\W]{4,}'

This seems to work (maybe?):

>>> import re
>>> p = re.compile(r'[^\d\W]{4,}', re.UNICODE)
>>> p.findall(u'él me compró un globo. 1234 a342')
[u'compr\xf3', u'globo']
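
For comparison, here is a minimal sketch of the same idea in Python 3, where str
patterns are Unicode-aware by default and re.UNICODE is not needed. One caveat:
\w also matches the underscore, so [^\d\W] still lets '_' through. The variant
[^\W\d_] and the extra 'foo_bar' token in the sample text are my additions for
illustration, not part of the session above.

```python
import re

# Sample text from the session above, plus 'foo_bar' (added for illustration).
text = 'él me compró un globo. 1234 a342 foo_bar'

# Original pattern: word characters that are not digits (underscore still allowed).
letters_or_underscore = re.compile(r'[^\d\W]{4,}')

# Variant: also exclude the underscore, leaving letters only.
letters_only = re.compile(r'[^\W\d_]{4,}')

print(letters_or_underscore.findall(text))  # ['compró', 'globo', 'foo_bar']
print(letters_only.findall(text))           # ['compró', 'globo']
```

Whether the underscore should count depends on what "a word" means for the task at hand.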

I don't know how to test this under different locales, though...
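
One possible way to try it, sketched for Python 3 under several assumptions:
re.LOCALE is only accepted with bytes patterns there, so the text has to be
encoded first (Latin-1 here, my choice). The helper findall_in_locale is a
hypothetical name, and 'es_ES.ISO8859-1' is only a guess at a locale name;
which names are installed varies from system to system, so the helper returns
None when the locale cannot be set.

```python
import locale
import re


def findall_in_locale(locale_name, data):
    """Match non-digit word characters in `data` under `locale_name`.

    Hypothetical helper: returns None if the locale cannot be set.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
    except locale.Error:
        return None
    try:
        # Bytes pattern: re.LOCALE makes \w follow the current LC_CTYPE.
        pattern = re.compile(rb'[^\d\W]{4,}', re.LOCALE)
        return pattern.findall(data)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the old locale


# Encoding is an assumption; it must match the locale's character set.
text = 'él me compró un globo. 1234 a342'.encode('latin-1')

for name in ('C', 'es_ES.ISO8859-1'):  # second name may not exist on your system
    print(name, findall_in_locale(name, text))
```

Comparing the output for 'C' against a Latin-1 locale shows whether the accented
byte is treated as a letter, which is the part that depends on the locale.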

Steve



