regular expressions and internationalization (WAS: permuting letters...)
Steven Bethard
steven.bethard at gmail.com
Fri Nov 12 15:15:28 EST 2004
Andrew Dalke <adalke <at> mindspring.com> writes:
>
> Steven Bethard wrote:
> > Is there any reason not to use [A-z] type regexps?
>
> Better support for internationalization, so it will
> work in España and Göteborg.
Ahh. That makes sense, of course. Thanks!
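A quick illustration of the [A-z] problem, written as a sketch for modern Python 3 rather than the Python 2 used in this thread. It assumes Python 3 semantics, where str patterns use Unicode matching by default, so no re.UNICODE flag is needed. The sample string is made up:

```python
import re

text = 'España Göteborg a_b^c'

# [A-z] spans code points 65-122, so besides A-Z and a-z it also
# matches the six ASCII characters between them: [ \ ] ^ _ `
# It never matches accented letters such as 'ñ' or 'ö'.
ascii_range = re.findall(r'[A-z]+', text)

# With str patterns in Python 3, \w follows Unicode rules by default,
# so accented letters count as word characters (as does '_').
unicode_words = re.findall(r'\w+', text)

print(ascii_range)    # ['Espa', 'a', 'G', 'teborg', 'a_b^c']
print(unicode_words)  # ['España', 'Göteborg', 'a_b', 'c']
```

The [A-z] version splits words at every accented letter and happily swallows '_' and '^', which is exactly the internationalization issue raised above.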
I looked at the re module again, and it seems that \w and \W do support
internationalization when the re.UNICODE (or re.LOCALE) flag is given; without
either flag they only match [a-zA-Z0-9_]... Is there any way to match \w but not
\d? Maybe something like:
r'[^\d\W]{4,}'
This seems to work (maybe?):
>>> import re
>>> p = re.compile(r'[^\d\W]{4,}', re.UNICODE)
>>> p.findall(u'él me compró un globo. 1234 a342')
[u'compr\xf3', u'globo']
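One caveat with [^\d\W]: \w also includes the underscore, so that class still matches '_'. The sketch below is for Python 3 (where Unicode matching is the default for str patterns) and uses a made-up sample string. It adds '_' to the negated class to exclude it as well. Note that the result is "word characters other than decimal digits and '_'", which in practice is mostly letters but not strictly letters only:

```python
import re

# \w minus \d: still matches '_', because \w includes the underscore
not_digit = re.compile(r'[^\d\W]{4,}')

# \w minus \d minus '_': word characters that are neither decimal
# digits nor the underscore (in practice, mostly letters)
letters_only = re.compile(r'[^\W\d_]{4,}')

text = 'él me compró un globo. 1234 a342 snake_case'
with_underscore = not_digit.findall(text)
without_underscore = letters_only.findall(text)

print(with_underscore)     # ['compró', 'globo', 'snake_case']
print(without_underscore)  # ['compró', 'globo', 'snake', 'case']
```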
I'm not sure how to check how this behaves under different locales, though...
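With re.UNICODE, \w is defined by the Unicode character database rather than by the current locale, so one rough check is to run the pattern over words from several languages and compare it against ASCII-only matching. A Python 3 sketch with made-up sample words; re.ASCII is the Python 3 flag that restricts \w to [a-zA-Z0-9_], and the locale-dependent re.LOCALE flag is only accepted for bytes patterns there:

```python
import re

# Made-up sample words from a few languages and scripts
samples = ['España', 'Göteborg', 'Москва', 'abc123', 'x_y']

# Unicode rules (the default for str patterns in Python 3)
unicode_pat = re.compile(r'[^\W\d_]+')
# ASCII rules: non-ASCII characters fall into \W and are excluded
ascii_pat = re.compile(r'[^\W\d_]+', re.ASCII)

# For each word: does the whole word match under Unicode / ASCII rules?
results = {
    word: (bool(unicode_pat.fullmatch(word)), bool(ascii_pat.fullmatch(word)))
    for word in samples
}

for word, (uni, asc) in results.items():
    print(f'{word:10} unicode={uni} ascii={asc}')
```

Accented Latin and Cyrillic words pass only under the Unicode rules, while words containing digits or '_' fail under both, which is the behaviour the [^\W\d_] class is meant to have.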
Steve