Regular expressions and non-standard character set

Petri Mikael Kuittinen eye at niksula.hut.fi
Tue Mar 20 12:16:46 EST 2001


I want to match word boundaries using the special sequences \b and \B
of regular expressions. They work OK with the "standard" alphanumeric
set [a-zA-Z0-9_], but I would like them to work with a character set
that also contains various "national characters", e.g. å, ä, ö, è, é,
ü, ñ, and their uppercase equivalents.
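
To make the problem concrete, here is a minimal sketch. It assumes a
current Python 3 interpreter, which is not the version this message was
written against: there, str patterns treat Unicode letters as word
characters by default, and the re.ASCII flag restricts \w and \b to
[a-zA-Z0-9_]. The sample text is made up for illustration.

```python
import re

# Sample text "päivä on hyvä", written with escapes so the example
# does not depend on the source file's encoding.
text = "p\u00e4iv\u00e4 on hyv\u00e4"

# re.ASCII limits \w (and therefore \b and \B) to [a-zA-Z0-9_],
# so every "ä" is treated as a word boundary and the words are split.
ascii_words = re.findall(r"\b\w+\b", text, flags=re.ASCII)

# Without re.ASCII, a str pattern uses Unicode word characters,
# so "ä" counts as part of the word.
unicode_words = re.findall(r"\b\w+\b", text)

print(ascii_words)    # ['p', 'iv', 'on', 'hyv']
print(unicode_words)  # ['päivä', 'on', 'hyvä']
```

The first result is the unwanted behaviour: the words are cut at each
national character.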

Setting the locale doesn't seem to be the proper way to do it, and
appending these characters to string.letters, string.uppercase and
string.lowercase before importing re doesn't help at all.
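
One workaround that depends on neither the locale nor the string
module is to spell the word boundary out with lookarounds over an
explicit character class. This is only a sketch, not necessarily the
proper way: the names NATIONAL, WORD, BOUNDARY and find_words are
hypothetical, and the letter list contains only the examples named
above, so it would have to be extended for other languages.

```python
import re

# Hypothetical letter list: only the examples mentioned in this message.
NATIONAL = "åäöèéüñÅÄÖÈÉÜÑ"
# Contents of a character class: the standard set plus the extra letters.
WORD = "a-zA-Z0-9_" + NATIONAL

# A boundary is a position with a word character on exactly one side.
BOUNDARY = (
    rf"(?:(?<![{WORD}])(?=[{WORD}])"   # start of a word
    rf"|(?<=[{WORD}])(?![{WORD}]))"    # end of a word
)

# Equivalent of r"\b\w+\b" for the extended character set.
WORD_PATTERN = re.compile(rf"{BOUNDARY}[{WORD}]+{BOUNDARY}")


def find_words(text):
    """Return all maximal runs of characters from the extended set."""
    return WORD_PATTERN.findall(text)


print(find_words("Åke sjöng señor_1!"))  # ['Åke', 'sjöng', 'señor_1']
```

The drawback is that every letter has to be listed by hand, which is
exactly what a locale-aware or Unicode-aware \b would avoid.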

What is the proper way of doing it?

Petri

-- 
<(O)> Petri Kuittinen, also known as Eye, Dj Eye or Peku               <(O)>
<(O)> ADDRESS: Postipuuntie 10 A 14, FIN-02600 Espoo, Finland          <(O)>
<(O)> EMAIL: eye at iki.fi WWW: http://www.iki.fi/~eye/ PHONE: 09-5472380 <(O)>
~If dandelions were hard to grow, they would be most welcome on any lawn.


