Regular expressions and non-standard character set

Tue Mar 20 12:54:06 EST 2001

On 20-Mar-2001 Petri Mikael Kuittinen wrote:
> I want to match word boundaries using the special sequences \b and \B
> of regular expressions. They work OK when using the "standard"
> alphanumeric set [a-zA-Z0-9_]. But I would like them to work with
> character set which also contains various "national characters"
> e.g. å, ä, ö, è, é, ü, ñ etc. and their uppercase equivalents.

I know this is not exactly what you want to hear, but I personally avoid \b
like the plague.  It has been my experience that what the regex engine thinks
is a word boundary does not match what I want to be a word boundary.  I am
trying to remember examples of this, but I know that it has bitten me a few
times.
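For what it's worth, here is a minimal sketch of the usual options, written in Python 3. That is an assumption on my part: the question does not say which language or regex engine is involved, and older engines tie \w to the locale instead. The sample text and the WORD character class are made up for illustration. The sketch contrasts ASCII-only \w, Unicode-aware \w, and a hand-written boundary built from lookarounds. The last one lets you decide what counts as part of a word instead of accepting the engine's idea. The apostrophe in "Müller's" is one place where \b and a human reader disagree.

```python
import re
import unicodedata

# Sample text (made up for this sketch) mixing ASCII and "national" letters.
# Normalize to NFC first: \w matches precomposed letters like "ä", but not
# the separate combining mark of a decomposed "a" + U+0308.
text = unicodedata.normalize("NFC", "Hämeenlinna och Åbo, señor Müller's café")

# 1. ASCII-only word characters: \w is [a-zA-Z0-9_], so every accented
#    letter is treated as a non-word character and \b splits words apart.
ascii_words = re.findall(r"\b\w+\b", text, flags=re.ASCII)

# 2. Unicode-aware \w (the default for str patterns in Python 3):
#    accented letters count as word characters, so \b lands between words.
unicode_words = re.findall(r"\b\w+\b", text)

# 3. Avoiding \b: spell out the word characters yourself (here the ASCII
#    set, the letters from the question, and the apostrophe) and build the
#    boundaries from a negative lookbehind and lookahead on that class.
#    WORD is a hypothetical example class; adjust it to your own alphabet.
WORD = "[A-Za-z0-9_ÅÄÖåäöÈÉèéÜüÑñ']"
custom_words = re.findall(rf"(?<!{WORD}){WORD}+(?!{WORD})", text)

print(ascii_words)
print(unicode_words)
print(custom_words)
```

The ASCII version breaks "Hämeenlinna" into "H" and "meenlinna"; the Unicode version keeps the national characters but still splits "Müller's" at the apostrophe; the explicit class keeps "Müller's" whole because the apostrophe was deliberately included in it.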