Question about regular expressions

skeptic serge at zolshar.ru
Fri Nov 16 14:02:20 EST 2001


Hello, Stephen!
> I am now using org.apache.regexp for regular expressions in Java. 
> I want to search for Unicode characters. (I am afraid others can't
> read the language I use, so I will write my example in English;
> assume all of the following letters are Unicode characters.)
> For example, I want to search for words matching any of these alternatives:
> 
> AB or C or E or ST
> 
> How do I write the expression?
> 
> I have tried this:
> 
> (AB|C|E|ST); however, it also matches a lone B or a lone T...

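The pattern is correct: (AB|C|E|ST) matches only the two-letter
sequences AB and ST, or a single C or E. A lone B or T is not one of
the alternatives, so it should never match.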

> (whereas only AB or ST should be matched)...
> 
> thx!
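To check the claim that the alternation never matches a lone B or T, here is a minimal sketch. It uses the JDK's own java.util.regex package as a stand-in, not the org.apache.regexp library from the question, and plain Latin letters as in the example. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class AlternationDemo {
    // The alternation from the question: the sequences AB and ST,
    // or a single C or E. A lone B or T is not one of the alternatives.
    static final Pattern WORDS = Pattern.compile("(AB|C|E|ST)");

    // Returns true if any alternative occurs anywhere in the input.
    static boolean containsMatch(String input) {
        return WORDS.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"AB", "ST", "C", "E", "B", "T", "BT", "XBYTZ"};
        for (String s : inputs) {
            // Expected: true for AB, ST, C, E; false for the rest.
            System.out.println(s + " -> " + containsMatch(s));
        }
    }
}
```

If the same inputs give different answers with the library you use, the difference is worth reporting together with the exact input strings.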

There are two possible culprits:
- bad Unicode handling by the jakarta-regexp library;
- bad Unicode handling in your own code (there are TONS of problems with
Unicode input and output, trust me; localization is much more
difficult than it may seem).
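One common input problem is decoding bytes with the wrong charset. The sketch below illustrates this under assumptions of its own: the text is assumed to arrive as UTF-8 bytes, it uses java.util.regex and StandardCharsets from a modern JDK (not the jakarta-regexp library), and the class and method names are hypothetical. Decoded as ISO-8859-1, two Cyrillic letters turn into four unrelated Latin-1 characters, and the pattern no longer matches.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class DecodingPitfall {
    // Cyrillic A and BE (U+0410, U+0411), written as escapes so the
    // encoding of this source file cannot corrupt them.
    static final String CYRILLIC_AB = "\u0410\u0411";
    static final Pattern PATTERN = Pattern.compile(CYRILLIC_AB);

    // Encodes the text as UTF-8 (as a file or socket might deliver it),
    // then decodes those bytes with the given charset.
    static String roundTrip(Charset decodeAs) {
        byte[] bytes = CYRILLIC_AB.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, decodeAs);
    }

    static boolean found(String text) {
        return PATTERN.matcher(text).find();
    }

    public static void main(String[] args) {
        // Wrong charset: the 4 UTF-8 bytes become 4 Latin-1 characters.
        String wrong = roundTrip(StandardCharsets.ISO_8859_1);
        // Matching charset: the original 2 Cyrillic characters come back.
        String right = roundTrip(StandardCharsets.UTF_8);
        System.out.println(wrong.length() + " chars, match=" + found(wrong)); // 4 chars, match=false
        System.out.println(right.length() + " chars, match=" + found(right)); // 2 chars, match=true
    }
}
```

The usual remedy is to name the charset explicitly wherever bytes become characters (readers, `new String(byte[], ...)`) instead of relying on the platform default.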

I have run your test with Russian characters (the Cyrillic Unicode block)
and four different regex libraries (jakarta-regexp, jakarta-oro, jregex,
regex4j), and all of them worked as expected. So the second case seems
more probable to me. Please provide more details.
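A test like the one described can be sketched as follows. It is not the original test. It uses java.util.regex rather than the four libraries named above, and the Latin-to-Cyrillic mapping (AB to U+0410 U+0411, C to U+0426, E to U+0415, ST to U+0421 U+0422) is a hypothetical choice for illustration. The `codePoints` helper prints each input as U+XXXX values, which is the kind of detail that shows whether the program really received the intended characters, independent of console or font problems.

```java
import java.util.regex.Pattern;

public class CyrillicCheck {
    // Hypothetical Cyrillic stand-ins for AB | C | E | ST, written as escapes.
    static final Pattern WORDS =
        Pattern.compile("(\u0410\u0411|\u0426|\u0415|\u0421\u0422)");

    static boolean containsMatch(String input) {
        return WORDS.matcher(input).find();
    }

    // Formats each code point as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // AB, lone B, lone T, ST in the hypothetical Cyrillic mapping.
        String[] inputs = {"\u0410\u0411", "\u0411", "\u0422", "\u0421\u0422"};
        for (String s : inputs) {
            // Prints only ASCII: code points and the match result.
            System.out.println(codePoints(s) + " -> " + containsMatch(s));
        }
    }
}
```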


Regards



More information about the Python-list mailing list