[Python-Dev] Python and the Unicode Character Database

Vlastimil Brom vlastimil.brom at gmail.com
Tue Dec 7 23:19:44 CET 2010


2010/12/7 Alexander Belopolsky <alexander.belopolsky at gmail.com>:
> On Tue, Dec 7, 2010 at 8:02 AM, Vlastimil Brom <vlastimil.brom at gmail.com> wrote:
> ..
>> Do you know of any re engine fully complying with tr18, even at the
>> first level: "Basic Unicode Support"?
>>
> """
> ICU Regular Expressions conform to Unicode Technical Standard #18,
> Unicode Regular Expressions, level 1, and in addition include Default
> Word boundaries and Name Properties from level 2.
> """ http://userguide.icu-project.org/strings/regexp
>

Thanks for the pointer; I wasn't aware of that project.
Anyway, I am quite happy with the regex library mentioned earlier and can
live with trading full tr18 compliance for some non-Unicode goodies (such
as unbounded lookbehinds), though I see that this is beside the point here.
Not that my opinion matters, but I can't think of, say, "union,
intersection and set-difference of Unicode sets" as a basic feature, nor
consider it a part of "a minimal level for useful Unicode support."
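
To make concrete what "union, intersection and set-difference of Unicode
sets" would mean, here is a rough sketch that emulates those operations
with plain Python sets and the standard re and unicodedata modules. It is
only an illustration, not the syntax of any particular regex engine; the
helper names (chars_with_category, to_char_class) and the 0x2000 code
point limit are my own assumptions, chosen to keep the example small.

```python
import re
import unicodedata


def chars_with_category(prefix, limit=0x2000):
    """Characters below `limit` whose general category starts with `prefix`.

    `limit` is an arbitrary cut-off (an assumption for this sketch) that
    keeps the generated character classes small.
    """
    return {
        chr(cp)
        for cp in range(limit)
        if unicodedata.category(chr(cp)).startswith(prefix)
    }


def to_char_class(chars):
    """Build a regex character class matching exactly the given characters."""
    return "[" + "".join(re.escape(c) for c in sorted(chars)) + "]"


letters = chars_with_category("L")
uppercase = chars_with_category("Lu")
digits = chars_with_category("Nd")
ascii_chars = {chr(cp) for cp in range(0x80)}

non_ascii_letters = letters - ascii_chars   # set difference
ascii_uppercase = uppercase & ascii_chars   # intersection
letters_or_digits = letters | digits        # union

# Match runs of letters that are not ASCII.
pattern = re.compile(to_char_class(non_ascii_letters) + "+")
print(pattern.findall("na\u00efve caf\u00e9"))
```

An engine with native support would express the same thing directly inside
a character class instead of enumerating the characters up front, which is
presumably why tr18 lists it as a feature in its own right.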

vbr

