[Python-Dev] Python and the Unicode Character Database

Vlastimil Brom vlastimil.brom at gmail.com
Tue Dec 7 14:02:47 CET 2010


2010/12/7 Alexander Belopolsky <alexander.belopolsky at gmail.com>:
> On Sat, Dec 4, 2010 at 5:58 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>> I actually wonder if Python's re module can claim to provide even
>>> Basic Unicode Support.
>>
>> Do you really wonder? Most definitely it does not.
>>
>
> Were you more optimistic four years ago?
>
> http://bugs.python.org/issue1528154#msg54864
>
> I was hoping that regex syntax would be useful in
> explaining/documenting Python text processing routines (including
> string to number conversions) without a heavy dose of Unicode
> terminology.
>

The new regex version
http://bugs.python.org/issue2636
supports many more features, including Unicode properties and the
POSIX classes mentioned, but definitely not all of the
requirements of that rather "generous" list:
http://www.unicode.org/reports/tr18/
It seems there are some omissions in other implementations too, e.g. in Perl:
http://perldoc.perl.org/perlunicode.html#Unicode-Regular-Expression-Support-Level
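
To make the Unicode property point concrete, here is a minimal sketch
using only the standard library. The helper names
(supports_property_escape, find_by_category) are hypothetical and chosen
just for illustration. The first checks whether the installed re module
accepts a \p{..} property escape at all (the result depends on the
Python version), and the second approximates a general-category match
with unicodedata, which is roughly what a property escape would provide.

```python
import re
import unicodedata


def supports_property_escape(pattern=r"\p{Lu}"):
    # Hypothetical helper: True if the stdlib re module compiles the
    # given property-escape pattern without raising an error.
    # Note: very old versions may compile it as literal text instead.
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def find_by_category(text, prefix):
    # Hypothetical helper: return the characters whose Unicode general
    # category (e.g. "Lu", "Nd") starts with the given prefix.
    return [ch for ch in text if unicodedata.category(ch).startswith(prefix)]


sample = "Löwis Ž 42"
uppercase_letters = find_by_category(sample, "Lu")
decimal_digits = find_by_category(sample, "Nd")
print(supports_property_escape(), uppercase_letters, decimal_digits)
```

This is only an approximation of one TR18 Level 1 requirement
(properties); it says nothing about the other items on that list.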

Do you know of any re engine that fully complies with TR18, even at the
first level, "Basic Unicode Support"?

vbr

