sre is broken in SuSE 9.2

Serge Orlov Serge.Orlov at gmail.com
Thu Feb 10 13:35:27 EST 2005


Denis S. Otkidach wrote:
> On all platforms \w matches all Unicode letters when used with flag
> re.UNICODE, but this doesn't work on SuSE 9.2:
>
> Python 2.3.4 (#1, Dec 17 2004, 19:56:48)
> [GCC 3.3.4 (pre 3.3.5 20040809)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> re.compile(ur'\w+', re.U).match(u'\xe4')
> >>>
>
> BTW, unicodedata correctly recognizes this character as a lowercase letter:
> >>> import unicodedata
> >>> unicodedata.category(u'\xe4')
> 'Ll'
>
> I've looked through all SuSE patches applied, but found nothing
> related. What is the reason for broken behavior?  Incorrect
> configure options?

To summarize the discussion: either it's a bug in glibc, or there is an
option to select a modern POSIX locale. The POSIX locale consists of
characters from the portable character set, and Unicode is certainly
portable.
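
For anyone who wants to compare platforms, here is a minimal sketch of a
check that reports the active LC_CTYPE setting next to the two results
from the quoted session. The helper name check_unicode_word_match is
hypothetical, and the idea that the locale setting is what influences \w
here is only the working assumption from this thread, not something the
script proves.

```python
import locale
import re
import unicodedata


def check_unicode_word_match(ch=u'\xe4'):
    """Hypothetical helper: report how this interpreter treats `ch`."""
    return {
        # Calling setlocale() without a second argument only queries the
        # current LC_CTYPE setting; it does not change it.
        'lc_ctype': locale.setlocale(locale.LC_CTYPE),
        # Unicode general category from Python's own database.
        'category': unicodedata.category(ch),
        # Whether \w with re.UNICODE accepts the character.
        'matches_w': re.compile(u'\\w+', re.UNICODE).match(ch) is not None,
    }


report = check_unicode_word_match()
for key in sorted(report):
    print('%s: %r' % (key, report[key]))
```

If matches_w comes back False while category is 'Ll', the interpreter
shows the same mismatch as the SuSE build in the original report.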

  Serge.



