[Python-3000] Regular expressions, py3k and unicode

Mark Dickinson dickinsm at gmail.com
Sun Jun 29 14:56:33 CEST 2008


On Sun, Jun 29, 2008 at 12:36 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
> Indeed. On the other hand it already works properly for ints and floats,
> so perhaps Decimal shouldn't refuse unicode digits like it currently
> does:

Maybe.  The IBM standard doesn't seem to say whether other Unicode
digits should be accepted or not.

Is there a quick way to convert a general Unicode digit to its
ASCII equivalent?  Having to run str(int(c)) on each numeric character
sounds painful, and the Decimal constructor doesn't need to
be any slower than it already is.
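
For reference, one possible approach is sketched below.  It is not
what decimal does, just an illustration: the helper name
to_ascii_digits is made up, and it relies on unicodedata.decimal(),
which returns the value of a decimal digit character, or the supplied
default for anything else.

```python
import unicodedata


def to_ascii_digits(text):
    """Return text with each Unicode decimal digit replaced by its
    ASCII equivalent; all other characters are left untouched.

    Hypothetical helper, for illustration only.
    """
    out = []
    for ch in text:
        # decimal() yields 0..9 for decimal digit characters and the
        # default (-1 here) for everything else, including '.' and
        # characters such as superscript two that are not decimals.
        value = unicodedata.decimal(ch, -1)
        out.append(str(value) if value >= 0 else ch)
    return "".join(out)


# Arabic-Indic digits one to five, with an ASCII decimal point.
print(to_ascii_digits("\u0661\u0662\u0663.\u0664\u0665"))
```

This still visits every character, so it does not answer the speed
concern; it only avoids the str(int(c)) round trip.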

In any case, this potential problem with decimal has now been
identified, and is easy to deal with.  I'm more worried, perhaps
needlessly, about what other unidentified problems might be
lurking deep in the standard library.  Any use of '\d', '\w', '\s', etc.
on str patterns could be a problem, since those classes are no longer
limited to ASCII characters.
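
To make the concern concrete, here is a small sketch of the
behaviour, assuming a Python 3 release in which the re module
provides the re.ASCII flag and fullmatch().  The variable names are
only for illustration.

```python
import re

# Arabic-Indic digits one, two, three.
arabic = "\u0661\u0662\u0663"

# With a str pattern, \d matches any Unicode decimal digit by default.
unicode_digits = re.compile(r"\d+")

# re.ASCII restricts \d, \w and \s to their ASCII-only meanings.
ascii_digits = re.compile(r"\d+", re.ASCII)

print(unicode_digits.fullmatch(arabic) is not None)  # True
print(ascii_digits.fullmatch(arabic) is not None)    # False

# The same applies to \w: the accented letter counts as a word
# character only in the default (Unicode) mode.
unicode_words = re.findall(r"\w+", "caf\u00e9")
ascii_words = re.findall(r"\w+", "caf\u00e9", re.ASCII)
```

Code that validates input with '\d' and then assumes ASCII digits
downstream is the kind of case that might need auditing.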

Mark

