[Python-Dev] unicode alphanumerics
M.-A. Lemburg
mal@lemburg.com
Sat, 01 Jul 2000 18:56:12 +0200
Fredrik Lundh wrote:
>
> when looking through skip's coverage listing, I noted a bug in
> SRE:
>
> #define SRE_UNI_IS_ALNUM(ch) ((ch) < 256 ? isalnum((ch)) : 0)
>
> this predicate is used for \w when a pattern is compiled using
> the "unicode locale" (flag U), and should definitely not use 8-bit
> locale stuff.
>
> however, there's no such thing as a Py_UNICODE_ISALNUM
> (or even a Py_UNICODE_ISALPHA). what should I do? how
> about using:
>
> Py_UNICODE_ISLOWER ||
> Py_UNICODE_ISUPPER ||
> Py_UNICODE_ISTITLE ||
> Py_UNICODE_ISDIGIT
This will give you all cased characters along with all digits;
it omits the letters that have no case.
It's a good start, but probably won't cover the full range
of letters + numbers.
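
To make the gap concrete, here is a rough Python-level sketch, assuming
the string methods islower/isupper/istitle/isdigit behave like the
corresponding Py_UNICODE_* macros (the helper name cased_or_digit is
made up for illustration):

```python
def cased_or_digit(ch: str) -> bool:
    """Analogue of ISLOWER || ISUPPER || ISTITLE || ISDIGIT for one character."""
    return ch.islower() or ch.isupper() or ch.istitle() or ch.isdigit()


# Latin letters, a digit, a titlecase letter, then two letters without case:
# HEBREW LETTER ALEF (U+05D0) and a CJK ideograph (U+4E2D).
samples = ["a", "Z", "7", "\u01c5", "\u05d0", "\u4e2d"]

for ch in samples:
    print(f"U+{ord(ch):04X} cased_or_digit={cased_or_digit(ch)} "
          f"isalnum={ch.isalnum()}")
```

The last two samples are alphanumeric but fail the combined test, which
is exactly the set of characters \w would miss.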
Perhaps we need another table for isalpha in unicodectype.c?
(Or at least one that defines all non-cased letters.)
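
One way such a table could be combined with the existing predicates,
again only as a Python sketch: the set NONCASED_LETTER_CATEGORIES and
the function names are hypothetical stand-ins for the proposed table,
and the general categories Lo and Lm are my assumption about what it
would need to contain.

```python
import unicodedata

# Hypothetical stand-in for the proposed table: general categories of
# letters outside the cased categories (Lu, Ll, Lt).
NONCASED_LETTER_CATEGORIES = {"Lo", "Lm"}


def is_noncased_letter(ch: str) -> bool:
    """True if ch is a letter in one of the assumed non-cased categories."""
    return unicodedata.category(ch) in NONCASED_LETTER_CATEGORIES


def uni_is_alnum(ch: str) -> bool:
    """Cased letters and digits, plus letters found in the extra table."""
    return (ch.islower() or ch.isupper() or ch.istitle()
            or ch.isdigit() or is_noncased_letter(ch))
```

This still leaves out numeric characters that are not digits (letter
numbers, fractions), so it covers "letters + digits" rather than every
number.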
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/