[Python-Dev] unicode alphanumerics

M.-A. Lemburg mal@lemburg.com
Sat, 01 Jul 2000 18:56:12 +0200


Fredrik Lundh wrote:
> 
> when looking through skip's coverage listing, I noted a bug in
> SRE:
> 
> #define SRE_UNI_IS_ALNUM(ch) ((ch) < 256 ? isalnum((ch)) : 0)
> 
> this predicate is used for \w when a pattern is compiled using
> the "unicode locale" (flag U), and should definitely not use 8-bit
> locale stuff.
> 
> however, there's no such thing as a Py_UNICODE_ISALNUM
> (or even a Py_UNICODE_ISALPHA).  what should I do?  how
> about using:
> 
>     Py_UNICODE_ISLOWER ||
>     Py_UNICODE_ISUPPER ||
>     Py_UNICODE_ISTITLE ||
>     Py_UNICODE_ISDIGIT

This will give you all cased chars along with all digits;
it omits the non-cased letters.

It's a good start, but probably won't cover the full range
of letters + numbers.

Perhaps we need another table for isalpha in unicodectype.c ?
(Or at least one which defines all non-cased letters.)
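
To illustrate the gap, here is a minimal sketch at the Python level. It
assumes that the string methods of a current Python 3 interpreter
(islower, isupper, istitle, isdigit, isalnum) roughly correspond to the
C-level predicates discussed above; the helper name cased_or_digit and
the sample characters are made up for this example and are not part of
SRE or unicodectype.c.

```python
import unicodedata


def cased_or_digit(ch):
    """Hypothetical stand-in for the proposed predicate:
    lower-case, upper-case, title-case, or digit."""
    return ch.islower() or ch.isupper() or ch.istitle() or ch.isdigit()


# A few sample code points: cased letters, a title-case letter, digits,
# two non-cased letters (category Lo) and a non-digit number (category No).
samples = [
    "a",        # LATIN SMALL LETTER A
    "Z",        # LATIN CAPITAL LETTER Z
    "\u01c5",   # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON (title case)
    "7",        # DIGIT SEVEN
    "\u0661",   # ARABIC-INDIC DIGIT ONE
    "\u05d0",   # HEBREW LETTER ALEF (non-cased letter)
    "\u4e2d",   # CJK ideograph (non-cased letter)
    "\u00bd",   # VULGAR FRACTION ONE HALF (a number, but not a digit)
]

for ch in samples:
    print(
        "U+%04X  category=%s  proposed=%s  isalnum=%s"
        % (ord(ch), unicodedata.category(ch), cased_or_digit(ch), ch.isalnum())
    )
```

The non-cased letters and the fraction are alphanumeric as far as
isalnum is concerned, but the combined cased-or-digit test rejects
them; those are the characters an extra table would have to cover.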

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/