Re: unicode data - accessing codepoints > FFFF on narrow python builds

Wed Apr 18 16:57:22 EDT 2007

Hi, thanks for your answer.
I'll try to check the source of unicodedata.
Using a wide Unicode build seems like overkill for now, since the BMP is enough for the vast majority of my uses. I was rather looking for some "lower-cost" alternatives for the rare cases when I need the higher (supplementary) planes.
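
A minimal sketch of the kind of "lower-cost" alternative I have in mind, assuming it is enough to convert between a code point above FFFF and its UTF-16 surrogate pair by plain integer arithmetic. The function names to_surrogates and from_surrogates are made up for illustration and are not part of unicodedata or any other module. Since the sketch only handles integers, it should not depend on how the interpreter was built:

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the supplementary planes")
    offset = cp - 0x10000  # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, to_surrogates(0x1D11E) gives (0xD834, 0xDD1E), and from_surrogates(0xD834, 0xDD1E) gives 0x1D11E again. Looking up the properties of such a code point would still need unicodedata to handle it correctly, which this sketch does not address.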

Thanks again,

Vlastimil Brom
 - vbr

> From: "Martin v. Löwis" <martin at v.loewis.de>
> Subject: Re: unicode data - accessing codepoints > FFFF on narrow python builds
> Date: 18.4.2007 21:37:39
> ----------------------------------------
> > Is it a bug in unicodedata, or is this the expected behaviour on a
> > narrow build?
> 
> It's a bug. It should either raise an exception, or return the correct
> result. If you now feel like submitting a bug report: please try to
> come up with a patch instead.
> 
> > Another problem I have is to access the "characters" and their
> > properties by the respective codepoints: under FFFF it is possible
> > to use unichr(), which isn't valid for higher values on a narrow
> > build. It is possible to derive the codepoint from the surrogate pair,
> > which would be usable also for wider codepoints.
> 
> See PEP 261. This is by design.
> 
> > Currently, I'm using a kind of parallel database for some unicode
> > ranges above FFFF, but I don't think this is the most effective way.
> 
> Just use a wide Unicode build instead.
> 
> Regards,
> Martin
> -- 
> http://mail.python.org/mailman/listinfo/python-list