[Python-Dev] Unicode database

Nick Maclaren nmm1 at cus.cam.ac.uk
Fri Aug 10 10:23:42 CEST 2007


"Martin v. Löwis" <martin at v.loewis.de> wrote:
>
> Sure. But (again): you don't need to have the mappings at all for
> what you want to achieve. So there is no point in downloading them

Sigh.  No, I don't.  But, if I want to be able to merge anything
back into the main Python source, it is a VERY good idea to use the
existing mechanisms and not invent new ones.

The easiest thing would have been to hack re.py to create a Unicode
table using unicodedata.py directly, and that would indeed be a rather
cleaner solution in the long term.  But it would have meant that there
were now multiple different ways of generating the Unicode data for
_sre.c, and that would have led to inconsistencies.
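
[Illustration, not code from this thread: a minimal sketch of what
"creating a Unicode table from unicodedata directly" could look like.
It assumes Python 3 and the standard unicodedata module; the function
name build_category_table and the range-list layout are hypothetical
and are not what _sre.c or the existing generator scripts use.]

```python
import unicodedata


def build_category_table(limit=0x110000):
    """Map each general category (e.g. 'Lu', 'Nd') to a list of
    inclusive [first, last] code point ranges below `limit`."""
    table = {}
    for cp in range(limit):
        cat = unicodedata.category(chr(cp))
        ranges = table.setdefault(cat, [])
        if ranges and ranges[-1][1] == cp - 1:
            # Extend the current run of consecutive code points.
            ranges[-1][1] = cp
        else:
            # Start a new run.
            ranges.append([cp, cp])
    return table


# Restrict to Latin-1 here to keep the example fast.
table = build_category_table(0x100)
```

A table built this way at import time would always agree with the
interpreter's own unicodedata, but, as the message says, it would be a
second generation path alongside the existing one.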

As I pointed out, there is already a problem where upgrading the data
needs a complete rebuild to get all of the Unicode data back in step;
'make all' in itself does not work.  That is precisely the sort of
problem that is caused by having duplicate update mechanisms.


Now, IF I can work out how the _sre.c engine works enough to put
atomic/possessive quantifiers in, this problem will return.  My
question would be how best to make a suitable proposal that, inter
alia, includes changes that can't be made by the normal building
mechanisms.

And I still don't have a clue about that one.
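
[Illustration, not part of the original proposal: for readers
unfamiliar with atomic grouping, the semantics can be approximated in
the re module with a lookahead plus a backreference. This is a
well-known emulation under the assumption that a lookahead is not
re-entered once it has succeeded; it is not how an engine-level
implementation in _sre.c would work.]

```python
import re

# Ordinary greedy quantifier: a+ first takes all three 'a's, then gives
# one back so that the trailing literal 'a' can match.
greedy = re.match(r"a+a", "aaa")

# Emulated atomic group: the lookahead captures the longest run of 'a',
# and \1 consumes exactly that run. The engine cannot backtrack into the
# run, so nothing is left for the trailing 'a' and the match fails.
atomic = re.match(r"(?=(a+))\1a", "aaa")

# With something other than 'a' after the run, the atomic form succeeds.
atomic_ok = re.match(r"(?=(a+))\1b", "aaab")
```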


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679

