[Python-Dev] Problems with the new unicodectype.c

"Fredrik Lundh" <effbot@telia.com>
Tue, 11 Jul 2000 23:16:42 +0200


tim wrote:
> I believe we also need a way to split unicodedatabase.c into multiple files,
> as > 64K lines in a source file is unreasonable (Python can't handle a
> source file that large either, and Python is the *definition* of
> reasonableness here <wink>), and the MS compiler spits out a warning about
> its sheer size.

just a heads-up: I've been hacking a little on a new unicode
database.  the results so far are quite promising:

CTYPE: is*, to* functions
    118k => 13k

CNAME: code <=> name mappings (\N{name})
    440k => 160k

CINFO: remaining unicode properties
    590k => 42k

(approximate compiled code size, old => new, on Windows)
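The post doesn't say how the is*/to* tables shrink this much. One common technique, similar in spirit to the `splitbins` helper in CPython's Tools/unicode/makeunicodedata.py, is to store each distinct property record once and map code points to record numbers through a two-level table, so that identical blocks of code points are stored only once. A minimal sketch under that assumption; the record layout `(flags, upper delta, lower delta)`, the toy data, and the names `dedupe_records`, `build_two_level` and `lookup` are all hypothetical:

```python
from typing import Dict, List, Sequence, Tuple

Record = Tuple[int, int, int]  # (flags, upper delta, lower delta): toy layout

def dedupe_records(props: Sequence[Record]) -> Tuple[List[Record], List[int]]:
    """Store each distinct record once; return (records, per-code-point index)."""
    records: List[Record] = []
    seen: Dict[Record, int] = {}
    index: List[int] = []
    for rec in props:
        if rec not in seen:
            seen[rec] = len(records)
            records.append(rec)
        index.append(seen[rec])
    return records, index

def build_two_level(index: List[int], shift: int) -> Tuple[List[int], List[int]]:
    """Split a flat index into t1 (block offsets) and t2 (unique blocks)."""
    size = 1 << shift
    padded = index + [0] * (-len(index) % size)  # pad to whole blocks; padding is never looked up
    t1: List[int] = []
    t2: List[int] = []
    offsets: Dict[Tuple[int, ...], int] = {}
    for start in range(0, len(padded), size):
        block = tuple(padded[start:start + size])
        if block not in offsets:  # identical blocks are stored only once
            offsets[block] = len(t2)
            t2.extend(block)
        t1.append(offsets[block])
    return t1, t2

def lookup(t1: List[int], t2: List[int], shift: int, code: int) -> int:
    """Return the record number for code point `code`."""
    return t2[t1[code >> shift] + (code & ((1 << shift) - 1))]

# Toy data for U+0000..U+2FFF (an assumption, not real Unicode properties):
# only the ASCII letters get non-default records.
props: List[Record] = []
for cp in range(0x3000):
    if 0x41 <= cp <= 0x5A:      # 'A'..'Z': alpha, lowercase is cp + 32
        props.append((1, 0, 32))
    elif 0x61 <= cp <= 0x7A:    # 'a'..'z': alpha, uppercase is cp - 32
        props.append((1, -32, 0))
    else:
        props.append((0, 0, 0))

records, index = dedupe_records(props)
t1, t2 = build_two_level(index, shift=7)
print(len(records), "records;", len(t1) + len(t2), "table entries vs", len(index))
# -> 3 records; 352 table entries vs 12288
print(records[lookup(t1, t2, 7, ord("a"))])
# -> (1, -32, 0)
```

The lookup costs two array accesses, and the `shift` parameter trades the size of t1 against the size of t2; a real generator would try several shifts and keep the smallest result.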

on the source side, 3300k of source files are replaced with
about 600k (generated by a script directly from the
unicode.txt data file).
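The post only says the new sources are generated by a script straight from the data file. A minimal sketch of that kind of generator, assuming the semicolon-separated UnicodeData.txt layout (field 0 is the code point in hex, field 2 the general category). Only the general category is extracted here, and the function names and C array names are hypothetical:

```python
from typing import Dict, Iterable, List

def parse_categories(lines: Iterable[str]) -> Dict[int, str]:
    """Map code point -> general category (field 2) from UnicodeData.txt-style lines."""
    cats: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cats[int(fields[0], 16)] = fields[2]
    return cats

def emit_c_array(name: str, values: List[int], per_line: int = 16) -> str:
    """Render small integers as a C array definition (trailing commas are legal C)."""
    rows = []
    for i in range(0, len(values), per_line):
        rows.append("    " + ", ".join(str(v) for v in values[i:i + per_line]) + ",")
    return "static const unsigned char %s[] = {\n%s\n};\n" % (name, "\n".join(rows))

def emit_c_strings(name: str, strings: List[str]) -> str:
    """Render a list of names as a C string table."""
    items = ", ".join('"%s"' % s for s in strings)
    return "static const char *%s[] = {%s};\n" % (name, items)

# Three lines in UnicodeData.txt format (15 semicolon-separated fields).
SAMPLE = """\
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
"""

cats = parse_categories(SAMPLE.splitlines())
names = sorted(set(cats.values()) | {"Cn"})  # "Cn" stands for unassigned
index = [names.index(cats.get(cp, "Cn")) for cp in range(0x80)]
source = emit_c_strings("category_names", names) + emit_c_array("category_index", index)
print(source)
```

A real generator would read the whole data file (e.g. `open("UnicodeData.txt")`), would have to expand the `<..., First>`/`<..., Last>` line pairs that the file uses for large ranges such as CJK ideographs (this sketch ignores them), and would typically compress the index further before writing it out.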

note that the CNAME and CINFO parts are optional; you only
need CTYPE to build a working Python interpreter.

integrating this with 2.0 should be relatively straightforward,
but don't expect it to happen before next week or so...

cheers /F