Python "compiler" is too slow for processing large data files???

Tim Peters tim.one at comcast.net
Fri Sep 6 10:04:19 EDT 2002


[someone]
> list1 = [
>     (323, 870, 46, ),
>     (810, 336, 271, ),
>     (572, 55, 596, ),
>     (337, 256, 629, ),
>     (31, 702, 16, ),
> ]
>
> Anyway, as my data files grew from just a few lines up to about
> 8000 lines (with 10 values in each line, for a total of about 450KB of
> text), the time to 'exec' the file became too slow (e.g. 15 seconds) and
> used too much memory (e.g. 50MB) (using ms-windows, python 2.2.1).  It is
> the "compile" phase, because if I re-run and there is a *.pyc file
> available, the import goes very fast (no compilation required).
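
[For anyone who wants to reproduce the effect: here is a rough sketch, not
from the original post, that builds a module of roughly the shape described
(8000 lines of 10 small integers each) and times only the compile step.  The
random values and the "bigdata.py" filename are made up.]

    import random
    import time

    # Build a module that is nothing but one large literal.
    lines = ["list1 = ["]
    for _ in range(8000):
        values = ", ".join([str(random.randrange(1000)) for _ in range(10)])
        lines.append("    (%s)," % values)
    lines.append("]")
    source = "\n".join(lines)

    # Time just the compile step -- the phase reported as slow above.
    start = time.time()
    code = compile(source, "bigdata.py", "exec")
    print("compile took %.2f seconds" % (time.time() - start))
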

The parse tree created is huge, so the memory use isn't likely to go down in
the near future (addressing this requires a fundamentally different parser,
and nobody is working on that).  Depending on platform, though, there are
two things that can hurt speed a lot:

1. Oodles of tiny memory allocations and deallocations for the leaf
   nodes of the parse tree.

2. Repeated realloc of a high-level parse node to reflect the ever-growing
   number of top-level children.

#1 has been repaired in 2.3 by using pymalloc in the parser instead of the
platform malloc.

#2 has been repaired in 2.2.2 and 2.3 by changing the parse node growth
strategy to sidestep what's currently quadratic-time behavior in some
platform realloc()s.
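
The actual change is in CPython's C parser, but the idea behind #2 can be
illustrated in plain Python:  if a node's child array grows by exactly one
slot per child, and the platform realloc() copies the block each time, the
total work grows with the square of the final size.  Over-allocating
(doubling is used below purely for illustration; the exact scheme in the
parser differs) keeps the number of reallocations tiny:

    def count_reallocs(n_children, geometric):
        """Count reallocations of a child array growing to n_children slots."""
        capacity = 0
        reallocs = 0
        for needed in range(1, n_children + 1):
            if needed > capacity:
                if geometric:
                    # Over-allocate, so most later appends fit for free.
                    capacity = capacity * 2 + 1
                else:
                    # Exact fit: every append triggers a realloc.
                    capacity = needed
                reallocs += 1
        return reallocs

    # With ~8000 top-level children (one per data line):
    print(count_reallocs(8000, geometric=False))  # 8000 reallocations
    print(count_reallocs(8000, geometric=True))   # 13 reallocations
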

Regardless, it would be saner to break any file with 450KB of source code
into smaller files and paste them together with import.
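
A minimal sketch of what that could look like (the data_part1.py /
data_part2.py names and the chunk variable are invented for illustration;
use however many parts keep each file comfortably small):

    # data_part1.py -- first slice of the rows
    chunk = [
        (323, 870, 46),
        (810, 336, 271),
        # ...
    ]

    # data_part2.py -- second slice of the rows
    chunk = [
        (572, 55, 596),
        (337, 256, 629),
        # ...
    ]

    # data.py -- pastes the pieces back together via import
    from data_part1 import chunk as _part1
    from data_part2 import chunk as _part2

    list1 = _part1 + _part2
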
