Python "compiler" is too slow for processing large data files???

Tim Peters tim.one at comcast.net
Fri Sep 6 10:04:19 EDT 2002


[someone]
> list1 = [
>     (323, 870, 46, ),
>     (810, 336, 271, ),
>     (572, 55, 596, ),
>     (337, 256, 629, ),
>     (31, 702, 16, ),
> ]
>
> Anyway, as my data files grew from just a few lines up to about
> 8000 lines (with 10 values in each line, for a total of about 450KB of
> text), the time to 'exec' the file became too slow (e.g. 15 seconds) and
> used too much memory (e.g. 50MB) (using ms-windows, python 2.2.1).  It is
> the "compile" phase, because if I re-run and there is a *.pyc file
> available, the import goes very fast (no compilation required).
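
[For anyone who wants to reproduce the effect: here is a rough sketch, not
from the original post, that builds a module of roughly the shape described
(8000 lines of 10 small integers each) and times only the compile step.  The
random values and the "bigdata.py" filename are made up.]

    import random
    import time

    # Build a module that is nothing but one large literal.
    lines = ["list1 = ["]
    for _ in range(8000):
        values = ", ".join([str(random.randrange(1000)) for _ in range(10)])
        lines.append("    (%s)," % values)
    lines.append("]")
    source = "\n".join(lines)

    # Time just the compile step -- the phase reported as slow above.
    start = time.time()
    code = compile(source, "bigdata.py", "exec")
    print("compile took %.2f seconds" % (time.time() - start))
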

The parse tree created is huge, so the memory use isn't likely to go down in
the near future (addressing this requires a fundamentally different parser,
and nobody is working on that).  Depending on platform, though, there are
two things that can hurt speed a lot:

1. Oodles of tiny memory allocations and deallocations for the leaf
   nodes of the parse tree.

2. Repeated realloc of a high-level parse node to reflect the ever-growing
   number of top-level children.

#1 has been repaired in 2.3 by using pymalloc in the parser instead of the
platform malloc.

#2 has been repaired in 2.2.2 and 2.3 by changing the parse node growth
strategy to sidestep what's currently quadratic-time behavior in some
platform realloc()s.
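
The actual change is in CPython's C parser, but the idea behind #2 can be
illustrated in plain Python:  if a node's child array grows by exactly one
slot per child, and the platform realloc() copies the block each time, the
total work grows with the square of the final size.  Over-allocating
(doubling is used below purely for illustration; the exact scheme in the
parser differs) keeps the number of reallocations tiny:

    def count_reallocs(n_children, geometric):
        """Count reallocations of a child array growing to n_children slots."""
        capacity = 0
        reallocs = 0
        for needed in range(1, n_children + 1):
            if needed > capacity:
                if geometric:
                    # Over-allocate, so most later appends fit for free.
                    capacity = capacity * 2 + 1
                else:
                    # Exact fit: every append triggers a realloc.
                    capacity = needed
                reallocs += 1
        return reallocs

    # With ~8000 top-level children (one per data line):
    print(count_reallocs(8000, geometric=False))  # 8000 reallocations
    print(count_reallocs(8000, geometric=True))   # 13 reallocations
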

Regardless, it would be saner to break any file with 450KB of source code
into smaller files and paste them together with import.
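
A minimal sketch of what that could look like (the data_part1.py /
data_part2.py names and the chunk variable are invented for illustration;
use however many parts keep each file comfortably small):

    # data_part1.py -- first slice of the rows
    chunk = [
        (323, 870, 46),
        (810, 336, 271),
        # ...
    ]

    # data_part2.py -- second slice of the rows
    chunk = [
        (572, 55, 596),
        (337, 256, 629),
        # ...
    ]

    # data.py -- pastes the pieces back together via import
    from data_part1 import chunk as _part1
    from data_part2 import chunk as _part2

    list1 = _part1 + _part2
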
