Suitability for long-running text processing?

tsuraan tsuraan at gmail.com
Mon Jan 8 11:55:24 EST 2007


> $ python
> Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02)
> [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> # Python is using 2.7 MiB
> ... a = ['1234' for i in xrange(10 << 20)]
> >>> # Python is using 42.9 MiB
> ... del a
> >>> # Python is using 2.9 MiB
>
> With 10,485,760 strings of 4 chars, it still works as expected.


Have you tried running the code I posted?  Is there any explanation as to
why the memory used by the code I posted is never freed?

In your specific example, you have a huge list of references to a single
string.  Try doing "a[0] is a[10000]".  You'll get True.  Try "a[0] is
'1'+'2'+'3'+'4'".  You'll get False.  Every element of a is a reference to
the exact same string object.  When you delete a, you're getting rid of a
huge list of pointers, but probably not actually freeing the four-byte (plus
object overhead) string '1234' itself.
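
To make the difference concrete, here is a minimal sketch (not the code from
my earlier post).  It is written so that it also runs on Python 3, hence
range instead of xrange, and the names n, shared, and distinct are made up
for illustration.  The first list repeats one literal, so every slot refers
to the same object; the second builds each string at run time, so every
slot holds its own object.

```python
n = 1000  # small count, for illustration only

# A literal inside a comprehension is a single constant, so every
# slot in the list refers to the same string object.
shared = ['1234' for i in range(n)]
all_same = all(s is shared[0] for s in shared)

# Building each string at run time creates a separate object per slot.
distinct = [str(1234 + i) for i in range(n)]
distinct_ids = len(set(id(s) for s in distinct))

print(all_same)
print(distinct_ids)
```

Only a list like the second one actually allocates one string per element,
so that is the kind of test that says anything about whether string memory
is given back after "del".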

So, does anybody know how to get Python to free up _all_ of its allocated
strings?

