Suitability for long-running text processing?

Chris Mellon arkanes at gmail.com
Mon Jan 8 11:49:26 EST 2007


On 1/8/07, Felipe Almeida Lessa <felipe.lessa at gmail.com> wrote:
> On 1/8/07, tsuraan <tsuraan at gmail.com> wrote:
> >
> >
> > > I just tried on my system
> > >
> > > (Python is using 2.9 MiB)
> > > >>> a = ['a' * (1 << 20) for i in xrange(300)]
> > > (Python is using 304.1 MiB)
> > > >>> del a
> > > (Python is using 2.9 MiB -- as before)
> > >
> > > And I didn't even need to tell the garbage collector to do its job.
> > > Some info:
> >
> > It looks like the big difference between our two programs is that you have
> > one huge string repeated 300 times, whereas I have thousands of
> > four-character strings.  Are small strings ever collected by python?
>
> In my test there are 300 strings of 1 MiB, not a huge string repeated. However:
>
> $ python
> Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02)
> [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> # Python is using 2.7 MiB
> ... a = ['1234' for i in xrange(10 << 20)]
> >>> # Python is using 42.9 MiB
> ... del a
> >>> # Python is using 2.9 MiB
>
> With 10,485,760 strings of 4 chars, it still works as expected.
>
> --
> Felipe.
> --

Have you actually run the OP's code? It has clearly different behavior
from what you are posting, and the OP's code, to me at least, seems
much more representative of real-world code. In your second case, you
have the *same* string 10,485,760 times; in the OP's case each string
is different.
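The distinction matters for the memory numbers: a list comprehension that repeats a literal holds N references to a single string object, while building each string at runtime allocates N separate objects. A minimal sketch (Python 3 syntax; the object-sharing behavior shown is a CPython implementation detail):

```python
# A repeated literal: the list holds 1000 references to ONE string object.
same = ['1234' for i in range(1000)]
print(len({id(s) for s in same}))      # 1 distinct object

# Distinct strings built at runtime: one object per element.
distinct = [str(i) for i in range(1000)]
print(len({id(s) for s in distinct}))  # 1000 distinct objects
```

So Felipe's test mostly measures the cost of the list's pointers, while the OP's workload pays for a full string object per element, which is why their memory profiles diverge.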

My first thought was that interned strings were causing the growth,
but that doesn't seem to be the case. Regardless, what you're posting
is clearly different, and has different behavior, from what he posted.
If you don't see the memory leak when you run the code he posted (the
*same* code), that'd be important information.
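One way to rule interning in or out is to check whether equal strings are actually the same object. A quick sketch (Python 3; `sys.intern` is the modern spelling of Python 2's builtin `intern`, and the sharing of identifier-like literals is CPython-specific):

```python
import sys

s = '1234'                  # compile-time literal
t = ''.join(['12', '34'])   # equal value, but built at runtime
print(s == t, s is t)       # equal, yet (in CPython) not the same object

# intern() maps both values to one canonical copy:
print(sys.intern(s) is sys.intern(t))  # True
```

If the OP's thousands of four-character strings were being interned, they would all collapse to shared objects like `s` above; since the growth tracks the number of strings, interning doesn't look like the culprit.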
