"Help needed - I don't understand how Python manages memory"

Lie Lie.1296 at gmail.com
Tue Apr 22 07:06:47 EDT 2008


On Apr 21, 1:14 am, "Hank @ITGroup" <hank.info... at gmail.com> wrote:
> Christian Heimes wrote:
> > Gabriel Genellina schrieb:
>
> >> Apart from what everyone has already said, consider that FreqDist may import other modules, store global state, create other objects... whatever.
> >> Pure python code should not have any memory leaks (if there are, it's a bug in the Python interpreter). Not-carefully-written C extensions may introduce memory problems.
>
> > Pure Python code can cause memory leaks. No, that's not a bug in the
> > interpreter but the fault of the developer. For example, code that messes
> > around with stack frames and exception objects can cause nasty reference
> > leaks.
>
> > Christian
>
> In order to deal with 400 thousand texts consisting of 80 million
> words, and huge sets of corpora, I have to be careful about memory.
> I need to track every word's behavior, so there need to be as
> many word objects as words.
> I am really suffering from the memory problem; even 4 GB of memory
> cannot survive... Only 10,000 texts can kill it in 2 minutes.
> By the way, my program has been optimized to ``del`` the objects after
> traversing, in order not to store the information in memory all the time.

Could you explain a little further what you're doing with the 80
million words? Perhaps we could help you improve the design, since, as
Christian Heimes has said, holding 80 million words in memory at once
strains present-day computers: at 6 ASCII characters per word, that is
roughly 500 MB of string data alone, and if you're using Unicode this
number may double or quadruple. A better solution may be to load only
the parts of the text you currently need and process them with
generators, or to index the words as integers. That may be slower (or
even faster, since the OS wouldn't need to allocate as much memory),
but that's a tradeoff you should decide on.
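A minimal sketch of both ideas (the function and variable names here
are hypothetical, not from the OP's program): a generator yields words
lazily so the full text is never materialised at once, and a dict maps
each distinct word to a small integer id so repeated words don't each
keep their own string object alive:

```python
from collections import Counter

def iter_words(lines):
    # Lazily yield words from any iterable of lines (e.g. an open
    # file object); the whole text is never held in memory at once.
    for line in lines:
        for word in line.split():
            yield word

def index_words(words, index=None):
    # Replace each word with a small integer id; equal words share
    # one dict entry, so only one str object per *distinct* word
    # stays alive, however often it occurs.
    if index is None:
        index = {}
    return [index.setdefault(w, len(index)) for w in words], index

# Tiny demonstration on an in-memory "file":
sample = ["the cat sat", "the cat ran"]
freq = Counter(iter_words(sample))          # frequencies, streamed
ids, index = index_words(iter_words(sample))
```

Counting with ``Counter`` over the generator keeps only one entry per
distinct word, which is usually far smaller than one object per token.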
