"Help needed - I don't understand how Python manages memory"

Andrew MacIntyre andymac at bullseye.apana.org.au
Mon Apr 21 05:08:07 EDT 2008


Hank @ITGroup wrote:

> In order to deal with 400 thousand texts consisting of 80 million
> words, and huge sets of corpora, I have to be careful about memory.
> I need to track every word's behavior, so there need to be as
> many word-objects as words.
> I am really suffering from the memory problem; even 4G of memory
> cannot survive... Only 10,000 texts can kill it in 2 minutes.
> By the way, my program has been optimized to ``del`` the objects after
> traversing, in order not to keep the information in memory all the time.

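In addition to all the other advice you've been given, I've found it can
pay dividends in memory consumption when every occurrence of a given
value (such as a string) refers to a single shared object rather than to
separate equal copies.  This is often referred to as "interning".
CPython performs interning automatically only in a few cases, such as
small integers and short identifier-like string literals, and this is an
implementation detail rather than a language guarantee.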

For example:

 >>> z1 = 10
 >>> z2 = 10
 >>> z1 is z2
True
 >>> z1 = 1000
 >>> z2 = 1000
 >>> z1 is z2
False
 >>> z1 = 'test'
 >>> z2 = 'test'
 >>> z1 is z2
True
 >>> z1 = 'this is a test string pattern'
 >>> z2 = 'this is a test string pattern'
 >>> z1 is z2
False

Careful use of interning can get a double boost: cutting memory
consumption and allowing comparisons to short circuit on identity.  It
does cost something to maintain the dictionary that interns the objects,
though, and tracking reference counts can become much harder.
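
As a rough sketch of the dictionary-based approach (the InternPool class
and its method names are hypothetical, not an existing library API), a
plain dict can map each value to the first object seen with that value:

```python
class InternPool:
    """Hypothetical helper: keeps one canonical object per distinct value."""

    def __init__(self):
        self._pool = {}

    def intern(self, value):
        # setdefault() stores the value if it is new, and in either case
        # returns the object already held in the dictionary.
        return self._pool.setdefault(value, value)

    def __len__(self):
        return len(self._pool)


pool = InternPool()

# str.split() creates a new string object for every token, so repeated
# words start out as separate objects.
text = "the cat saw the other cat"
tokens = [pool.intern(word) for word in text.split()]

print(tokens[0] is tokens[3])   # both occurrences of "the"
print(len(pool))                # number of distinct words held
```

Note that the pool itself holds a reference to every interned object, so
nothing in it is freed until the entry is removed or the pool is
discarded.  This works for any hashable value, not only strings.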

-- 
-------------------------------------------------------------------------
Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andymac at bullseye.apana.org.au  (pref) | Snail: PO Box 370
        andymac at pcug.org.au             (alt) |        Belconnen ACT 2616
Web:    http://www.andymac.org/               |        Australia


