Possible to set CPython heap size?

Jussi Salmela tiedon_jano at hotmail.com
Thu Feb 22 13:53:42 EST 2007


Andy Watson wrote:
> I have an application that scans and processes a bunch of text files.
> The content I'm pulling out and holding in memory is at least 200 MB.
> 
> I'd love to be able to tell the CPython virtual machine that I need a
> heap of, say, 300 MB up front rather than have it grow as needed. I've
> had a scan through the archives of comp.lang.python and the Python
> docs but cannot find a way to do this. Is it possible to configure
> the PVM this way?
> 
> Much appreciated,
> Andy

Others have already suggested swapping as a possible cause of the 
slowness. I've been experimenting on my laptop (dual-core Intel T2300 @ 
1.66 GHz; 1 GB of RAM; Win XP; PyScripter IDE) using the following code:

#=======================
import datetime

'''
# Create 10 files with sizes 1MB, ..., 10MB
for i in range(1,11):
    print 'Writing: ' + 'Bytes_' + str(i*1000000)
    f = open('Bytes_' + str(i*1000000), 'w')
    f.write(str(i-1)*i*1000000)
    f.close()
'''

# Read the files 5 times, concatenating the contents
# into one HUGE string
now_1 = datetime.datetime.now()
s = ''
for count in range(5):
    for i in range(1,11):
        print 'Reading: ' + 'Bytes_' + str(i*1000000)
        f = open('Bytes_' + str(i*1000000), 'r')
        s = s + f.read()
        f.close()
        print 'Size of s is', len(s)
print 's[274999999] = ' + s[274999999]
now_2 = datetime.datetime.now()
print now_1
print now_2
raw_input('???')
#=======================

The commented-out part at the start is what I used to create the 10 
files. The second part prints the following output (abbreviated):

Reading: Bytes_1000000
Size of s is 1000000
Reading: Bytes_2000000
Size of s is 3000000
Reading: Bytes_3000000
Size of s is 6000000
Reading: Bytes_4000000
Size of s is 10000000
Reading: Bytes_5000000
Size of s is 15000000
Reading: Bytes_6000000
Size of s is 21000000
Reading: Bytes_7000000
Size of s is 28000000
Reading: Bytes_8000000
Size of s is 36000000
Reading: Bytes_9000000
Size of s is 45000000
Reading: Bytes_10000000
Size of s is 55000000
    <snip>
Reading: Bytes_9000000
Size of s is 265000000
Reading: Bytes_10000000
Size of s is 275000000
s[274999999] = 9
2007-02-22 20:23:09.984000
2007-02-22 20:23:21.515000

As can be seen, building a 275 MB string by reading the parts from the 
files took less than 12 seconds. I think this is fast enough, but others 
might disagree! ;)
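
If the repeated concatenation ever became the bottleneck, a common 
alternative is to collect the pieces in a list and join them once at the 
end, so that only one big string is ever built. A minimal sketch, 
untested and assuming the same Bytes_<n> files as above:

#=======================
import datetime

now_1 = datetime.datetime.now()
chunks = []
for count in range(5):
    for i in range(1,11):
        f = open('Bytes_' + str(i*1000000), 'r')
        chunks.append(f.read())   # keep each piece; no intermediate strings
        f.close()
s = ''.join(chunks)               # build the 275 MB string in one pass
now_2 = datetime.datetime.now()
print 'Size of s is', len(s)
print now_2 - now_1
#=======================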

Using the Win Task Manager I can see the process grow to a little less 
than 282 MB when it reaches the raw_input call, and drop to less than 
13 MB shortly after I've given some input, apparently as a result of 
PyScripter triggering a garbage collection.
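
For what it's worth, the same release can be requested explicitly 
instead of waiting for the IDE. A minimal sketch, assuming s is the big 
string from the script above and is no longer needed:

#=======================
import gc

# CPython frees the string as soon as its last reference disappears;
# gc.collect() only matters for reference cycles, and whether the freed
# memory is returned to the OS depends on the platform allocator.
del s
gc.collect()
#=======================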

Your situation (hardware, file sizes, etc.) may differ enough that my 
experiment does not correspond to it, but this was my 2 cents' worth!

HTH,
Jussi


