Possible to set cpython heap size?
Jussi Salmela
tiedon_jano at hotmail.com
Thu Feb 22 13:53:42 EST 2007
Andy Watson wrote:
> I have an application that scans and processes a bunch of text files.
> The content I'm pulling out and holding in memory is at least 200MB.
>
> I'd love to be able to tell the CPython virtual machine that I need a
> heap of, say 300MB up front rather than have it grow as needed. I've
> had a scan through the archives of comp.lang.python and the python
> docs but cannot find a way to do this. Is it possible to configure
> the PVM this way?
>
> Much appreciated,
> Andy
> --
>
Others have already suggested swap as a possible cause of slowness. I've
been playing with my laptop (dual-core Intel T2300 @ 1.66 GHz; 1 GB of
RAM; Win XP; PyScripter IDE) using the following code:
#=======================
import datetime
'''
# Create 10 files with sizes 1MB, ..., 10MB
for i in range(1, 11):
    print 'Writing: ' + 'Bytes_' + str(i*1000000)
    f = open('Bytes_' + str(i*1000000), 'w')
    f.write(str(i-1)*i*1000000)
    f.close()
'''
# Read the files 5 times, concatenating the contents
# into one HUGE string
now_1 = datetime.datetime.now()
s = ''
for count in range(5):
    for i in range(1, 11):
        print 'Reading: ' + 'Bytes_' + str(i*1000000)
        f = open('Bytes_' + str(i*1000000), 'r')
        s = s + f.read()
        f.close()
        print 'Size of s is', len(s)
print 's[274999999] = ' + s[274999999]
now_2 = datetime.datetime.now()
print now_1
print now_2
raw_input('???')
#=======================
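One remark on the script above: growing a string with repeated `s = s + f.read()` can in the worst case be quadratic, because each concatenation may copy everything read so far. Collecting the chunks in a list and joining once at the end avoids that. A minimal sketch of the idea (written in modern Python 3 syntax, with the chunk sizes scaled down and the data synthesized in memory instead of read from the hypothetical `Bytes_*` files):

```python
# Build one big string from many chunks without repeated copying:
# append the pieces to a list, then join them in a single pass.
chunks = []
for i in range(1, 11):
    # The original script read each chunk from a file; here we just
    # recreate the same pattern of data, scaled down for the example.
    chunks.append(str(i - 1) * (i * 10))

s = ''.join(chunks)  # one allocation for the final string
print('Size of s is', len(s))  # 10 + 20 + ... + 100 = 550
```

With 275 MB of data the difference between the two approaches can be substantial, although recent CPython versions do optimize the in-place concatenation case.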
The commented-out part at the start is what I used to create the 10
files. The second part prints the following output (abbreviated):
Reading: Bytes_1000000
Size of s is 1000000
Reading: Bytes_2000000
Size of s is 3000000
Reading: Bytes_3000000
Size of s is 6000000
Reading: Bytes_4000000
Size of s is 10000000
Reading: Bytes_5000000
Size of s is 15000000
Reading: Bytes_6000000
Size of s is 21000000
Reading: Bytes_7000000
Size of s is 28000000
Reading: Bytes_8000000
Size of s is 36000000
Reading: Bytes_9000000
Size of s is 45000000
Reading: Bytes_10000000
Size of s is 55000000
<snip>
Reading: Bytes_9000000
Size of s is 265000000
Reading: Bytes_10000000
Size of s is 275000000
s[274999999] = 9
2007-02-22 20:23:09.984000
2007-02-22 20:23:21.515000
As can be seen, building a 275 MB string by reading the pieces from the
files took less than 12 seconds. I think this is fast enough, but others
might disagree! ;)
Using the Windows Task Manager I can see the process grow to a little
less than 282 MB by the time it reaches the raw_input call, and then
drop to less than 13 MB shortly after I give some input, apparently as a
result of PyScripter doing a GC.
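For what it's worth, you don't have to wait for the IDE to clean up: in CPython a large string is freed as soon as its reference count drops to zero, so deleting the last reference releases the memory immediately. A small sketch (modern Python 3 syntax; `gc.collect()` is only strictly needed for reference cycles, shown here for completeness):

```python
import gc

s = 'x' * 1000000   # allocate a large string (~1 MB)
del s               # refcount hits zero; CPython frees it right away
unreachable = gc.collect()  # collect any cyclic garbage as well
print('unreachable objects collected:', unreachable)
```

This is why the process size falls back once nothing references the big string any more.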
Your situation (hardware, file sizes etc.) may differ so that my
experiment does not correspond to it, but this was my 2 cents' worth!
HTH,
Jussi