Python object overhead?

Matt Garman matthew.garman at gmail.com
Fri Mar 23 17:11:35 EDT 2007


I'm trying to use Python to work with large pipe-delimited ('|') data
files.  The files range in size from 25 MB to 200 MB.

Since each line corresponds to a record, what I'm trying to do is
create an object for each record.  However, doing this seems to
increase memory usage by a factor of two or three.

See the two examples below: running each on the same input file,
Example 2 uses 3x the memory of Example 1.  (Memory usage is checked
using top.)
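
(For reference, here is one way to read the resident set size from
inside the script instead of eyeballing top; a minimal Linux-only
sketch that parses /proc/self/status, where the VmRSS field is
reported in kB:)

# begin rss.py -- Linux-only helper, an illustrative sketch
def rss_kb():
    # /proc/self/status contains a line like: 'VmRSS:     1234 kB'
    for line in open('/proc/self/status'):
        if line.startswith('VmRSS:'):
            return int(line.split()[1])

print "RSS: %d kB" % rss_kb()
# end rss.py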

This happens for both Python 2.4.3 on Gentoo Linux (64bit) and Python
2.3.4 on CentOS 4.4 (64bit).

Is this "just the way it is" or am I overlooking something obvious?

Thanks,
Matt


Example 1: read lines into a list:
# begin readlines.py
import sys, time

filedata = []                     # holds every line of the input file
f = open(sys.argv[1])             # 'f' avoids shadowing the builtin 'file'
while True:
    line = f.readline()
    if len(line) == 0: break      # readline() returns '' only at EOF
    filedata.append(line)
f.close()
print "data read; sleeping 20 seconds..."
time.sleep(20) # gives time to check top
# end readlines.py
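
(Incidentally, the same loop can be written with direct file
iteration, which these Python versions support; a sketch that should
be equivalent memory-wise, since the list still retains every line:)

# begin readlines_iter.py -- equivalent variant, illustrative only
import sys, time
filedata = []
f = open(sys.argv[1])
for line in f:                    # iterates line by line until EOF
    filedata.append(line)
f.close()
print "data read; sleeping 20 seconds..."
time.sleep(20) # gives time to check top
# end readlines_iter.py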


Example 2: read lines into objects:
# begin readobjects.py
import sys, time

class FileRecord:
    """Wrap a single line of the input file."""
    def __init__(self, line):
        self.line = line

records = []
f = open(sys.argv[1])
while True:
    line = f.readline()
    if len(line) == 0: break      # readline() returns '' only at EOF
    records.append(FileRecord(line))
f.close()
print "data read; sleeping 20 seconds..."
time.sleep(20) # gives time to check top
# end readobjects.py
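
One thing I have seen suggested but not yet measured myself: making
FileRecord a new-style class with __slots__, which avoids allocating a
per-instance __dict__ (usually the bulk of the per-object overhead).
A minimal sketch, untested on my data:

# begin readobjects_slots.py -- illustrative sketch, not measured
import sys, time

class FileRecord(object):         # must subclass object for __slots__ to work
    __slots__ = ('line',)         # no per-instance __dict__ is allocated
    def __init__(self, line):
        self.line = line

records = []
f = open(sys.argv[1])
for line in f:
    records.append(FileRecord(line))
f.close()
print "data read; sleeping 20 seconds..."
time.sleep(20) # gives time to check top
# end readobjects_slots.py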


