python and very large data sets???
Neal Norwitz
neal at metaslash.com
Wed Apr 24 13:38:29 EDT 2002
Learning C is probably not a good way to go given your situation.
Whether Python will work is questionable; it depends a lot on what you
really need to do.

To give you an idea of Python's speed: I was able to write out 600MB
of data in 450 seconds, and I could read the data, modify it, and write
it back out in 700 seconds.
My box is a 650 MHz Athlon with 256 MB RAM:
time to write input file: 450.9 seconds
time to modify input file: 702.2 seconds
first line of output file: A B C
file sizes (input/output): 600000000/600000000
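The reported sizes are easy to sanity-check: each line the program writes is
'a b c\n', six bytes, and uppercasing a line does not change its length, so
input and output come out the same size. In Python 3 syntax:

```python
line = 'a b c\n'            # one record, as written by the benchmark
count = 100 * 1000 * 1000   # the line count used in the test (BIG_MOMMA)
total = len(line) * count
print(total)                # 600000000, matching the reported file sizes
```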
The program I used is below.
Neal
--
import os
import time

BIG_MOMMA = 100*1000*1000
IFILENAME = '/home/neal/build/bigfile'
OFILENAME = IFILENAME + '.out'

def create_file(filename, count):
    # 'w' truncates any leftover file from an earlier run, so the
    # timings and sizes are reproducible.
    f = open(filename, 'w')
    for i in xrange(count):
        f.write('a b c\n')
    f.close()

def modify_file(ifilename, ofilename):
    ifile = open(ifilename)
    ofile = open(ofilename, 'w')
    # Iterate line by line so memory use stays constant
    # no matter how big the file is.
    for x in ifile:
        ofile.write(x.upper())
    ifile.close()
    ofile.close()

def main():
    start = time.time()
    create_file(IFILENAME, BIG_MOMMA)
    print 'time to write input file: %.1f seconds' % (time.time() - start)
    start = time.time()
    modify_file(IFILENAME, OFILENAME)
    print 'time to modify input file: %.1f seconds' % (time.time() - start)
    f = open(OFILENAME)
    print 'first line of output file:', f.readline(),
    f.close()
    print 'file sizes (input/output): %d/%d' % \
          (os.stat(IFILENAME)[6], os.stat(OFILENAME)[6])
    os.unlink(IFILENAME)
    os.unlink(OFILENAME)

main()
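For readers on current Python versions, here is a sketch of the same
benchmark in Python 3. The line count is scaled down so it finishes
quickly, and the hard-coded home-directory path is replaced with a
temporary directory; both changes are mine, not part of the original test.

```python
import os
import tempfile
import time

COUNT = 100 * 1000  # scaled down from the original 100 million lines

def create_file(filename, count):
    # Write 'a b c\n' count times; 'w' truncates any earlier run's file.
    with open(filename, 'w') as f:
        for _ in range(count):
            f.write('a b c\n')

def modify_file(ifilename, ofilename):
    # Stream line by line so memory use stays constant regardless of size.
    with open(ifilename) as ifile, open(ofilename, 'w') as ofile:
        for line in ifile:
            ofile.write(line.upper())

def main():
    with tempfile.TemporaryDirectory() as d:
        iname = os.path.join(d, 'bigfile')
        oname = iname + '.out'
        start = time.time()
        create_file(iname, COUNT)
        print('time to write input file: %.1f seconds' % (time.time() - start))
        start = time.time()
        modify_file(iname, oname)
        print('time to modify input file: %.1f seconds' % (time.time() - start))
        with open(oname) as f:
            print('first line of output file:', f.readline(), end='')
        print('file sizes (input/output): %d/%d'
              % (os.stat(iname).st_size, os.stat(oname).st_size))

main()
```

The structure is the same as the original; the `with` blocks just make
sure the files are closed and the scratch directory is removed even if
something fails partway through.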