python and very large data sets???

Neal Norwitz neal at metaslash.com
Wed Apr 24 13:38:29 EDT 2002


Learning C is probably not a good way to go given your situation.
Whether Python will work for you is questionable; it depends a lot on what you really need to do.

To give you an idea of Python's speed, I was able to write out 600MB
of data in 450 seconds.  I could read the data, modify it, and write
it back out in 700 seconds.

My box is a 650 MHz Athlon with 256 MB RAM:

time to write  input file: 450.9 seconds
time to modify input file: 702.2 seconds
first line of output file: A B C
file sizes (input/output): 600000000/600000000
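
(That works out to roughly 1.3 MB/s for the write pass, and about 0.85 MB/s each way for the read-modify-write pass.)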

The program I used is below.

Neal
--
import os
import time

BIG_MOMMA = 100*1000*1000   # number of lines to write (6 bytes each -> 600 MB)
IFILENAME = '/home/neal/build/bigfile'
OFILENAME = IFILENAME + '.out'

def create_file(filename, count):
    # Build the test input: `count` copies of a short line.
    f = open(filename, 'w')
    for i in xrange(count):
        f.write('a b c\n')
    f.close()

def modify_file(ifilename, ofilename):
    # Read the input line by line, uppercase each line, write it back out.
    ifile = open(ifilename)
    ofile = open(ofilename, 'w')
    for line in ifile:
        ofile.write(line.upper())
    ifile.close()
    ofile.close()

def main():
    start = time.time()
    create_file(IFILENAME, BIG_MOMMA)
    print 'time to write  input file: %.1f seconds' % (time.time() - start)

    start = time.time()
    modify_file(IFILENAME, OFILENAME)
    print 'time to modify input file: %.1f seconds' % (time.time() - start)

    f = open(OFILENAME)
    print 'first line of output file:', f.readline(),
    f.close()

    # os.stat() index 6 is st_size, the file size in bytes.
    print 'file sizes (input/output): %d/%d' % \
                    (os.stat(IFILENAME)[6], os.stat(OFILENAME)[6])

    os.unlink(IFILENAME)
    os.unlink(OFILENAME)

if __name__ == '__main__':
    main()
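
If the transformation doesn't actually care about line boundaries (uppercasing doesn't), you could also read the file in large fixed-size blocks instead of line by line, which should trim some of the per-line overhead.  A rough, untimed sketch of that variant (the function name and the 1 MB block size are arbitrary choices):

def modify_file_chunked(ifilename, ofilename, blocksize=1024*1024):
    # Same uppercase transformation, but done a block at a time.
    ifile = open(ifilename, 'rb')
    ofile = open(ofilename, 'wb')
    while 1:
        data = ifile.read(blocksize)
        if not data:            # empty string means end of file
            break
        ofile.write(data.upper())
    ifile.close()
    ofile.close()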


