Seeking help: reading text file with genfromtxt

Wed Apr 4 05:14:24 EDT 2012

Hi all

I have got a text file which is only 32 MB in size and consists of the
following type of lines (columns are fixed):

==
Header text 1 line
...
01-Jan-2006           0055           145.069          -16.0449           83.2246           84.2835         499.14680       0.074029965
01-Jan-2006           0065           15.069          -1.0449           83.2246           84.2835         499.14680       12.074029965
...
12-Dec-2006           1255           145.069          23.0449           3.2246           4.2835         49.140       0.74029965
...
==

I have 3 questions:

1. Why is my translation (read_slow) of the IDL code so damn slow
(IDL: 13 sec, Python: 2 min 16 sec), although both IDL and Python
consume about 40 MB?

2. Why is my faster version (read_fast) (13sec) so memory hungry (it
takes 200MB)?
2.1 Why is my second fastest version (read_second_fast) (16sec) still
memory hungry?

3. What do I need to do to get the speed of IDL and the memory
footprint of IDL (in that case 40MB)?

#convdate converts the date in the first column (e.g. 12-Dec-2006)
into day of year
#convtime does something else
==
import fileinput
import numpy as np
import datetime
import time
from StringIO import StringIO

def read_slow(file):

        # index of the last line == number of data lines (one header line)
        with open(file) as infile:
            count=max(enumerate(infile))[0]

        erg=np.zeros((count,10),dtype=np.float64)

        convdate= lambda x: time.strptime(x,"%d-%b-%Y").tm_yday
        convtime= lambda x: int(np.float64(x)*1.0e-1)

        i=0
        with open(file) as infile:
            #read first header line
            infile.readline()
            for line in infile:
                # parse one line at a time
                tmp=np.genfromtxt(StringIO(line),
                                  dtype=np.float64,
                                  converters={0:convdate,1:convtime})
                #not sure if it does the right thing here:
                erg[i,:]=tmp
                i=i+1
        # the with block closes the file, no explicit close() needed
        return erg

==

==
def read_fast(file):

        convdate= lambda x: time.strptime(x,"%d-%b-%Y").tm_yday
        convtime= lambda x: int(np.float64(x)*1.0e-1)

        # the with block closes the file, no explicit close() needed
        with open(file) as infile:
            erg=np.genfromtxt(infile, autostrip=True,skip_header=1,
                              dtype=np.float64,
                              converters={0:convdate,1:convtime})
        return erg
==

==
def read_second_fast(file):

        convdate= lambda x: time.strptime(x,"%d-%b-%Y").tm_yday
        convtime= lambda x: int(np.float64(x)*1.0e-1)

        erg=np.loadtxt(file,skiprows=1,\
                           dtype=np.float64,\
                           converters={0:convdate,1:convtime})
        return erg
==
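
A minimal sketch of one possible direction for question 3, not measured
against the real 32 MB file. As far as I understand, genfromtxt and
loadtxt collect all rows in Python lists before building the array, which
may explain part of the memory use. The sketch instead streams values into
np.fromiter, which grows a float64 array directly. The names iter_values
and read_manual, and the ncols parameter, are hypothetical. ncols=8 is
taken from the sample lines above (read_slow allocates 10 columns, so adjust
as needed). Caching converted strings is an extra assumption, based on the
same date repeating on many lines.

```python
import time
import numpy as np

def iter_values(lines, converters):
    """Yield every field of every data line as a number, one at a time."""
    cache = {}                      # (column, text) -> converted value
    lines = iter(lines)
    next(lines)                     # skip the single header line
    for line in lines:
        # blank lines split into an empty list and yield nothing
        for col, text in enumerate(line.split()):
            conv = converters.get(col)
            if conv is None:
                yield float(text)
            else:
                key = (col, text)
                if key not in cache:
                    cache[key] = conv(text)   # convert each string once
                yield cache[key]

def read_manual(lines, ncols, converters):
    """Build a (nrows, ncols) float64 array from an iterable of lines."""
    flat = np.fromiter(iter_values(lines, converters), dtype=np.float64)
    return flat.reshape(-1, ncols)

# same converters as in the functions above
convdate = lambda x: time.strptime(x, "%d-%b-%Y").tm_yday
convtime = lambda x: int(float(x) * 1.0e-1)

sample = [
    "Header text 1 line",
    "01-Jan-2006 0055 145.069 -16.0449 83.2246 84.2835 499.14680 0.074029965",
    "12-Dec-2006 1255 145.069 23.0449 3.2246 4.2835 49.140 0.74029965",
]
erg = read_manual(sample, 8, {0: convdate, 1: convtime})
print(erg.shape)   # (2, 8)
```

For a real file, an open file object can be passed in place of the list,
e.g. read_manual(infile, 8, {0: convdate, 1: convtime}) inside a with block.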

Thanks for all the help.

By the way: a colleague told me my code is poorly written and more or
less unreadable and unmaintainable because of the use of lambda. I am
just learning, but is his observation true?
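
For comparison, a sketch of the same two converters written as named
functions rather than lambdas. The behaviour is meant to be identical.
Only the docstrings are additions, and the one for convtime just describes
the arithmetic, since the meaning of that column is not stated here.

```python
import time

def convdate(text):
    """Convert a date such as '12-Dec-2006' to its day of the year."""
    return time.strptime(text, "%d-%b-%Y").tm_yday

def convtime(text):
    """Scale the second column by 0.1 and truncate to an integer."""
    return int(float(text) * 1.0e-1)

print(convdate("12-Dec-2006"))   # 346
print(convtime("1255"))          # 125
```

Either form can be passed to genfromtxt or loadtxt in the same way, as
converters={0:convdate,1:convtime}.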