[CentralOH] Tab delimited data in Python

Bryan harrisbw at notes.udayton.edu
Fri Nov 21 15:10:53 CET 2008


Hi all,

I have written some code to do data reduction on high-strain-rate
tensile test data.   The program goes through and does simple things
like plotting the raw data and measuring the stroke rate.  (Believe me
it really is simple.) 

I need a simple way to work with tab-delimited data (TDD).  I've written
my own code, but it runs HORRIBLY slowly.  It takes something like 20
seconds to append a column to a 6000-line data file.  But it does
_work_.  Is there a good library for handling TDD?  It has to work on
files whose columns have different lengths.  Several Python commands for
splitting strings (split() with no argument, for example) merge
consecutive delimiters, and this breaks TDD files with unequal column
lengths because the empty cells disappear.
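
To illustrate the delimiter-merging problem, here is a minimal sketch
(written for Python 3; the sample row is made up for illustration).  It
contrasts split() with no argument against split('\t') on a row whose
middle cell is empty:

```python
# A made-up row with an empty middle cell: two tabs in a row.
row = "1.0\t\t3.0\n"

# split() with no argument treats any run of whitespace as one
# delimiter, so the empty cell disappears and columns shift left.
merged = row.split()

# split('\t') keeps empty cells; strip only the line ending first.
kept = row.rstrip('\r\n').split('\t')

print(merged)  # ['1.0', '3.0']
print(kept)    # ['1.0', '', '3.0']
```

Splitting on an explicit '\t' preserves the column positions, which is
what a file with unequal column lengths needs.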

Now I have several, possibly competing, goals:
- It has to work fairly quickly.
- It has to work on very large files (too large for Excel; I know
that's not really that large).
- It must work on both Linux and Windows.  (I thought this was a given
with Python, but I learned there are libraries available only for one or
the other.  For instance, there are Windows-only Excel libraries.)
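
One candidate that fits these goals is the csv module from the standard
library, which runs on both Linux and Windows and reads a file one row
at a time.  The sketch below is only an illustration, written for
Python 3: read_rows, n_columns and the sample data are hypothetical
names and values, not part of the program described here.

```python
import csv
import io

def read_rows(stream, n_columns):
    """Yield each row as a list of strings padded with '' to n_columns."""
    reader = csv.reader(stream, delimiter='\t')
    for row in reader:
        # Short rows (columns that ended early) are padded on the right.
        yield row + [''] * (n_columns - len(row))

# In-memory stand-in for a data file.  With a real file, the csv docs
# recommend opening it with open(path, newline='').
sample = io.StringIO(
    "time\tload\tstroke\n"
    "0.0\t1.5\t0.01\n"
    "0.1\t\t0.02\n"
    "0.2\n"
)
rows = list(read_rows(sample, 3))
```

Because read_rows is a generator, only one row is held in memory at a
time unless the caller collects them, as list() does here.  Note that
csv.reader treats double quotes specially by default, which should not
matter for purely numeric data.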
  
Here's the slow code (it's ugly, I know.  I'm a Mechanical
Engineer...):

  def append_column(self,column_data,heading):
    # relies on "import shutil, tempfile" at the top of the module
    f = open(self.textfile, 'rU')
    temp=tempfile.mktemp()
    g = open(temp, 'w')
    index=0
    header=""
    for label in self.column_labels:
      header += '\t'
      header += label
    #strip the first tab
    header = header[1:]+'\t'+heading.strip()+'\n'
    #f.readline()
    g.write(header)
    try:
      for line in f:
        #index 0 is the old header line, which is replaced above
        if index!=0 :
          try:
            line=line.rstrip('\n')+'\t'+str(column_data[index-1])+'\n'
          except IndexError:
            #column_data is shorter than the file: leave the cell empty
            line=line.rstrip('\n')+'\t\n'
          #if index < 25 : print line,
          g.write(line)
        index += 1
    finally:
      f.close()
      g.close()
    shutil.move(temp,self.textfile)
    self.number_of_columns+=1
    self.traces=self.get_traces()
    self.column_lengths=self.get_column_lengths()

As I said, this takes something like 20 seconds to append a normal
column of data.  I'd rather use a library for handling this sort of
thing than write my own, but I wouldn't mind knowing how you guys would
tighten up this code.  I think the try-finally's are slowing this down,
but I'm not sure.  
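
For comparison, here is one possible way to tighten the same operation
with the standard csv module.  It is a sketch under assumptions, not a
drop-in replacement: it is written for Python 3, it is a standalone
function taking already-open file objects (src and dst are hypothetical
parameters) rather than a method working on self.textfile, and it pads
short rows to the header width, which the code above does not do.

```python
import csv
import io

def append_column(src, dst, column_data, heading):
    """Copy tab-delimited rows from src to dst, adding one column.

    Assumes src has at least a header line.
    """
    reader = csv.reader(src, delimiter='\t')
    writer = csv.writer(dst, delimiter='\t', lineterminator='\n')
    header = next(reader)
    writer.writerow(header + [heading.strip()])
    values = iter(column_data)
    for row in reader:
        # Pad short rows so the new value lands in the new column.
        row = row + [''] * (len(header) - len(row))
        # When column_data runs out, the new cell is left empty.
        writer.writerow(row + [str(next(values, ''))])

# In-memory example; real use would open the data file and a temp file.
src = io.StringIO("time\tload\n0.0\t1.5\n0.1\t1.6\n0.2\t1.7\n")
dst = io.StringIO()
append_column(src, dst, [10, 20], "stroke ")
result = dst.getvalue()
```

The try-finally in the code above is entered only once per call, so it
seems unlikely to account for seconds of runtime on its own.  Running
the method under a profiler such as cProfile would show whether the
time is really spent in the copy loop or in the get_traces() and
get_column_lengths() calls that follow it.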

Thanks,
Bryan


-- 
Bryan Harris
Research Engineer
Structures and Materials Evaluation Group
harrisbw at notes.udayton.edu
http://www.udri.udayton.edu/
(937) 229-5561


