[portland] Data Structure Returned by readline()

Rich Shepard rshepard at appl-ecosys.com
Tue Jul 17 22:06:25 CEST 2012


   Disclaimer: it's been a couple of years since I last did any Python coding
so I have forgotten much of what I then knew. Now I need to finish an
application and, before getting back to that, write a short script to
restructure a data file. I've been trying to work this all out myself but
have not found an answer to the current blockage.

   Context: the data file was saved as a .csv from a LibreOffice spreadsheet.
The top row (column headers) is a list of chemical symbols. Subsequent rows
have a site ID, sampling date, and measured concentration for each chemical.
I need to transform this file to one suitable for insertion in a database
table. That is, each output row consists of the site ID, sampling date, the
symbol of a single chemical, and the measured quantity associated with it.
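   A minimal sketch of that restructuring, assuming the layout described
above: the header row holds only chemical symbols, and each data row starts
with the site ID and sampling date. The sample text, the in-memory file, and
the names symbols and long_rows are hypothetical stand-ins, and the sketch
uses the standard csv module with print() calls rather than the print
statement.

```python
import csv
import io

# Hypothetical, shortened sample in the layout described above: the header
# row lists only chemical symbols; data rows begin with site ID and date.
sample = (
    u"Ag,Al,CO3\n"
    u"D-1,2007-12-12,-0.005,0.106,-1.000\n"
    u"D-1,2008-03-15,-0.005,-0.080,-1.000\n"
)

reader = csv.reader(io.StringIO(sample))
symbols = next(reader)  # first row: list of chemical symbols

long_rows = []
for row in reader:
    site_id, sample_date = row[0], row[1]
    # Pair each symbol with the value in the matching column.
    for symbol, value in zip(symbols, row[2:]):
        long_rows.append((site_id, sample_date, symbol, value))

for r in long_rows:
    print(",".join(r))
```

   Every value stays a string here, so an empty field in the source comes
through as an empty string and would still need to be handled before loading
the rows into a database table.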

   While I do have a basic working knowledge of list comprehensions, I'm
stumbling on reading the data into the script in a data type that would
allow me to iterate over each item in the row of chemical names and each
data row.

   The immediate question is what data type is returned by readline()? The
opening sections of code (for figuring out what I need to do) are:

import sys

infile = open(sys.argv[1], 'r')
output = open("out.txt","w")

# read column headers into a list
col_headers = infile.readline()
print col_headers
# read the first three lines of data
print infile.readline()
print infile.readline()
print infile.readline()

   The output of the above is:

Ag,Al,CO3,HCO3,AlkTot,As,Ba,Be,Bi,Ca,Cd,Cl,Co,Cr,Cu,DO,Fe,Hg,K,Mg,Mn,Mo,Na,NH4,NO3NO2,oil_grease,Pb,pH,Sb,SC,Se,SO4,Sr,TDS,Tl,V,Zn

D-1,2007-12-12,-0.005,0.106,-1.000,231.000,231.000,0.011,0.000,-0.002,0.000,100.000,0.000,1.430,0.000,-0.006,0.024,4.960,4.110,,0.000,9.560,0.035,0.000,0.970,-0.010,0.293,,0.025,7.800,-0.001,630.000,0.001,65.800,0.000,320.000,-0.001,0.000,11.400

D-1,2008-03-15,-0.005,-0.080,-1.000,228.000,228.000,0.001,0.000,-0.002,0.000,88.400,0.000,1.340,0.000,-0.006,0.014,9.910,0.309,0.000,0.000,9.150,0.047,0.000,0.820,0.224,-0.020,,0.025,7.940,-0.001,633.000,0.001,75.400,0.000,300.000,-0.001,0.000,12.400

D-1,2008-06-26,-0.005,0.116,6.700,118.000,124.000,0.010,0.000,-0.002,0.000,63.400,0.000,1.750,0.000,-0.006,0.020,4.320,2.830,0.000,0.000,9.550,0.020,0.000,0.653,-0.010,-0.050,,0.025,8.650,0.001,386.000,-0.001,68.500,0.000,480.000,-0.001,0.000,5.500

   I've tried to index col_headers as I would a list, but 'print
col_headers[0]' yields A rather than Ag.
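   The behavior is consistent with readline() returning a single string (one
line of text, trailing newline included), so an index picks out one
character. A small sketch, using a shortened in-memory stand-in for the real
file (the sample text is hypothetical), shows the difference between indexing
the string and indexing the list produced by split():

```python
import io

# In-memory stand-in for the opened .csv file (shortened, hypothetical sample).
infile = io.StringIO(u"Ag,Al,CO3\nD-1,2007-12-12,-0.005,0.106,-1.000\n")

col_headers = infile.readline()   # one line of text, newline included
first_char = col_headers[0]       # indexing a string yields one character

# Strip the newline, then split on commas to get a real list of symbols.
symbols = col_headers.rstrip("\n").split(",")

print(first_char)   # A
print(symbols[0])   # Ag
```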

   I see no indication in the output that these lines are lists, tuples, or
another defined data type. Please provide a clue stick on how I should
access the source file so I can then use for loops or list comprehension to
restructure the file.

TIA,

Rich



