[portland] Data Structure Returned by readline()

Ethan Furman ethan at stoneleaf.us
Tue Jul 17 22:29:15 CEST 2012


Rich Shepard wrote:
>   Disclaimer: it's been a couple of years since I last did any Python coding,
> so I have forgotten much of what I then knew. Now I need to finish an
> application and, before getting back to that, write a short script to
> restructure a data file. I've been trying to work this all out myself but
> have not found an answer to the current blockage.
> 
>   Context: the data file was saved as a .csv from a LO spreadsheet. The top
> row (column headers) is a list of chemical symbols. Subsequent rows have a
> site ID, sampling date, and measured concentration for each chemical. I need
> to transform this file to one suitable for insertion in a database table.
> That is, each row consists of the site ID, sampling date, the symbol of a
> single chemical, and the measured quantity associated with it.
> 
>   While I do have a basic working knowledge of list comprehension I'm
> stumbling on reading the data into the script in a data type that would
> allow me to iterate over each item in the row of chemical names and each
> data row.
> 
>   The immediate question is what data type is returned by readline()? The
> opening sections of code (for figuring out what I need to do) are:
> 
> import sys
> 
> infile = open(sys.argv[1], 'r')
> output = open("out.txt","w")
> 
> # read column headers into a list
> col_headers = infile.readline()
> print col_headers
> # read the first three lines of data
> print infile.readline()
> print infile.readline()
> print infile.readline()
> 
>   The output of the above is:
> 
> Ag,Al,CO3,HCO3,AlkTot,As,Ba,Be,Bi,Ca,Cd,Cl,Co,Cr,Cu,DO,Fe,Hg,K,Mg,Mn,Mo,Na,NH4,NO3NO2,oil_grease,Pb,pH,Sb,SC,Se,SO4,Sr,TDS,Tl,V,Zn 
> 
> 
> D-1,2007-12-12,-0.005,0.106,-1.000,231.000,231.000,0.011,0.000,-0.002,0.000,100.000,0.000,1.430,0.000,-0.006,0.024,4.960,4.110,,0.000,9.560,0.035,0.000,0.970,-0.010,0.293,,0.025,7.800,-0.001,630.000,0.001,65.800,0.000,320.000,-0.001,0.000,11.400 
> 
> 
> D-1,2008-03-15,-0.005,-0.080,-1.000,228.000,228.000,0.001,0.000,-0.002,0.000,88.400,0.000,1.340,0.000,-0.006,0.014,9.910,0.309,0.000,0.000,9.150,0.047,0.000,0.820,0.224,-0.020,,0.025,7.940,-0.001,633.000,0.001,75.400,0.000,300.000,-0.001,0.000,12.400 
> 
> 
> D-1,2008-06-26,-0.005,0.116,6.700,118.000,124.000,0.010,0.000,-0.002,0.000,63.400,0.000,1.750,0.000,-0.006,0.020,4.320,2.830,0.000,0.000,9.550,0.020,0.000,0.653,-0.010,-0.050,,0.025,8.650,0.001,386.000,-0.001,68.500,0.000,480.000,-0.001,0.000,5.500 
> 
> 
>   I've tried to index col_headers as I would a list, but 'print
> col_headers[0]' yields A rather than Ag.
> 
>   I see no indication in the output that these lines are lists, tuples, or
> another defined data type. Please provide a clue stick on how I should
> access the source file so I can then use for loops or list comprehension to
> restructure the file.

When in doubt, don't just 'print', but 'print repr(...)'.
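
A minimal sketch of what that reveals, assuming Python 3 (where print is a
function) and using an in-memory io.StringIO with a made-up three-column
excerpt as a stand-in for the real file:

```python
import io

# Hypothetical stand-in for the opened data file (shortened header row).
sample = io.StringIO("Ag,Al,CO3\nD-1,2007-12-12,-0.005\n")

line = sample.readline()

# repr() shows the quotes and the trailing newline, so it is clear
# that the value is one string, not a list of column names.
print(repr(line))   # 'Ag,Al,CO3\n'
print(type(line))

# Indexing a string yields a single character, hence 'A' rather than 'Ag'.
first_char = line[0]
print(first_char)   # A
```

The trailing newline is also why a plain print of each line shows extra blank
lines in the output.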

.readline() returns a string, which you can then manipulate yourself. 
For example, if you know there will be no embedded commas in the data 
you can:

col_headers = infile.readline().strip().split(',')  # now a list of strings
for line in infile:                  # iterate over the remaining lines
    data = line.strip().split(',')   # strip() drops the trailing newline

But the csv module is probably worth looking into.
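
As a rough sketch of the csv route (not a tested solution for the real file),
assuming Python 3, assuming the header row holds only chemical symbols while
each data row starts with site ID and sampling date as described above, and
using a made-up excerpt in io.StringIO plus a hypothetical helper named
to_long_rows:

```python
import csv
import io

# Hypothetical three-chemical excerpt standing in for the real file.
sample = io.StringIO(
    "Ag,Al,CO3\n"
    "D-1,2007-12-12,-0.005,0.106,-1.000\n"
    "D-1,2008-03-15,-0.005,-0.080,\n"
)

def to_long_rows(lines):
    """Yield [site, date, symbol, value] for every chemical in every row."""
    reader = csv.reader(lines)
    # First row: chemical symbols only (strip stray whitespace).
    symbols = [s.strip() for s in next(reader)]
    for row in reader:
        if not row:          # skip blank lines
            continue
        site, date = row[0], row[1]
        # Pair each symbol with the measurement in the same position.
        for symbol, value in zip(symbols, row[2:]):
            yield [site, date, symbol, value.strip()]

long_rows = list(to_long_rows(sample))
for r in long_rows:
    print(r)
```

Missing measurements come through as empty strings here; whether to skip them
or store them as NULL is a decision for the database side. For a real file,
the csv documentation recommends opening it with newline='' and passing the
file object to csv.reader in place of the StringIO.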

~Ethan~

