[portland] Reformatting Data Files

Ethan Furman ethan at stoneleaf.us
Sat Jan 29 01:07:22 CET 2011


Rich Shepard wrote:
>   Yes, this is correct. Instead of the organization above, I want each row
> to be a complete record of location, date, parameter and value. Somewhat
> analogous to transposing rows and columns in the spreadsheet, but that's 
> not
> a solution.
> 
>   In the re-written file, there will be one row for each unique set of
> location, date, parameter, and value. For this file, there are 2 locations,
> 22 dates, and 32 parameters (each with 1 value for that parameter on that
> date at that location) or 1408 distinct rows.

Wow.

Here's an attempt -- not sure how general it will be, but hopefully it 
will help.

====== input =============
:'JCM-10B':'JCM-10B':'JCM-10B':'JCM-20B':'JCM-20B':'JCM-20B'
:2004-08-16:2005-06-07::2006-06-12::2009-12-09
'Depth to Water':76.75:77.51:82.15:73.17:72.66:
'Clarity':6.35:9.75:4.75:3.26:4.07:7.69
'Salinity':0.38:0.69:0.16:0.79::0.05
'Temperature':41.55:34.87:42.73:40.80:45.14:
==========================

------ output ------------
::JCM-10B:2004-08-16:Depth to Water:76.75::
::JCM-10B:2004-08-16:Clarity:6.35::
::JCM-10B:2004-08-16:Salinity:0.38::
::JCM-10B:2004-08-16:Temperature:41.55::
::JCM-10B:2005-06-07:Depth to Water:77.51::
::JCM-10B:2005-06-07:Clarity:9.75::
::JCM-10B:2005-06-07:Salinity:0.69::
::JCM-10B:2005-06-07:Temperature:34.87::
::JCM-10B::Depth to Water:82.15::
::JCM-10B::Clarity:4.75::
::JCM-10B::Salinity:0.16::
::JCM-10B::Temperature:42.73::
::JCM-20B:2006-06-12:Depth to Water:73.17::
::JCM-20B:2006-06-12:Clarity:3.26::
::JCM-20B:2006-06-12:Salinity:0.79::
::JCM-20B:2006-06-12:Temperature:40.80::
::JCM-20B::Depth to Water:72.66::
::JCM-20B::Clarity:4.07::
::JCM-20B::Salinity:::
::JCM-20B::Temperature:45.14::
::JCM-20B:2009-12-09:Depth to Water:::
::JCM-20B:2009-12-09:Clarity:7.69::
::JCM-20B:2009-12-09:Salinity:0.05::
::JCM-20B:2009-12-09:Temperature:::
----------------------------

~~~~~~~~~~~~~~~~~~~~~ code ~~~~~~~~~~~~~~~~~~~~~~~~~~
#!/usr/bin/env python
"""
This file converts the exported wq excel files to
have each row contain the location, date, parameter,
and value.
"""

import csv

reader = csv.reader(open("10b20b.txt","rb"),
                     delimiter=":",
                     quotechar="'")
output = csv.writer(open("out.txt","wb"),
                     delimiter=":",
                     quotechar="'")

# records are in columnar, not row, format

# read in all data into format
# [fieldname, [value1, value2, value3, ..., valueN]]
all_data = list()
for row in reader:
     row = list(row)
     # keep fieldname separate from values
     all_data.append((row[0], row[1:]))

# if fieldname is empty, assume header row
# once a fieldname/value row is found, stop
# looking for anymore header rows
header_rows = list()
for i, fieldname_row in enumerate(all_data):
     fieldname, row = fieldname_row
     if not fieldname:
         header_rows.append(row)
     else:
         break
# assume all remaining rows are the fields and values
# i will be the index of the first row with a fieldname
first_field_row = i

# assemble headers
for j, _ in enumerate(header_rows[0]):
     headers = list()
     for row in header_rows:
         headers.append(row[j])
     # now step through field:value rows
     for fieldname, row in all_data[first_field_row:]:
         output.writerow([None, None]
                         + headers
                         + [fieldname, row[j], None, None])
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Hope this helps!

~Ethan~


More information about the Portland mailing list