Data Manipulation - Rows to Columns

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Tue Feb 5 22:52:08 EST 2008


On Wed, 06 Feb 2008 00:54:49 -0200, Tess <testone at gmail.com> wrote:

> I have a text file with marked up data that I need to convert into a
> text tab separated file.
>
> The structure of the input file is listed below (see file 1) and the
> desired output file is below as well (see file 2).
>
> I am a complete novice with python and would appreciate any tips you
> may be able to provide.
>
>
> file 1:
> <item>TABLE</table>
> <color>black</color>
> <color>blue</color>
> <color>red</color>
> <item>CHAIR</table>
> <color>yellow</color>
> <color>black</color>
> <color>red</color>
> <item>TABLE</table>
> <color>white</color>
> <color>gray</color>
> <color>pink</color>

Are you sure it says <item>...</table>?
Are there ALWAYS three colors per item, as in your example? If so,
just read groups of 4 lines and ignore the tags.
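A minimal sketch of that fixed-size approach, assuming every group is exactly one item line followed by three color lines. Rather than slicing each line at fixed offsets, it strips anything that looks like <tag> or </tag> with a regular expression, so the mismatched </table> does not matter. The names TAG_RE and rows_from_fixed_groups are made up for illustration:

```python
import re

# Matches any simple opening or closing tag, e.g. <item>, </table>, <color>.
TAG_RE = re.compile(r'</?\w+>')

def rows_from_fixed_groups(lines, group_size=4):
    # Remove the tags and surrounding whitespace, then drop blank lines.
    values = [TAG_RE.sub('', line).strip() for line in lines]
    values = [v for v in values if v]
    # Assumption: the input splits evenly into groups; complain otherwise.
    if len(values) % group_size:
        raise ValueError("input does not split evenly into groups of %d" % group_size)
    # Slice the flat list of values into consecutive chunks of group_size.
    return [values[i:i + group_size] for i in range(0, len(values), group_size)]

sample = """<item>TABLE</table>
<color>black</color>
<color>blue</color>
<color>red</color>
<item>CHAIR</table>
<color>yellow</color>
<color>black</color>
<color>red</color>
""".splitlines()

print(rows_from_fixed_groups(sample))
# [['TABLE', 'black', 'blue', 'red'], ['CHAIR', 'yellow', 'black', 'red']]
```

This breaks as soon as one item has a different number of colors, which is why the loop below tracks items explicitly instead.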

> file 2 (tab separated):
> TABLE	black	blue	red
> CHAIR	yellow	black	red
> TABLE	white	gray	pink

The best way to produce this output is with the csv module:
http://docs.python.org/lib/module-csv.html
So we need a list of rows, where each row is a list of column values. A
simple way of building such a structure from the input file would be:

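rows = []
row = None
with open('file1.txt') as fin:
    for line in fin:
        line = line.strip()  # remove leading and trailing whitespace
        if not line:
            continue  # skip blank lines
        if line.startswith('<item>'):
            # a new item starts: save the previous row, if any
            if row:
                rows.append(row)
            j = line.index("</")
            item = line[6:j]   # text between '<item>' and the closing tag
            row = [item]
        elif line.startswith('<color>'):
            j = line.index("</")
            color = line[7:j]  # text between '<color>' and '</color>'
            row.append(color)
        else:
            raise ValueError("can't understand line: %r" % line)
if row:
    rows.append(row)  # don't forget the last item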

This allows for a variable number of "color" lines per item. Once the
`rows` list is built, we only have to create a csv writer for the right
dialect ('excel-tab' looks promising; note the hyphen, csv.excel_tab is the
name of the class) and feed the rows to it:

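import csv
# Python 3: open in text mode with newline=''; on Python 2 use open('file2.txt', 'wb')
with open('file2.txt', 'w', newline='') as fout:
    writer = csv.writer(fout, dialect='excel-tab')
    writer.writerows(rows)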
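To check the whole pipeline without touching any files, here is a sketch that wraps the same parsing loop in a function (parse_items is a made-up name) and writes into an io.StringIO instead of file2.txt. The second item deliberately has only two colors, to show that rows of different lengths come out fine:

```python
import csv
import io

def parse_items(lines):
    # Same logic as the loop above, but accepts any iterable of lines
    # (a file object, a list of strings, ...).
    rows = []
    row = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('<item>'):
            if row:
                rows.append(row)
            row = [line[6:line.index('</')]]
        elif line.startswith('<color>'):
            row.append(line[7:line.index('</')])
        else:
            raise ValueError("can't understand line: %r" % line)
    if row:
        rows.append(row)
    return rows

sample = [
    '<item>TABLE</table>',
    '<color>black</color>',
    '<color>blue</color>',
    '<color>red</color>',
    '<item>CHAIR</table>',
    '<color>yellow</color>',
    '<color>black</color>',
]

buf = io.StringIO()  # stands in for the output file
csv.writer(buf, dialect='excel-tab').writerows(parse_items(sample))
print(repr(buf.getvalue()))
# 'TABLE\tblack\tblue\tred\r\nCHAIR\tyellow\tblack\r\n'
```

The excel-tab dialect ends each row with '\r\n', as Excel expects; pass lineterminator='\n' to csv.writer if you prefer Unix line endings.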

That's all folks!

-- 
Gabriel Genellina
