[Tutor] Merging table-like files with overlapping values in one column

Kat think_fishbone at yahoo.com
Thu Aug 21 11:56:49 CEST 2008


Hi all,

I'm new to Python and trying to come up with an elegant way of tackling the following problem. Sorry for the lengthy description:

I have several input files in which every line holds a space-separated pair of values, so each file is essentially a two-column table. Values in the first column are unique within each file, but they overlap across files. I'd like to merge the files into one, keyed on the first column, with the second-column values from every file placed side by side, like this:

First file:
bar 100
foo 90
yadda 22

Second file:
bar 78
yadda 120
ziggy 99

Combined file:
bar 100 78
foo 90 NONE
yadda 22 120
ziggy NONE 99

I'm considering several approaches. The first, brute-force way is to read in each file, parse it into lines, parse the lines into words, and write the second word to a new output file along with the first word. That seems awful. My second idea is to convert each file into a dictionary (since the first column's values are unique within each file), then build a combined dictionary that allows multiple values per key, and output that. Does that sound reasonable? Is there another approach? I'm not asking for an implementation, of course, just ideas for the design.
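
To make the dictionary idea concrete, here is a rough sketch of how it might look. This is only an illustration under some assumptions: the helper names read_pairs and merge_tables are made up for this example, the two sample files are given as in-memory lists of lines rather than real files, and it assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
def read_pairs(lines):
    """Parse 'key value' lines into a dict, skipping lines that are not pairs."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            key, value = parts
            table[key] = value
    return table


def merge_tables(tables, missing="NONE"):
    """Return {key: [value from table 1, value from table 2, ...]}.

    Keys absent from a given table get the `missing` placeholder.
    """
    # Iterating over a dict yields its keys, so union() collects every key.
    all_keys = sorted(set().union(*tables))
    return {key: [t.get(key, missing) for t in tables] for key in all_keys}


# Sample data from the message above, standing in for the real input files.
first = read_pairs(["bar 100", "foo 90", "yadda 22"])
second = read_pairs(["bar 78", "yadda 120", "ziggy 99"])

merged = merge_tables([first, second])
for key, values in merged.items():
    print(key, " ".join(values))
```

With real files, each list of lines could be replaced by an open file object, since read_pairs only needs something it can iterate over line by line. Keeping one list per key, with one slot per input file, is what lets the NONE placeholders line up in the right column.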

Thanks in advance.

Kat


      


