[Tutor] merging two csv files based on a common join

Kent Johnson kent37 at tds.net
Sat Nov 1 00:05:54 CET 2008


On Fri, Oct 31, 2008 at 6:04 PM, qsqgeekyogdty at tiscali.co.uk
<qsqgeekyogdty at tiscali.co.uk> wrote:
> Hello again,
> Thanks for the replies on my previous post, but I have a different
> problem now and don't see how to deal with it in a smooth way.
>
> I have two csv files where:
>
> 1.csv
>
> "1", "text", "aa"
> "2", "text2", "something else"
> "3", "text3", "something else"
>
> 2.csv
>
> "text", "xx"
> "text", "yy"

This line doesn't appear in your desired output. Why not?

> "text3", "zz"
>
> now I would like to have an output like:
>
> "1", "text", "aa"
> "1", "text", "xx"
> "2", "text2", "something else"
> "3", "text3", "something else"
> "3", "text3", "zz"
>
> I basically need to merge the two csv files based on the column-2

Assuming that at least one file does not repeat values in the key field:
Read one of the csv files and create a dict whose keys are the common
field and values are the entire line containing the field.
Read the other csv file. Look up the key field in the dict to get the
values from the other file.
Output as appropriate.
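
A minimal sketch of the steps above, assuming the key is the second
field of 1.csv and the first field of 2.csv, and that 1.csv does not
repeat keys. The sample data is copied from the question; io.StringIO
stands in for the real files, and the names read_rows and join_unique
are made up for illustration. skipinitialspace=True is there because the
sample has a space after each comma.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for real files.
FILE1 = (
    '"1", "text", "aa"\n'
    '"2", "text2", "something else"\n'
    '"3", "text3", "something else"\n'
)
FILE2 = (
    '"text", "xx"\n'
    '"text", "yy"\n'
    '"text3", "zz"\n'
)

def read_rows(text):
    # skipinitialspace=True makes the reader ignore the space after each comma,
    # so the quoted fields are parsed correctly.
    return list(csv.reader(io.StringIO(text), skipinitialspace=True))

def join_unique(rows1, rows2):
    # Index file 1 by its key column (second field).
    # Assumption: keys are unique in file 1.
    by_key = {row[1]: row for row in rows1}
    merged = []
    for key, value in rows2:
        match = by_key.get(key)
        if match is not None:
            # Prefix the row from file 2 with the id found in file 1.
            merged.append([match[0], key, value])
    return merged

result = join_unique(read_rows(FILE1), read_rows(FILE2))
for row in result:
    print(row)
```

This only produces the rows that came from 2.csv, each with the matching
id in front. How to interleave them with the original rows of 1.csv is
the "output as appropriate" part.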

If the keys repeat in both files, make a dict whose values are a list
of all lines containing the key. collections.defaultdict(list) can
help with this.

Kent

