finding data from two different files.

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Oct 17 22:24:19 EDT 2013


On Fri, 18 Oct 2013 07:18:49 +0530, torque.india at gmail.com wrote:

> Hi all,
> 
> I am new to python, just was looking for logic to understand to write
> code in the below scenario.
> 
> I am having a file (filea) with multiple columns, and another
> file(fileb) with again multiple columns, but say i want to use column2
> of fileb as a search expression to search for similar value in column3
> of filea. and print it with value of rows of filea.
> 
> filea:
> a 1 ab
> b 2 bc
> d 3 de
> e 4 ef
> .
> .
> .
> 
> fileb
> z ab 24
> y bc 85
> x ef 123
> w de 33


Can you explain your problem a little better? You've shown some example 
data, which is great, but what are we supposed to do with it? Given the 
data shown above, what result would you expect to get?

My guess is that you want to do something like this:

* walk through fileB, extract each line in turn
* extract the second column
* then search fileA for lines where column 3 matches
* then... I don't know, maybe print the match?


Repeatedly walking through fileA will be slow, because every line of fileB 
would trigger another full pass over fileA. So it is better to read fileA 
only once, ahead of time. I suggest that you probably want to use the csv 
module to read the data, but because I'm lazy, I'm going to do it by hand:

# Prepare fileA for later searches
data = {}  # use a dict to map column 3 to the rest of the data
with open("fileA") as f:
    for line in f:
        fields = line.split()  # split on whitespace
        if len(fields) < 3:
            continue  # skip blank or short lines
        col3 = fields[2]  # remember fields are numbered from 0, not 1
        data[col3] = line.rstrip("\n")  # drop the newline so print doesn't double it
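
For comparison, here is a rough sketch of the same indexing step using the csv 
module. It assumes the columns are separated by spaces only (the csv module 
takes a single delimiter character, so mixed tabs and spaces would need 
different handling). The helper name index_by_column is made up for this 
example, and io.StringIO stands in for the real file:

```python
import csv
import io

def index_by_column(lines, column):
    """Map the value in `column` (0-based) to the full row, as a list of fields.

    Hypothetical helper, not part of the csv module.
    """
    # skipinitialspace=True makes runs of spaces count as one separator.
    reader = csv.reader(lines, delimiter=" ", skipinitialspace=True)
    index = {}
    for row in reader:
        if len(row) <= column:
            continue  # skip blank or short lines
        index[row[column]] = row
    return index

# Stand-in for open("fileA", newline=""), using the sample rows from the question.
filea = io.StringIO("a 1 ab\nb 2 bc\nd 3 de\ne 4 ef\n")
data = index_by_column(filea, 2)
print(data["bc"])  # ['b', '2', 'bc']
```

Here each value in the dict is a list of fields rather than the raw line, which 
is handy if you want to pick out individual columns later.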


The above assumes that each item in column 3 is unique. If it isn't, a later 
line silently replaces an earlier one with the same key, so you'll need a 
different strategy.
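
One possible strategy, if column 3 can repeat, is to keep a list of lines per 
key instead of a single line. This is only a sketch: the helper name index_all 
and the sample rows (where "ab" appears twice) are invented for illustration.

```python
from collections import defaultdict

def index_all(lines, column):
    """Map each value in `column` (0-based) to a list of all lines containing it."""
    index = defaultdict(list)
    for line in lines:
        fields = line.split()  # split on whitespace
        if len(fields) > column:  # skip blank or short lines
            index[fields[column]].append(line.rstrip("\n"))
    return index

# Hypothetical data: "ab" appears in column 3 of two different rows.
sample = ["a 1 ab\n", "b 2 bc\n", "c 9 ab\n"]
data = index_all(sample, 2)
print(data["ab"])          # ['a 1 ab', 'c 9 ab']
print(data.get("zz", []))  # []
```

Using data.get(key, []) for lookups avoids adding an empty entry to the 
defaultdict for keys that were never seen.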

Now on to the second part:


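with open("fileB") as f:
    for line in f:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or short lines
        col2 = fields[1]
        # print() as a function is Python 3; in Python 2, use:
        #     print col2, data.get(col2, '***no match***')
        print(col2, data.get(col2, '***no match***'))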
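
To see what the two steps produce together, here is a self-contained sketch 
run against the sample rows from the question (without the elided rows). It 
uses Python 3 and io.StringIO in place of real files. The function name 
match_files and the last row of FILEB are invented so that the no-match case 
shows up.

```python
import io

FILEA = "a 1 ab\nb 2 bc\nd 3 de\ne 4 ef\n"
# The last row is hypothetical, added to show a key with no match in fileA.
FILEB = "z ab 24\ny bc 85\nx ef 123\nw de 33\nv zz 7\n"

def match_files(filea, fileb):
    """Return (key, matching fileA line or None) for each usable line of fileB."""
    data = {}
    for line in filea:
        fields = line.split()
        if len(fields) >= 3:
            data[fields[2]] = line.rstrip("\n")  # index fileA by column 3
    results = []
    for line in fileb:
        fields = line.split()
        if len(fields) >= 2:
            key = fields[1]  # column 2 of fileB is the search key
            results.append((key, data.get(key)))
    return results

for key, row in match_files(io.StringIO(FILEA), io.StringIO(FILEB)):
    print(key, row if row is not None else "***no match***")
# ab a 1 ab
# bc b 2 bc
# ef e 4 ef
# de d 3 de
# zz ***no match***
```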


Does this help?


-- 
Steven


