Help me with the script? How to find items in csv file A and not in file B and vice versa
alonnirs at gmail.com
alonnirs at gmail.com
Tue Jun 18 05:01:49 EDT 2013
Hi Peter,
First off - many (many!) thanks.
There's some error I don't understand.
Here's the amended script I used:
import csv
#open CSV's and read first column with product IDs into variables pointing to lists
with open("Afile.csv", "rb") as f:
a = {row[0] for row in csv.reader(f)}
with open("Bfile.csv", "rb") as g:
b = {row[0] for row in csv.reader(g)}
#create variables pointing to lists with unique product IDs in A and B respectively
in_a_not_b = a-b
in_b_not_a = b-a
print in_a_not_b
print in_b_not_a
with open("inAnotB.csv", "wb") as f:
writer = csv.writer(f)
writer.writerows([item] for item in_a_not_b)
with open("inAnotB.csv", "wb") as g:
writer = csv.writer(g)
writer.writerows([item] for item in_b_not_a)
print "done!"
and when I run it I get an invalid syntex error and (as a true newbie I used a GUI)in_a_not_b is highlighted in the
with open("inAnotB.csv", "wb") as f:
writer = csv.writer(f)
writer.writerows([item] for item in_a_not_b)
part.
Could you please point our what I'm doing wrong?
Thanks again :)
On Tuesday, June 18, 2013 11:39:41 AM UTC+3, Peter Otten wrote:
> Alan Newbie wrote:
>
>
>
> > Hello,
>
> > Let's say I want to compare two csv files: file A and file B. They are
>
> > both similarly built - the first column has product IDs (one product per
>
> > row) and the columns provide some stats about the products such as sales
>
> > in # and $.
>
> >
>
> > I want to compare these files - see which product IDs appear in the first
>
> > column of file A and not in B, and which in B and not A. Finally, it would
>
> > be very great if the result could be written into two new CSV files - one
>
> > product ID per row in the first column. (no other data in the other
>
> > columns needed)
>
> >
>
> > This is the script I tried:
>
> > ==========================
>
> >
>
> > import csv
>
> >
>
> > #open CSV's and read first column with product IDs into variables pointing
>
> > #to lists
>
> > A = [line.split(',')[0] for line in open('Afile.csv')]
>
> > B = [line.split(',')[0] for line in open('Bfile.csv')]
>
> >
>
> > #create variables pointing to lists with unique product IDs in A and B
>
> > #respectively
>
> > inAnotB = list(set(A)-set(B))
>
> > inBnotA = list(set(B)-set(A))
>
> >
>
> > print inAnotB
>
> > print inBnotA
>
> >
>
> > c = csv.writer(open("inAnotB.csv", "wb"))
>
> > c.writerow([inAnotB])
>
> >
>
> >
>
> > d = csv.writer(open("inBnotA.csv", "wb"))
>
> > d.writerow([inBnotA])
>
> >
>
> > print "done!"
>
> >
>
> > =====================================================
>
> >
>
> > But it doesn't produce the required results.
>
> > It prints IDs in this format:
>
> > 247158132\n
>
>
>
> Python reads lines from a file with the trailing newline included, and
>
> line.split(",") with only one column (i. e. no comma) keeps the whole line.
>
> As you already know about the csv module you should use it to read your
>
> data, e. g. instead of
>
>
>
> > A = [line.split(',')[0] for line in open('Afile.csv')]
>
>
>
> try
>
>
>
> with open("Afile.csv", "rb") as f:
>
> a = {row[0] for row in csv.reader(f)}
>
> ...
>
>
>
> I used {...} instead of [...], so a is already a set and you can proceed:
>
>
>
>
>
> in_a_not_b = a - b
>
>
>
> Finally as a shortcut for
>
>
>
> for item in in_a_not_b:
>
> writer.writerow([item])
>
>
>
> use the writerows() method to write your data:
>
>
>
> with open("inAnotB.csv", "wb") as f:
>
> writer = csv.writer(f)
>
> writer.writerows([item] for item in_a_not_b)
>
>
>
> Note that I'm wrapping every item in the set rather than the complete set as
>
> a whole. If you wanted to be clever you could spell that even more succinct
>
> as
>
>
>
> writer.writerows(zip(in_a_not_b))
>
>
>
> > and nothing to the csv files.
>
> >
>
> > You could probably tell I'm a newbie.
>
> > Could you help me out?
>
> >
>
> > here's some dummy data:
>
> >
>
> https://docs.google.com/file/d/0BwziqsHUZOWRYU15aEFuWm9fajA/edit?usp=sharing
>
> >
>
> >
>
> https://docs.google.com/file/d/0BwziqsHUZOWRQVlTelVveEhsMm8/edit?usp=sharing
>
> >
>
> > Thanks a bunch in advance! :)
More information about the Python-list
mailing list