Query regarding set([])?

Peter Otten __peter__ at web.de
Fri Jul 10 08:04:35 EDT 2009


vox wrote:

> I'm constructing a simple compare script and thought I would use
> set([]) to generate the difference output. But I'm obviously doing
> something wrong.
> 
> file1 contains 410 rows.
> file2 contains 386 rows.
> I want to know what rows are in file1 but not in file2.
> 
> This is my script:
> s1 = set(open("file1"))
> s2 = set(open("file2"))

Remove the following three lines:

> s3 = set([])
> s1temp = set([])
> s2temp = set([])

 
> s1temp = set(i.strip() for i in s1)
> s2temp = set(i.strip() for i in s2)
> s3 = s1temp-s2temp
> 
> print len(s3)
> 
> Output is 119. AFAIK 410-386=24. What am I doing wrong here?

You are probably misinterpreting len(s3). s3 contains the distinct (stripped)
lines occurring in "file1" but not in "file2". Duplicate lines are counted
only once, and line order doesn't matter.
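A minimal Python 3 sketch of the script with those three lines dropped might look like the following. The helper name lines_only_in() and the tiny test files are made up for illustration; the point is that a line repeated in "file1" ends up in the set only once, and the result carries no line order.

```python
import os
import tempfile


def lines_only_in(path1, path2):
    # Hypothetical helper: distinct stripped lines of path1 that never occur in path2.
    # The result is a set, so duplicates collapse and line order is lost.
    with open(path1) as f1, open(path2) as f2:
        s1 = {line.strip() for line in f1}
        s2 = {line.strip() for line in f2}
    return s1 - s2


# Demo with two small throwaway files (made-up contents, not vox's data).
with tempfile.TemporaryDirectory() as tmp:
    file1 = os.path.join(tmp, "file1")
    file2 = os.path.join(tmp, "file2")
    with open(file1, "w") as f:
        f.write("cherry\napple\nbanana\nbanana\n")   # 4 rows, 3 distinct lines
    with open(file2, "w") as f:
        f.write("banana\n")                          # 1 row
    only_in_file1 = lines_only_in(file1, file2)

print(len(only_in_file1))      # 2, not 4 - 1 = 3
print(sorted(only_in_file1))   # ['apple', 'cherry']
```

With vox's files, lines_only_in("file1", "file2") returns the same set as s3 in the quoted script.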

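So there are 119 distinct lines that occur at least once in "file1" but not
in "file2". That number would equal 410 - 386 = 24 only if every line of
"file2" also appeared in "file1" and neither file contained duplicate lines.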

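To see why the row counts mislead, here is a toy example with made-up data in which "file2" contains lines that "file1" lacks. For sets, len(s1 - s2) is always len(s1) - len(s1 & s2), so lines that occur only in the second file do nothing to shrink the result.

```python
# Made-up stand-ins for the stripped lines of the two files.
file1_lines = ["a", "b", "c", "d", "e"]
file2_lines = ["a", "b", "x", "y"]

s1 = set(file1_lines)
s2 = set(file2_lines)
s3 = s1 - s2

naive = len(file1_lines) - len(file2_lines)   # 5 - 4 = 1
actual = len(s3)                               # {"c", "d", "e"} -> 3

# The identity that always holds for set difference:
assert len(s3) == len(s1) - len(s1 & s2)
print(naive, actual)   # 1 3
```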
If that is not what you want, you have to tell us exactly what you are
looking for.
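If duplicates are supposed to count (say, a line that appears three times in "file1" and once in "file2" should leave two rows), one possible approach, assuming that is the requirement, is a multiset difference with collections.Counter: subtracting one Counter from another subtracts the counts and keeps only those that stay positive. The helper extra_rows() and the sample rows are hypothetical.

```python
from collections import Counter


def extra_rows(lines1, lines2):
    # Hypothetical helper: rows of lines1 left over after cancelling each
    # occurrence in lines2 once. Counter subtraction keeps only positive counts.
    c1 = Counter(line.strip() for line in lines1)
    c2 = Counter(line.strip() for line in lines2)
    return c1 - c2


rows1 = ["a\n", "a\n", "a\n", "b\n", "c\n"]
rows2 = ["a\n", "c\n", "z\n"]
diff = extra_rows(rows1, rows2)
print(sorted(diff.elements()))   # ['a', 'a', 'b']
print(sum(diff.values()))        # 3
```

A file object yields its lines when iterated, so the same helper could be given the two open files instead of lists.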

Peter




More information about the Python-list mailing list