Efficient grep using Python?

Wed Dec 15 11:21:44 EST 2004

["sf" <sf at sf.sf>]
>> I have files A, and B each containing say 100,000 lines (each
>> line=one string without any space)
>>
>> I want to do
>>
>> "  A  - (A intersection B)  "
>>
>> Essentially, want to do efficient grep, i.e. from A remove those
>> lines which are also present in file B.

[Fredrik Lundh]
> that's an unusual definition of "grep", but the following seems to
> do what you want:
>
> afile = "a.txt"
> bfile = "b.txt"
>
> bdict = dict.fromkeys(open(bfile).readlines())
>
> for line in open(afile):
>    if line not in bdict:
>        print line,
> 
> </F> 

Note that an open file is an iterable object, yielding the lines in
the file.  The "for" loop exploited that above, but fromkeys() can
also exploit it.  That is,

bdict = dict.fromkeys(open(bfile))

is good enough (there's no need for the .readlines()).
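
For anyone who wants to try the idea without creating files first, here
is a minimal sketch in Python 3 syntax.  It is not the code from the
thread: the function name lines_only_in_a is made up for illustration, a
set stands in for the dict of keys, and io.StringIO objects stand in for
the open files (they iterate line by line the same way).

```python
import io


def lines_only_in_a(a_lines, b_lines):
    """Return the lines of a_lines that never occur in b_lines, in order.

    Both arguments can be any iterable of lines, including open files.
    (Hypothetical helper, not part of the original thread.)
    """
    # Iterating a file-like object yields its lines, so no .readlines()
    # call is needed to build the lookup structure.
    b_seen = set(b_lines)
    return [line for line in a_lines if line not in b_seen]


# io.StringIO stands in for open("a.txt") and open("b.txt").
a_file = io.StringIO("apple\nbanana\ncherry\n")
b_file = io.StringIO("banana\ndate\n")

result = lines_only_in_a(a_file, b_file)
print("".join(result), end="")
```

Lines are compared with their trailing newline included, so a final line
that lacks a newline in one file will not match the same text followed by
a newline in the other.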