Efficient grep using Python?
Tim Peters
tim.peters at gmail.com
Wed Dec 15 11:21:44 EST 2004
["sf" <sf at sf.sf>]
>> I have files A, and B each containing say 100,000 lines (each
>> line=one string without any space)
>>
>> I want to do
>>
>> " A - (A intersection B) "
>>
>> Essentially, want to do efficient grep, i.e. from A remove those
>> lines which are also present in file B.
[Fredrik Lundh]
> that's an unusual definition of "grep", but the following seems to
> do what you want:
>
> afile = "a.txt"
> bfile = "b.txt"
>
> bdict = dict.fromkeys(open(bfile).readlines())
>
> for line in open(afile):
>     if line not in bdict:
>         print line,
>
> </F>
Note that an open file is an iterable object, yielding the lines in
the file. The "for" loop exploited that above, but fromkeys() can
also exploit it. That is,
    bdict = dict.fromkeys(open(bfile))
is good enough (there's no need for the .readlines()).
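
For illustration only, here is a minimal sketch of the same idea in current Python 3, using a set in place of the dict. The function name lines_only_in_a and its parameters are hypothetical, not something from the thread. It assumes both files fit in memory and that matching lines end the same way (a final line with no trailing newline will not match a copy that has one).

```python
def lines_only_in_a(a_path, b_path):
    """Return the lines of a_path that do not appear in b_path, in file order."""
    # An open file is iterable, so set() can consume it directly,
    # the same way dict.fromkeys() can.
    with open(b_path) as b:
        b_lines = set(b)
    # Membership tests against a set take constant time on average,
    # so this is one pass over each file.
    with open(a_path) as a:
        return [line for line in a if line not in b_lines]
```

Called as lines_only_in_a("a.txt", "b.txt"), this returns the surviving lines with their newlines intact, so they can be written straight back out with writelines().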