Efficient grep using Python?

Wed Dec 15 23:19:45 EST 2004

[Fredrik Lundh]
>>> bdict = dict.fromkeys(open(bfile).readlines())
>>>
>>> for line in open(afile):
>>>    if line not in bdict:
>>>        print line,
>>>
>>> </F>

[Tim Peters]
>> Note that an open file is an iterable object, yielding the lines in
>> the file.  The "for" loop exploited that above, but fromkeys() can
>> also exploit it.  That is,
>>
>> bdict = dict.fromkeys(open(bfile))
>>
>> is good enough (there's no need for the .readlines()).

[/F] 
> (sigh.  my brain knows that, but my fingers keep forgetting)
> 
> and yes, for this purpose, "dict.fromkeys" can be replaced
> with "set".
>
>    bdict = set(open(bfile))
>
> (and then you can save a few more bytes by renaming the
> variable...)
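
For anyone reading along on a current Python, the whole filter discussed above can be sketched as follows. This is only a minimal illustration: the function name lines_not_in and the sample file contents are made up for the example and do not come from the thread.

```python
import os
import tempfile


def lines_not_in(afile, bfile):
    """Yield each line of afile that does not appear as a line in bfile."""
    with open(bfile) as b:
        # The file object is iterated directly; no intermediate list is built.
        seen = set(b)
    with open(afile) as a:
        for line in a:
            if line not in seen:
                yield line


# Small demonstration with throwaway files (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    afile = os.path.join(tmp, "a.txt")
    bfile = os.path.join(tmp, "b.txt")
    with open(afile, "w") as f:
        f.write("apple\nbanana\ncherry\n")
    with open(bfile, "w") as f:
        f.write("banana\n")
    result = list(lines_not_in(afile, bfile))
    print("".join(result), end="")
```

Lines are compared with their trailing newline included, so a final line that lacks one would not match its newline-terminated counterpart in the other file.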

[Tim Peters]
> Except the latter two are just shallow spelling changes.  Switching
> from fromkeys(open(f).readlines()) to fromkeys(open(f)) is much more
> interesting, since it can allow major reduction in memory use.  Even
> if all the lines in the file are pairwise distinct, not materializing
> them into a giant list can be a significant win.  I wouldn't have
> bothered replying if the only point were that you can save a couple
> bytes of typing <wink>.
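
A rough way to see the memory point is to measure peak allocation for both spellings. The sketch below assumes CPython and uses the standard tracemalloc module; fake_lines is a hypothetical stand-in for an open file, list() plays the role of .readlines(), and the line counts are arbitrary. The input is deliberately full of duplicates, which exaggerates the gap.

```python
import tracemalloc


def fake_lines(n, distinct):
    """Hypothetical stand-in for an open file: yields n lines, few distinct."""
    for i in range(n):
        yield "line %d\n" % (i % distinct)


def peak_bytes(build):
    """Return the peak traced allocation, in bytes, while build() runs."""
    tracemalloc.start()
    try:
        build()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


N, DISTINCT = 100_000, 10

# Lines are consumed one at a time; duplicates can be freed immediately.
peak_iter = peak_bytes(lambda: dict.fromkeys(fake_lines(N, DISTINCT)))

# list() plays the role of .readlines(): every line is alive at once.
peak_list = peak_bytes(lambda: dict.fromkeys(list(fake_lines(N, DISTINCT))))

print("iterator:", peak_iter, "bytes; list:", peak_list, "bytes")
```

When every line is distinct the dict has to keep all of them anyway, so the saving shrinks to roughly the list object itself, which is the case described in the quoted paragraph.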

fromkeys(open(f).readlines()) and fromkeys(open(f)) seem to be
equivalent.

When I pass an iterator instance (or a generator iterator) to
dict.fromkeys, it is expanded at that moment, so fromkeys(open(f))
is effectively the same as fromkeys(list(open(f))) and
fromkeys(open(f).readlines()).

Am I missing something?

Jane
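
As far as the resulting dict goes, the observation can be checked directly. The sketch below uses io.StringIO as a stand-in for a real file, with made-up sample data. It only shows that the three spellings produce the same dict and that fromkeys() exhausts the iterable before returning. It does not show memory use, which is where the spellings differ: the .readlines() and list() forms hold a list of every line while the dict is being built.

```python
import io

TEXT = "spam\neggs\nspam\n"  # made-up sample data

from_iter = dict.fromkeys(io.StringIO(TEXT))
from_readlines = dict.fromkeys(io.StringIO(TEXT).readlines())
from_list = dict.fromkeys(list(io.StringIO(TEXT)))

# All three spellings build the same dictionary.
same_result = from_iter == from_readlines == from_list

# fromkeys() exhausts the iterable before it returns.
stream = io.StringIO(TEXT)
dict.fromkeys(stream)
exhausted = next(stream, None) is None

print(same_result, exhausted)
```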