Efficient grep using Python?
Jane Austine
janeaustine50 at hotmail.com
Wed Dec 15 23:19:45 EST 2004
[Fredrik Lundh]
>>> bdict = dict.fromkeys(open(bfile).readlines())
>>>
>>> for line in open(afile):
>>>     if line not in bdict:
>>>         print line,
>>>
>>> </F>
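The quoted snippet is Python 2 (note `print line,`). Here is a minimal Python 3 sketch of the same approach for comparison. The logic is wrapped in a hypothetical helper `print_lines_not_in(afile, bfile)`, which assumes afile and bfile are paths to text files. It also passes the file object straight to dict.fromkeys, as suggested later in the thread.

```python
import sys


def print_lines_not_in(afile, bfile, out=sys.stdout):
    """Write every line of afile that does not occur anywhere in bfile.

    Hypothetical helper: a Python 3 sketch of the quoted snippet.
    """
    # Build the lookup table from bfile. Iterating a text file yields its
    # lines, each still carrying its trailing newline (if it has one).
    with open(bfile) as f:
        bdict = dict.fromkeys(f)
    # Stream afile one line at a time; only bfile's distinct lines are held.
    with open(afile) as f:
        for line in f:
            if line not in bdict:
                out.write(line)  # line already ends with its own "\n"
```

Lines are compared with their newlines included, so a final line that lacks a trailing newline will not match an otherwise identical line that has one.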
[Tim Peters]
>> Note that an open file is an iterable object, yielding the lines in
>> the file. The "for" loop exploited that above, but fromkeys() can
>> also exploit it. That is,
>>
>> bdict = dict.fromkeys(open(bfile))
>>
>> is good enough (there's no need for the .readlines()).
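To illustrate the point about iterables: dict.fromkeys accepts any iterable, and a text file object yields its lines with the newline included. The small sketch below uses io.StringIO as an in-memory stand-in for open(bfile). That substitution is an assumption made only so the example needs no file on disk.

```python
import io

# io.StringIO behaves like a text file opened for reading: iterating it
# yields one line at a time, newline included.
fake_file = io.StringIO("spam\neggs\nspam\n")

# The file object is passed directly; no .readlines() call is needed.
# Duplicate lines collapse into one key, and every value is None.
bdict = dict.fromkeys(fake_file)
print(bdict)  # {'spam\n': None, 'eggs\n': None}
```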
[/F]
> (sigh. my brain knows that, but my fingers keep forgetting)
>
> and yes, for this purpose, "dict.fromkeys" can be replaced
> with "set".
>
> bdict = set(open(bfile))
>
> (and then you can save a few more bytes by renaming the
> variable...)
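Here is a quick check, under the same io.StringIO stand-in assumption, that the set spelling holds exactly the same distinct lines as the dict.fromkeys spelling. As a result, `line in ...` gives the same answers with either one.

```python
import io

text = "red\ngreen\nred\nblue\n"

# Two separate StringIO objects, because iterating one consumes it.
as_dict = dict.fromkeys(io.StringIO(text))
as_set = set(io.StringIO(text))

# Same distinct lines either way; the set simply stores no None values.
assert set(as_dict) == as_set
print(sorted(as_set))  # ['blue\n', 'green\n', 'red\n']
```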
[Tim Peters]
> Except the latter two are just shallow spelling changes. Switching
> from fromkeys(open(f).readlines()) to fromkeys(open(f)) is much more
> interesting, since it can allow major reduction in memory use. Even
> if all the lines in the file are pairwise distinct, not materializing
> them into a giant list can be a significant win. I wouldn't have
> bothered replying if the only point were that you can save a couple
> bytes of typing <wink>.
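Here is a rough way to see the memory difference Tim describes. It assumes Python 3.4+ (for tracemalloc) and again uses io.StringIO in place of a real file. The helper peak_bytes is hypothetical. tracemalloc only sees Python-level allocations, and the exact figures vary by interpreter version and platform, so none are shown here.

```python
import io
import tracemalloc


def peak_bytes(build):
    """Peak Python-level allocation, in bytes, while build() runs."""
    tracemalloc.start()
    try:
        build()
        return tracemalloc.get_traced_memory()[1]  # (current, peak)
    finally:
        tracemalloc.stop()


# 100,000 identical lines, so the finished dict has one key either way;
# only the temporary storage differs between the two spellings.
text = "same line\n" * 100_000
f_list = io.StringIO(text)  # stand-ins for open(f), created before tracing
f_iter = io.StringIO(text)

# readlines() builds a list holding every line before fromkeys sees any.
peak_list = peak_bytes(lambda: dict.fromkeys(f_list.readlines()))
# Passing the file object hands fromkeys one line at a time; each duplicate
# line can be freed as soon as it has been looked up.
peak_iter = peak_bytes(lambda: dict.fromkeys(f_iter))

print(f"readlines(): {peak_list:,} bytes peak")
print(f"file object: {peak_iter:,} bytes peak")
```

The gap is largest when most lines are duplicates, because the finished dict is tiny while the readlines() list is not.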
fromkeys(open(f).readlines()) and fromkeys(open(f)) seem to be
equivalent.
When I pass an iterator instance (or a generator iterator) to
dict.fromkeys, it is expanded at that moment, so fromkeys(open(f))
is effectively the same as fromkeys(list(open(f))) and
fromkeys(open(f).readlines()).
Am I missing something?
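Here is one way to watch what dict.fromkeys does with an iterator. This sketch relies on a CPython implementation detail: when fromkeys is called on a dict subclass, it stores each key through the subclass's __setitem__ as that key arrives. RecordingDict and lines() are hypothetical names used only for illustration.

```python
events = []


class RecordingDict(dict):
    """dict subclass that logs each key as it is stored (illustration only)."""

    def __setitem__(self, key, value):
        events.append(("store", key))
        super().__setitem__(key, value)


def lines():
    """Generator standing in for open(f); logs each line it hands out."""
    for line in ["first\n", "second\n", "third\n"]:
        events.append(("read", line))
        yield line


gen = lines()
d = RecordingDict.fromkeys(gen)

# By the time fromkeys returns, the generator is exhausted...
assert next(gen, None) is None
# ...but reads and stores alternate: no list of all lines is built first.
for event in events:
    print(event)
```

The iterator is fully consumed before fromkeys returns. However, each line is stored before the next one is read, so no list of all the lines exists at any point.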
Jane
More information about the Python-list mailing list