Efficient grep using Python?

Tim Peters tim.peters at gmail.com
Wed Dec 15 23:54:51 EST 2004


[Jane Austine]
> fromkeys(open(f).readlines()) and fromkeys(open(f)) seem to be
> equivalent.

Semantically, yes; pragmatically (in memory use), no, as explained before.

> When I pass an iterator instance (or a generator iterator) to the
> dict.fromkeys, it is expanded at that moment,

I don't know what "expanded at that moment" means to you.  The CPython
implementation of dict.fromkeys() alternates between getting the next
value from its iterable argument, and storing that value as a dict
key.  It does that regardless of whether a list, or any other kind of
iterable object, is passed to it.  So the difference isn't in
fromkeys(), it's in what's passed to fromkeys().
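To make that alternation concrete, here is a rough pure-Python model of the loop just described. It is a sketch, not CPython's actual C code; `fromkeys_sketch` and `noisy` are hypothetical names, and the `trace` list exists only to record the order in which values are fetched and stored.

```python
def fromkeys_sketch(iterable, value=None, trace=None):
    """Rough model of dict.fromkeys(): fetch one value, store it, repeat."""
    d = {}
    for key in iterable:          # ask the iterable for its next value...
        d[key] = value            # ...and store it before asking again
        if trace is not None:
            trace.append(("store", key))
    return d

def noisy(items, trace):
    """Generator that records each time it hands out a value."""
    for item in items:
        trace.append(("next", item))
        yield item

# The same loop runs whether a list or a generator is passed in.
trace = []
d = fromkeys_sketch(noisy("ab", trace), trace=trace)
```

After this runs, `trace` alternates between `"next"` and `"store"` entries, one pair per key: nothing is pulled from the generator ahead of time.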

> thus fromkeys(open(f)) is effectively the same as
> fromkeys(list(open(f))) and fromkeys(open(f).readlines()).

Semantically, yes; and the last two are pragmatically the same too. 
The first is pragmatically different.
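A small sketch of the semantic side, using `io.StringIO` as an in-memory stand-in for `open(f)` (an assumption so the example needs no file on disk; because the text is already in memory, the memory difference itself is not visible here). All three spellings give the same dict. What differs is what exists while `fromkeys()` runs: the file-like object hands out one line per request, while `list()` and `readlines()` produce every line before `fromkeys()` sees the first one.

```python
import io

TEXT = "spam\n" * 5  # five identical lines; io.StringIO stands in for a real file

def three_spellings(text):
    """Return the dicts built by the three spellings under discussion."""
    by_iter = dict.fromkeys(io.StringIO(text))                   # lines fetched one at a time
    by_list = dict.fromkeys(list(io.StringIO(text)))             # full list of lines built first
    by_readlines = dict.fromkeys(io.StringIO(text).readlines())  # readlines() also returns a full list
    return by_iter, by_list, by_readlines

a, b, c = three_spellings(TEXT)

stream = io.StringIO(TEXT)
first = next(stream)       # only one line has been handed out so far
rest = stream.readlines()  # the remaining four lines, now all held in one list
```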

> Am I missing something?

You at least were <wink>.

Build a file containing a million long identical lines (so the dict
only has 1 entry in the end).  Try all 3 spellings and watch their
memory use.  Report what you find.
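One way to run that experiment, sketched for Python 3 (this thread predates it). Assumptions: `tracemalloc` is used as the memory gauge, so it counts only allocations made through Python's allocators, not the process size a tool like `top` reports; the line length of 100 characters is an arbitrary choice for "long"; and `make_file`, `peak_bytes`, and `run_experiment` are hypothetical helper names.

```python
import os
import tempfile
import tracemalloc

def make_file(path, n_lines=1_000_000, line_len=100):
    """Write n_lines identical lines of line_len characters (plus a newline)."""
    line = "x" * line_len + "\n"
    with open(path, "w") as f:
        for _ in range(n_lines):
            f.write(line)

def spelling_iter(path):
    with open(path) as f:
        return dict.fromkeys(f)                # lines are pulled one at a time

def spelling_list(path):
    with open(path) as f:
        return dict.fromkeys(list(f))          # a list of every line is built first

def spelling_readlines(path):
    with open(path) as f:
        return dict.fromkeys(f.readlines())    # readlines() also builds that list

SPELLINGS = [
    ("fromkeys(open(f))", spelling_iter),
    ("fromkeys(list(open(f)))", spelling_list),
    ("fromkeys(open(f).readlines())", spelling_readlines),
]

def peak_bytes(build, path):
    """Run build(path); return (number of dict keys, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = build(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return len(result), peak

def run_experiment(n_lines=1_000_000, line_len=100):
    """Build the test file in a temporary directory and measure each spelling."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "lines.txt")
        make_file(path, n_lines, line_len)
        for name, build in SPELLINGS:
            keys, peak = peak_bytes(build, path)
            results[name] = (keys, peak)
            print(f"{name:32} keys={keys}  peak={peak / 1e6:.1f} MB")
    return results
```

Calling `run_experiment()` with its defaults follows the recipe above (it writes a temporary file of roughly 100 MB). If the point above holds, the first spelling's peak should stay roughly flat as `n_lines` grows, while the other two grow with the size of the file.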
