number of different lines in a file

Fredrik Lundh fredrik at pythonware.com
Fri May 19 02:07:41 EDT 2006


r.e.s. wrote:

> BTW, the first thing I tried was Fredrik Lundh's program:
> 
> def number_distinct(fn):
>      return len(set(s.strip() for s in open(fn)))
> 
> which worked without the square brackets. Interesting that 
> omitting them doesn't seem to matter.

a for loop inside square brackets is a "list comprehension", and the
result is a list.  if you use a list comprehension inside a function 
call, the full list is built *before* the function is called.  in this 
case, a list holding every stripped line of the file would sit in 
memory before the set was constructed.
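
a small sketch of that evaluation order, using an in-memory list in
place of open(fn); the names strip_and_log, distinct and log are made up
for this illustration and are not part of the original program:

```python
# hypothetical stand-in for the lines you would get from open(fn)
lines = ["spam\n", "egg\n", "spam\n"]
log = []

def strip_and_log(s):
    log.append("strip")        # record each per-line computation
    return s.strip()

def distinct(items):
    log.append("call")         # record the moment the function body starts
    return set(items)

# list comprehension: the whole list is built first, then passed in
result = distinct([strip_and_log(s) for s in lines])

print(log)                     # ['strip', 'strip', 'strip', 'call']
print(sorted(result))          # ['egg', 'spam']
```

all three "strip" entries are logged before "call", so the function only
ever sees a finished list.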

if you change the square brackets to ordinary parentheses, you get a 
generator expression instead:

     http://pyref.infogami.com/generator-expressions

the generator expression results in an iterator object that calculates 
the values one by one.  if you pass it to a function that accepts any 
iterable, that function will end up "running the for loop itself", and 
no intermediate list is needed.  (in this case, you still need memory to 
hold the set, of course, so the difference between a list comprehension 
and a generator expression will only matter if you have lots of duplicates).
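
a rough sketch of the "one by one" behaviour, again with an in-memory
list standing in for the file; strip_and_log and seen are hypothetical
names used only for this illustration:

```python
# hypothetical stand-in for the lines you would get from open(fn)
lines = ["spam\n", "egg\n", "spam\n"]
seen = []

def strip_and_log(s):
    seen.append(s.strip())     # record each value as it is computed
    return s.strip()

gen = (strip_and_log(s) for s in lines)

before = list(seen)            # [] -- creating the generator computes nothing
first = next(gen)              # 'spam' -- one value, computed on demand
after_one = list(seen)         # ['spam']

# set() pulls the remaining values itself, one at a time
rest = set(gen)                # {'egg', 'spam'}
```

nothing is stripped until something asks for a value, and set() only
ever holds the distinct results, not a full list of lines.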

finally, a syntax shortcut lets you remove the parentheses if the 
generator expression is the only argument in a function call, as in the 
above example.
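
a short sketch of where the shortcut applies and where it doesn't; the
sample data and variable names are assumptions for illustration only:

```python
# hypothetical stand-in for the lines you would get from open(fn)
lines = ["spam\n", "egg\n", "spam\n"]

# sole argument: the call's own parentheses are enough
short = len(set(s.strip() for s in lines))

# the same thing with the generator expression's parentheses spelled out
full = len(set((s.strip() for s in lines)))

# with a second argument, the extra parentheses are required
ordered = sorted((s.strip() for s in lines), reverse=True)

# leaving them out in that case is rejected by the compiler
try:
    compile("sorted(s for s in lines, reverse=True)", "<example>", "eval")
    accepted = True
except SyntaxError:
    accepted = False
```

here short and full are both 2, and accepted ends up False because the
unparenthesized form is a syntax error once there is a second argument.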

</F>



