number of different lines in a file

Fredrik Lundh fredrik at pythonware.com
Thu May 18 18:23:31 EDT 2006


r.e.s. wrote:

> I have a million-line text file with 100 characters per line,
> and simply need to determine how many of the lines are distinct.
> 
> On my PC, this little program just goes to never-never land:
> 
> def number_distinct(fn):
>     f = file(fn)
>     x = f.readline().strip()
>     L = []
>     while x<>'':
>         if x not in L:
>             L = L + [x]
>         x = f.readline().strip()
>     return len(L) 

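ouch.  "x not in L" scans the whole list on every line, and "L = L + [x]"
copies the list each time it grows, so the running time grows with the
square of the number of distinct lines.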

> Would anyone care to point out improvements? 
> Is there a better algorithm for doing this?

try this:

def number_distinct(fn):
    return len(set(s.strip() for s in open(fn)))
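
for comparison, here is a minimal sketch of the same set-based idea written
out as a loop.  it is not a different algorithm: membership tests and inserts
on a set take roughly constant time on average, so the whole file is handled
in a single pass.  the with-block is an addition that assumes Python 2.6 or
later and is only there to close the file explicitly.  note that, unlike the
quoted loop, neither version stops at the first blank line; an empty line
simply counts as one more distinct value.

```python
def number_distinct(fn):
    # Illustrative variant of the one-liner: same set-based counting,
    # but the file is closed explicitly when the with-block exits.
    seen = set()
    with open(fn) as f:
        for line in f:
            # strip() removes the trailing newline and surrounding whitespace,
            # so "b" and "  b  " count as the same line.
            seen.add(line.strip())
    return len(seen)
```

memory use is proportional to the number of distinct lines, which should be
fine for a million lines of 100 characters on most machines.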

</F>
