number of different lines in a file
Fredrik Lundh
fredrik at pythonware.com
Thu May 18 18:23:31 EDT 2006
r.e.s. wrote:
> I have a million-line text file with 100 characters per line,
> and simply need to determine how many of the lines are distinct.
>
> On my PC, this little program just goes to never-never land:
>
> def number_distinct(fn):
>     f = file(fn)
>     x = f.readline().strip()
>     L = []
>     while x<>'':
>         if x not in L:
>             L = L + [x]
>         x = f.readline().strip()
>     return len(L)
ouch.
> Would anyone care to point out improvements?
> Is there a better algorithm for doing this?
try this:
def number_distinct(fn):
    return len(set(s.strip() for s in open(fn)))
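For readers wondering why the quoted loop stalls: "x not in L" scans the whole list on every line, and "L = L + [x]" copies the list each time a new line is found, so the work grows roughly with the square of the number of distinct lines. A set does hashed lookups instead, which take about constant time on average. Below is a minimal sketch of the same idea spelled out as an explicit loop. It assumes a current Python 3 interpreter (the quoted code is Python 2), and the "with" block and the name "seen" are illustrative additions, not part of the original reply.

```python
def number_distinct(fn):
    """Count the distinct lines in the file named fn, ignoring
    leading and trailing whitespace on each line."""
    seen = set()  # illustrative name; holds one entry per distinct line
    # Iterate over the file lazily so only distinct lines stay in memory.
    with open(fn) as f:
        for line in f:
            # Hashed insert: about constant time on average,
            # unlike the linear scan of "x not in L" on a list.
            seen.add(line.strip())
    return len(seen)
```

One behavioural difference worth noting: the quoted while loop stops at the first blank line, because strip() turns it into an empty string, whereas the set-based versions read the whole file and count a blank line as one more distinct value.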
</F>