number of different lines in a file

r.e.s. r.s at ZZmindspring.com
Thu May 18 17:51:30 EDT 2006


I have a million-line text file with 100 characters per line,
and simply need to determine how many of the lines are distinct.

On my PC, this little program just goes to never-never land (it shows no sign of finishing):

def number_distinct(fn):
    f = file(fn)
    x = f.readline().strip()
    L = []
    while x<>'':
        if x not in L:
            L = L + [x]
        x = f.readline().strip()
    return len(L) 

Would anyone care to point out improvements? 
Is there a better algorithm for doing this?
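
One possible direction, offered only as a sketch: the test "x not in L" scans the whole list each time, so the total work grows roughly with the square of the number of distinct lines, and "L = L + [x]" copies the list on every addition. Keeping the lines in a set instead makes each membership check close to constant time on average. The version below assumes that all distinct lines fit in memory (on the order of 100 MB or more of strings in the worst case for this file) and that a Python version with the built-in set type and the "with" statement is available. It keeps the original's strip() call, but unlike the original loop it does not stop at the first blank line.

```python
def number_distinct(fn):
    """Return the number of distinct lines in the file named fn.

    Sketch only: assumes every distinct line fits in memory at once.
    """
    seen = set()
    with open(fn) as f:
        # Iterating over the file object reads one line at a time.
        for line in f:
            # strip() mirrors the original code: surrounding whitespace
            # (including the newline) is ignored when comparing lines.
            seen.add(line.strip())
    return len(seen)
```

If memory turns out to be too tight, sorting the file externally and counting changes between adjacent lines might be an alternative, though that is not shown here.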



More information about the Python-list mailing list