number of different lines in a file
Ben Finney
bignose+hates-spam at benfinney.id.au
Thu May 18 18:19:26 EDT 2006
"r.e.s." <r.s at ZZmindspring.com> writes:
> I have a million-line text file with 100 characters per line,
> and simply need to determine how many of the lines are distinct.
I'd generalise it by allowing the caller to pass any iterable of
items. A file object can be iterated this way, line by line, but so
can any other sequence or iterable.
def count_distinct(seq):
    """ Count the number of distinct items """
    counts = dict()
    for item in seq:
        if item not in counts:
            counts[item] = 0
        counts[item] += 1
    return len(counts)
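
Since only the number of keys is returned and the per-item tallies are never used, a set would do the same job. A minimal sketch of that variant follows; the name count_distinct_set is hypothetical and used only for illustration.

```python
def count_distinct_set(items):
    """Count the distinct items in any iterable by collecting them in a set."""
    return len(set(items))

# Works for a list, a generator, or an open file alike.
print(count_distinct_set(["abc", "def", "abc"]))      # 2
print(count_distinct_set(n % 3 for n in range(10)))   # 3
```

The dict-based version is still the one to reach for if the individual counts are wanted later.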
>>> infile = file('foo.txt')
>>> for line in infile:
...     print line,
...
abc
def
ghi
abc
ghi
def
xyz
abc
abc
def
>>> infile = file('foo.txt')
>>> print count_distinct(infile)
5
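
One thing to watch when feeding a file to a function like this: iterating a file yields each line with its line ending attached, so a final line with no trailing newline compares unequal to an otherwise identical earlier line. The listing above shows four different strings, so a result of 5 would be consistent with that situation, though that is only a guess about foo.txt. A minimal sketch that strips line endings before counting; the helper name count_distinct_lines is hypothetical, and io.StringIO stands in for the real file.

```python
import io

def count_distinct_lines(lines):
    """Count distinct lines, ignoring trailing newline characters."""
    return len(set(line.rstrip("\r\n") for line in lines))

# Stand-in for a file whose last line has no trailing newline.
sample = io.StringIO(u"abc\ndef\nabc\ndef")
print(len(set(sample)))               # 3: "def\n" and "def" differ
sample.seek(0)
print(count_distinct_lines(sample))   # 2
```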
--
\ "A man may be a fool and not know it -- but not if he is |
`\ married." -- Henry L. Mencken |
_o__) |
Ben Finney