testing a sequence for 'identicalness'
Mark McEahern
marklists at mceahern.com
Thu Jul 4 13:20:53 EDT 2002
[Rajarshi Guha]
> Thanks! Didn't know about this module.
> OK - what if I allow numbers like 1.1 and 1 to be equal (that is, numbers
> which are close are also considered to be identical)?
>
> I'm not really clear on how to set this up in difflib - maybe the bagging
> idea is a better choice for this problem?
I didn't know about difflib either. Thanks Emile!
Before I read Emile's response, I began working on this very simplistic
attempt to solve the problem. For XP fans, note that I wrote the unit test
BEFORE writing the code. ;-)
You'll note that compare_items() -- I'm not sure that's the best name --
uses a make_bag function (modified slightly from something Alex Martelli
posted in reference to a different problem).
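As an aside on the bag idea itself: a bag (multiset) just maps each item to the number of times it occurs, and two bags overlap by the smaller of the two counts for each shared item. In later Python versions, collections.Counter provides this directly. The following is only a sketch of the concept, not the code from this post; bag_overlap is a hypothetical name, and it skips the lowercasing of strings.

```python
from collections import Counter


def bag_overlap(seq1, seq2):
    """Count items shared by two sequences, respecting duplicates.

    Hypothetical helper for illustration only.
    """
    # Counter & Counter keeps each shared key with the smaller of its
    # two counts, which is the multiset intersection.
    common = Counter(seq1) & Counter(seq2)
    return sum(common.values())


# 1 and 2 each match once; 3 and 4 have no partner.
print(bag_overlap([1, 1, 2, 3], [1, 2, 2, 4]))
```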
def compare_items(seq1, seq2):
    """compare_items(seq1, seq2) --> float

    Return the ratio of items in the smaller sequence found in the larger
    sequence.
    """
    # Use the longer sequence as the reference; ties go to seq2.
    if len(seq1) > len(seq2):
        larger, smaller = seq1, seq2
    else:
        larger, smaller = seq2, seq1

    def make_bag(l):
        dict_l = {}
        for x in l:
            # Compare strings case-insensitively.
            if hasattr(x, "lower"):
                x = x.lower()
            # This takes care of duplicates.
            dict_l[x] = 1 + dict_l.get(x, 0)
        return dict_l

    s = make_bag(smaller)
    l = make_bag(larger)
    matches = 0
    for k, v in s.items():
        if k in l:
            # Accumulate the count of items matched, which is the
            # smaller of the two.
            matches += min(v, l[k])
    # Should I divide by len(larger) or len(smaller)?  Or perhaps just
    # return matches?
    return matches / float(len(larger))

import unittest

class test(unittest.TestCase):

    def test(self):
        # list() keeps this working where range() is not a list.
        seq1 = list(range(10))
        seq2 = list(range(10))
        # Reverse seq2 to make sure our algorithm isn't dependent
        # on order--of course, perhaps it should be?
        seq2.reverse()
        v = compare_items(seq1, seq2)
        expected = 1.0
        self.assertEqual(v, expected)
        seq3 = list(range(9))
        v = compare_items(seq1, seq3)
        expected = .9
        self.assertEqual(v, expected)
        seq4 = [1, 0, 100, 'a', 'b', 'c']
        v = compare_items(seq1, seq4)
        expected = .2
        self.assertEqual(v, expected)

if __name__ == "__main__":
    unittest.main()
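The code above only counts exact matches, so it does not yet treat 1.1 and 1 as identical. One possible way to allow that, offered as a rough sketch rather than a tested solution: drop the dictionary-based bag (close numbers hash to different keys) and pair items up greedily, using a tolerance for numbers. The names is_close and compare_items_fuzzy and the default tolerance of 0.15 are all made up for illustration, the greedy pairing is not guaranteed to find the best possible matching, and strings are not lowercased here.

```python
def is_close(a, b, tolerance=0.15):
    """Hypothetical equality test: numbers match when within tolerance."""
    numeric = (int, float)
    if isinstance(a, numeric) and isinstance(b, numeric):
        return abs(a - b) <= tolerance
    # Anything that is not a number falls back to plain equality.
    return a == b


def compare_items_fuzzy(seq1, seq2, tolerance=0.15):
    """Like compare_items, but numbers within tolerance count as matches."""
    if len(seq1) > len(seq2):
        larger, smaller = seq1, seq2
    else:
        larger, smaller = seq2, seq1
    if not larger:
        # Assumption: two empty sequences count as identical.
        return 1.0
    unused = list(larger)  # items of the larger sequence not yet paired
    matches = 0
    for item in smaller:
        for i, candidate in enumerate(unused):
            if is_close(item, candidate, tolerance):
                # Consume the candidate so duplicates are counted once.
                del unused[i]
                matches += 1
                break
    return matches / float(len(larger))


# 1 pairs with 1.1 and 2 pairs with 2.05; 3 and 3.5 are too far apart.
print(compare_items_fuzzy([1, 2, 3], [1.1, 2.05, 3.5]))
```

This is quadratic in the sequence lengths, which is fine for short lists; whether a fixed absolute tolerance is the right notion of "close" depends on the data.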