testing a sequence for 'identicalness'

Mark McEahern marklists at mceahern.com
Thu Jul 4 13:20:53 EDT 2002


[Rajarshi Guha]
> Thanks! Didn't know about this module.
> OK - what if I allow numbers like 1.1 and 1 to be equal (that is, numbers
> which are close are also considered to be identical)
>
> I'm not really clear on how to set this up in difflib - maybe the bagging
> idea is a better choice for this problem?

I didn't know about difflib either.  Thanks Emile!
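
Since difflib came up, here is a minimal sketch of scoring two sequences with difflib.SequenceMatcher. It is not Emile's code (his message isn't quoted here), and the helper name difflib_ratio is hypothetical. ratio() returns 2*M/T, where M is the number of matched elements and T is the total number of elements in both sequences. The matching respects element order, so its numbers differ from the order-insensitive bag approach below.

```python
import difflib

def difflib_ratio(seq1, seq2):
    """Hypothetical helper: similarity of two sequences via difflib.

    SequenceMatcher needs hashable elements. ratio() returns 2*M/T,
    where M is the number of matched elements and T is
    len(seq1) + len(seq2); matching respects element order.
    """
    # isjunk=None: no element is treated as junk.
    return difflib.SequenceMatcher(None, seq1, seq2).ratio()

if __name__ == "__main__":
    a = list(range(10))
    print(difflib_ratio(a, list(range(10))))  # identical -> 1.0
    print(difflib_ratio(a, list(range(9))))   # 2*9/19
    print(difflib_ratio(a, a[::-1]))          # reversed: much lower, order matters
```

Because matching is hash-based, difflib will not treat 1 and 1.1 as equal unless the values are first mapped to a common key (for example, by rounding). For sequences of 200 or more items, SequenceMatcher's autojunk heuristic may also ignore very frequent items; passing autojunk=False disables it.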

Before I read Emile's response, I began working on this very simplistic
attempt to solve the problem.  For XP fans, note that I wrote the unit test
BEFORE writing the code.  ;-)

You'll note that compare_items() -- I'm not sure that's the best name --
utilizes a make_bag function (modified slightly from something Alex Martelli
posted in reference to a different problem).
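
A "bag" here is a multiset: a mapping from each item to the number of times it occurs. As an aside, and not what the code below uses, Python 2.7+ and 3.1+ ship collections.Counter, which builds the same kind of mapping. Its & operator keeps each shared key with the smaller of its two counts, which is the per-item match count compare_items accumulates. A minimal sketch, carrying over the lowercasing of strings; the name counter_compare and the empty-input behavior are assumptions.

```python
from collections import Counter

def counter_compare(seq1, seq2):
    """Hypothetical Counter-based equivalent of compare_items()."""
    def make_bag(seq):
        # Lowercase anything string-like so 'A' and 'a' count as one item.
        return Counter(x.lower() if hasattr(x, "lower") else x for x in seq)

    bag1, bag2 = make_bag(seq1), make_bag(seq2)
    # Counter & Counter keeps each shared key with the smaller count.
    matches = sum((bag1 & bag2).values())
    larger = max(len(seq1), len(seq2))
    # Assumption: two empty sequences count as identical (avoids 0/0).
    return matches / float(larger) if larger else 1.0
```

For example, counter_compare([1, 1, 2], [1, 2, 2]) matches one 1 and one 2, giving 2/3.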

def compare_items(seq1, seq2):
    """compare_items(seq1, seq2) --> float

    Return the number of items the two sequences have in common
    (counting duplicates, ignoring case for strings), divided by the
    length of the larger sequence.
    """
    if len(seq1) > len(seq2):
        larger, smaller = seq1, seq2
    else:
        larger, smaller = seq2, seq1
    if not larger:
        # Both sequences are empty; treat them as identical rather
        # than dividing by zero.
        return 1.0

    def make_bag(seq):
        """Map each item to the number of times it occurs in seq."""
        bag = {}
        for x in seq:
            # Lowercase strings so that 'A' and 'a' count as the same item.
            if hasattr(x, "lower"):
                x = x.lower()
            # Counting occurrences takes care of duplicates.
            bag[x] = 1 + bag.get(x, 0)
        return bag

    s = make_bag(smaller)
    l = make_bag(larger)
    matches = 0
    for k, v in s.items():
        if k in l:
            # Accumulate the count of items matched, which is the
            # smaller of the two counts.
            matches += min(v, l[k])
    # Should I divide by len(larger) or len(smaller)?  Or perhaps just
    # return matches?
    return matches / float(len(larger))

import unittest

class test(unittest.TestCase):

    def test(self):

        seq1 = list(range(10))
        seq2 = list(range(10))
        # Reverse seq2 to make sure our algorithm isn't dependent
        # on order--of course, perhaps it should be?
        seq2.reverse()
        v = compare_items(seq1, seq2)
        expected = 1.0
        self.assertAlmostEqual(v, expected)

        seq3 = list(range(9))
        v = compare_items(seq1, seq3)
        expected = .9
        self.assertAlmostEqual(v, expected)

        seq4 = [1, 0, 100, 'a', 'b', 'c']
        v = compare_items(seq1, seq4)
        expected = .2
        self.assertAlmostEqual(v, expected)

if __name__ == "__main__":
    unittest.main()
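
The code above matches only items that are exactly equal (after lowercasing strings), so it does not yet address the question of treating 1 and 1.1 as the same. One possible extension, sketched under assumptions not taken from this thread: for numeric sequences only, sort both, then walk them with two pointers and pair up values whose absolute difference is at most a tolerance tol, using each value at most once. For a fixed absolute tolerance, pairing the smallest remaining values first like this should find the largest possible number of matches. The name compare_close and the tol parameter are hypothetical.

```python
def compare_close(seq1, seq2, tol):
    """Hypothetical variant of compare_items() for numeric sequences.

    Two numbers match if they differ by at most tol; each number is
    used in at most one match. Returns matches / len(larger sequence).
    """
    a, b = sorted(seq1), sorted(seq2)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            # Close enough: pair them and move past both.
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a[i] is too small to match b[j] or anything after it.
            i += 1
        else:
            # b[j] is too small to match a[i] or anything after it.
            j += 1
    larger = max(len(a), len(b))
    # Assumption: two empty sequences count as identical (avoids 0/0).
    return matches / float(larger) if larger else 1.0
```

Leave some slack in tol: in binary floating point, abs(1.1 - 1) evaluates to slightly more than 0.1, so compare_close([1], [1.1], tol=0.1) finds no match, while tol=0.2 pairs them.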

More information about the Python-list mailing list