SequenceMatcher bug ?

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Wed Dec 10 10:37:12 EST 2008


On Wed, 10 Dec 2008 12:00:30 -0200, <rdmurray at bitdance.com> wrote:
> On Tue, 9 Dec 2008 at 22:15, eliben wrote:
>> On Dec 10, 4:12 am, rdmur... at bitdance.com wrote:
>>> On Mon, 8 Dec 2008 at 23:46, eliben wrote:
>>>> This is about Python 2.5.2 - I don't know if there were fixes to this
>>>> module in 2.6/3.0
>>>
>>>> I think I ran into a bug with the difflib.SequenceMatcher class.
>>>> Specifically, its ratio() method. The following:
>>>
>>>> SequenceMatcher(None, [4] + [10] * 500 + [5], [10] * 500 + [5]).ratio()
>>>
>>>> returns 0.0
>>>
>>>> While the same with 500 replaced by 100 returns .99... something.
>>>> Looking at the code of SequenceMatcher, there's some caching going on
>>>> when the sequences are longer than 200 elements (and indeed, I can
>>>> reproduce the bug above 200 but not below). Can anyone confirm that
>>>> this misbehaves and suggest a workaround ?
>>>
>>> Python 2.5.2 (r252:60911, Sep 29 2008, 20:34:04)
>>> [GCC 4.3.1] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> from difflib import SequenceMatcher
>>> >>> SequenceMatcher(None, [4] + [10] * 500 + [5], [10] * 500 + [5]).ratio()
>>>
>>> 0.99900299102691925
>>>
>>
>> Strange. I could reproduce the problem both on ActiveState Python
>> 2.5.2 for Windows, and in the online Try Python evaluator:
>>
>> http://try-python.mired.org/
>
> My system is Gentoo, which installs Python from source.  Maybe Gentoo
> applies patches that the binary releases don't have.

I can't reproduce the problem. I got exactly the same results (0.999...)  
with all the releases I have at hand, ranging from 3.0 back to 2.1.3, all  
on Windows.
And http://try-python.mired.org/ says the same thing.
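
For anyone who wants to check their own installation, here is a minimal sketch of the test discussed above. The helper name ratio_for and the particular lengths tried are my own choices for illustration, not something from the original report; the two sequences are built exactly as in the reported expression. If the bug existed, the longer inputs would print 0.0.

```python
from difflib import SequenceMatcher

def ratio_for(n):
    # ratio_for is a hypothetical helper name, used only for this check.
    # Build the two sequences from the report with n repeated elements.
    a = [4] + [10] * n + [5]
    b = [10] * n + [5]
    return SequenceMatcher(None, a, b).ratio()

# Try lengths on both sides of the 200-element threshold mentioned
# in the report, plus the two sizes quoted there (100 and 500).
for n in (100, 199, 200, 500):
    print("n=%d ratio=%r" % (n, ratio_for(n)))
```

On the releases I tried, the n=500 case gives the 0.999... value quoted above, which is what you expect when everything except the leading 4 matches: 2 * 501 / (502 + 501).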

-- 
Gabriel Genellina



