making a typing speed tester

Wed Nov 14 19:28:09 EST 2007

On Wed, 14 Nov 2007 14:56:25 -0300, <tavspamnofwd at googlemail.com> wrote:

>> I'm trying to write a program to test someone's typing speed and show
>> them their mistakes. However I'm getting weird results when looking
>> for the differences in longer (than 100 chars) strings:
>>
>> import difflib
>>
>> # a tape measure string (just makes it easier to locate a given index)
>> a = ('1-3-5-7-9-12-15-18-21-24-27-30-33-36-39-42-45-48-51-54-57-60-63-66-69'
>>      '-72-75-78-81-84-87-90-93-96-99-103-107-111-115-119-123-127-131-135-139'
>>      '-143-147-151-155-159-163-167-171-175-179-183-187-191-195--200')
>>
>> # now with a few mistakes
>> b = ('1-3-5-7-'
>>      'l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78'
>>      '-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-'
>>      '151-m55-159-163-167-a71-175j179-183-187-191-195--200')
>>
>> s = difflib.SequenceMatcher(None, a ,b)
>> ms = s.get_matching_blocks()
>>
>> print ms
>>
>> >>> [(0, 0, 8), (200, 200, 0)]
>>
>> Have I made a mistake or is this function designed to give up when the
>> input strings get too long? If so what could I use instead to compute
>> the mistakes in a typed text?

Yes, SequenceMatcher has some limitations, and you have run into one of them.

> ---------- Forwarded message ----------
> From: Evert Rol
> [...]
> And the part of the actual code reads:

>                 if n >= 200 and len(indices) * 100 > n:     # <--- !!
>                     populardict[elt] = 1
>                     del indices[:]
>                 else:
>                     indices.append(i)

> So you're right: it has a stop at the (somewhat arbitrary) limit of
> 200 characters. [...] If you feel safe enough and are on a fast platform,
> you can probably up that limit (or even put it somewhere as an optional
> variable in the code, which I would think is generally better).
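
To see which characters that test would throw away, here is a minimal
sketch that replays just the quoted condition on a string. The function
name popular_elements and the min_len parameter are mine, not difflib's,
and the real method does more bookkeeping than this. It is written for
Python 3.

```python
def popular_elements(b, min_len=200):
    """Return the elements of b that the quoted condition would flag.

    Hypothetical helper: it mirrors only the quoted fragment, not the
    whole difflib method it was taken from.
    """
    n = len(b)
    seen = {}        # element -> list of indices collected so far
    popular = set()  # elements flagged by the quoted test
    for i, elt in enumerate(b):
        if elt in seen:
            indices = seen[elt]
            # Same test as in the quoted code: only applies when the
            # sequence has min_len or more elements.
            if n >= min_len and len(indices) * 100 > n:
                popular.add(elt)
                del indices[:]
            else:
                indices.append(i)
        else:
            seen[elt] = [i]
    return popular


print(sorted(popular_elements('ab' * 100)))  # 200 characters
print(sorted(popular_elements('ab' * 99)))   # 198 characters
```

With 'ab' * 100 both characters are flagged, while the 198-character
version flags nothing, because the test never applies below 200 elements.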

If you try a slightly shorter text (190 chars, for example), you get the
expected result, and quickly:

py> s = difflib.SequenceMatcher(None, a[:190], b[:190])
py> ms = s.get_matching_blocks()
py> print ms
[(0, 0, 8), (9, 9, 30), (40, 40, 46), (87, 87, 11), (99, 99, 23),
 (123, 123, 2), (126, 126, 26), (153, 153, 15), (169, 169, 6),
 (176, 176, 14), (190, 190, 0)]

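So it appears that your strings are hitting that (arbitrary) limit. Reading
the quoted code: once the second string has 200 or more elements, any
element that makes up more than about 1% of it is flagged as "popular" and
its indices are discarded, so it can no longer be used to find a match.
From the algorithm's point of view, your strings are a rather degenerate
case: so many '-', '0' and '1' characters to match.
Try increasing that 200 to something somewhat larger than your strings.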
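
If patching difflib.py is not an option, newer Pythons offer a switch:
since Python 3.2, SequenceMatcher accepts autojunk=False, which turns the
popularity heuristic off. This thread predates that release, so treat the
following as a sketch for Python 3 rather than something available in 2007.
The helper typing_mistakes is a hypothetical name. It lists every span
where the typed text differs from the expected one.

```python
import difflib

# The two 200-character strings from the original post.
a = ('1-3-5-7-9-12-15-18-21-24-27-30-33-36-39-42-45-48-51-54-57-60-63-66-69'
     '-72-75-78-81-84-87-90-93-96-99-103-107-111-115-119-123-127-131-135-139'
     '-143-147-151-155-159-163-167-171-175-179-183-187-191-195--200')
b = ('1-3-5-7-'
     'l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78'
     '-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-'
     '151-m55-159-163-167-a71-175j179-183-187-191-195--200')


def typing_mistakes(expected, typed):
    """Return (index, expected_text, typed_text) for each differing span."""
    # autojunk=False (Python 3.2+) disables the popularity heuristic.
    matcher = difflib.SequenceMatcher(None, expected, typed, autojunk=False)
    return [(i1, expected[i1:i2], typed[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != 'equal']


for index, want, got in typing_mistakes(a, b):
    print(index, repr(want), repr(got))
```

For these two strings this should report the nine single-character
substitutions, at indices 8, 39, 86, 98, 122, 125, 152, 168 and 175, which
agrees with the block boundaries in the 190-character run.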

-- 
Gabriel Genellina