[Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

Tres Seaver tseaver at palladion.com
Wed Jul 7 22:11:29 CEST 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Antoine Pitrou wrote:
> On Wed, 7 Jul 2010 19:44:31 +0200
> Eli Bendersky <eliben at gmail.com> wrote:
>> For what it's worth, my benchmarking showed that modifying the
>> heuristic to only kick in when there are more than 100 kinds of
>> elements (Terry's option A) didn't affect the runtime of matching
>> whatsoever, even when the heuristic *does* kick in. All it adds,
>> really, is the overhead of a single 'if' statement. So it wouldn't be
>> right to assume that somehow modifying the heuristic or allowing to
>> turn it off will negatively affect performance in the special case Tim
>> originally optimized for.
> 
> Just because it doesn't affect performance in your tests doesn't mean it
> won't do so in the general case. Consider a case where Tim's junk
> optimization kicked in and helped improve performance a lot, but where
> there are still less than 100 alphabet symbols. The new heuristic will
> ruin this use case.

That would describe pretty much every C program ever written, for
instance, and nearly as high a percentage of all Python modules /
scripts ever written:  the 'string.printable' set, less formfeed and
vertical tab, is 98 characters long.

> That's why I'm advocating a dedicated flag instead.

+1.



Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkw032wACgkQ+gerLs4ltQ5xigCfVLhTzFX733cZAO2Jv6JZQm0i
HoIAmQEnTyxa2oLAuE22M7FZHUS00xDu
=WYt2
-----END PGP SIGNATURE-----



More information about the Python-Dev mailing list