[Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken
Tres Seaver
tseaver at palladion.com
Wed Jul 7 22:11:29 CEST 2010
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Antoine Pitrou wrote:
> On Wed, 7 Jul 2010 19:44:31 +0200
> Eli Bendersky <eliben at gmail.com> wrote:
>> For what it's worth, my benchmarking showed that modifying the
>> heuristic to only kick in when there are more than 100 kinds of
>> elements (Terry's option A) didn't affect the runtime of matching
>> whatsoever, even when the heuristic *does* kick in. All it adds,
>> really, is the overhead of a single 'if' statement. So it wouldn't be
>> right to assume that somehow modifying the heuristic or allowing to
>> turn it off will negatively affect performance in the special case Tim
>> originally optimized for.
>
> Just because it doesn't affect performance in your tests doesn't mean it
> won't do so in the general case. Consider a case where Tim's junk
> optimization kicked in and helped improve performance a lot, but where
> there are still less than 100 alphabet symbols. The new heuristic will
> ruin this use case.
That would describe pretty much every C program ever written, for
instance, and nearly as high a percentage of all Python modules /
scripts ever written: the 'string.printable' set, less formfeed and
vertical tab, is 98 characters long.
> That's why I'm advocating a dedicated flag instead.
+1.
Tres.
- --
===================================================================
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkw032wACgkQ+gerLs4ltQ5xigCfVLhTzFX733cZAO2Jv6JZQm0i
HoIAmQEnTyxa2oLAuE22M7FZHUS00xDu
=WYt2
-----END PGP SIGNATURE-----
More information about the Python-Dev
mailing list