[Patches] [Patch #101394] lookdict optimizations

noreply@sourceforge.net noreply@sourceforge.net
Fri, 09 Feb 2001 16:11:39 -0800


Patch #101394 has been updated. 

Project: python
Category: core (C code)
Status: Open
Submitted by: marangoz
Assigned to : fdrake
Summary: lookdict optimizations

Follow-Ups:

Date: 2001-Feb-09 16:11
By: jhylton

Comment:
Is this patch still revelant?

-------------------------------------------------------

Date: 2000-Sep-03 21:09
By: marangoz

Comment:
Let's add a comment, although this has been raised on python-dev.
This patch proposes a couple of ideas for optimizing & speeding dict
lookups -- not all of them need to be applied though.

- clear the eventual errors from Object_Compare only on need
   (this logic needs to be double-checked once again to see whether it
   really works)
- defer variable initializations after the most common return cases
- specialize string_compare() for lookdict_string. The test comparing
   the first char, before calling memcmp(), can be added too.
- inline the first item probe in PyDict_GetItem. This saves a func call
   for the most common case (common to lookdict & lookdict_string).
-------------------------------------------------------

Date: 2000-Sep-04 06:45
By: gvanrossum

Comment:
Quick comments: we should *always* call PyErr_Occurred() after
PyObject_Compare() (and PyErr_Clear() if PyErr_Occurred() returned true).
PyErr_Compare() really doesn't expect a pending exception coming in and so
*not* clearing the error might break the *next* PyErr_Compare() call. (This
doesn't happen typically but PyObject_Compare() can execute arbitrary code,
some of which might call PyErr_Occurred() without calling PyErr_Clear()
first.)
-------------------------------------------------------

Date: 2000-Sep-05 06:05
By: nobody

Comment:
Correct!

But then, aren't the current "else" if(cmp == 0) clauses risky after
PyObject_Compare()? An exception may be set in external code
while making Object_Compare() to return 0! Perhaps not in Python
source, but in buggy extensions. lookdict() will then overlook this
"hit".

Patch updated, without "else" clauses and with the 1st char check in
string_compare_equal().
-------------------------------------------------------

Date: 2000-Sep-05 06:08
By: marangoz

Comment:
Argh! These Web interfaces strike again - forgot to login.
-------------------------------------------------------

Date: 2000-Sep-14 09:26
By: fdrake

Comment:
Revised patch:
Instead of defining a function to do the fast string comparison, use a
macro, but let it use the documented string object API
(PyString_GET_SIZE(), PyString_AS_STRING()) instead of breaking the
encapsulation in the code.  This avoids all function calls to do the string
compare, except memcmp() (which good compilers can inline anyway).

Vladimir, take a look at this; if you're happy, I'm happy, and you can
check it in.  Thanks!
-------------------------------------------------------

Date: 2000-Sep-14 13:51
By: fdrake

Comment:
Guido, please review (since Vladimir's away).
-------------------------------------------------------

Date: 2000-Sep-15 11:18
By: gvanrossum

Comment:
Accepted. Assuming you've tested this, it looks fine to me.
Can you time this a bit?

There's one niggling issue: some people think that before you do memcmp()
you should manually compare the first character. Others think that's
unnecessary (since a good compiler can inline memcmp anyway). (It's also a
bit scary if the size is zero.) So please ignore this but keep it in mind
for timing experiments. :)
-------------------------------------------------------

Date: 2000-Sep-25 14:38
By: twouters

Comment:
Shouldn't this patch be either postponed or checked in?
-------------------------------------------------------

Date: 2000-Sep-25 18:52
By: fdrake

Comment:
It's postponed since I've not had time to run performance tests to measure
the corner case it aims to improve.  It turns out that generating the test
data I need takes a long time (I need hash collisions of identifier-like
strings).  I just need to be able to let my generator run for a long time
(it was generating more collisions than I'd expected, but I don't know how
many off-hand).

Another interesting metric would be to examine the .pyc files generated
from the standard library and find out if there are any string hash
collisions there -- if not, or if very few, it's not worth the added
complexity.
-------------------------------------------------------

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=101394&group_id=5470