classification
Title: Fix inspect.py 2.5 slowdown
Type: Stage:
Components: Library (Lib) Versions: Python 2.5
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: nnorwitz Nosy List: ncoghlan, nnorwitz, pje
Priority: critical Keywords: patch

Created on 2006-09-06 10:51 by ncoghlan, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
inspect_speed_committed.diff ncoghlan, 2006-09-07 10:53 Patch as committed to 2.6 trunk
Messages (7)
msg51090 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2006-09-06 10:51
Every call to inspect.findsource() in Python 2.5 is
calling both os.path.abspath() and os.path.normpath(),
even if the file name is already absolute and
normalized, and the relevant entry is in the module
name cache.

This patch changes getmodule() to check the
modulesbyfile cache earlier, so these calls are skipped
when the module name for the file is already known.
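
A minimal sketch of the cache-first idea described above, not the
patch itself. The names modules_by_file and lookup_module_name are
hypothetical stand-ins for the cache and lookup inside inspect; only
os.path.abspath() and os.path.normpath() are real library calls.

```python
import os

# Hypothetical cache playing the role of inspect's modulesbyfile:
# maps a file name to the name of the module defined in that file.
modules_by_file = {}


def lookup_module_name(filename):
    """Return the cached module name for filename, or None if unknown."""
    # Fast path: the name as given is already a cache key, so the
    # path normalization calls are skipped entirely.
    if filename in modules_by_file:
        return modules_by_file[filename]
    # Slow path: normalize the path, then retry the cache.
    normalized = os.path.normpath(os.path.abspath(filename))
    return modules_by_file.get(normalized)
```

The fast path only helps when callers already pass absolute,
normalized names, which is the case the report describes.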
msg51091 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2006-09-06 12:48
Low impact approach that simply checks the linecache
directly before doing anything.
msg51092 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2006-09-06 13:04
Bah. The linecache version has the same problem as the
previous version - it's the failure case that searches the
whole of sys.modules that is slow, not the success case.

Patch attached which stores a mapping from module names to
module.__file__ values and skips the absolute lookup if the
value of module.__file__ hasn't changed.
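
Roughly, the approach in the paragraph above could look like the
sketch below. It is an illustration under assumptions, not the
committed code: the function name, both cache names, and the optional
modules parameter (added so the sketch can be exercised on a plain
dict) are all hypothetical.

```python
import os
import sys

# Hypothetical caches, named after the roles described above.
_files_by_mod_name = {}   # sys.modules key -> last seen module.__file__
modules_by_file = {}      # absolute file name -> module.__name__


def refresh_module_cache(modules=None):
    """Rescan modules, normalizing only new or changed __file__ values.

    Returns the number of entries that needed an abspath() call.
    """
    if modules is None:
        modules = sys.modules
    normalized = 0
    # list(...) takes a snapshot, so entries added during the loop are safe.
    for mod_name, module in list(modules.items()):
        filename = getattr(module, "__file__", None)
        if not isinstance(filename, str):
            continue  # nothing usable to record for this entry
        if _files_by_mod_name.get(mod_name) == filename:
            continue  # unchanged since the last scan: skip abspath()
        _files_by_mod_name[mod_name] = filename
        modules_by_file[os.path.abspath(filename)] = module.__name__
        normalized += 1
    return normalized
```

On a second scan with nothing changed, every entry takes the cheap
comparison branch, which is the failure case the message says was slow.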

msg51093 - (view) Author: PJ Eby (pje) * (Python committer) Date: 2006-09-06 13:49
I just looked at the current patch; please don't use
sys.modules.iteritems() in your search loop; it'll break if
touching a module causes another module to be imported. 
This can happen if somebody is using one of the many "lazy
import" tools out there.  Use .items() instead.
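
To illustrate the failure mode, here is a small sketch written for
Python 3, where a live dict.items() view behaves like Python 2's
iteritems() in this respect and list(d.items()) plays the role of
Python 2's items(). The scan function and the "lazily_imported" key
are made up for the demonstration; a plain dict stands in for
sys.modules.

```python
def scan(modules, snapshot):
    """Iterate over modules while a simulated import adds an entry."""
    items = list(modules.items()) if snapshot else modules.items()
    seen = []
    for name, _module in items:
        seen.append(name)
        # Simulate a lazy-import tool registering a module on first touch.
        modules.setdefault("lazily_imported", object())
    return seen


print(scan({"a": 1, "b": 2}, snapshot=True))   # ['a', 'b']

try:
    scan({"a": 1, "b": 2}, snapshot=False)
except RuntimeError as exc:
    # The dict grew while a live view was being iterated.
    print("live iteration failed:", exc)
```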

Second, there was a reason that the loop used
module.__name__ instead of the sys.modules keys; a module
can be listed under more than one sys.modules key, so the
loop was storing the *canonical* name of the module.  I'm
not 100% positive this is a critical thing, but I'm
suspicious of the idea of just randomly choosing which of
the possible keys should be used.
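
One standard-library case of a module listed under a key that differs
from its own name is os.path, shown in the sketch below. The variable
names are illustrative only, and the exact module name printed depends
on the platform.

```python
import os.path
import sys

path_module = sys.modules["os.path"]

# The key is "os.path", but the module reports its own name,
# "posixpath" or "ntpath" depending on the platform.
print(path_module.__name__)

# Collect every sys.modules key that refers to this same module object.
aliases = sorted(key for key, mod in list(sys.modules.items())
                 if mod is path_module)
print(aliases)
```

Recording module.__name__ gives the same answer no matter which of
those keys the loop happens to reach first.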
msg51094 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-09-07 06:58
I had the same question as pje wrt his second comment
about module.__name__ (assuming this is the same patch
posted to python-dev).  I would rather use the version
currently checked in to minimize change and maximize the
testing.  Please add a comment describing why one is chosen
rather than the other.  Also, adding a comment in the code
about pje's items() comment below would be nice.

Is the attached version the same one Ralf (or whoever)
said fixed the problem?
msg51095 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2006-09-07 10:53
I beefed up the tests to pick up the error Ralf spotted in
the previous version of the patch, and added more comments
to getmodule() to explain why some things are the way they
are (specifically the use of sys.modules.items() and the use
of module.__name__).

Committed to the trunk - I'll backport tomorrow night unless
Neal objects.
msg51096 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2006-09-08 10:02
Applied as rev 51803 on head. Backported to 2.5 as rev 51826.
History
Date User Action Args
2022-04-11 14:56:20 admin set github: 43947
2006-09-06 10:51:39 ncoghlan create