Opcode Tracker for the Python Byte-Code Interpreter
============================================================

by Perry A. Stoll

Created for Python 1.3, January 1996
Updated for Python 1.4, January 1997


Motivation
---------------

There has always been interest (on the Python newsgroup, at least) in
improving the performance of Python. Two possible strategies are
compiling the bytecode to machine-specific binaries or improving the
performance of the interpreter itself. Every so often someone comes up
with a clever idea which they think should make a difference in the
performance of the interpreter. Often, people step up and put their
code where their mouth is, implementing the suggested change.
Invariably, the call is sounded for a way to measure the effect of
such changes. That's where this opcode tracker fits in: it helps
identify where the Python interpreter is spending its time.

What
---------------

This distribution contains the tools needed to profile the bytecodes
run by the interpreter and to generate statistics about them. The
files included in this distribution are:

* README - this file

* INSTALL - instructions to install this package for Unix

* NEWS - the original comp.lang.python article which inspired this work

* ceval.optrack.patch - a small context diff used to patch
  Python/ceval.c, adding the ability to keep statistics (total count,
  time spent executing, execution chain distributions). To enable
  opcode tracking, you must then compile ceval.c with ENABLE_OPTRACKER
  defined.

* optracker.h - definitions and data structures for opcode tracker
  objects

* optrackermodule.c - a C module defining the optracker object and a
  few useful functions

* opstats.py - modeled after the pstats module in the standard
  distribution; provides profiling statistics for opcode objects

* optrackerscript.py - an example of using this package to profile
  opcodes. If you have NumPy, it performs some fancier analysis.
* pystone.py - a benchmark script from the Python-1.4 distribution,
  included here for demonstration purposes. This is actually a very
  boring program to profile, mainly function calls. Oh, well.

Note that building an optracker-capable interpreter must be done at
compile time.

Features:
----------------------------------------

Optracker provides a way to collect execution statistics from the
Python interpreter by keeping track of the frequency, timing, and
history of opcode execution:

1) Counts each opcode executed by the Python interpreter.
2) Times opcode execution (until the next opcode is executed).
3) Tracks opcode pair execution, i.e. digraph analysis. (Easily
   interfaces with the Numeric package for more interesting analysis!)
4) Requires minimal changes to Python/ceval.c.
5) Can be completely removed with a compile-time option.
6) Stats are reported through the existing Stats class in the
   pstats.py file in the distribution.

What is Actually Being Profiled?
------------------------------------------------------------

There is a loop in the Python bytecode interpreter (the for loop
around the massive switch statement in ceval.c:eval_code2()). Each
time through this loop, the count for the appropriate opcode is
incremented by one. That is sufficient to keep raw usage counts for
opcodes.

Gathering timing information for the opcodes is complicated by the
fact that the interpreter (eval_code2) is often called recursively.
If there is no recursive call, then the time for the executed opcode
is simply the time through the large switch statement. If there is a
recursive call, then the time for the first opcode accumulates until
the next opcode begins. This means that all the recursive function
call overhead is charged to the previous opcode. Also, if an opcode
calls out to a C routine, i.e. an extension, all of that time is
charged to the opcode. Something like this showed up when profiling
NumPy: I was wondering why a BINARY_DIVIDE took about 6 seconds.
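As an aside, modern Python can approximate the per-opcode counting
described above without patching ceval.c at all. The following is a
rough sketch, not part of this distribution: it uses sys.settrace with
the f_trace_opcodes frame attribute (available in Python 3.7 and
later) to count each opcode as it executes, much like the counter the
ceval.c patch maintains. The function `work` is just an arbitrary
workload for illustration.

```python
import dis
import sys
from collections import Counter

opcode_counts = Counter()

def tracer(frame, event, arg):
    # Ask CPython to emit per-opcode trace events for this frame (3.7+).
    frame.f_trace_opcodes = True
    if event == "opcode":
        # f_lasti is the offset of the instruction about to execute.
        op = frame.f_code.co_code[frame.f_lasti]
        opcode_counts[dis.opname[op]] += 1
    return tracer

def work(n):
    # Arbitrary workload: a simple summing loop.
    total = 0
    for i in range(n):
        total += i
    return total

sys.settrace(tracer)
result = work(100)
sys.settrace(None)

# opcode_counts now maps opcode names to execution counts; the loop
# above should have executed FOR_ITER roughly n times.
```

This only gives raw counts; timing and digraph tracking would need
more state, which is exactly what the C patch adds inside the
interpreter loop where it is cheap to do.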
Then I realized it was a really large matrix operation.

This whole scheme may not be the best, but it doesn't seem too bad
either. There are other strategies worth exploring, and I hope people
will play around with them.

Sample Output:
------------------------------------------------------------

Here is the timing information I get when running optrackerscript, a
sample script demonstrating optracking. Note that where it says
"function calls", that is of course opcodes executed. Also, the
tottime (total time) and cumtime (cumulative time) columns will always
be the same. I didn't understand until I'd finished a first
implementation that timing recursive calls differently (which makes
sense for functions) doesn't make much sense for opcodes. Well, maybe
it does: because the interpreter can recur, there is some sense of
total time spent in one opcode. But the worth of tracking that seemed
pretty dubious at best.

         556223 function calls in 25.283 CPU seconds

   Ordered by: internal time
   List reduced from 55 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall opcode:opcode id:(NAME)
   158150    4.504 0.000028    4.504 0.000028 opcode:124(LOAD_FAST)
    18185    4.061 0.000223    4.061 0.000223 opcode:131(CALL_FUNCTION)
    34298    2.711 0.000079    2.711 0.000079 opcode:105(LOAD_ATTR)
    25954    2.216 0.000085    2.216 0.000085 opcode:23(BINARY_ADD)
    83446    1.930 0.000023    1.930 0.000023 opcode:127(SET_LINENO)
    67165    1.774 0.000026    1.774 0.000026 opcode:125(STORE_FAST)
    24558    1.232 0.000050    1.232 0.000050 opcode:25(BINARY_SUBSCR)
    39470    0.867 0.000022    0.867 0.000022 opcode:100(LOAD_CONST)
     8660    0.835 0.000096    0.835 0.000096 opcode:95(STORE_ATTR)
       29    0.695 0.023970    0.695 0.023970 opcode:72(PRINT_NEWLINE)
       54    0.648 0.012003    0.648 0.012003 opcode:107(IMPORT_NAME)
     7028    0.454 0.000065    0.454 0.000065 opcode:116(LOAD_GLOBAL)
     6150    0.446 0.000072    0.446 0.000072 opcode:60(STORE_SUBSCR)
    13664    0.387 0.000028    0.387 0.000028 opcode:1(POP_TOP)
     1955    0.386 0.000197    0.386 0.000197 opcode:13(UNARY_CONVERT)
     8273    0.384 0.000046    0.384 0.000046 opcode:92(UNPACK_TUPLE)
    13496    0.325 0.000024    0.325 0.000024 opcode:111(JUMP_IF_FALSE)
     6229    0.266 0.000043    0.266 0.000043 opcode:24(BINARY_SUBTRACT)
     9785    0.250 0.000026    0.250 0.000026 opcode:83(RETURN_VALUE)
     9642    0.209 0.000022    0.209 0.000022 opcode:110(JUMP_FORWARD)

Using NumPy for further analysis...
POP_TOP       is followed by JUMP_IF_FALSE  97.2%
DUP_TOP       is followed by RETURN_VALUE   90.1%
UNARY_NOT     is followed by LOAD_FAST      90%
UNARY_CONVERT is followed by LOAD_ATTR      90%
BINARY_DIVIDE is followed by LOAD_FAST      100%
SLICE         is followed by LOAD_FAST      98.1%
SLICE3        is followed by LOAD_FAST      76.5%
STORE_SUBSCR  is followed by LOAD_FAST      94.2%
DELETE_SUBSCR is followed by LOAD_FAST      100%
PRINT_ITEM    is followed by RETURN_VALUE   92.2%
PRINT_NEWLINE is followed by PRINT_ITEM     89.7%
BREAK_LOOP    is followed by SET_LINENO     100%
LOAD_LOCALS   is followed by STORE_NAME     100%
RETURN_VALUE  is followed by LOAD_CONST     76.5%
EXEC_STMT     is followed by LOAD_FAST      100%
END_FINALLY   is followed by POP_TOP        100%
BUILD_CLASS   is followed by RETURN_VALUE   100%
UNPACK_LIST   is followed by CALL_FUNCTION  100%
STORE_ATTR    is followed by LOAD_FAST      98.8%
LOAD_ATTR     is followed by LOAD_FAST      92.4%
IMPORT_NAME   is followed by SET_LINENO     100%
IMPORT_FROM   is followed by IMPORT_NAME    100%
JUMP_IF_TRUE  is followed by LOAD_FAST      97.1%
FOR_LOOP      is followed by SET_LINENO     100%
SETUP_LOOP    is followed by SET_LINENO     100%
SETUP_EXCEPT  is followed by SET_LINENO     100%
SETUP_FINALLY is followed by SET_LINENO     100%
MAKE_FUNCTION is followed by LOAD_CONST     100%
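The "is followed by" report above comes from the opcode-pair (digraph)
counts the tracker records. The following sketch shows one way such a
report can be derived from pair counts; the numbers in `pair_counts`
are made up for illustration and are not taken from the run above.

```python
from collections import Counter, defaultdict

# Hypothetical pair counts of the kind the tracker records:
# (opcode, next_opcode) -> number of times that transition occurred.
pair_counts = Counter({
    ("POP_TOP", "JUMP_IF_FALSE"): 97,
    ("POP_TOP", "LOAD_FAST"): 3,
    ("STORE_ATTR", "LOAD_FAST"): 99,
    ("STORE_ATTR", "LOAD_CONST"): 1,
})

# Group the counts by the first opcode of each pair.
successors = defaultdict(Counter)
for (first, second), n in pair_counts.items():
    successors[first][second] += n

# For each opcode, report its most common successor and that
# successor's share of all transitions out of the opcode.
report = {}
for first, counts in successors.items():
    second, n = counts.most_common(1)[0]
    report[first] = (second, 100.0 * n / sum(counts.values()))

for first, (second, pct) in sorted(report.items()):
    print("%-13s is followed by %-14s %.1f%%" % (first, second, pct))
```

With the Numeric package, the same computation amounts to normalizing
each row of an opcode-by-opcode transition-count matrix and taking the
per-row maximum, which is what optrackerscript does when NumPy is
available.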