[Tutor] bytecode primer, and avoiding a monster download
eryksun
eryksun at gmail.com
Tue May 28 18:48:58 CEST 2013
On Tue, May 28, 2013 at 7:18 AM, Dave Angel <davea at davea.name> wrote:
>
> dis.dis(myfunction)
>
> will disassemble one function.
>
> That's not all that's in the byte-code file, but this is 98% of what you
> probably want out of it. And you can do it in the debugger with just the
> standard library.
The argument for dis.dis() can be a module, class, function or code
object. It disassembles all the top-level code objects that it finds,
but it doesn't recursively disassemble code objects that are in the
co_consts.
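
To look at the nested code objects as well, you can walk co_consts yourself. The following is only a sketch: iter_code_objects is a made-up helper name and the sample source is invented for illustration. It calls dis.disassemble(), which handles one code object at a time.

```python
import dis
import types


def iter_code_objects(code):
    """Yield `code`, then every code object nested in its co_consts."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            for nested in iter_code_objects(const):
                yield nested


# Invented sample source: a function defined inside another function.
source = (
    "def outer():\n"
    "    def inner():\n"
    "        return 1\n"
    "    return inner\n"
)
module_code = compile(source, "<example>", "exec")

# Names of the module-level code object and everything nested in it.
names = [c.co_name for c in iter_code_objects(module_code)]
print(names)

# Disassemble each code object separately.
for c in iter_code_objects(module_code):
    print("Disassembly of %s:" % c.co_name)
    dis.disassemble(c)
```

The walk is depth first, so the code for inner shows up right after the code for outer that contains it.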
I'm not sure what Dave means by 'byte-code file'. A .pyc? That's a
marshaled code object with a small header that has a magic number and
a timestamp. Here's an example of reading the pyc for the dis module
itself:

    import dis
    import marshal
    import struct
    import datetime

    # magic number for Python 2.7
    MAGIC27 = 62211 | (ord('\r') << 16) | (ord('\n') << 24)

    pyc = open(dis.__file__, 'rb')  # dis.pyc
    hdr = pyc.read(8)
    magic, tstamp = struct.unpack('<ll', hdr)
    tstamp = datetime.datetime.fromtimestamp(tstamp)

    # the rest of the file is the code object
    code = marshal.load(pyc)

    >>> magic == MAGIC27
    True
    >>> tstamp
    datetime.datetime(2013, 1, 2, 12, 45, 58)
    >>> code.co_consts[0]
    'Disassembler of Python byte code into mnemonics.'

The code object's co_consts tuple also has the code objects for the
defined functions, plus the anonymous functions that build class
objects. The latter are subsequently discarded, as is the .pyc code
object itself after the module is imported/executed. There's no point
in keeping it around.
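
A small experiment makes this visible without touching a .pyc at all. This is a sketch under assumptions: the sample source is invented, and compile() stands in for the code object that would otherwise be unmarshaled from the file.

```python
import types

# Invented sample module source with one function and one class.
source = (
    '"""Example docstring."""\n'
    "def f():\n"
    "    return 42\n"
    "class C(object):\n"
    "    attr = 1\n"
)
code = compile(source, "<example>", "exec")

# Code objects stored as constants of the module-level code object.
nested = [c for c in code.co_consts if isinstance(c, types.CodeType)]
nested_names = [c.co_name for c in nested]
print(nested_names)

# Executing the module code builds f and C in the namespace.
namespace = {}
exec(code, namespace)

# The function object keeps a reference to its code object ...
f_keeps_code = any(namespace["f"].__code__ is c for c in nested)

# ... while the class has no attribute holding the code that ran its body.
class_has_code = hasattr(namespace["C"], "__code__")
print(f_keeps_code, class_has_code)
```

Only the function hangs on to a code object after execution; the one for the class body is used once to fill in the class namespace.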
My previous post is a light intro to instantiating code and function
objects. I think the arguments are mostly self-explanatory -- except
for co_lnotab, co_flags, and closure support (co_cellvars,
co_freevars, func_closure).
I assembled the bytecode with the help of opcode.opmap, to make it
more readable. But to be clear, CPython bytecode is simply a byte
string stored in the co_code attribute. Disassembling the bytecode
nicely with source line numbers requires co_lnotab. While I did
reference the text file that explains co_lnotab, I neglected to
provide the following link:
http://hg.python.org/cpython/file/687295c6c8f2/Objects/lnotab_notes.txt
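
To make both points concrete, here is a hedged sketch that looks at the raw co_code bytes and at the offset-to-line mapping. The function f is invented for illustration. Rather than decoding co_lnotab by hand, it leans on dis.findlinestarts() from the standard library, which does that decoding; the exact offsets and the number of pairs vary between interpreter versions, so nothing here should be read as fixed output.

```python
import dis
import opcode


def f(x):
    y = x + 1
    return y


code = f.__code__

# co_code is a plain byte string; bytearray() yields integers on any version.
raw = bytearray(code.co_code)
print("%d bytes of bytecode" % len(raw))

# opcode.opmap maps mnemonic -> number; opcode.opname is the reverse table.
ret = opcode.opmap["RETURN_VALUE"]
print("%d is %s" % (ret, opcode.opname[ret]))

# dis.findlinestarts() yields (bytecode offset, source line number) pairs.
line_starts = list(dis.findlinestarts(code))
for offset, lineno in line_starts:
    print("offset %s starts line %s" % (offset, lineno))
```

The line numbers are absolute source lines, so they are best compared against code.co_firstlineno rather than hard-coded.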
co_flags records various aspects of how the bytecode was compiled
(e.g. optimized to use fastlocals). When you use exec, eval, or
compile(), the __future__ flags in effect in the calling code are
inherited by the new code object. compile() can disable this via the
argument dont_inherit.
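
The inheritance of __future__ flags can be observed directly. This is a sketch, not a definitive test: the outer source string is invented, and the feature is chosen at run time ('annotations' where the interpreter has it, otherwise 'division' as on 2.7) so that the flag actually ends up in co_flags.

```python
import __future__

# Pick a __future__ feature that still affects compilation on this
# interpreter: 'annotations' where it exists, else 'division' (2.x).
name = "annotations" if hasattr(__future__, "annotations") else "division"
flag = getattr(__future__, name).compiler_flag

# Invented outer source: it enables the feature, then calls compile()
# twice, once with the default inheritance and once with dont_inherit.
outer_source = (
    "from __future__ import %s\n"
    "inherited = compile('pass', '<inner>', 'exec')\n"
    "isolated = compile('pass', '<inner>', 'exec', dont_inherit=True)\n"
) % name

namespace = {}
exec(compile(outer_source, "<outer>", "exec", dont_inherit=True), namespace)

# Check whether the feature's flag bit made it into each inner code object.
inherits_flag = bool(namespace["inherited"].co_flags & flag)
isolated_flag = bool(namespace["isolated"].co_flags & flag)
print(inherits_flag, isolated_flag)
```

The first compile() call picks up the flag from the code that called it; the second one starts from a clean slate.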
The code for a class body or a function requires a new local namespace
(CO_NEWLOCALS). For a function, locals is also optimized
(CO_OPTIMIZED) to use the fastlocals array instead of a dict. On the
other hand, the code that creates a module is evaluated with locals
and globals set to the same namespace, so it won't have the
CO_NEWLOCALS flag.
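
The flags can be checked against the constants in the inspect module. A rough sketch with an invented sample source follows; has_flag is a made-up helper. The class body's flags are printed rather than asserted, since the description above is for 2.7 and other versions may compile class bodies differently.

```python
import inspect
import types

# Invented sample source: one function and one class at module level.
source = (
    "def f(a):\n"
    "    b = a\n"
    "    return b\n"
    "class C(object):\n"
    "    pass\n"
)
module_code = compile(source, "<example>", "exec", dont_inherit=True)

# Index the nested code objects by name.
by_name = dict(
    (c.co_name, c)
    for c in module_code.co_consts
    if isinstance(c, types.CodeType)
)
func_code = by_name["f"]
class_code = by_name["C"]


def has_flag(code, flag):
    """Return True if `flag` is set in the code object's co_flags."""
    return bool(code.co_flags & flag)


for label, code in [("module", module_code), ("function", func_code)]:
    print("%-8s NEWLOCALS=%s OPTIMIZED=%s" % (
        label,
        has_flag(code, inspect.CO_NEWLOCALS),
        has_flag(code, inspect.CO_OPTIMIZED),
    ))

# Version dependent, so only displayed.
print("class    co_flags=%#x" % class_code.co_flags)
```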
Including the metadata that there are no free variables (CO_NOFREE)
can make a simple function call more efficient, but only if the
function has no default arguments and the call passes no keyword
arguments. Refer to fast_function() in Python/ceval.c.
http://hg.python.org/cpython/file/ab05e7dd2788/Python/ceval.c#l4060
Here are all of the flags for code objects in 2.7:

    CO_OPTIMIZED                 0x01
    CO_NEWLOCALS                 0x02
    CO_VARARGS                   0x04
    CO_VARKEYWORDS               0x08
    CO_NESTED                    0x10
    CO_GENERATOR                 0x20
    CO_NOFREE                    0x40

    /* __future__ imports */
    CO_FUTURE_DIVISION           0x02000
    CO_FUTURE_ABSOLUTE_IMPORT    0x04000
    CO_FUTURE_WITH_STATEMENT     0x08000
    CO_FUTURE_PRINT_FUNCTION     0x10000
    CO_FUTURE_UNICODE_LITERALS   0x20000

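
For reading a co_flags value by eye, a small decoder built from the list above can help. This is a sketch: CODE_FLAGS_27 and decode_flags are made-up names, and the values are simply the 2.7 ones copied from this post, so they should not be assumed to match other versions.

```python
# Flag values copied from the 2.7 list in this post.
CODE_FLAGS_27 = [
    (0x01, "CO_OPTIMIZED"),
    (0x02, "CO_NEWLOCALS"),
    (0x04, "CO_VARARGS"),
    (0x08, "CO_VARKEYWORDS"),
    (0x10, "CO_NESTED"),
    (0x20, "CO_GENERATOR"),
    (0x40, "CO_NOFREE"),
    (0x02000, "CO_FUTURE_DIVISION"),
    (0x04000, "CO_FUTURE_ABSOLUTE_IMPORT"),
    (0x08000, "CO_FUTURE_WITH_STATEMENT"),
    (0x10000, "CO_FUTURE_PRINT_FUNCTION"),
    (0x20000, "CO_FUTURE_UNICODE_LITERALS"),
]


def decode_flags(co_flags):
    """Return the names of the listed flags that are set in co_flags."""
    return [name for bit, name in CODE_FLAGS_27 if co_flags & bit]


# 0x43 combines the three flags of a simple function with no free variables.
print(decode_flags(0x43))
```

On 2.7 you could pass some_function.__code__.co_flags straight to decode_flags().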
For a high-level view on scoping, read the section on the execution
model in the language reference:
http://docs.python.org/2/reference/executionmodel
Subsequently, if you want, we can talk about how this is implemented
in the VM, and especially with respect to closures, cellvars,
freevars, and the opcodes LOAD_DEREF, STORE_DEREF, and MAKE_CLOSURE.