[Python-Dev] about line numbers

Guido van Rossum guido@CNRI.Reston.VA.US
Fri, 20 Aug 1999 15:59:24 -0400


> I'll try to sketch here the scheme I'm thinking of for the
> callback/breakpoint issue (without SET_LINENO), although some
> technical details are still missing.
> 
> I'm assuming the following, in this order:
> 
> 1) No radical changes in the current behavior, i.e. preserve the
>    current architecture / strategy as much as possible.
> 
> 2) We don't have breakpoints per opcode, but per source line. For that
>    matter, we have sys.settrace (and for now, we don't aim to have
>    sys.settracei that would be called on every opcode, although we might
>    want this in the future)
> 
> 3) SET_LINENO disappears. Actually, SET_LINENO opcodes are conditional
>    breakpoints, used for callbacks from C to Python. So the basic problem
>    is to generate these callbacks.

They used to be the only mechanism by which the traceback code knew
the current line number (long before the debugger hooks existed), but
with the lnotab, that's no longer necessary.

> If any of the above is not an appropriate assumption and we want a radical
> change in the strategy of setting breakpoints / generating callbacks, then
> this post is invalid.

Sounds reasonable.

> The solution I'm thinking of:
> 
> a) Currently, we have a function PyCode_Addr2Line which computes the source
>    line from the opcode's address. I hereby assume that we can write the
>    reverse function PyCode_Line2Addr that returns the address from a given
>    source line number. I don't have the implementation, but it should be
>    doable. Furthermore, we can compute, having the co_lnotab table and
>    co_firstlineno, the source line range for a code object.
> 
>    As a consequence, even with the dumbest of all algorithms, by looping
>    through this source line range, we can enumerate with PyCode_Line2Addr
>    the sequence of addresses for the source lines of this code object.
> 
> b) As Chris pointed out, in case sys.settrace is defined, we can allocate
>    and keep a copy of the original code string per frame. We can further
>    dynamically overwrite the original code string with a new (internal,
>    one byte) CALL_TRACE opcode at the addresses we have enumerated in a).
> 
>    The CALL_TRACE opcodes will trigger the callbacks from C to Python,
>    just as the current SET_LINENO does.
> 
> c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger
>    the callback and if it returns successfully, we'll fetch the original
>    opcode for the current location from the copy of the original co_code.
>    Then we directly jump to the arg fetch code (or in case we fetch the
>    entire original opcode in CALL_TRACE - we jump to the dispatch code).

Tricky, but doable.
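
To make steps b) and c) concrete, here is a toy sketch in Python rather
than the real eval loop in C. Everything in it is an assumption made
for illustration: the three-opcode instruction set, the value chosen
for CALL_TRACE, and the names patch() and run(). It only shows the
idea of keeping a pristine copy of the code string, overwriting the
line-start addresses in the working copy, and fetching the real opcode
back from the pristine copy after the callback returns.

```python
# Hypothetical toy instruction set; PUSH takes a one-byte argument.
PUSH, ADD, RET = 1, 2, 3
CALL_TRACE = 255  # assumed one-byte internal opcode, not a real CPython value


def patch(original, line_addrs):
    """Return a working copy with CALL_TRACE written at each line start."""
    patched = bytearray(original)
    for addr in line_addrs:
        patched[addr] = CALL_TRACE
    return patched


def run(original, line_addrs, trace):
    """Execute the patched copy, calling trace(addr) at each line start."""
    code = patch(original, line_addrs)
    stack = []
    pc = 0
    while True:
        op = code[pc]
        if op == CALL_TRACE:
            trace(pc)
            op = original[pc]  # fetch the real opcode from the pristine copy
        pc += 1
        if op == PUSH:
            stack.append(code[pc])  # arguments are never overwritten
            pc += 1
        elif op == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == RET:
            return stack.pop()
        else:
            raise ValueError("unknown opcode %d at %d" % (op, pc - 1))


# PUSH 2; PUSH 3; ADD; RET, with "lines" assumed to start at addresses 0 and 4.
program = bytes([PUSH, 2, PUSH, 3, ADD, RET])
hits = []
result = run(program, [0, 4], hits.append)
```

Code that is not being traced would simply run the unpatched string and
never see CALL_TRACE, so in this sketch it pays nothing for the feature.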

> Hmm. I think that's all.
> 
> At the heart of this scheme is the PyCode_Line2Addr function, which is
> the only blob in my head, for now.

I'm pretty sure that this would be straightforward.
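
A minimal sketch of the reverse lookup, written in Python for brevity
(the real function would be C). It assumes the lnotab is a string of
unsigned (address increment, line increment) byte pairs, and it works
on a hand-made table rather than on a real code object. The names
line_starts() and line2addr() are made up here, and returning -1 for a
line that starts no instruction is just one possible convention.

```python
def line_starts(firstlineno, lnotab):
    """Yield (address, line) for each source line that starts an instruction.

    Assumes lnotab holds unsigned (addr_incr, line_incr) byte pairs.
    """
    addr = 0
    line = firstlineno
    last_line = None
    for i in range(0, len(lnotab), 2):
        addr_incr = lnotab[i]
        line_incr = lnotab[i + 1]
        if addr_incr:
            # The address is about to move on, so the current line is final.
            if line != last_line:
                yield addr, line
                last_line = line
            addr += addr_incr
        line += line_incr
    if line != last_line:
        yield addr, line


def line2addr(firstlineno, lnotab, lineno):
    """Return the first address belonging to lineno, or -1 if there is none."""
    for addr, line in line_starts(firstlineno, lnotab):
        if line == lineno:
            return addr
    return -1


# Hand-made table: line 10 at address 0, line 11 at 6, line 13 at 14.
table = bytes([6, 1, 8, 2])
starts = list(line_starts(10, table))
```

Enumerating line_starts() once gives every address to patch, so the
loop over the source line range is not strictly needed.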

I'm a little anxious about modifying the code, and was thinking myself
of a way to specify a bitvector of addresses where to break.  But that
would still cause some overhead for code without breakpoints, so I
guess you're right (and it's certainly a long-standing tradition in
breakpoint-setting!)
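
For comparison, the bitvector idea would look roughly like the sketch
below (Python again, with made-up helper names). The point is that the
interpreter would have to run a test like is_break() before every
instruction, whether or not any breakpoint is set.

```python
def make_bitvector(code_size, addrs):
    """Build a bitvector with one bit per code address, set for each addr."""
    bits = bytearray((code_size + 7) // 8)
    for addr in addrs:
        bits[addr >> 3] |= 1 << (addr & 7)
    return bits


def is_break(bits, addr):
    """Test that would have to run before every instruction is dispatched."""
    return bool(bits[addr >> 3] & (1 << (addr & 7)))


# Breakpoints assumed at addresses 0 and 9 of a 16-byte code string.
bits = make_bitvector(16, [0, 9])
```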

--Guido van Rossum (home page: http://www.python.org/~guido/)