Python Interview Questions

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Nov 18 19:31:28 EST 2012


On Sun, 18 Nov 2012 12:53:50 -0500, Roy Smith wrote:

> I've got a script which trolls our log files looking for python stack
> dumps.  For each dump it finds, it computes a signature (basically, a
> call sequence which led to the exception) and uses this signature as a
> dictionary key.  Here's the relevant code (abstracted slightly for
> readability):
> 
> def main(args):
>     crashes = {}
>     [...]
>     for line in open(log_file):
>         if does_not_look_like_a_stack_dump(line):
>             continue
>         lines = traceback_helper.unfold(line)
>         header, stack = traceback_helper.extract_stack(lines)
>         signature = tuple(stack)
>         if signature in crashes:
>             count, header = crashes[signature]
>             crashes[signature] = (count + 1, header)
>         else:
>             crashes[signature] = (1, header)
> 
> You can find traceback_helper at
> https://bitbucket.org/roysmith/python-tools/src/4f8118d175ed/logs/traceback_helper.py
> 
> The stack that's returned is a list.  It's inherently a list, per the
> classic definition:

Er, no, it's inherently a blob of multiple text lines. Sure, you've built 
it a line at a time by using a list, but I've already covered that case. 
Once you've identified a stack, you never append to it, sort it, delete 
lines in the middle of it... none of these list operations are meaningful 
for a Python stack trace. The stack becomes a fixed string, and not just 
because you use it as a dict key, but because inherently it counts as a 
single, immutable blob of lines.

A tuple of individual lines is one reasonable data structure for a blob 
of lines. Another would be a single string:

    signature = '\n'.join(stack)

Depending on what you plan to do with the signatures, one or the other 
implementation might be better. I'm sure that there are other data 
structures as well.
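
As a rough sketch of the two options (the stack lines here are made up for 
illustration and merely stand in for whatever extract_stack() returns), 
either form is hashable and so works as a dict key, whereas the list 
itself does not:

```python
# Hypothetical stack lines, standing in for what extract_stack() returns.
stack = [
    '  File "./prattle.py", line 873, in select',
    '  File "./prattle.py", line 787, in do_callback',
]

as_tuple = tuple(stack)        # keeps the individual lines accessible
as_string = '\n'.join(stack)   # a single blob of text

crashes = {}
for signature in (as_tuple, as_string):
    crashes[signature] = crashes.get(signature, 0) + 1

list_error = None
try:
    crashes[stack] = 1         # a list is mutable, hence unhashable
except TypeError as err:
    list_error = str(err)

# Each form can be rebuilt from the other, so no information is lost.
assert tuple(as_string.split('\n')) == as_tuple
```

The tuple is handier if you later want to look at individual frames; the 
string is handier if you only ever print or compare the whole thing.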


> * It's variable length.  Different stacks have different depths.

Once complete, a stack trace has a fixed length, though that length 
differs from one stack to the next. Deleting a line would make it 
incomplete, and adding a line would make it invalid.


> * It's homogeneous.  There's nothing particularly significant about each
> entry other than it's the next one in the stack.
> 
> * It's mutable.  I can build it up one item at a time as I discover
> them.

The complete stack trace is inhomogeneous and immutable. I've already 
covered immutability above: removing, adding or moving lines will 
invalidate the stack trace. Inhomogeneity comes from the structure of a 
stack trace. The mere fact that each line is a string does not mean that 
any two lines are equivalent. Different lines represent different things.

Traceback (most recent call last):
  File "./prattle.py", line 873, in select
    selection = self.do_callback(cb, response)
  File "./prattle.py", line 787, in do_callback
    raise callback
ValueError: what do you mean?

is a valid stack. But:

Traceback (most recent call last):
    raise callback
    selection = self.do_callback(cb, response)
  File "./prattle.py", line 787, in do_callback
ValueError: what do you mean?
  File "./prattle.py", line 873, in select

is not. A stack trace has structure. The equivalent here is the 
difference between:

ages = [23, 42, 19, 67,  # age, age, age, age
        17, 94, 32, 51,  # ...
        ]

values = [23, 1972, 1, 34500,  # age, year, number of children, income
          35, 1985, 0, 67900,  # age, year, number of children, income
          ]

A stack trace is closer to the second example than the first: each item 
may be the same type, but the items don't represent the same *kind of 
thing*. 
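
One way to make that difference visible, sketched here with a namedtuple 
(the name Person and its field names are my own, taken from the comments 
in the example above), is to regroup the flat list into records:

```python
from collections import namedtuple

# Hypothetical record type; field names follow the comments above.
Person = namedtuple('Person', 'age year children income')

values = [23, 1972, 1, 34500,
          35, 1985, 0, 67900]

# Group the flat list into records of four fields each.
people = [Person(*values[i:i + 4]) for i in range(0, len(values), 4)]

# Sorting a list of ages is meaningful; sorting values would mix ages
# with years and incomes. With records you pick the field explicitly.
oldest = max(people, key=lambda p: p.age)
```

Each record is itself a tuple: position within it carries meaning, which 
is exactly what is not true of the list of ages.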


You could make a stack trace homogeneous with a little work:

- drop the Traceback line and the final exception line;
- parse the File lines to extract the useful fields;
- combine them with the source code.

Now you have a blob of homogeneous records, here shown as lines of text 
with ! as field separator:

./prattle.py ! 873 ! select ! selection = self.do_callback(cb, response)
./prattle.py ! 787 ! do_callback ! raise callback
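
A minimal sketch of those three steps follows. The function name 
to_records and the regular expression are my own, and the code assumes 
the trace begins with the Traceback line and ends with a one-line 
exception message, which is not true of every traceback:

```python
import re

# Matches the 'File "...", line N, in name' lines of a CPython traceback.
FILE_LINE = re.compile(
    r'\s*File "(?P<path>.+)", line (?P<lineno>\d+), in (?P<func>.+)')

def to_records(trace):
    """Return a tuple of (path, lineno, func, source) records."""
    # Drop the Traceback line and the final exception line.
    lines = trace.splitlines()[1:-1]
    records = []
    i = 0
    while i < len(lines):
        match = FILE_LINE.match(lines[i])
        if match is None:
            i += 1
            continue
        # The source line, if present, follows its File line.
        source = ''
        if i + 1 < len(lines) and not FILE_LINE.match(lines[i + 1]):
            source = lines[i + 1].strip()
            i += 1
        records.append((match.group('path'), int(match.group('lineno')),
                        match.group('func'), source))
        i += 1
    return tuple(records)

trace = '''Traceback (most recent call last):
  File "./prattle.py", line 873, in select
    selection = self.do_callback(cb, response)
  File "./prattle.py", line 787, in do_callback
    raise callback
ValueError: what do you mean?'''

records = to_records(trace)
for record in records:
    print(' ! '.join(str(field) for field in record))
```

The result is returned as a tuple of tuples, so it can still be used as 
a dict key.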

But there's really nothing you can do about the immutability. There is 
no meaningful reason to take a complete stack trace and add or delete 
lines from it.


-- 
Steven


