Python Interview Questions

Mon Nov 19 02:54:56 EST 2012

On Sun, 18 Nov 2012 21:09:36 -0500, Roy Smith wrote:

> In article <50a97de0$0$29983$c3e8da3$5496439d at news.astraweb.com>,
>  Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
> 
> 
>> > The stack that's returned is a list.  It's inherently a list, per the
>> > classic definition:
>> 
>> Er, no, it's inherently a blob of multiple text lines.
> 
> No, it's a list that looks like (taken from the doc string of the code I
> referenced):
> 
>     [('/usr/lib/.../base.py', 'get_response'),
>      ('/home/songza/.../views.py', 'song_info'),
>      ('/home/songza.../api.py', 'get_song'), 
>      ('/home/songza/.../api.py', 'api')]
> 
> [it doesn't really have ...'s in the paths; I just elided some text to
> make it easier to read]

I see. It wasn't clear from your earlier description that the items had 
been post-processed from collections of raw log lines to fixed records. 
But it doesn't actually change my analysis any. See below.

By the way, based on the sample data you show, your script is possibly 
broken. You don't record either the line number that raises, or the 
exception raised, so your script doesn't differentiate between different 
errors that happen to occur with similar stack traces. (I say "possibly" 
broken because I don't know what your requirements are. Maybe your 
requirements are sufficiently wide that you don't care that distinct 
failures are counted together.)

E.g. these three stack traces will probably generate the same fixed 
record, even though the errors are distinct:

#1
Traceback (most recent call last):
  File "./spam.py", line 20, in select
    selection = func(a, b)
  File "./spam.py", line 60, in func
    return 1/x
ZeroDivisionError: float division

#2
Traceback (most recent call last):
  File "./spam.py", line 20, in select
    selection = func(a, b)
  File "./spam.py", line 60, in func
    return 1/x
TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

#3
Traceback (most recent call last):
  File "./spam.py", line 20, in select
    selection = func(a, b)
  File "./spam.py", line 55, in func
    y = 1/(a + b)
ZeroDivisionError: float division

Maybe that's okay for your application, but it strikes me as odd that you 
do distinguish *some* distinct errors in the same function, but not 
others.

>> > * It's homogeneous.  There's nothing particularly significant about
>> > each entry other than it's the next one in the stack.
>> 
>> The complete stack trace is inhomogeneous and immutable. I've already
>> covered immutability above: removing, adding or moving lines will
>> invalidate the stack trace. Inhomogeneity comes from the structure of a
>> stack trace. The mere fact that each line is a string does not mean
>> that any two lines are equivalent. Different lines represent different
>> things.
> 
> No.  Each entry in the list represents a source file and a function
> name.  They're all the same "shape".  You could remove one or add
> another one, or shuffle the order, and you would have something which
> was syntactically correct and semantically meaningful (even if it didn't
> represent an actual code path.

If you remove/add/shuffle lines in the stack, you no longer have the same 
stack. Take the example you gave before:

stack1 = [('/usr/lib/.../base.py', 'get_response'),
          ('/home/songza/.../views.py', 'song_info'),
          ('/home/songza.../api.py', 'get_song'), 
          ('/home/songza/.../api.py', 'api')
          ]

Here's a different stack trace, representing a different code path, which 
as you say is syntactically correct and semantically meaningful:

stack2 = [('/home/songza/.../api.py', 'api'),
          ('/home/songza.../api.py', 'get_song'),
          ('/home/songza/.../views.py', 'song_info'),
          ('/usr/lib/.../base.py', 'get_response')
          ]

Since they are different stacks, they are treated as different keys:

data = {stack1: 11, stack2: 22}

Do you agree that this is what your application expects? Different stack 
traces are different keys, associated with different values.

I claim this only makes sense if you treat the stacks as inherently 
immutable. Never mind Python's limitation. Let's pretend we were running 
this code under some other language, NeoPython, which allowed mutable 
keys.

You claim that stacks are *inherently mutable*. So I should be able to do 
this:

stack1.sort()  # it's the *same stack*, all I've done is mutate it
print data[stack1]

and expect to see "11" printed. I am looking at the same key, right? So I 
certainly don't expect to see the value associated with a completely 
different key.

But wait a minute... after sorting, stack1 and stack2 now are equal. I 
could just as easily expect to see "22" printed.

I thought we had just agreed that stack1 and stack2 are *different* keys. 
Of course they are different. They represent different code paths. But 
after sorting stack1, it looks exactly like stack2. It looks like a 
different code path. It *lies* -- it no longer represents the code path 
that it actually represents, instead it looks like a *different* code 
path.

I then generate another stack:

stack3 = [('/home/songza/.../api.py', 'api'),
          ('/home/songza.../api.py', 'get_song'),
          ('/home/songza/.../views.py', 'song_info'),
          ('/usr/lib/.../base.py', 'get_response')
          ]

should data[stack3] return 11 (it has the same value as stack1) or 22 (it 
has the same value as stack2)? Or possibly 33? Or raise KeyError?

Treating stacks in this context as mutable is *incoherent*. It is nice 
and convenient to be able to build up a stack trace using a mutable list, 
you won't get an argument from me about that, but that can only be 
considered a temporary data structure used to build the data structure 
you actually care about, which is fixed.

That brings it back to my question: your application is not a counter-
example to my question about using lists as keys, because your data is 
not inherently list-like. It is inherently tuple-like, you just build it 
using a temporary list. That's perfectly fine, by the way, I do the same 
thing.

As you say, the order of the lines in the stack trace is significant. You 
cannot expect to mutate the stack and move lines around and treat it as 
the same stack. If you move the lines about, it represents a different 
stack. That is fundamentally different from the normal use of a list, 
where you do expect to be able to move lines about and still have it 
count as "the same list".

> I think we're going to have to let this be.  You obviously have your
> concept of what a tuple is and what a list is.  I disagree.  

I think a tuple is an immutable sequence of items, and a list is a 
mutable sequence of items.

> I don't
> think either of us is right or wrong, we just have different ways of
> thinking about things.
> 
> You come at it from a theoretical point of view.

I certainly do not. My position here is imminently practical. The 
alternative, the mutability of keys, is simply incoherent.

> You think of each type
> as an embodiment of certain concepts ("it represents a fixed-length
> heterogenous sequence").  Your thinking is driven by what each type was
> intended to be used for.

Not even close. My thinking is driven by the things each data structure 
needs to do. See below.

> I come at it from a practical point of view.  To me, each type is a
> collection of methods.  I have certain operations I need to perform.  I
> pick the type which offers those operations.  If the set of operations I
> need to perform (in this case, {append, hash}) don't exist in a single
> type, I'm forced to use both types and convert from one to the other as
> needed.

I don't see that as a problem. Converting from one type to another is 
exactly the sort of thing I described in my earlier question.

In your application, you build up a collection of code lines that 
represent a stack trace. Here's that example from your own documentation 
again:

[('/usr/lib/.../base.py', 'get_response'),
 ('/home/songza/.../views.py', 'song_info'),
 ('/home/songza.../api.py', 'get_song'), 
 ('/home/songza/.../api.py', 'api')]

What are the sorts of things I might meaningfully want to do with this 
*complete* stack trace?

Add extra lines to it? No. If I needed to add extra lines, it wouldn't be 
complete.

Delete lines? Certainly not, that would change the code path it claims to 
represent to a code path it doesn't represent.

Sort the list? Reverse it? Heavens no.

If you look at the available list methods, *not one* of the mutating 
methods is appropriate to a completed stack trace object. *None* of the 
mutator list methods are appropriate once the stack trace object is 
complete, and using them would be counter-productive.

If you believe different, then please tell me what mutations your code 
actually performs after the stack trace object is completed. In the code 
you showed, you throw the list away after turning it into a tuple.

If the object represents a "list of code lines", in the sense of a 
mutable Python list rather than a mere sequence, then why don't you use 
any list methods on it?

The append method is useful during construction, but that is all. After 
the stack is complete, use of any mutator method would be a bug. In other 
words, it ought to be immutable, and the use of a list ought to be buried 
in the appropriate function as an internal implementation detail. The 
public interface ought to be that of an immutable tuple of immutable 
strings, because once you have finished building the object, it should 
not be possible to mutate it.

This is hardly a theoretical viewpoint. The idea of treating data that 
ought not be changed as immutable is borne out of bitter experience of 
millions of man-hours tracking down hundreds of thousands of bugs.

(Admittedly not all of those bugs were *my* bugs. I'm talking the 
collective experience of programmers over fifty years of coding.)

-- 
Steven