[python-uk] A stack with better performance than using a list

Jonathan Hartley tartley at tartley.com
Tue Jun 13 09:36:29 EDT 2017


You are right, when popping an empty stack I should probably raise.



On 2017-06-08 13:06, Samuel F wrote:
> It may have failed for a different reason, (hard to say without the
> original question and answer).
> 
> In the case where the stack is empty, you are returning None, was that
> the requirement? (Likely to have been -1)
> 
> Sam
> 
> On Thu, 8 Jun 2017 at 17:27, Jonathan Hartley <tartley at tartley.com>
> wrote:
> 
>> Yep, that's a great elimination of the suspicious small overheads.
>> 
>> line_profiler is beautiful, I'll definitely be adding it to my
>> toolbox, thanks for that!
>> 
>> I tried a variant of accumulating the output and printing it all as
>> a single string, but of course this didn't help, printing is already
>> buffered.
>> 
>> Jonathan
>> 
>> On 6/8/2017 03:54, Stestagg wrote:
>> 
>> I honestly can't see a way to improve this in python.  My best
>> solution is:
>> 
>> def main(lines):
>> stack = []
>> sa = stack.append
>> sp = stack.pop
>> si = stack.__getitem__
>> for line in lines:
>> meth = line[:3]
>> if meth == b'pus':
>> sa(int(line[5:]))
>> elif meth == b'pop':
>> sp()
>> else:
>> parts = line[15:].split()
>> end = len(stack)-1
>> amount = int(parts[1])
>> for x in range(int(parts[0])):
>> index = end - x
>> stack[index] += amount
>> print(stack[-1] if stack else None)
>> 
>> which comes out about 25% faster than your solution.
>> 
>> One tool that's interesting to use here is: line_profiler:
>> https://github.com/rkern/line_profiler
>> 
>> putting a @profile decorator on the above main() call, and running
>> with kernprof produces the following output:
>> 
>> Line #      Hits         Time  Per Hit   % Time  Line Contents
>> 
>> ==============================================================
>> 
>> 12                                           @profile
>> 
>> 13                                           def main(lines):
>> 
>> 14         1            4      4.0      0.0      stack = []
>> 
>> 15   2000001       949599      0.5     11.5      for line in
>> lines:
>> 
>> 16   2000000      1126944      0.6     13.7          meth =
>> line[:3]
>> 
>> 17   2000000       974635      0.5     11.8          if meth ==
>> b'pus':
>> 
>> 18   1000000      1002733      1.0     12.2
>> stack.append(int(line[5:]))
>> 
>> 19   1000000       478756      0.5      5.8          elif meth
>> == b'pop':
>> 
>> 20    999999       597114      0.6      7.2
>> stack.pop()
>> 
>> 21                                                   else:
>> 
>> 22         1            6      6.0      0.0              parts =
>> line[15:].split()
>> 
>> 23         1            2      2.0      0.0              end =
>> len(stack)-1
>> 
>> 24         1            1      1.0      0.0              amount
>> = int(parts[1])
>> 
>> 25    500001       241227      0.5      2.9              for x
>> in range(int(parts[0])):
>> 
>> 26    500000       273477      0.5      3.3
>> index = end - x
>> 
>> 27    500000       309033      0.6      3.7
>> stack[index] += amount
>> 
>> 28   2000000      2295803      1.1     27.8
>> print(stack[-1])
>> 
>> which shows that there's no obvious bottleneck (line by line) here
>> (for my sample data).
>> 
>> Note the print() overhead dominates the runtime, and that's with me
>> piping the output to /dev/null directly.
>> 
>> I had a go at using arrays, deques, and numpy arrays in various ways
>> without luck, but we're getting fairly close to the native python
>> statement execution overhead here (hence folding it all into one
>> function).
>> 
>> My only thoughts would be to see if there were some magic that could
>> be done by offloading the work onto a non-python library somehow.
>> 
>> Another thing that might help some situations (hence my previous
>> questions) would be to implement the add_to_first_n as a lazy
>> operator (i.e. have a stack of the add_to_first_n values and
>> dynamically add to the results of pop() but that would proabably be
>> much slow in the average case.
>> 
>> Steve
>> 
>> On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley
>> <tartley at tartley.com> wrote:
>> 
>> Hey.
>> 
>> Thanks for engaging, but I can't help with the most important of
>> those questions - the large data sets on which my solution failed
>> due to timeout are hidden from candidates. Not unreasonable to
>> assume that they do exercise deep stacks, and large args to
>> add_to_first_n, etc.
>> 
>> Yes, the input looks exactly like your example. All args are
>> integers. The question asked for output corresponding to the top of
>> the stack after every operation. I omitted this print from inside
>> the 'for' loop in 'main', thinking it irrelevant.
>> 
>> I converted to integers inside 'dispatch'. 'args' must have actually
>> been created with:
>> 
>> args = [int(i) for i in tokens[1:]]
>> 
>> Where len(tokens) is never going to be bigger than 3.
>> 
>> Return values (from 'pop') were unused.
>> 
>> On 6/7/2017 13:25, Stestagg wrote:
>> 
>> Do you have any more context?
>> For example, is the add_to_first_n likely to be called with very
>> large numbers, or very often? Does the stack get very deep, or stay
>> shallow?
>> 
>> I'm assuming that lines look like this:
>> 
>> push 1
>> push 2
>> add_to_first_n 2 10
>> pop
>> pop
>> 
>> with all arguments as integers, and the final value being returned
>> from main()?
>> How did you convert from string inputs to numeric values?
>> How did you manage return values?
>> 
>> :D
>> 
>> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley
>> <tartley at tartley.com> wrote:
>> 
>> I recently submitted a solution to a coding challenge, in an
>> employment context. One of the questions was to model a simple
>> stack. I wrote a solution which appended and popped from the end of
>> a list. This worked, but failed with timeouts on their last few
>> automated tests with large (hidden) data sets.
>> 
>> From memory, I think I had something pretty standard:
>> 
>> class Stack:
>> 
>> def __init__(self):
>> self.storage = []
>> 
>> def push(arg):
>> self.storage.append(arg)
>> 
>> def pop():
>> return self.storage.pop() if self.storage else None
>> 
>> def add_to_first_n(n, amount):
>> for n in range(n):
>> self.storage[n] += amount
>> 
>> def dispatch(self, line)
>> tokens = line.split()
>> method = getattr(self, tokens[0])
>> args = tokens[1:]
>> method(*args)
>> 
>> def main(lines):
>> stack = Stack()
>> for line in lines:
>> stack.dispatch(line)
>> 
>> (will that formatting survive? Apologies if not)
>> 
>> Subsequent experiments have confirmed that appending to and popping
>> from the end of lists are O(1), amortized. So why is my solution too
>> slow?
>> 
>> This question was against the clock, 4th question of 4 in an hour.
>> So I wasn't expecting to produce Cython or C optimised code in that
>> timeframe (Besides, my submitted .py file runs on their servers, so
>> the environment is limited.)
>> 
>> So what am I missing, from a performance perspective? Are there
>> other data structures in stdlib which are also O(1) but with a
>> better constant?
>> 
>> Ah. In writing this out, I have begun to suspect that my slicing of
>> 'tokens' to produce 'args' in the dispatch is needlessly wasting
>> time. Not much, but some.
>> 
>> Thoughts welcome,
>> 
>> Jonathan
>> 
>> --
>> Jonathan Hartley    tartley at tartley.com    http://tartley.com
>> Made out of meat.   +1 507-513-1101 [1]        twitter/skype:
>> tartley
>> 
>> _______________________________________________
>> python-uk mailing list
>> python-uk at python.org
>> https://mail.python.org/mailman/listinfo/python-uk
>> 
>> _______________________________________________
>> python-uk mailing list
>> python-uk at python.org
>> https://mail.python.org/mailman/listinfo/python-uk
> 
> --
> Jonathan Hartley    tartley at tartley.com    http://tartley.com
> Made out of meat.   +1 507-513-1101 [1]        twitter/skype: tartley
> 
>  _______________________________________________
> python-uk mailing list
> python-uk at python.org
> https://mail.python.org/mailman/listinfo/python-uk
> 
> _______________________________________________
> python-uk mailing list
> python-uk at python.org
> https://mail.python.org/mailman/listinfo/python-uk
> 
> --
> Jonathan Hartley    tartley at tartley.com    http://tartley.com
> Made out of meat.   +1 507-513-1101        twitter/skype: tartley
> 
>  _______________________________________________
> python-uk mailing list
> python-uk at python.org
> https://mail.python.org/mailman/listinfo/python-uk
> 
> 
> Links:
> ------
> [1] tel:%28507%29%20513-1101
> _______________________________________________
> python-uk mailing list
> python-uk at python.org
> https://mail.python.org/mailman/listinfo/python-uk

-- 
Jonathan Hartley   Made out of meat.
tartley at tartley.com  +1 507-513-1101


More information about the python-uk mailing list