[python-uk] A stack with better performance than using a list
Jonathan Hartley
tartley at tartley.com
Tue Jun 13 09:36:29 EDT 2017
You are right, when popping an empty stack I should probably raise.
On 2017-06-08 13:06, Samuel F wrote:
> It may have failed for a different reason, (hard to say without the
> original question and answer).
>
> In the case where the stack is empty, you are returning None, was that
> the requirement? (Likely to have been -1)
>
> Sam
>
> On Thu, 8 Jun 2017 at 17:27, Jonathan Hartley <tartley at tartley.com>
> wrote:
>
>> Yep, that's a great elimination of the suspicious small overheads.
>>
>> line_profiler is beautiful, I'll definitely be adding it to my
>> toolbox, thanks for that!
>>
>> I tried a variant of accumulating the output and printing it all as
>> a single string, but of course this didn't help, printing is already
>> buffered.
>>
>> Jonathan
>>
>> On 6/8/2017 03:54, Stestagg wrote:
>>
>> I honestly can't see a way to improve this in python. My best
>> solution is:
>>
>> def main(lines):
>> stack = []
>> sa = stack.append
>> sp = stack.pop
>> si = stack.__getitem__
>> for line in lines:
>> meth = line[:3]
>> if meth == b'pus':
>> sa(int(line[5:]))
>> elif meth == b'pop':
>> sp()
>> else:
>> parts = line[15:].split()
>> end = len(stack)-1
>> amount = int(parts[1])
>> for x in range(int(parts[0])):
>> index = end - x
>> stack[index] += amount
>> print(stack[-1] if stack else None)
>>
>> which comes out about 25% faster than your solution.
>>
>> One tool that's interesting to use here is: line_profiler:
>> https://github.com/rkern/line_profiler
>>
>> putting a @profile decorator on the above main() call, and running
>> with kernprof produces the following output:
>>
>> Line # Hits Time Per Hit % Time Line Contents
>>
>> ==============================================================
>>
>> 12 @profile
>>
>> 13 def main(lines):
>>
>> 14 1 4 4.0 0.0 stack = []
>>
>> 15 2000001 949599 0.5 11.5 for line in
>> lines:
>>
>> 16 2000000 1126944 0.6 13.7 meth =
>> line[:3]
>>
>> 17 2000000 974635 0.5 11.8 if meth ==
>> b'pus':
>>
>> 18 1000000 1002733 1.0 12.2
>> stack.append(int(line[5:]))
>>
>> 19 1000000 478756 0.5 5.8 elif meth
>> == b'pop':
>>
>> 20 999999 597114 0.6 7.2
>> stack.pop()
>>
>> 21 else:
>>
>> 22 1 6 6.0 0.0 parts =
>> line[15:].split()
>>
>> 23 1 2 2.0 0.0 end =
>> len(stack)-1
>>
>> 24 1 1 1.0 0.0 amount
>> = int(parts[1])
>>
>> 25 500001 241227 0.5 2.9 for x
>> in range(int(parts[0])):
>>
>> 26 500000 273477 0.5 3.3
>> index = end - x
>>
>> 27 500000 309033 0.6 3.7
>> stack[index] += amount
>>
>> 28 2000000 2295803 1.1 27.8
>> print(stack[-1])
>>
>> which shows that there's no obvious bottleneck (line by line) here
>> (for my sample data).
>>
>> Note the print() overhead dominates the runtime, and that's with me
>> piping the output to /dev/null directly.
>>
>> I had a go at using arrays, deques, and numpy arrays in various ways
>> without luck, but we're getting fairly close to the native python
>> statement execution overhead here (hence folding it all into one
>> function).
>>
>> My only thoughts would be to see if there were some magic that could
>> be done by offloading the work onto a non-python library somehow.
>>
>> Another thing that might help some situations (hence my previous
>> questions) would be to implement the add_to_first_n as a lazy
>> operator (i.e. have a stack of the add_to_first_n values and
>> dynamically add to the results of pop() but that would proabably be
>> much slow in the average case.
>>
>> Steve
>>
>> On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley
>> <tartley at tartley.com> wrote:
>>
>> Hey.
>>
>> Thanks for engaging, but I can't help with the most important of
>> those questions - the large data sets on which my solution failed
>> due to timeout are hidden from candidates. Not unreasonable to
>> assume that they do exercise deep stacks, and large args to
>> add_to_first_n, etc.
>>
>> Yes, the input looks exactly like your example. All args are
>> integers. The question asked for output corresponding to the top of
>> the stack after every operation. I omitted this print from inside
>> the 'for' loop in 'main', thinking it irrelevant.
>>
>> I converted to integers inside 'dispatch'. 'args' must have actually
>> been created with:
>>
>> args = [int(i) for i in tokens[1:]]
>>
>> Where len(tokens) is never going to be bigger than 3.
>>
>> Return values (from 'pop') were unused.
>>
>> On 6/7/2017 13:25, Stestagg wrote:
>>
>> Do you have any more context?
>> For example, is the add_to_first_n likely to be called with very
>> large numbers, or very often? Does the stack get very deep, or stay
>> shallow?
>>
>> I'm assuming that lines look like this:
>>
>> push 1
>> push 2
>> add_to_first_n 2 10
>> pop
>> pop
>>
>> with all arguments as integers, and the final value being returned
>> from main()?
>> How did you convert from string inputs to numeric values?
>> How did you manage return values?
>>
>> :D
>>
>> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley
>> <tartley at tartley.com> wrote:
>>
>> I recently submitted a solution to a coding challenge, in an
>> employment context. One of the questions was to model a simple
>> stack. I wrote a solution which appended and popped from the end of
>> a list. This worked, but failed with timeouts on their last few
>> automated tests with large (hidden) data sets.
>>
>> From memory, I think I had something pretty standard:
>>
>> class Stack:
>>
>> def __init__(self):
>> self.storage = []
>>
>> def push(arg):
>> self.storage.append(arg)
>>
>> def pop():
>> return self.storage.pop() if self.storage else None
>>
>> def add_to_first_n(n, amount):
>> for n in range(n):
>> self.storage[n] += amount
>>
>> def dispatch(self, line)
>> tokens = line.split()
>> method = getattr(self, tokens[0])
>> args = tokens[1:]
>> method(*args)
>>
>> def main(lines):
>> stack = Stack()
>> for line in lines:
>> stack.dispatch(line)
>>
>> (will that formatting survive? Apologies if not)
>>
>> Subsequent experiments have confirmed that appending to and popping
>> from the end of lists are O(1), amortized. So why is my solution too
>> slow?
>>
>> This question was against the clock, 4th question of 4 in an hour.
>> So I wasn't expecting to produce Cython or C optimised code in that
>> timeframe (Besides, my submitted .py file runs on their servers, so
>> the environment is limited.)
>>
>> So what am I missing, from a performance perspective? Are there
>> other data structures in stdlib which are also O(1) but with a
>> better constant?
>>
>> Ah. In writing this out, I have begun to suspect that my slicing of
>> 'tokens' to produce 'args' in the dispatch is needlessly wasting
>> time. Not much, but some.
>>
>> Thoughts welcome,
>>
>> Jonathan
>>
>> --
>> Jonathan Hartley tartley at tartley.com http://tartley.com
>> Made out of meat. +1 507-513-1101 [1] twitter/skype:
>> tartley
>>
>> _______________________________________________
>> python-uk mailing list
>> python-uk at python.org
>> https://mail.python.org/mailman/listinfo/python-uk
>>
>> _______________________________________________
>> python-uk mailing list
>> python-uk at python.org
>> https://mail.python.org/mailman/listinfo/python-uk
>
> --
> Jonathan Hartley tartley at tartley.com http://tartley.com
> Made out of meat. +1 507-513-1101 [1] twitter/skype: tartley
>
> _______________________________________________
> python-uk mailing list
> python-uk at python.org
> https://mail.python.org/mailman/listinfo/python-uk
>
> _______________________________________________
> python-uk mailing list
> python-uk at python.org
> https://mail.python.org/mailman/listinfo/python-uk
>
> --
> Jonathan Hartley tartley at tartley.com http://tartley.com
> Made out of meat. +1 507-513-1101 twitter/skype: tartley
>
> _______________________________________________
> python-uk mailing list
> python-uk at python.org
> https://mail.python.org/mailman/listinfo/python-uk
>
>
> Links:
> ------
> [1] tel:%28507%29%20513-1101
> _______________________________________________
> python-uk mailing list
> python-uk at python.org
> https://mail.python.org/mailman/listinfo/python-uk
--
Jonathan Hartley Made out of meat.
tartley at tartley.com +1 507-513-1101
More information about the python-uk
mailing list