[python-uk] A stack with better performance than using a list

Thu Jun 8 11:46:53 EDT 2017

Good point. FWIW, my submission was running Python 3.

On 6/8/2017 04:33, Toby Dickenson wrote:
> In python 2, your use of range() without checking for a very large
> parameter n might cause either a MemoryError exception, or trigger a
> huge memory allocation just for the range list. Not a problem in
> python 3 of course.
>
>
> On 8 June 2017 at 09:54, Stestagg <stestagg at gmail.com> wrote:
>> I honestly can't see a way to improve this in python.  My best solution is:
>>
>> def main(lines):
>>      stack = []
>>      sa = stack.append
>>      sp = stack.pop
>>      si = stack.__getitem__
>>      for line in lines:
>>          meth = line[:3]
>>          if meth == b'pus':
>>              sa(int(line[5:]))
>>          elif meth == b'pop':
>>              sp()
>>          else:
>>              parts = line[15:].split()
>>              end = len(stack)-1
>>              amount = int(parts[1])
>>              for x in range(int(parts[0])):
>>                  index = end - x
>>                  stack[index] += amount
>>          print(stack[-1] if stack else None)
>>
>> which comes out about 25% faster than your solution.
>>
>> One tool that's interesting to use here is: line_profiler:
>> https://github.com/rkern/line_profiler
>>
>> putting a @profile decorator on the above main() call, and running with
>> kernprof produces the following output:
>>
>> Line #      Hits         Time  Per Hit   % Time  Line Contents
>>
>> ==============================================================
>>
>>      12                                           @profile
>>
>>      13                                           def main(lines):
>>
>>      14         1            4      4.0      0.0      stack = []
>>
>>      15   2000001       949599      0.5     11.5      for line in lines:
>>
>>      16   2000000      1126944      0.6     13.7          meth = line[:3]
>>
>>      17   2000000       974635      0.5     11.8          if meth == b'pus':
>>
>>      18   1000000      1002733      1.0     12.2
>> stack.append(int(line[5:]))
>>
>>      19   1000000       478756      0.5      5.8          elif meth ==
>> b'pop':
>>
>>      20    999999       597114      0.6      7.2              stack.pop()
>>
>>      21                                                   else:
>>
>>      22         1            6      6.0      0.0              parts =
>> line[15:].split()
>>
>>      23         1            2      2.0      0.0              end =
>> len(stack)-1
>>
>>      24         1            1      1.0      0.0              amount =
>> int(parts[1])
>>
>>      25    500001       241227      0.5      2.9              for x in
>> range(int(parts[0])):
>>
>>      26    500000       273477      0.5      3.3                  index = end
>> - x
>>
>>      27    500000       309033      0.6      3.7
>> stack[index] += amount
>>
>>      28   2000000      2295803      1.1     27.8          print(stack[-1])
>>
>>
>> which shows that there's no obvious bottleneck (line by line) here (for my
>> sample data).
>>
>> Note the print() overhead dominates the runtime, and that's with me piping
>> the output to /dev/null directly.
>>
>> I had a go at using arrays, deques, and numpy arrays in various ways without
>> luck, but we're getting fairly close to the native python statement
>> execution overhead here (hence folding it all into one function).
>>
>> My only thoughts would be to see if there were some magic that could be done
>> by offloading the work onto a non-python library somehow.
>>
>> Another thing that might help some situations (hence my previous questions)
>> would be to implement the add_to_first_n as a lazy operator (i.e. have a
>> stack of the add_to_first_n values and dynamically add to the results of
>> pop() but that would proabably be much slow in the average case.
>>
>> Steve
>>
>> On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley <tartley at tartley.com> wrote:
>>> Hey.
>>>
>>> Thanks for engaging, but I can't help with the most important of those
>>> questions - the large data sets on which my solution failed due to timeout
>>> are hidden from candidates. Not unreasonable to assume that they do exercise
>>> deep stacks, and large args to add_to_first_n, etc.
>>>
>>> Yes, the input looks exactly like your example. All args are integers. The
>>> question asked for output corresponding to the top of the stack after every
>>> operation. I omitted this print from inside the 'for' loop in 'main',
>>> thinking it irrelevant.
>>>
>>> I converted to integers inside 'dispatch'. 'args' must have actually been
>>> created with:
>>>
>>> args = [int(i) for i in tokens[1:]]
>>>
>>> Where len(tokens) is never going to be bigger than 3.
>>>
>>> Return values (from 'pop') were unused.
>>>
>>>
>>> On 6/7/2017 13:25, Stestagg wrote:
>>>
>>> Do you have any more context?
>>> For example, is the add_to_first_n likely to be called with very large
>>> numbers, or very often? Does the stack get very deep, or stay shallow?
>>>
>>> I'm assuming that lines look like this:
>>>
>>> push 1
>>> push 2
>>> add_to_first_n 2 10
>>> pop
>>> pop
>>>
>>> with all arguments as integers, and the final value being returned from
>>> main()?
>>> How did you convert from string inputs to numeric values?
>>> How did you manage return values?
>>>
>>> :D
>>>
>>> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley <tartley at tartley.com>
>>> wrote:
>>>> I recently submitted a solution to a coding challenge, in an employment
>>>> context. One of the questions was to model a simple stack. I wrote a
>>>> solution which appended and popped from the end of a list. This worked, but
>>>> failed with timeouts on their last few automated tests with large (hidden)
>>>> data sets.
>>>>
>>>>  From memory, I think I had something pretty standard:
>>>>
>>>> class Stack:
>>>>
>>>>      def __init__(self):
>>>>          self.storage = []
>>>>
>>>>      def push(arg):
>>>>          self.storage.append(arg)
>>>>
>>>>      def pop():
>>>>          return self.storage.pop() if self.storage else None
>>>>
>>>>      def add_to_first_n(n, amount):
>>>>          for n in range(n):
>>>>              self.storage[n] += amount
>>>>
>>>>      def dispatch(self, line)
>>>>          tokens = line.split()
>>>>          method = getattr(self, tokens[0])
>>>>          args = tokens[1:]
>>>>          method(*args)
>>>>
>>>> def main(lines):
>>>>      stack = Stack()
>>>>      for line in lines:
>>>>          stack.dispatch(line)
>>>>
>>>>
>>>> (will that formatting survive? Apologies if not)
>>>>
>>>> Subsequent experiments have confirmed that appending to and popping from
>>>> the end of lists are O(1), amortized.
>>>>
>>>> So why is my solution too slow?
>>>>
>>>> This question was against the clock, 4th question of 4 in an hour. So I
>>>> wasn't expecting to produce Cython or C optimised code in that timeframe
>>>> (Besides, my submitted .py file runs on their servers, so the environment is
>>>> limited.)
>>>>
>>>> So what am I missing, from a performance perspective? Are there other
>>>> data structures in stdlib which are also O(1) but with a better constant?
>>>>
>>>> Ah. In writing this out, I have begun to suspect that my slicing of
>>>> 'tokens' to produce 'args' in the dispatch is needlessly wasting time. Not
>>>> much, but some.
>>>>
>>>> Thoughts welcome,
>>>>
>>>>      Jonathan
>>>>
>>>> --
>>>> Jonathan Hartley    tartley at tartley.com    http://tartley.com
>>>> Made out of meat.   +1 507-513-1101        twitter/skype: tartley
>>>>
>>>> _______________________________________________
>>>> python-uk mailing list
>>>> python-uk at python.org
>>>> https://mail.python.org/mailman/listinfo/python-uk
>>>
>>>
>>> _______________________________________________
>>> python-uk mailing list
>>> python-uk at python.org
>>> https://mail.python.org/mailman/listinfo/python-uk
>>>
>>>
>>> --
>>> Jonathan Hartley    tartley at tartley.com    http://tartley.com
>>> Made out of meat.   +1 507-513-1101        twitter/skype: tartley
>>>
>>> _______________________________________________
>>> python-uk mailing list
>>> python-uk at python.org
>>> https://mail.python.org/mailman/listinfo/python-uk
>>
>> _______________________________________________
>> python-uk mailing list
>> python-uk at python.org
>> https://mail.python.org/mailman/listinfo/python-uk
>>
> _______________________________________________
> python-uk mailing list
> python-uk at python.org
> https://mail.python.org/mailman/listinfo/python-uk

-- 
Jonathan Hartley    tartley at tartley.com    http://tartley.com
Made out of meat.   +1 507-513-1101        twitter/skype: tartley

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-uk/attachments/20170608/227316f2/attachment-0001.html>