[python-uk] A stack with better performance than using a list
Jonathan Hartley
tartley at tartley.com
Thu Jun 8 12:26:17 EDT 2017
Yep, that's a great elimination of the suspicious small overheads.
line_profiler is beautiful, I'll definitely be adding it to my toolbox,
thanks for that!
I tried a variant of accumulating the output and printing it all as a
single string, but of course this didn't help, printing is already buffered.
Jonathan
On 6/8/2017 03:54, Stestagg wrote:
> I honestly can't see a way to improve this in python. My best
> solution is:
>
> def main(lines):
> stack = []
> sa = stack.append
> sp = stack.pop
> si = stack.__getitem__
> for line in lines:
> meth = line[:3]
> if meth == b'pus':
> sa(int(line[5:]))
> elif meth == b'pop':
> sp()
> else:
> parts = line[15:].split()
> end = len(stack)-1
> amount = int(parts[1])
> for x in range(int(parts[0])):
> index = end - x
> stack[index] += amount
> print(stack[-1] if stack else None)
>
> which comes out about 25% faster than your solution.
>
> One tool that's interesting to use here is: line_profiler:
> https://github.com/rkern/line_profiler
>
> putting a @profile decorator on the above main() call, and running
> with kernprof produces the following output:
>
> Line #Hits TimePer Hit % TimeLine Contents
>
> ==============================================================
>
> 12 @profile
>
> 13 def main(lines):
>
> 14 144.00.0stack = []
>
> 15 2000001 9495990.5 11.5for line in lines:
>
> 16 200000011269440.6 13.7meth = line[:3]
>
> 17 2000000 9746350.5 11.8if meth == b'pus':
>
> 18 100000010027331.0 12.2stack.append(int(line[5:]))
>
> 19 1000000 4787560.55.8elif meth == b'pop':
>
> 20999999 5971140.67.2stack.pop()
>
> 21 else:
>
> 22 166.00.0parts = line[15:].split()
>
> 23 122.00.0end = len(stack)-1
>
> 24 111.00.0amount = int(parts[1])
>
> 25500001 2412270.52.9for x in range(int(parts[0])):
>
> 26500000 2734770.53.3index = end - x
>
> 27500000 3090330.63.7stack[index] += amount
>
> 28 200000022958031.1 27.8print(stack[-1])
>
>
> which shows that there's no obvious bottleneck (line by line) here
> (for my sample data).
>
> Note the print() overhead dominates the runtime, and that's with me
> piping the output to /dev/null directly.
>
> I had a go at using arrays, deques, and numpy arrays in various ways
> without luck, but we're getting fairly close to the native python
> statement execution overhead here (hence folding it all into one
> function).
>
> My only thoughts would be to see if there were some magic that could
> be done by offloading the work onto a non-python library somehow.
>
> Another thing that might help some situations (hence my previous
> questions) would be to implement the add_to_first_n as a lazy operator
> (i.e. have a stack of the add_to_first_n values and dynamically add to
> the results of pop() but that would proabably be much slow in the
> average case.
>
> Steve
>
> On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley <tartley at tartley.com
> <mailto:tartley at tartley.com>> wrote:
>
> Hey.
>
> Thanks for engaging, but I can't help with the most important of
> those questions - the large data sets on which my solution failed
> due to timeout are hidden from candidates. Not unreasonable to
> assume that they do exercise deep stacks, and large args to
> add_to_first_n, etc.
>
> Yes, the input looks exactly like your example. All args are
> integers. The question asked for output corresponding to the top
> of the stack after every operation. I omitted this print from
> inside the 'for' loop in 'main', thinking it irrelevant.
>
> I converted to integers inside 'dispatch'. 'args' must have
> actually been created with:
>
> args = [int(i) for i in tokens[1:]]
>
> Where len(tokens) is never going to be bigger than 3.
>
> Return values (from 'pop') were unused.
>
>
> On 6/7/2017 13:25, Stestagg wrote:
>> Do you have any more context?
>> For example, is the add_to_first_n likely to be called with very
>> large numbers, or very often? Does the stack get very deep, or
>> stay shallow?
>>
>> I'm assuming that lines look like this:
>>
>> push 1
>> push 2
>> add_to_first_n 2 10
>> pop
>> pop
>>
>> with all arguments as integers, and the final value being
>> returned from main()?
>> How did you convert from string inputs to numeric values?
>> How did you manage return values?
>>
>> :D
>>
>> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley
>> <tartley at tartley.com <mailto:tartley at tartley.com>> wrote:
>>
>> I recently submitted a solution to a coding challenge, in an
>> employment context. One of the questions was to model a
>> simple stack. I wrote a solution which appended and popped
>> from the end of a list. This worked, but failed with timeouts
>> on their last few automated tests with large (hidden) data sets.
>>
>> From memory, I think I had something pretty standard:
>>
>> class Stack:
>>
>> def __init__(self):
>> self.storage = []
>>
>> def push(arg):
>> self.storage.append(arg)
>>
>> def pop():
>> return self.storage.pop() if self.storage else None
>>
>> def add_to_first_n(n, amount):
>> for n in range(n):
>> self.storage[n] += amount
>>
>> def dispatch(self, line)
>> tokens = line.split()
>> method = getattr(self, tokens[0])
>> args = tokens[1:]
>> method(*args)
>>
>> def main(lines):
>> stack = Stack()
>> for line in lines:
>> stack.dispatch(line)
>>
>>
>> (will that formatting survive? Apologies if not)
>>
>> Subsequent experiments have confirmed that appending to and
>> popping from the end of lists are O(1), amortized.
>>
>> So why is my solution too slow?
>>
>> This question was against the clock, 4th question of 4 in an
>> hour. So I wasn't expecting to produce Cython or C optimised
>> code in that timeframe (Besides, my submitted .py file runs
>> on their servers, so the environment is limited.)
>>
>> So what am I missing, from a performance perspective? Are
>> there other data structures in stdlib which are also O(1) but
>> with a better constant?
>>
>> Ah. In writing this out, I have begun to suspect that my
>> slicing of 'tokens' to produce 'args' in the dispatch is
>> needlessly wasting time. Not much, but some.
>>
>> Thoughts welcome,
>>
>> Jonathan
>>
>> --
>> Jonathan Hartleytartley at tartley.com <mailto:tartley at tartley.com> http://tartley.com
>> Made out of meat.+1 507-513-1101 <tel:%28507%29%20513-1101> twitter/skype: tartley
>>
>> _______________________________________________
>> python-uk mailing list
>> python-uk at python.org <mailto:python-uk at python.org>
>> https://mail.python.org/mailman/listinfo/python-uk
>>
>>
>>
>> _______________________________________________
>> python-uk mailing list
>> python-uk at python.org <mailto:python-uk at python.org>
>> https://mail.python.org/mailman/listinfo/python-uk
>
> --
> Jonathan Hartleytartley at tartley.com <mailto:tartley at tartley.com> http://tartley.com
> Made out of meat.+1 507-513-1101 <tel:%28507%29%20513-1101> twitter/skype: tartley
>
> _______________________________________________
> python-uk mailing list
> python-uk at python.org <mailto:python-uk at python.org>
> https://mail.python.org/mailman/listinfo/python-uk
>
>
>
> _______________________________________________
> python-uk mailing list
> python-uk at python.org
> https://mail.python.org/mailman/listinfo/python-uk
--
Jonathan Hartley tartley at tartley.com http://tartley.com
Made out of meat. +1 507-513-1101 twitter/skype: tartley
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-uk/attachments/20170608/347d5229/attachment-0001.html>
More information about the python-uk
mailing list