From mail at timgolden.me.uk Thu Jun 1 05:38:50 2017
From: mail at timgolden.me.uk (Tim Golden)
Date: Thu, 1 Jun 2017 10:38:50 +0100
Subject: [python-uk] Twickenham Coding Evening: an advert
Message-ID:

If you're an educator, or know someone who is, in the South/West London area
and need help with teaching IT in schools, feel free to come along to the
Twickenham Coding Evening for help. (It mostly doesn't involve coding and
isn't currently in Twickenham, but still: just come.)

Here's a blog post of mine, talking around the subject a little:

http://ramblings.timgolden.me.uk/2017/05/31/helping-educators/

and the event is on EventBrite:

https://www.eventbrite.co.uk/e/twickenham-coding-evening-june-2017-tickets-34906741002

TJG

From tartley at tartley.com Wed Jun 7 13:50:40 2017
From: tartley at tartley.com (Jonathan Hartley)
Date: Wed, 7 Jun 2017 12:50:40 -0500
Subject: [python-uk] A stack with better performance than using a list
Message-ID:

I recently submitted a solution to a coding challenge, in an employment
context. One of the questions was to model a simple stack. I wrote a
solution which appended and popped from the end of a list. This worked, but
failed with timeouts on their last few automated tests with large (hidden)
data sets.

From memory, I think I had something pretty standard:

class Stack:

    def __init__(self):
        self.storage = []

    def push(self, arg):
        self.storage.append(arg)

    def pop(self):
        return self.storage.pop() if self.storage else None

    def add_to_first_n(self, n, amount):
        # add 'amount' to the first (bottom) n elements
        for n in range(n):
            self.storage[n] += amount

    def dispatch(self, line):
        tokens = line.split()
        method = getattr(self, tokens[0])
        args = tokens[1:]
        method(*args)

def main(lines):
    stack = Stack()
    for line in lines:
        stack.dispatch(line)

(will that formatting survive? Apologies if not.)

Subsequent experiments have confirmed that appending to and popping from the
end of a list are O(1), amortized. So why is my solution too slow?

This question was against the clock, 4th question of 4 in an hour, so I
wasn't expecting to produce Cython or C-optimised code in that timeframe.
(Besides, my submitted .py file runs on their servers, so the environment is
limited.)

So what am I missing, from a performance perspective? Are there other data
structures in the stdlib which are also O(1) but with a better constant?

Ah. In writing this out, I have begun to suspect that my slicing of 'tokens'
to produce 'args' in the dispatch is needlessly wasting time. Not much, but
some.

Thoughts welcome,

Jonathan

--
Jonathan Hartley    tartley at tartley.com    http://tartley.com
Made out of meat.   +1 507-513-1101         twitter/skype: tartley

From stestagg at gmail.com Wed Jun 7 14:25:05 2017
From: stestagg at gmail.com (Stestagg)
Date: Wed, 07 Jun 2017 18:25:05 +0000
Subject: [python-uk] A stack with better performance than using a list
In-Reply-To:
References:
Message-ID:

Do you have any more context? For example, is the add_to_first_n likely to
be called with very large numbers, or very often? Does the stack get very
deep, or stay shallow?

I'm assuming that lines look like this:

push 1
push 2
add_to_first_n 2 10
pop
pop

with all arguments as integers, and the final value being returned from
main()?

How did you convert from string inputs to numeric values?
How did you manage return values?

:D

On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley wrote:

> I recently submitted a solution to a coding challenge, in an employment
> context. One of the questions was to model a simple stack.
I wrote a > solution which appended and popped from the end of a list. This worked, but > failed with timeouts on their last few automated tests with large (hidden) > data sets. > > From memory, I think I had something pretty standard: > > class Stack: > > def __init__(self): > self.storage = [] > > def push(arg): > self.storage.append(arg) > > def pop(): > return self.storage.pop() if self.storage else None > > def add_to_first_n(n, amount): > for n in range(n): > self.storage[n] += amount > > def dispatch(self, line) > tokens = line.split() > method = getattr(self, tokens[0]) > args = tokens[1:] > method(*args) > > def main(lines): > stack = Stack() > for line in lines: > stack.dispatch(line) > > > (will that formatting survive? Apologies if not) > > Subsequent experiments have confirmed that appending to and popping from > the end of lists are O(1), amortized. > So why is my solution too slow? > > This question was against the clock, 4th question of 4 in an hour. So I > wasn't expecting to produce Cython or C optimised code in that timeframe > (Besides, my submitted .py file runs on their servers, so the environment > is limited.) > > So what am I missing, from a performance perspective? Are there other data > structures in stdlib which are also O(1) but with a better constant? > > Ah. In writing this out, I have begun to suspect that my slicing of > 'tokens' to produce 'args' in the dispatch is needlessly wasting time. Not > much, but some. > > Thoughts welcome, > > Jonathan > > -- > Jonathan Hartley tartley at tartley.com http://tartley.com > Made out of meat. +1 507-513-1101 <(507)%20513-1101> twitter/skype: tartley > > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex at moreati.org.uk Wed Jun 7 14:30:08 2017 From: alex at moreati.org.uk (Alex Willmer) Date: Wed, 7 Jun 2017 19:30:08 +0100 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: References: Message-ID: On 7 June 2017 at 18:50, Jonathan Hartley wrote: > Ah. In writing this out, I have begun to suspect that my slicing of 'tokens' > to produce 'args' in the dispatch is needlessly wasting time. Not much, but > some. To put some numbers out there, eliminating the slice is not always a win. On Python 2.7.13 Jonathon's dispatch() is marginally faster for len(line.split()) == 10. The break even is around 40. At 1000 and above dispatch_1() is approx 25% faster. Same code on Python 3.6.1 was approx the same speed (within 10%). 
class Bench: def foo(self): pass def dispatch(self, line): tokens = line.split() method = getattr(self, tokens[0]) args = tokens[1:] #method(*args) def dispatch_1(self, line): op, rest = line.split(None, 1) method = getattr(self, op) args = rest.split() #method(*args) python --version Python 2.7.13 python -mtimeit -s "import bench; b=bench.Bench(); s='foo'+ ' x'*10" "b.dispatch(s)" 1000000 loops, best of 3: 0.577 usec per loop python -mtimeit -s "import bench; b=bench.Bench(); s='foo'+ ' x'*10" "b.dispatch_1(s)" 1000000 loops, best of 3: 0.673 usec per loop python -mtimeit -s "import bench; b=bench.Bench(); s='foo'+ ' x'*1000" "b.dispatch(s)" 100000 loops, best of 3: 19.1 usec per loop python -mtimeit -s "import bench; b=bench.Bench(); s='foo'+ ' x'*1000" "b.dispatch_1(s)" 100000 loops, best of 3: 14.8 usec per loop python -mtimeit -s "import bench; b=bench.Bench(); s='foo'+ ' x'*1000000" "b.dispatch(s)" 100 loops, best of 3: 16.9 msec per loop python -mtimeit -s "import bench; b=bench.Bench(); s='foo'+ ' x'*1000000" "b.dispatch_1(s)" 100 loops, best of 3: 12.4 msec per loop -- Alex Willmer From tom at tatw.name Wed Jun 7 14:31:07 2017 From: tom at tatw.name (Tom Wright) Date: Wed, 7 Jun 2017 19:31:07 +0100 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: References: Message-ID: Algorithms questions are always fun. Quick time to answer before other people! You might be hitting problems with the "amortized part" if their code didn't run for large enough n or used dumb special cases or bounds. They may have (inadvertantly?) meant "realtime constant" (python lists occasionally taking O(n) for a single insert, with the possibility of patterns of push and pop that are O(n) on average if your list frees up memory in a timely fashion). The structure they may well have been after is a (singly) linked list... although you appear to be randomly accessing your stack. To my knowledge there is no built in linked list in the stdlib but it's not exactly hard to write one. I would be unsurprised if other languages used a link list for their abstract stack (or a linked list of arrays) Fun fact of the day: closed over variables are internally stored in a linked list (last time looked). Enthusiastically, Uncle Tom On 7 Jun 2017 6:50 p.m., "Jonathan Hartley" wrote: I recently submitted a solution to a coding challenge, in an employment context. One of the questions was to model a simple stack. I wrote a solution which appended and popped from the end of a list. This worked, but failed with timeouts on their last few automated tests with large (hidden) data sets. >From memory, I think I had something pretty standard: class Stack: def __init__(self): self.storage = [] def push(arg): self.storage.append(arg) def pop(): return self.storage.pop() if self.storage else None def add_to_first_n(n, amount): for n in range(n): self.storage[n] += amount def dispatch(self, line) tokens = line.split() method = getattr(self, tokens[0]) args = tokens[1:] method(*args) def main(lines): stack = Stack() for line in lines: stack.dispatch(line) (will that formatting survive? Apologies if not) Subsequent experiments have confirmed that appending to and popping from the end of lists are O(1), amortized. So why is my solution too slow? This question was against the clock, 4th question of 4 in an hour. So I wasn't expecting to produce Cython or C optimised code in that timeframe (Besides, my submitted .py file runs on their servers, so the environment is limited.) 
So what am I missing, from a performance perspective? Are there other data structures in stdlib which are also O(1) but with a better constant? Ah. In writing this out, I have begun to suspect that my slicing of 'tokens' to produce 'args' in the dispatch is needlessly wasting time. Not much, but some. Thoughts welcome, Jonathan -- Jonathan Hartley tartley at tartley.com http://tartley.com Made out of meat. +1 507-513-1101 <(507)%20513-1101> twitter/skype: tartley _______________________________________________ python-uk mailing list python-uk at python.org https://mail.python.org/mailman/listinfo/python-uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From tartley at tartley.com Wed Jun 7 14:33:30 2017 From: tartley at tartley.com (Jonathan Hartley) Date: Wed, 7 Jun 2017 13:33:30 -0500 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: References: Message-ID: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> Hey. Thanks for engaging, but I can't help with the most important of those questions - the large data sets on which my solution failed due to timeout are hidden from candidates. Not unreasonable to assume that they do exercise deep stacks, and large args to add_to_first_n, etc. Yes, the input looks exactly like your example. All args are integers. The question asked for output corresponding to the top of the stack after every operation. I omitted this print from inside the 'for' loop in 'main', thinking it irrelevant. I converted to integers inside 'dispatch'. 'args' must have actually been created with: args = [int(i) for i in tokens[1:]] Where len(tokens) is never going to be bigger than 3. Return values (from 'pop') were unused. On 6/7/2017 13:25, Stestagg wrote: > Do you have any more context? > For example, is the add_to_first_n likely to be called with very large > numbers, or very often? Does the stack get very deep, or stay shallow? > > I'm assuming that lines look like this: > > push 1 > push 2 > add_to_first_n 2 10 > pop > pop > > with all arguments as integers, and the final value being returned > from main()? > How did you convert from string inputs to numeric values? > How did you manage return values? > > :D > > On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley > wrote: > > I recently submitted a solution to a coding challenge, in an > employment context. One of the questions was to model a simple > stack. I wrote a solution which appended and popped from the end > of a list. This worked, but failed with timeouts on their last few > automated tests with large (hidden) data sets. > > From memory, I think I had something pretty standard: > > class Stack: > > def __init__(self): > self.storage = [] > > def push(arg): > self.storage.append(arg) > > def pop(): > return self.storage.pop() if self.storage else None > > def add_to_first_n(n, amount): > for n in range(n): > self.storage[n] += amount > > def dispatch(self, line) > tokens = line.split() > method = getattr(self, tokens[0]) > args = tokens[1:] > method(*args) > > def main(lines): > stack = Stack() > for line in lines: > stack.dispatch(line) > > > (will that formatting survive? Apologies if not) > > Subsequent experiments have confirmed that appending to and > popping from the end of lists are O(1), amortized. > > So why is my solution too slow? > > This question was against the clock, 4th question of 4 in an hour. 
> So I wasn't expecting to produce Cython or C optimised code in > that timeframe (Besides, my submitted .py file runs on their > servers, so the environment is limited.) > > So what am I missing, from a performance perspective? Are there > other data structures in stdlib which are also O(1) but with a > better constant? > > Ah. In writing this out, I have begun to suspect that my slicing > of 'tokens' to produce 'args' in the dispatch is needlessly > wasting time. Not much, but some. > > Thoughts welcome, > > Jonathan > > -- > Jonathan Hartleytartley at tartley.com http://tartley.com > Made out of meat.+1 507-513-1101 twitter/skype: tartley > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > > > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk -- Jonathan Hartley tartley at tartley.com http://tartley.com Made out of meat. +1 507-513-1101 twitter/skype: tartley -------------- next part -------------- An HTML attachment was scrubbed... URL: From simon at fastmail.to Thu Jun 8 04:30:50 2017 From: simon at fastmail.to (Simon Hayward) Date: Thu, 08 Jun 2017 09:30:50 +0100 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> Message-ID: <1496910650.2278421.1002665608.5049EA9A@webmail.messagingengine.com> Rather than using a list, aren't deques more appropriate as a data structure for stack like behaviour. https://docs.python.org/3.6/library/collections.html#collections.deque Regards Simon On Wed, 7 Jun 2017, at 19:33, Jonathan Hartley wrote: > Hey. > > Thanks for engaging, but I can't help with the most important of those > questions - the large data sets on which my solution failed due to > timeout are hidden from candidates. Not unreasonable to assume that they > do exercise deep stacks, and large args to add_to_first_n, etc. > > Yes, the input looks exactly like your example. All args are integers. > The question asked for output corresponding to the top of the stack > after every operation. I omitted this print from inside the 'for' loop > in 'main', thinking it irrelevant. > > I converted to integers inside 'dispatch'. 'args' must have actually > been created with: > > args = [int(i) for i in tokens[1:]] > > Where len(tokens) is never going to be bigger than 3. > > Return values (from 'pop') were unused. > > > On 6/7/2017 13:25, Stestagg wrote: > > Do you have any more context? > > For example, is the add_to_first_n likely to be called with very large > > numbers, or very often? Does the stack get very deep, or stay shallow? > > > > I'm assuming that lines look like this: > > > > push 1 > > push 2 > > add_to_first_n 2 10 > > pop > > pop > > > > with all arguments as integers, and the final value being returned > > from main()? > > How did you convert from string inputs to numeric values? > > How did you manage return values? > > > > :D > > > > On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley > > wrote: > > > > I recently submitted a solution to a coding challenge, in an > > employment context. One of the questions was to model a simple > > stack. I wrote a solution which appended and popped from the end > > of a list. This worked, but failed with timeouts on their last few > > automated tests with large (hidden) data sets. 
> > > > From memory, I think I had something pretty standard: > > > > class Stack: > > > > def __init__(self): > > self.storage = [] > > > > def push(arg): > > self.storage.append(arg) > > > > def pop(): > > return self.storage.pop() if self.storage else None > > > > def add_to_first_n(n, amount): > > for n in range(n): > > self.storage[n] += amount > > > > def dispatch(self, line) > > tokens = line.split() > > method = getattr(self, tokens[0]) > > args = tokens[1:] > > method(*args) > > > > def main(lines): > > stack = Stack() > > for line in lines: > > stack.dispatch(line) > > > > > > (will that formatting survive? Apologies if not) > > > > Subsequent experiments have confirmed that appending to and > > popping from the end of lists are O(1), amortized. > > > > So why is my solution too slow? > > > > This question was against the clock, 4th question of 4 in an hour. > > So I wasn't expecting to produce Cython or C optimised code in > > that timeframe (Besides, my submitted .py file runs on their > > servers, so the environment is limited.) > > > > So what am I missing, from a performance perspective? Are there > > other data structures in stdlib which are also O(1) but with a > > better constant? > > > > Ah. In writing this out, I have begun to suspect that my slicing > > of 'tokens' to produce 'args' in the dispatch is needlessly > > wasting time. Not much, but some. > > > > Thoughts welcome, > > > > Jonathan > > > > -- > > Jonathan Hartleytartley at tartley.com http://tartley.com > > Made out of meat.+1 507-513-1101 twitter/skype: tartley > > > > _______________________________________________ > > python-uk mailing list > > python-uk at python.org > > https://mail.python.org/mailman/listinfo/python-uk > > > > > > > > _______________________________________________ > > python-uk mailing list > > python-uk at python.org > > https://mail.python.org/mailman/listinfo/python-uk > > -- > Jonathan Hartley tartley at tartley.com http://tartley.com > Made out of meat. +1 507-513-1101 twitter/skype: tartley > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk -- Simon simon at fastmail.to From andy at reportlab.com Thu Jun 8 04:48:07 2017 From: andy at reportlab.com (Andy Robinson) Date: Thu, 8 Jun 2017 09:48:07 +0100 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: <1496910650.2278421.1002665608.5049EA9A@webmail.messagingengine.com> References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> <1496910650.2278421.1002665608.5049EA9A@webmail.messagingengine.com> Message-ID: Are you sure that their test infrastructure was behaving correctly? Is it widely used, day in day out, by thousands, and known to be reliable? Did your colleagues all brag "no problem"? Or is it possible that the whole execution framework threw a momentary wobbly while trying to load up some large text file off some remote cloud? Andy From stestagg at gmail.com Thu Jun 8 04:54:14 2017 From: stestagg at gmail.com (Stestagg) Date: Thu, 08 Jun 2017 08:54:14 +0000 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> Message-ID: I honestly can't see a way to improve this in python. 
My best solution is:

def main(lines):
    stack = []
    sa = stack.append
    sp = stack.pop
    si = stack.__getitem__
    for line in lines:
        meth = line[:3]
        if meth == b'pus':
            sa(int(line[5:]))
        elif meth == b'pop':
            sp()
        else:
            parts = line[15:].split()
            end = len(stack)-1
            amount = int(parts[1])
            for x in range(int(parts[0])):
                index = end - x
                stack[index] += amount
        print(stack[-1] if stack else None)

which comes out about 25% faster than your solution.

One tool that's interesting to use here is line_profiler:
https://github.com/rkern/line_profiler

Putting a @profile decorator on the above main() call, and running with
kernprof, produces the following output:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    12                                           @profile
    13                                           def main(lines):
    14         1            4      4.0      0.0      stack = []
    15   2000001       949599      0.5     11.5      for line in lines:
    16   2000000      1126944      0.6     13.7          meth = line[:3]
    17   2000000       974635      0.5     11.8          if meth == b'pus':
    18   1000000      1002733      1.0     12.2              stack.append(int(line[5:]))
    19   1000000       478756      0.5      5.8          elif meth == b'pop':
    20    999999       597114      0.6      7.2              stack.pop()
    21                                                   else:
    22         1            6      6.0      0.0              parts = line[15:].split()
    23         1            2      2.0      0.0              end = len(stack)-1
    24         1            1      1.0      0.0              amount = int(parts[1])
    25    500001       241227      0.5      2.9              for x in range(int(parts[0])):
    26    500000       273477      0.5      3.3                  index = end - x
    27    500000       309033      0.6      3.7                  stack[index] += amount
    28   2000000      2295803      1.1     27.8          print(stack[-1])

which shows that there's no obvious bottleneck (line by line) here (for my
sample data).

Note the print() overhead dominates the runtime, and that's with me piping
the output to /dev/null directly.

I had a go at using arrays, deques, and numpy arrays in various ways without
luck, but we're getting fairly close to the native Python statement
execution overhead here (hence folding it all into one function).

My only thought would be to see if there were some magic that could be done
by offloading the work onto a non-Python library somehow.

Another thing that might help some situations (hence my previous questions)
would be to implement add_to_first_n as a lazy operator (i.e. have a stack
of the add_to_first_n values and dynamically add to the results of pop()),
but that would probably be much slower in the average case.

Steve

On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley wrote:

> Hey.
>
> Thanks for engaging, but I can't help with the most important of those
> questions - the large data sets on which my solution failed due to timeout
> are hidden from candidates. Not unreasonable to assume that they do
> exercise deep stacks, and large args to add_to_first_n, etc.
>
> Yes, the input looks exactly like your example. All args are integers. The
> question asked for output corresponding to the top of the stack after every
> operation. I omitted this print from inside the 'for' loop in 'main',
> thinking it irrelevant.
>
> I converted to integers inside 'dispatch'. 'args' must have actually been
> created with:
>
> args = [int(i) for i in tokens[1:]]
>
> Where len(tokens) is never going to be bigger than 3.
>
> Return values (from 'pop') were unused.
>
> On 6/7/2017 13:25, Stestagg wrote:
> > Do you have any more context?
> > For example, is the add_to_first_n likely to be called with very large
> > numbers, or very often? Does the stack get very deep, or stay shallow?
> >
> > I'm assuming that lines look like this:
> >
> > push 1
> > push 2
> > add_to_first_n 2 10
> > pop
> > pop
> >
> > with all arguments as integers, and the final value being returned
> > from main()?
> How did you convert from string inputs to numeric values?
> How did you manage return values? > > :D > > On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley > wrote: > >> I recently submitted a solution to a coding challenge, in an employment >> context. One of the questions was to model a simple stack. I wrote a >> solution which appended and popped from the end of a list. This worked, but >> failed with timeouts on their last few automated tests with large (hidden) >> data sets. >> >> From memory, I think I had something pretty standard: >> >> class Stack: >> >> def __init__(self): >> self.storage = [] >> >> def push(arg): >> self.storage.append(arg) >> >> def pop(): >> return self.storage.pop() if self.storage else None >> >> def add_to_first_n(n, amount): >> for n in range(n): >> self.storage[n] += amount >> >> def dispatch(self, line) >> tokens = line.split() >> method = getattr(self, tokens[0]) >> args = tokens[1:] >> method(*args) >> >> def main(lines): >> stack = Stack() >> for line in lines: >> stack.dispatch(line) >> >> >> (will that formatting survive? Apologies if not) >> >> Subsequent experiments have confirmed that appending to and popping from >> the end of lists are O(1), amortized. >> So why is my solution too slow? >> >> This question was against the clock, 4th question of 4 in an hour. So I >> wasn't expecting to produce Cython or C optimised code in that timeframe >> (Besides, my submitted .py file runs on their servers, so the environment >> is limited.) >> >> So what am I missing, from a performance perspective? Are there other >> data structures in stdlib which are also O(1) but with a better constant? >> >> Ah. In writing this out, I have begun to suspect that my slicing of >> 'tokens' to produce 'args' in the dispatch is needlessly wasting time. Not >> much, but some. >> >> Thoughts welcome, >> >> Jonathan >> >> -- >> Jonathan Hartley tartley at tartley.com http://tartley.com >> Made out of meat. +1 507-513-1101 <%28507%29%20513-1101> twitter/skype: tartley >> >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk >> > > > _______________________________________________ > python-uk mailing listpython-uk at python.orghttps://mail.python.org/mailman/listinfo/python-uk > > > -- > Jonathan Hartley tartley at tartley.com http://tartley.com > Made out of meat. +1 507-513-1101 <(507)%20513-1101> twitter/skype: tartley > > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > -------------- next part -------------- An HTML attachment was scrubbed... URL: From toby at tarind.com Thu Jun 8 05:33:05 2017 From: toby at tarind.com (Toby Dickenson) Date: Thu, 8 Jun 2017 10:33:05 +0100 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> Message-ID: In python 2, your use of range() without checking for a very large parameter n might cause either a MemoryError exception, or trigger a huge memory allocation just for the range list. Not a problem in python 3 of course. On 8 June 2017 at 09:54, Stestagg wrote: > I honestly can't see a way to improve this in python. 
My best solution is: > > def main(lines): > stack = [] > sa = stack.append > sp = stack.pop > si = stack.__getitem__ > for line in lines: > meth = line[:3] > if meth == b'pus': > sa(int(line[5:])) > elif meth == b'pop': > sp() > else: > parts = line[15:].split() > end = len(stack)-1 > amount = int(parts[1]) > for x in range(int(parts[0])): > index = end - x > stack[index] += amount > print(stack[-1] if stack else None) > > which comes out about 25% faster than your solution. > > One tool that's interesting to use here is: line_profiler: > https://github.com/rkern/line_profiler > > putting a @profile decorator on the above main() call, and running with > kernprof produces the following output: > > Line # Hits Time Per Hit % Time Line Contents > > ============================================================== > > 12 @profile > > 13 def main(lines): > > 14 1 4 4.0 0.0 stack = [] > > 15 2000001 949599 0.5 11.5 for line in lines: > > 16 2000000 1126944 0.6 13.7 meth = line[:3] > > 17 2000000 974635 0.5 11.8 if meth == b'pus': > > 18 1000000 1002733 1.0 12.2 > stack.append(int(line[5:])) > > 19 1000000 478756 0.5 5.8 elif meth == > b'pop': > > 20 999999 597114 0.6 7.2 stack.pop() > > 21 else: > > 22 1 6 6.0 0.0 parts = > line[15:].split() > > 23 1 2 2.0 0.0 end = > len(stack)-1 > > 24 1 1 1.0 0.0 amount = > int(parts[1]) > > 25 500001 241227 0.5 2.9 for x in > range(int(parts[0])): > > 26 500000 273477 0.5 3.3 index = end > - x > > 27 500000 309033 0.6 3.7 > stack[index] += amount > > 28 2000000 2295803 1.1 27.8 print(stack[-1]) > > > which shows that there's no obvious bottleneck (line by line) here (for my > sample data). > > Note the print() overhead dominates the runtime, and that's with me piping > the output to /dev/null directly. > > I had a go at using arrays, deques, and numpy arrays in various ways without > luck, but we're getting fairly close to the native python statement > execution overhead here (hence folding it all into one function). > > My only thoughts would be to see if there were some magic that could be done > by offloading the work onto a non-python library somehow. > > Another thing that might help some situations (hence my previous questions) > would be to implement the add_to_first_n as a lazy operator (i.e. have a > stack of the add_to_first_n values and dynamically add to the results of > pop() but that would proabably be much slow in the average case. > > Steve > > On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley wrote: >> >> Hey. >> >> Thanks for engaging, but I can't help with the most important of those >> questions - the large data sets on which my solution failed due to timeout >> are hidden from candidates. Not unreasonable to assume that they do exercise >> deep stacks, and large args to add_to_first_n, etc. >> >> Yes, the input looks exactly like your example. All args are integers. The >> question asked for output corresponding to the top of the stack after every >> operation. I omitted this print from inside the 'for' loop in 'main', >> thinking it irrelevant. >> >> I converted to integers inside 'dispatch'. 'args' must have actually been >> created with: >> >> args = [int(i) for i in tokens[1:]] >> >> Where len(tokens) is never going to be bigger than 3. >> >> Return values (from 'pop') were unused. >> >> >> On 6/7/2017 13:25, Stestagg wrote: >> >> Do you have any more context? >> For example, is the add_to_first_n likely to be called with very large >> numbers, or very often? Does the stack get very deep, or stay shallow? 
>> >> I'm assuming that lines look like this: >> >> push 1 >> push 2 >> add_to_first_n 2 10 >> pop >> pop >> >> with all arguments as integers, and the final value being returned from >> main()? >> How did you convert from string inputs to numeric values? >> How did you manage return values? >> >> :D >> >> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley >> wrote: >>> >>> I recently submitted a solution to a coding challenge, in an employment >>> context. One of the questions was to model a simple stack. I wrote a >>> solution which appended and popped from the end of a list. This worked, but >>> failed with timeouts on their last few automated tests with large (hidden) >>> data sets. >>> >>> From memory, I think I had something pretty standard: >>> >>> class Stack: >>> >>> def __init__(self): >>> self.storage = [] >>> >>> def push(arg): >>> self.storage.append(arg) >>> >>> def pop(): >>> return self.storage.pop() if self.storage else None >>> >>> def add_to_first_n(n, amount): >>> for n in range(n): >>> self.storage[n] += amount >>> >>> def dispatch(self, line) >>> tokens = line.split() >>> method = getattr(self, tokens[0]) >>> args = tokens[1:] >>> method(*args) >>> >>> def main(lines): >>> stack = Stack() >>> for line in lines: >>> stack.dispatch(line) >>> >>> >>> (will that formatting survive? Apologies if not) >>> >>> Subsequent experiments have confirmed that appending to and popping from >>> the end of lists are O(1), amortized. >>> >>> So why is my solution too slow? >>> >>> This question was against the clock, 4th question of 4 in an hour. So I >>> wasn't expecting to produce Cython or C optimised code in that timeframe >>> (Besides, my submitted .py file runs on their servers, so the environment is >>> limited.) >>> >>> So what am I missing, from a performance perspective? Are there other >>> data structures in stdlib which are also O(1) but with a better constant? >>> >>> Ah. In writing this out, I have begun to suspect that my slicing of >>> 'tokens' to produce 'args' in the dispatch is needlessly wasting time. Not >>> much, but some. >>> >>> Thoughts welcome, >>> >>> Jonathan >>> >>> -- >>> Jonathan Hartley tartley at tartley.com http://tartley.com >>> Made out of meat. +1 507-513-1101 twitter/skype: tartley >>> >>> _______________________________________________ >>> python-uk mailing list >>> python-uk at python.org >>> https://mail.python.org/mailman/listinfo/python-uk >> >> >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk >> >> >> -- >> Jonathan Hartley tartley at tartley.com http://tartley.com >> Made out of meat. 
+1 507-513-1101 twitter/skype: tartley >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk > > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > From tartley at tartley.com Thu Jun 8 11:09:12 2017 From: tartley at tartley.com (Jonathan Hartley) Date: Thu, 8 Jun 2017 10:09:12 -0500 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: <1496910650.2278421.1002665608.5049EA9A@webmail.messagingengine.com> References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> <1496910650.2278421.1002665608.5049EA9A@webmail.messagingengine.com> Message-ID: I wondered about that too, but decided (without measuring) that it is no better. A deque allows us to append and pop elements from both ends, but the question didn't require that, it only needed from one end, which a list provides at O(1). On 6/8/2017 03:30, Simon Hayward wrote: > Rather than using a list, aren't deques more appropriate as a data > structure for stack like behaviour. > > https://docs.python.org/3.6/library/collections.html#collections.deque > > > Regards > Simon > > > On Wed, 7 Jun 2017, at 19:33, Jonathan Hartley wrote: >> Hey. >> >> Thanks for engaging, but I can't help with the most important of those >> questions - the large data sets on which my solution failed due to >> timeout are hidden from candidates. Not unreasonable to assume that they >> do exercise deep stacks, and large args to add_to_first_n, etc. >> >> Yes, the input looks exactly like your example. All args are integers. >> The question asked for output corresponding to the top of the stack >> after every operation. I omitted this print from inside the 'for' loop >> in 'main', thinking it irrelevant. >> >> I converted to integers inside 'dispatch'. 'args' must have actually >> been created with: >> >> args = [int(i) for i in tokens[1:]] >> >> Where len(tokens) is never going to be bigger than 3. >> >> Return values (from 'pop') were unused. >> >> >> On 6/7/2017 13:25, Stestagg wrote: >>> Do you have any more context? >>> For example, is the add_to_first_n likely to be called with very large >>> numbers, or very often? Does the stack get very deep, or stay shallow? >>> >>> I'm assuming that lines look like this: >>> >>> push 1 >>> push 2 >>> add_to_first_n 2 10 >>> pop >>> pop >>> >>> with all arguments as integers, and the final value being returned >>> from main()? >>> How did you convert from string inputs to numeric values? >>> How did you manage return values? >>> >>> :D >>> >>> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley >> > wrote: >>> >>> I recently submitted a solution to a coding challenge, in an >>> employment context. One of the questions was to model a simple >>> stack. I wrote a solution which appended and popped from the end >>> of a list. This worked, but failed with timeouts on their last few >>> automated tests with large (hidden) data sets. 
>>> >>> From memory, I think I had something pretty standard: >>> >>> class Stack: >>> >>> def __init__(self): >>> self.storage = [] >>> >>> def push(arg): >>> self.storage.append(arg) >>> >>> def pop(): >>> return self.storage.pop() if self.storage else None >>> >>> def add_to_first_n(n, amount): >>> for n in range(n): >>> self.storage[n] += amount >>> >>> def dispatch(self, line) >>> tokens = line.split() >>> method = getattr(self, tokens[0]) >>> args = tokens[1:] >>> method(*args) >>> >>> def main(lines): >>> stack = Stack() >>> for line in lines: >>> stack.dispatch(line) >>> >>> >>> (will that formatting survive? Apologies if not) >>> >>> Subsequent experiments have confirmed that appending to and >>> popping from the end of lists are O(1), amortized. >>> >>> So why is my solution too slow? >>> >>> This question was against the clock, 4th question of 4 in an hour. >>> So I wasn't expecting to produce Cython or C optimised code in >>> that timeframe (Besides, my submitted .py file runs on their >>> servers, so the environment is limited.) >>> >>> So what am I missing, from a performance perspective? Are there >>> other data structures in stdlib which are also O(1) but with a >>> better constant? >>> >>> Ah. In writing this out, I have begun to suspect that my slicing >>> of 'tokens' to produce 'args' in the dispatch is needlessly >>> wasting time. Not much, but some. >>> >>> Thoughts welcome, >>> >>> Jonathan >>> >>> -- >>> Jonathan Hartleytartley at tartley.com http://tartley.com >>> Made out of meat.+1 507-513-1101 twitter/skype: tartley >>> >>> _______________________________________________ >>> python-uk mailing list >>> python-uk at python.org >>> https://mail.python.org/mailman/listinfo/python-uk >>> >>> >>> >>> _______________________________________________ >>> python-uk mailing list >>> python-uk at python.org >>> https://mail.python.org/mailman/listinfo/python-uk >> -- >> Jonathan Hartley tartley at tartley.com http://tartley.com >> Made out of meat. +1 507-513-1101 twitter/skype: tartley >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk > -- Jonathan Hartley tartley at tartley.com http://tartley.com Made out of meat. +1 507-513-1101 twitter/skype: tartley -------------- next part -------------- An HTML attachment was scrubbed... URL: From tartley at tartley.com Thu Jun 8 11:41:57 2017 From: tartley at tartley.com (Jonathan Hartley) Date: Thu, 8 Jun 2017 10:41:57 -0500 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> <1496910650.2278421.1002665608.5049EA9A@webmail.messagingengine.com> Message-ID: <4718f5b2-a8a0-2f01-8c13-c56200e5935b@tartley.com> I cannot be sure. It is certainly used by many people. They are competent in that it is a comprehensive online framework, allowing candidates to submit solutions using an online editor, in any one of about ten different languages. They are so large that there was no obvious way to talk to anyone about my individual experience. I don't knowingly know any other candidates who have submitted. I don't want to identify them because the first step of the quiz is to accept the T&C that you won't reveal the answers to others (oops.), but suffice to say they are very large Indeed. On 6/8/2017 03:48, Andy Robinson wrote: > Are you sure that their test infrastructure was behaving correctly? 
> Is it widely used, day in day out, by thousands, and known to be > reliable? Did your colleagues all brag "no problem"? Or is it > possible that the whole execution framework threw a momentary wobbly > while trying to load up some large text file off some remote cloud? > > Andy > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk -- Jonathan Hartley tartley at tartley.com http://tartley.com Made out of meat. +1 507-513-1101 twitter/skype: tartley -------------- next part -------------- An HTML attachment was scrubbed... URL: From tartley at tartley.com Thu Jun 8 11:46:53 2017 From: tartley at tartley.com (Jonathan Hartley) Date: Thu, 8 Jun 2017 10:46:53 -0500 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> Message-ID: Good point. FWIW, my submission was running Python 3. On 6/8/2017 04:33, Toby Dickenson wrote: > In python 2, your use of range() without checking for a very large > parameter n might cause either a MemoryError exception, or trigger a > huge memory allocation just for the range list. Not a problem in > python 3 of course. > > > On 8 June 2017 at 09:54, Stestagg wrote: >> I honestly can't see a way to improve this in python. My best solution is: >> >> def main(lines): >> stack = [] >> sa = stack.append >> sp = stack.pop >> si = stack.__getitem__ >> for line in lines: >> meth = line[:3] >> if meth == b'pus': >> sa(int(line[5:])) >> elif meth == b'pop': >> sp() >> else: >> parts = line[15:].split() >> end = len(stack)-1 >> amount = int(parts[1]) >> for x in range(int(parts[0])): >> index = end - x >> stack[index] += amount >> print(stack[-1] if stack else None) >> >> which comes out about 25% faster than your solution. >> >> One tool that's interesting to use here is: line_profiler: >> https://github.com/rkern/line_profiler >> >> putting a @profile decorator on the above main() call, and running with >> kernprof produces the following output: >> >> Line # Hits Time Per Hit % Time Line Contents >> >> ============================================================== >> >> 12 @profile >> >> 13 def main(lines): >> >> 14 1 4 4.0 0.0 stack = [] >> >> 15 2000001 949599 0.5 11.5 for line in lines: >> >> 16 2000000 1126944 0.6 13.7 meth = line[:3] >> >> 17 2000000 974635 0.5 11.8 if meth == b'pus': >> >> 18 1000000 1002733 1.0 12.2 >> stack.append(int(line[5:])) >> >> 19 1000000 478756 0.5 5.8 elif meth == >> b'pop': >> >> 20 999999 597114 0.6 7.2 stack.pop() >> >> 21 else: >> >> 22 1 6 6.0 0.0 parts = >> line[15:].split() >> >> 23 1 2 2.0 0.0 end = >> len(stack)-1 >> >> 24 1 1 1.0 0.0 amount = >> int(parts[1]) >> >> 25 500001 241227 0.5 2.9 for x in >> range(int(parts[0])): >> >> 26 500000 273477 0.5 3.3 index = end >> - x >> >> 27 500000 309033 0.6 3.7 >> stack[index] += amount >> >> 28 2000000 2295803 1.1 27.8 print(stack[-1]) >> >> >> which shows that there's no obvious bottleneck (line by line) here (for my >> sample data). >> >> Note the print() overhead dominates the runtime, and that's with me piping >> the output to /dev/null directly. >> >> I had a go at using arrays, deques, and numpy arrays in various ways without >> luck, but we're getting fairly close to the native python statement >> execution overhead here (hence folding it all into one function). 
>> >> My only thoughts would be to see if there were some magic that could be done >> by offloading the work onto a non-python library somehow. >> >> Another thing that might help some situations (hence my previous questions) >> would be to implement the add_to_first_n as a lazy operator (i.e. have a >> stack of the add_to_first_n values and dynamically add to the results of >> pop() but that would proabably be much slow in the average case. >> >> Steve >> >> On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley wrote: >>> Hey. >>> >>> Thanks for engaging, but I can't help with the most important of those >>> questions - the large data sets on which my solution failed due to timeout >>> are hidden from candidates. Not unreasonable to assume that they do exercise >>> deep stacks, and large args to add_to_first_n, etc. >>> >>> Yes, the input looks exactly like your example. All args are integers. The >>> question asked for output corresponding to the top of the stack after every >>> operation. I omitted this print from inside the 'for' loop in 'main', >>> thinking it irrelevant. >>> >>> I converted to integers inside 'dispatch'. 'args' must have actually been >>> created with: >>> >>> args = [int(i) for i in tokens[1:]] >>> >>> Where len(tokens) is never going to be bigger than 3. >>> >>> Return values (from 'pop') were unused. >>> >>> >>> On 6/7/2017 13:25, Stestagg wrote: >>> >>> Do you have any more context? >>> For example, is the add_to_first_n likely to be called with very large >>> numbers, or very often? Does the stack get very deep, or stay shallow? >>> >>> I'm assuming that lines look like this: >>> >>> push 1 >>> push 2 >>> add_to_first_n 2 10 >>> pop >>> pop >>> >>> with all arguments as integers, and the final value being returned from >>> main()? >>> How did you convert from string inputs to numeric values? >>> How did you manage return values? >>> >>> :D >>> >>> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley >>> wrote: >>>> I recently submitted a solution to a coding challenge, in an employment >>>> context. One of the questions was to model a simple stack. I wrote a >>>> solution which appended and popped from the end of a list. This worked, but >>>> failed with timeouts on their last few automated tests with large (hidden) >>>> data sets. >>>> >>>> From memory, I think I had something pretty standard: >>>> >>>> class Stack: >>>> >>>> def __init__(self): >>>> self.storage = [] >>>> >>>> def push(arg): >>>> self.storage.append(arg) >>>> >>>> def pop(): >>>> return self.storage.pop() if self.storage else None >>>> >>>> def add_to_first_n(n, amount): >>>> for n in range(n): >>>> self.storage[n] += amount >>>> >>>> def dispatch(self, line) >>>> tokens = line.split() >>>> method = getattr(self, tokens[0]) >>>> args = tokens[1:] >>>> method(*args) >>>> >>>> def main(lines): >>>> stack = Stack() >>>> for line in lines: >>>> stack.dispatch(line) >>>> >>>> >>>> (will that formatting survive? Apologies if not) >>>> >>>> Subsequent experiments have confirmed that appending to and popping from >>>> the end of lists are O(1), amortized. >>>> >>>> So why is my solution too slow? >>>> >>>> This question was against the clock, 4th question of 4 in an hour. So I >>>> wasn't expecting to produce Cython or C optimised code in that timeframe >>>> (Besides, my submitted .py file runs on their servers, so the environment is >>>> limited.) >>>> >>>> So what am I missing, from a performance perspective? Are there other >>>> data structures in stdlib which are also O(1) but with a better constant? 
>>>> >>>> Ah. In writing this out, I have begun to suspect that my slicing of >>>> 'tokens' to produce 'args' in the dispatch is needlessly wasting time. Not >>>> much, but some. >>>> >>>> Thoughts welcome, >>>> >>>> Jonathan >>>> >>>> -- >>>> Jonathan Hartley tartley at tartley.com http://tartley.com >>>> Made out of meat. +1 507-513-1101 twitter/skype: tartley >>>> >>>> _______________________________________________ >>>> python-uk mailing list >>>> python-uk at python.org >>>> https://mail.python.org/mailman/listinfo/python-uk >>> >>> >>> _______________________________________________ >>> python-uk mailing list >>> python-uk at python.org >>> https://mail.python.org/mailman/listinfo/python-uk >>> >>> >>> -- >>> Jonathan Hartley tartley at tartley.com http://tartley.com >>> Made out of meat. +1 507-513-1101 twitter/skype: tartley >>> >>> _______________________________________________ >>> python-uk mailing list >>> python-uk at python.org >>> https://mail.python.org/mailman/listinfo/python-uk >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk >> > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk -- Jonathan Hartley tartley at tartley.com http://tartley.com Made out of meat. +1 507-513-1101 twitter/skype: tartley -------------- next part -------------- An HTML attachment was scrubbed... URL: From stestagg at gmail.com Thu Jun 8 11:58:20 2017 From: stestagg at gmail.com (Stestagg) Date: Thu, 08 Jun 2017 15:58:20 +0000 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: <4718f5b2-a8a0-2f01-8c13-c56200e5935b@tartley.com> References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> <1496910650.2278421.1002665608.5049EA9A@webmail.messagingengine.com> <4718f5b2-a8a0-2f01-8c13-c56200e5935b@tartley.com> Message-ID: If it's who I think it is, then I'm not entirely surprised, this particular implementation is quite taxing for python in particular, and they don't do much in the way of catering to more modern languages in general (not a criticism, but most problems/samples are stated in a very 'traditional' way that isn't very pythonic, feels like transliterated C most times). On Thu, Jun 8, 2017 at 4:42 PM Jonathan Hartley wrote: > I cannot be sure. It is certainly used by many people. They are competent > in that it is a comprehensive online framework, allowing candidates to > submit solutions using an online editor, in any one of about ten different > languages. They are so large that there was no obvious way to talk to > anyone about my individual experience. I don't knowingly know any other > candidates who have submitted. > > I don't want to identify them because the first step of the quiz is to > accept the T&C that you won't reveal the answers to others (oops.), but > suffice to say they are very large Indeed. > > On 6/8/2017 03:48, Andy Robinson wrote: > > Are you sure that their test infrastructure was behaving correctly? > Is it widely used, day in day out, by thousands, and known to be > reliable? Did your colleagues all brag "no problem"? Or is it > possible that the whole execution framework threw a momentary wobbly > while trying to load up some large text file off some remote cloud? 
> > Andy > _______________________________________________ > python-uk mailing listpython-uk at python.orghttps://mail.python.org/mailman/listinfo/python-uk > > > -- > Jonathan Hartley tartley at tartley.com http://tartley.com > Made out of meat. +1 507-513-1101 <(507)%20513-1101> twitter/skype: tartley > > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tartley at tartley.com Thu Jun 8 12:26:17 2017 From: tartley at tartley.com (Jonathan Hartley) Date: Thu, 8 Jun 2017 11:26:17 -0500 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> Message-ID: <43db980a-15ca-2322-e137-791677c595c6@tartley.com> Yep, that's a great elimination of the suspicious small overheads. line_profiler is beautiful, I'll definitely be adding it to my toolbox, thanks for that! I tried a variant of accumulating the output and printing it all as a single string, but of course this didn't help, printing is already buffered. Jonathan On 6/8/2017 03:54, Stestagg wrote: > I honestly can't see a way to improve this in python. My best > solution is: > > def main(lines): > stack = [] > sa = stack.append > sp = stack.pop > si = stack.__getitem__ > for line in lines: > meth = line[:3] > if meth == b'pus': > sa(int(line[5:])) > elif meth == b'pop': > sp() > else: > parts = line[15:].split() > end = len(stack)-1 > amount = int(parts[1]) > for x in range(int(parts[0])): > index = end - x > stack[index] += amount > print(stack[-1] if stack else None) > > which comes out about 25% faster than your solution. > > One tool that's interesting to use here is: line_profiler: > https://github.com/rkern/line_profiler > > putting a @profile decorator on the above main() call, and running > with kernprof produces the following output: > > Line #Hits TimePer Hit % TimeLine Contents > > ============================================================== > > 12 @profile > > 13 def main(lines): > > 14 144.00.0stack = [] > > 15 2000001 9495990.5 11.5for line in lines: > > 16 200000011269440.6 13.7meth = line[:3] > > 17 2000000 9746350.5 11.8if meth == b'pus': > > 18 100000010027331.0 12.2stack.append(int(line[5:])) > > 19 1000000 4787560.55.8elif meth == b'pop': > > 20999999 5971140.67.2stack.pop() > > 21 else: > > 22 166.00.0parts = line[15:].split() > > 23 122.00.0end = len(stack)-1 > > 24 111.00.0amount = int(parts[1]) > > 25500001 2412270.52.9for x in range(int(parts[0])): > > 26500000 2734770.53.3index = end - x > > 27500000 3090330.63.7stack[index] += amount > > 28 200000022958031.1 27.8print(stack[-1]) > > > which shows that there's no obvious bottleneck (line by line) here > (for my sample data). > > Note the print() overhead dominates the runtime, and that's with me > piping the output to /dev/null directly. > > I had a go at using arrays, deques, and numpy arrays in various ways > without luck, but we're getting fairly close to the native python > statement execution overhead here (hence folding it all into one > function). > > My only thoughts would be to see if there were some magic that could > be done by offloading the work onto a non-python library somehow. > > Another thing that might help some situations (hence my previous > questions) would be to implement the add_to_first_n as a lazy operator > (i.e. 
have a stack of the add_to_first_n values and dynamically add to > the results of pop() but that would proabably be much slow in the > average case. > > Steve > > On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley > wrote: > > Hey. > > Thanks for engaging, but I can't help with the most important of > those questions - the large data sets on which my solution failed > due to timeout are hidden from candidates. Not unreasonable to > assume that they do exercise deep stacks, and large args to > add_to_first_n, etc. > > Yes, the input looks exactly like your example. All args are > integers. The question asked for output corresponding to the top > of the stack after every operation. I omitted this print from > inside the 'for' loop in 'main', thinking it irrelevant. > > I converted to integers inside 'dispatch'. 'args' must have > actually been created with: > > args = [int(i) for i in tokens[1:]] > > Where len(tokens) is never going to be bigger than 3. > > Return values (from 'pop') were unused. > > > On 6/7/2017 13:25, Stestagg wrote: >> Do you have any more context? >> For example, is the add_to_first_n likely to be called with very >> large numbers, or very often? Does the stack get very deep, or >> stay shallow? >> >> I'm assuming that lines look like this: >> >> push 1 >> push 2 >> add_to_first_n 2 10 >> pop >> pop >> >> with all arguments as integers, and the final value being >> returned from main()? >> How did you convert from string inputs to numeric values? >> How did you manage return values? >> >> :D >> >> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley >> > wrote: >> >> I recently submitted a solution to a coding challenge, in an >> employment context. One of the questions was to model a >> simple stack. I wrote a solution which appended and popped >> from the end of a list. This worked, but failed with timeouts >> on their last few automated tests with large (hidden) data sets. >> >> From memory, I think I had something pretty standard: >> >> class Stack: >> >> def __init__(self): >> self.storage = [] >> >> def push(arg): >> self.storage.append(arg) >> >> def pop(): >> return self.storage.pop() if self.storage else None >> >> def add_to_first_n(n, amount): >> for n in range(n): >> self.storage[n] += amount >> >> def dispatch(self, line) >> tokens = line.split() >> method = getattr(self, tokens[0]) >> args = tokens[1:] >> method(*args) >> >> def main(lines): >> stack = Stack() >> for line in lines: >> stack.dispatch(line) >> >> >> (will that formatting survive? Apologies if not) >> >> Subsequent experiments have confirmed that appending to and >> popping from the end of lists are O(1), amortized. >> >> So why is my solution too slow? >> >> This question was against the clock, 4th question of 4 in an >> hour. So I wasn't expecting to produce Cython or C optimised >> code in that timeframe (Besides, my submitted .py file runs >> on their servers, so the environment is limited.) >> >> So what am I missing, from a performance perspective? Are >> there other data structures in stdlib which are also O(1) but >> with a better constant? >> >> Ah. In writing this out, I have begun to suspect that my >> slicing of 'tokens' to produce 'args' in the dispatch is >> needlessly wasting time. Not much, but some. 
>> >> Thoughts welcome, >> >> Jonathan >> >> -- >> Jonathan Hartleytartley at tartley.com http://tartley.com >> Made out of meat.+1 507-513-1101 twitter/skype: tartley >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk >> >> >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk > > -- > Jonathan Hartleytartley at tartley.com http://tartley.com > Made out of meat.+1 507-513-1101 twitter/skype: tartley > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > > > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk -- Jonathan Hartley tartley at tartley.com http://tartley.com Made out of meat. +1 507-513-1101 twitter/skype: tartley -------------- next part -------------- An HTML attachment was scrubbed... URL: From samuel.fekete at gmail.com Thu Jun 8 14:06:11 2017 From: samuel.fekete at gmail.com (Samuel F) Date: Thu, 08 Jun 2017 18:06:11 +0000 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: <43db980a-15ca-2322-e137-791677c595c6@tartley.com> References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> <43db980a-15ca-2322-e137-791677c595c6@tartley.com> Message-ID: It may have failed for a different reason, (hard to say without the original question and answer). In the case where the stack is empty, you are returning None, was that the requirement? (Likely to have been -1) Sam On Thu, 8 Jun 2017 at 17:27, Jonathan Hartley wrote: > Yep, that's a great elimination of the suspicious small overheads. > > line_profiler is beautiful, I'll definitely be adding it to my toolbox, > thanks for that! > > I tried a variant of accumulating the output and printing it all as a > single string, but of course this didn't help, printing is already buffered. > > Jonathan > > On 6/8/2017 03:54, Stestagg wrote: > > I honestly can't see a way to improve this in python. My best solution > is: > > def main(lines): > stack = [] > sa = stack.append > sp = stack.pop > si = stack.__getitem__ > for line in lines: > meth = line[:3] > if meth == b'pus': > sa(int(line[5:])) > elif meth == b'pop': > sp() > else: > parts = line[15:].split() > end = len(stack)-1 > amount = int(parts[1]) > for x in range(int(parts[0])): > index = end - x > stack[index] += amount > print(stack[-1] if stack else None) > > which comes out about 25% faster than your solution. 
> > One tool that's interesting to use here is: line_profiler: > https://github.com/rkern/line_profiler > > putting a @profile decorator on the above main() call, and running with > kernprof produces the following output: > > Line # Hits Time Per Hit % Time Line Contents > > ============================================================== > > 12 @profile > > 13 def main(lines): > > 14 1 4 4.0 0.0 stack = [] > > 15 2000001 949599 0.5 11.5 for line in lines: > > 16 2000000 1126944 0.6 13.7 meth = line[:3] > > 17 2000000 974635 0.5 11.8 if meth == > b'pus': > > 18 1000000 1002733 1.0 12.2 > stack.append(int(line[5:])) > > 19 1000000 478756 0.5 5.8 elif meth == > b'pop': > > 20 999999 597114 0.6 7.2 stack.pop() > > 21 else: > > 22 1 6 6.0 0.0 parts = > line[15:].split() > > 23 1 2 2.0 0.0 end = > len(stack)-1 > > 24 1 1 1.0 0.0 amount = > int(parts[1]) > > 25 500001 241227 0.5 2.9 for x in > range(int(parts[0])): > > 26 500000 273477 0.5 3.3 index = > end - x > > 27 500000 309033 0.6 3.7 stack[index] > += amount > > 28 2000000 2295803 1.1 27.8 print(stack[-1]) > > which shows that there's no obvious bottleneck (line by line) here (for my > sample data). > > Note the print() overhead dominates the runtime, and that's with me piping > the output to /dev/null directly. > > I had a go at using arrays, deques, and numpy arrays in various ways > without luck, but we're getting fairly close to the native python statement > execution overhead here (hence folding it all into one function). > > My only thoughts would be to see if there were some magic that could be > done by offloading the work onto a non-python library somehow. > > Another thing that might help some situations (hence my previous > questions) would be to implement the add_to_first_n as a lazy operator > (i.e. have a stack of the add_to_first_n values and dynamically add to the > results of pop() but that would proabably be much slow in the average case. > > Steve > > On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley > wrote: > >> Hey. >> >> Thanks for engaging, but I can't help with the most important of those >> questions - the large data sets on which my solution failed due to timeout >> are hidden from candidates. Not unreasonable to assume that they do >> exercise deep stacks, and large args to add_to_first_n, etc. >> >> Yes, the input looks exactly like your example. All args are integers. >> The question asked for output corresponding to the top of the stack after >> every operation. I omitted this print from inside the 'for' loop in 'main', >> thinking it irrelevant. >> >> I converted to integers inside 'dispatch'. 'args' must have actually been >> created with: >> >> args = [int(i) for i in tokens[1:]] >> >> Where len(tokens) is never going to be bigger than 3. >> >> Return values (from 'pop') were unused. >> >> >> On 6/7/2017 13:25, Stestagg wrote: >> >> Do you have any more context? >> For example, is the add_to_first_n likely to be called with very large >> numbers, or very often? Does the stack get very deep, or stay shallow? >> >> I'm assuming that lines look like this: >> >> push 1 >> push 2 >> add_to_first_n 2 10 >> pop >> pop >> >> with all arguments as integers, and the final value being returned from >> main()? >> How did you convert from string inputs to numeric values? >> How did you manage return values? >> >> :D >> >> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley >> wrote: >> >>> I recently submitted a solution to a coding challenge, in an employment >>> context. One of the questions was to model a simple stack. 
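(For anyone who wants to reproduce per-line numbers like the kernprof table
quoted earlier in this message, the usual line_profiler workflow is roughly
the sketch below; bench_stack.py and commands.txt are made-up names, and the
@profile decorator is supplied by kernprof itself rather than imported.)

    # bench_stack.py: minimal stand-in for the code under test
    import sys

    @profile                      # injected into builtins by `kernprof -l`
    def main(lines):
        stack = []
        for line in lines:
            stack.append(line)    # replace with the real dispatch logic
        return len(stack)

    if __name__ == "__main__":
        main(sys.stdin)

(Run it as `kernprof -l -v bench_stack.py < commands.txt`: -l collects the
line-by-line timings and -v prints the report when the script exits. Running
the file with plain python would fail with a NameError instead, because only
kernprof defines `profile`.)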
I wrote a >>> solution which appended and popped from the end of a list. This worked, but >>> failed with timeouts on their last few automated tests with large (hidden) >>> data sets. >>> >>> From memory, I think I had something pretty standard: >>> >>> class Stack: >>> >>> def __init__(self): >>> self.storage = [] >>> >>> def push(arg): >>> self.storage.append(arg) >>> >>> def pop(): >>> return self.storage.pop() if self.storage else None >>> >>> def add_to_first_n(n, amount): >>> for n in range(n): >>> self.storage[n] += amount >>> >>> def dispatch(self, line) >>> tokens = line.split() >>> method = getattr(self, tokens[0]) >>> args = tokens[1:] >>> method(*args) >>> >>> def main(lines): >>> stack = Stack() >>> for line in lines: >>> stack.dispatch(line) >>> >>> >>> (will that formatting survive? Apologies if not) >>> >>> Subsequent experiments have confirmed that appending to and popping from >>> the end of lists are O(1), amortized. >>> So why is my solution too slow? >>> >>> This question was against the clock, 4th question of 4 in an hour. So I >>> wasn't expecting to produce Cython or C optimised code in that timeframe >>> (Besides, my submitted .py file runs on their servers, so the environment >>> is limited.) >>> >>> So what am I missing, from a performance perspective? Are there other >>> data structures in stdlib which are also O(1) but with a better constant? >>> >>> Ah. In writing this out, I have begun to suspect that my slicing of >>> 'tokens' to produce 'args' in the dispatch is needlessly wasting time. Not >>> much, but some. >>> >>> Thoughts welcome, >>> >>> Jonathan >>> >>> -- >>> Jonathan Hartley tartley at tartley.com http://tartley.com >>> Made out of meat. +1 507-513-1101 <%28507%29%20513-1101> twitter/skype: tartley >>> >>> >>> _______________________________________________ >>> python-uk mailing list >>> python-uk at python.org >>> https://mail.python.org/mailman/listinfo/python-uk >>> >> >> >> _______________________________________________ >> python-uk mailing listpython-uk at python.orghttps://mail.python.org/mailman/listinfo/python-uk >> >> >> -- >> Jonathan Hartley tartley at tartley.com http://tartley.com >> Made out of meat. +1 507-513-1101 <%28507%29%20513-1101> twitter/skype: tartley >> >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk >> > > > _______________________________________________ > python-uk mailing listpython-uk at python.orghttps://mail.python.org/mailman/listinfo/python-uk > > > -- > Jonathan Hartley tartley at tartley.com http://tartley.com > Made out of meat. +1 507-513-1101 twitter/skype: tartley > > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stestagg at gmail.com Thu Jun 8 14:17:41 2017 From: stestagg at gmail.com (Stestagg) Date: Thu, 08 Jun 2017 18:17:41 +0000 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: <43db980a-15ca-2322-e137-791677c595c6@tartley.com> References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> <43db980a-15ca-2322-e137-791677c595c6@tartley.com> Message-ID: I tracked down the challenge on the site, and have a working solution (I won't share for obvious reasons). Basically the timeouts were being caused by 'add_to_first_n' being called in horrible ways in the test cases. 
Because add_to_first_n alters the bottom of the stack, you can just push a marker onto the stack rather than iterating and mutating each entry, doing this made those test cases pass Personally, I think it's not a well-described problem, because it's expecting you to tune the algo to specific shapes of data without allowing any visibility on the data, or a description of what to code for. An algo junkie may jump straight to the optimized version, but a pragmatic developer would, in my opinion, hesitate to do that without any actual evidence that the problem required it. Steve On Thu, Jun 8, 2017 at 5:27 PM Jonathan Hartley wrote: > Yep, that's a great elimination of the suspicious small overheads. > > line_profiler is beautiful, I'll definitely be adding it to my toolbox, > thanks for that! > > I tried a variant of accumulating the output and printing it all as a > single string, but of course this didn't help, printing is already buffered. > > Jonathan > > On 6/8/2017 03:54, Stestagg wrote: > > I honestly can't see a way to improve this in python. My best solution > is: > > def main(lines): > stack = [] > sa = stack.append > sp = stack.pop > si = stack.__getitem__ > for line in lines: > meth = line[:3] > if meth == b'pus': > sa(int(line[5:])) > elif meth == b'pop': > sp() > else: > parts = line[15:].split() > end = len(stack)-1 > amount = int(parts[1]) > for x in range(int(parts[0])): > index = end - x > stack[index] += amount > print(stack[-1] if stack else None) > > which comes out about 25% faster than your solution. > > One tool that's interesting to use here is: line_profiler: > https://github.com/rkern/line_profiler > > putting a @profile decorator on the above main() call, and running with > kernprof produces the following output: > > Line # Hits Time Per Hit % Time Line Contents > > ============================================================== > > 12 @profile > > 13 def main(lines): > > 14 1 4 4.0 0.0 stack = [] > > 15 2000001 949599 0.5 11.5 for line in lines: > > 16 2000000 1126944 0.6 13.7 meth = line[:3] > > 17 2000000 974635 0.5 11.8 if meth == > b'pus': > > 18 1000000 1002733 1.0 12.2 > stack.append(int(line[5:])) > > 19 1000000 478756 0.5 5.8 elif meth == > b'pop': > > 20 999999 597114 0.6 7.2 stack.pop() > > 21 else: > > 22 1 6 6.0 0.0 parts = > line[15:].split() > > 23 1 2 2.0 0.0 end = > len(stack)-1 > > 24 1 1 1.0 0.0 amount = > int(parts[1]) > > 25 500001 241227 0.5 2.9 for x in > range(int(parts[0])): > > 26 500000 273477 0.5 3.3 index = > end - x > > 27 500000 309033 0.6 3.7 stack[index] > += amount > > 28 2000000 2295803 1.1 27.8 print(stack[-1]) > > which shows that there's no obvious bottleneck (line by line) here (for my > sample data). > > Note the print() overhead dominates the runtime, and that's with me piping > the output to /dev/null directly. > > I had a go at using arrays, deques, and numpy arrays in various ways > without luck, but we're getting fairly close to the native python statement > execution overhead here (hence folding it all into one function). > > My only thoughts would be to see if there were some magic that could be > done by offloading the work onto a non-python library somehow. > > Another thing that might help some situations (hence my previous > questions) would be to implement the add_to_first_n as a lazy operator > (i.e. have a stack of the add_to_first_n values and dynamically add to the > results of pop() but that would proabably be much slow in the average case. 
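(Since the accepted solution is deliberately not shared in this thread, the
following is only an illustrative sketch of that lazy idea: keep a parallel
list of pending increments and settle them as elements are popped, so that
add_to_first_n itself is O(1). LazyStack and inc are made-up names, and it
assumes 1 <= n <= the stack depth, as the original add_to_first_n loop did.)

    class LazyStack:
        def __init__(self):
            self.values = []
            self.inc = []    # inc[i]: amount still owed to element i and everything below it

        def push(self, value):
            self.values.append(value)
            self.inc.append(0)

        def add_to_first_n(self, n, amount):
            # Record the increment against the n-th element from the bottom
            # instead of touching n elements.
            if n:
                self.inc[n - 1] += amount

        def pop(self):
            if not self.values:
                return None
            owed = self.inc.pop()
            if self.inc:
                self.inc[-1] += owed    # pass the pending amount down the stack
            return self.values.pop() + owed

        def top(self):
            return self.values[-1] + self.inc[-1] if self.values else None

(Feeding it push 1; push 2; add_to_first_n 2 10; pop; pop yields 12 and then
11, the same as eagerly adding 10 to both elements, but each operation is now
constant time regardless of n.)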
> > Steve > > On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley > wrote: > >> Hey. >> >> Thanks for engaging, but I can't help with the most important of those >> questions - the large data sets on which my solution failed due to timeout >> are hidden from candidates. Not unreasonable to assume that they do >> exercise deep stacks, and large args to add_to_first_n, etc. >> >> Yes, the input looks exactly like your example. All args are integers. >> The question asked for output corresponding to the top of the stack after >> every operation. I omitted this print from inside the 'for' loop in 'main', >> thinking it irrelevant. >> >> I converted to integers inside 'dispatch'. 'args' must have actually been >> created with: >> >> args = [int(i) for i in tokens[1:]] >> >> Where len(tokens) is never going to be bigger than 3. >> >> Return values (from 'pop') were unused. >> >> >> On 6/7/2017 13:25, Stestagg wrote: >> >> Do you have any more context? >> For example, is the add_to_first_n likely to be called with very large >> numbers, or very often? Does the stack get very deep, or stay shallow? >> >> I'm assuming that lines look like this: >> >> push 1 >> push 2 >> add_to_first_n 2 10 >> pop >> pop >> >> with all arguments as integers, and the final value being returned from >> main()? >> How did you convert from string inputs to numeric values? >> How did you manage return values? >> >> :D >> >> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley >> wrote: >> >>> I recently submitted a solution to a coding challenge, in an employment >>> context. One of the questions was to model a simple stack. I wrote a >>> solution which appended and popped from the end of a list. This worked, but >>> failed with timeouts on their last few automated tests with large (hidden) >>> data sets. >>> >>> From memory, I think I had something pretty standard: >>> >>> class Stack: >>> >>> def __init__(self): >>> self.storage = [] >>> >>> def push(arg): >>> self.storage.append(arg) >>> >>> def pop(): >>> return self.storage.pop() if self.storage else None >>> >>> def add_to_first_n(n, amount): >>> for n in range(n): >>> self.storage[n] += amount >>> >>> def dispatch(self, line) >>> tokens = line.split() >>> method = getattr(self, tokens[0]) >>> args = tokens[1:] >>> method(*args) >>> >>> def main(lines): >>> stack = Stack() >>> for line in lines: >>> stack.dispatch(line) >>> >>> >>> (will that formatting survive? Apologies if not) >>> >>> Subsequent experiments have confirmed that appending to and popping from >>> the end of lists are O(1), amortized. >>> So why is my solution too slow? >>> >>> This question was against the clock, 4th question of 4 in an hour. So I >>> wasn't expecting to produce Cython or C optimised code in that timeframe >>> (Besides, my submitted .py file runs on their servers, so the environment >>> is limited.) >>> >>> So what am I missing, from a performance perspective? Are there other >>> data structures in stdlib which are also O(1) but with a better constant? >>> >>> Ah. In writing this out, I have begun to suspect that my slicing of >>> 'tokens' to produce 'args' in the dispatch is needlessly wasting time. Not >>> much, but some. >>> >>> Thoughts welcome, >>> >>> Jonathan >>> >>> -- >>> Jonathan Hartley tartley at tartley.com http://tartley.com >>> Made out of meat. 
+1 507-513-1101 <%28507%29%20513-1101> twitter/skype: tartley >>> >>> >>> _______________________________________________ >>> python-uk mailing list >>> python-uk at python.org >>> https://mail.python.org/mailman/listinfo/python-uk >>> >> >> >> _______________________________________________ >> python-uk mailing listpython-uk at python.orghttps://mail.python.org/mailman/listinfo/python-uk >> >> >> -- >> Jonathan Hartley tartley at tartley.com http://tartley.com >> Made out of meat. +1 507-513-1101 <%28507%29%20513-1101> twitter/skype: tartley >> >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk >> > > > _______________________________________________ > python-uk mailing listpython-uk at python.orghttps://mail.python.org/mailman/listinfo/python-uk > > > -- > Jonathan Hartley tartley at tartley.com http://tartley.com > Made out of meat. +1 507-513-1101 <(507)%20513-1101> twitter/skype: tartley > > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stestagg at gmail.com Thu Jun 8 14:18:37 2017 From: stestagg at gmail.com (Stestagg) Date: Thu, 08 Jun 2017 18:18:37 +0000 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> <43db980a-15ca-2322-e137-791677c595c6@tartley.com> Message-ID: Apologies, In my previous email, I meant 'insert a marker', rather than 'push a marker' On Thu, Jun 8, 2017 at 7:17 PM Stestagg wrote: > I tracked down the challenge on the site, and have a working solution (I > won't share for obvious reasons). Basically the timeouts were being caused > by 'add_to_first_n' being called in horrible ways in the test cases. > > Because add_to_first_n alters the bottom of the stack, you can just push a > marker onto the stack rather than iterating and mutating each entry, doing > this made those test cases pass > > Personally, I think it's not a well-described problem, because it's > expecting you to tune the algo to specific shapes of data without allowing > any visibility on the data, or a description of what to code for. An algo > junkie may jump straight to the optimized version, but a pragmatic > developer would, in my opinion, hesitate to do that without any actual > evidence that the problem required it. > > Steve > > > > > > On Thu, Jun 8, 2017 at 5:27 PM Jonathan Hartley > wrote: > >> Yep, that's a great elimination of the suspicious small overheads. >> >> line_profiler is beautiful, I'll definitely be adding it to my toolbox, >> thanks for that! >> >> I tried a variant of accumulating the output and printing it all as a >> single string, but of course this didn't help, printing is already buffered. >> >> Jonathan >> >> On 6/8/2017 03:54, Stestagg wrote: >> >> I honestly can't see a way to improve this in python. 
My best solution >> is: >> >> def main(lines): >> stack = [] >> sa = stack.append >> sp = stack.pop >> si = stack.__getitem__ >> for line in lines: >> meth = line[:3] >> if meth == b'pus': >> sa(int(line[5:])) >> elif meth == b'pop': >> sp() >> else: >> parts = line[15:].split() >> end = len(stack)-1 >> amount = int(parts[1]) >> for x in range(int(parts[0])): >> index = end - x >> stack[index] += amount >> print(stack[-1] if stack else None) >> >> which comes out about 25% faster than your solution. >> >> One tool that's interesting to use here is: line_profiler: >> https://github.com/rkern/line_profiler >> >> putting a @profile decorator on the above main() call, and running with >> kernprof produces the following output: >> >> Line # Hits Time Per Hit % Time Line Contents >> >> ============================================================== >> >> 12 @profile >> >> 13 def main(lines): >> >> 14 1 4 4.0 0.0 stack = [] >> >> 15 2000001 949599 0.5 11.5 for line in lines: >> >> 16 2000000 1126944 0.6 13.7 meth = line[:3] >> >> 17 2000000 974635 0.5 11.8 if meth == >> b'pus': >> >> 18 1000000 1002733 1.0 12.2 >> stack.append(int(line[5:])) >> >> 19 1000000 478756 0.5 5.8 elif meth == >> b'pop': >> >> 20 999999 597114 0.6 7.2 stack.pop() >> >> 21 else: >> >> 22 1 6 6.0 0.0 parts = >> line[15:].split() >> >> 23 1 2 2.0 0.0 end = >> len(stack)-1 >> >> 24 1 1 1.0 0.0 amount = >> int(parts[1]) >> >> 25 500001 241227 0.5 2.9 for x in >> range(int(parts[0])): >> >> 26 500000 273477 0.5 3.3 index = >> end - x >> >> 27 500000 309033 0.6 3.7 stack[index] >> += amount >> >> 28 2000000 2295803 1.1 27.8 print(stack[-1]) >> >> which shows that there's no obvious bottleneck (line by line) here (for >> my sample data). >> >> Note the print() overhead dominates the runtime, and that's with me >> piping the output to /dev/null directly. >> >> I had a go at using arrays, deques, and numpy arrays in various ways >> without luck, but we're getting fairly close to the native python statement >> execution overhead here (hence folding it all into one function). >> >> My only thoughts would be to see if there were some magic that could be >> done by offloading the work onto a non-python library somehow. >> >> Another thing that might help some situations (hence my previous >> questions) would be to implement the add_to_first_n as a lazy operator >> (i.e. have a stack of the add_to_first_n values and dynamically add to the >> results of pop() but that would proabably be much slow in the average case. >> >> Steve >> >> On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley >> wrote: >> >>> Hey. >>> >>> Thanks for engaging, but I can't help with the most important of those >>> questions - the large data sets on which my solution failed due to timeout >>> are hidden from candidates. Not unreasonable to assume that they do >>> exercise deep stacks, and large args to add_to_first_n, etc. >>> >>> Yes, the input looks exactly like your example. All args are integers. >>> The question asked for output corresponding to the top of the stack after >>> every operation. I omitted this print from inside the 'for' loop in 'main', >>> thinking it irrelevant. >>> >>> I converted to integers inside 'dispatch'. 'args' must have actually >>> been created with: >>> >>> args = [int(i) for i in tokens[1:]] >>> >>> Where len(tokens) is never going to be bigger than 3. >>> >>> Return values (from 'pop') were unused. >>> >>> >>> On 6/7/2017 13:25, Stestagg wrote: >>> >>> Do you have any more context? 
>>> For example, is the add_to_first_n likely to be called with very large >>> numbers, or very often? Does the stack get very deep, or stay shallow? >>> >>> I'm assuming that lines look like this: >>> >>> push 1 >>> push 2 >>> add_to_first_n 2 10 >>> pop >>> pop >>> >>> with all arguments as integers, and the final value being returned from >>> main()? >>> How did you convert from string inputs to numeric values? >>> How did you manage return values? >>> >>> :D >>> >>> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley >>> wrote: >>> >>>> I recently submitted a solution to a coding challenge, in an employment >>>> context. One of the questions was to model a simple stack. I wrote a >>>> solution which appended and popped from the end of a list. This worked, but >>>> failed with timeouts on their last few automated tests with large (hidden) >>>> data sets. >>>> >>>> From memory, I think I had something pretty standard: >>>> >>>> class Stack: >>>> >>>> def __init__(self): >>>> self.storage = [] >>>> >>>> def push(arg): >>>> self.storage.append(arg) >>>> >>>> def pop(): >>>> return self.storage.pop() if self.storage else None >>>> >>>> def add_to_first_n(n, amount): >>>> for n in range(n): >>>> self.storage[n] += amount >>>> >>>> def dispatch(self, line) >>>> tokens = line.split() >>>> method = getattr(self, tokens[0]) >>>> args = tokens[1:] >>>> method(*args) >>>> >>>> def main(lines): >>>> stack = Stack() >>>> for line in lines: >>>> stack.dispatch(line) >>>> >>>> >>>> (will that formatting survive? Apologies if not) >>>> >>>> Subsequent experiments have confirmed that appending to and popping >>>> from the end of lists are O(1), amortized. >>>> So why is my solution too slow? >>>> >>>> This question was against the clock, 4th question of 4 in an hour. So I >>>> wasn't expecting to produce Cython or C optimised code in that timeframe >>>> (Besides, my submitted .py file runs on their servers, so the environment >>>> is limited.) >>>> >>>> So what am I missing, from a performance perspective? Are there other >>>> data structures in stdlib which are also O(1) but with a better constant? >>>> >>>> Ah. In writing this out, I have begun to suspect that my slicing of >>>> 'tokens' to produce 'args' in the dispatch is needlessly wasting time. Not >>>> much, but some. >>>> >>>> Thoughts welcome, >>>> >>>> Jonathan >>>> >>>> -- >>>> Jonathan Hartley tartley at tartley.com http://tartley.com >>>> Made out of meat. +1 507-513-1101 <%28507%29%20513-1101> twitter/skype: tartley >>>> >>>> >>>> _______________________________________________ >>>> python-uk mailing list >>>> python-uk at python.org >>>> https://mail.python.org/mailman/listinfo/python-uk >>>> >>> >>> >>> _______________________________________________ >>> python-uk mailing listpython-uk at python.orghttps://mail.python.org/mailman/listinfo/python-uk >>> >>> >>> -- >>> Jonathan Hartley tartley at tartley.com http://tartley.com >>> Made out of meat. +1 507-513-1101 <%28507%29%20513-1101> twitter/skype: tartley >>> >>> >>> _______________________________________________ >>> python-uk mailing list >>> python-uk at python.org >>> https://mail.python.org/mailman/listinfo/python-uk >>> >> >> >> _______________________________________________ >> python-uk mailing listpython-uk at python.orghttps://mail.python.org/mailman/listinfo/python-uk >> >> >> -- >> Jonathan Hartley tartley at tartley.com http://tartley.com >> Made out of meat. 
+1 507-513-1101 <(507)%20513-1101> twitter/skype: tartley >> >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mauve at mauveweb.co.uk Sun Jun 11 09:38:11 2017 From: mauve at mauveweb.co.uk (Daniel Pope) Date: Sun, 11 Jun 2017 13:38:11 +0000 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> <43db980a-15ca-2322-e137-791677c595c6@tartley.com> Message-ID: I was able to get about a 20% speed up over Steve's solution, on some benchmark data I created, by: * converting LOAD_GLOBAL to LOAD_FAST for __builtins__ * eliminating the conditional in each loop in favour of a conditional on pop only * eliminating string comparison for the operation in favour of testing line[1] against byte values None of these had a significant effect in either direction: * replacing range objects with counts * replacing the string.split()/int() calls with a hand-rolled space-separated integer parser in Python. Oh, in my benchmarking, the print() calls were huge and highly variable... I stubbed them out first. On Thu, 8 Jun 2017, 21:39 Stestagg, wrote: > Apologies, In my previous email, I meant 'insert a marker', rather than > 'push a marker' > > On Thu, Jun 8, 2017 at 7:17 PM Stestagg wrote: > >> I tracked down the challenge on the site, and have a working solution (I >> won't share for obvious reasons). Basically the timeouts were being caused >> by 'add_to_first_n' being called in horrible ways in the test cases. >> >> Because add_to_first_n alters the bottom of the stack, you can just push >> a marker onto the stack rather than iterating and mutating each entry, >> doing this made those test cases pass >> >> Personally, I think it's not a well-described problem, because it's >> expecting you to tune the algo to specific shapes of data without allowing >> any visibility on the data, or a description of what to code for. An algo >> junkie may jump straight to the optimized version, but a pragmatic >> developer would, in my opinion, hesitate to do that without any actual >> evidence that the problem required it. >> >> Steve >> >> >> >> >> >> On Thu, Jun 8, 2017 at 5:27 PM Jonathan Hartley >> wrote: >> >>> Yep, that's a great elimination of the suspicious small overheads. >>> >>> line_profiler is beautiful, I'll definitely be adding it to my toolbox, >>> thanks for that! >>> >>> I tried a variant of accumulating the output and printing it all as a >>> single string, but of course this didn't help, printing is already buffered. >>> >>> Jonathan >>> >>> On 6/8/2017 03:54, Stestagg wrote: >>> >>> I honestly can't see a way to improve this in python. My best solution >>> is: >>> >>> def main(lines): >>> stack = [] >>> sa = stack.append >>> sp = stack.pop >>> si = stack.__getitem__ >>> for line in lines: >>> meth = line[:3] >>> if meth == b'pus': >>> sa(int(line[5:])) >>> elif meth == b'pop': >>> sp() >>> else: >>> parts = line[15:].split() >>> end = len(stack)-1 >>> amount = int(parts[1]) >>> for x in range(int(parts[0])): >>> index = end - x >>> stack[index] += amount >>> print(stack[-1] if stack else None) >>> >>> which comes out about 25% faster than your solution. 
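(A small illustration of the first tweak in Daniel's list at the top of this
message, turning LOAD_GLOBAL lookups of builtins into LOAD_FAST locals. This
is not his benchmark code, just the general trick, and dis shows the
difference in the generated bytecode.)

    import dis

    def total_globals(lines):
        total = 0
        for line in lines:
            total += int(line)    # `int` is found via LOAD_GLOBAL on every iteration
        return total

    def total_locals(lines, int=int):
        # The default argument is bound once, at definition time, so inside
        # the loop `int` is an ordinary local variable (LOAD_FAST).
        total = 0
        for line in lines:
            total += int(line)
        return total

    dis.dis(total_globals)    # the loop body uses LOAD_GLOBAL for `int`
    dis.dis(total_locals)     # the same lookup becomes LOAD_FAST

(The sa = stack.append and sp = stack.pop lines in the quoted main() are the
same idea applied to method lookups. The byte-value test Daniel mentions
works because indexing a bytes object in Python 3 gives an int, so line[1],
the 'u' of push, 'o' of pop or 'd' of add_to_first_n, can be compared against
a small integer without building any new string.)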
>>> >>> One tool that's interesting to use here is: line_profiler: >>> https://github.com/rkern/line_profiler >>> >>> putting a @profile decorator on the above main() call, and running with >>> kernprof produces the following output: >>> >>> Line # Hits Time Per Hit % Time Line Contents >>> >>> ============================================================== >>> >>> 12 @profile >>> >>> 13 def main(lines): >>> >>> 14 1 4 4.0 0.0 stack = [] >>> >>> 15 2000001 949599 0.5 11.5 for line in lines: >>> >>> 16 2000000 1126944 0.6 13.7 meth = line[:3] >>> >>> >>> 17 2000000 974635 0.5 11.8 if meth == >>> b'pus': >>> >>> 18 1000000 1002733 1.0 12.2 >>> stack.append(int(line[5:])) >>> >>> 19 1000000 478756 0.5 5.8 elif meth == >>> b'pop': >>> >>> 20 999999 597114 0.6 7.2 stack.pop() >>> >>> 21 else: >>> >>> 22 1 6 6.0 0.0 parts = >>> line[15:].split() >>> >>> 23 1 2 2.0 0.0 end = >>> len(stack)-1 >>> >>> 24 1 1 1.0 0.0 amount = >>> int(parts[1]) >>> >>> 25 500001 241227 0.5 2.9 for x in >>> range(int(parts[0])): >>> >>> 26 500000 273477 0.5 3.3 index >>> = end - x >>> >>> 27 500000 309033 0.6 3.7 stack[index] >>> += amount >>> >>> 28 2000000 2295803 1.1 27.8 >>> print(stack[-1]) >>> >>> which shows that there's no obvious bottleneck (line by line) here (for >>> my sample data). >>> >>> Note the print() overhead dominates the runtime, and that's with me >>> piping the output to /dev/null directly. >>> >>> I had a go at using arrays, deques, and numpy arrays in various ways >>> without luck, but we're getting fairly close to the native python statement >>> execution overhead here (hence folding it all into one function). >>> >>> My only thoughts would be to see if there were some magic that could be >>> done by offloading the work onto a non-python library somehow. >>> >>> Another thing that might help some situations (hence my previous >>> questions) would be to implement the add_to_first_n as a lazy operator >>> (i.e. have a stack of the add_to_first_n values and dynamically add to the >>> results of pop() but that would proabably be much slow in the average case. >>> >>> Steve >>> >>> On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley >>> wrote: >>> >>>> Hey. >>>> >>>> Thanks for engaging, but I can't help with the most important of those >>>> questions - the large data sets on which my solution failed due to timeout >>>> are hidden from candidates. Not unreasonable to assume that they do >>>> exercise deep stacks, and large args to add_to_first_n, etc. >>>> >>>> Yes, the input looks exactly like your example. All args are integers. >>>> The question asked for output corresponding to the top of the stack after >>>> every operation. I omitted this print from inside the 'for' loop in 'main', >>>> thinking it irrelevant. >>>> >>>> I converted to integers inside 'dispatch'. 'args' must have actually >>>> been created with: >>>> >>>> args = [int(i) for i in tokens[1:]] >>>> >>>> Where len(tokens) is never going to be bigger than 3. >>>> >>>> Return values (from 'pop') were unused. >>>> >>>> >>>> On 6/7/2017 13:25, Stestagg wrote: >>>> >>>> Do you have any more context? >>>> For example, is the add_to_first_n likely to be called with very large >>>> numbers, or very often? Does the stack get very deep, or stay shallow? >>>> >>>> I'm assuming that lines look like this: >>>> >>>> push 1 >>>> push 2 >>>> add_to_first_n 2 10 >>>> pop >>>> pop >>>> >>>> with all arguments as integers, and the final value being returned from >>>> main()? >>>> How did you convert from string inputs to numeric values? 
>>>> How did you manage return values? >>>> >>>> :D >>>> >>>> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley >>>> wrote: >>>> >>>>> I recently submitted a solution to a coding challenge, in an >>>>> employment context. One of the questions was to model a simple stack. I >>>>> wrote a solution which appended and popped from the end of a list. This >>>>> worked, but failed with timeouts on their last few automated tests with >>>>> large (hidden) data sets. >>>>> >>>>> From memory, I think I had something pretty standard: >>>>> >>>>> class Stack: >>>>> >>>>> def __init__(self): >>>>> self.storage = [] >>>>> >>>>> def push(arg): >>>>> self.storage.append(arg) >>>>> >>>>> def pop(): >>>>> return self.storage.pop() if self.storage else None >>>>> >>>>> def add_to_first_n(n, amount): >>>>> for n in range(n): >>>>> self.storage[n] += amount >>>>> >>>>> def dispatch(self, line) >>>>> tokens = line.split() >>>>> method = getattr(self, tokens[0]) >>>>> args = tokens[1:] >>>>> method(*args) >>>>> >>>>> def main(lines): >>>>> stack = Stack() >>>>> for line in lines: >>>>> stack.dispatch(line) >>>>> >>>>> >>>>> (will that formatting survive? Apologies if not) >>>>> >>>>> Subsequent experiments have confirmed that appending to and popping >>>>> from the end of lists are O(1), amortized. >>>>> So why is my solution too slow? >>>>> >>>>> This question was against the clock, 4th question of 4 in an hour. So >>>>> I wasn't expecting to produce Cython or C optimised code in that timeframe >>>>> (Besides, my submitted .py file runs on their servers, so the environment >>>>> is limited.) >>>>> >>>>> So what am I missing, from a performance perspective? Are there other >>>>> data structures in stdlib which are also O(1) but with a better constant? >>>>> >>>>> Ah. In writing this out, I have begun to suspect that my slicing of >>>>> 'tokens' to produce 'args' in the dispatch is needlessly wasting time. Not >>>>> much, but some. >>>>> >>>>> Thoughts welcome, >>>>> >>>>> Jonathan >>>>> >>>>> -- >>>>> Jonathan Hartley tartley at tartley.com http://tartley.com >>>>> Made out of meat. +1 507-513-1101 <%28507%29%20513-1101> twitter/skype: tartley >>>>> >>>>> >>>>> _______________________________________________ >>>>> python-uk mailing list >>>>> python-uk at python.org >>>>> https://mail.python.org/mailman/listinfo/python-uk >>>>> >>>> >>>> >>>> _______________________________________________ >>>> python-uk mailing listpython-uk at python.orghttps://mail.python.org/mailman/listinfo/python-uk >>>> >>>> >>>> -- >>>> Jonathan Hartley tartley at tartley.com http://tartley.com >>>> Made out of meat. +1 507-513-1101 <%28507%29%20513-1101> twitter/skype: tartley >>>> >>>> >>>> _______________________________________________ >>>> python-uk mailing list >>>> python-uk at python.org >>>> https://mail.python.org/mailman/listinfo/python-uk >>>> >>> >>> >>> _______________________________________________ >>> python-uk mailing listpython-uk at python.orghttps://mail.python.org/mailman/listinfo/python-uk >>> >>> >>> -- >>> Jonathan Hartley tartley at tartley.com http://tartley.com >>> Made out of meat. 
+1 507-513-1101 <(507)%20513-1101> twitter/skype: tartley >>> >>> >>> _______________________________________________ >>> python-uk mailing list >>> python-uk at python.org >>> https://mail.python.org/mailman/listinfo/python-uk >>> >> _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > -------------- next part -------------- An HTML attachment was scrubbed... URL: From safe.hammad at sandacre.com Tue Jun 13 04:22:15 2017 From: safe.hammad at sandacre.com (Safe Hammad) Date: Tue, 13 Jun 2017 09:22:15 +0100 Subject: [python-uk] Job post Message-ID: Hi All, In case you might be interested, my company Arctic Shores is looking for a Senior Python Developer to join me and my team in sunny Manchester. Full details here: http://pythonjobs.github.io/jobs/arctic-shores-senior-python-developer.html If you've ever yearned to work in the Beautiful North and you're interested in finding out more, please get in touch with me directly. Best, Safe -- Safe Hammad safe at arcticshores.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From tartley at tartley.com Tue Jun 13 09:36:29 2017 From: tartley at tartley.com (Jonathan Hartley) Date: Tue, 13 Jun 2017 08:36:29 -0500 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> <43db980a-15ca-2322-e137-791677c595c6@tartley.com> Message-ID: <049b66175b90c1102af862702f4cddfe@tartley.com> You are right, when popping an empty stack I should probably raise. On 2017-06-08 13:06, Samuel F wrote: > It may have failed for a different reason, (hard to say without the > original question and answer). > > In the case where the stack is empty, you are returning None, was that > the requirement? (Likely to have been -1) > > Sam > > On Thu, 8 Jun 2017 at 17:27, Jonathan Hartley > wrote: > >> Yep, that's a great elimination of the suspicious small overheads. >> >> line_profiler is beautiful, I'll definitely be adding it to my >> toolbox, thanks for that! >> >> I tried a variant of accumulating the output and printing it all as >> a single string, but of course this didn't help, printing is already >> buffered. >> >> Jonathan >> >> On 6/8/2017 03:54, Stestagg wrote: >> >> I honestly can't see a way to improve this in python. My best >> solution is: >> >> def main(lines): >> stack = [] >> sa = stack.append >> sp = stack.pop >> si = stack.__getitem__ >> for line in lines: >> meth = line[:3] >> if meth == b'pus': >> sa(int(line[5:])) >> elif meth == b'pop': >> sp() >> else: >> parts = line[15:].split() >> end = len(stack)-1 >> amount = int(parts[1]) >> for x in range(int(parts[0])): >> index = end - x >> stack[index] += amount >> print(stack[-1] if stack else None) >> >> which comes out about 25% faster than your solution. 
>> >> One tool that's interesting to use here is: line_profiler: >> https://github.com/rkern/line_profiler >> >> putting a @profile decorator on the above main() call, and running >> with kernprof produces the following output: >> >> Line # Hits Time Per Hit % Time Line Contents >> >> ============================================================== >> >> 12 @profile >> >> 13 def main(lines): >> >> 14 1 4 4.0 0.0 stack = [] >> >> 15 2000001 949599 0.5 11.5 for line in >> lines: >> >> 16 2000000 1126944 0.6 13.7 meth = >> line[:3] >> >> 17 2000000 974635 0.5 11.8 if meth == >> b'pus': >> >> 18 1000000 1002733 1.0 12.2 >> stack.append(int(line[5:])) >> >> 19 1000000 478756 0.5 5.8 elif meth >> == b'pop': >> >> 20 999999 597114 0.6 7.2 >> stack.pop() >> >> 21 else: >> >> 22 1 6 6.0 0.0 parts = >> line[15:].split() >> >> 23 1 2 2.0 0.0 end = >> len(stack)-1 >> >> 24 1 1 1.0 0.0 amount >> = int(parts[1]) >> >> 25 500001 241227 0.5 2.9 for x >> in range(int(parts[0])): >> >> 26 500000 273477 0.5 3.3 >> index = end - x >> >> 27 500000 309033 0.6 3.7 >> stack[index] += amount >> >> 28 2000000 2295803 1.1 27.8 >> print(stack[-1]) >> >> which shows that there's no obvious bottleneck (line by line) here >> (for my sample data). >> >> Note the print() overhead dominates the runtime, and that's with me >> piping the output to /dev/null directly. >> >> I had a go at using arrays, deques, and numpy arrays in various ways >> without luck, but we're getting fairly close to the native python >> statement execution overhead here (hence folding it all into one >> function). >> >> My only thoughts would be to see if there were some magic that could >> be done by offloading the work onto a non-python library somehow. >> >> Another thing that might help some situations (hence my previous >> questions) would be to implement the add_to_first_n as a lazy >> operator (i.e. have a stack of the add_to_first_n values and >> dynamically add to the results of pop() but that would proabably be >> much slow in the average case. >> >> Steve >> >> On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley >> wrote: >> >> Hey. >> >> Thanks for engaging, but I can't help with the most important of >> those questions - the large data sets on which my solution failed >> due to timeout are hidden from candidates. Not unreasonable to >> assume that they do exercise deep stacks, and large args to >> add_to_first_n, etc. >> >> Yes, the input looks exactly like your example. All args are >> integers. The question asked for output corresponding to the top of >> the stack after every operation. I omitted this print from inside >> the 'for' loop in 'main', thinking it irrelevant. >> >> I converted to integers inside 'dispatch'. 'args' must have actually >> been created with: >> >> args = [int(i) for i in tokens[1:]] >> >> Where len(tokens) is never going to be bigger than 3. >> >> Return values (from 'pop') were unused. >> >> On 6/7/2017 13:25, Stestagg wrote: >> >> Do you have any more context? >> For example, is the add_to_first_n likely to be called with very >> large numbers, or very often? Does the stack get very deep, or stay >> shallow? >> >> I'm assuming that lines look like this: >> >> push 1 >> push 2 >> add_to_first_n 2 10 >> pop >> pop >> >> with all arguments as integers, and the final value being returned >> from main()? >> How did you convert from string inputs to numeric values? >> How did you manage return values? 
>> >> :D >> >> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley >> wrote: >> >> I recently submitted a solution to a coding challenge, in an >> employment context. One of the questions was to model a simple >> stack. I wrote a solution which appended and popped from the end of >> a list. This worked, but failed with timeouts on their last few >> automated tests with large (hidden) data sets. >> >> From memory, I think I had something pretty standard: >> >> class Stack: >> >> def __init__(self): >> self.storage = [] >> >> def push(arg): >> self.storage.append(arg) >> >> def pop(): >> return self.storage.pop() if self.storage else None >> >> def add_to_first_n(n, amount): >> for n in range(n): >> self.storage[n] += amount >> >> def dispatch(self, line) >> tokens = line.split() >> method = getattr(self, tokens[0]) >> args = tokens[1:] >> method(*args) >> >> def main(lines): >> stack = Stack() >> for line in lines: >> stack.dispatch(line) >> >> (will that formatting survive? Apologies if not) >> >> Subsequent experiments have confirmed that appending to and popping >> from the end of lists are O(1), amortized. So why is my solution too >> slow? >> >> This question was against the clock, 4th question of 4 in an hour. >> So I wasn't expecting to produce Cython or C optimised code in that >> timeframe (Besides, my submitted .py file runs on their servers, so >> the environment is limited.) >> >> So what am I missing, from a performance perspective? Are there >> other data structures in stdlib which are also O(1) but with a >> better constant? >> >> Ah. In writing this out, I have begun to suspect that my slicing of >> 'tokens' to produce 'args' in the dispatch is needlessly wasting >> time. Not much, but some. >> >> Thoughts welcome, >> >> Jonathan >> >> -- >> Jonathan Hartley tartley at tartley.com http://tartley.com >> Made out of meat. +1 507-513-1101 [1] twitter/skype: >> tartley >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk > > -- > Jonathan Hartley tartley at tartley.com http://tartley.com > Made out of meat. +1 507-513-1101 [1] twitter/skype: tartley > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > > -- > Jonathan Hartley tartley at tartley.com http://tartley.com > Made out of meat. +1 507-513-1101 twitter/skype: tartley > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > > > Links: > ------ > [1] tel:%28507%29%20513-1101 > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk -- Jonathan Hartley Made out of meat. 
tartley at tartley.com +1 507-513-1101 From tartley at tartley.com Tue Jun 13 09:41:25 2017 From: tartley at tartley.com (Jonathan Hartley) Date: Tue, 13 Jun 2017 08:41:25 -0500 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: References: <8e6e0b39-a09c-4b31-b5fb-8ad9ece574ad@tartley.com> <43db980a-15ca-2322-e137-791677c595c6@tartley.com> Message-ID: <59a00946c91540cf40d97af1c4a03a96@tartley.com> Very interesting! Thanks for digging deeper and sharing. I was thinking about horrible complicated structures like storing the 'add_to_first_n' params in parallel to the stack, to apply them at 'pop' time, which doesn't work at all. As is so often the case with these things, your solution of pushing those markers onto the stack makes me feel silly for not realising sooner. Thanks to everyone for the interesting discussion. Jonathan On 2017-06-08 13:17, Stestagg wrote: > I tracked down the challenge on the site, and have a working solution > (I won't share for obvious reasons). Basically the timeouts were being > caused by 'add_to_first_n' being called in horrible ways in the test > cases. > > Because add_to_first_n alters the bottom of the stack, you can just > push a marker onto the stack rather than iterating and mutating each > entry, doing this made those test cases pass > > Personally, I think it's not a well-described problem, because it's > expecting you to tune the algo to specific shapes of data without > allowing any visibility on the data, or a description of what to code > for. An algo junkie may jump straight to the optimized version, but a > pragmatic developer would, in my opinion, hesitate to do that without > any actual evidence that the problem required it. > > Steve > > On Thu, Jun 8, 2017 at 5:27 PM Jonathan Hartley > wrote: > >> Yep, that's a great elimination of the suspicious small overheads. >> >> line_profiler is beautiful, I'll definitely be adding it to my >> toolbox, thanks for that! >> >> I tried a variant of accumulating the output and printing it all as >> a single string, but of course this didn't help, printing is already >> buffered. >> >> Jonathan >> >> On 6/8/2017 03:54, Stestagg wrote: >> >> I honestly can't see a way to improve this in python. My best >> solution is: >> >> def main(lines): >> stack = [] >> sa = stack.append >> sp = stack.pop >> si = stack.__getitem__ >> for line in lines: >> meth = line[:3] >> if meth == b'pus': >> sa(int(line[5:])) >> elif meth == b'pop': >> sp() >> else: >> parts = line[15:].split() >> end = len(stack)-1 >> amount = int(parts[1]) >> for x in range(int(parts[0])): >> index = end - x >> stack[index] += amount >> print(stack[-1] if stack else None) >> >> which comes out about 25% faster than your solution. 
>> >> One tool that's interesting to use here is: line_profiler: >> https://github.com/rkern/line_profiler >> >> putting a @profile decorator on the above main() call, and running >> with kernprof produces the following output: >> >> Line # Hits Time Per Hit % Time Line Contents >> >> ============================================================== >> >> 12 @profile >> >> 13 def main(lines): >> >> 14 1 4 4.0 0.0 stack = [] >> >> 15 2000001 949599 0.5 11.5 for line in >> lines: >> >> 16 2000000 1126944 0.6 13.7 meth = >> line[:3] >> >> 17 2000000 974635 0.5 11.8 if meth == >> b'pus': >> >> 18 1000000 1002733 1.0 12.2 >> stack.append(int(line[5:])) >> >> 19 1000000 478756 0.5 5.8 elif meth >> == b'pop': >> >> 20 999999 597114 0.6 7.2 >> stack.pop() >> >> 21 else: >> >> 22 1 6 6.0 0.0 parts = >> line[15:].split() >> >> 23 1 2 2.0 0.0 end = >> len(stack)-1 >> >> 24 1 1 1.0 0.0 amount >> = int(parts[1]) >> >> 25 500001 241227 0.5 2.9 for x >> in range(int(parts[0])): >> >> 26 500000 273477 0.5 3.3 >> index = end - x >> >> 27 500000 309033 0.6 3.7 >> stack[index] += amount >> >> 28 2000000 2295803 1.1 27.8 >> print(stack[-1]) >> >> which shows that there's no obvious bottleneck (line by line) here >> (for my sample data). >> >> Note the print() overhead dominates the runtime, and that's with me >> piping the output to /dev/null directly. >> >> I had a go at using arrays, deques, and numpy arrays in various ways >> without luck, but we're getting fairly close to the native python >> statement execution overhead here (hence folding it all into one >> function). >> >> My only thoughts would be to see if there were some magic that could >> be done by offloading the work onto a non-python library somehow. >> >> Another thing that might help some situations (hence my previous >> questions) would be to implement the add_to_first_n as a lazy >> operator (i.e. have a stack of the add_to_first_n values and >> dynamically add to the results of pop() but that would proabably be >> much slow in the average case. >> >> Steve >> >> On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley >> wrote: >> >> Hey. >> >> Thanks for engaging, but I can't help with the most important of >> those questions - the large data sets on which my solution failed >> due to timeout are hidden from candidates. Not unreasonable to >> assume that they do exercise deep stacks, and large args to >> add_to_first_n, etc. >> >> Yes, the input looks exactly like your example. All args are >> integers. The question asked for output corresponding to the top of >> the stack after every operation. I omitted this print from inside >> the 'for' loop in 'main', thinking it irrelevant. >> >> I converted to integers inside 'dispatch'. 'args' must have actually >> been created with: >> >> args = [int(i) for i in tokens[1:]] >> >> Where len(tokens) is never going to be bigger than 3. >> >> Return values (from 'pop') were unused. >> >> On 6/7/2017 13:25, Stestagg wrote: >> >> Do you have any more context? >> For example, is the add_to_first_n likely to be called with very >> large numbers, or very often? Does the stack get very deep, or stay >> shallow? >> >> I'm assuming that lines look like this: >> >> push 1 >> push 2 >> add_to_first_n 2 10 >> pop >> pop >> >> with all arguments as integers, and the final value being returned >> from main()? >> How did you convert from string inputs to numeric values? >> How did you manage return values? 
>> >> :D >> >> On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley >> wrote: >> >> I recently submitted a solution to a coding challenge, in an >> employment context. One of the questions was to model a simple >> stack. I wrote a solution which appended and popped from the end of >> a list. This worked, but failed with timeouts on their last few >> automated tests with large (hidden) data sets. >> >> From memory, I think I had something pretty standard: >> >> class Stack: >> >> def __init__(self): >> self.storage = [] >> >> def push(arg): >> self.storage.append(arg) >> >> def pop(): >> return self.storage.pop() if self.storage else None >> >> def add_to_first_n(n, amount): >> for n in range(n): >> self.storage[n] += amount >> >> def dispatch(self, line) >> tokens = line.split() >> method = getattr(self, tokens[0]) >> args = tokens[1:] >> method(*args) >> >> def main(lines): >> stack = Stack() >> for line in lines: >> stack.dispatch(line) >> >> (will that formatting survive? Apologies if not) >> >> Subsequent experiments have confirmed that appending to and popping >> from the end of lists are O(1), amortized. So why is my solution too >> slow? >> >> This question was against the clock, 4th question of 4 in an hour. >> So I wasn't expecting to produce Cython or C optimised code in that >> timeframe (Besides, my submitted .py file runs on their servers, so >> the environment is limited.) >> >> So what am I missing, from a performance perspective? Are there >> other data structures in stdlib which are also O(1) but with a >> better constant? >> >> Ah. In writing this out, I have begun to suspect that my slicing of >> 'tokens' to produce 'args' in the dispatch is needlessly wasting >> time. Not much, but some. >> >> Thoughts welcome, >> >> Jonathan >> >> -- >> Jonathan Hartley tartley at tartley.com http://tartley.com >> Made out of meat. +1 507-513-1101 [1] twitter/skype: >> tartley >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk >> >> _______________________________________________ >> python-uk mailing list >> python-uk at python.org >> https://mail.python.org/mailman/listinfo/python-uk > > -- > Jonathan Hartley tartley at tartley.com http://tartley.com > Made out of meat. +1 507-513-1101 [1] twitter/skype: tartley > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > > -- > Jonathan Hartley tartley at tartley.com http://tartley.com > Made out of meat. +1 507-513-1101 [2] twitter/skype: tartley > > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk > > > Links: > ------ > [1] tel:%28507%29%20513-1101 > [2] tel:(507)%20513-1101 > _______________________________________________ > python-uk mailing list > python-uk at python.org > https://mail.python.org/mailman/listinfo/python-uk -- Jonathan Hartley Made out of meat. 
tartley at tartley.com +1 507-513-1101 From breamoreboy at yahoo.co.uk Tue Jun 13 10:04:11 2017 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Tue, 13 Jun 2017 15:04:11 +0100 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: References: Message-ID: On 07/06/2017 18:50, Jonathan Hartley wrote: > I recently submitted a solution to a coding challenge, in an employment > context. One of the questions was to model a simple stack. I wrote a > solution which appended and popped from the end of a list. This worked, > but failed with timeouts on their last few automated tests with large > (hidden) data sets. > > From memory, I think I had something pretty standard: > > class Stack: > > def __init__(self): > self.storage = [] > > def push(arg): > self.storage.append(arg) > > def pop(): > return self.storage.pop() if self.storage else None > > def add_to_first_n(n, amount): > for n in range(n): > self.storage[n] += amount > > def dispatch(self, line) > tokens = line.split() > method = getattr(self, tokens[0]) > args = tokens[1:] > method(*args) > > def main(lines): > stack = Stack() > for line in lines: > stack.dispatch(line) > > > (will that formatting survive? Apologies if not) > > Subsequent experiments have confirmed that appending to and popping from > the end of lists are O(1), amortized. > > So why is my solution too slow? > > This question was against the clock, 4th question of 4 in an hour. So I > wasn't expecting to produce Cython or C optimised code in that timeframe > (Besides, my submitted .py file runs on their servers, so the > environment is limited.) > > So what am I missing, from a performance perspective? Are there other > data structures in stdlib which are also O(1) but with a better constant? > > Ah. In writing this out, I have begun to suspect that my slicing of > 'tokens' to produce 'args' in the dispatch is needlessly wasting time. > Not much, but some. > > Thoughts welcome, > > Jonathan > > -- > Jonathan Hartleytartley at tartley.com http://tartley.com > Made out of meat. +1 507-513-1101 twitter/skype: tartley > Any objections to me putting this thread up on the main Python mailing list and reddit as it seems rather interesting? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From tartley at tartley.com Tue Jun 13 11:29:05 2017 From: tartley at tartley.com (Jonathan Hartley) Date: Tue, 13 Jun 2017 10:29:05 -0500 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: References: Message-ID: <7698fc6f-bd3f-a32c-d8de-bc4e61976893@tartley.com> On 06/13/2017 09:04 AM, Mark Lawrence via python-uk wrote: > On 07/06/2017 18:50, Jonathan Hartley wrote: >> I recently submitted a solution to a coding challenge, in an >> employment context. One of the questions was to model a simple stack. >> I wrote a solution which appended and popped from the end of a list. >> This worked, but failed with timeouts on their last few automated >> tests with large (hidden) data sets. 
>> >> From memory, I think I had something pretty standard: >> >> class Stack: >> >> def __init__(self): >> self.storage = [] >> >> def push(arg): >> self.storage.append(arg) >> >> def pop(): >> return self.storage.pop() if self.storage else None >> >> def add_to_first_n(n, amount): >> for n in range(n): >> self.storage[n] += amount >> >> def dispatch(self, line) >> tokens = line.split() >> method = getattr(self, tokens[0]) >> args = tokens[1:] >> method(*args) >> >> def main(lines): >> stack = Stack() >> for line in lines: >> stack.dispatch(line) >> >> >> (will that formatting survive? Apologies if not) >> >> Subsequent experiments have confirmed that appending to and popping >> from the end of lists are O(1), amortized. >> >> So why is my solution too slow? >> >> This question was against the clock, 4th question of 4 in an hour. So >> I wasn't expecting to produce Cython or C optimised code in that >> timeframe (Besides, my submitted .py file runs on their servers, so >> the environment is limited.) >> >> So what am I missing, from a performance perspective? Are there other >> data structures in stdlib which are also O(1) but with a better >> constant? >> >> Ah. In writing this out, I have begun to suspect that my slicing of >> 'tokens' to produce 'args' in the dispatch is needlessly wasting >> time. Not much, but some. >> >> Thoughts welcome, >> >> Jonathan >> >> -- >> Jonathan Hartleytartley at tartley.com http://tartley.com >> Made out of meat. +1 507-513-1101 twitter/skype: tartley >> > > Any objections to me putting this thread up on the main Python mailing > list and reddit as it seems rather interesting? > I'd rather not if I get any say in that, because I agreed in the T&C of the coding challenge that I wouldn't discuss the problem or solutions with others, and I don't want to annoy them right now. How about in a month? :-) -- Jonathan Hartley tartley at tartley.com http://tartley.com Made out of meat. +1 507-513-1101 twitter/skype: tartley From breamoreboy at yahoo.co.uk Tue Jun 13 13:50:41 2017 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Tue, 13 Jun 2017 18:50:41 +0100 Subject: [python-uk] A stack with better performance than using a list In-Reply-To: <7698fc6f-bd3f-a32c-d8de-bc4e61976893@tartley.com> References: <7698fc6f-bd3f-a32c-d8de-bc4e61976893@tartley.com> Message-ID: On 13/06/2017 16:29, Jonathan Hartley wrote: > > On 06/13/2017 09:04 AM, Mark Lawrence via python-uk wrote: >> On 07/06/2017 18:50, Jonathan Hartley wrote: >>> I recently submitted a solution to a coding challenge, in an >>> employment context. One of the questions was to model a simple stack. >>> I wrote a solution which appended and popped from the end of a list. >>> This worked, but failed with timeouts on their last few automated >>> tests with large (hidden) data sets. >>> >>> From memory, I think I had something pretty standard: >>> >>> class Stack: >>> >>> def __init__(self): >>> self.storage = [] >>> >>> def push(arg): >>> self.storage.append(arg) >>> >>> def pop(): >>> return self.storage.pop() if self.storage else None >>> >>> def add_to_first_n(n, amount): >>> for n in range(n): >>> self.storage[n] += amount >>> >>> def dispatch(self, line) >>> tokens = line.split() >>> method = getattr(self, tokens[0]) >>> args = tokens[1:] >>> method(*args) >>> >>> def main(lines): >>> stack = Stack() >>> for line in lines: >>> stack.dispatch(line) >>> >>> >>> (will that formatting survive? 
>>> (will that formatting survive? Apologies if not)
>>>
>>> Subsequent experiments have confirmed that appending to and popping
>>> from the end of lists are O(1), amortized.
>>>
>>> So why is my solution too slow?
>>>
>>> This question was against the clock, 4th question of 4 in an hour. So
>>> I wasn't expecting to produce Cython or C optimised code in that
>>> timeframe (Besides, my submitted .py file runs on their servers, so
>>> the environment is limited.)
>>>
>>> So what am I missing, from a performance perspective? Are there other
>>> data structures in stdlib which are also O(1) but with a better
>>> constant?
>>>
>>> Ah. In writing this out, I have begun to suspect that my slicing of
>>> 'tokens' to produce 'args' in the dispatch is needlessly wasting
>>> time. Not much, but some.
>>>
>>> Thoughts welcome,
>>>
>>> Jonathan
>>>
>>> --
>>> Jonathan Hartley  tartley at tartley.com  http://tartley.com
>>> Made out of meat. +1 507-513-1101  twitter/skype: tartley
>>>
>>
>> Any objections to me putting this thread up on the main Python mailing
>> list and reddit as it seems rather interesting?
>>
>
> I'd rather not if I get any say in that, because I agreed in the T&C of
> the coding challenge that I wouldn't discuss the problem or solutions
> with others, and I don't want to annoy them right now. How about in a
> month? :-)
>

Fine by me, on my calendar for 13th July :-)

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
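For readers following the stack thread above, below is a minimal sketch of
how the posted Stack could be made both runnable and fast. It is not the
code Jonathan actually submitted: it guesses at the missing pieces (the
self parameters, the colon on dispatch, converting the string tokens to
ints before using them) and it replaces the O(n) loop in add_to_first_n
with lazy per-element bookkeeping, on the assumption that that loop, rather
than list append/pop, is what caused the timeouts on large inputs. The
pending list, the clamping of n to the current depth, and the sample input
lines are illustrative assumptions, not part of the original challenge.

    class Stack:
        def __init__(self):
            self.storage = []
            # pending[i] is the amount still owed to storage[0..i]; it lets
            # add_to_first_n record its work in O(1) and settle up at pop time.
            self.pending = []

        def push(self, arg):
            self.storage.append(arg)
            self.pending.append(0)

        def pop(self):
            if not self.storage:
                return None
            owed = self.pending.pop()
            if self.pending:
                # Anything owed to this element is also owed to the one below it.
                self.pending[-1] += owed
            return self.storage.pop() + owed

        def add_to_first_n(self, n, amount):
            # Record the addition against the n-th element instead of walking
            # the first n slots; n is clamped to the current depth here.
            if n > 0 and self.storage:
                self.pending[min(n, len(self.storage)) - 1] += amount

        def dispatch(self, line):
            tokens = line.split()
            method = getattr(self, tokens[0])
            # The input lines are text, so the arguments need converting to int.
            return method(*(int(tok) for tok in tokens[1:]))


    def main(lines):
        stack = Stack()
        results = []
        for line in lines:
            result = stack.dispatch(line)
            if result is not None:
                results.append(result)
        return results


    if __name__ == "__main__":
        # Prints [12, 11]: the same answers the naive loop would give.
        print(main(["push 1", "push 2", "add_to_first_n 2 10", "pop", "pop"]))

The design choice is the usual one for range-update problems: defer the
addition until the affected element is actually popped, so every command
costs O(1) amortized regardless of how deep the stack gets.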
From daniele at vurt.org  Thu Jun 15 10:06:54 2017
From: daniele at vurt.org (Daniele Procida)
Date: Thu, 15 Jun 2017 16:06:54 +0200
Subject: [python-uk] Room share at EuroPython
Message-ID: <20170615140654.1565476372@mail.gandi.net>

Hi,

I'll be at EuroPython from Saturday 8th to Saturday 15th. Would anyone like
to share a hotel room?

Daniele

From sophie.hendley at digvis.co.uk  Mon Jun 19 11:47:58 2017
From: sophie.hendley at digvis.co.uk (Sophie Hendley)
Date: Mon, 19 Jun 2017 16:47:58 +0100
Subject: [python-uk] Two new roles- Please ignore if you love your current job.
Message-ID: 

Hey guys,

Will keep it brief so as not to cause a nuisance.

*Platform developer/ Online booking platform/ SOHO/ to 80k*
The guys are passionate about clean code, TDD, DevOps and open source.

*Python developer/ Start-up/ up to 75k/ Farringdon*
If you are interested in: Python, Django, REST/SOAP

-Working on a product which is truly innovative in its industry (insurance).
-Learning and discovering as you go, being independent and choosing the tech
you work with.
-Working in a team where the boss trusts you, with an awesome variety of
people who all get along and work collaboratively.
-Learning from some of the most successful entrepreneurs out there.

Drop me a message and we can have a chat.

Thanks guys, hope you aren't all as hot as I am #shouldhaveinvestedinaircon

Sophie

--
Sophie Hendley | Principal Consultant | Digital Vision
*M:* 07505145903
*E:* sophie.hendley at digvis.co.uk
*W:* www.digvis.co.uk
Sponsor me please!!!! - https://www.justgiving.com/sophiehendley/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From John at understandingrecruitment.co.uk  Thu Jun 22 13:20:21 2017
From: John at understandingrecruitment.co.uk (John Thistlethwaite)
Date: Thu, 22 Jun 2017 17:20:21 +0000
Subject: [python-uk] Python Developers who want to make mistakes to break boundaries
Message-ID: 

Hi All,

I wanted to reach out to the community and see if anyone (even if you are
not actively looking) is interested in hearing about a number of well-funded
start-ups at the beginning of their commercial mission, each with 4-5 years
of research behind them.

The companies are looking for start-up-ready developers who want to immerse
themselves in new technology and do something that will make a real
difference, while expanding their minds with coding problems that will keep
them engaged, and probably awake at night, because they are so bought in to
what they are doing; the potential is for it to become a lifestyle rather
than just a job.

I appreciate this is fairly limited in information; however, if you are open
to a discussion please get in touch on this thread, on 01727 228 257, or at
john at understandingrecruitment.co.uk

I look forward to hearing from intrigued readers in due course.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tibs at tibsnjoan.co.uk  Wed Jun 28 14:19:42 2017
From: tibs at tibsnjoan.co.uk (Tibs)
Date: Wed, 28 Jun 2017 19:19:42 +0100
Subject: [python-uk] Next Cambridge meeting: Tue 4th July 2017
Message-ID: <70357098-0BD5-40C0-9580-3B6C451F9A29@tibsnjoan.co.uk>

The next meeting of the Cambridge Python User Group will be on Tuesday 4th
July 2017. The meeting is at 7pm, ending around 9:30pm. The venue, as
normal, is the Raspberry Pi offices at 30 Station Road, CB1 2JH, Cambridge.
Normally some of us go on to the pub afterwards.

Jan Jedrzej Chwiejczak will talk about Exploring Python ByteCode:

Have you ever wondered what is happening when you execute your Python
programs? Would you like to gain insight into writing performance oriented
code, or be able to explain to your colleagues whether Python is an
interpreted or compiled language? If any of these questions spark your
curiosity then please come round. I will take you on a walk in the forest
of abstract syntax trees grown by lexers and parsers, where compilers
generate streams of bits understood and run by interpreters. We will
examine together the "intermediate language" which expresses your code as
machine instructions, and look at the ways we can understand it for fun and
profit using the dis module.

For the second part of this talk, we will pair up and try to apply what we
learned to optimise some problematic Python code and understand how it
looks to the interpreter.

This is a beginner-level talk and the only prerequisites are a curious
mind, basic Python knowledge and a laptop with Jupyter Notebook installed
running Python 3.x. If you don't have the latter, rest assured there will
be enough people with one around.

Remember that we are now on meetup.com, at http://www.meetup.com/CamPUG/.
If possible, please RSVP there for meetings so we have an idea of numbers.
As an incentive, there's normally more detail about each meeting there, and
you can also find out about future meetings.

Tweeting may occur at https://twitter.com/campython

Tibs
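For anyone curious before the talk, here is a tiny illustration of the dis
module mentioned above. It is not material from the talk itself: the
function being disassembled is made up for the example (it mirrors the
add_to_first_n loop from the stack thread earlier in this archive), and the
exact opcodes printed vary between Python versions.

    import dis

    def add_to_first_n(storage, n, amount):
        for i in range(n):
            storage[i] += amount

    # dis.dis prints one line per bytecode instruction, which is the
    # "intermediate language" the talk description refers to.
    dis.dis(add_to_first_n)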