From zachary.ware+pydev at gmail.com Thu Feb 4 01:48:21 2016 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Thu, 4 Feb 2016 00:48:21 -0600 Subject: [Speed] speed.python.org Message-ID: I'm happy to announce that speed.python.org is finally functional! There's not much there yet, as each benchmark builder has only sent one result so far (and one of those involved a bit of cheating on my part), but it's there. There are likely to be rough edges that still need smoothing out. When you find them, please report them at https://github.com/zware/codespeed/issues or on the speed at python.org mailing list. Many thanks to Intel for funding the work to get it set up and to Brett Cannon and Benjamin Peterson for their reviews. Happy benchmarking, -- Zach From victor.stinner at gmail.com Thu Feb 4 03:19:42 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 4 Feb 2016 09:19:42 +0100 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: Great! 2016-02-04 7:48 GMT+01:00 Zachary Ware : > I'm happy to announce that speed.python.org is finally functional! > There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out. > When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. > > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. > > Happy benchmarking, > -- > Zach > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com From ncoghlan at gmail.com Thu Feb 4 08:41:59 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Feb 2016 23:41:59 +1000 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: On 4 February 2016 at 16:48, Zachary Ware wrote: > I'm happy to announce that speed.python.org is finally functional! > There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out. > When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. > > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. This is great to hear! Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Feb 4 08:46:04 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Feb 2016 23:46:04 +1000 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: On 4 February 2016 at 16:48, Zachary Ware wrote: > I'm happy to announce that speed.python.org is finally functional! > There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out. > When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. 
> > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. Heh, cdecimal utterly demolishing the old pure Python decimal module on the telco benchmark means normalising against CPython 3.5 rather than 2.7 really isn't very readable :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From brett at python.org Fri Feb 5 13:07:03 2016 From: brett at python.org (Brett Cannon) Date: Fri, 05 Feb 2016 18:07:03 +0000 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: On Thu, 4 Feb 2016 at 05:46 Nick Coghlan wrote: > On 4 February 2016 at 16:48, Zachary Ware > wrote: > > I'm happy to announce that speed.python.org is finally functional! > > There's not much there yet, as each benchmark builder has only sent > > one result so far (and one of those involved a bit of cheating on my > > part), but it's there. > > > > There are likely to be rough edges that still need smoothing out. > > When you find them, please report them at > > https://github.com/zware/codespeed/issues or on the speed at python.org > > mailing list. > > > > Many thanks to Intel for funding the work to get it set up and to > > Brett Cannon and Benjamin Peterson for their reviews. > > Heh, cdecimal utterly demolishing the old pure Python decimal module > on the telco benchmark means normalising against CPython 3.5 rather > than 2.7 really isn't very readable :) > I find viewing the graphs using the horizontal layout is much easier to read (the bars are a lot thicker and everything zooms in more). -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Feb 5 13:29:18 2016 From: brett at python.org (Brett Cannon) Date: Fri, 05 Feb 2016 18:29:18 +0000 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: To piggyback on Zach's speed.python.org announcement, we will most likely be kicking off a discussion of redoing the benchmark suite, tweaking the test runner, etc. over on the speed@ ML. Those of us who have been doing perf work lately have found some shortcoming we would like to fix in our benchmarks suite, so if you want to participate in that discussion, please join speed@ by next week. On Wed, 3 Feb 2016 at 22:49 Zachary Ware wrote: > I'm happy to announce that speed.python.org is finally functional! > There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out. > When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. > > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. > > Happy benchmarking, > -- > Zach > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Sat Feb 6 02:05:26 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 6 Feb 2016 17:05:26 +1000 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: On 6 February 2016 at 04:07, Brett Cannon wrote: > On Thu, 4 Feb 2016 at 05:46 Nick Coghlan wrote: >> Heh, cdecimal utterly demolishing the old pure Python decimal module >> on the telco benchmark means normalising against CPython 3.5 rather >> than 2.7 really isn't very readable :) > > I find viewing the graphs using the horizontal layout is much easier to read > (the bars are a lot thicker and everything zooms in more). That comment was based on the horizontal layout - the telco benchmark runs ~53x faster in Python 3 than it does in Python 2 (without switching to cdecimal), so you end up with all the other benchmarks being squashed into the leftmost couple of grid cells. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg at krypto.org Sun Feb 7 02:54:27 2016 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 07 Feb 2016 07:54:27 +0000 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: Displaying ratios linearly rather than on a log scale axis can be misleading depending on what you are looking for. (feature request: allow a log scale?) major kudos to everyone involved in getting this setup! On Fri, Feb 5, 2016 at 11:06 PM Nick Coghlan wrote: > On 6 February 2016 at 04:07, Brett Cannon wrote: > > On Thu, 4 Feb 2016 at 05:46 Nick Coghlan wrote: > >> Heh, cdecimal utterly demolishing the old pure Python decimal module > >> on the telco benchmark means normalising against CPython 3.5 rather > >> than 2.7 really isn't very readable :) > > > > I find viewing the graphs using the horizontal layout is much easier to > read > > (the bars are a lot thicker and everything zooms in more). > > That comment was based on the horizontal layout - the telco benchmark > runs ~53x faster in Python 3 than it does in Python 2 (without > switching to cdecimal), so you end up with all the other benchmarks > being squashed into the leftmost couple of grid cells. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Feb 11 13:31:02 2016 From: brett at python.org (Brett Cannon) Date: Thu, 11 Feb 2016 18:31:02 +0000 Subject: [Speed] Any changes we want to make to perf.py? Message-ID: Some people have brought up the idea of tweaking how perf.py drives the benchmarks. I personally wonder if we should go from a elapsed time measurement to # of executions in a set amount of time measurement to get a more stable number that's easier to measure and will make sense even as Python and computers get faster (I got this idea from Mozilla's Dromaeo benchmark suite: https://wiki.mozilla.org/Dromaeo). -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Feb 11 13:35:29 2016 From: brett at python.org (Brett Cannon) Date: Thu, 11 Feb 2016 18:35:29 +0000 Subject: [Speed] Do we want to stop vendoring source of third-party libraries with the benchmarks? 
Message-ID: Maybe we should just have a requirements.txt file for Python 2 and another for Python 3 that are pegged to specific versions? We could even install things into a venv for isolation. If we go this route then we could make the benchmark suite a package on PyPI and have people install the benchmark suite and then have instructions to run pip on the requirements files that we embed in the package. This also gets us around any potential licensing issues with embedding third-party libraries. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Feb 11 13:36:33 2016 From: brett at python.org (Brett Cannon) Date: Thu, 11 Feb 2016 18:36:33 +0000 Subject: [Speed] Should we change what benchmarks we have? Message-ID: Are we happy with the current benchmarks? Are there some we want to drop? How about add? Do we want to have explanations as to why each benchmark is included? A better balance of micro vs. macro benchmarks (and probably matching groups)? -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Thu Feb 11 17:27:35 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 11 Feb 2016 23:27:35 +0100 Subject: [Speed] Any changes we want to make to perf.py? In-Reply-To: References: Message-ID: I don't think that using a fixed number of iterations is good to get stable benchmark results. I opened the following issue to discuss that: https://bugs.python.org/issue26275 I proposed to calibrate the number of runs and the number of loops using time. I'm not convinced myself yet that it's a good idea. For "runs" and "loops", I'm talking about something like this:

    times = []
    for run in range(runs):
        dt = time.perf_counter()
        for loop in range(loops):
            func()  # or python instructions
        times.append(time.perf_counter() - dt)

Victor 2016-02-11 19:31 GMT+01:00 Brett Cannon : > Some people have brought up the idea of tweaking how perf.py drives the > benchmarks. I personally wonder if we should go from a elapsed time > measurement to # of executions in a set amount of time measurement to get a > more stable number that's easier to measure and will make sense even as > Python and computers get faster (I got this idea from Mozilla's Dromaeo > benchmark suite: https://wiki.mozilla.org/Dromaeo). > > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed > From victor.stinner at gmail.com Thu Feb 11 17:37:34 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 11 Feb 2016 23:37:34 +0100 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: References: Message-ID: 2016-02-11 19:36 GMT+01:00 Brett Cannon : > Are we happy with the current benchmarks? bm_regex8 looks unstable, but I don't know if it's an issue of the benchmark itself or perf.py (see the other thread "[Speed] Any changes we want to make to perf.py?"). I spent a lot of time (probably too much!) last months trying to micro-optimize some parts of Python, specially operations on Python int. See for example this long issue: https://bugs.python.org/issue21955 At the end, the discussed patched only makes two benchmarks faster: nbody & spectral_norm. I'm disappointed because I don't know if it's worth to take these micro-optimizations only to run two *benchmarks* faster. Are they representative of "regular" Python code and "real-world applications"? Or are they typical maths benchmark?
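To give an idea of what those two benchmarks actually stress: their hot loops are tight pure-Python arithmetic on floats and small ints, something in the spirit of this made-up kernel (not the real benchmark code):

    def dot(u, v):
        # indexing, int arithmetic on the loop counter, float multiply/add:
        # exactly the operations the fast-path patches target
        s = 0.0
        for i in range(len(u)):
            s += u[i] * v[i]
        return s

Very little application code spends its time in loops like that.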
For maths, we all know that pure Python sucks and that maybe better options are available: PyPy, numba, Cython, etc. For example, PyPy is around 10x faster, whereas discussed micro-optimizations are 1.18x faster in the best case (in one very specific micro-benchmark). > Are there some we want to drop? > How about add? Do we want to have explanations as to why each benchmark is > included? A better balance of micro vs. macro benchmarks (and probably > matching groups)? For some kinds of optimizations, I consider that a micro-benchmark is enough. I don't have strict rules. Basically, it's when you know that the change cannot introduce slow-down in other cases, but will only benefit on one specific case. So the best is to write a tiny benchmark just for this case. Victor From victor.stinner at gmail.com Thu Feb 11 17:39:17 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 11 Feb 2016 23:39:17 +0100 Subject: [Speed] Tool to run Python microbenchmarks Message-ID: Hi, To run "micro"-benchmarks on "micro"-optimizations, I started to use timeit, but in my experience timeit it far from reliable. When I say micro: I'm talking about a test which takes less than 1000 ns, sometimes even a few nanoseconds! You always have to run the same micro-benchmark when timeit *at least* 5 times to find the "real" "minimum" runtime. That's why I wrote my own tool to run microbenchmarks: https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py Yury suggested me to add this tool to the Python benchmark project. I'm ok with that, but only if we rename it to "microbench.py" :-) I wrote this tool to compare micro-optimizations with a long list of very simple tests. The result is written into a file. Then you can compare two files and compare more files, and maybe even compare multiple files to a "reference". It "hides" difference smaller than 5% to ignore the noise. The main feature is benchmark.py is that it calibrates the benchmark using time to choose the number of runs and number of loops. I proposed a similar idea for perf.py: https://bugs.python.org/issue26275 What do you think? Would this tool be useful? Victor From victor.stinner at gmail.com Thu Feb 11 17:54:20 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 11 Feb 2016 23:54:20 +0100 Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark Message-ID: Hi, I'm sharing with you my notes (tricks) to get more reliable benchmarks on Linux if your CPU have multiple cores: https://haypo-notes.readthedocs.org/microbenchmark.html#reliable-micro-benchmarks FYI perf.py recently got a new --affinity= optional parameter. I plan to send a patch to automatically use /sys/devices/system/cpu/isolated if it's not empty. What are your "tricks" to get reliable benchmarks? Victor From kmod at dropbox.com Thu Feb 11 17:36:44 2016 From: kmod at dropbox.com (Kevin Modzelewski) Date: Thu, 11 Feb 2016 14:36:44 -0800 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: References: Message-ID: We on the Pyston team have created some new benchmarks which I can recommend using; I wouldn't call them "macrobenchmarks" since they don't test entire applications, but we've found them to be better than the existing benchmarks, which tend to be quite microbenchmarky. For example, our django-templating benchmark actually exercises the django templating system, as opposed to bm_django.py which just tests unicode concatenation. 
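To make the difference concrete, the templating benchmark drives the real engine, along these lines (a simplified sketch, not the actual benchmark; the template and sizes here are invented):

    import django
    from django.conf import settings
    settings.configure(TEMPLATES=[
        {"BACKEND": "django.template.backends.django.DjangoTemplates"}])
    django.setup()
    from django.template import Context, Template

    t = Template("<table>{% for row in rows %}"
                 "<tr><td>{{ row }}</td></tr>{% endfor %}</table>")
    html = t.render(Context({"rows": range(100)}))

so the time goes into template parsing, node rendering and variable resolution rather than plain unicode concatenation.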
You can find them here https://github.com/dropbox/pyston-perf/tree/master/benchmarking/benchmark_suite The current ones we look at are django_template3_10x, sqlalchemy_imperative2_10x, and pyxl_bench_10x. On Thu, Feb 11, 2016 at 10:36 AM, Brett Cannon wrote: > Are we happy with the current benchmarks? Are there some we want to drop? > How about add? Do we want to have explanations as to why each benchmark is > included? A better balance of micro vs. macro benchmarks (and probably > matching groups)? > > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Thu Feb 11 17:50:05 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 11 Feb 2016 17:50:05 -0500 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: References: Message-ID: <56BD101D.1090604@gmail.com> On 2016-02-11 5:37 PM, Victor Stinner wrote: > 2016-02-11 19:36 GMT+01:00 Brett Cannon : >> Are we happy with the current benchmarks? > bm_regex8 looks unstable, but I don't know if it's an issue of the > benchmark itself or perf.py (see the other thread "[Speed] Any changes > we want to make to perf.py?"). It's super unstable. As well as telco -- I don't trust those benchmarks. > > I spent a lot of time (probably too much!) last months trying to > micro-optimize some parts of Python, specially operations on Python > int. See for example this long issue: > https://bugs.python.org/issue21955 > > At the end, the discussed patched only makes two benchmarks faster: > nbody & spectral_norm. > > I'm disappointed because I don't know if it's worth to take these > micro-optimizations only to run two *benchmarks* faster. Are they > representative of "regular" Python code and "real-world applications"? > Or are they typical maths benchmark? > > For maths, we all know that pure Python sucks and that maybe better > options are available: PyPy, numba, Cython, etc. For example, PyPy is > around 10x faster, whereas discussed micro-optimizations are 1.18x > faster in the best case (in one very specific micro-benchmark). 18% is a pretty serious improvement. I consider issue 21955 as an attempt to fix a performance regression in Python 3. int+int operations in Py2 have a fast path in Python2, so they should have it in Python 3. Right now, spectral_norm is 50% faster on python 2 (when compared to 3.5). With patches from: - #26288 (fast PyLong_AsDouble, committed), - #26289 (faster floor division for longs, committed), - #24165 (free list for longs, will be committed) and - #21955 (fast path for longs in ceval, not committed) we can make 3.6 as fast as 2.7 for numeric code. Yes, spectral_norm is micro-benchmark, but still, there is a lot of python code out there that does some calculation in pure Python not involving numpy or pypy. I think it's important to fix py3 for that kind of code. That said, I'd like to find a better alternative to spectral-norm, something real, that stresses ints/floats and not using numpy. We also need a numpy benchmark, to make sure that we don't make numpy code slower by optimizing CPython. Yury From victor.stinner at gmail.com Thu Feb 11 18:00:41 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 12 Feb 2016 00:00:41 +0100 Subject: [Speed] Should we change what benchmarks we have? 
In-Reply-To: <56BD101D.1090604@gmail.com> References: <56BD101D.1090604@gmail.com> Message-ID: 2016-02-11 23:50 GMT+01:00 Yury Selivanov : > That said, I'd like to find a better alternative to spectral-norm, something > real, that stresses ints/floats and not using numpy. Case Van Horsen mentioned mpmath test suite: https://bugs.python.org/issue21955#msg259859 I extracted the slowest test and put it in a loop to the issue #21955 patches: on this patch, it's "only" around 2% faster with the patches. I understand that the test uses "large" integers (not fitting into a single PyLongObject digit). https://bugs.python.org/issue21955#msg259999 I don't know if it's a good benchmark for our "generic" benchmark :-p Victor From solipsis at pitrou.net Thu Feb 11 18:06:47 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Feb 2016 00:06:47 +0100 Subject: [Speed] Should we change what benchmarks we have? References: <56BD101D.1090604@gmail.com> Message-ID: <20160212000647.77745b24@fsol> On Thu, 11 Feb 2016 17:50:05 -0500 Yury Selivanov wrote: > > Right now, spectral_norm is 50% faster on python 2 (when compared to 3.5). spectral_norm is really a horrid benchmark. > Yes, spectral_norm is micro-benchmark, but still, there is a lot of > python code out there that does some calculation in pure Python not > involving numpy or pypy. Can you clarify "a lot"? Regards Antoine. From yselivanov.ml at gmail.com Thu Feb 11 18:16:23 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 11 Feb 2016 18:16:23 -0500 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: <20160212000647.77745b24@fsol> References: <56BD101D.1090604@gmail.com> <20160212000647.77745b24@fsol> Message-ID: <56BD1647.5000009@gmail.com> On 2016-02-11 6:06 PM, Antoine Pitrou wrote: > On Thu, 11 Feb 2016 17:50:05 -0500 > Yury Selivanov > wrote: >> Right now, spectral_norm is 50% faster on python 2 (when compared to 3.5). > spectral_norm is really a horrid benchmark. > >> Yes, spectral_norm is micro-benchmark, but still, there is a lot of >> python code out there that does some calculation in pure Python not >> involving numpy or pypy. > Can you clarify "a lot"? Any code that occasionally uses "int [op] int" code. That code becomes faster (especially if it's small ints). In tight loops significantly faster (that's what spectral_norm is doing). Look at the pillow package, for instance [1] -- just one of the first packages I thought of -- something non-scientific that happens to do some calculations here and there. Unless 21955 makes numpy code slower, I'm not sure why we're discussing this. Yury [1] https://github.com/python-pillow/Pillow From victor.stinner at gmail.com Thu Feb 11 18:24:19 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 12 Feb 2016 00:24:19 +0100 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: <56BD1647.5000009@gmail.com> References: <56BD101D.1090604@gmail.com> <20160212000647.77745b24@fsol> <56BD1647.5000009@gmail.com> Message-ID: 2016-02-12 0:16 GMT+01:00 Yury Selivanov : > Unless 21955 makes numpy code slower, I'm not sure why we're discussing > this. 
Stefan Krah wrote that it makes the decimal module 6% slower: https://bugs.python.org/issue21955#msg259571 Again in another message, "big slowdown for _decimal": https://bugs.python.org/issue21955#msg259793 Victor From solipsis at pitrou.net Thu Feb 11 18:26:58 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Feb 2016 00:26:58 +0100 Subject: [Speed] Should we change what benchmarks we have? References: <56BD101D.1090604@gmail.com> <20160212000647.77745b24@fsol> <56BD1647.5000009@gmail.com> Message-ID: <20160212002658.218a06ef@fsol> On Thu, 11 Feb 2016 18:16:23 -0500 Yury Selivanov wrote: > > > >> Yes, spectral_norm is micro-benchmark, but still, there is a lot of > >> python code out there that does some calculation in pure Python not > >> involving numpy or pypy. > > Can you clarify "a lot"? > > Any code that occasionally uses "int [op] int" code. That code becomes > faster (especially if it's small ints). In tight loops significantly > faster (that's what spectral_norm is doing). I agree for int addition, subtraction, perhaps multiplication. General math on small integers is not worth really improving, though, IMO. (and I don't think spectral_norm is representative of anything) > Look at the pillow package, for instance [1] -- just one of the first > packages I thought of -- something non-scientific that happens to do > some calculations here and there. Uh ? I would be extremely surprised if pillow processed images in pure Python. Regards Antoine. From yselivanov.ml at gmail.com Thu Feb 11 18:37:56 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 11 Feb 2016 18:37:56 -0500 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: <20160212002658.218a06ef@fsol> References: <56BD101D.1090604@gmail.com> <20160212000647.77745b24@fsol> <56BD1647.5000009@gmail.com> <20160212002658.218a06ef@fsol> Message-ID: <56BD1B54.7070709@gmail.com> On 2016-02-11 6:26 PM, Antoine Pitrou wrote: > On Thu, 11 Feb 2016 18:16:23 -0500 > Yury Selivanov > wrote: >>>> Yes, spectral_norm is micro-benchmark, but still, there is a lot of >>>> python code out there that does some calculation in pure Python not >>>> involving numpy or pypy. >>> Can you clarify "a lot"? >> Any code that occasionally uses "int [op] int" code. That code becomes >> faster (especially if it's small ints). In tight loops significantly >> faster (that's what spectral_norm is doing). > I agree for int addition, subtraction, perhaps multiplication. General > math on small integers is not worth really improving, though, IMO. Look, 21955 optimizes the following ops (fastint6.patch): 1. +, +=, -, -=, *, *= -- the ones that py2 has a fast path for 2. //, ,//=, %, %-, >>, >>=, <<, <<= -- these ones are usually used only on ints, so nothing should be affected negatively 3. /, /= -- these ones are used on floats, ints, decimals, etc If we decide to optimize group (1), I don't see why we can't apply the same macro to group (2). And then it's just group (3, true division) that we might or might not optimize. So to me, the real question is: should we optimize "long [op] long" at all? + and - are very common operations. If fastint6 manages to make numpy code (not microbenchmarks, but some real algorithms) even 3-5% slower - then let's just close 21955 as "won't fix". The problem is that we don't have any good decimal or numpy benchmark. telco is so unstable, that I take it less seriously than spectral_norm. 
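Even a boring, deterministic loop over Decimal arithmetic -- a rough sketch with made-up numbers, run enough times to settle -- would give a more stable signal on operator-overloading cost than telco does today:

    from decimal import Decimal

    def bench_decimal(n=100000):
        rate = Decimal("0.0013")
        total = Decimal(0)
        for i in range(n):
            # stresses __add__/__mul__ dispatch for a non-int type,
            # which is exactly where 21955 could add overhead
            total += Decimal(i) * rate
        return total

Same idea for numpy: a small fixed-size array workload that mostly measures dispatch overhead rather than BLAS.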
Thanks, Yury From yselivanov.ml at gmail.com Thu Feb 11 18:38:58 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 11 Feb 2016 18:38:58 -0500 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: References: <56BD101D.1090604@gmail.com> <20160212000647.77745b24@fsol> <56BD1647.5000009@gmail.com> Message-ID: <56BD1B92.10801@gmail.com> On 2016-02-11 6:24 PM, Victor Stinner wrote: > 2016-02-12 0:16 GMT+01:00 Yury Selivanov : >> Unless 21955 makes numpy code slower, I'm not sure why we're discussing >> this. > Stefan Krah wrote that it makes the decimal module 6% slower: > https://bugs.python.org/issue21955#msg259571 > > Again in another message, "big slowdown for _decimal": > https://bugs.python.org/issue21955#msg259793 > > Victor Yes, we need a good benchmark for decimals or numpy. Both use operator overloading extensively. Then I guess we can talk if there is an actual slowdown. Yury From ncoghlan at gmail.com Fri Feb 12 00:03:36 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 12 Feb 2016 15:03:36 +1000 Subject: [Speed] Do we want to stop vendoring source of third-party libraries with the benchmarks? In-Reply-To: References: Message-ID: On 12 February 2016 at 04:35, Brett Cannon wrote: > Maybe we should just have a requirements.txt file for Python 2 and another > for Python 3 that are pegged to specific versions? We could even install > things into a venv for isolation. If we go this route then we could make the > benchmark suite a package on PyPI and have people install the benchmark > suite and then have instructions to run pip on the requirements files that > we embed in the package. This also gets us around any potential licensing > issues with embedding third-party libraries. +1, especially if you use peep to update the requirements list with the sdist hashes Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Feb 12 00:17:08 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 12 Feb 2016 15:17:08 +1000 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: <56BD101D.1090604@gmail.com> References: <56BD101D.1090604@gmail.com> Message-ID: On 12 February 2016 at 08:50, Yury Selivanov wrote: > On 2016-02-11 5:37 PM, Victor Stinner wrote: >> 2016-02-11 19:36 GMT+01:00 Brett Cannon : >>> Are we happy with the current benchmarks? >> >> bm_regex8 looks unstable, but I don't know if it's an issue of the >> benchmark itself or perf.py (see the other thread "[Speed] Any changes >> we want to make to perf.py?"). > > It's super unstable. As well as telco -- I don't trust those benchmarks. telco covers a fairly important use case in the form of "Do things that billing applications need to do". Spending a few months running and re-running that to help optimise the original Python implementation of decimal was one of my first contributions to CPython (including figuring out the "int("".join(map(str, digits)))" hack that proved to be the fastest way in CPython to convert a tuple of digits into a Python integer, much to the annoyance of the PyPy folks trying to accelerate that code later). It's probably best to consider telco as a microbenchmark of decimal module performance rather than as a general macrobenchmark, though - that's why the integration of cdecimal improved it so dramatically. Cheers, Nick. 
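P.S. For anyone who hasn't run into it, the hack in question looks like this (digits made up):

    digits = (3, 1, 4, 1, 5, 9)
    n = int("".join(map(str, digits)))   # -> 314159

i.e. a string round-trip, which in CPython beat building the value up with repeated multiply-and-add.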
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Feb 12 00:21:26 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 12 Feb 2016 15:21:26 +1000 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: References: <56BD101D.1090604@gmail.com> Message-ID: On 12 February 2016 at 15:17, Nick Coghlan wrote: > It's probably best to consider telco as a microbenchmark of decimal > module performance rather than as a general macrobenchmark, though - > that's why the integration of cdecimal improved it so dramatically. Ah, I had misread the rest of the thread - if telco in its current form isn't useful as a decimal microbenchmark, then yes, updating it to improve its stability is more important than preserving it as is. Its original use case was to optimise the decimal implementation itself by figuring out where the hotspots were and optimising those, rather than as a general benchmark for other changes to the interpreter implementation. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From alecsandru.patrascu at intel.com Fri Feb 12 02:42:36 2016 From: alecsandru.patrascu at intel.com (Patrascu, Alecsandru) Date: Fri, 12 Feb 2016 07:42:36 +0000 Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark In-Reply-To: References: Message-ID: <3CF256F4F774BD48A1691D131AA043191424F26B@IRSMSX102.ger.corp.intel.com> Hi, Some of the things we do here at Intel, in our Languages Performance Lab [1,2], is to disable ASLR as you get more reliable results. This can be achieved on Linux by running echo 0 > /proc/sys/kernel/randomize_va_space. Also, setting the CPU frequency at a fixed frequency, disabling Turbo Boost and Hyper Threading, also helps for benchmark stability. >From my experience, the isolcpus feature is useful when you have a lot of cores on your machine because the kernel will have other cores on which it can schedule its work; furthermore, it is a best effort situation and it is not an absolute guarantee that the kernel will not use the cores specified if you have a lot of processes running (for example, if you benchmark on a machine with 2 physical cores and you isolate one of the cores, there is a big chance that the kernel will schedule processes on this core also, even it is for a small amount of time). Nevertheless, for machines with more physical cores, it can be good to have dedicated core(s) on which we do benchmarking. [1] http://languagesperformance.intel.com/ [2] https://lists.01.org/pipermail/langperf/ Thank you, Alecsandru > -----Original Message----- > From: Speed [mailto:speed- > bounces+alecsandru.patrascu=intel.com at python.org] On Behalf Of Victor > Stinner > Sent: Friday, February 12, 2016 12:54 AM > To: speed at python.org > Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark > > Hi, > > I'm sharing with you my notes (tricks) to get more reliable benchmarks on > Linux if your CPU have multiple cores: > > https://haypo-notes.readthedocs.org/microbenchmark.html#reliable-micro- > benchmarks > > FYI perf.py recently got a new --affinity= optional parameter. I plan to > send a patch to automatically use /sys/devices/system/cpu/isolated if it's > not empty. > > What are your "tricks" to get reliable benchmarks? 
> > Victor > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed From donald at stufft.io Fri Feb 12 05:42:52 2016 From: donald at stufft.io (Donald Stufft) Date: Fri, 12 Feb 2016 05:42:52 -0500 Subject: [Speed] Do we want to stop vendoring source of third-party libraries with the benchmarks? In-Reply-To: References: Message-ID: <3B87245A-236A-41C6-86A9-A6FF0D8C7911@stufft.io> > On Feb 12, 2016, at 12:03 AM, Nick Coghlan wrote: > > On 12 February 2016 at 04:35, Brett Cannon wrote: >> Maybe we should just have a requirements.txt file for Python 2 and another >> for Python 3 that are pegged to specific versions? We could even install >> things into a venv for isolation. If we go this route then we could make the >> benchmark suite a package on PyPI and have people install the benchmark >> suite and then have instructions to run pip on the requirements files that >> we embed in the package. This also gets us around any potential licensing >> issues with embedding third-party libraries. > > +1, especially if you use peep to update the requirements list with > the sdist hashes > pip 8 has peep functionality built in. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: Message signed with OpenPGP using GPGMail URL: From edd at theunixzoo.co.uk Fri Feb 12 06:18:53 2016 From: edd at theunixzoo.co.uk (Edd Barrett) Date: Fri, 12 Feb 2016 11:18:53 +0000 Subject: [Speed] Experiences with Microbenchmarking Message-ID: <20160212111853.GA4914@wilfred.dlink.com> Hi, A colleague has just pointed me to the discussions on this list regarding benchmarking methodology. Over the past few months we have been devising an "as rigorous as possible" micro-benchmarking experiment. It seems there's a lot of crossover in our work and your discussions. In short, our experiment is investigating the warmup behaviours of JITted VMs (currently PyPy, HotSpot, Graal, LuaJIT, HHVM, JRubyTruffle and V8) using microbenchmarks. For each microbenchmark/VM pairing we sequentially run a number of processes (currently 10), and within each process we run 2000 iterations of the microbenchmark. We then plot the results and make observations. The experiments were run under our own "paranoid" benchmark runner (Krun), which aims to control as many confounding variables as are practically possible. Amongst others, it checks that all benchmarks are run with the system at a similar starting temperature, disables ASLR, uses a monotonic system clock (in some cases we had to patch VMs) and it reboots the system before each benchmark. We did not isolate CPUs, since we found that this creates artificial contention on multi-threaded VMs, however, we did use (and Krun checks for) a tickless Linux kernel. We expected to see typical warmup behaviours (with distinct phases for profiling, compilation, and peak performance), but in reality we saw all kinds of crazy behaviours and even slowdowns. We've published a draft paper showing our preliminary findings here: http://arxiv.org/abs/1602.00602 The draft shows a subset of our results. 
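For those who just want the shape of the experiment without reading the paper, each benchmark/VM pairing boils down to the following (a stripped-down Python sketch with a dummy benchmark; Krun itself also handles the reboots, temperature checks, ASLR and clock details mentioned above):

    import time

    def bench():                        # stand-in for a real benchmark
        return sum(i * i for i in range(100000))

    # Krun launches 10 fresh VM processes; inside each one:
    times = []
    for i in range(2000):               # in-process iterations
        t0 = time.perf_counter()        # stand-in for the monotonic clock we require
        bench()
        times.append(time.perf_counter() - t0)

We then look at how the per-iteration times evolve within and across processes.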
Run-sequence plots for all process executions can be found here: https://archive.org/download/softdev_warmup_experiment_artefacts/v0.1/all_graphs.pdf For the final version of the paper we are trying to devise statistical methods to automatically classify the strange warmup behaviours we encountered. We will also run CPython in our final experiment, which may interest you guys :) If this interests anyone, I'd be happy to discuss further. Cheers -- Best Regards Edd Barrett http://www.theunixzoo.co.uk From solipsis at pitrou.net Fri Feb 12 07:26:06 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Feb 2016 13:26:06 +0100 Subject: [Speed] Do we want to stop vendoring source of third-party libraries with the benchmarks? References: Message-ID: <20160212132606.61ada70c@fsol> On Thu, 11 Feb 2016 18:35:29 +0000 Brett Cannon wrote: > Maybe we should just have a requirements.txt file for Python 2 and another > for Python 3 that are pegged to specific versions? We could even install > things into a venv for isolation. How does this impact interaction with the benchmarks suite? E.g. does it increase the time of running a couple of benchmarks? Does it make it easier or harder to benchmark a work-in-progress patch for whatever interpreter? > If we go this route then we could make > the benchmark suite a package on PyPI and have people install the benchmark > suite and then have instructions to run pip on the requirements files that > we embed in the package. I'm not fond of encouraging random users to run the benchmarks suite without understanding what they're doing, and starting throwing around pointless numbers and misconceptions about performance (which are then very hard to fight since people tend to be irrationally captivated by "performance numbers"). The benchmarks suite is mostly a tool for developers of Python implementations, not the greater public. Having the benchmarks suite only available through hg or git kind of discourages those tendencies. Regards Antoine. From solipsis at pitrou.net Fri Feb 12 07:31:07 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Feb 2016 13:31:07 +0100 Subject: [Speed] Should we change what benchmarks we have? References: Message-ID: <20160212133107.0eebd97b@fsol> On Thu, 11 Feb 2016 18:36:33 +0000 Brett Cannon wrote: > Are we happy with the current benchmarks? Are there some we want to drop? > How about add? Do we want to have explanations as to why each benchmark is > included? There are no real explanations except the provenance of said benchmarks: - the benchmarks suite was originally developed for Unladen Swallow - some benchmarks were taken and adapted from the "Great Computer Language Shootout" (which I think is a poor source of benchmarks) - some benchmarks have been added for specific concerns that may not be of enough interest in general (for example micro-benchmarks of methods calls, or benchmarks of json / pickle performance) > A better balance of micro vs. macro benchmarks (and probably > matching groups)? Easier said than done :-) Macro-benchmarks are harder to write, especially with the constraints that 1) runtimes should be short enough for convenient use 2) performance numbers should be stable enough accross runs. Regards Antoine. From solipsis at pitrou.net Fri Feb 12 07:57:00 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Feb 2016 13:57:00 +0100 Subject: [Speed] Should we change what benchmarks we have? 
References: <20160212133107.0eebd97b@fsol> Message-ID: <20160212135700.105dd73e@fsol> On Fri, 12 Feb 2016 13:31:07 +0100 Antoine Pitrou wrote: > On Thu, 11 Feb 2016 18:36:33 +0000 > Brett Cannon wrote: > > Are we happy with the current benchmarks? Are there some we want to drop? > > How about add? Do we want to have explanations as to why each benchmark is > > included? > > There are no real explanations except the provenance of said benchmarks: > - the benchmarks suite was originally developed for Unladen Swallow > - some benchmarks were taken and adapted from the "Great Computer > Language Shootout" (which I think is a poor source of benchmarks) > - some benchmarks have been added for specific concerns that may not be > of enough interest in general (for example micro-benchmarks of > methods calls, or benchmarks of json / pickle performance) That said, yes, ideally the presence or usefulness of each benchmark should be explained somewhere ("what is this trying to measure?"). Regards Antoine. From fijall at gmail.com Fri Feb 12 09:48:01 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 12 Feb 2016 15:48:01 +0100 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: <20160212133107.0eebd97b@fsol> References: <20160212133107.0eebd97b@fsol> Message-ID: I presume you looked at the pypy benchmark suite, which contains a large collection of library-based benchmarks. You can endlessly argue whether it's "macro enough", but it does cover some usages of various libraries as submitted/written with help from lib authors (sympy, twisted, various templating engines, sqlalchemy ORM, etc.) as well as interesting python programs that are CPU intensive found on the interwebs. On Fri, Feb 12, 2016 at 1:31 PM, Antoine Pitrou wrote: > On Thu, 11 Feb 2016 18:36:33 +0000 > Brett Cannon wrote: >> Are we happy with the current benchmarks? Are there some we want to drop? >> How about add? Do we want to have explanations as to why each benchmark is >> included? > > There are no real explanations except the provenance of said benchmarks: > - the benchmarks suite was originally developed for Unladen Swallow > - some benchmarks were taken and adapted from the "Great Computer > Language Shootout" (which I think is a poor source of benchmarks) > - some benchmarks have been added for specific concerns that may not be > of enough interest in general (for example micro-benchmarks of > methods calls, or benchmarks of json / pickle performance) > >> A better balance of micro vs. macro benchmarks (and probably >> matching groups)? > > Easier said than done :-) Macro-benchmarks are harder to write, > especially with the constraints that 1) runtimes should be short enough > for convenient use 2) performance numbers should be stable enough > accross runs. > > Regards > > Antoine. > > > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed From arigo at tunes.org Fri Feb 12 10:10:30 2016 From: arigo at tunes.org (Armin Rigo) Date: Fri, 12 Feb 2016 16:10:30 +0100 Subject: [Speed] Experiences with Microbenchmarking In-Reply-To: <20160212111853.GA4914@wilfred.dlink.com> References: <20160212111853.GA4914@wilfred.dlink.com> Message-ID: Hi Edd, On Fri, Feb 12, 2016 at 12:18 PM, Edd Barrett wrote: > JITted VMs (currently PyPy, HotSpot, Graal, LuaJIT, HHVM, JRubyTruffle > and V8) using microbenchmarks. 
For each microbenchmark/VM pairing we > sequentially run a number of processes (currently 10), and within each > process we run 2000 iterations of the microbenchmark. We then plot the > results and make observations. PyPy typically needs more than 2000 iterations to be warmed up. A bient?t, Armin. From fijall at gmail.com Fri Feb 12 10:42:08 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 12 Feb 2016 16:42:08 +0100 Subject: [Speed] Tool to run Python microbenchmarks In-Reply-To: References: Message-ID: Hi Victor timeit does two really terrible things - uses min(time) and disables the garbage collector, which makes it completely unreliable. On Thu, Feb 11, 2016 at 11:39 PM, Victor Stinner wrote: > Hi, > > To run "micro"-benchmarks on "micro"-optimizations, I started to use > timeit, but in my experience timeit it far from reliable. > > When I say micro: I'm talking about a test which takes less than 1000 > ns, sometimes even a few nanoseconds! > > You always have to run the same micro-benchmark when timeit *at least* > 5 times to find the "real" "minimum" runtime. > > That's why I wrote my own tool to run microbenchmarks: > https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py > > Yury suggested me to add this tool to the Python benchmark project. > I'm ok with that, but only if we rename it to "microbench.py" :-) I > wrote this tool to compare micro-optimizations with a long list of > very simple tests. The result is written into a file. Then you can > compare two files and compare more files, and maybe even compare > multiple files to a "reference". It "hides" difference smaller than 5% > to ignore the noise. > > The main feature is benchmark.py is that it calibrates the benchmark > using time to choose the number of runs and number of loops. I > proposed a similar idea for perf.py: > https://bugs.python.org/issue26275 > > What do you think? Would this tool be useful? > > Victor > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed From solipsis at pitrou.net Fri Feb 12 10:48:56 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Feb 2016 16:48:56 +0100 Subject: [Speed] Should we change what benchmarks we have? References: <20160212133107.0eebd97b@fsol> Message-ID: <20160212164856.0d493dd7@fsol> On Fri, 12 Feb 2016 15:48:01 +0100 Maciej Fijalkowski wrote: > I presume you looked at the pypy benchmark suite, which contains a > large collection of library-based benchmarks. Not in a long time, I admit... Regards Antoine. From victor.stinner at gmail.com Fri Feb 12 10:58:32 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 12 Feb 2016 16:58:32 +0100 Subject: [Speed] Tool to run Python microbenchmarks In-Reply-To: References: Message-ID: Hi, 2016-02-12 16:42 GMT+01:00 Maciej Fijalkowski : > timeit does two really terrible things - uses min(time) and disables > the garbage collector, which makes it completely unreliable. Can you please elaborate why using min(times) is a bad idea? I'm also using min() in my tool, I expect that it helps to ignore the sporadic peeks when the system is unstable. 
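Concretely, with timeit I mean the difference between these two ways of summarising the same runs (just a sketch):

    import timeit

    runs = timeit.repeat("x = 1 + 1", repeat=5, number=10**7)
    print(min(runs))               # what my tool reports
    print(sum(runs) / len(runs))   # mean, which keeps the noisy runs in

My assumption is that the minimum is the run least disturbed by the rest of the system.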
Victor From fijall at gmail.com Fri Feb 12 11:02:39 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 12 Feb 2016 17:02:39 +0100 Subject: [Speed] Tool to run Python microbenchmarks In-Reply-To: References: Message-ID: On Fri, Feb 12, 2016 at 4:58 PM, Victor Stinner wrote: > Hi, > > 2016-02-12 16:42 GMT+01:00 Maciej Fijalkowski : >> timeit does two really terrible things - uses min(time) and disables >> the garbage collector, which makes it completely unreliable. > > Can you please elaborate why using min(times) is a bad idea? > > I'm also using min() in my tool, I expect that it helps to ignore the > sporadic peeks when the system is unstable. > > Victor Yes, it also helps to ignore systematic peaks that will happen randomly (due to cache alignment, memory ordering, dicts etc.). Some operations are really random that you should not ignore. E.g. if you have: l.append('a') in a loop, you gonna ignore all the places that resize loop. I'll look for a reference From paul at paulgraydon.co.uk Fri Feb 12 11:00:23 2016 From: paul at paulgraydon.co.uk (Paul) Date: Fri, 12 Feb 2016 08:00:23 -0800 Subject: [Speed] Experiences with Microbenchmarking Message-ID: On 12 Feb 2016 07:10, Armin Rigo wrote: > > Hi Edd, > > On Fri, Feb 12, 2016 at 12:18 PM, Edd Barrett wrote: > > JITted VMs (currently PyPy, HotSpot, Graal, LuaJIT, HHVM, JRubyTruffle > > and V8) using microbenchmarks. For each microbenchmark/VM pairing we > > sequentially run a number of processes (currently 10), and within each > > process we run 2000 iterations of the microbenchmark. We then plot the > > results and make observations. > > PyPy typically needs more than 2000 iterations to be warmed up. > Same goes for the JVM. Off the top of my head it doesn't even start marking a method as hot until around 10,000 iterations (at which point it'll start to do the first stage of optimisations). If you're below that threshold you're dealing with pure interpreter performance. Paul. From arigo at tunes.org Fri Feb 12 13:00:00 2016 From: arigo at tunes.org (Armin Rigo) Date: Fri, 12 Feb 2016 19:00:00 +0100 Subject: [Speed] Experiences with Microbenchmarking In-Reply-To: References: Message-ID: Hi Paul, On Fri, Feb 12, 2016 at 5:00 PM, Paul wrote: >> PyPy typically needs more than 2000 iterations to be warmed up. > > Same goes for the JVM. Off the top of my head it doesn't even start marking a method as hot until around 10,000 iterations (at which point it'll start to do the first stage of optimisations). If you're below that threshold you're dealing with pure interpreter performance. Ew, it's even longer than PyPy :-) In the PyPy case, the number 2000 is particularly bad, because the JIT starts after 1039 iterations. It also adds a few extra paths afterwards, starting maybe around ~400-500 extra iterations (as a mean value). Each time, the JIT produces more machine code and there is a relatively important pause. So 2000 is close to the worst case: even running 2000 purely-interpreted iterations would be faster. A bient?t, Armin. From fijall at gmail.com Fri Feb 12 13:38:13 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 12 Feb 2016 19:38:13 +0100 Subject: [Speed] Experiences with Microbenchmarking In-Reply-To: References: Message-ID: On Fri, Feb 12, 2016 at 7:00 PM, Armin Rigo wrote: > Hi Paul, > > On Fri, Feb 12, 2016 at 5:00 PM, Paul wrote: >>> PyPy typically needs more than 2000 iterations to be warmed up. >> >> Same goes for the JVM. 
Off the top of my head it doesn't even start marking a method as hot until around 10,000 iterations (at which point it'll start to do the first stage of optimisations). If you're below that threshold you're dealing with pure interpreter performance. > > Ew, it's even longer than PyPy :-) > > In the PyPy case, the number 2000 is particularly bad, because the JIT > starts after 1039 iterations. It also adds a few extra paths > afterwards, starting maybe around ~400-500 extra iterations (as a mean > value). Each time, the JIT produces more machine code > and there is a relatively important pause. So 2000 is close to the > worst case: even running 2000 purely-interpreted iterations would be > faster. > > > A bient?t, > > Armin. > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed Armin, those are "2000 iterations of a benchmark" and not "2000 iterations of a loop". A lot of those are pypy benchmarks, just run longer From edd at theunixzoo.co.uk Fri Feb 12 13:44:19 2016 From: edd at theunixzoo.co.uk (Edd Barrett) Date: Fri, 12 Feb 2016 18:44:19 +0000 Subject: [Speed] Experiences with Microbenchmarking In-Reply-To: References: Message-ID: <20160212183913.GA28808@wilfred.dlink.com> On Fri, Feb 12, 2016 at 08:00:23AM -0800, Paul wrote: > > PyPy typically needs more than 2000 iterations to be warmed up. > > > > Same goes for the JVM. Off the top of my head it doesn't even start marking a method as hot until around 10,000 iterations (at which point it'll start to do the first stage of optimisations). If you're below that threshold you're dealing with pure interpreter performance. To be clear, what I called an "iteration" is one in-process run of an entire benchmark. Each benchmark will invoke tons of methods and execute tons of user loops. 2000 in-process iterations should be plenty enough to warm up the VMs. Most benchmarking experiments take only around 30 post-warmup in-process iterations (enough to compute a confidence interval). The well-behaved benchmark/vm pairs in our experiment warmup in less than ten in-process iterations. Cheers -- Best Regards Edd Barrett http://www.theunixzoo.co.uk From arigo at tunes.org Fri Feb 12 14:06:07 2016 From: arigo at tunes.org (Armin Rigo) Date: Fri, 12 Feb 2016 20:06:07 +0100 Subject: [Speed] Experiences with Microbenchmarking In-Reply-To: <20160212183913.GA28808@wilfred.dlink.com> References: <20160212183913.GA28808@wilfred.dlink.com> Message-ID: Hi Edd, On Fri, Feb 12, 2016 at 7:44 PM, Edd Barrett wrote: > To be clear, what I called an "iteration" is one in-process run of an > entire benchmark. Oops, sorry. The subject of this thread is "Experiences with Microbenchmarking". I naturally assumed that a microbenchmark is doing one simple thing not in a loop, in which case "iterations" is simply repeating that simple thing. If you have in mind benchmarks that are not as micro as that, then I stand corrected. A bient?t, Armin. From brett at python.org Fri Feb 12 20:08:23 2016 From: brett at python.org (Brett Cannon) Date: Sat, 13 Feb 2016 01:08:23 +0000 Subject: [Speed] Do we want to stop vendoring source of third-party libraries with the benchmarks? 
In-Reply-To: <20160212132606.61ada70c@fsol> References: <20160212132606.61ada70c@fsol> Message-ID: On Fri, Feb 12, 2016, 04:26 Antoine Pitrou wrote: > On Thu, 11 Feb 2016 18:35:29 +0000 > Brett Cannon wrote: > > Maybe we should just have a requirements.txt file for Python 2 and > another > > for Python 3 that are pegged to specific versions? We could even install > > things into a venv for isolation. > > How does this impact interaction with the benchmarks suite? > Upon installation you would need to run `pip install -r requirements 3.txt` to get the dependencies. E.g. does it increase the time of running a couple of benchmarks? No Does > it make it easier or harder to benchmark a work-in-progress patch for > whatever interpreter? > I think only on Windows because of the lack of symlink support. > > If we go this route then we could make > > the benchmark suite a package on PyPI and have people install the > benchmark > > suite and then have instructions to run pip on the requirements files > that > > we embed in the package. > > I'm not fond of encouraging random users to run the benchmarks suite > without understanding what they're doing, and starting throwing around > pointless numbers and misconceptions about performance (which are then > very hard to fight since people tend to be irrationally captivated by > "performance numbers"). The benchmarks suite is mostly a tool for > developers of Python implementations, not the greater public. > > Having the benchmarks suite only available through hg or git kind of > discourages those tendencies. > That's fine, but then I would still want requirements files so we stop vendoring. Brett > Regards > > Antoine. > > > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fijall at gmail.com Sun Feb 14 07:20:14 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sun, 14 Feb 2016 13:20:14 +0100 Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark In-Reply-To: <3CF256F4F774BD48A1691D131AA043191424F26B@IRSMSX102.ger.corp.intel.com> References: <3CF256F4F774BD48A1691D131AA043191424F26B@IRSMSX102.ger.corp.intel.com> Message-ID: Hi. Disabling ASLR means you get more repeatable benchmarks, of course, but also means that on another identical machine (or a bit different circumstances), you can get different results, hence you moved statistical error to a more systematic one. I don't think that's a win On Fri, Feb 12, 2016 at 8:42 AM, Patrascu, Alecsandru wrote: > Hi, > > Some of the things we do here at Intel, in our Languages Performance Lab [1,2], is to disable ASLR as you get more reliable results. This can be achieved on Linux by running echo 0 > /proc/sys/kernel/randomize_va_space. Also, setting the CPU frequency at a fixed frequency, disabling Turbo Boost and Hyper Threading, also helps for benchmark stability. 
From fijall at gmail.com Sun Feb 14 07:20:14 2016
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sun, 14 Feb 2016 13:20:14 +0100
Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark
In-Reply-To: <3CF256F4F774BD48A1691D131AA043191424F26B@IRSMSX102.ger.corp.intel.com>
References: <3CF256F4F774BD48A1691D131AA043191424F26B@IRSMSX102.ger.corp.intel.com>
Message-ID:

Hi.

Disabling ASLR means you get more repeatable benchmarks, of course, but
also means that on another identical machine (or a bit different
circumstances), you can get different results, hence you moved statistical
error to a more systematic one. I don't think that's a win.

On Fri, Feb 12, 2016 at 8:42 AM, Patrascu, Alecsandru wrote:
> Hi,
>
> Some of the things we do here at Intel, in our Languages Performance Lab
> [1,2], is to disable ASLR as you get more reliable results. This can be
> achieved on Linux by running echo 0 > /proc/sys/kernel/randomize_va_space.
> Also, setting the CPU frequency at a fixed frequency, disabling Turbo
> Boost and Hyper Threading, also helps for benchmark stability.
>
> From my experience, the isolcpus feature is useful when you have a lot of
> cores on your machine because the kernel will have other cores on which
> it can schedule its work; furthermore, it is a best effort situation and
> it is not an absolute guarantee that the kernel will not use the cores
> specified if you have a lot of processes running (for example, if you
> benchmark on a machine with 2 physical cores and you isolate one of the
> cores, there is a big chance that the kernel will schedule processes on
> this core also, even it is for a small amount of time). Nevertheless, for
> machines with more physical cores, it can be good to have dedicated
> core(s) on which we do benchmarking.
>
> [1] http://languagesperformance.intel.com/
> [2] https://lists.01.org/pipermail/langperf/
>
> Thank you,
> Alecsandru
>
>> -----Original Message-----
>> From: Speed [mailto:speed-
>> bounces+alecsandru.patrascu=intel.com at python.org] On Behalf Of Victor
>> Stinner
>> Sent: Friday, February 12, 2016 12:54 AM
>> To: speed at python.org
>> Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark
>>
>> Hi,
>>
>> I'm sharing with you my notes (tricks) to get more reliable benchmarks on
>> Linux if your CPU have multiple cores:
>>
>> https://haypo-notes.readthedocs.org/microbenchmark.html#reliable-micro-benchmarks
>>
>> FYI perf.py recently got a new --affinity= optional parameter. I plan to
>> send a patch to automatically use /sys/devices/system/cpu/isolated if it's
>> not empty.
>>
>> What are your "tricks" to get reliable benchmarks?
>>
>> Victor
>> _______________________________________________
>> Speed mailing list
>> Speed at python.org
>> https://mail.python.org/mailman/listinfo/speed
> _______________________________________________
> Speed mailing list
> Speed at python.org
> https://mail.python.org/mailman/listinfo/speed

From alecsandru.patrascu at intel.com Sun Feb 14 11:37:07 2016
From: alecsandru.patrascu at intel.com (Patrascu, Alecsandru)
Date: Sun, 14 Feb 2016 16:37:07 +0000
Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark
In-Reply-To:
References: <3CF256F4F774BD48A1691D131AA043191424F26B@IRSMSX102.ger.corp.intel.com>
Message-ID: <3CF256F4F774BD48A1691D131AA043191424F99D@IRSMSX102.ger.corp.intel.com>

Hello,

The existence of variance across machines is true even with ASLR on. My
point was about repeated measurements on the same machine. Nevertheless,
even if small variations may appear in certain circumstances, you can
minimize them if you use identical machines, with identical software,
settings, etc. And most importantly, the deltas remain comparable. For the
dedicated Python CodeSpeed machine that does daily measurements, among
others, this can be a good setting for a bit more reliable results.

Thank you,
Alecsandru

> -----Original Message-----
> From: Maciej Fijalkowski [mailto:fijall at gmail.com]
> Sent: Sunday, February 14, 2016 2:20 PM
> To: Patrascu, Alecsandru
> Cc: Victor Stinner ; speed at python.org
> Subject: Re: [Speed] Linux tip: use isolcpus to have (more) reliable
> benchmark
>
> Hi.
>
> Disabling ASLR means you get more repeatable benchmarks, of course, but
> also means that on another identical machine (or a bit different
> circumstances), you can get different results, hence you moved
> statistical error to a more systematic one. I don't think that's a win.
>
> On Fri, Feb 12, 2016 at 8:42 AM, Patrascu, Alecsandru
> wrote:
> > Hi,
> >
> > Some of the things we do here at Intel, in our Languages Performance
> > Lab [1,2], is to disable ASLR as you get more reliable results. This
> > can be achieved on Linux by running echo 0 >
> > /proc/sys/kernel/randomize_va_space. Also, setting the CPU frequency
> > at a fixed frequency, disabling Turbo Boost and Hyper Threading, also
> > helps for benchmark stability.
> >
> > From my experience, the isolcpus feature is useful when you have a lot
> > of cores on your machine because the kernel will have other cores on
> > which it can schedule its work; furthermore, it is a best effort
> > situation and it is not an absolute guarantee that the kernel will not
> > use the cores specified if you have a lot of processes running (for
> > example, if you benchmark on a machine with 2 physical cores and you
> > isolate one of the cores, there is a big chance that the kernel will
> > schedule processes on this core also, even it is for a small amount of
> > time). Nevertheless, for machines with more physical cores, it can be
> > good to have dedicated core(s) on which we do benchmarking.
> >
> > [1] http://languagesperformance.intel.com/
> > [2] https://lists.01.org/pipermail/langperf/
> >
> > Thank you,
> > Alecsandru
> >
> >> -----Original Message-----
> >> From: Speed [mailto:speed-
> >> bounces+alecsandru.patrascu=intel.com at python.org] On Behalf Of Victor
> >> Stinner
> >> Sent: Friday, February 12, 2016 12:54 AM
> >> To: speed at python.org
> >> Subject: [Speed] Linux tip: use isolcpus to have (more) reliable
> >> benchmark
> >>
> >> Hi,
> >>
> >> I'm sharing with you my notes (tricks) to get more reliable
> >> benchmarks on Linux if your CPU have multiple cores:
> >>
> >> https://haypo-notes.readthedocs.org/microbenchmark.html#reliable-micro-benchmarks
> >>
> >> FYI perf.py recently got a new --affinity= optional parameter. I plan
> >> to send a patch to automatically use /sys/devices/system/cpu/isolated
> >> if it's not empty.
> >>
> >> What are your "tricks" to get reliable benchmarks?
> >>
> >> Victor
> >> _______________________________________________
> >> Speed mailing list
> >> Speed at python.org
> >> https://mail.python.org/mailman/listinfo/speed
> > _______________________________________________
> > Speed mailing list
> > Speed at python.org
> > https://mail.python.org/mailman/listinfo/speed
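(Illustration only, not the actual perf.py patch: the automatic use of
/sys/devices/system/cpu/isolated mentioned above could look roughly like
the sketch below. The cpulist parsing is simplified, error handling is
minimal, and the function names are made up.)

    import os

    def isolated_cpus():
        """Return the set of CPUs reserved via the isolcpus= kernel option.

        On Linux, /sys/devices/system/cpu/isolated contains a cpulist such
        as "2-3" and is empty when isolcpus is not used.
        """
        try:
            with open("/sys/devices/system/cpu/isolated") as f:
                text = f.read().strip()
        except OSError:
            return set()
        cpus = set()
        for part in text.split(","):
            if not part:
                continue
            if "-" in part:
                low, high = part.split("-")
                cpus.update(range(int(low), int(high) + 1))
            else:
                cpus.add(int(part))
        return cpus

    def pin_to_isolated_cpus():
        """Pin this process to the isolated CPUs, if any are configured."""
        cpus = isolated_cpus()
        if cpus:
            os.sched_setaffinity(0, cpus)  # Linux-only, Python 3.3+

Run before starting the timed workload, this keeps the benchmark on cores
the scheduler otherwise leaves alone; it does not by itself disable ASLR or
Turbo Boost, which still have to be configured as described above.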
From brett at python.org Sun Feb 14 11:57:33 2016
From: brett at python.org (Brett Cannon)
Date: Sun, 14 Feb 2016 16:57:33 +0000
Subject: [Speed] Should we change what benchmarks we have?
In-Reply-To: <20160212164856.0d493dd7@fsol>
References: <20160212133107.0eebd97b@fsol> <20160212164856.0d493dd7@fsol>
Message-ID:

On Fri, Feb 12, 2016, 07:49 Antoine Pitrou wrote:

> On Fri, 12 Feb 2016 15:48:01 +0100
> Maciej Fijalkowski wrote:
> > I presume you looked at the pypy benchmark suite, which contains a
> > large collection of library-based benchmarks.
>
> Not in a long time, I admit...

So it sounds like:

* we should drop regex_v8, telco, and spectral_norm
* Having an explanation as to what a benchmark is meant to exercise
  wouldn't go amiss
* Pyston and PyPy have potential benchmarks to steal (although they need
  to work with at least Python 3.5 to be considered)

Anyone want the satisfaction of deprecating those benchmarks? How about
writing a README file for what each of the benchmarks is for (which will
become the README for the future GitHub repo)? And do we want the Pyston
and PyPy folks to nominate benchmarks they think we really should add
(with a wild hope of finally having a single suite that everyone at least
starts from), or should some CPython devs look at what PyPy and Pyston
have and raid their benchmarks?

Brett

> Regards
>
> Antoine.
>
>
> _______________________________________________
> Speed mailing list
> Speed at python.org
> https://mail.python.org/mailman/listinfo/speed

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From brett at python.org Mon Feb 22 18:10:33 2016
From: brett at python.org (Brett Cannon)
Date: Mon, 22 Feb 2016 23:10:33 +0000
Subject: [Speed] Should we change what benchmarks we have?
In-Reply-To:
References: <20160212133107.0eebd97b@fsol> <20160212164856.0d493dd7@fsol>
Message-ID:

On Sun, 14 Feb 2016 at 08:57 Brett Cannon wrote:

> On Fri, Feb 12, 2016, 07:49 Antoine Pitrou wrote:
>
>> On Fri, 12 Feb 2016 15:48:01 +0100
>> Maciej Fijalkowski wrote:
>> > I presume you looked at the pypy benchmark suite, which contains a
>> > large collection of library-based benchmarks.
>>
>> Not in a long time, I admit...
>
> So it sounds like:
>
> * we should drop regex_v8, telco, and spectral_norm

Created http://bugs.python.org/issue26416 to track this.

> * Having an explanation as to what a benchmark is meant to exercise
>   wouldn't go amiss

This can wait until we migrate to GitHub.

> * Pyston and PyPy have potential benchmarks to steal (although they need
>   to work with at least Python 3.5 to be considered)

No one stepped forward for this on either the PyPy/Pyston or CPython side.

-Brett

> Anyone want the satisfaction of deprecating those benchmarks? How about
> writing a README file for what each of the benchmarks is for (which will
> become the README for the future GitHub repo)? And do we want the Pyston
> and PyPy folks to nominate benchmarks they think we really should add
> (with a wild hope of finally having a single suite that everyone at least
> starts from), or should some CPython devs look at what PyPy and Pyston
> have and raid their benchmarks?
>
> Brett
>
> > Regards
> >
> > Antoine.
> >
> >
> > _______________________________________________
> > Speed mailing list
> > Speed at python.org
> > https://mail.python.org/mailman/listinfo/speed

-------------- next part --------------
An HTML attachment was scrubbed...
URL: