From stefano at rivera.za.net  Sat Nov  1 02:07:21 2014
From: stefano at rivera.za.net (Stefano Rivera)
Date: Sat, 1 Nov 2014 03:07:21 +0200
Subject: [pypy-dev] cppyy on Ubuntu 14.04
In-Reply-To: <20141005231511.17800.1068247082.divmod.xquotient.1198@top>
References: <20141005231511.17800.1068247082.divmod.xquotient.1198@top>
Message-ID: <20141101010720.GK3623@bach.rivera.co.za>

Hi exarkun (2014.10.06_01:15:11_+0200)
> When I try to use cppyy with the Ubuntu-packaged PyPy (on Ubuntu
> 14.04), I get this:
>
> >>> import cppyy
> Traceback (most recent call last):
>   File "", line 1, in
> ImportError: missing reflection library libcppyy_backend.so

Thanks, I should probably build that extension in the Debian/Ubuntu
PyPy package. I filed http://bugs.debian.org/767546 to remind me.

SR

-- 
Stefano Rivera
  http://tumbleweed.org.za/
  +1 415 683 3272

From matti.picus at gmail.com  Sat Nov  1 16:42:51 2014
From: matti.picus at gmail.com (Matti Picus)
Date: Sat, 01 Nov 2014 17:42:51 +0200
Subject: [pypy-dev] Silly Question re PIP
In-Reply-To: 
References: 
Message-ID: <5454FF7B.4050600@gmail.com>

What platform, version of pypy?
Matti

On 31/10/14 17:47, Gary Furash wrote:
> I didn't understand the documentation, but is it the case that you
> cannot use PIP with pypy? When I run pypy get_pip.py, it fails.
>
> -- gary furash | furashgary at gmail.com, 520-907-2470
>
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> https://mail.python.org/mailman/listinfo/pypy-dev

From arigo at tunes.org  Sat Nov  1 17:55:19 2014
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 1 Nov 2014 17:55:19 +0100
Subject: [pypy-dev] Silly Question re PIP
In-Reply-To: 
References: 
Message-ID: 

Hi Gary,

On 31 October 2014 16:47, Gary Furash wrote:
> I didn't understand the documentation, but is it the case that you cannot
> use PIP with pypy? When I run pypy get_pip.py, it fails.

Works for me (PyPy 2.4.0 on Linux 64-bit). You need to be more
specific.

A bientôt,

Armin.

From tbaldridge at gmail.com  Wed Nov  5 01:30:08 2014
From: tbaldridge at gmail.com (Timothy Baldridge)
Date: Tue, 4 Nov 2014 17:30:08 -0700
Subject: [pypy-dev] lltype.malloc with opaque structs
Message-ID: 

There seem to be many ways to define rffi structs: rffi.Struct, CStruct,
rffi_platform.Struct, lltype.Struct. What I need is this:

There's a struct defined in a C header I'm including via rffi. I want to
malloc it, but I don't want to define what the contents are; it's just a
blob that this library requires me to malloc and free. So I really don't
even know the size. What should I be using?

Timothy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rymg19 at gmail.com  Wed Nov  5 01:45:04 2014
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Tue, 4 Nov 2014 18:45:04 -0600
Subject: [pypy-dev] lltype.malloc with opaque structs
In-Reply-To: 
References: 
Message-ID: 

I think last time I checked, I used rffi.COpaquePtr("MyType") to refer to
a pointer to MyType.

On Tue, Nov 4, 2014 at 6:30 PM, Timothy Baldridge wrote:

> There seem to be many ways to define rffi structs: rffi.Struct, CStruct,
> rffi_platform.Struct, lltype.Struct. What I need is this:
>
> There's a struct defined in a C header I'm including via rffi. I want to
> malloc it, but I don't want to define what the contents are; it's just a
> blob that this library requires me to malloc and free. So I really don't
> even know the size. What should I be using?
>
> Timothy
>
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> https://mail.python.org/mailman/listinfo/pypy-dev
>

-- 
Ryan
If anybody ever asks me why I prefer C++ to C, my answer will be simple:
"It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was
nul-terminated."
Personal reality distortion fields are immune to contradictory evidence. -
srean
Check out my website: http://kirbyfan64.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rymg19 at gmail.com  Wed Nov  5 01:57:03 2014
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Tue, 4 Nov 2014 18:57:03 -0600
Subject: [pypy-dev] Segfault in Hy for PyPy 2.4
Message-ID: 

I just built the PyPy alpha yesterday. I can run the tests using nose for
the Hy project under PyPy 2.5 alpha and PyPy 2.3. However, using the
prebuilt PyPy 2.4 binaries fails with a segfault. Why does this happen?

-- 
Ryan
If anybody ever asks me why I prefer C++ to C, my answer will be simple:
"It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was
nul-terminated."
Personal reality distortion fields are immune to contradictory evidence. -
srean
Check out my website: http://kirbyfan64.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tbaldridge at gmail.com  Wed Nov  5 01:57:47 2014
From: tbaldridge at gmail.com (Timothy Baldridge)
Date: Tue, 4 Nov 2014 17:57:47 -0700
Subject: [pypy-dev] lltype.malloc with opaque structs
In-Reply-To: 
References: 
Message-ID: 

Exactly what I was looking for, thanks!

On Tue, Nov 4, 2014 at 5:45 PM, Ryan Gonzalez wrote:

> I think last time I checked, I used rffi.COpaquePtr("MyType") to refer to
> a pointer to MyType.
>
> On Tue, Nov 4, 2014 at 6:30 PM, Timothy Baldridge wrote:
>
>> There seem to be many ways to define rffi structs: rffi.Struct, CStruct,
>> rffi_platform.Struct, lltype.Struct. What I need is this:
>>
>> There's a struct defined in a C header I'm including via rffi. I want to
>> malloc it, but I don't want to define what the contents are; it's just a
>> blob that this library requires me to malloc and free. So I really don't
>> even know the size. What should I be using?
>>
>> Timothy
>>
>> _______________________________________________
>> pypy-dev mailing list
>> pypy-dev at python.org
>> https://mail.python.org/mailman/listinfo/pypy-dev
>>
> --
> Ryan
> If anybody ever asks me why I prefer C++ to C, my answer will be simple:
> "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was
> nul-terminated."
> Personal reality distortion fields are immune to contradictory evidence. -
> srean
> Check out my website: http://kirbyfan64.github.io/

-- 
"One of the main causes of the fall of the Roman Empire was that --
lacking zero -- they had no way to indicate successful termination of
their C programs."
(Robert Firth)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
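To make the suggestion above concrete, here is a minimal RPython sketch of
the COpaquePtr approach. Everything named "widget" is invented for
illustration; rffi.COpaquePtr, rffi.llexternal and lltype are the real
APIs (the compilation_info pointing at the C header is omitted). Note
that if the library has no create/destroy pair, the struct's size has to
come from somewhere else -- e.g. a sizeof probed via rffi_platform --
before a raw malloc is possible:

    from rpython.rtyper.lltypesystem import rffi, lltype

    # A pointer to a struct whose layout we never describe:
    WIDGETP = rffi.COpaquePtr("widget_t")

    # Let the library itself allocate and free the blob, since only it
    # knows the size ("make_widget"/"free_widget" are hypothetical):
    make_widget = rffi.llexternal("make_widget", [], WIDGETP)
    free_widget = rffi.llexternal("free_widget", [WIDGETP], lltype.Void)

    def use_widget():
        w = make_widget()
        try:
            pass  # ... hand w to other library calls ...
        finally:
            free_widget(w)   # release through the library, not lltype.free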
From amauryfa at gmail.com  Wed Nov  5 09:27:36 2014
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Wed, 5 Nov 2014 09:27:36 +0100
Subject: [pypy-dev] Segfault in Hy for PyPy 2.4
In-Reply-To: 
References: 
Message-ID: 

2014-11-05 1:57 GMT+01:00 Ryan Gonzalez :

> I just built the PyPy alpha yesterday. I can run the tests using nose for
> the Hy project under PyPy 2.5 alpha and PyPy 2.3. However, using the
> prebuilt PyPy 2.4 binaries fails with a segfault. Why does this happen?

Probably a bug.
Please file an issue here: https://bitbucket.org/pypy/pypy/issues
Also, try to come up with a smaller test; it will be much faster for us
to fix it.

-- 
Amaury Forgeot d'Arc
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rymg19 at gmail.com  Wed Nov  5 19:07:58 2014
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Wed, 5 Nov 2014 12:07:58 -0600
Subject: [pypy-dev] Segfault in Hy for PyPy 2.4
In-Reply-To: 
References: 
Message-ID: 

That's the weird issue: it's already fixed! A version compiled from tip
works, but the prebuilt 2.4 binaries are the ones that crash.

On Wed, Nov 5, 2014 at 2:27 AM, Amaury Forgeot d'Arc wrote:

> 2014-11-05 1:57 GMT+01:00 Ryan Gonzalez :
>
>> I just built the PyPy alpha yesterday. I can run the tests using nose for
>> the Hy project under PyPy 2.5 alpha and PyPy 2.3. However, using the
>> prebuilt PyPy 2.4 binaries fails with a segfault. Why does this happen?
>>
> Probably a bug.
> Please file an issue here: https://bitbucket.org/pypy/pypy/issues
> Also, try to come up with a smaller test; it will be much faster for us
> to fix it.
>
> --
> Amaury Forgeot d'Arc

-- 
Ryan
If anybody ever asks me why I prefer C++ to C, my answer will be simple:
"It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was
nul-terminated."
Personal reality distortion fields are immune to contradictory evidence. -
srean
Check out my website: http://kirbyfan64.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From toni.mattis at student.hpi.uni-potsdam.de  Wed Nov  5 19:07:54 2014
From: toni.mattis at student.hpi.uni-potsdam.de (Toni Mattis)
Date: Wed, 5 Nov 2014 19:07:54 +0100
Subject: [pypy-dev] Optimize constant float division by multiplication?
Message-ID: <545A677A.3060204@student.hpi.uni-potsdam.de>

Hello,

I discovered that PyPy's JIT generates "DIVSD" instructions on xmm
registers when dividing a float by a constant C. This consumes an order
of magnitude more CPU cycles than the corresponding "MULSD" instruction
with a precomputed 1/C.

I know that only powers of two have an exact reciprocal floating point
representation, but there might be a benefit in trading the least
significant digit for a more significant speedup.

So, is this a missed optimization (at least for reasonably accurate
cases), a present or possibly future option (like -ffast-math in gcc) or
are there more reasons against it?


Thanks,

Toni


--- PS: Small Example ---

This function takes on average 0.41 seconds to compute on an
array.array('d') with 10**8 elements between 0 and 1:

def spikes_div(data, threshold=1.99):
    count = 0
    for i in data:
        if i / 0.5 > threshold:
            count += 1
    return count

Rewritten with a multiplication it takes about 0.29 seconds on average,
speeding it up by factor 1.4:

    ...
    if i * 2.0 > threshold:
    ...


The traces contain the same instructions (except for the MULSD/DIVSD)
and run the same number of times. I'm working with a fresh translation
of the current PyPy default on Ubuntu 14.04 x64 with a 2nd generation
Core i7 CPU.


From alex.gaynor at gmail.com  Wed Nov  5 23:02:01 2014
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Wed, 05 Nov 2014 22:02:01 +0000
Subject: [pypy-dev] Optimize constant float division by multiplication?
References: <545A677A.3060204@student.hpi.uni-potsdam.de>
Message-ID: 

Hey Toni,

If this optimization is valid for any float, we should definitely do it,
and this is a missed optimization. If it's not valid for all floats, I'm
not sure how we should handle it, if at all.
Alex On Wed Nov 05 2014 at 10:16:36 AM Toni Mattis < toni.mattis at student.hpi.uni-potsdam.de> wrote: > Hello, > > I discovered that PyPy's JIT generates "DIVSD" instructions on xmm > registers when dividing a float by a constant C. This consumes an order > of magnitude more CPU cycles than the corresponding "MULSD" instruction > with a precomputed 1/C. > > I know that only powers of two have an exact reciprocal floating point > representation, but there might be a benefit in trading the least > significant digit for a more significant speedup. > > So, is this a missed optimization (at least for reasonably accurate > cases), a present or possibly future option (like -ffast-math in gcc) or > are there more reasons against it? > > > Thanks, > > Toni > > > --- PS: Small Example --- > > This function takes on average 0.41 seconds to compute on an > array.array('d') with 10**8 elements between 0 and 1: > > def spikes_div(data, threshold=1.99): > count = 0 > for i in data: > if i / 0.5 > threshold: > count += 1 > return count > > Rewritten with a multiplication it takes about 0.29 seconds on average, > speeding it up by factor 1.4: > > ... > if i * 2.0 > threshold: > ... > > > The traces contain the same instructions (except for the MULSD/DIVSD) > and run the same number of times. I'm working with a fresh translation > of the current PyPy default on Ubuntu 14.04 x64 with a 2nd generation > Core i7 CPU. > > > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > https://mail.python.org/mailman/listinfo/pypy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Nov 6 02:48:43 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 6 Nov 2014 12:48:43 +1100 Subject: [pypy-dev] Optimize constant float division by multiplication? In-Reply-To: References: <545A677A.3060204@student.hpi.uni-potsdam.de> Message-ID: <20141106014843.GB2002@ando.pearwood.info> On Wed, Nov 05, 2014 at 10:02:01PM +0000, Alex Gaynor wrote: > Hey Toni, > > If this optimization is valid for any float, we should definitely do it, > and this is a missed optimization. If it's not valid for all floats, I'm > not sure how we should handle it, if at all. I don't believe that it is valid for floats apart from exact powers of two. Toni says: > > only powers of two have an exact reciprocal floating point > > representation, but there might be a benefit in trading the least > > significant digit for a more significant speedup. Please don't make that decision for the user. If I want to trade off accuracy for speed, I can write: r = 1/x y*r but if I write y/x, I expect y/x to the full accuracy available. Thanks, Steve > > Alex > > On Wed Nov 05 2014 at 10:16:36 AM Toni Mattis < > toni.mattis at student.hpi.uni-potsdam.de> wrote: > > > Hello, > > > > I discovered that PyPy's JIT generates "DIVSD" instructions on xmm > > registers when dividing a float by a constant C. This consumes an order > > of magnitude more CPU cycles than the corresponding "MULSD" instruction > > with a precomputed 1/C. > > > > I know that only powers of two have an exact reciprocal floating point > > representation, but there might be a benefit in trading the least > > significant digit for a more significant speedup. > > > > So, is this a missed optimization (at least for reasonably accurate > > cases), a present or possibly future option (like -ffast-math in gcc) or > > are there more reasons against it? 
>> >
>> > Thanks,
>> >
>> > Toni
>> >
>> >
>> > --- PS: Small Example ---
>> >
>> > This function takes on average 0.41 seconds to compute on an
>> > array.array('d') with 10**8 elements between 0 and 1:
>> >
>> > def spikes_div(data, threshold=1.99):
>> >     count = 0
>> >     for i in data:
>> >         if i / 0.5 > threshold:
>> >             count += 1
>> >     return count
>> >
>> > Rewritten with a multiplication it takes about 0.29 seconds on average,
>> > speeding it up by factor 1.4:
>> >
>> >     ...
>> >     if i * 2.0 > threshold:
>> >     ...
>> >
>> > The traces contain the same instructions (except for the MULSD/DIVSD)
>> > and run the same number of times. I'm working with a fresh translation
>> > of the current PyPy default on Ubuntu 14.04 x64 with a 2nd generation
>> > Core i7 CPU.
>> >
>> > _______________________________________________
>> > pypy-dev mailing list
>> > pypy-dev at python.org
>> > https://mail.python.org/mailman/listinfo/pypy-dev
>> >
>>
>> _______________________________________________
>> pypy-dev mailing list
>> pypy-dev at python.org
>> https://mail.python.org/mailman/listinfo/pypy-dev

From fijall at gmail.com  Thu Nov  6 07:12:45 2014
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 6 Nov 2014 08:12:45 +0200
Subject: [pypy-dev] Optimize constant float division by multiplication?
In-Reply-To: <20141106014843.GB2002@ando.pearwood.info>
References: <545A677A.3060204@student.hpi.uni-potsdam.de>
	<20141106014843.GB2002@ando.pearwood.info>
Message-ID: 

One bad thing about making such decisions (assuming we can lose a bit
of precision, which I'm not convinced about) is that you would get
different results when the code is jitted vs when the code is not
jitted. I think this is not acceptable.

On Thu, Nov 6, 2014 at 3:48 AM, Steven D'Aprano wrote:
> On Wed, Nov 05, 2014 at 10:02:01PM +0000, Alex Gaynor wrote:
>> Hey Toni,
>>
>> If this optimization is valid for any float, we should definitely do it,
>> and this is a missed optimization. If it's not valid for all floats, I'm
>> not sure how we should handle it, if at all.
>
> I don't believe that it is valid for floats apart from exact powers of
> two. Toni says:
>
>> > only powers of two have an exact reciprocal floating point
>> > representation, but there might be a benefit in trading the least
>> > significant digit for a more significant speedup.
>
> Please don't make that decision for the user. If I want to trade off
> accuracy for speed, I can write:
>
> r = 1/x
> y*r
>
> but if I write y/x, I expect y/x to the full accuracy available.
>
>
> Thanks,
>
>
> Steve
>
>
>>
>> Alex
>>
>> On Wed Nov 05 2014 at 10:16:36 AM Toni Mattis <
>> toni.mattis at student.hpi.uni-potsdam.de> wrote:
>>
>> > Hello,
>> >
>> > I discovered that PyPy's JIT generates "DIVSD" instructions on xmm
>> > registers when dividing a float by a constant C. This consumes an order
>> > of magnitude more CPU cycles than the corresponding "MULSD" instruction
>> > with a precomputed 1/C.
>> >
>> > I know that only powers of two have an exact reciprocal floating point
>> > representation, but there might be a benefit in trading the least
>> > significant digit for a more significant speedup.
>> >
>> > So, is this a missed optimization (at least for reasonably accurate
>> > cases), a present or possibly future option (like -ffast-math in gcc) or
>> > are there more reasons against it?
>> >
>> > Thanks,
>> >
>> > Toni
>> >
>> >
>> > --- PS: Small Example ---
>> >
>> > This function takes on average 0.41 seconds to compute on an
>> > array.array('d') with 10**8 elements between 0 and 1:
>> >
>> > def spikes_div(data, threshold=1.99):
>> >     count = 0
>> >     for i in data:
>> >         if i / 0.5 > threshold:
>> >             count += 1
>> >     return count
>> >
>> > Rewritten with a multiplication it takes about 0.29 seconds on average,
>> > speeding it up by factor 1.4:
>> >
>> >     ...
>> >     if i * 2.0 > threshold:
>> >     ...
>> >
>> > The traces contain the same instructions (except for the MULSD/DIVSD)
>> > and run the same number of times. I'm working with a fresh translation
>> > of the current PyPy default on Ubuntu 14.04 x64 with a 2nd generation
>> > Core i7 CPU.
>> >
>> > _______________________________________________
>> > pypy-dev mailing list
>> > pypy-dev at python.org
>> > https://mail.python.org/mailman/listinfo/pypy-dev
>> >
>>
>> _______________________________________________
>> pypy-dev mailing list
>> pypy-dev at python.org
>> https://mail.python.org/mailman/listinfo/pypy-dev
>
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> https://mail.python.org/mailman/listinfo/pypy-dev

From arigo at tunes.org  Thu Nov  6 10:00:04 2014
From: arigo at tunes.org (Armin Rigo)
Date: Thu, 6 Nov 2014 10:00:04 +0100
Subject: [pypy-dev] Optimize constant float division by multiplication?
In-Reply-To: 
References: <545A677A.3060204@student.hpi.uni-potsdam.de>
Message-ID: 

Hi,

On 5 November 2014 23:02, Alex Gaynor wrote:
> If this optimization is valid for any float, we should definitely do it, and
> this is a missed optimization. If it's not valid for all floats, I'm not
> sure how we should handle it, if at all.

gcc seems to perform this optimization for divide-by-constant where
the constant is exactly a finite power of two that is not a denormal.
These are the cases where the result is exactly the same. We could do
it too.

A bientôt,

Armin.

From arigo at tunes.org  Thu Nov  6 10:07:45 2014
From: arigo at tunes.org (Armin Rigo)
Date: Thu, 6 Nov 2014 10:07:45 +0100
Subject: [pypy-dev] Optimize constant float division by multiplication?
In-Reply-To: 
References: <545A677A.3060204@student.hpi.uni-potsdam.de>
Message-ID: 

Hi Toni,

On 6 November 2014 10:00, Armin Rigo wrote:
> gcc seems to perform this optimization for divide-by-constant where
> the constant is exactly a finite power of two that is not a denormal.
> These are the cases where the result is exactly the same. We could do
> it too.

In short, what is needed is:

- first check that the optimization you want to do is exact; trying it
out on "gcc -O2 -S" without any "-ffast-math" flags is a good way to
know.

- if it is, then it's a matter of writing some simple code in
rpython/jit/metainterp/optimizeopt/rewrite.py. Search for "float_mul"
here; it will turn for example "f0 * -1.0" into a "float_neg"
operation, with the comment that it is an exact optimization.

- don't forget, start by adding a test to test/test_optimizebasic.py
(search for "float_mul(-1.0, f0)" and add it nearby).

You might find out that hacking the PyPy JIT at this level is rather easy :-)

A bientôt,

Armin.

From toni.mattis at student.hpi.uni-potsdam.de  Thu Nov  6 18:29:21 2014
From: toni.mattis at student.hpi.uni-potsdam.de (Toni Mattis)
Date: Thu, 6 Nov 2014 18:29:21 +0100
Subject: [pypy-dev] Optimize constant float division by multiplication?
In-Reply-To: 
References: <545A677A.3060204@student.hpi.uni-potsdam.de>
Message-ID: <545BAFF1.6090404@student.hpi.uni-potsdam.de>

Hi all,

thanks for the advice. I tried what Armin proposed and would like to
share my results with you:

https://bitbucket.org/amintos/pypy/commits/937254cbc554adfb748e3b5eeb44bf765d204b9d?at=default

Keeping in mind what Steve and Maciej pointed out, I restricted the
optimization to floats that are "normal" powers of two. I thought about
checking for infinity, but I could not come up with a scenario where
'x * (-)0.0' differs from 'x / (-)inf'.

I haven't done exhaustive tests yet, but some of the code where I first
discovered the issue runs a little faster now.

Comments and possibly missed corner cases are welcome (IEEE-754 can be
a minefield sometimes).

Thanks,

Toni

On 06.11.2014 10:07, Armin Rigo wrote:
> Hi Toni,
>
> On 6 November 2014 10:00, Armin Rigo wrote:
>> gcc seems to perform this optimization for divide-by-constant where
>> the constant is exactly a finite power of two that is not a denormal.
>> These are the cases where the result is exactly the same. We could do
>> it too.
>
> In short, what is needed is:
>
> - first check that the optimization you want to do is exact; trying it
> out on "gcc -O2 -S" without any "-ffast-math" flags is a good way to
> know.
>
> - if it is, then it's a matter of writing some simple code in
> rpython/jit/metainterp/optimizeopt/rewrite.py. Search for "float_mul"
> here; it will turn for example "f0 * -1.0" into a "float_neg"
> operation, with the comment that it is an exact optimization.
>
> - don't forget, start by adding a test to test/test_optimizebasic.py
> (search for "float_mul(-1.0, f0)" and add it nearby).
>
> You might find out that hacking the PyPy JIT at this level is rather easy :-)
>
>
> A bientôt,
>
> Armin.
>

From arigo at tunes.org  Thu Nov  6 19:08:22 2014
From: arigo at tunes.org (Armin Rigo)
Date: Thu, 6 Nov 2014 19:08:22 +0100
Subject: [pypy-dev] Optimize constant float division by multiplication?
In-Reply-To: <545BAFF1.6090404@student.hpi.uni-potsdam.de>
References: <545A677A.3060204@student.hpi.uni-potsdam.de>
	<545BAFF1.6090404@student.hpi.uni-potsdam.de>
Message-ID: 

Hi Toni,

On 6 November 2014 18:29, Toni Mattis wrote:
> thanks for the advice. I tried what Armin proposed and would like to
> share my results with you:
>
> https://bitbucket.org/amintos/pypy/commits/937254cbc554adfb748e3b5eeb44bf765d204b9d?at=default

Thanks! Maybe instead of manipulating directly the bits (where you
have to be extra careful because on 32-bit platforms, regular RPython
integers have only 32 bits), you could use math.frexp(). The
condition should be "math.frexp(x)[0] == 0.5" or "-0.5". You can then
check for denormals by checking that "math.frexp(1.0 / x)[0]" is also
0.5 or -0.5.

I think that "x / inf" is always equal to "x * 0.0" (which can be
"0.0", "-0.0", or "nan", so it can't be simplified further), but it
looks like a useless optimization imho.

A bientôt,

Armin.
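Expressed as plain Python, the condition Armin describes comes out as
something like the following sketch (the function name is mine, not
PyPy's; only math.frexp/math.ldexp are real APIs):

    import math

    def divides_exactly(x):
        # True if x is a "normal" power of two whose reciprocal is also
        # exact, so that y / x == y * (1.0 / x) for every float y.
        if x == 0.0 or math.isinf(x) or math.isnan(x):
            return False
        if abs(math.frexp(x)[0]) != 0.5:          # mantissa must be exactly 0.5
            return False
        return abs(math.frexp(1.0 / x)[0]) == 0.5  # reciprocal as well

    # Quick empirical check of the exactness claim:
    import random
    for _ in range(100000):
        x = math.ldexp(1.0, random.randint(-1000, 1000))
        y = random.uniform(-1e18, 1e18)
        assert divides_exactly(x)
        assert y / x == y * (1.0 / x)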
From toni.mattis at student.hpi.uni-potsdam.de  Thu Nov  6 23:19:56 2014
From: toni.mattis at student.hpi.uni-potsdam.de (Toni Mattis)
Date: Thu, 6 Nov 2014 23:19:56 +0100
Subject: [pypy-dev] Optimize constant float division by multiplication?
In-Reply-To: 
References: <545A677A.3060204@student.hpi.uni-potsdam.de>
	<545BAFF1.6090404@student.hpi.uni-potsdam.de>
Message-ID: <545BF40C.7090906@student.hpi.uni-potsdam.de>

Hi Armin,

that sounds more convenient than manipulating floats as
architecture-dependent integers ;) So this is my implementation now:

https://bitbucket.org/amintos/pypy/commits/f30efb9a8e54e56af7e7a0d07ec19d6985c1f4e0?at=float-opt

BTW, the jit log (PYPYLOG=jit-log-opt,jit-backend) tends to round small
floats like '1.0 / 8.98846567431158e+307' to '0.000000', but
'1.0 / 2.2250738585072014e-308' appears as a full 308-digit decimal
number. This may cause some confusion when checking where the
optimization is effective.

Best regards,

Toni

On 06.11.2014 19:08, Armin Rigo wrote:
> Hi Toni,
>
> Thanks! Maybe instead of manipulating directly the bits (where you
> have to be extra careful because on 32-bit platforms, regular RPython
> integers have only 32 bits), you could use math.frexp(). The
> condition should be "math.frexp(x)[0] == 0.5" or "-0.5". You can then
> check for denormals by checking that "math.frexp(1.0 / x)[0]" is also
> 0.5 or -0.5.
>
> I think that "x / inf" is always equal to "x * 0.0" (which can be
> "0.0", "-0.0", or "nan", so it can't be simplified further), but it
> looks like a useless optimization imho.
>
>
> A bientôt,
>
> Armin.
>

From arigo at tunes.org  Fri Nov  7 10:44:52 2014
From: arigo at tunes.org (Armin Rigo)
Date: Fri, 7 Nov 2014 10:44:52 +0100
Subject: [pypy-dev] Optimize constant float division by multiplication?
In-Reply-To: <545BF40C.7090906@student.hpi.uni-potsdam.de>
References: <545A677A.3060204@student.hpi.uni-potsdam.de>
	<545BAFF1.6090404@student.hpi.uni-potsdam.de>
	<545BF40C.7090906@student.hpi.uni-potsdam.de>
Message-ID: 

Hi Toni,

On 6 November 2014 23:19, Toni Mattis wrote:
> that sounds more convenient than manipulating floats as
> architecture-dependent integers ;) So this is my implementation now:
>
> https://bitbucket.org/amintos/pypy/commits/f30efb9a8e54e56af7e7a0d07ec19d6985c1f4e0?at=float-opt

Thanks! Do you mind if I merge this branch 'float-ops' into the
standard repo? Or are you a person who wouldn't like to see the first
attempt show up in the history? (Generally, histories are good to
have, and PyPy's contains tons of half-way or reverted checkins.)

> BTW, the jit log (PYPYLOG=jit-log-opt,jit-backend) tends to round small
> floats like '1.0 / 8.98846567431158e+307' to '0.000000', but
> '1.0 / 2.2250738585072014e-308' appears as a full 308-digit decimal
> number. This may cause some confusion when checking where the
> optimization is effective.

Ah, yes. Maybe it would be better if the jit log showed floats with
full precision. I think it's caused by `str(arg.getfloat())` in
metainterp/logger.py. We could use instead
`rfloat.double_to_string(x, 'r', 0, 0)[0]`.

A bientôt,

Armin.

From arigo at tunes.org  Fri Nov  7 10:48:00 2014
From: arigo at tunes.org (Armin Rigo)
Date: Fri, 7 Nov 2014 10:48:00 +0100
Subject: [pypy-dev] Optimize constant float division by multiplication?
In-Reply-To: 
References: <545A677A.3060204@student.hpi.uni-potsdam.de>
	<545BAFF1.6090404@student.hpi.uni-potsdam.de>
	<545BF40C.7090906@student.hpi.uni-potsdam.de>
Message-ID: 

Hi,

On 7 November 2014 10:44, Armin Rigo wrote:
> `rfloat.double_to_string(x, 'r', 0, 0)[0]`.

The last argument should be `rfloat.DTSF_ADD_DOT_0`, otherwise a
number like 3.0 will be confusingly represented as "3".

A bientôt,

Armin.

From toni.mattis at student.hpi.uni-potsdam.de  Fri Nov  7 13:05:37 2014
From: toni.mattis at student.hpi.uni-potsdam.de (Toni Mattis)
Date: Fri, 7 Nov 2014 13:05:37 +0100
Subject: [pypy-dev] Optimize constant float division by multiplication?
In-Reply-To: 
References: <545A677A.3060204@student.hpi.uni-potsdam.de>
	<545BAFF1.6090404@student.hpi.uni-potsdam.de>
	<545BF40C.7090906@student.hpi.uni-potsdam.de>
Message-ID: <545CB591.3050401@student.hpi.uni-potsdam.de>

Hi Armin,

> Do you mind if I merge this branch 'float-ops' into the
> standard repo?

I'm totally fine with a merge.

Best regards,

Toni

From arigo at tunes.org  Sun Nov  9 09:52:48 2014
From: arigo at tunes.org (Armin Rigo)
Date: Sun, 9 Nov 2014 09:52:48 +0100
Subject: [pypy-dev] Segfault in Hy for PyPy 2.4
In-Reply-To: 
References: 
Message-ID: 

Hi Ryan,

On 5 November 2014 19:07, Ryan Gonzalez wrote:
> That's the weird issue: it's already fixed! A version compiled from tip
> works, but the prebuilt 2.4 binaries are the ones that crash.

I ran it up to the crash, which occurs somewhere in the compiler. My
guess is that 27aa8184f00f fixed it.

A bientôt,

Armin.

From rymg19 at gmail.com  Sun Nov  9 15:23:22 2014
From: rymg19 at gmail.com (Ryan)
Date: Sun, 09 Nov 2014 08:23:22 -0600
Subject: [pypy-dev] Segfault in Hy for PyPy 2.4
In-Reply-To: 
References: 
Message-ID: 

Thanks! I'll have to wait for the next release so Travis can use a correct
PyPy (right now, I had to hack stuff to download a custom PyPy nightly for
running the tests).

Armin Rigo wrote:
>Hi Ryan,
>
>On 5 November 2014 19:07, Ryan Gonzalez wrote:
>> That's the weird issue: it's already fixed! A version compiled from
>tip
>> works, but the prebuilt 2.4 binaries are the ones that crash.
>
>I ran it up to the crash, which occurs somewhere in the compiler. My
>guess is that 27aa8184f00f fixed it.
>
>
>A bientôt,
>
>Armin.

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Check out my website: http://kirbyfan64.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From n210241048576 at gmail.com  Sun Nov  9 23:04:46 2014
From: n210241048576 at gmail.com (Robert Grosse)
Date: Sun, 9 Nov 2014 14:04:46 -0800
Subject: [pypy-dev] How to profile in Pypy?
Message-ID: 

Is there any easy way to profile code under Pypy? When I try to run
using the usual pypy -m cProfile foo.py, I get nonsensical results,
like negative time or billions of seconds.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From n210241048576 at gmail.com Sun Nov 9 23:41:58 2014 From: n210241048576 at gmail.com (Robert Grosse) Date: Sun, 9 Nov 2014 14:41:58 -0800 Subject: [pypy-dev] Poor performance for Krakatau Message-ID: In some cases, Pypy performs worse than CPython when running Krakatau, and it has gotten a lot worse from 2.3 to 2.5 My benchmark is https://github.com/Storyyeller/Krakatau/tree/pypy_benchmark (for future reference, commit d889c7f44723e6d66a3630681f0385f173317dc9) You can run it with pypy Krakatau\benchmark.py -path I get the following timings CPython: 17.43s Pypy 2.3: 38.48s (build pypy-c-jit-71056-c8e3b8cbc843-win32) Pypy 2.4: 39.92s (build pypy-c-jit-72200-375133966c12-win32) Pypy 2.5: 49.94s (build pypy-c-jit-74404-9fd586fe0fe5-win32) Not only was it slower than CPython to begin with, but it got much worse recently. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tbaldridge at gmail.com Mon Nov 10 05:36:46 2014 From: tbaldridge at gmail.com (Timothy Baldridge) Date: Sun, 9 Nov 2014 21:36:46 -0700 Subject: [pypy-dev] Crashes with rffi Message-ID: So I'm trying to integrate libuv into Pixie, and all was going well, but I'm hitting a bit of a snag when it comes to opening files. I've tried several things, but my latest iteration is this: https://github.com/pixie-lang/pixie/blob/async-io-file/pixie/vm/libs/uv_file.py#L104 When running this in CPython it crashes on the call to uv_fs_read with the CPython error "Fatal Python error: GC object already tracked". From what I can tell, this means something has overwritten Python's memory somehow. The odd thing is, I'm sure I have the signature of the file correct: UV_EXTERN int uv_fs_read(uv_loop_t* loop, uv_fs_t* req, uv_file file, void* buf, size_t length, int64_t offset, uv_fs_cb cb); And this doesn't seem to be a problem from libuv, because this is my stacktrace from the crash: Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 libsystem_kernel.dylib 0x00007fff8e83c866 __pthread_kill + 10 1 libsystem_pthread.dylib 0x00007fff88f2835c pthread_kill + 92 2 libsystem_c.dylib 0x00007fff9019cb1a abort + 125 3 org.python.python 0x0000000100c73ec1 Py_FatalError + 49 4 org.python.python 0x0000000100bfc523 PyFrame_New + 598 5 org.python.python 0x0000000100c55a74 PyEval_EvalCodeEx + 74 6 org.python.python 0x0000000100bfd796 0x100bd6000 + 161686 7 org.python.python 0x0000000100bdff72 PyObject_Call + 101 8 org.python.python 0x0000000100bea9a7 0x100bd6000 + 84391 9 org.python.python 0x0000000100bdff72 PyObject_Call + 101 10 org.python.python 0x0000000100c281b4 0x100bd6000 + 336308 11 org.python.python 0x0000000100c25091 0x100bd6000 + 323729 12 org.python.python 0x0000000100c0f308 PyObject_RichCompare + 129 13 org.python.python 0x0000000100c5694e PyEval_EvalFrameEx + 1937 14 org.python.python 0x0000000100c5c864 0x100bd6000 + 551012 15 org.python.python 0x0000000100c594d4 PyEval_EvalFrameEx + 13079 16 org.python.python 0x0000000100c56093 PyEval_EvalCodeEx + 1641 17 org.python.python 0x0000000100bfd796 0x100bd6000 + 161686 Anyone have any ideas? Thanks, Timothy -------------- next part -------------- An HTML attachment was scrubbed... URL: From fijall at gmail.com Mon Nov 10 06:27:26 2014 From: fijall at gmail.com (Maciej Fijalkowski) Date: Mon, 10 Nov 2014 07:27:26 +0200 Subject: [pypy-dev] Poor performance for Krakatau In-Reply-To: References: Message-ID: Hi Robert. I've been looking at krakatau performance for a while, it's almost exclusively warmup time. 
We are going to address it, I hope rather sooner than later :-)

On Mon, Nov 10, 2014 at 12:41 AM, Robert Grosse wrote:
> In some cases, Pypy performs worse than CPython when running Krakatau, and
> it has gotten a lot worse from 2.3 to 2.5
>
> My benchmark is https://github.com/Storyyeller/Krakatau/tree/pypy_benchmark
> (for future reference, commit d889c7f44723e6d66a3630681f0385f173317dc9)
>
> You can run it with pypy Krakatau\benchmark.py -path 
>
> I get the following timings
>
> CPython: 17.43s
> Pypy 2.3: 38.48s (build pypy-c-jit-71056-c8e3b8cbc843-win32)
> Pypy 2.4: 39.92s (build pypy-c-jit-72200-375133966c12-win32)
> Pypy 2.5: 49.94s (build pypy-c-jit-74404-9fd586fe0fe5-win32)
>
> Not only was it slower than CPython to begin with, but it got much worse
> recently.
>
>
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> https://mail.python.org/mailman/listinfo/pypy-dev
>

From fijall at gmail.com  Mon Nov 10 06:49:50 2014
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Mon, 10 Nov 2014 07:49:50 +0200
Subject: [pypy-dev] Crashes with rffi
In-Reply-To: 
References: 
Message-ID: 

You should really use a "with" statement with scoped buffers, otherwise
you're leaking memory. Same goes for lltype.malloc(flavor='raw'): you
need to lltype.free it. I don't think that causes crashes though; I
would need to take a deeper look into the incantation too. How do I get
a crash? Is there a failing test?

On Mon, Nov 10, 2014 at 6:36 AM, Timothy Baldridge wrote:
> So I'm trying to integrate libuv into Pixie, and all was going well, but I'm
> hitting a bit of a snag when it comes to opening files. I've tried several
> things, but my latest iteration is this:
>
> https://github.com/pixie-lang/pixie/blob/async-io-file/pixie/vm/libs/uv_file.py#L104
>
> When running this in CPython it crashes on the call to uv_fs_read with the
> CPython error "Fatal Python error: GC object already tracked". From what I
> can tell, this means something has overwritten Python's memory somehow. The
> odd thing is, I'm sure I have the signature of the file correct:
>
>     UV_EXTERN int uv_fs_read(uv_loop_t* loop, uv_fs_t* req, uv_file file,
>                              void* buf, size_t length, int64_t offset,
>                              uv_fs_cb cb);
>
> And this doesn't seem to be a problem from libuv, because this is my
> stacktrace from the crash:
>
> Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
> 0   libsystem_kernel.dylib   0x00007fff8e83c866 __pthread_kill + 10
> 1   libsystem_pthread.dylib  0x00007fff88f2835c pthread_kill + 92
> 2   libsystem_c.dylib        0x00007fff9019cb1a abort + 125
> 3   org.python.python        0x0000000100c73ec1 Py_FatalError + 49
> 4   org.python.python        0x0000000100bfc523 PyFrame_New + 598
> 5   org.python.python        0x0000000100c55a74 PyEval_EvalCodeEx + 74
> 6   org.python.python        0x0000000100bfd796 0x100bd6000 + 161686
> 7   org.python.python        0x0000000100bdff72 PyObject_Call + 101
> 8   org.python.python        0x0000000100bea9a7 0x100bd6000 + 84391
> 9   org.python.python        0x0000000100bdff72 PyObject_Call + 101
> 10  org.python.python        0x0000000100c281b4 0x100bd6000 + 336308
> 11  org.python.python        0x0000000100c25091 0x100bd6000 + 323729
> 12  org.python.python        0x0000000100c0f308 PyObject_RichCompare + 129
> 13  org.python.python        0x0000000100c5694e PyEval_EvalFrameEx + 1937
> 14  org.python.python        0x0000000100c5c864 0x100bd6000 + 551012
> 15  org.python.python        0x0000000100c594d4 PyEval_EvalFrameEx + 13079
> 16  org.python.python        0x0000000100c56093 PyEval_EvalCodeEx + 1641
> 17  org.python.python        0x0000000100bfd796 0x100bd6000 + 161686
>
> Anyone have any ideas?
>
> Thanks,
>
> Timothy
>
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> https://mail.python.org/mailman/listinfo/pypy-dev
>

From arigo at tunes.org  Mon Nov 10 08:38:48 2014
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 10 Nov 2014 08:38:48 +0100
Subject: [pypy-dev] Crashes with rffi
In-Reply-To: 
References: 
Message-ID: 

Hi Timothy,

From the docs, the signature of uv_fs_read() is this
(http://docs.libuv.org/en/latest/fs.html#c.uv_fs_read):

int uv_fs_read(uv_loop_t* loop, uv_fs_t* req, uv_file file, const
uv_buf_t bufs[], unsigned int nbufs, int64_t offset, uv_fs_cb cb)

This seems to differ from what you're reporting. The "bufs" and
"nbufs" are not a pointer and size of a resulting buffer, but instead
they are an array of buffers (possibly more than one), and the length
of this array (i.e. the number of buffers). Each buffer is described
by a uv_buf_t. This is more like preadv(2) than read(2).

A bientôt,

Armin.
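Maciej's earlier point about raw allocations and Armin's point about the
bufs[] array both come down to explicit memory management in RPython.
A minimal sketch of the allocation discipline, using only the documented
rffi/lltype APIs (the one-element uv_buf_t array would be allocated the
same way, as a raw array of one struct, with its layout probed via
rffi_platform):

    from rpython.rtyper.lltypesystem import rffi, lltype

    # A raw allocation is invisible to the GC and must be freed by hand:
    buf = lltype.malloc(rffi.CCHARP.TO, 1024, flavor='raw')
    try:
        pass  # ... pass buf to C calls here ...
    finally:
        lltype.free(buf, flavor='raw')

    # Or let a scoped buffer free itself when the block exits:
    with rffi.scoped_alloc_buffer(1024) as scoped:
        pass  # ... use scoped.raw (a char*) only inside this block ...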
From arigo at tunes.org  Mon Nov 10 10:23:19 2014
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 10 Nov 2014 10:23:19 +0100
Subject: [pypy-dev] How to profile in Pypy?
In-Reply-To: 
References: 
Message-ID: 

Hi Robert,

On 9 November 2014 23:04, Robert Grosse wrote:
> Is there any easy way to profile code under Pypy? When I try to run using
> the usual pypy -m cProfile foo.py, I get nonsensical results, like negative
> time or billions of seconds.

That's a bug. Can you open a bug report detailing on which platform
you are, what number of cores you have, and if you are running inside
a virtual machine or on real hardware? (There might be a bug report
already about Windows.)

Armin

From tbaldridge at gmail.com  Mon Nov 10 13:41:06 2014
From: tbaldridge at gmail.com (Timothy Baldridge)
Date: Mon, 10 Nov 2014 05:41:06 -0700
Subject: [pypy-dev] Crashes with rffi
In-Reply-To: 
References: 
Message-ID: 

So I think I've narrowed it down a bit to this: it seems to only happen
when I call one of these libuv functions for the second time. The second
test in this file throws the exception:

https://github.com/pixie-lang/pixie/blob/async-io-file/pixie/vm/libs/test/test_uv_file.py#L20

All I'm doing here is opening a file (and never closing it, but that
shouldn't be a problem). The code in FSOpen is even cleaning up the
buffers I allocate this time.

A bit of background on the process: FSOpen is a UVFunction. The function
execute_uv_func creates a continuation from the current stacklet and
then calls .execute_uv, passing it the uv loop and the continuation (k).
Once libuv calls the callback fs_cb, the continuation is put into a list
of pending stacklets and the loop inside with_stacklets will continue
its execution at some time in the future.

The only libuv functions called by this code are uv_fs_open,
uv_fs_cleanup, and uv_run. The first test runs fine, but the second
causes the crash.

Timothy

On Mon, Nov 10, 2014 at 12:38 AM, Armin Rigo wrote:
> Hi Timothy,
>
> From the docs, the signature of uv_fs_read() is this
> (http://docs.libuv.org/en/latest/fs.html#c.uv_fs_read):
>
> int uv_fs_read(uv_loop_t* loop, uv_fs_t* req, uv_file file, const
> uv_buf_t bufs[], unsigned int nbufs, int64_t offset, uv_fs_cb cb)
>
> This seems to differ from what you're reporting. The "bufs" and
> "nbufs" are not a pointer and size of a resulting buffer, but instead
> they are an array of buffers (possibly more than one), and the length
> of this array (i.e. the number of buffers). Each buffer is described
> by a uv_buf_t. This is more like preadv(2) than read(2).
>
>
> A bientôt,
>
> Armin.

-- 
"One of the main causes of the fall of the Roman Empire was that --
lacking zero -- they had no way to indicate successful termination of
their C programs."
(Robert Firth)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From arigo at tunes.org  Mon Nov 10 15:52:49 2014
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 10 Nov 2014 15:52:49 +0100
Subject: [pypy-dev] Crashes with rffi
In-Reply-To: 
References: 
Message-ID: 

Hi Timothy,

We're talking past each other. I think that I already found that your
code is not correct according to the docs. You need to fix the
signature of uv_fs_read() and create an array 'uv_buf_t[]', possibly
of length 1. I may be wrong though.

A bientôt,

Armin.

From tbaldridge at gmail.com  Mon Nov 10 15:55:49 2014
From: tbaldridge at gmail.com (Timothy Baldridge)
Date: Mon, 10 Nov 2014 07:55:49 -0700
Subject: [pypy-dev] Crashes with rffi
In-Reply-To: 
References: 
Message-ID: 

That could be true for uv_fs_read, but in the minimal test case (in my
last email) I'm only opening a file 10 times; after the first few
iterations of opening the file (2-3 times) that test crashes.

Timothy

On Mon, Nov 10, 2014 at 7:52 AM, Armin Rigo wrote:
> Hi Timothy,
>
> We're talking past each other. I think that I already found that your
> code is not correct according to the docs. You need to fix the
> signature of uv_fs_read() and create an array 'uv_buf_t[]', possibly
> of length 1. I may be wrong though.
>
>
> A bientôt,
>
> Armin.
>

-- 
"One of the main causes of the fall of the Roman Empire was that --
lacking zero -- they had no way to indicate successful termination of
their C programs."
(Robert Firth)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tbaldridge at gmail.com  Mon Nov 10 21:16:48 2014
From: tbaldridge at gmail.com (Timothy Baldridge)
Date: Mon, 10 Nov 2014 13:16:48 -0700
Subject: [pypy-dev] Crashes with rffi
In-Reply-To: 
References: 
Message-ID: 

So I hacked on this more over my lunch break and am still completely
stumped. I've removed all deallocation in order to make sure I'm not
double-freeing something (make it stop crashing, then make it stop
leaking, is the idea). No dice.

The only thing I can figure out is that perhaps rffi is somehow freeing
something I don't want it to free. Looking at the internals of
rffi.llexternal, I see that it sometimes auto-frees stuff, but the only
things I'm passing in are raw buffers, voidp, integers, or the callback.

Since this is libuv, the library may hold onto stuff like the filename
even after the call to uv_fs_open completes. The idea being it will call
the callback when the operation completes, but only during a call to
uv_run. So all this looks correct to me, but I get the feeling that rffi
must be doing some magic somewhere that I don't want.

Any input would be awesome; I've been hacking on this for about half a
week now and I'm completely stumped.

Timothy

On Mon, Nov 10, 2014 at 7:55 AM, Timothy Baldridge wrote:
> That could be true for uv_fs_read, but in the minimal test case (in my
> last email) I'm only opening a file 10 times; after the first few
> iterations of opening the file (2-3 times) that test crashes.
>
>
> Timothy
>
> On Mon, Nov 10, 2014 at 7:52 AM, Armin Rigo wrote:
>> Hi Timothy,
>>
>> We're talking past each other. I think that I already found that your
>> code is not correct according to the docs. You need to fix the
>> signature of uv_fs_read() and create an array 'uv_buf_t[]', possibly
>> of length 1. I may be wrong though.
>>
>>
>> A bientôt,
>>
>> Armin.
>
> --
> "One of the main causes of the fall of the Roman Empire was that --
> lacking zero -- they had no way to indicate successful termination of
> their C programs."
> (Robert Firth)

-- 
"One of the main causes of the fall of the Roman Empire was that --
lacking zero -- they had no way to indicate successful termination of
their C programs."
(Robert Firth)

From tbaldridge at gmail.com  Tue Nov 11 01:31:24 2014
From: tbaldridge at gmail.com (Timothy Baldridge)
Date: Mon, 10 Nov 2014 17:31:24 -0700
Subject: [pypy-dev] Crashes with rffi
In-Reply-To: 
References: 
Message-ID: 

The plot thickens even more. I tried several other things, and was
surprised to find that a call to uv_timeout works just fine; that call
takes a callback, but no strings, and never stores the strings
internally. On a whim I translated the entire thing (without a JIT) and
everything I throw at it runs without a problem. Could this just be an
issue with ffi and CPython?

Timothy

On Mon, Nov 10, 2014 at 1:16 PM, Timothy Baldridge wrote:
> So I hacked on this more over my lunch break and am still completely
> stumped. I've removed all deallocation in order to make sure I'm not
> double-freeing something (make it stop crashing, then make it stop
> leaking, is the idea). No dice.
>
> The only thing I can figure out is that perhaps rffi is somehow freeing
> something I don't want it to free. Looking at the internals of
> rffi.llexternal, I see that it sometimes auto-frees stuff, but the only
> things I'm passing in are raw buffers, voidp, integers, or the callback.
>
> Since this is libuv, the library may hold onto stuff like the filename
> even after the call to uv_fs_open completes. The idea being it will call
> the callback when the operation completes, but only during a call to
> uv_run. So all this looks correct to me, but I get the feeling that rffi
> must be doing some magic somewhere that I don't want.
>
> Any input would be awesome; I've been hacking on this for about half a
> week now and I'm completely stumped.
>
> Timothy
>
> On Mon, Nov 10, 2014 at 7:55 AM, Timothy Baldridge wrote:
>> That could be true for uv_fs_read, but in the minimal test case (in my
>> last email) I'm only opening a file 10 times; after the first few
>> iterations of opening the file (2-3 times) that test crashes.
>>
>>
>> Timothy
>>
>> On Mon, Nov 10, 2014 at 7:52 AM, Armin Rigo wrote:
>>> Hi Timothy,
>>>
>>> We're talking past each other. I think that I already found that your
>>> code is not correct according to the docs. You need to fix the
>>> signature of uv_fs_read() and create an array 'uv_buf_t[]', possibly
>>> of length 1. I may be wrong though.
>>>
>>>
>>> A bientôt,
>>>
>>> Armin.
>>
>> --
>> "One of the main causes of the fall of the Roman Empire was that --
>> lacking zero -- they had no way to indicate successful termination of
>> their C programs."
>> (Robert Firth)

-- 
"One of the main causes of the fall of the Roman Empire was that --
lacking zero -- they had no way to indicate successful termination of
their C programs."
(Robert Firth)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
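The thread never pins down the untranslated crash, but the setup it keeps
circling around looks roughly like this in rffi terms. A sketch only:
uv_fs_open's C signature is taken from the libuv docs, the VOIDP types
stand in for proper struct pointers, and the compilation_info linking
libuv is omitted; rffi.CCallback and rffi.llexternal are the real APIs.
One relevant difference between the two setups Timothy compares: an
untranslated run emulates all of these calls and callbacks through ctypes
(ll2ctypes), while a translated binary calls C directly, so the two can
behave differently.

    from rpython.rtyper.lltypesystem import rffi, lltype

    # uv_fs_cb is "void (*)(uv_fs_t* req)"; VOIDP stands in for uv_fs_t*.
    UV_FS_CB = rffi.CCallback([rffi.VOIDP], lltype.Void)

    uv_fs_open = rffi.llexternal(
        "uv_fs_open",
        [rffi.VOIDP,    # uv_loop_t* loop
         rffi.VOIDP,    # uv_fs_t* req
         rffi.CCHARP,   # const char* path
         rffi.INT,      # int flags
         rffi.INT,      # int mode
         UV_FS_CB],     # uv_fs_cb cb
        rffi.INT)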
From arigo at tunes.org  Thu Nov 13 11:40:56 2014
From: arigo at tunes.org (Armin Rigo)
Date: Thu, 13 Nov 2014 11:40:56 +0100
Subject: [pypy-dev] Crashes with rffi
In-Reply-To: 
References: 
Message-ID: 

Hi Timothy,

I did a git checkout of the branch async-io-file; then "cd
pixie/pixie/vm/libs/test"; then "touch README.md" (unsure how you're
supposed to run from the top directory); then:

PYTHONPATH=~/git/pixie:~/pypysrc python -m unittest test_uv_file

It seems to work for me (Ran 2 tests in xx s; OK). Is that how you're
supposed to get the crash?

A bientôt,

Armin.

From laurie at tratt.net  Thu Nov 13 12:08:47 2014
From: laurie at tratt.net (Laurence Tratt)
Date: Thu, 13 Nov 2014 11:08:47 +0000
Subject: [pypy-dev] Tracing recursion
Message-ID: <20141113110847.GD25512@overdrive.tratt.net>

[Summary: the recursion_and_inlining branch stops us tracing arbitrarily
deep in recursive functions. Comments welcome. If we think this is a good
idea, we have to decide if/how many levels deep we unroll before stopping
tracing. My current thinking is that 7 levels looks about right, though
this is heavily dependent on which benchmarks one considers.]

I recently pushed a branch recursion_and_inlining which aims to tackle the
problem of controlling tracing in the face of recursion. I have been
vaguely aware of an issue here for a while, but never quite managed to
nail the problem down. Then Kevin Modzelewski pointed me to a small
benchmark called "polymorphism" [1] he'd sent to the list back in April
which shows the core problem. Some simple benchmarking shows that we have
a performance problem -- PyPy is 4.4x slower than CPython on this
benchmark:

  $ multitime -n 10 python polymorphism.py
  ===> multitime results
  1: python /tmp/polymorphism.py
          Mean        Std.Dev.    Min         Median      Max
  real    1.649       0.058       1.595       1.632       1.763
  user    1.643       0.057       1.592       1.624       1.756
  sys     0.002       0.003       0.000       0.002       0.008

  $ multitime -n 10 ./pypy-c-orig polymorphism.py
  ===> multitime results
  1: ./pypy-c-orig /tmp/polymorphism.py
          Mean        Std.Dev.    Min         Median      Max
  real    7.198       0.047       7.131       7.203       7.274
  user    7.156       0.051       7.076       7.154       7.232
  sys     0.033       0.011       0.012       0.032       0.048

The problem is that RPython's naturally aggressive inlining also inlines
recursive functions. So while we don't unroll explicit loops (while/for),
we end up unrolling recursive functions. Sometimes this works out OK, but
it's more likely to end in aborts, or traces which end up with lots of
side traces. Both latter cases are inefficient. As far as I can see, the
most frequent use of aborts is as a heuristic that recursion has been
encountered.
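To make the two recursion styles concrete, here is a tiny illustrative
sketch, loosely modeled on the polymorphism benchmark [1] (the names and
bodies are invented for illustration, not taken from the benchmark):

    class Leaf(object):
        def score(self):
            return 0

    class Node(object):
        def __init__(self, child):
            self.child = child

        def score(self):
            # Indirect recursion: which score() gets called depends on
            # the runtime class of self.child, and tracing inlines each
            # level it encounters.
            return 1 + self.child.score()

    def make_random(depth):
        # Direct recursion: if tracing starts in a caller of make_random,
        # every recursive level is inlined (i.e. unrolled) into the trace.
        if depth == 0:
            return Leaf()
        return Node(make_random(depth - 1))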
There is one way in which recursion doesn't inline, which is if the
recursive function happens to be the start of the trace, in which case (I
think) pyjitpl.MIFrame.opimpl_jit_merge_point turns it into a nice loop.
However, if the recursion starts part way into a trace, all bets are off.
Kevin's benchmark has two types of recursion: one in the make_random
function; and another in Poly1.score. make_random is (obviously) directly
recursive; Poly1.score is indirectly recursive. Both cause the problem
noted above.

The recursion_and_inlining branch [2] tries to dynamically spot recursion
by trapping function calls in pyjitpl.MIFrame._opimpl_recursive_call and
seeing if they're currently on the meta-interpreter stack. If they are
then it can choose to stop tracing and turn the recursive call into an
actual function call (it then sets the function as JC_DONT_TRACE_HERE so
that, if it hasn't been traced separately already, it will then be
traced, with the recursion handled by the existing case in
opimpl_jit_merge_point). Doing this speeds Kevin's benchmark up
significantly:

  $ multitime -n 10 ./pypy-c-level1 /tmp/polymorphism.py
  ===> multitime results
  1: ./pypy-c-level1 /tmp/polymorphism.py
          Mean        Std.Dev.    Min         Median      Max
  real    0.535       0.013       0.516       0.535       0.560
  user    0.517       0.018       0.476       0.522       0.544
  sys     0.016       0.008       0.008       0.012       0.036

We've gone from 4.4x slower than CPython to over 3x faster (i.e. PyPy in
recursion_and_inlining is 13.5x faster than normal PyPy). Which is nice.

However, if I run the branch on the PyPy benchmark suite, we see some
significant slowdown in a handful of benchmarks relative to normal PyPy.
[Full data is attached as results1.txt]. e.g. hexiom2 is 1.3x slower;
raytrace-simple is 3.3x slower; spectral-norm 3.3x slower; sympy_str 1.5x
slower; and telco 4x slower. A few benchmarks speed up in a meaningful
way; most run so quickly that any differences are lost in the noise (IMHO
any benchmark running for 0.1s or less is too short to draw many
conclusions from). The translate test doesn't seem to be impacted by the
change either way.

Nevertheless -- and even taking into account that the current benchmark
suite and PyPy optimisations have sort-of evolved hand-in-hand -- the
slow benchmarks aren't good. So, fortunately, our branch can trivially be
extended to identify not just recursion, but how deeply we've recursed.
Put another way, we can choose to allow a function to be unrolled a fixed
number of times before stopping tracing.

How many unrollings should we choose? Well, I've spent time putting
together some rough data (note: this is not perfect benchmarking, but
it's probably good enough). Let's take our slow-coaches (to one decimal
place, because there's quite a bit of noise) and Kevin's benchmark (to 2
decimal places, because it runs long enough to make that sensible) and
see how they change relative to normal PyPy as we crank up the
unrollings:

  #unrollings      |    1 |    2 |    3 |    5 |    7 |   10 |
  -----------------+------+------+------+------+------+------+
  hexiom2          |  1.3 |  1.4 |  1.1 |  1.0 |  1.0 |  1.0 |
  raytrace-simple  |  3.3 |  3.1 |  2.8 |  1.4 |  1.0 |  1.0 |
  spectral-norm    |  3.3 |  1.0 |  1.0 |  1.0 |  1.0 |  1.0 |
  sympy_str        |  1.5 |  1.0 |  1.0 |  1.0 |  1.0 |  1.0 |
  telco            |    4 |  2.5 |  2.0 |  1.0 |  1.0 |  1.0 |
  -----------------+------+------+------+------+------+------+
  polymorphism     | 0.07 | 0.07 | 0.07 | 0.07 | 0.08 | 0.09 |

Lower is better in this table.
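In code terms, the check behind these numbers has roughly this shape (a
simplified sketch, not the actual patch; the real logic lives around
pyjitpl.MIFrame._opimpl_recursive_call, and the frame/jitcode attribute
names below are stand-ins):

    MAX_UNROLLINGS = 7   # the limit the data above suggests

    def should_stop_tracing(metainterp_frames, target_jitcode):
        # Count how many frames on the meta-interpreter stack are already
        # executing this jitcode; once the target has been unrolled that
        # many times, stop tracing and emit a real (residual) call
        # instead of inlining yet another level.
        levels = 0
        for frame in metainterp_frames:
            if frame.jitcode is target_jitcode:
                levels += 1
        return levels >= MAX_UNROLLINGS

(The table above shows how varying this limit plays out across the
benchmarks.)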
What you can't quite see from that is that Kevin's benchmark starts getting a teeny bit slower from 5 unrollings upwards (we'd need another decimal place to see it). All data is attached in results.txt files if you want to see the raw data. What I take from the above is that a reasonable guess as to a sensible number of unrollings seems to be 7. It makes most of the standard benchmarks behave as they always have, while not doing too much to benchmarks which are recursive in a different style. Of course, this is all *highly* dependent on the benchmarks chosen, and if one day we have more benchmarks, we should revisit this number accordingly. Note also that this is a patch to *RPython*. It should effect (for better or worse) all RPython VMs, not just PyPy. e.g. it speeds the Converge compiler up by a bit over 5% (I haven't tried much more than that). So there you have it. How would you folks like to proceed? I welcome comments, as this is not an area of RPython that I've looked at before. e.g. how many unrollings do you think are sensible? Note that a few tests are currently broken, because they depend on how many unrollings we choose to do. Once we've fixed that behaviour, I'll unbreak the tests. I also would very much welcome tests on other recursive and non-recursive programs/benchmarks you may have to see how this branch impacts upon performance. I'd like to say thanks to Kevin for pointing out the problem to me; Carl Friedrich for helping me understand that tracing recursion was the problem; and Tanzim Hoque for helping pick into some of the details of pyjitpl. Laurie [1] https://raw.githubusercontent.com/dropbox/pyston/master/microbenchmarks/polymorphism.py [2] The key patch is: https://bitbucket.org/pypy/pypy/commits/b60064f55316ceb4a3bd784c00a467253a197c4c -------------- next part -------------- Report on Linux bencher3 3.2.0-4-amd64 #1 SMP Debian 3.2.60-1+deb7u3 x86_64 Total CPU cores: 4 ### ai ### Min: 0.023896 -> 0.023524: 1.0158x faster Avg: 0.024042 -> 0.023743: 1.0126x faster Significant (t=4.300444, a=0.95) Stddev: 0.00031 -> 0.00038: 1.2092x larger ### bm_chameleon ### Min: 0.007552 -> 0.008992: 1.1906x slower Avg: 0.010259 -> 0.011695: 1.1399x slower Significant (t=-2.958955, a=0.95) Stddev: 0.00422 -> 0.00418: 1.0098x smaller ### bm_dulwich_log ### Min: 0.093526 -> 0.108061: 1.1554x slower Avg: 0.132285 -> 0.143125: 1.0819x slower Not significant Stddev: 0.11097 -> 0.09810: 1.1312x smaller ### bm_mako ### Min: 0.008855 -> 0.008900: 1.0051x slower Avg: 0.011058 -> 0.010998: 1.0055x faster Not significant Stddev: 0.00367 -> 0.00355: 1.0325x smaller ### chaos ### Min: 0.003802 -> 0.006283: 1.6525x slower Avg: 0.005174 -> 0.007426: 1.4354x slower Not significant Stddev: 0.00888 -> 0.00728: 1.2189x smaller ### sphinx ### Raw results: [76.860211134] [75.8158118725] ### crypto_pyaes ### Min: 0.024762 -> 0.025725: 1.0389x slower Avg: 0.027516 -> 0.028323: 1.0293x slower Not significant Stddev: 0.01021 -> 0.00950: 1.0752x smaller ### deltablue ### Min: 0.000666 -> 0.001062: 1.5941x slower Avg: 0.006572 -> 0.006668: 1.0147x slower Not significant Stddev: 0.00912 -> 0.00978: 1.0721x larger ### django ### Min: 0.015844 -> 0.020710: 1.3071x slower Avg: 0.017266 -> 0.022298: 1.2915x slower Significant (t=-11.193663, a=0.95) Stddev: 0.00226 -> 0.00223: 1.0132x smaller ### eparse ### Min: 0.167241 -> 0.182543: 1.0915x slower Avg: 0.226103 -> 0.237554: 1.0506x slower Not significant Stddev: 0.05000 -> 0.04521: 1.1060x smaller ### fannkuch ### Min: 0.092299 -> 0.092014: 1.0031x faster Avg: 
0.093250 -> 0.092878: 1.0040x faster Not significant Stddev: 0.00440 -> 0.00437: 1.0066x smaller ### float ### Min: 0.015519 -> 0.015288: 1.0151x faster Avg: 0.023097 -> 0.022767: 1.0145x faster Not significant Stddev: 0.00619 -> 0.00609: 1.0163x smaller ### genshi_text ### Min: 0.007438 -> 0.007730: 1.0392x slower Avg: 0.009406 -> 0.010046: 1.0681x slower Not significant Stddev: 0.00894 -> 0.00907: 1.0144x larger ### genshi_xml ### Min: 0.023765 -> 0.023280: 1.0208x faster Avg: 0.027038 -> 0.026623: 1.0156x faster Not significant Stddev: 0.01481 -> 0.01475: 1.0039x smaller ### go ### Min: 0.055311 -> 0.053643: 1.0311x faster Avg: 0.082657 -> 0.062297: 1.3268x faster Significant (t=4.947673, a=0.95) Stddev: 0.02728 -> 0.01013: 2.6918x smaller ### hexiom2 ### Min: 5.799438 -> 7.763072: 1.3386x slower Avg: 6.019426 -> 7.979220: 1.3256x slower Significant (t=-44.482017, a=0.95) Stddev: 0.22222 -> 0.21834: 1.0178x smaller ### html5lib ### Min: 1.317897 -> 1.255603: 1.0496x faster Avg: 1.772609 -> 1.587764: 1.1164x faster Not significant Stddev: 0.56009 -> 0.49102: 1.1407x smaller ### json_bench ### Min: 0.400927 -> 0.438493: 1.0937x slower Avg: 0.404558 -> 0.443308: 1.0958x slower Significant (t=-19.128974, a=0.95) Stddev: 0.01106 -> 0.00911: 1.2142x smaller ### meteor-contest ### Min: 0.059441 -> 0.058887: 1.0094x faster Avg: 0.060555 -> 0.059946: 1.0102x faster Not significant Stddev: 0.00321 -> 0.00319: 1.0071x smaller ### nbody_modified ### Min: 0.018606 -> 0.018764: 1.0085x slower Avg: 0.019102 -> 0.019194: 1.0048x slower Not significant Stddev: 0.00201 -> 0.00198: 1.0122x smaller ### pidigits ### Min: 5.363477 -> 5.149447: 1.0416x faster Avg: 5.383062 -> 5.170978: 1.0410x faster Significant (t=8.117698, a=0.95) Stddev: 0.04099 -> 0.04163: 1.0155x larger ### pyflate-fast ### Min: 0.185501 -> 0.215078: 1.1594x slower Avg: 0.188702 -> 0.216523: 1.1474x slower Significant (t=-65.827533, a=0.95) Stddev: 0.00281 -> 0.00101: 2.7794x smaller ### raytrace-simple ### Min: 0.017988 -> 0.059219: 3.2922x slower Avg: 0.018873 -> 0.060078: 3.1833x slower Significant (t=-141.219274, a=0.95) Stddev: 0.00113 -> 0.00173: 1.5266x larger ### richards ### Min: 0.001870 -> 0.002248: 1.2022x slower Avg: 0.002008 -> 0.002368: 1.1792x slower Significant (t=-4.065223, a=0.95) Stddev: 0.00043 -> 0.00045: 1.0413x larger ### rietveld ### Min: 0.043556 -> 0.050356: 1.1561x slower Avg: 0.116072 -> 0.120596: 1.0390x slower Not significant Stddev: 0.09529 -> 0.08353: 1.1408x smaller ### scimark_fft ### 0.136257 -> -1.000000: -1 ### scimark_lu ### 0.300565 -> -1.000000: -1 ### scimark_montecarlo ### 0.139376 -> -1.000000: -1 ### scimark_sor ### 0.226458 -> -1.000000: -1 ### scimark_sparsematmult ### 0.147132 -> -1.000000: -1 ### slowspitfire ### Min: 0.150665 -> 0.150214: 1.0030x faster Avg: 0.161504 -> 0.161734: 1.0014x slower Not significant Stddev: 0.00558 -> 0.00722: 1.2952x larger ### spambayes ### Min: 0.029400 -> 0.029954: 1.0188x slower Avg: 0.049929 -> 0.051126: 1.0240x slower Not significant Stddev: 0.01489 -> 0.01562: 1.0489x larger ### spectral-norm ### Min: 0.009344 -> 0.031196: 3.3386x slower Avg: 0.010038 -> 0.031759: 3.1640x slower Significant (t=-37.808674, a=0.95) Stddev: 0.00286 -> 0.00289: 1.0089x larger ### spitfire ### Min: 1.150000 -> 1.150000: no change Avg: 1.186200 -> 1.191600: 1.0046x slower Not significant Stddev: 0.03181 -> 0.03080: 1.0329x smaller ### spitfire_cstringio ### Min: 0.530000 -> 0.410000: 1.2927x faster Avg: 0.558800 -> 0.433800: 1.2882x faster Significant (t=25.891200,
a=0.95) Stddev: 0.02480 -> 0.02346: 1.0568x smaller ### sympy_expand ### Min: 0.210130 -> 0.212947: 1.0134x slower Avg: 0.317229 -> 0.308200: 1.0293x faster Not significant Stddev: 0.19828 -> 0.17363: 1.1420x smaller ### sympy_integrate ### Min: 0.776576 -> 0.764212: 1.0162x faster Avg: 1.237801 -> 1.147388: 1.0788x faster Not significant Stddev: 0.70127 -> 0.56203: 1.2477x smaller ### sympy_str ### Min: 0.153791 -> 0.104809: 1.4673x faster Avg: 0.297872 -> 0.277957: 1.0716x faster Not significant Stddev: 0.20000 -> 0.20368: 1.0184x larger ### sympy_sum ### Min: 0.207153 -> 0.224769: 1.0850x slower Avg: 0.293553 -> 0.302402: 1.0301x slower Not significant Stddev: 0.12345 -> 0.10984: 1.1239x smaller ### telco ### Min: 0.008000 -> 0.032002: 4.0003x slower Avg: 0.014001 -> 0.036482: 2.6057x slower Significant (t=-12.647330, a=0.95) Stddev: 0.00915 -> 0.00862: 1.0623x smaller ### trans2_annotate ### Raw results: [301.8] None ### trans2_rtype ### Raw results: [586.4] None ### trans2_backendopt ### Raw results: [82.0] None ### trans2_database ### Raw results: [112.6] None ### trans2_source ### Raw results: [120.2] None ### twisted_iteration ### Min: 0.002044 -> 0.002382: 1.1655x slower Avg: 0.002076 -> 0.002406: 1.1589x slower Significant (t=-90.456630, a=0.95) Stddev: 0.00002 -> 0.00001: 1.5006x smaller ### twisted_names ### Min: 0.000758 -> 0.000808: 1.0667x slower Avg: 0.000767 -> 0.000816: 1.0634x slower Significant (t=-41.869786, a=0.95) Stddev: 0.00001 -> 0.00001: 1.1888x smaller ### twisted_pb ### Min: 0.003379 -> 0.003722: 1.1016x slower Avg: 0.003661 -> 0.004049: 1.1059x slower Significant (t=-11.815989, a=0.95) Stddev: 0.00016 -> 0.00017: 1.1098x larger ### twisted_tcp ### Min: 0.085662 -> 0.089216: 1.0415x slower Avg: 0.087151 -> 0.090960: 1.0437x slower Significant (t=-11.383010, a=0.95) Stddev: 0.00155 -> 0.00179: 1.1552x larger -------------- next part -------------- ### ai ### Min: 0.024519 -> 0.023830: 1.0289x faster Avg: 0.024719 -> 0.024149: 1.0236x faster Significant (t=5.841142, a=0.95) Stddev: 0.00032 -> 0.00061: 1.8921x larger ### bm_chameleon ### Min: 0.007607 -> 0.009337: 1.2274x slower Avg: 0.010243 -> 0.012217: 1.1928x slower Significant (t=-4.033700, a=0.95) Stddev: 0.00420 -> 0.00428: 1.0183x larger ### bm_dulwich_log ### Min: 0.097236 -> 0.097346: 1.0011x slower Avg: 0.135518 -> 0.138144: 1.0194x slower Not significant Stddev: 0.11196 -> 0.11137: 1.0053x smaller ### bm_mako ### Min: 0.008915 -> 0.009264: 1.0392x slower Avg: 0.011081 -> 0.011375: 1.0265x slower Not significant Stddev: 0.00370 -> 0.00361: 1.0249x smaller ### chaos ### Min: 0.003758 -> 0.004400: 1.1709x slower Avg: 0.005148 -> 0.005729: 1.1129x slower Not significant Stddev: 0.00897 -> 0.00854: 1.0504x smaller ### sphinx ### Raw results: [78.2611758709] [77.0095219612] ### crypto_pyaes ### Min: 0.025186 -> 0.024932: 1.0102x faster Avg: 0.027842 -> 0.027758: 1.0030x faster Not significant Stddev: 0.01012 -> 0.01023: 1.0108x larger ### deltablue ### Min: 0.000680 -> 0.000923: 1.3573x slower Avg: 0.006624 -> 0.006197: 1.0688x faster Not significant Stddev: 0.00915 -> 0.00887: 1.0316x smaller ### django ### Min: 0.015887 -> 0.020281: 1.2766x slower Avg: 0.017195 -> 0.021805: 1.2681x slower Significant (t=-10.510799, a=0.95) Stddev: 0.00215 -> 0.00224: 1.0428x larger ### eparse ### Min: 0.158280 -> 0.174940: 1.1053x slower Avg: 0.221771 -> 0.222904: 1.0051x slower Not significant Stddev: 0.04991 -> 0.05164: 1.0348x larger ### fannkuch ### Min: 0.091970 -> 0.092063: 1.0010x slower Avg: 0.092925 -> 0.093031: 
1.0011x slower Not significant Stddev: 0.00431 -> 0.00438: 1.0169x larger ### float ### Min: 0.015624 -> 0.015870: 1.0157x slower Avg: 0.023262 -> 0.023572: 1.0133x slower Not significant Stddev: 0.00613 -> 0.00618: 1.0074x larger ### genshi_text ### Min: 0.007519 -> 0.007399: 1.0162x faster Avg: 0.009532 -> 0.009198: 1.0363x faster Not significant Stddev: 0.00875 -> 0.00774: 1.1306x smaller ### genshi_xml ### Min: 0.022098 -> 0.024445: 1.1062x slower Avg: 0.025456 -> 0.027746: 1.0899x slower Not significant Stddev: 0.01513 -> 0.01386: 1.0915x smaller ### go ### Min: 0.054469 -> 0.049704: 1.0959x faster Avg: 0.081538 -> 0.062104: 1.3129x faster Significant (t=4.477202, a=0.95) Stddev: 0.02725 -> 0.01412: 1.9304x smaller ### hexiom2 ### Min: 5.765985 -> 7.972130: 1.3826x slower Avg: 5.997823 -> 8.225795: 1.3715x slower Significant (t=-45.382973, a=0.95) Stddev: 0.23419 -> 0.25624: 1.0941x larger ### html5lib ### Min: 1.251295 -> 1.210693: 1.0335x faster Avg: 1.732520 -> 1.672716: 1.0358x faster Not significant Stddev: 0.58002 -> 0.55705: 1.0412x smaller ### json_bench ### Min: 0.410679 -> 0.528665: 1.2873x slower Avg: 0.414892 -> 0.533759: 1.2865x slower Significant (t=-53.306479, a=0.95) Stddev: 0.01118 -> 0.01112: 1.0059x smaller ### meteor-contest ### Min: 0.059319 -> 0.058654: 1.0113x faster Avg: 0.060485 -> 0.059748: 1.0123x faster Not significant Stddev: 0.00321 -> 0.00323: 1.0058x larger ### nbody_modified ### Min: 0.018536 -> 0.018806: 1.0146x slower Avg: 0.019032 -> 0.019321: 1.0152x slower Not significant Stddev: 0.00204 -> 0.00197: 1.0325x smaller ### pidigits ### Min: 5.366191 -> 5.344685: 1.0040x faster Avg: 5.392111 -> 5.366315: 1.0048x faster Not significant Stddev: 0.04388 -> 0.04192: 1.0467x smaller ### pyflate-fast ### Min: 0.193320 -> 0.212552: 1.0995x slower Avg: 0.196630 -> 0.213785: 1.0872x slower Significant (t=-54.781980, a=0.95) Stddev: 0.00197 -> 0.00101: 1.9416x smaller ### raytrace-simple ### Min: 0.018000 -> 0.056360: 3.1311x slower Avg: 0.019035 -> 0.057023: 2.9957x slower Significant (t=-140.523775, a=0.95) Stddev: 0.00116 -> 0.00152: 1.3046x larger ### richards ### Min: 0.001856 -> 0.001877: 1.0113x slower Avg: 0.001988 -> 0.001997: 1.0047x slower Not significant Stddev: 0.00044 -> 0.00044: 1.0091x larger ### rietveld ### Min: 0.041706 -> 0.042024: 1.0076x slower Avg: 0.114850 -> 0.115267: 1.0036x slower Not significant Stddev: 0.09532 -> 0.09368: 1.0175x smaller ### scimark_fft ### 0.136390 -> -1.000000: -1 ### scimark_lu ### 0.298607 -> -1.000000: -1 ### scimark_montecarlo ### 0.141685 -> -1.000000: -1 ### scimark_sor ### 0.226635 -> -1.000000: -1 ### scimark_sparsematmult ### 0.147101 -> -1.000000: -1 ### slowspitfire ### Min: 0.152183 -> 0.152311: 1.0008x slower Avg: 0.162501 -> 0.163124: 1.0038x slower Not significant Stddev: 0.00551 -> 0.00570: 1.0344x larger ### spambayes ### Min: 0.028572 -> 0.029657: 1.0380x slower Avg: 0.049160 -> 0.051171: 1.0409x slower Not significant Stddev: 0.01736 -> 0.01518: 1.1434x smaller ### spectral-norm ### Min: 0.009342 -> 0.009381: 1.0042x slower Avg: 0.010040 -> 0.010042: 1.0002x slower Not significant Stddev: 0.00285 -> 0.00289: 1.0140x larger ### spitfire ### Min: 1.160000 -> 1.140000: 1.0175x faster Avg: 1.194200 -> 1.181400: 1.0108x faster Significant (t=2.040372, a=0.95) Stddev: 0.03302 -> 0.02962: 1.1145x smaller ### spitfire_cstringio ### Min: 0.400000 -> 0.390000: 1.0256x faster Avg: 0.426800 -> 0.410000: 1.0410x faster Significant (t=3.168258, a=0.95) Stddev: 0.02691 -> 0.02611: 1.0308x smaller ### 
sympy_expand ### Min: 0.206214 -> 0.240614: 1.1668x slower Avg: 0.316202 -> 0.333418: 1.0544x slower Not significant Stddev: 0.19306 -> 0.19010: 1.0156x smaller ### sympy_integrate ### Min: 0.801874 -> 0.781623: 1.0259x faster Avg: 1.271010 -> 1.176734: 1.0801x faster Not significant Stddev: 0.67044 -> 0.64535: 1.0389x smaller ### sympy_str ### Min: 0.156287 -> 0.160201: 1.0250x slower Avg: 0.303686 -> 0.326078: 1.0737x slower Not significant Stddev: 0.20273 -> 0.22414: 1.1056x larger ### sympy_sum ### Min: 0.198554 -> 0.201798: 1.0163x slower Avg: 0.290399 -> 0.298513: 1.0279x slower Not significant Stddev: 0.12515 -> 0.13811: 1.1036x larger ### telco ### Min: 0.008000 -> 0.020001: 2.5001x slower Avg: 0.014241 -> 0.025362: 1.7809x slower Significant (t=-6.447078, a=0.95) Stddev: 0.00943 -> 0.00774: 1.2192x smaller ### trans2_annotate ### Raw results: [303.2] None ### trans2_rtype ### Raw results: [612.2] None ### trans2_backendopt ### Raw results: [83.8] None ### trans2_database ### Raw results: [115.0] None ### trans2_source ### Raw results: [120.7] None ### twisted_iteration ### Min: 0.002040 -> 0.002048: 1.0040x slower Avg: 0.002054 -> 0.002068: 1.0066x slower Significant (t=-6.838156, a=0.95) Stddev: 0.00001 -> 0.00001: 1.0082x smaller ### twisted_names ### Min: 0.000737 -> 0.000749: 1.0166x slower Avg: 0.000747 -> 0.000759: 1.0148x slower Significant (t=-8.378163, a=0.95) Stddev: 0.00001 -> 0.00001: 1.2645x larger ### twisted_pb ### Min: 0.003442 -> 0.003522: 1.0232x slower Avg: 0.003739 -> 0.003842: 1.0276x slower Significant (t=-3.155522, a=0.95) Stddev: 0.00016 -> 0.00016: 1.0193x smaller ### twisted_tcp ### Min: 0.081666 -> 0.089769: 1.0992x slower Avg: 0.083214 -> 0.090795: 1.0911x slower Significant (t=-31.579612, a=0.95) Stddev: 0.00152 -> 0.00076: 1.9919x smaller -------------- next part -------------- ### ai ### Min: 0.024530 -> 0.023645: 1.0374x faster Avg: 0.024767 -> 0.023963: 1.0336x faster Significant (t=11.078261, a=0.95) Stddev: 0.00033 -> 0.00039: 1.2038x larger ### bm_chameleon ### Min: 0.007709 -> 0.009386: 1.2175x slower Avg: 0.010223 -> 0.011915: 1.1655x slower Significant (t=-3.516124, a=0.95) Stddev: 0.00424 -> 0.00410: 1.0344x smaller ### bm_dulwich_log ### Min: 0.111125 -> 0.107174: 1.0369x faster Avg: 0.151440 -> 0.145219: 1.0428x faster Not significant Stddev: 0.11254 -> 0.11114: 1.0125x smaller ### bm_mako ### Min: 0.009072 -> 0.008993: 1.0088x faster Avg: 0.011201 -> 0.011180: 1.0019x faster Not significant Stddev: 0.00366 -> 0.00364: 1.0047x smaller ### chaos ### Min: 0.003846 -> 0.004008: 1.0421x slower Avg: 0.005232 -> 0.005390: 1.0303x slower Not significant Stddev: 0.00900 -> 0.00896: 1.0047x smaller ### sphinx ### Raw results: [78.2049300671] [76.9637699127] ### crypto_pyaes ### Min: 0.025174 -> 0.024822: 1.0142x faster Avg: 0.027979 -> 0.027446: 1.0194x faster Not significant Stddev: 0.01029 -> 0.01014: 1.0148x smaller ### deltablue ### Min: 0.000685 -> 0.000915: 1.3355x slower Avg: 0.006643 -> 0.006407: 1.0369x faster Not significant Stddev: 0.00920 -> 0.00861: 1.0692x smaller ### django ### Min: 0.016882 -> 0.015822: 1.0670x faster Avg: 0.018295 -> 0.017131: 1.0679x faster Significant (t=2.678836, a=0.95) Stddev: 0.00218 -> 0.00217: 1.0056x smaller ### eparse ### Min: 0.178579 -> 0.182563: 1.0223x slower Avg: 0.227297 -> 0.224844: 1.0109x faster Not significant Stddev: 0.04975 -> 0.04936: 1.0078x smaller ### fannkuch ### Min: 0.092023 -> 0.092981: 1.0104x slower Avg: 0.093026 -> 0.093839: 1.0087x slower Not significant Stddev: 0.00459 -> 0.00436: 
1.0523x smaller ### float ### Min: 0.016049 -> 0.016316: 1.0166x slower Avg: 0.023619 -> 0.023923: 1.0129x slower Not significant Stddev: 0.00619 -> 0.00619: 1.0006x larger ### genshi_text ### Min: 0.007739 -> 0.007923: 1.0238x slower Avg: 0.009736 -> 0.009785: 1.0050x slower Not significant Stddev: 0.00889 -> 0.00759: 1.1709x smaller ### genshi_xml ### Min: 0.022961 -> 0.023157: 1.0085x slower Avg: 0.026239 -> 0.026411: 1.0066x slower Not significant Stddev: 0.01519 -> 0.01359: 1.1178x smaller ### go ### Min: 0.053826 -> 0.044093: 1.2207x faster Avg: 0.082227 -> 0.055605: 1.4788x faster Significant (t=6.157912, a=0.95) Stddev: 0.02767 -> 0.01299: 2.1303x smaller ### hexiom2 ### Min: 5.772802 -> 6.185860: 1.0716x slower Avg: 6.008467 -> 6.430217: 1.0702x slower Significant (t=-8.696341, a=0.95) Stddev: 0.23806 -> 0.24684: 1.0369x larger ### html5lib ### Min: 1.276636 -> 1.201707: 1.0624x faster Avg: 1.742303 -> 1.666700: 1.0454x faster Not significant Stddev: 0.57257 -> 0.54649: 1.0477x smaller ### json_bench ### Min: 0.403071 -> 0.405566: 1.0062x slower Avg: 0.407013 -> 0.410124: 1.0076x slower Not significant Stddev: 0.01099 -> 0.01134: 1.0322x larger ### meteor-contest ### Min: 0.059418 -> 0.058484: 1.0160x faster Avg: 0.060614 -> 0.059640: 1.0163x faster Not significant Stddev: 0.00318 -> 0.00324: 1.0179x larger ### nbody_modified ### Min: 0.018613 -> 0.018432: 1.0098x faster Avg: 0.019143 -> 0.018961: 1.0096x faster Not significant Stddev: 0.00203 -> 0.00202: 1.0075x smaller ### pidigits ### Min: 5.398082 -> 5.244479: 1.0293x faster Avg: 5.421811 -> 5.267033: 1.0294x faster Significant (t=5.504659, a=0.95) Stddev: 0.04472 -> 0.04419: 1.0119x smaller ### pyflate-fast ### Min: 0.185665 -> 0.213436: 1.1496x slower Avg: 0.191272 -> 0.215193: 1.1251x slower Significant (t=-25.983353, a=0.95) Stddev: 0.00577 -> 0.00302: 1.9133x smaller ### raytrace-simple ### Min: 0.018075 -> 0.050743: 2.8074x slower Avg: 0.018910 -> 0.051445: 2.7206x slower Significant (t=-124.054463, a=0.95) Stddev: 0.00120 -> 0.00142: 1.1829x larger ### richards ### Min: 0.001887 -> 0.001873: 1.0074x faster Avg: 0.002089 -> 0.002055: 1.0167x faster Not significant Stddev: 0.00044 -> 0.00044: 1.0147x smaller ### rietveld ### Min: 0.043351 -> 0.044205: 1.0197x slower Avg: 0.114839 -> 0.114658: 1.0016x faster Not significant Stddev: 0.09476 -> 0.09160: 1.0345x smaller ### scimark_fft ### 0.134479 -> -1.000000: -1 ### scimark_lu ### 0.305992 -> -1.000000: -1 ### scimark_montecarlo ### 0.139750 -> -1.000000: -1 ### scimark_sor ### 0.225868 -> -1.000000: -1 ### scimark_sparsematmult ### 0.147379 -> -1.000000: -1 ### slowspitfire ### Min: 0.151334 -> 0.152800: 1.0097x slower Avg: 0.161073 -> 0.163202: 1.0132x slower Not significant Stddev: 0.00620 -> 0.00555: 1.1162x smaller ### spambayes ### Min: 0.027756 -> 0.027993: 1.0085x slower Avg: 0.048300 -> 0.046956: 1.0286x faster Not significant Stddev: 0.01600 -> 0.01517: 1.0543x smaller ### spectral-norm ### Min: 0.009358 -> 0.009377: 1.0020x slower Avg: 0.010026 -> 0.010028: 1.0002x slower Not significant Stddev: 0.00287 -> 0.00284: 1.0110x smaller ### spitfire ### Min: 1.160000 -> 1.140000: 1.0175x faster Avg: 1.198400 -> 1.178200: 1.0171x faster Significant (t=3.149072, a=0.95) Stddev: 0.03377 -> 0.03028: 1.1150x smaller ### spitfire_cstringio ### Min: 0.490000 -> 0.480000: 1.0208x faster Avg: 0.511000 -> 0.499800: 1.0224x faster Significant (t=2.213100, a=0.95) Stddev: 0.02435 -> 0.02622: 1.0770x larger ### sympy_expand ### Min: 0.216114 -> 0.226401: 1.0476x slower Avg: 
0.325487 -> 0.327479: 1.0061x slower Not significant Stddev: 0.20156 -> 0.19519: 1.0326x smaller ### sympy_integrate ### Min: 0.863688 -> 0.724006: 1.1929x faster Avg: 1.298924 -> 1.175341: 1.1051x faster Not significant Stddev: 0.71017 -> 0.70017: 1.0143x smaller ### sympy_str ### Min: 0.154443 -> 0.158522: 1.0264x slower Avg: 0.302207 -> 0.309182: 1.0231x slower Not significant Stddev: 0.20272 -> 0.21334: 1.0524x larger ### sympy_sum ### Min: 0.206235 -> 0.214458: 1.0399x slower Avg: 0.291689 -> 0.296673: 1.0171x slower Not significant Stddev: 0.11763 -> 0.12348: 1.0497x larger ### telco ### Min: 0.008000 -> 0.016001: 2.0001x slower Avg: 0.014161 -> 0.021841: 1.5424x slower Significant (t=-4.700125, a=0.95) Stddev: 0.00908 -> 0.00715: 1.2703x smaller ### trans2_annotate ### Raw results: [296.3] None ### trans2_rtype ### Raw results: [593.8] None ### trans2_backendopt ### Raw results: [81.8] None ### trans2_database ### Raw results: [109.5] None ### trans2_source ### Raw results: [118.0] None ### twisted_iteration ### Min: 0.002045 -> 0.002054: 1.0042x slower Avg: 0.002061 -> 0.002071: 1.0048x slower Significant (t=-4.479936, a=0.95) Stddev: 0.00001 -> 0.00001: 1.1017x smaller ### twisted_names ### Min: 0.000726 -> 0.000764: 1.0514x slower Avg: 0.000735 -> 0.000776: 1.0552x slower Significant (t=-31.900206, a=0.95) Stddev: 0.00001 -> 0.00001: 1.1643x larger ### twisted_pb ### Min: 0.003398 -> 0.003665: 1.0786x slower Avg: 0.003714 -> 0.003966: 1.0679x slower Significant (t=-6.633568, a=0.95) Stddev: 0.00022 -> 0.00016: 1.3333x smaller ### twisted_tcp ### Min: 0.082466 -> 0.082349: 1.0014x faster Avg: 0.084178 -> 0.083816: 1.0043x faster Not significant Stddev: 0.00170 -> 0.00145: 1.1724x smaller -------------- next part -------------- ### ai ### Min: 0.023883 -> 0.023692: 1.0081x faster Avg: 0.024102 -> 0.024008: 1.0039x faster Not significant Stddev: 0.00034 -> 0.00033: 1.0268x smaller ### bm_chameleon ### Min: 0.007408 -> 0.009028: 1.2187x slower Avg: 0.009830 -> 0.012773: 1.2994x slower Significant (t=-6.177826, a=0.95) Stddev: 0.00419 -> 0.00406: 1.0334x smaller ### bm_dulwich_log ### Min: 0.103286 -> 0.100674: 1.0259x faster Avg: 0.142535 -> 0.139344: 1.0229x faster Not significant Stddev: 0.11222 -> 0.11305: 1.0074x larger ### bm_mako ### Min: 0.009023 -> 0.010273: 1.1385x slower Avg: 0.011059 -> 0.012495: 1.1298x slower Significant (t=-2.013881, a=0.95) Stddev: 0.00354 -> 0.00359: 1.0165x larger ### chaos ### Min: 0.003792 -> 0.003755: 1.0099x faster Avg: 0.005160 -> 0.005149: 1.0021x faster Not significant Stddev: 0.00889 -> 0.00897: 1.0090x larger ### sphinx ### Raw results: [77.5520658493] [78.3562920094] ### crypto_pyaes ### Min: 0.024972 -> 0.024917: 1.0022x faster Avg: 0.027624 -> 0.027582: 1.0015x faster Not significant Stddev: 0.01015 -> 0.01019: 1.0041x larger ### deltablue ### Min: 0.000656 -> 0.000906: 1.3808x slower Avg: 0.006466 -> 0.006557: 1.0140x slower Not significant Stddev: 0.00901 -> 0.00845: 1.0656x smaller ### django ### Min: 0.015847 -> 0.015596: 1.0161x faster Avg: 0.017174 -> 0.016892: 1.0167x faster Not significant Stddev: 0.00218 -> 0.00218: 1.0006x smaller ### eparse ### Min: 0.176189 -> 0.182437: 1.0355x slower Avg: 0.222830 -> 0.229030: 1.0278x slower Not significant Stddev: 0.04900 -> 0.04850: 1.0103x smaller ### fannkuch ### Min: 0.091812 -> 0.092155: 1.0037x slower Avg: 0.092707 -> 0.093207: 1.0054x slower Not significant Stddev: 0.00437 -> 0.00439: 1.0044x larger ### float ### Min: 0.015223 -> 0.015089: 1.0089x faster Avg: 0.022767 -> 0.022708: 
1.0026x faster Not significant Stddev: 0.00609 -> 0.00617: 1.0140x larger ### genshi_text ### Min: 0.007651 -> 0.007503: 1.0197x faster Avg: 0.009658 -> 0.009628: 1.0031x faster Not significant Stddev: 0.00878 -> 0.00913: 1.0408x larger ### genshi_xml ### Min: 0.022101 -> 0.024660: 1.1158x slower Avg: 0.025370 -> 0.028154: 1.1097x slower Not significant Stddev: 0.01471 -> 0.01500: 1.0199x larger ### go ### Min: 0.054293 -> 0.044338: 1.2245x faster Avg: 0.082098 -> 0.062413: 1.3154x faster Significant (t=3.986552, a=0.95) Stddev: 0.02746 -> 0.02157: 1.2734x smaller ### hexiom2 ### Min: 5.799802 -> 5.791338: 1.0015x faster Avg: 6.026409 -> 6.035714: 1.0015x slower Not significant Stddev: 0.22891 -> 0.24686: 1.0784x larger ### html5lib ### Min: 1.244801 -> 1.226899: 1.0146x faster Avg: 1.704128 -> 1.653622: 1.0305x faster Not significant Stddev: 0.57008 -> 0.54916: 1.0381x smaller ### json_bench ### Min: 0.402525 -> 0.426142: 1.0587x slower Avg: 0.408857 -> 0.430575: 1.0531x slower Significant (t=-9.744694, a=0.95) Stddev: 0.01106 -> 0.01123: 1.0153x larger ### meteor-contest ### Min: 0.059208 -> 0.059024: 1.0031x faster Avg: 0.060232 -> 0.060214: 1.0003x faster Not significant Stddev: 0.00318 -> 0.00323: 1.0186x larger ### nbody_modified ### Min: 0.018770 -> 0.018680: 1.0048x faster Avg: 0.019336 -> 0.019231: 1.0054x faster Not significant Stddev: 0.00203 -> 0.00202: 1.0055x smaller ### pidigits ### Min: 5.358582 -> 4.985167: 1.0749x faster Avg: 5.380902 -> 5.005724: 1.0749x faster Significant (t=14.099518, a=0.95) Stddev: 0.04282 -> 0.04131: 1.0364x smaller ### pyflate-fast ### Min: 0.186429 -> 0.213897: 1.1473x slower Avg: 0.188376 -> 0.215966: 1.1465x slower Significant (t=-55.643026, a=0.95) Stddev: 0.00280 -> 0.00211: 1.3304x smaller ### raytrace-simple ### Min: 0.017904 -> 0.024924: 1.3921x slower Avg: 0.018745 -> 0.025592: 1.3652x slower Significant (t=-34.649821, a=0.95) Stddev: 0.00120 -> 0.00071: 1.6899x smaller ### richards ### Min: 0.001859 -> 0.001879: 1.0108x slower Avg: 0.002026 -> 0.002007: 1.0096x faster Not significant Stddev: 0.00043 -> 0.00043: 1.0134x larger ### rietveld ### Min: 0.042592 -> 0.045415: 1.0663x slower Avg: 0.112874 -> 0.115784: 1.0258x slower Not significant Stddev: 0.09324 -> 0.08764: 1.0640x smaller ### scimark_fft ### 0.136233 -> -1.000000: -1 ### scimark_lu ### 0.305460 -> -1.000000: -1 ### scimark_montecarlo ### 0.141590 -> -1.000000: -1 ### scimark_sor ### 0.225836 -> -1.000000: -1 ### scimark_sparsematmult ### 0.149314 -> -1.000000: -1 ### slowspitfire ### Min: 0.153071 -> 0.152174: 1.0059x faster Avg: 0.163220 -> 0.166867: 1.0223x slower Significant (t=-2.677061, a=0.95) Stddev: 0.00532 -> 0.00803: 1.5087x larger ### spambayes ### Min: 0.028488 -> 0.027426: 1.0387x faster Avg: 0.052744 -> 0.049873: 1.0576x faster Not significant Stddev: 0.01536 -> 0.01530: 1.0045x smaller ### spectral-norm ### Min: 0.009338 -> 0.009289: 1.0053x faster Avg: 0.010012 -> 0.009947: 1.0064x faster Not significant Stddev: 0.00286 -> 0.00286: 1.0013x larger ### spitfire ### Min: 1.150000 -> 1.150000: no change Avg: 1.187400 -> 1.190200: 1.0024x slower Not significant Stddev: 0.03337 -> 0.03204: 1.0415x smaller ### spitfire_cstringio ### Min: 0.540000 -> 0.520000: 1.0385x faster Avg: 0.559400 -> 0.544800: 1.0268x faster Significant (t=2.810196, a=0.95) Stddev: 0.02583 -> 0.02613: 1.0117x larger ### sympy_expand ### Min: 0.224202 -> 0.214263: 1.0464x faster Avg: 0.315703 -> 0.322553: 1.0217x slower Not significant Stddev: 0.19840 -> 0.20339: 1.0252x larger ### 
sympy_integrate ### Min: 0.835360 -> 0.855213: 1.0238x slower Avg: 1.265200 -> 1.301990: 1.0291x slower Not significant Stddev: 0.69199 -> 0.68222: 1.0143x smaller ### sympy_str ### Min: 0.158076 -> 0.166441: 1.0529x slower Avg: 0.306286 -> 0.315436: 1.0299x slower Not significant Stddev: 0.20398 -> 0.21630: 1.0604x larger ### sympy_sum ### Min: 0.199477 -> 0.211708: 1.0613x slower Avg: 0.292236 -> 0.296414: 1.0143x slower Not significant Stddev: 0.12355 -> 0.12324: 1.0025x smaller ### telco ### Min: 0.008000 -> 0.008000: no change Avg: 0.013441 -> 0.013441: no change Not significant Stddev: 0.00931 -> 0.00899: 1.0357x smaller ### trans2_annotate ### Raw results: [297.3] None ### trans2_rtype ### Raw results: [605.0] None ### trans2_backendopt ### Raw results: [83.1] None ### trans2_database ### Raw results: [112.9] None ### trans2_source ### Raw results: [116.6] None ### twisted_iteration ### Min: 0.002048 -> 0.002089: 1.0203x slower Avg: 0.002075 -> 0.002115: 1.0194x slower Significant (t=-12.246309, a=0.95) Stddev: 0.00002 -> 0.00001: 1.1937x smaller ### twisted_names ### Min: 0.000725 -> 0.000708: 1.0247x faster Avg: 0.000735 -> 0.000718: 1.0237x faster Significant (t=13.454979, a=0.95) Stddev: 0.00001 -> 0.00001: 1.1215x larger ### twisted_pb ### Min: 0.003328 -> 0.003357: 1.0086x slower Avg: 0.003669 -> 0.003640: 1.0080x faster Not significant Stddev: 0.00019 -> 0.00016: 1.1619x smaller ### twisted_tcp ### Min: 0.081791 -> 0.083739: 1.0238x slower Avg: 0.083348 -> 0.087015: 1.0440x slower Significant (t=-9.432635, a=0.95) Stddev: 0.00162 -> 0.00222: 1.3698x larger -------------- next part -------------- ### ai ### Min: 0.023917 -> 0.024226: 1.0129x slower Avg: 0.024064 -> 0.024386: 1.0134x slower Significant (t=-5.215923, a=0.95) Stddev: 0.00032 -> 0.00029: 1.0885x smaller ### bm_chameleon ### Min: 0.007497 -> 0.007378: 1.0161x faster Avg: 0.010038 -> 0.009876: 1.0164x faster Not significant Stddev: 0.00415 -> 0.00415: 1.0009x larger ### bm_dulwich_log ### Min: 0.105978 -> 0.109689: 1.0350x slower Avg: 0.144271 -> 0.147769: 1.0242x slower Not significant Stddev: 0.11168 -> 0.11125: 1.0039x smaller ### bm_mako ### Min: 0.008896 -> 0.008730: 1.0190x faster Avg: 0.011094 -> 0.010903: 1.0175x faster Not significant Stddev: 0.00372 -> 0.00365: 1.0193x smaller ### chaos ### Min: 0.003693 -> 0.003763: 1.0190x slower Avg: 0.005095 -> 0.005169: 1.0146x slower Not significant Stddev: 0.00900 -> 0.00910: 1.0113x larger ### sphinx ### Raw results: [78.4002809525] [77.5027208328] ### crypto_pyaes ### Min: 0.025048 -> 0.024925: 1.0049x faster Avg: 0.027692 -> 0.027490: 1.0073x faster Not significant Stddev: 0.01008 -> 0.01002: 1.0055x smaller ### deltablue ### Min: 0.000682 -> 0.000665: 1.0255x faster Avg: 0.006535 -> 0.006583: 1.0073x slower Not significant Stddev: 0.00902 -> 0.00912: 1.0106x larger ### django ### Min: 0.016845 -> 0.015642: 1.0769x faster Avg: 0.018176 -> 0.016943: 1.0728x faster Significant (t=2.813002, a=0.95) Stddev: 0.00225 -> 0.00214: 1.0519x smaller ### eparse ### Min: 0.186282 -> 0.188626: 1.0126x slower Avg: 0.227064 -> 0.228694: 1.0072x slower Not significant Stddev: 0.04855 -> 0.04853: 1.0004x smaller ### fannkuch ### Min: 0.092230 -> 0.093653: 1.0154x slower Avg: 0.093131 -> 0.094608: 1.0159x slower Not significant Stddev: 0.00437 -> 0.00439: 1.0028x larger ### float ### Min: 0.015930 -> 0.015764: 1.0105x faster Avg: 0.023582 -> 0.023495: 1.0037x faster Not significant Stddev: 0.00617 -> 0.00618: 1.0022x larger ### genshi_text ### Min: 0.007442 -> 0.007406: 1.0049x 
faster Avg: 0.009456 -> 0.009357: 1.0106x faster Not significant Stddev: 0.00878 -> 0.00890: 1.0131x larger ### genshi_xml ### Min: 0.022542 -> 0.023450: 1.0403x slower Avg: 0.025766 -> 0.026843: 1.0418x slower Not significant Stddev: 0.01495 -> 0.01482: 1.0089x smaller ### go ### Min: 0.056117 -> 0.049464: 1.1345x faster Avg: 0.080922 -> 0.068242: 1.1858x faster Significant (t=2.939745, a=0.95) Stddev: 0.02612 -> 0.01576: 1.6576x smaller ### hexiom2 ### Min: 5.785369 -> 5.858442: 1.0126x slower Avg: 6.014472 -> 6.099259: 1.0141x slower Not significant Stddev: 0.23143 -> 0.24326: 1.0511x larger ### html5lib ### Min: 1.237429 -> 1.285687: 1.0390x slower Avg: 1.653288 -> 1.719167: 1.0398x slower Not significant Stddev: 0.53921 -> 0.54721: 1.0148x larger ### json_bench ### Min: 0.402597 -> 0.399106: 1.0087x faster Avg: 0.408401 -> 0.403247: 1.0128x faster Significant (t=2.262900, a=0.95) Stddev: 0.01175 -> 0.01102: 1.0665x smaller ### meteor-contest ### Min: 0.059306 -> 0.059382: 1.0013x slower Avg: 0.060412 -> 0.060674: 1.0043x slower Not significant Stddev: 0.00317 -> 0.00324: 1.0238x larger ### nbody_modified ### Min: 0.018762 -> 0.018587: 1.0094x faster Avg: 0.019270 -> 0.019129: 1.0074x faster Not significant Stddev: 0.00200 -> 0.00203: 1.0125x larger ### pidigits ### Min: 5.394168 -> 5.105118: 1.0566x faster Avg: 5.415438 -> 5.125833: 1.0565x faster Significant (t=11.121185, a=0.95) Stddev: 0.04079 -> 0.04155: 1.0186x larger ### pyflate-fast ### Min: 0.187713 -> 0.187266: 1.0024x faster Avg: 0.189677 -> 0.189938: 1.0014x slower Not significant Stddev: 0.00297 -> 0.00309: 1.0401x larger ### raytrace-simple ### Min: 0.018063 -> 0.018088: 1.0014x slower Avg: 0.018894 -> 0.018949: 1.0029x slower Not significant Stddev: 0.00125 -> 0.00115: 1.0834x smaller ### richards ### Min: 0.001877 -> 0.001877: no change Avg: 0.002013 -> 0.002014: 1.0007x slower Not significant Stddev: 0.00046 -> 0.00045: 1.0394x smaller ### rietveld ### Min: 0.042445 -> 0.042944: 1.0118x slower Avg: 0.114897 -> 0.114120: 1.0068x faster Not significant Stddev: 0.09465 -> 0.09446: 1.0021x smaller ### scimark_fft ### 0.133715 -> -1.000000: -1 ### scimark_lu ### 0.300824 -> -1.000000: -1 ### scimark_montecarlo ### 0.140648 -> -1.000000: -1 ### scimark_sor ### 0.225912 -> -1.000000: -1 ### scimark_sparsematmult ### 0.149313 -> -1.000000: -1 ### slowspitfire ### Min: 0.151268 -> 0.151540: 1.0018x slower Avg: 0.161153 -> 0.160559: 1.0037x faster Not significant Stddev: 0.00626 -> 0.00589: 1.0629x smaller ### spambayes ### Min: 0.028285 -> 0.027390: 1.0327x faster Avg: 0.052711 -> 0.048411: 1.0888x faster Not significant Stddev: 0.01551 -> 0.01610: 1.0378x larger ### spectral-norm ### Min: 0.009358 -> 0.009379: 1.0022x slower Avg: 0.010043 -> 0.010044: 1.0001x slower Not significant Stddev: 0.00287 -> 0.00288: 1.0033x larger ### spitfire ### Min: 1.140000 -> 1.170000: 1.0263x slower Avg: 1.182600 -> 1.204400: 1.0184x slower Significant (t=-3.422626, a=0.95) Stddev: 0.03256 -> 0.03111: 1.0467x smaller ### spitfire_cstringio ### Min: 0.400000 -> 0.490000: 1.2250x slower Avg: 0.421600 -> 0.514600: 1.2206x slower Significant (t=-19.295518, a=0.95) Stddev: 0.02333 -> 0.02484: 1.0648x larger ### sympy_expand ### Min: 0.213237 -> 0.214561: 1.0062x slower Avg: 0.324671 -> 0.326383: 1.0053x slower Not significant Stddev: 0.20330 -> 0.20373: 1.0021x larger ### sympy_integrate ### Min: 0.861252 -> 0.771418: 1.1165x faster Avg: 1.295300 -> 1.231274: 1.0520x faster Not significant Stddev: 0.70704 -> 0.70340: 1.0052x smaller ### sympy_str 
### Min: 0.157927 -> 0.160802: 1.0182x slower Avg: 0.305096 -> 0.305706: 1.0020x slower Not significant Stddev: 0.20053 -> 0.20028: 1.0012x smaller ### sympy_sum ### Min: 0.206980 -> 0.211585: 1.0222x slower Avg: 0.293323 -> 0.297996: 1.0159x slower Not significant Stddev: 0.12204 -> 0.12069: 1.0112x smaller ### telco ### Min: 0.008000 -> 0.008000: no change Avg: 0.013761 -> 0.014081: 1.0233x slower Not significant Stddev: 0.00897 -> 0.00901: 1.0044x larger ### trans2_annotate ### Raw results: [285.6] None ### trans2_rtype ### Raw results: [632.3] None ### trans2_backendopt ### Raw results: [82.2] None ### trans2_database ### Raw results: [114.2] None ### trans2_source ### Raw results: [121.6] None ### twisted_iteration ### Min: 0.002044 -> 0.002047: 1.0013x slower Avg: 0.002063 -> 0.002064: 1.0006x slower Not significant Stddev: 0.00001 -> 0.00001: 1.1070x larger ### twisted_names ### Min: 0.000731 -> 0.000719: 1.0171x faster Avg: 0.000741 -> 0.000731: 1.0136x faster Significant (t=7.793531, a=0.95) Stddev: 0.00001 -> 0.00001: 1.0432x smaller ### twisted_pb ### Min: 0.003384 -> 0.003489: 1.0309x slower Avg: 0.003645 -> 0.003819: 1.0477x slower Significant (t=-5.022045, a=0.95) Stddev: 0.00015 -> 0.00020: 1.3186x larger ### twisted_tcp ### Min: 0.085805 -> 0.082353: 1.0419x faster Avg: 0.087671 -> 0.083905: 1.0449x faster Significant (t=11.604523, a=0.95) Stddev: 0.00163 -> 0.00162: 1.0033x smaller -------------- next part -------------- ### ai ### Min: 0.023829 -> 0.023632: 1.0083x faster Avg: 0.024003 -> 0.023850: 1.0064x faster Significant (t=2.400409, a=0.95) Stddev: 0.00031 -> 0.00033: 1.0729x larger ### bm_chameleon ### Min: 0.007460 -> 0.007549: 1.0119x slower Avg: 0.009940 -> 0.010032: 1.0093x slower Not significant Stddev: 0.00414 -> 0.00417: 1.0057x larger ### bm_dulwich_log ### Min: 0.108187 -> 0.105768: 1.0229x faster Avg: 0.146484 -> 0.144371: 1.0146x faster Not significant Stddev: 0.11170 -> 0.11066: 1.0094x smaller ### bm_mako ### Min: 0.008846 -> 0.008903: 1.0064x slower Avg: 0.010970 -> 0.011081: 1.0101x slower Not significant Stddev: 0.00357 -> 0.00355: 1.0058x smaller ### chaos ### Min: 0.003697 -> 0.003737: 1.0109x slower Avg: 0.005094 -> 0.005151: 1.0111x slower Not significant Stddev: 0.00888 -> 0.00911: 1.0260x larger ### sphinx ### Raw results: [78.9604752064] [74.3792219162] ### crypto_pyaes ### Min: 0.024770 -> 0.024693: 1.0031x faster Avg: 0.027501 -> 0.027404: 1.0035x faster Not significant Stddev: 0.01004 -> 0.01013: 1.0089x larger ### deltablue ### Min: 0.000669 -> 0.000681: 1.0178x slower Avg: 0.006649 -> 0.006617: 1.0049x faster Not significant Stddev: 0.00922 -> 0.00917: 1.0050x smaller ### django ### Min: 0.015834 -> 0.015616: 1.0140x faster Avg: 0.017296 -> 0.017001: 1.0173x faster Not significant Stddev: 0.00230 -> 0.00224: 1.0289x smaller ### eparse ### Min: 0.181033 -> 0.176031: 1.0284x faster Avg: 0.224494 -> 0.223065: 1.0064x faster Not significant Stddev: 0.04944 -> 0.05042: 1.0197x larger ### fannkuch ### Min: 0.092375 -> 0.093036: 1.0072x slower Avg: 0.093316 -> 0.093966: 1.0070x slower Not significant Stddev: 0.00436 -> 0.00446: 1.0234x larger ### float ### Min: 0.015118 -> 0.015959: 1.0556x slower Avg: 0.022721 -> 0.023674: 1.0419x slower Not significant Stddev: 0.00612 -> 0.00622: 1.0162x larger ### genshi_text ### Min: 0.007559 -> 0.007537: 1.0029x faster Avg: 0.009615 -> 0.009507: 1.0114x faster Not significant Stddev: 0.00890 -> 0.00887: 1.0035x smaller ### genshi_xml ### Min: 0.023032 -> 0.023132: 1.0043x slower Avg: 0.026363 -> 0.026608: 
1.0093x slower Not significant Stddev: 0.01486 -> 0.01503: 1.0115x larger ### go ### Min: 0.054099 -> 0.047412: 1.1410x faster Avg: 0.083018 -> 0.073164: 1.1347x faster Not significant Stddev: 0.02759 -> 0.02562: 1.0772x smaller ### hexiom2 ### Min: 5.791835 -> 5.827246: 1.0061x slower Avg: 6.022151 -> 6.058549: 1.0060x slower Not significant Stddev: 0.23265 -> 0.23365: 1.0043x larger ### html5lib ### Min: 1.223819 -> 1.200750: 1.0192x faster Avg: 1.638997 -> 1.682524: 1.0266x slower Not significant Stddev: 0.53759 -> 0.57438: 1.0684x larger ### json_bench ### Min: 0.404723 -> 0.404303: 1.0010x faster Avg: 0.409006 -> 0.408657: 1.0009x faster Not significant Stddev: 0.01094 -> 0.01136: 1.0388x larger ### meteor-contest ### Min: 0.059269 -> 0.058557: 1.0122x faster Avg: 0.060447 -> 0.059726: 1.0121x faster Not significant Stddev: 0.00318 -> 0.00324: 1.0200x larger ### nbody_modified ### Min: 0.018903 -> 0.018290: 1.0335x faster Avg: 0.019407 -> 0.018800: 1.0323x faster Not significant Stddev: 0.00200 -> 0.00203: 1.0171x larger ### pidigits ### Min: 5.362328 -> 5.104557: 1.0505x faster Avg: 5.384472 -> 5.125910: 1.0504x faster Significant (t=9.835787, a=0.95) Stddev: 0.04288 -> 0.04021: 1.0663x smaller ### pyflate-fast ### Min: 0.187195 -> 0.187777: 1.0031x slower Avg: 0.189842 -> 0.190292: 1.0024x slower Not significant Stddev: 0.00232 -> 0.00282: 1.2171x larger ### raytrace-simple ### Min: 0.018084 -> 0.018125: 1.0023x slower Avg: 0.019183 -> 0.018926: 1.0136x faster Not significant Stddev: 0.00203 -> 0.00114: 1.7810x smaller ### richards ### Min: 0.001875 -> 0.001858: 1.0091x faster Avg: 0.002055 -> 0.001983: 1.0366x faster Not significant Stddev: 0.00042 -> 0.00044: 1.0505x larger ### rietveld ### Min: 0.043779 -> 0.042596: 1.0278x faster Avg: 0.116975 -> 0.113526: 1.0304x faster Not significant Stddev: 0.09618 -> 0.09382: 1.0252x smaller ### scimark_fft ### 0.133822 -> -1.000000: -1 ### scimark_lu ### 0.302480 -> -1.000000: -1 ### scimark_montecarlo ### 0.141413 -> -1.000000: -1 ### scimark_sor ### 0.226131 -> -1.000000: -1 ### scimark_sparsematmult ### 0.147616 -> -1.000000: -1 ### slowspitfire ### Min: 0.151880 -> 0.151957: 1.0005x slower Avg: 0.162099 -> 0.162365: 1.0016x slower Not significant Stddev: 0.00540 -> 0.00552: 1.0231x larger ### spambayes ### Min: 0.027543 -> 0.027913: 1.0134x slower Avg: 0.046907 -> 0.047533: 1.0134x slower Not significant Stddev: 0.01568 -> 0.01523: 1.0298x smaller ### spectral-norm ### Min: 0.009342 -> 0.009278: 1.0069x faster Avg: 0.010035 -> 0.009935: 1.0100x faster Not significant Stddev: 0.00284 -> 0.00288: 1.0136x larger ### spitfire ### Min: 1.140000 -> 1.120000: 1.0179x faster Avg: 1.174400 -> 1.161200: 1.0114x faster Significant (t=2.175956, a=0.95) Stddev: 0.03011 -> 0.03055: 1.0145x larger ### spitfire_cstringio ### Min: 0.420000 -> 0.390000: 1.0769x faster Avg: 0.438600 -> 0.412600: 1.0630x faster Significant (t=5.469437, a=0.95) Stddev: 0.02339 -> 0.02414: 1.0323x larger ### sympy_expand ### Min: 0.212689 -> 0.214927: 1.0105x slower Avg: 0.324190 -> 0.322393: 1.0056x faster Not significant Stddev: 0.20323 -> 0.19896: 1.0215x smaller ### sympy_integrate ### Min: 0.827287 -> 0.740015: 1.1179x faster Avg: 1.276936 -> 1.232548: 1.0360x faster Not significant Stddev: 0.74238 -> 0.73077: 1.0159x smaller ### sympy_str ### Min: 0.158311 -> 0.157475: 1.0053x faster Avg: 0.303297 -> 0.307362: 1.0134x slower Not significant Stddev: 0.20213 -> 0.20610: 1.0197x larger ### sympy_sum ### Min: 0.199854 -> 0.231293: 1.1573x slower Avg: 0.290365 -> 0.307076: 
1.0575x slower Not significant Stddev: 0.12262 -> 0.11703: 1.0477x smaller ### telco ### Min: 0.008000 -> 0.008000: no change Avg: 0.013761 -> 0.013681: 1.0058x faster Not significant Stddev: 0.00890 -> 0.00897: 1.0079x larger ### trans2_annotate ### Raw results: [298.1] None ### trans2_rtype ### Raw results: [626.3] None ### trans2_backendopt ### Raw results: [82.3] None ### trans2_database ### Raw results: [113.8] None ### trans2_source ### Raw results: [118.6] None ### twisted_iteration ### Min: 0.002048 -> 0.002035: 1.0064x faster Avg: 0.002062 -> 0.002048: 1.0068x faster Significant (t=7.056602, a=0.95) Stddev: 0.00001 -> 0.00001: 1.0306x smaller ### twisted_names ### Min: 0.000738 -> 0.000702: 1.0515x faster Avg: 0.000748 -> 0.000710: 1.0528x faster Significant (t=32.935112, a=0.95) Stddev: 0.00001 -> 0.00001: 1.0987x smaller ### twisted_pb ### Min: 0.003419 -> 0.003405: 1.0041x faster Avg: 0.003723 -> 0.003762: 1.0105x slower Not significant Stddev: 0.00015 -> 0.00019: 1.2125x larger ### twisted_tcp ### Min: 0.083069 -> 0.084500: 1.0172x slower Avg: 0.084602 -> 0.085989: 1.0164x slower Significant (t=-4.166700, a=0.95) Stddev: 0.00164 -> 0.00169: 1.0357x larger

From astamatto at gmail.com Fri Nov 14 20:20:56 2014
From: astamatto at gmail.com (Alessandro Stamatto)
Date: Fri, 14 Nov 2014 17:20:56 -0200
Subject: [pypy-dev] Does Numpypy works on Py3k?
Message-ID:

Hi,

I'm very happy with the new Py3k branch of PyPy.

Does Numpypy work on it? (If not, will it work in the future?)

I tried installing it (on windows) with the recommended way (pip install git+https://bitbucket.org/pypy/numpy.git) but it failed with a lot of errors.

On the same vein, will Py3k have support for the other PyPy projects, like STM?

Kudos for the awesome work on PyPy, it's a fantastic project!

Thanks in advance,
Alessandro Stamatto.

From pjenvey at underboss.org Fri Nov 14 21:08:02 2014
From: pjenvey at underboss.org (Philip Jenvey)
Date: Fri, 14 Nov 2014 12:08:02 -0800
Subject: [pypy-dev] Does Numpypy works on Py3k?
In-Reply-To:
References:
Message-ID:

On Nov 14, 2014, at 11:20 AM, Alessandro Stamatto wrote:

> Hi,
>
> I'm very happy with the new Py3k branch of PyPy.
>
> Does Numpypy work on it? (If not, will it work in the future?)
>
> I tried installing it (on windows) with the recommended way (pip install git+https://bitbucket.org/pypy/numpy.git) but it failed with a lot of errors.

It's currently disabled, that is, the internal PyPy support module _numpypy (also called micronumpy). It will definitely work in the future.

What's required is re-enabling the tests (removing py3k_skip()s) and getting them passing. I'm guessing what's needed there is mostly straightforward work of adapting various things to the python 3 world, e.g. killing references to the long type and other outdated constructs, fixing unicode vs bytes things, py3 syntax, adapting tests, etc. Some of these changes have already been made, but the module has fallen by the wayside while we focused on core py3k compat. work.

Any help would be appreciated, especially if you're familiar with numpy and py3k; I suspect it won't take too much effort to get it going. PyPy core devs will be happy to assist, come by the #pypy IRC channel =]

> On the same vein, will Py3k have support for the other PyPy projects, like STM?
>
> Kudos for the awesome work on PyPy, it's a fantastic project!
Definitely: http://pypy.readthedocs.org/en/latest/stm.html#python-3

--
Philip Jenvey

From stuaxo2 at yahoo.com Sat Nov 15 00:14:47 2014
From: stuaxo2 at yahoo.com (Stuart Axon)
Date: Fri, 14 Nov 2014 15:14:47 -0800
Subject: [pypy-dev] trying to install aubio
Message-ID: <1416006887.43541.YahooMailNeo@web122102.mail.ne1.yahoo.com>

Hi,
I've been trying to install aubio .. + it looks like it gets stuck because of numpy, and possibly python c bindings .. any idea what's going on?

S++

From stuaxo2 at yahoo.com Sat Nov 15 00:15:43 2014
From: stuaxo2 at yahoo.com (Stuart Axon)
Date: Fri, 14 Nov 2014 15:15:43 -0800
Subject: [pypy-dev] trying to install aubio
In-Reply-To: <1416006887.43541.YahooMailNeo@web122102.mail.ne1.yahoo.com>
References: <1416006887.43541.YahooMailNeo@web122102.mail.ne1.yahoo.com>
Message-ID: <1416006943.49824.YahooMailNeo@web122103.mail.ne1.yahoo.com>

Sorry, didn't mean to send that :)

Here is a link to the output when I try to install it: piem/aubio

S++

On Friday, November 14, 2014 11:14 PM, Stuart Axon wrote:
> Hi,
> I've been trying to install aubio .. + it looks like it gets stuck because of numpy, and possibly python c bindings .. any idea what's going on?
>
> S++

From matti.picus at gmail.com Sat Nov 15 19:04:37 2014
From: matti.picus at gmail.com (Matti Picus)
Date: Sat, 15 Nov 2014 20:04:37 +0200
Subject: [pypy-dev] trying to install aubio
In-Reply-To: <1416006943.49824.YahooMailNeo@web122103.mail.ne1.yahoo.com>
References: <1416006887.43541.YahooMailNeo@web122102.mail.ne1.yahoo.com> <1416006943.49824.YahooMailNeo@web122103.mail.ne1.yahoo.com>
Message-ID: <546795B5.90804@gmail.com>

Hi. Thanks for trying out PyPy. It appears you did not install pypy's version of numpy into the pypy you are using. Directions for that are here: http://pypy.org/download.html#installing-numpy.

Note however that the package you are trying to install will not compile, and even if it did it would be quite slow, as stated in our FAQ here: http://pypy.readthedocs.org/en/latest/faq.html#do-cpython-extension-modules-work-with-pypy which perhaps should be rewritten as "just use cffi instead".

For what it is worth, once numpy is installed to pypy, building aubio gives me this error:

    fatal error: numpy/ufuncobject.h: No such file or directory

which may improve (but will never be as fast as cffi) in a future release of pypy, as we are working on a ufuncapi branch. Contributions to speed this work along are welcome.

Matti

It is not clear from your report

On 15/11/14 01:15, Stuart Axon wrote:
> Sorry, didn't mean to send that :)
>
> Here is a link to the output when I try to install it: piem/aubio
>
> S++
>
> On Friday, November 14, 2014 11:14 PM, Stuart Axon wrote:
>
> Hi,
> I've been trying to install aubio .. + it looks like it gets stuck because of numpy, and possibly python c bindings .. any idea what's going on?
>
> S++

From astamatto at gmail.com Sun Nov 16 02:21:14 2014
From: astamatto at gmail.com (Alessandro Stamatto)
Date: Sat, 15 Nov 2014 23:21:14 -0200
Subject: [pypy-dev] Does Numpypy works on Py3k?
In-Reply-To:
References:
Message-ID:

> It will definitely work in the future.

That's great to know!
> Any help would be appreciated, especially if you're familiar with numpy and py3k

My current knowledge is too low right now, but as soon as I improve it enough I'll try to help - PyPy is one of my favorite projects!

From matti.picus at gmail.com Mon Nov 17 19:17:24 2014
From: matti.picus at gmail.com (Matti Picus)
Date: Mon, 17 Nov 2014 20:17:24 +0200
Subject: [pypy-dev] ufuncapi progress
In-Reply-To: <20141117173120.GF7478@fice>
References: <1416006887.43541.YahooMailNeo@web122102.mail.ne1.yahoo.com> <1416006943.49824.YahooMailNeo@web122103.mail.ne1.yahoo.com> <546795B5.90804@gmail.com> <20141117173120.GF7478@fice>
Message-ID: <546A3BB4.5050606@gmail.com>

On 17/11/14 19:31, Wouter van Heijst wrote:
> On Sat, Nov 15, 2014 at 20:04:37 +0200, Matti Picus wrote:
>> which may improve (but will never be as fast as cffi) in a future
>> release of pypy as we are working on a ufuncapi branch.
>> Contributions to speed this work along are welcome.
> What can be done to help? What I know about ufuncs comes from reading
> numpy's internals.code-explanations.rst and ufuncs.rst yesterday.
>
> On irc you've mentioned that the next step is to set up iterators to
> deal with ufunc signatures, like (m,n)->(m,m),(m),(m,n).
>
> What are the correct references to make more sense of that?
>
> Wouter

Sorry for the non-telegraphic reply; I kind of needed to formulate my thoughts anyway.

The work-in-progress, for what it is worth, is in ufunc.py on the ufuncapi branch, in the over-long call() method of W_UfuncGeneric. I am using test_frompyfunc_2d_sig() in test_ufunc, which should pass once it all works.

My plan is to complete pypy's extensions of frompyfunc() [0] so that it looks more like the cpython numpy capi function PyUFunc_FromFuncAndDataAndSignature(). Then modules like linalg or aubio can be written in python; where they need to call external ufunc functions that iterate over numpy ndarrays, they can easily call them via cffi. It should also support some of numpy's ufunc capi via cpyext, which will always be slow and poorly maintained.

The basic machinery to handle signature parsing and iterating has been "implemented", but I'm not very happy with it. I simply translated _parse_signature() from C to Python, without really trying to pythonize the resulting structures. Then, when I came to use the parsed results, I discovered that the mechanism I wanted to use, W_NDIter, is not really useful in RPython's interpreter space (as opposed to Python application space), so I just left the whole mess to ferment. Now I think it has sat long enough that I can plow it into the ground and use it as compost (my way of saying: throw it out and redo it).

Help in the form of suggestions, criticism, or code is welcome, as is continual encouragement to just finish it.
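To make the signature machinery concrete, here is a minimal pure-Python sketch (illustrative only, not code from the branch) of what a parsed generalized-ufunc signature such as '(m,n)->(n)' asks the iterator to do: the core function consumes an (m, n) matrix and produces a length-n vector, and the machinery loops the core over any leading dimensions.

    # Illustrative sketch only; not code from the ufuncapi branch.
    def apply_gufunc_2d_to_1d(core, stack):
        # stack: k matrices, each of shape (m, n);
        # result: k vectors, each of length n.
        return [core(matrix) for matrix in stack]

    # Example core: sum each column of a matrix given as a list of rows.
    def column_sums(mat):
        return [sum(col) for col in zip(*mat)]

    # apply_gufunc_2d_to_1d(column_sums, stack) maps a (k, m, n) stack
    # to a (k, n) result.

A multi-output signature like (m,n)->(m,m),(m),(m,n) then just means the core produces three arrays of those core shapes on each iteration.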
Matti

[0] adding cpython-numpy incompatible keywords dtypes, signature, identity, and stack_inputs, as documented in the docstring of the frompyfunc function on the ufuncapi branch

From wouter.pypy at richtlijn.be Mon Nov 17 18:31:20 2014
From: wouter.pypy at richtlijn.be (Wouter van Heijst)
Date: Mon, 17 Nov 2014 19:31:20 +0200
Subject: [pypy-dev] ufuncapi progress
In-Reply-To: <546795B5.90804@gmail.com>
References: <1416006887.43541.YahooMailNeo@web122102.mail.ne1.yahoo.com> <1416006943.49824.YahooMailNeo@web122103.mail.ne1.yahoo.com> <546795B5.90804@gmail.com>
Message-ID: <20141117173120.GF7478@fice>

On Sat, Nov 15, 2014 at 20:04:37 +0200, Matti Picus wrote:
> which may improve (but will never be as fast as cffi) in a future
> release of pypy as we are working on a ufuncapi branch.
> Contributions to speed this work along are welcome.

What can be done to help? What I know about ufuncs comes from reading numpy's internals.code-explanations.rst and ufuncs.rst yesterday.

On irc you've mentioned that the next step is to set up iterators to deal with ufunc signatures, like (m,n)->(m,m),(m),(m,n).

What are the correct references to make more sense of that?

Wouter

From hrc706 at gmail.com Tue Nov 18 02:46:03 2014
From: hrc706 at gmail.com (Huang Ruochen)
Date: Tue, 18 Nov 2014 10:46:03 +0900
Subject: [pypy-dev] An idea about automatic parallelization in PyPy/RPython
Message-ID: <69532391-C8A8-431F-A855-D1050C9C6C36@gmail.com>

Hi everyone,

I'm a master's student in Japan and I want to do some research on PyPy/RPython. I have read some papers about PyPy and I have also had some ideas about it. I have communicated with Mr. Bolz and was advised to send my question here.

Actually, I wonder if it is possible to do automatic parallelization on the traces generated by the JIT: that is, check whether a hot loop is a parallel loop, and if so, try to run the trace in parallel on a multi-core CPU or a GPU to make it faster.

I think it may be suitable because:
1. The trace-based JIT targets loops, which map directly to parallel computation.
2. There is no control flow in a trace, which suits the fragment programs on a GPU.
3. We may use the @elidable hints in interpreter code, since elidable functions are insensitive to execution ordering and so can be executed in parallel.

What do you think about it?

Best Regards,
Huang Ruochen
From haael at interia.pl Wed Nov 19 17:31:53 2014
From: haael at interia.pl (haael at interia.pl)
Date: Wed, 19 Nov 2014 17:31:53 +0100
Subject: [pypy-dev] An idea about automatic parallelization in PyPy/RPython
In-Reply-To: <69532391-C8A8-431F-A855-D1050C9C6C36@gmail.com>
References: <69532391-C8A8-431F-A855-D1050C9C6C36@gmail.com>
Message-ID:

Hi

I find it brilliant. Thanks

haael

From: "Huang Ruochen"
To: pypy-dev at python.org
Sent: 2:46 Tuesday 2014-11-18
Subject: [pypy-dev] An idea about automatic parallelization in PyPy/RPython

> Hi everyone,
>
> I'm a master's student in Japan and I want to do some research on PyPy/RPython.
> I have read some papers about PyPy and I have also had some ideas about it. I have communicated with Mr. Bolz and was advised to send my question here.
>
> Actually, I wonder if it is possible to do automatic parallelization on the traces generated by the JIT: that is, check whether a hot loop is a parallel loop, and if so, try to run the trace in parallel on a multi-core CPU or a GPU to make it faster.
> I think it may be suitable because:
> 1. The trace-based JIT targets loops, which map directly to parallel computation.
> 2. There is no control flow in a trace, which suits the fragment programs on a GPU.
> 3. We may use the @elidable hints in interpreter code, since elidable functions are insensitive to execution ordering and so can be executed in parallel.
>
> What do you think about it?
>
> Best Regards,
> Huang Ruochen

From luciano at ramalho.org Wed Nov 19 21:06:35 2014
From: luciano at ramalho.org (Luciano Ramalho)
Date: Wed, 19 Nov 2014 18:06:35 -0200
Subject: [pypy-dev] Was dict subclass discrepancy "fixed" (issue 708)?
Message-ID:

Hello,

I am writing a book about Python 3 [0], and while researching the caveats of subclassing built-in types I discovered the page "Differences between PyPy and CPython" [1] and issue #708, "Discrepancy in dict subclass __getitem__ calls between CPython 2.7 and PyPy 1.5" [2].

[0] http://shop.oreilly.com/product/0636920032519.do
[1] http://pypy.readthedocs.org/en/latest/cpython_differences.html#subclasses-of-built-in-types
[2] https://bitbucket.org/pypy/pypy/issue/708/discrepancy-in-dict-subclass-__getitem__

However, when testing with pypy3-2.4.0 and pypy-2.4.0 my results were the same as with CPython, and not as documented in [1].

So was issue 708 "fixed", and does PyPy now misbehave in the same way as CPython?

Thanks!

Best,

Luciano

--
Luciano Ramalho
Twitter: @ramalhoorg
Professor at: http://python.pro.br
Twitter: @pythonprobr

From pjenvey at underboss.org Wed Nov 19 23:52:25 2014
From: pjenvey at underboss.org (Philip Jenvey)
Date: Wed, 19 Nov 2014 14:52:25 -0800
Subject: [pypy-dev] Was dict subclass discrepancy "fixed" (issue 708)?
In-Reply-To:
References:
Message-ID:

On Nov 19, 2014, at 12:06 PM, Luciano Ramalho wrote:

> Hello,
>
> I am writing a book about Python 3 [0], and while researching the
> caveats of subclassing built-in types I discovered the page
> "Differences between PyPy and CPython" [1] and issue #708, "Discrepancy
> in dict subclass __getitem__ calls between CPython 2.7 and PyPy 1.5"
> [2].
> [0] http://shop.oreilly.com/product/0636920032519.do
> [1] http://pypy.readthedocs.org/en/latest/cpython_differences.html#subclasses-of-built-in-types
> [2] https://bitbucket.org/pypy/pypy/issue/708/discrepancy-in-dict-subclass-__getitem__
>
> However, when testing with pypy3-2.4.0 and pypy-2.4.0 my results were
> the same as with CPython, and not as documented in [1].
>
> So was issue 708 "fixed", and does PyPy now misbehave in the same way as CPython?

I'm still getting the expected discrepancy between the two: an exception is raised on both pypy and pypy3, whereas __getitem__ isn't called on CPython 2.7.5. You might want to double check your python binaries?

--
Philip Jenvey

From arigo at tunes.org Thu Nov 20 10:54:53 2014
From: arigo at tunes.org (Armin Rigo)
Date: Thu, 20 Nov 2014 10:54:53 +0100
Subject: [pypy-dev] Was dict subclass discrepancy "fixed" (issue 708)?
In-Reply-To:
References:
Message-ID:

Hi,

On 19 November 2014 23:52, Philip Jenvey wrote:
> I'm still getting the expected discrepancy between the two: an exception is raised on both pypy and pypy3, whereas __getitem__ isn't called on CPython 2.7.5. You might want to double check your python binaries?

Indeed, issue 708 isn't "fixed". However, our documentation is out-of-date: the (different) example given at http://pypy.readthedocs.org/en/latest/cpython_differences.html#subclasses-of-built-in-types now works the same way as CPython. For reference, this example is:

    class D(dict):
        def __getitem__(self, key):
            return 42

    d1 = {}
    d2 = D(a='foo')
    d1.update(d2)
    print d1['a']

I'm going to find another simple example to update the docs with...

A bientôt,

Armin.

From fijall at gmail.com Thu Nov 20 16:05:13 2014
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 20 Nov 2014 17:05:13 +0200
Subject: [pypy-dev] An idea about automatic parallelization in PyPy/RPython
In-Reply-To: <69532391-C8A8-431F-A855-D1050C9C6C36@gmail.com>
References: <69532391-C8A8-431F-A855-D1050C9C6C36@gmail.com>
Message-ID:

Hi Huang Ruochen,

This is generally a hard problem, one that projects like GCC or LLVM haven't gotten very far with. The problem is slightly more tractable with PyPy's JIT, but not much more.

You can do it for simple loops, but the applications are limited outside of pure numerics (e.g. numpy); doing SSE vectorization on such loops first seems like both a good starting point and a small enough project for a master's thesis.

Cheers,
fijal

On Tue, Nov 18, 2014 at 3:46 AM, Huang Ruochen wrote:
> Hi everyone,
>
> I'm a master's student in Japan and I want to do some research on PyPy/RPython.
> I have read some papers about PyPy and I have also had some ideas about it. I have communicated with Mr. Bolz and was advised to send my question here.
>
> Actually, I wonder if it is possible to do automatic parallelization on the traces generated by the JIT: that is, check whether a hot loop is a parallel loop, and if so, try to run the trace in parallel on a multi-core CPU or a GPU to make it faster.
> I think it may be suitable because:
> 1. The trace-based JIT targets loops, which map directly to parallel computation.
> 2. There is no control flow in a trace, which suits the fragment programs on a GPU.
> 3. We may use the @elidable hints in interpreter code, since elidable functions are insensitive to execution ordering and so can be executed in parallel.
>
> What do you think about it?
Cheers,
fijal

On Tue, Nov 18, 2014 at 3:46 AM, Huang Ruochen wrote:
> Hi everyone,
>
> I'm a master's student in Japan and I want to do some research on PyPy/RPython.
> I have read some papers about PyPy and I have also had some ideas about it. I have communicated with Mr. Bolz and was advised to send my question here.
>
> Actually, I wonder if it is possible to add automatic parallelization for the traces generated by the JIT, that is, check whether a hot loop is a parallelizable loop and, if so, try to run the trace in parallel on a multi-core CPU or a GPU to make it faster.
> I think it may be suitable because:
> 1. The trace-based JIT targets loops, which map naturally onto parallel computation.
> 2. There is no control flow in a trace, which suits the fragment programs of a GPU.
> 3. We may use the @elidable hints in the interpreter code, since elidable functions are insensitive to execution ordering and so can be executed in parallel.
>
> What do you think about it?
>
> Best Regards,
> Huang Ruochen

From hrc706 at gmail.com Fri Nov 21 02:17:19 2014
From: hrc706 at gmail.com (Huang Ruochen)
Date: Fri, 21 Nov 2014 10:17:19 +0900
Subject: [pypy-dev] An idea about automatic parallelization in PyPy/RPython
In-Reply-To: 
References: <69532391-C8A8-431F-A855-D1050C9C6C36@gmail.com>
Message-ID: <3C4C3D55-5754-4F1B-B40C-2AF81941E8A4@gmail.com>

Hi Fijalkowski,

Thank you very much for your reply.

Yes, you are right, it's too hard for me to implement automatic parallelization for the whole of PyPy's tracing JIT. I think I can first do some work with a very simple interpreter (for example, the example interpreter introduced in the PyPy documentation) and try to change some behaviour of the RPython JIT.

By the way, could you tell me how I can get the traces and handle them before they are compiled to native code? I just want to try to convert some of the traces to OpenCL kernel code and run it on other devices such as GPUs.

Best Regards,
Huang Ruochen

> On Nov 21, 2014, at 12:05 AM, Maciej Fijalkowski wrote:
>
> Hi Huang Ruochen,
>
> This is generally a hard problem, one that projects like GCC or LLVM haven't gotten very far with. The problem is slightly more tractable with PyPy's JIT, but not much more so.
>
> That said, you can do it for simple loops, but the applications are limited outside of pure numerics (e.g. numpy), so doing SSE vectorization for such cases first seems like both a good starting point and a small enough project for a master's thesis.
>
> Cheers,
> fijal
>
> On Tue, Nov 18, 2014 at 3:46 AM, Huang Ruochen wrote:
>> Hi everyone,
>> [...]

From fijall at gmail.com Fri Nov 21 07:36:54 2014
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Fri, 21 Nov 2014 08:36:54 +0200
Subject: [pypy-dev] An idea about automatic parallelization in PyPy/RPython
In-Reply-To: <3C4C3D55-5754-4F1B-B40C-2AF81941E8A4@gmail.com>
References: <69532391-C8A8-431F-A855-D1050C9C6C36@gmail.com> <3C4C3D55-5754-4F1B-B40C-2AF81941E8A4@gmail.com>
Message-ID: 

You get traces by running PYPYLOG=jit-log-opt,jit-backend: pypy ....

There is a tool called jitviewer for viewing those traces. An OpenCL kernel is just written in C, and the kernel itself does not contain any Python.
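For example, to dump the optimized traces and the generated machine code to a log file while running a script (the file names here are placeholders):

    # trace.log and myscript.py are placeholder names
    PYPYLOG=jit-log-opt,jit-backend:trace.log pypy myscript.py

jitviewer can then be pointed at the resulting log file.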
On Fri, Nov 21, 2014 at 3:17 AM, Huang Ruochen wrote:
> Hi Fijalkowski,
>
> Thank you very much for your reply.
>
> Yes, you are right, it's too hard for me to implement automatic parallelization for the whole of PyPy's tracing JIT. I think I can first do some work with a very simple interpreter (for example, the example interpreter introduced in the PyPy documentation) and try to change some behaviour of the RPython JIT.
>
> By the way, could you tell me how I can get the traces and handle them before they are compiled to native code? I just want to try to convert some of the traces to OpenCL kernel code and run it on other devices such as GPUs.
>
> Best Regards,
> Huang Ruochen
>
>> On Nov 21, 2014, at 12:05 AM, Maciej Fijalkowski wrote:
>> [...]

From hrc706 at gmail.com Fri Nov 21 07:49:05 2014
From: hrc706 at gmail.com (Huang Ruochen)
Date: Fri, 21 Nov 2014 15:49:05 +0900
Subject: [pypy-dev] An idea about automatic parallelization in PyPy/RPython
In-Reply-To: 
References: <69532391-C8A8-431F-A855-D1050C9C6C36@gmail.com> <3C4C3D55-5754-4F1B-B40C-2AF81941E8A4@gmail.com>
Message-ID: 

Yes, I actually knew this way to get traces.

Well, what I mean is that I want to handle those traces at RUNTIME. I want to insert some code into RPython's JIT that detects traces which can be executed in parallel; if a trace can be, COMPILE it into OpenCL code (and then into native code, and run that), and if not, compile it to normal native code as RPython does now. So maybe it's important to first intercept the normal compilation and analyze the trace.
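To illustrate what I mean by such a conversion (the trace format and the kernel below are entirely made up, just to show the shape of the mapping), a trace computing "out[i] = a[i] * b[i]" might become an OpenCL kernel with one work-item per loop iteration:

    # a made-up, simplified trace for "out[i] = a[i] * b[i]"
    trace = [("f1", "getarrayitem", "a", "i"),
             ("f2", "getarrayitem", "b", "i"),
             ("f3", "float_mul", "f1", "f2"),
             ("-",  "setarrayitem", "out", "i", "f3")]

    # the OpenCL kernel one would hope to emit for it
    kernel = '''
    __kernel void loop_body(__global const float *a,
                            __global const float *b,
                            __global float *out) {
        int i = get_global_id(0);   /* one work-item per loop iteration */
        out[i] = a[i] * b[i];
    }
    '''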
I'm sorry for my bad English and bad expression.

> On Nov 21, 2014, at 3:36 PM, Maciej Fijalkowski wrote:
>
> You get traces by running PYPYLOG=jit-log-opt,jit-backend: pypy ....
>
> There is a tool called jitviewer for viewing those traces. An OpenCL kernel is just written in C, and the kernel itself does not contain any Python.
>
> On Fri, Nov 21, 2014 at 3:17 AM, Huang Ruochen wrote:
>> Hi Fijalkowski,
>> [...]

From haael at interia.pl Fri Nov 21 10:55:18 2014
From: haael at interia.pl (haael at interia.pl)
Date: Fri, 21 Nov 2014 10:55:18 +0100
Subject: [pypy-dev] An idea about automatic parallelization in PyPy/RPython
In-Reply-To: 
References: <69532391-C8A8-431F-A855-D1050C9C6C36@gmail.com> <3C4C3D55-5754-4F1B-B40C-2AF81941E8A4@gmail.com>
Message-ID: 

Hi

I would suggest a different approach, more similar to Armin's idea of parallelization.

You could just optimistically assume that the loop is parallelizable: just execute a few steps at once (each in its own memory sandbox) and check for conflicts later. This also plays nicely with STM.

So, the general solution would look like this (a rough sketch of steps 4-7 follows below):

1. Split the loop into individual step invocations.
2. Reserve a tiny memory block for each loop step.
3. Optional: compile the loop step into OpenCL.
4. Execute every loop step in parallel, saving the changes made by each invocation to its individual memory block.
5. Check if the changes are conflicting.
6. If not, merge them and commit to the global memory.
7. If they are, fall back to serial loop execution.

I think all the building blocks are here, in particular the recent GIL removal and Armin's STM research. It sounds exciting.
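Here is a minimal pure-Python sketch of steps 4-7 (sequential for clarity, and every name is hypothetical; a real implementation would distribute the step invocations across cores):

    def speculative_loop(step, n, memory):
        # step 4: run each iteration against a snapshot of memory,
        # collecting its writes in a private sandbox
        logs = []
        for i in range(n):
            writes = {}
            step(i, dict(memory), writes)
            logs.append(writes)
        # step 5: two iterations writing different values to the same
        # address constitute a conflict
        merged = {}
        for writes in logs:
            for addr, value in writes.items():
                if addr in merged and merged[addr] != value:
                    return False   # step 7: caller falls back to the serial loop
                merged[addr] = value
        # step 6: commit the merged changes to the global memory
        memory.update(merged)
        return True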
Do: "Maciej Fijalkowski" ; Wys?ane: 7:49 Pi?tek 2014-11-21 Temat: Re: [pypy-dev] An idea about automatic parallelization in PyPy/RPython > Yes, I actually knew this way to get traces. > > Well, what I mean is that, I want to handle those traces in RUNTIME. I want to insert some code in RPython?s JIT to detect some traces which can be executed parallel, if so, then COMPILE them into OpenCL code (then into native code, and run), if not, compile them to normal native code as what RPython do now. So maybe it?s important to firstly prevent the normal compilation and analyze the trace. > > I?m sorry for my bad English and bad expression. > > > ? 2014?11?21????3:36?Maciej Fijalkowski ??? > > > > You get traces by running PYPYLOG=jit-log-opt,jit-backend: pypy .... > > > > There is a tool call jitviewer for viewing those traces. OpenCL is > > likely just written in C and the kernel itself does not contain any > > Python. > > > > On Fri, Nov 21, 2014 at 3:17 AM, ??? wrote: > >> Hi Fijaklowski, > >> > >> Thank you very much for your reply. > >> > >> Yes, you are right, it?s too hard for me to implement automatic parallelization for the whole PyPy?s trace JIT. I think maybe I can firstly do some work with a very simple interpreter (for example the example-interpreter introduced by PyPy documentation), and try to change some behaviors of RPython JIT. > >> > >> By the way, could you tell me how can I get the traces and handle them before compiled to native code? I just want to try to convert some of the traces to OpenCL kernel codes and run them in other devices like GPU. > >> > >> Best Regards, > >> Huang Ruochen > >> > >>> ? 2014?11?21????12:05?Maciej Fijalkowski ??? > >>> > >>> Hi ??? > >>> > >>> This is generally a hard problem that projects like GCC or LLVM didn't > >>> get very far. The problem is slightly more advanced with PyPys JIT, > >>> but not much more. > >>> > >>> However, the problem is you can do it for simple loops, but the > >>> applications are limited outside of pure numerics (e.g. numpy) and > >>> also doing SSE stuff in such cases first seems like both a good > >>> starting point and a small enough project for master thesis. > >>> > >>> Cheers, > >>> fijal > >>> > >>> On Tue, Nov 18, 2014 at 3:46 AM, ??? wrote: > >>>> Hi everyone, > >>>> > >>>> I?m a master student in Japan and I want to do some research in PyPy/RPython. > >>>> I have read some papers about PyPy and I also had some ideas about it. I have communicated with Mr. Bloz and been advised to send my question here. > >>>> > >>>> Actually, I wonder if it is possible to make an automatic parallelization for the trace generated by JIT, that is, check if the hot loop is a parallel loop, if so, then try to run the trace parallel in multi-core CPU or GPU, make it faster. > >>>> I think it maybe suitable because: > >>>> 1. The traced-base JIT is targeting on loops, which is straight to parallel computation. > >>>> 2. There is no control-flow in trace, which is suitable to the fragment program in GPU. > >>>> 3. We may use the hint of @elidable in interpreter codes, since the elidable functions are nonsensitive in the execution ordering so can be executed parallel. > >>>> > >>>> What do you think about it? 
From arigo at tunes.org Fri Nov 21 11:21:25 2014
From: arigo at tunes.org (Armin Rigo)
Date: Fri, 21 Nov 2014 11:21:25 +0100
Subject: [pypy-dev] An idea about automatic parallelization in PyPy/RPython
In-Reply-To: 
References: 
Message-ID: 

Hi Haael, hi Huang Ruochen,

On 21 November 2014 10:55, wrote:
> I would suggest a different approach, more similar to Armin's idea of parallelization.
>
> You could just optimistically assume that the loop is parallelizable: just execute a few steps at once (each in its own memory sandbox) and check for conflicts later. This also plays nicely with STM.

I thought about that too, but the granularity is very wrong for STM: the overhead of running tiny transactions will completely dwarf any potential speed gains. If we're talking about tiny transactions then maybe HTM would be more suitable. I have no idea if HTM will ever start appearing on GPUs, though. Moreover, you still have the general hard problems of automatic parallelization, like communicating between threads the progress made; unless it is carefully done on a case-by-case basis by a human, this often adds (again) considerable overheads.

To Huang Ruochen: here's a quick answer to your question. It's not very clean, but I would patch rpython/jit/backend/x86/regalloc.py, prepare_loop(), just after it calls _prepare(). It gets a list of rewritten operations ready to be turned into assembler. I guess you'd need to check at this point whether the loop contains only operations you support, and if so, produce some different code (possibly GPU). Then either abort the job here by raising some exception, or, if it makes sense, change the 'operations' list so that it becomes just a few assembler instructions that will start and stop the GPU code.
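For instance, the screening step could look vaguely like this (a rough sketch only: the whitelist and the helper function are made up, and which operations you can actually support depends entirely on your backend):

    from rpython.jit.metainterp.resoperation import rop

    # hypothetical whitelist of trace operations a GPU backend could translate
    SUPPORTED = set([rop.INT_ADD, rop.INT_MUL, rop.FLOAT_ADD, rop.FLOAT_MUL,
                     rop.GETARRAYITEM_GC, rop.SETARRAYITEM_GC, rop.JUMP])

    def loop_is_offloadable(operations):
        # to be called from prepare_loop(), just after _prepare(): give up
        # unless every rewritten operation is one we know how to translate
        for op in operations:
            if op.getopnum() not in SUPPORTED:
                return False
        return True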
My own two cents about this project, however, is that it's relatively easy to support a few special cases, but it quickly becomes very, very hard to support more general code. You are likely to end up with a system that only compiles to GPU some very specific templates and nothing else. The end result for a user is obscure, because he won't get to use the GPU unless he writes loops that follow exactly some very strict rules. I certainly see why the end user might prefer to use a DSL instead: i.e. he knows he wants to use the GPU at specific places, and he is ready to use a separate very restricted "language" to express what he wants to do, as long as it is guaranteed to use the GPU. (The needs in this case are very different from those of the general PyPy JIT, which tries to accelerate any Python code.)

A bientôt,

Armin.

From asavey at aimdigitalpros.com Thu Nov 20 23:46:32 2014
From: asavey at aimdigitalpros.com (Adam Savey)
Date: Thu, 20 Nov 2014 17:46:32 -0500
Subject: [pypy-dev] Python Users list
Message-ID: <03bd01d00513$d5550220$7fff0660$@com>

Hi,

I was reviewing your website and thought you would be interested in reaching out to Python Users from the USA. A few other technologies include: Java Users, J2EE Users, Linux Users, Dot net Users, Oracle RDBMS Users, Spring Users, Puppet Users.

* 100% permission based contacts.
* We guarantee 95% accuracy on data fields and 85% on email deliverability.
* Once you purchase the list you can use it multiple times, with no restrictions.
* The list can be used for Email Marketing, Direct Mail Marketing, Fax Marketing and Tele Marketing.

We also have: Oracle Users, IBM Users, AWS Users, Salesforce Users, Google Apps Users, Google Analytics Users, Zoho Users, Sage Users, Sugar CRM Users, Infor Users, Epicor Users, Kronos Users, Netsuite Users, VMWare Users, IBM Cognos Users, SAP Users, SaaS Users, Middleware Users & many more....

Hit reply & send in your target criteria, and I'll get back to you with counts, cost and other details for your review. If you have a different target, feel free to get back to me with it.

Regards,

Adam Savey
Demand Generation Executive

We respect your privacy; if you do not wish to receive any further emails from our end, please reply with the subject "Cancel".

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From haael at interia.pl Wed Nov 26 12:12:49 2014
From: haael at interia.pl (haael at interia.pl)
Date: Wed, 26 Nov 2014 12:12:49 +0100
Subject: [pypy-dev] An idea about automatic parallelization in PyPy/RPython
In-Reply-To: 
References: <69532391-C8A8-431F-A855-D1050C9C6C36@gmail.com> <3C4C3D55-5754-4F1B-B40C-2AF81941E8A4@gmail.com>
Message-ID: 

Hi Huang Ruochen, Armin,

>> I thought about that too, but the granularity is very wrong for STM: the overhead of running tiny transactions will completely dwarf any potential speed gains.

Then maybe we should go with a slightly different version of STM, specific to assembler loops in particular.

A loop in assembly language operates on a linear memory model and usually modifies only a small number of memory cells. In fact, most changes made by a loop iteration get overwritten in the next iteration. Assuming that a single assembler instruction may modify only one memory cell, the number of cells changed will be no more than the count of loop iterations.

We could replace (para-virtualize) any instruction that changes a memory cell with two stack pushes: save the memory address to the stack, and then the actual value that is written. Any memory read would likewise be replaced with a search of that stack before falling back to reading the actual memory contents (see the sketch at the end of this mail). This is a big penalty, of course, but it is worth checking whether it pays off. This is a simple, assembler-specific flavor of STM.

Then we could employ loop scheduling and run the modified code on all cores. Afterwards we could check whether all the memory modifications agree, that is, whether no two cores tried to write different values to the same memory address. If they agree, we could commit the transaction and exit the loop.

The kind of loops that would benefit most from such an optimization are memset, memcpy and all map-like constructs:

    dst = map(fun, src)   --->   for (int i = 0; i < len(src); i++) dst[i] = fun(src[i]);

> can we just make a hybrid system that first screens out loops that are not suitable for parallelization and then runs the others with STM?

Following the general "try and fail" philosophy of Python, I would suggest the following: just run the unmodified loop on one core and use the other cores to optimize/execute the modified version. If the optimization turns out to be unsuitable or the serial execution ends first, just abort the optimized run. If the loop turns out to be parallelizable, return its results instead.
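A minimal Python sketch of the write log just described (all names hypothetical; one log per simulated core, with reads searching the log before falling back to memory):

    class WriteLog(object):
        def __init__(self, memory):
            self.memory = memory   # shared memory, only read during the run
            self.log = []          # (address, value) pairs, newest last

        def store(self, addr, value):
            # para-virtualized write: push address and value instead of storing
            self.log.append((addr, value))

        def load(self, addr):
            # para-virtualized read: newest log entry wins, then real memory
            for a, v in reversed(self.log):
                if a == addr:
                    return v
            return self.memory[addr]

    def commit(memory, logs):
        # logs: one WriteLog.log list of (address, value) pairs per core;
        # two cores writing different values to the same address is a
        # conflict that forces the serial fallback
        merged = {}
        for log in logs:
            final = dict(log)      # within one core, the last write wins
            for addr, value in final.items():
                if addr in merged and merged[addr] != value:
                    return False
                merged[addr] = value
        memory.update(merged)
        return True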
Do: "Armin Rigo" ; haael at interia.pl; Wys?ane: 9:07 ?roda 2014-11-26 Temat: Re: [pypy-dev] An idea about automatic parallelization in PyPy/RPython > Hi Haael, Rigo, > > 2014/11/21 19:21?Armin Rigo ????? > > > > Hi Haael, hi ???, > > > > On 21 November 2014 10:55, wrote: > >> I would suggest a different approach, more similar to Armin's idea of parallelization. > >> > >> You could just optimistically assume that the loop is parallelizable. Just execute few steps at once (each in its own memory sandbox) and check for conflicts later. This also plays nice with STM. > > > > I thought about that too, but the granularity is very wrong for STM: > > the overhead of running tiny transactions will completely dwarf any > > potential speed gains. If we're talking about tiny transactions then > > maybe HTM would be more suitable. I have no idea if HTM will ever > > start appearing on GPU, though. Moreover, you still have the general > > hard problems of automatic parallelization, like communicating between > > threads the progress made; unless it is carefully done on a > > case-by-case basis by a human, this often adds (again) considerable > > overheads. > > Well, recently I have read some papers about TLS, and also realized the heavy > performance penalty of STM. What am I considering is that, is it possible to > simplify a STM for the trace generated by RPython using some features of it (for > example there is no control flow but only guard; there are some jit.elidable functions > in the interpreter), or, can we just make a hybrid-system that firstly slightly screen > some loops that is not suitable for parallelization and then run others with STM? > > > > > To ???: here's a quick answer to your question. It's not very clean, > > but I would patch rpython/jit/backend/x86/regalloc.py, prepare_loop(), > > just after it calls _prepare(). It gets a list of rewritten > > operations ready to be turned into assembler. I guess you'd need to > > check at this point if the loop contains only operations you support, > > and if so, produce some different code (possibly GPU). Then either > > abort the job here by raising some exception, or if it makes sense, > > change the 'operations' list so that it becomes just a few assembler > > instructions that will start and stop the GPU code. > > > > My own two cents about this project, however, is that it's relatively > > easy to support a few special cases, but it quickly becomes very, very > > hard to support more general code. You are likely to end up with a > > system that only compiles to GPU some very specific templates and > > nothing else. The end result for a user is obscure, because he won't > > get to use the GPU unless he writes loops that follow exactly some > > very strict rules. I certainly see why the end user might prefer to > > use a DSL instead: i.e. he knows he wants to use the GPU at specific > > places, and he is ready to use a separate very restricted "language" > > to express what he wants to do, as long as it is guaranteed to use the > > GPU. (The needs in this case are very different from the general PyPy > > JIT, which tries to accelerate any Python code.) > > > > > > A bient?t, > > > > Armin. > > From luciano at ramalho.org Thu Nov 27 12:53:00 2014 From: luciano at ramalho.org (Luciano Ramalho) Date: Thu, 27 Nov 2014 09:53:00 -0200 Subject: [pypy-dev] Was dict subclass discrepancy "fixed" (issue 708)? In-Reply-To: References: Message-ID: Thanks Armin and Philip for responding and thanks Arming fixing the example in the docs. 
Best,

Luciano

On Thu, Nov 20, 2014 at 7:54 AM, Armin Rigo wrote:
> Hi,
>
> On 19 November 2014 23:52, Philip Jenvey wrote:
>> I'm still getting the expected discrepancy between the two: an exception is raised on both pypy and pypy3, whereas __getitem__ isn't called on CPython 2.7.5. You might want to double check your python binaries?
>
> Indeed, issue 708 isn't "fixed". However, our documentation is out-of-date: the (different) example given at
> http://pypy.readthedocs.org/en/latest/cpython_differences.html#subclasses-of-built-in-types
> now works the same way as CPython. For reference, this example is:
>
>     class D(dict):
>         def __getitem__(self, key):
>             return 42
>
>     d1 = {}
>     d2 = D(a='foo')
>     d1.update(d2)
>     print d1['a']
>
> I'm going to find another simple example to update the docs with...
>
> A bientôt,
>
> Armin.

-- 
Luciano Ramalho
Twitter: @ramalhoorg
Professor at: http://python.pro.br
Twitter: @pythonprobr

From mail at tsmithe.net Fri Nov 28 20:13:35 2014
From: mail at tsmithe.net (Toby St Clere Smithe)
Date: Fri, 28 Nov 2014 20:13:35 +0100
Subject: [pypy-dev] GSoC 2015: cpyext project?
Message-ID: <87mw7bnnog.fsf@tsmithe.net>

Hi all,

I've posted a couple of times on here before: I maintain a Python extension for GPGPU linear algebra [1], but it uses boost.python. I do most of my scientific computing in Python, but I am often forced to use CPython where I would prefer to use PyPy, largely because of the availability of extensions.

I'm looking for an interesting Google Summer of Code project for next year, and would like to continue working on things that help make high-performance computing in Python straightforward. In particular, I've had my eye on the 'optimising cpyext' [2] project for a while: might work in that area be available? I notice that it is described with difficulty 'hard', so I'm keen to enquire early so that I can get up to speed before making a potential application in the spring.

I would love to work on getting cpyext into a good enough shape that both Cython and Boost.Python extensions are functional with minimal effort on the part of the user. Does anyone have any advice? Are there particular things I should familiarise myself with? I know there is the module/cpyext tree, but it is quite formidable for the uninitiated!

Of course, I recognise that cpyext is a much trickier proposition in comparison with things like cffi and cppyy. In particular, I'm very excited by cppyy and PyCling, but they seem quite bound up in CERN's ROOT infrastructure, which is a shame. But it's also clear that very many useful extensions currently use the CPython API, and so -- as I have often found -- the apparent relative immaturity of cpyext keeps people away from PyPy, which is also a shame!

[1] https://pypi.python.org/pypi/pyviennacl
[2] https://bitbucket.org/pypy/pypy/wiki/GSOC%202014

Best,

Toby

-- 
Toby St Clere Smithe
http://tsmithe.net