From zariko.taba at gmail.com  Fri Jul  1 14:41:56 2011
From: zariko.taba at gmail.com (Zariko Taba)
Date: Fri, 1 Jul 2011 14:41:56 +0200
Subject: [pypy-dev] Floating point computation
Message-ID:

Hi pypy list,

I'd like to prototype an interpreter in RPython, and this interpreter must
feature floating point support -- something like the context in the decimal
module of Python (decimal is basically a floating point unit implemented in
Python):

    import decimal
    # do an operation
    decimal.Decimal("6e4565465") * decimal.Decimal("6e67654")
    # verify overflow
    decimal.getcontext().flags[decimal.Overflow]   # gives 1 => overflow!

My interpreter must provide access to the "flags" field (Overflow, Inexact,
DivisionByZero...).

Currently I have a prototype in C, where I directly use the x86 (SSE2)
intrinsics to retrieve these flags:

    // compute a sqrt using intrinsics
    __builtin_ia32_sqrtss(...);
    // get the status register of the SSE2 FPU
    int mxcsr = __builtin_ia32_stmxcsr();
    // extract the overflow bit:
    overflow = (mxcsr & 0x8);
    // extract ...

If I want to implement this in RPython, I see 2 solutions:

1. Porting decimal to RPython: seems to be a lot of work, and I wonder
   about the performance.
2. Wrapping the floating point code in a C library and using ctypes: code
   clarity and JIT integration will be bad.

Do you see another possibility? Any advice on this?

Thanks !
Zariko.

From arigo at tunes.org  Fri Jul  1 14:54:25 2011
From: arigo at tunes.org (Armin Rigo)
Date: Fri, 1 Jul 2011 14:54:25 +0200
Subject: [pypy-dev] Floating point computation
In-Reply-To:
References:
Message-ID:

Hi,

On Fri, Jul 1, 2011 at 2:41 PM, Zariko Taba wrote:
> // compute a sqrt using intrinsics
> __builtin_ia32_sqrtss(...);
> // get the status register of the SSE2 FPU
> int mxcsr = __builtin_ia32_stmxcsr();
> // extract the overflow bit:
> overflow = (mxcsr & 0x8);
> // extract ...

There is a third solution: write exactly the same kind of code, but in
RPython instead of in C.  This means using the rffi module.  For example:

    builtin_ia32_sqrtss = rffi.llexternal("__builtin_ia32_sqrtss",
        [argtypes...], restype, _nowrapper=True,
        _callable=emulate_ia32_sqrtss)

    def f(..):
        ...
        builtin_ia32_sqrtss(..)
        ...

with emulate_ia32_sqrtss() being a pure Python function that is only going
to be called during tests.  After translation to C, the call will just turn
into the C code that you posted above.  For more examples, grep for
"llexternal" in pypy/rlib/.

A bientôt,

Armin.

From fijall at gmail.com  Fri Jul  1 14:57:38 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Fri, 1 Jul 2011 14:57:38 +0200
Subject: [pypy-dev] Floating point computation
In-Reply-To:
References:
Message-ID:

On Fri, Jul 1, 2011 at 2:54 PM, Armin Rigo wrote:
> There is a third solution: write exactly the same kind of code, but in
> RPython instead of in C.  This means using the rffi module.
> [...]
> with emulate_ia32_sqrtss() being a pure Python function that is only
> going to be called during tests.
> After translation to C, the call will just turn into the C code that you
> posted above.  For more examples, grep for "llexternal" in pypy/rlib/.

But it won't become an assembler instruction in the JIT, it'll still be
a call.

> A bientôt,
> Armin.

From arigo at tunes.org  Fri Jul  1 15:24:44 2011
From: arigo at tunes.org (Armin Rigo)
Date: Fri, 1 Jul 2011 15:24:44 +0200
Subject: [pypy-dev] Floating point computation
In-Reply-To:
References:
Message-ID:

Hi,

On Fri, Jul 1, 2011 at 2:57 PM, Maciej Fijalkowski wrote:
> But it won't become an assembler instruction in the JIT, it'll still
> be a call.

Indeed.  But fixing this is not too hard.  Grep for "math_sqrt" and
"MATH_SQRT" in pypy/jit/*/*.py for an example of how we turn
sqrt_nonneg(x) into a single assembler instruction.  The function
sqrt_nonneg() is defined in pypy/rpython/lltypesystem/module/ll_math.py,
and is (indirectly) used when we call math.sqrt() in RPython.

Of course, if the idea introduced by math_sqrt is going to be used for a
lot of other functions, like the ones you'll need to introduce, then we
can also think of a general solution that would do mostly everything for
you with just a keyword argument to llexternal().

A bientôt,

Armin.

From zariko.taba at gmail.com  Fri Jul  1 17:08:32 2011
From: zariko.taba at gmail.com (Zariko Taba)
Date: Fri, 1 Jul 2011 17:08:32 +0200
Subject: [pypy-dev] Floating point computation
In-Reply-To:
References:
Message-ID:

On Fri, Jul 1, 2011 at 2:54 PM, Armin Rigo wrote:
> There is a third solution: write exactly the same kind of code, but in
> RPython instead of in C.  This means using the rffi module.  For
> example:
>
>     builtin_ia32_sqrtss = rffi.llexternal("__builtin_ia32_sqrtss",
>         [argtypes...], restype, _nowrapper=True,
>         _callable=emulate_ia32_sqrtss)

Awesome ! I will give it a try. :)

Thanks !
Zariko.
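To make the rffi approach above concrete for the status-register part of
Zariko's problem, here is a minimal sketch.  Only rffi.llexternal itself is
a real API; the names read_mxcsr/emulate_stmxcsr and the emulation are
made-up assumptions, not actual PyPy code:

    from pypy.rpython.lltypesystem import rffi, lltype

    def emulate_stmxcsr():
        # pure-Python stand-in, only used when running untranslated tests:
        # pretend that no FPU status flags are set
        return rffi.cast(rffi.UINT, 0)

    # after translation, this turns into a direct call to the intrinsic
    read_mxcsr = rffi.llexternal("__builtin_ia32_stmxcsr", [], rffi.UINT,
                                 _nowrapper=True, _callable=emulate_stmxcsr)

    MXCSR_OVERFLOW = 0x8   # the overflow bit, as in the C prototype above

    def overflow_flag():
        mxcsr = rffi.cast(lltype.Signed, read_mxcsr())
        return (mxcsr & MXCSR_OVERFLOW) != 0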
From wlavrijsen at lbl.gov  Fri Jul  1 18:40:57 2011
From: wlavrijsen at lbl.gov (wlavrijsen at lbl.gov)
Date: Fri, 1 Jul 2011 09:40:57 -0700 (PDT)
Subject: [pypy-dev] stuck with translation error
Message-ID:

Hi,

ran into this error, which I haven't seen before and which I can't pin down:

[translation:ERROR]    File "/home/wlav/pypydev/pypy/pypy/rpython/rtyper.py", line 254, in specialize_more_blocks
[translation:ERROR]     self.specialize_block(block)
[translation:ERROR]    File "/home/wlav/pypydev/pypy/pypy/rpython/rtyper.py", line 406, in specialize_block
[translation:ERROR]     self.translate_hl_to_ll(hop, varmapping)
[translation:ERROR]    File "/home/wlav/pypydev/pypy/pypy/rpython/rtyper.py", line 535, in translate_hl_to_ll
[translation:ERROR]     resultvar = hop.dispatch()
[translation:ERROR]    File "/home/wlav/pypydev/pypy/pypy/rpython/rtyper.py", line 768, in dispatch
[translation:ERROR]     return translate_meth(self)
[translation:ERROR]    File "<487-codegen /home/wlav/pypydev/pypy/pypy/rpython/rtyper.py:610>", line 4, in translate_op_simple_call
[translation:ERROR]     return r_arg1.rtype_simple_call(hop)
[translation:ERROR]    File "/home/wlav/pypydev/pypy/pypy/rpython/rpbc.py", line 723, in rtype_simple_call
[translation:ERROR]     return self.redispatch_call(hop, call_args=False)
[translation:ERROR]    File "/home/wlav/pypydev/pypy/pypy/rpython/rpbc.py", line 772, in redispatch_call
[translation:ERROR]     assert hop.nb_args == 1, ("arguments passed to __init__, "
[translation:ERROR]  AssertionError: arguments passed to __init__, but no __init__!

I don't find anything that could be related to local code, until I go up
to specialize_block.  That block has:

(Pdb+) block.operations
[v1 = call_args((function free), ((1, ('flavor', 'track_... False)), v0, ('raw'), (False)),
 v2 = simple_call((type error), ('out of resources'))]

which I presume is lltype.free(), of which I have a few calls, but no
recent changes (and I don't understand the complaint about __init__ at
that point).

I'm going to roll back history to see where things broke, but if the above
rings a bell with someone, that could spare me some time ... Thanks!

Cheers,
Wim
-- 
WLavrijsen at lbl.gov    --    +1 (510) 486 6411    --    www.lavrijsen.net

From amauryfa at gmail.com  Fri Jul  1 19:43:30 2011
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Fri, 1 Jul 2011 19:43:30 +0200
Subject: [pypy-dev] stuck with translation error
In-Reply-To:
References:
Message-ID:

Hi,

2011/7/1 :
> [translation:ERROR]    File "/home/wlav/pypydev/pypy/pypy/rpython/rtyper.py", line 254, in specialize_more_blocks
> [translation:ERROR]     self.specialize_block(block)
> [...]
> [translation:ERROR]    File "/home/wlav/pypydev/pypy/pypy/rpython/rpbc.py", line 723, in rtype_simple_call
> [translation:ERROR]     return self.redispatch_call(hop, call_args=False)
?File "/home/wlav/pypydev/pypy/pypy/rpython/rpbc.py", line 772, in redispatch_call > [translation:ERROR] ? ? assert hop.nb_args == 1, ("arguments passed to __init__, " > [translation:ERROR] ?AssertionError: arguments passed to __init__, but no __init__! > > I don't find anything that could be related to local code, until I go up > to specialize_block. That block has: > > (Pdb+) block.operations > [v1 = call_args((function free), ((1, ('flavor', 'track_... False)), v0, ('raw'), (False)), v2 = simple_call((type error), ('out of resources'))] > > which I presume is lltype.free() of which I have a few calls, but no recent > changes (and I don't understand the complaint about __init__ at that point). > > I'm going to roll back history to see where things broke, but if the above > rings a bell with someone, that could spare me some time ... Thanks! Hm, it can also be the next operation: v2 = simple_call((type error), ('out of resources')) What is this "error" object? Some local exception class? -- Amaury Forgeot d'Arc From wlavrijsen at lbl.gov Fri Jul 1 20:04:19 2011 From: wlavrijsen at lbl.gov (wlavrijsen at lbl.gov) Date: Fri, 1 Jul 2011 11:04:19 -0700 (PDT) Subject: [pypy-dev] stuck with translation error In-Reply-To: References: Message-ID: Hi Amaury, >> [v1 = call_args((function free), ((1, ('flavor', 'track_... False)), v0, ('raw'), (False)), v2 = simple_call((type error), ('out of resources'))] > > Hm, it can also be the next operation: v2 = simple_call((type error), > ('out of resources')) I was under the impression that that was a standard error, but then again that would have been for malloc(), not free(). I searched again and came up with is code in ll_thread.py: def allocate_ll_lock(): # track_allocation=False here; be careful to lltype.free() it. The # reason it is set to False is that we get it from all app-level # lock objects, as well as from the GIL, which exists at shutdown. ll_lock = lltype.malloc(TLOCKP.TO, flavor='raw', track_allocation=False) res = c_thread_lock_init(ll_lock) if rffi.cast(lltype.Signed, res) <= 0: lltype.free(ll_lock, flavor='raw', track_allocation=False) raise error("out of resources") return ll_lock which looks very much like it (the link to the exit is a raise). So I think that it isn't my code that doesn't translate. :) Beats me then, why it would bite me. > What is this "error" object? Some local exception class? I can look again in a half an hour: I had just restarted the translation after a removing some of my recent changes. But I'll bet it's what I see here above in allocate_ll_lock. The module ll_thread.py defines: class error(Exception): pass Thanks, Wim -- WLavrijsen at lbl.gov -- +1 (510) 486 6411 -- www.lavrijsen.net From anto.cuni at gmail.com Sat Jul 2 11:59:40 2011 From: anto.cuni at gmail.com (Antonio Cuni) Date: Sat, 02 Jul 2011 11:59:40 +0200 Subject: [pypy-dev] bounties for pypy In-Reply-To: References: <201106281751.p5SHpixC014361@theraft.openend.se> <201106290753.p5T7rMKh002901@theraft.openend.se> Message-ID: <4E0EEC0C.5010809@gmail.com> Hello Ian, On 29/06/11 15:16, Ian Ozsvald wrote: > Ian. > ps. I posted the v0.1 PDF of my High Performance Python tutorial this > morning (it is based on my EuroPython training session). It has a > section on PyPy and I'd happily accept input if that section should be > expanded:http://ianozsvald.com/2011/06/29/high-performance-python-tutorial-v0-1-from-my-4-hour-tutorial-at-europython-2011/ I read your tutorial and tried some of the code, nice work. 
I tried to use the "bettermath" approach also on the pure python code, and run it on PyPy: on my machine, it takes 2.4 seconds instead of 6.2. For comparison, cython with bettermath takes 0.57 seconds, i.e. it's about 4 times faster than PyPy (with a trunk version of PyPy, did not try with pypy 1.5). For this kind of code, there is no fundamental reason why PyPy should be slower than cython, so I'll investigate a bit to see what is the problem. But still, I think that you should mention this "bettermath" implementation which is much faster on PyPy. ciao, Anto From ian at ianozsvald.com Sun Jul 3 12:03:14 2011 From: ian at ianozsvald.com (Ian Ozsvald) Date: Sun, 3 Jul 2011 11:03:14 +0100 Subject: [pypy-dev] bounties for pypy In-Reply-To: <4E0EEC0C.5010809@gmail.com> References: <201106281751.p5SHpixC014361@theraft.openend.se> <201106290753.p5T7rMKh002901@theraft.openend.se> <4E0EEC0C.5010809@gmail.com> Message-ID: Cool :-) Thanks Antonio. I'd left a note to myself in the report because I figured expanding the math to the primitive operations might help PyPy (seeing as it helped Cython and ShedSkin too). I'll redo the timings and add them to the v0.2 report (probably in two weeks time - I'm waiting for feedback from several people). I've already tried a trunk version of PyPy and it was faster than PyPy, I'll take whichever build is the most recent and replace PyPy1.5 for the next updates to the report. If there is a trick to e.g. re-ordering the operations so it runs faster in PyPy, I'd be happy to accept a modification. Hopefully there will also be a pyOpenCL version to add to the pyCUDA examples by then. Much obliged, Ian. On 2 July 2011 10:59, Antonio Cuni wrote: > Hello Ian, > > On 29/06/11 15:16, Ian Ozsvald wrote: >> >> Ian. >> ps. I posted the v0.1 PDF of my High Performance Python tutorial this >> morning (it is based on my EuroPython training session). It has a >> section on PyPy and I'd happily accept input if that section should be >> >> expanded:http://ianozsvald.com/2011/06/29/high-performance-python-tutorial-v0-1-from-my-4-hour-tutorial-at-europython-2011/ > > I read your tutorial and tried some of the code, nice work. > > I tried to use the "bettermath" approach also on the pure python code, and > run it on PyPy: on my machine, it takes 2.4 seconds instead of 6.2. > For comparison, cython with bettermath takes 0.57 seconds, i.e. it's about 4 > times faster than PyPy (with a trunk version of PyPy, did not try with pypy > 1.5). > > For this kind of code, there is no fundamental reason why PyPy should be > slower than cython, so I'll investigate a bit to see what is the problem. > ?But still, I think that you should mention this "bettermath" implementation > which is much faster on PyPy. > > ciao, > Anto > -- Ian Ozsvald (A.I. researcher, screencaster) ian at IanOzsvald.com http://IanOzsvald.com http://SocialTiesApp.com/ http://MorConsulting.com/ http://blog.AICookbook.com/ http://TheScreencastingHandbook.com http://FivePoundApp.com/ http://twitter.com/IanOzsvald From fijall at gmail.com Sun Jul 3 19:34:53 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sun, 3 Jul 2011 19:34:53 +0200 Subject: [pypy-dev] bounties for pypy In-Reply-To: References: <201106281751.p5SHpixC014361@theraft.openend.se> <201106290753.p5T7rMKh002901@theraft.openend.se> <4E0EEC0C.5010809@gmail.com> Message-ID: On Sun, Jul 3, 2011 at 12:03 PM, Ian Ozsvald wrote: > Cool :-) Thanks Antonio. 
From ian at ianozsvald.com  Sun Jul  3 12:03:14 2011
From: ian at ianozsvald.com (Ian Ozsvald)
Date: Sun, 3 Jul 2011 11:03:14 +0100
Subject: [pypy-dev] bounties for pypy
In-Reply-To: <4E0EEC0C.5010809@gmail.com>
References: <201106281751.p5SHpixC014361@theraft.openend.se>
        <201106290753.p5T7rMKh002901@theraft.openend.se>
        <4E0EEC0C.5010809@gmail.com>
Message-ID:

Cool :-) Thanks Antonio. I'd left a note to myself in the report because I
figured expanding the math to the primitive operations might help PyPy
(seeing as it helped Cython and ShedSkin too). I'll redo the timings and
add them to the v0.2 report (probably in two weeks' time - I'm waiting for
feedback from several people).

I've already tried a trunk version of PyPy and it was faster than PyPy 1.5;
I'll take whichever build is the most recent and replace PyPy 1.5 for the
next updates to the report. If there is a trick to e.g. re-ordering the
operations so it runs faster in PyPy, I'd be happy to accept a
modification. Hopefully there will also be a pyOpenCL version to add to
the pyCUDA examples by then.

Much obliged,
Ian.

On 2 July 2011 10:59, Antonio Cuni wrote:
> I read your tutorial and tried some of the code, nice work.
>
> I tried to use the "bettermath" approach also on the pure Python code,
> and ran it on PyPy: on my machine, it takes 2.4 seconds instead of 6.2.
> [...]

-- 
Ian Ozsvald (A.I. researcher, screencaster)
ian at IanOzsvald.com

http://IanOzsvald.com http://SocialTiesApp.com/ http://MorConsulting.com/
http://blog.AICookbook.com/ http://TheScreencastingHandbook.com
http://FivePoundApp.com/ http://twitter.com/IanOzsvald

From fijall at gmail.com  Sun Jul  3 19:34:53 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sun, 3 Jul 2011 19:34:53 +0200
Subject: [pypy-dev] bounties for pypy
In-Reply-To:
References: <201106281751.p5SHpixC014361@theraft.openend.se>
        <201106290753.p5T7rMKh002901@theraft.openend.se>
        <4E0EEC0C.5010809@gmail.com>
Message-ID:

On Sun, Jul 3, 2011 at 12:03 PM, Ian Ozsvald wrote:
> Cool :-) Thanks Antonio. I'd left a note to myself in the report
> because I figured expanding the math to the primitive operations might
> help PyPy (seeing as it helped Cython and ShedSkin too).
> [...]

You can also use numpy arrays (or array from the array module) on PyPy to
get even more speedups.  It's annoying because as of now pypy doesn't
support much of numpy (1D float arrays only with ops please :) but it's a
good start

From fijall at gmail.com  Sun Jul  3 19:38:32 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sun, 3 Jul 2011 19:38:32 +0200
Subject: [pypy-dev] bounties for pypy
In-Reply-To:
References: <201106281751.p5SHpixC014361@theraft.openend.se>
        <201106290753.p5T7rMKh002901@theraft.openend.se>
        <4E0EEC0C.5010809@gmail.com>
Message-ID:

On Sun, Jul 3, 2011 at 7:34 PM, Maciej Fijalkowski wrote:
> On Sun, Jul 3, 2011 at 12:03 PM, Ian Ozsvald wrote:
>> Cool :-) Thanks Antonio.
>> I'd left a note to myself in the report because I figured expanding
>> the math to the primitive operations might help PyPy (seeing as it
>> helped Cython and ShedSkin too).
>> [...]
>
> You can also use numpy arrays (or array from the array module) on PyPy
> to get even more speedups.  It's annoying because as of now pypy
> doesn't support much of numpy (1D float arrays only with ops please :)
> but it's a good start

"During 2011 at least it looks as though numpy integration will not
happen."  I personally challenge this statement :)

From ian at ianozsvald.com  Sun Jul  3 20:35:10 2011
From: ian at ianozsvald.com (Ian Ozsvald)
Date: Sun, 3 Jul 2011 19:35:10 +0100
Subject: [pypy-dev] bounties for pypy
In-Reply-To:
References:
Message-ID:

Hi Maciej. I said that based on what was discussed in Armin+Antonio's
talk at EuroPython. If it takes 6+ months for a numpy re-implementation
then I'm suggesting that we're into 2012. I'd be *very* happy to be
proved wrong!

Do Python 'array' objects run faster than lists in PyPy? I believe that
in CPython they run at the same speed (i.e. they're just a convenient
storage system, they don't offer any of numpy's efficient math benefits).

Does the micronumpy library support doubles yet? If so I'd be happy to
give it a go. If 'micronumpy' is the wrong name then tell me where I
should look, I'm just going on what I've remembered from the PyPy-blog
discussions about numpy support.

Cheers,
Ian.

On 3 July 2011 18:38, Maciej Fijalkowski wrote:
> "During 2011 at least it looks as though numpy integration will not
> happen."  I personally challenge this statement :)

-- 
Ian Ozsvald (A.I. researcher, screencaster)
ian at IanOzsvald.com
[...]

From fijall at gmail.com  Sun Jul  3 20:54:59 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sun, 3 Jul 2011 20:54:59 +0200
Subject: [pypy-dev] bounties for pypy
In-Reply-To:
References:
Message-ID:

On Sun, Jul 3, 2011 at 8:35 PM, Ian Ozsvald wrote:
> Hi Maciej. I said that based on what was discussed in Armin+Antonio's
> talk at EuroPython. If it takes 6+ months for a numpy re-implementation
> then I'm suggesting that we're into 2012. I'd be *very* happy to be
> proved wrong!

I would estimate numpy for less than 6 months (numpy, not scipy and not
matplotlib), but well, that's me.  There is also a lot of outside
contribution, so it might take less wall time than man-month time.  From
my perspective, if we can secure *some* funding for numpy, it should be
ready sooner than 6 months from now.

> Do Python 'array' objects run faster than lists in PyPy? I believe
> that in CPython they run at the same speed (i.e. they're just a
> convenient storage system, they don't offer any of numpy's efficient
> math benefits).

The situation between PyPy and CPython is quite different here.  As of
now, both PyPy and CPython store wrapped objects in lists (let's call
them PyIntObject) and unwrapped (C-level int) values in numpy arrays.
In the interpreter-only case, when reading from a list you do:

    py_x = list[index]
    do_something_with_py_x

and when reading from a numpy array (or array.array), you do:

    py_x = new PyIntObject(array[index])
    do_something_with_py_x

so you use less memory (cache is better), but you allocate a new object
each iteration (bad).

But with the JIT, what happens in the list case is:

    py_x = list[index]
    x = py_x.value
    do_something_with_x    # x as an integer value, not py_x

and in the array case:

    x = array[index]
    do_something_with_x

Hence, no allocation and memory saving - huge win!

> Does the micronumpy library support doubles yet? If so I'd be happy to
> give it a go. If 'micronumpy' is the wrong name then tell me where I
> should look, I'm just going on what I've remembered from the PyPy-blog
> discussions about numpy support.
yes, doubles is pretty much all it supports as of now :) micronumpy is a
good name, but it comes under the name "numpy", so you can do "import
numpy".  Not much will work though :-)

Note that we're also in the process of implementing faster vectorized
operations, like numexpr, except better.  The list of operations is very
limited as of now though.

Cheers,
fijal

PS. If it's unclear, I can try to explain more or pop up on IRC.

From ian at ianozsvald.com  Mon Jul  4 11:40:40 2011
From: ian at ianozsvald.com (Ian Ozsvald)
Date: Mon, 4 Jul 2011 10:40:40 +0100
Subject: [pypy-dev] bounties for pypy
In-Reply-To:
References:
Message-ID:

Re. micronumpy - all I'm doing in the Mandelbrot demo is multiplication
and addition on doubles - that'll work now, right? If so I'll make the
modification in a week or so, when I next have some time. I'm guessing
that there is no 'complex' support yet, even for basic operations?
Adding micronumpy support would make for a nice addition to the v0.2
doc :-)
Ian.

re. the 6 month 'proper numpy' estimate - I'm just quoting some of the
figures that were thrown around (with wide error margins depending on
money raised, availability etc). I would *love* to see support come
through sooner!

On 3 July 2011 19:54, Maciej Fijalkowski wrote:
> I would estimate numpy for less than 6 months (numpy, not scipy and not
> matplotlib), but well, that's me.
> [...]
>
> yes, doubles is pretty much all it supports as of now :) micronumpy is
> a good name, but it comes under the name "numpy", so you can do "import
> numpy".
> Not much will work though :-)
>
> Note that we're also in the process of implementing faster vectorized
> operations, like numexpr, except better.  The list of operations is
> very limited as of now though.
>
> Cheers,
> fijal
>
> PS. If it's unclear, I can try to explain more or pop up on IRC.

-- 
Ian Ozsvald (A.I. researcher, screencaster)
ian at IanOzsvald.com
[...]

From alex.gaynor at gmail.com  Mon Jul  4 18:11:53 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Mon, 4 Jul 2011 09:11:53 -0700
Subject: [pypy-dev] Fwd: Speed.Python.org
In-Reply-To:
References:
Message-ID:

The following message is forwarded on behalf of Jesse Noller:

Now that we have the machine, we need to start working on
collecting/organizing the resources needed to get a shared codespeed
system in place.  After speaking with various people, we felt that
overloading codespeed-dev, pypy-dev or python-dev with the discussions
around this would be suboptimal.  I've spun up a new mailing list here:

http://mail.python.org/mailman/listinfo/speed

Those who are interested in working on or contributing to the
speed.python.org project can subscribe there.  I personally cannot lead
the project, and so I will be looking to the current speed.pypy.org team
and python-dev contributors for leadership in this.  I got you the
hardware and hosting! :)

jesse

-- 
"I disapprove of what you say, but I will defend to the death your right
to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

From romain.py at gmail.com  Wed Jul  6 05:10:35 2011
From: romain.py at gmail.com (Romain Guillebert)
Date: Wed, 6 Jul 2011 05:10:35 +0200
Subject: [pypy-dev] Cython backend aiming PyPy Status
Message-ID: <20110706031034.GA31841@ubuntu>

Hi

I created a blog post summarizing what I've done over the last few weeks
on the Cython backend targeting PyPy.  It's located at this URL:

http://rguillebert.blogspot.com/2011/07/cython-backend-aiming-pypy-status.html

Cheers
Romain

From giuott at gmail.com  Wed Jul  6 10:55:04 2011
From: giuott at gmail.com (Giuseppe Ottaviano)
Date: Wed, 6 Jul 2011 09:55:04 +0100
Subject: [pypy-dev] algorithm used for float -> str conversion
In-Reply-To:
References: <201106300926.50520.alexandre.fayolle@logilab.fr>
Message-ID:

> Note that cpython2.7 (and pypy 1.5) already uses a specific algorithm
> to convert float to strings: a slightly customized version of David
> Gay's dtoa.c: http://www.netlib.org/fp/dtoa.c
> it is already faster and more accurate than many libc implementations.

The article says "the Grisu family acts as the default rendering
algorithms in both the V8 and Mozilla Javascript engines (replacing
David Gay's 17-year-old dtoa code)".  I wonder if they had specific
reasons to switch that could be relevant to pypy and cpython as well.

For example, besides speed, Grisu implementations look way simpler than
Gay's code, because the latter uses arbitrary precision integers (and it
has to implement them from scratch).  For this reason dtoa also needs
dynamic allocations, while Grisu only does simple arithmetic on
machine-word integers (if my understanding is correct, from a quick look
at the paper and the source of both dtoa and grisu).
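What both algorithms guarantee can be observed from the Python side:
since CPython 2.7 (and PyPy 1.5), repr() of a float produces a short
string that round-trips exactly back to the same double.  A quick check
(a sketch; any conforming implementation should pass it):

    import random
    import struct

    x = 0.1
    s = repr(x)            # '0.1' with the new algorithm,
                           # not '0.10000000000000001'
    assert float(s) == x   # the round trip is exact

    # every finite double must survive float -> str -> float unchanged
    for _ in range(1000):
        bits = random.getrandbits(64)
        f = struct.unpack('<d', struct.pack('<Q', bits))[0]
        if f == f and abs(f) != float('inf'):   # skip NaNs and infinities
            assert float(repr(f)) == f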
From wlavrijsen at lbl.gov  Thu Jul  7 00:27:07 2011
From: wlavrijsen at lbl.gov (wlavrijsen at lbl.gov)
Date: Wed, 6 Jul 2011 15:27:07 -0700 (PDT)
Subject: [pypy-dev] stuck with translation error
In-Reply-To:
References:
Message-ID:

Hi,

>> [translation:ERROR]    File "<487-codegen /home/wlav/pypydev/pypy/pypy/rpython/rtyper.py:610>", line 4, in translate_op_simple_call
>> [translation:ERROR]     return r_arg1.rtype_simple_call(hop)
>> [translation:ERROR]    File "/home/wlav/pypydev/pypy/pypy/rpython/rpbc.py", line 723, in rtype_simple_call
>> [translation:ERROR]     return self.redispatch_call(hop, call_args=False)
>> [translation:ERROR]    File "/home/wlav/pypydev/pypy/pypy/rpython/rpbc.py", line 772, in redispatch_call
>> [translation:ERROR]     assert hop.nb_args == 1, ("arguments passed to __init__, "
>> [translation:ERROR]  AssertionError: arguments passed to __init__, but no __init__!

got it, and funnily enough it even had something to do with an __init__
(but that wasn't the actual cause).  I had some code to raise an exception
in __init__ if an attempt was made to instantiate an abstract C++ class.
In that code, I did:

    except Exception, e:
        if e.match(self.space, self.space.w_AttributeError):

whereas it should have been:

    except OperationError, e:
        if e.match(self.space, self.space.w_AttributeError):

the point being that the code changes an AttributeError (failure to find
a suitable method, i.e. a usable C++ constructor) into a TypeError
(stating that the class is abstract).

I'm actually surprised that the bad code made it past the rtyper, given
that class Exception does not actually have a match() method.  Either way,
I don't think I deserved such a horrible error message for such a little
coding mistake. :}  But there's a good deal of irony here, in that the
code that caused the horrible error message is there to make an error in
user code more clear.

Best regards,
Wim
-- 
WLavrijsen at lbl.gov    --    +1 (510) 486 6411    --    www.lavrijsen.net

From wlavrijsen at lbl.gov  Thu Jul  7 03:24:37 2011
From: wlavrijsen at lbl.gov (wlavrijsen at lbl.gov)
Date: Wed, 6 Jul 2011 18:24:37 -0700 (PDT)
Subject: [pypy-dev] unable to translate with current head of default
In-Reply-To:
References: <4DDCAEF2.5040100@gmail.com>
Message-ID:

Hi Armin,

to come back to the problem that I've had with the jit ... it's gone now
with current default tip.  No idea what's up, but I'm plenty happy with
things now working again for me.  The "heap" optimization does have a
significant effect on my code, though, so it's good to have it.

I broke the fast path during one of the many merges with default when I
had to resolve differences (to be fixed next week?), but the current
numbers are still excellent:

    $ ../../../translator/goal/pypy-cppyy-c-070511 bench1.py
    rebuilding bench1.exe ...
    warming up ...
    C++ reference uses      0.032s
    :::: cppyy interp cost: 0.523s (  16x)
    :::: cppyy python cost: 0.997s (  31x)
    :::: pycintex cost:    50.554s (1579x)

So a factor of 50 improvement over the current CPython bindings, something
that used to be (back in December) a factor of 10 "only."  Of course, for
a real comparison, I should backport cppyy to pypy as of Dec '10 to know
which changes are to be attributed to what, but none of this is the final
product, so for now it's simply good enough. :)
Best regards,
Wim
-- 
WLavrijsen at lbl.gov    --    +1 (510) 486 6411    --    www.lavrijsen.net

From christian at jensenbox.com  Thu Jul  7 04:14:57 2011
From: christian at jensenbox.com (Christian Jensen)
Date: Wed, 6 Jul 2011 19:14:57 -0700
Subject: [pypy-dev] Postgres
Message-ID: <-8985362543625951028@unknownmsgid>

What is the best way to work with Postgres via Django? Is it still slow,
or have there been recent developments?

Sent from my iPhone

From benjamin at python.org  Thu Jul  7 04:20:56 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 6 Jul 2011 21:20:56 -0500
Subject: [pypy-dev] Postgres
In-Reply-To: <-8985362543625951028@unknownmsgid>
References: <-8985362543625951028@unknownmsgid>
Message-ID:

https://bitbucket.org/alex_gaynor/pypy-postgresql/overview

2011/7/6 Christian Jensen :
> What is the best way to work with Postgres via Django? Is it still
> slow, or have there been recent developments?

-- 
Regards,
Benjamin

From christian at jensenbox.com  Thu Jul  7 06:44:32 2011
From: christian at jensenbox.com (Christian Jensen)
Date: Wed, 6 Jul 2011 21:44:32 -0700
Subject: [pypy-dev] Postgres
In-Reply-To:
References: <-8985362543625951028@unknownmsgid>
Message-ID: <-7620406074375308895@unknownmsgid>

Thanks.  Are there any benchmarks, or anything reasonable to indicate the
speed in relation to CPython?

Sent from my iPhone

On Jul 6, 2011, at 7:20 PM, Benjamin Peterson wrote:
> https://bitbucket.org/alex_gaynor/pypy-postgresql/overview

From vsapre80 at gmail.com  Thu Jul  7 10:47:51 2011
From: vsapre80 at gmail.com (Vishal)
Date: Thu, 7 Jul 2011 14:17:51 +0530
Subject: [pypy-dev] Floating point computation
In-Reply-To:
References:
Message-ID:

Hello,

I would like to bring to your notice this article about exposing SIMD
instructions to high-level languages:

http://drdobbs.com/blogs/tools/230600043

The translation and JIT backends of PyPy may be able to allow Python
programmers to use SIMD instructions directly from Python.

From fijall at gmail.com  Thu Jul  7 10:59:26 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 7 Jul 2011 10:59:26 +0200
Subject: [pypy-dev] Floating point computation
In-Reply-To:
References:
Message-ID:

On Thu, Jul 7, 2011 at 10:47 AM, Vishal wrote:
> I would like to bring to your notice this article about exposing SIMD
> instructions to high-level languages:
> http://drdobbs.com/blogs/tools/230600043

We kind of want to do that automatically for numpy operations.
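For a concrete picture of the kind of loop this is about - elementwise
float math over unboxed storage, which the JIT can keep in raw doubles
(see Maciej's list-vs-array explanation earlier in this digest) and which
is also the natural target for automatic SIMD vectorization - here is a
sketch, runnable on both CPython and PyPy:

    from array import array

    def axpy(a, x, y):
        # y[i] += a * x[i], elementwise; reads from array('d') produce
        # plain doubles, so a tracing JIT can keep the whole loop unboxed
        for i in range(len(x)):
            y[i] += a * x[i]

    x = array('d', [1.0, 2.0, 3.0])
    y = array('d', [0.5, 0.5, 0.5])
    axpy(2.0, x, y)
    assert list(y) == [2.5, 4.5, 6.5]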
From neppord at gmail.com  Thu Jul  7 19:53:36 2011
From: neppord at gmail.com (Samuel Ytterbrink)
Date: Thu, 7 Jul 2011 19:53:36 +0200
Subject: [pypy-dev] Argparser in RPython
Message-ID:

Hi everyone!

I'm doing some planning on writing an option/arg parser in RPython to add
to rlib.  However, I'm very new to RPython, so I would appreciate all
input on how I should approach the task ahead.

Do people think there is a big benefit to implementing a drop-in
replacement for either argparse or optparse from the stdlib?  Or should I
look at the pypy app-level implementation and try to write something that
is simpler and follows its criteria?

At the moment I don't have a nice work environment (waiting for a new
computer), so I can't really try out the nice translatorshell and all its
test code.  When I get it I'll start hacking; in the meantime I'll do
research and some planning.

I was hoping to use the rlib parsing tool, just for the benefit of one
more project using it, and thereby have more example code for people to
look at for the parsing lib.  Is this overkill?

-- 
//Samuel (Neppord) Ytterbrink

From neppord at gmail.com  Thu Jul  7 20:19:19 2011
From: neppord at gmail.com (Samuel Ytterbrink)
Date: Thu, 7 Jul 2011 20:19:19 +0200
Subject: [pypy-dev] Argparser in RPython
In-Reply-To:
References:
Message-ID:

Please disregard the rlib parsing idea, I was not thinking straight.

On 7 Jul 2011 19:53, "Samuel Ytterbrink" wrote:
> Hi everyone!
>
> I'm doing some planning on writing an option/arg parser in RPython to
> add to rlib.
> [...]
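For a sense of what an RPython-friendly parser has to look like, here is a
hypothetical sketch: only simple, fixed types and no dynamic tricks (no
**kwargs, no getattr games), so the annotator can type it.  An
illustration of the constraints, not proposed rlib code:

    def parse_args(argv):
        # returns (options, positionals); options maps name -> string,
        # where '--flag' yields '' and '--key=value' yields 'value'
        opts = {}
        positional = []
        for arg in argv:
            if arg.startswith('--'):
                eq = arg.find('=')
                if eq >= 0:
                    opts[arg[2:eq]] = arg[eq + 1:]
                else:
                    opts[arg[2:]] = ''
            else:
                positional.append(arg)
        return opts, positional

    # example: parse_args(['--jit=off', 'input.txt'])
    # -> ({'jit': 'off'}, ['input.txt'])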
From alexandervpetrov at gmail.com  Fri Jul  8 01:03:20 2011
From: alexandervpetrov at gmail.com (Alexander Petrov)
Date: Fri, 8 Jul 2011 02:03:20 +0300
Subject: [pypy-dev] PyPy is much slower than CPython example / question
Message-ID:

Hi.

I'm new to PyPy and was trying to run some tests to see orders of speed
improvement.

Short script generating a list of prime numbers, using a rather
straightforward implementation of the sieve of Eratosthenes.
Script: http://paste.pocoo.org/show/432727/

Typical results: http://paste.pocoo.org/show/432733/
(I thought that was due to the absence of SSE2 on the first computer, but
I've rechecked on an Intel(R) Xeon(R) CPU L5520 @ 2.27GHz with similar
results.)

I'm getting that CPython is nearly 4-8 times faster than PyPy.
Is it a bug in PyPy, or what is wrong (maybe "specific" to PyPy) in my
script?

Alex.

From alex.gaynor at gmail.com  Fri Jul  8 01:49:33 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Thu, 7 Jul 2011 16:49:33 -0700
Subject: [pypy-dev] Object identity and dict strategies
Message-ID:

Hi all,

I've now spoken with several developers about the object identity that
arises out of the new dict strategies, and all seem to think that the
current implementation breaks Python's semantics, that is:

    x = 3
    z = {x: None}
    assert next(iter(z)) is x

will fail.  The only solution I see to this that maintains the correct
semantics in all cases is to specialize is/id() for primitive types.
That is to say, all integers of any given value have the same id(), and
"x is y" is true.  Making "x is y" true is easy: you simply have a
multimethod which dispatches on the types and compares intval for
W_IntObjects (objects of differing types can never have the same
identity).  However, the question is what to use for id() that is unique,
never changes for an int, and doesn't intersect with any other live
object.  In particular, the last constraint is the most difficult, I
think.

Alex

-- 
"I disapprove of what you say, but I will defend to the death your right
to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

From alex.gaynor at gmail.com  Fri Jul  8 02:12:59 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Thu, 7 Jul 2011 17:12:59 -0700
Subject: [pypy-dev] PyPy is much slower than CPython example / question
In-Reply-To:
References:
Message-ID:

On Thu, Jul 7, 2011 at 4:03 PM, Alexander Petrov wrote:
> Short script generating a list of prime numbers, using a rather
> straightforward implementation of the sieve of Eratosthenes.
> Script: http://paste.pocoo.org/show/432727/
> [...]

I haven't dug into this too deeply to see where the slowness is; however,
if you replace the setslice + repeat thingy with
`for i in range(--that expression--): primes[i] = False` it becomes
significantly faster: PyPy is 10x faster, and ~2x faster than the original
CPython version (CPython is ~2x slower when it's written this way).

Alex

-- 
"I disapprove of what you say, but I will defend to the death your right
to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
From romain.py at gmail.com  Fri Jul  8 02:30:48 2011
From: romain.py at gmail.com (Romain Guillebert)
Date: Fri, 8 Jul 2011 02:30:48 +0200
Subject: [pypy-dev] PyPy is much slower than CPython example / question
In-Reply-To:
References:
Message-ID: <20110708003048.GA8477@ubuntu>

Hi

When I change this line:

    primes[i*i:N+1:i] = repeat(False, len(primes[i*i:N+1:i]))

into this:

    primes[i*i:N+1:i] = [False] * len(primes[i*i:N+1:i])

PyPy is much faster (but is still slower than CPython), so I would guess
that the repeat function is the one to blame.

Cheers
Romain

From alex.gaynor at gmail.com  Fri Jul  8 02:38:44 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Thu, 7 Jul 2011 17:38:44 -0700
Subject: [pypy-dev] PyPy is much slower than CPython example / question
In-Reply-To:
References:
Message-ID:

repeat itself is not slow; it's just that when it's used, the slice
assignment iterates over it in RPython (meaning it's not jit'd), which
results in a dictionary lookup for the next() method at every iteration,
which is slowish.  A list hits a special case, so it doesn't have that
overhead.

Alex

On Thu, Jul 7, 2011 at 5:30 PM, Romain Guillebert wrote:
> When I change this line:
>
>     primes[i*i:N+1:i] = repeat(False, len(primes[i*i:N+1:i]))
>
> into this:
>
>     primes[i*i:N+1:i] = [False] * len(primes[i*i:N+1:i])
>
> PyPy is much faster (but is still slower than CPython), so I would
> guess that the repeat function is the one to blame.

-- 
"I disapprove of what you say, but I will defend to the death your right
to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
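A self-contained reconstruction of the sieve under discussion, with the
slow slice-assignment and the explicit-loop rewrite side by side.  The
original script is only available behind the paste link, so this is a
sketch of the same idea, not a copy:

    from itertools import repeat

    def sieve_repeat(N):
        # slice assignment fed by itertools.repeat: the slow case reported
        primes = [True] * (N + 1)
        primes[0] = primes[1] = False
        i = 2
        while i * i <= N:
            if primes[i]:
                primes[i*i:N+1:i] = repeat(False, len(primes[i*i:N+1:i]))
            i += 1
        return primes

    def sieve_loop(N):
        # the explicit loop suggested above, which the JIT compiles directly
        primes = [True] * (N + 1)
        primes[0] = primes[1] = False
        i = 2
        while i * i <= N:
            if primes[i]:
                for j in range(i * i, N + 1, i):
                    primes[j] = False
            i += 1
        return primes

    assert sieve_repeat(10000) == sieve_loop(10000)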
From exarkun at twistedmatrix.com  Fri Jul  8 03:05:18 2011
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Fri, 08 Jul 2011 01:05:18 -0000
Subject: [pypy-dev] PyPy is much slower than CPython example / question
In-Reply-To:
References:
Message-ID: <20110708010518.3761.1343748147.divmod.xquotient.52@localhost.localdomain>

On 12:38 am, alex.gaynor at gmail.com wrote:
> repeat itself is not slow; it's just that when it's used, the slice
> assignment iterates over it in RPython (meaning it's not jit'd), which
> results in a dictionary lookup for the next() method at every
> iteration, which is slowish.  A list hits a special case, so it
> doesn't have that overhead.

Is it time to reimplement repeat in Python then?

Jean-Paul

From alex.gaynor at gmail.com  Fri Jul  8 03:15:08 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Thu, 7 Jul 2011 18:15:08 -0700
Subject: [pypy-dev] PyPy is much slower than CPython example / question
In-Reply-To: <20110708010518.3761.1343748147.divmod.xquotient.52@localhost.localdomain>
References: <20110708010518.3761.1343748147.divmod.xquotient.52@localhost.localdomain>
Message-ID:

No, you would need to implement list.__setitem__ in Python, which we
could do; does the JIT see such code?

Alex

On Thu, Jul 7, 2011 at 6:05 PM, wrote:
> Is it time to reimplement repeat in Python then?

-- 
"I disapprove of what you say, but I will defend to the death your right
to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

From fijall at gmail.com  Fri Jul  8 09:47:04 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Fri, 8 Jul 2011 09:47:04 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References:
Message-ID:

On Fri, Jul 8, 2011 at 1:49 AM, Alex Gaynor wrote:
> Hi all,
> I've now spoken with several developers about the object identity that
> arises out of the new dict strategies, and all seem to think that the
> current implementation breaks Python's semantics, that is:
>
>     x = 3
>     z = {x: None}
>     assert next(iter(z)) is x
>
> will fail.  The only solution I see to this that maintains the correct
> semantics in all cases is to specialize is/id() for primitive types.
> [...]
> However, the question is what to use for id() that is unique, never
> changes for an int, and doesn't intersect with any other live object.
> In particular, the last constraint is the most difficult, I think.
> Alex

>>> x = 3
>>> x is 3
True
>>> x = 1003
>>> x is 1003
False
>>>

Stop relying on obscure details, I think.

From holger at merlinux.eu  Fri Jul  8 09:58:57 2011
From: holger at merlinux.eu (holger krekel)
Date: Fri, 8 Jul 2011 07:58:57 +0000
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References:
Message-ID: <20110708075857.GF20287@merlinux.eu>

On Fri, Jul 08, 2011 at 09:47 +0200, Maciej Fijalkowski wrote:
> On Fri, Jul 8, 2011 at 1:49 AM, Alex Gaynor wrote:
>> [...]
>
> >>> x = 3
> >>> x is 3
> True
> >>> x = 1003
> >>> x is 1003
> False
> >>>

I guess this is not what Alex means here.  In CPython and PyPy (both 1.5
and trunk) you can do:

>>> class A: pass
>>> a = A()
>>> next(iter({a: None})) is a
True

but not with ints on pypy-trunk:

>>> a = 3
>>> next(iter({a: None})) is a
False

(it's True on pypy-1.5).  IOW, I think the issue here is that iterating
over the keys of a dict usually gives the exact same ("is") objects in
CPython, whereas pypy trunk does not provide that, at least for ints.

best,
holger

> Stop relying on obscure details, I think.

From william.leslie.ttg at gmail.com  Fri Jul  8 10:31:37 2011
From: william.leslie.ttg at gmail.com (William ML Leslie)
Date: Fri, 8 Jul 2011 18:31:37 +1000
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To: <20110708075857.GF20287@merlinux.eu>
References: <20110708075857.GF20287@merlinux.eu>
Message-ID:

On 8 July 2011 17:58, holger krekel wrote:
> IOW, I think the issue here is that iterating over the keys of a dict
> usually gives the exact same ("is") objects in CPython, whereas pypy
> trunk does not provide that, at least for ints.

I couldn't find anything precise in the official documentation on the
meaning of 'is'.  I think the general understanding is that it makes no
sense whatsoever on immutable objects (as in, its behaviour there isn't
guaranteed).

Consequently, a Python implementation could also cache tuples.  Re-using
tuples might sound unusual, but there are special cases where it starts
to sound reasonable, such as caching the empty tuple, or
copy-propagating a tuple unpack & repack.  The language spec is very
light on what is allowed to be a 'different object', and what is
*suggested* by cpython's int caching behaviour is that the behaviour of
'is' for language-provided immutable objects can't be relied upon in any
way, shape or form.

Pypy hasn't matched cpython's behaviour with ints here in a long time,
so it obviously doesn't matter.
On another note: what Alex talks about as being two different cases are
just one with the small int optimisation - all references can be compared
by value in the C backend with small ints enabled, if the object space
doesn't provide alternative behaviour.

-- 
William Leslie

From fijall at gmail.com  Fri Jul  8 10:31:28 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Fri, 8 Jul 2011 10:31:28 +0200
Subject: [pypy-dev] pypy 1.5 on Windows: sqlite connection args
In-Reply-To:
References:
Message-ID:

On Sun, Jun 5, 2011 at 4:18 PM, Caleb Hattingh wrote:
> In lib-pypy/_sqlite3.py, line 239 (in Win 1.5 stable release) in class
> Connection(object):
>
>     def __init__(self, database, isolation_level="", detect_types=0,
>                  timeout=None, cached_statements=None, factory=None):
>
> ...might need to become:
>
>     def __init__(self, database, isolation_level="", detect_types=0,
>                  timeout=None, cached_statements=None, factory=None,
>                  check_same_thread=False):

Fixed on trunk

From arigo at tunes.org  Fri Jul  8 11:18:45 2011
From: arigo at tunes.org (Armin Rigo)
Date: Fri, 8 Jul 2011 11:18:45 +0200
Subject: [pypy-dev] PyPy is much slower than CPython example / question
In-Reply-To:
References:
Message-ID:

Hi Alex,

Before attacking the problem with the JIT, we should understand better
why PyPy is 4-8 times slower than CPython.  Normally you'd expect the
factor to be at most 2.  I suppose the answer is that our
itertools.repeat() is bad for some reason.

A bientôt,

Armin.

From arigo at tunes.org  Fri Jul  8 11:52:10 2011
From: arigo at tunes.org (Armin Rigo)
Date: Fri, 8 Jul 2011 11:52:10 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References:
Message-ID:

Hi William,

On Fri, Jul 8, 2011 at 10:31 AM, William ML Leslie wrote:
> On another note: what Alex talks about as being two different cases are
> just one with the small int optimisation - all references can be
> compared by value in the C backend with small ints enabled, if the
> object space doesn't provide alternative behaviour.

No, because e.g. of longs.  You don't have the same issue if you use
longs instead of ints in this particular example, but more generally the
issue exists too.

A first note is that it's impossible to satisfy all three of Alex's
criteria in general: we would like id(x) to be a unique word-sized number
determined only by the value of 'x', and different for every value of 'x'
and for every object of a different type too; but if the possible 'x'es
are all possible word-sized integers, then it's impossible to satisfy
this, just because there are too many of them.  The problem only gets
"more impossible" if we include all long objects in 'x'.

The problem is not new, it is just a bit more apparent.  For example,
already in pypy 1.5 we have:

>>>> a = A()
>>>> d = a.__dict__
>>>> s = 'foobar'
>>>> d[s] = 5
>>>> id(s)
163588812
>>>> id(d.keys()[0])
163609508
>>>> id(d.keys()[0])
163609520

I thought that there are also issues that would only show up with the
JIT, because of the _immutable_ flag on W_IntObject, but it seems that
I'm wrong.
I can only say that Psyco has such issues, but nobody complained about
them:

    lst = []
    def f(x):
        for _ in range(3): pass   # prevents inlining
        lst.append(x)
    def g(n):
        for i in range(n):
            f(i); f(i)

With Psyco, a call to g(5000) puts in 'lst' 10000 integer objects that
are mostly all distinct objects, although there should in theory be at
most 5000 distinct objects in there.  (PyPy is safe so far because the
call_assembler from g() to f() passes fully built W_IntObjects, instead
of just cpu-level words; but that may change if in the future we add an
optimization that knows how to generate a more efficient call_assembler.)

I suppose that it's again a mixture of rules that are too vague and
complaints of "but it works on CPython!" that are now being voiced just
because the already-existing difference became a bit more apparent.
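(One way to observe the effect described above is to count the distinct
ids; since the objects in lst all stay alive, their ids cannot be
recycled.  An identity-preserving interpreter reports at most 5000 here;
per the description above, a compiler that re-boxes the integer on each
call can report close to 10000.  A sketch:

    lst = []

    def f(x):
        for _ in range(3):
            pass   # prevents inlining
        lst.append(x)

    def g(n):
        for i in range(n):
            f(i); f(i)

    g(5000)
    # CPython: 5000, because each i is appended twice as the same object
    print len(set(id(x) for x in lst))
)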
Sorry for the rant, I don't have an obvious solution :-)

A bientôt,

Armin.

From cesare.di.mauro at gmail.com  Fri Jul  8 12:04:18 2011
From: cesare.di.mauro at gmail.com (Cesare Di Mauro)
Date: Fri, 8 Jul 2011 12:04:18 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu>
Message-ID:

2011/7/8 Armin Rigo
> Hi William,
>
> On Fri, Jul 8, 2011 at 10:31 AM, William ML Leslie wrote:
> > On another note: what Alex talks about as being two different cases
> > are just one with the small int optimisation [...]
>
> [...]
>
> I suppose that it's again a mixture of rules that are too vague and
> complaints "but it works on CPython!" that are now being voiced just
> because the already-existing difference just became a bit more
> apparent.
>
> Sorry for the rant, I don't have an obvious solution :-)

Hi Armin

I fully agree. It's not an issue, but an implementation-specific detail which programmers must not assume to be always true.

CPython can be compiled without "smallints" (-5..256, if I remember correctly) caching. There's a #DEFINE that can be disabled, so EVERY int (or long) will be allocated, and using the is operator will return False most of the time (unless you have simply copied a reference to exactly the same object).

The same applies to 1-character strings, which are USUALLY cached by CPython.

So some care is needed when using is. It's safe for some trivial objects (None, False, True, Ellipsis) and, I think, with user-defined classes' instances, but not for everything.

Regards,
Cesare

From bokr at oz.net  Fri Jul  8 16:07:53 2011
From: bokr at oz.net (Bengt Richter)
Date: Fri, 08 Jul 2011 16:07:53 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu>
Message-ID:

On 07/08/2011 10:31 AM William ML Leslie wrote:
> On 8 July 2011 17:58, holger krekel wrote:
>> IOW, i think the issue here is that iterating over keys of a dict usually
>> gives the exact same ("is") objects in CPython whereas pypy trunk does not
>> provide that at least for ints.
>
> I couldn't find anything precise in the official documentation on the
> meaning of 'is'.  I think that the general understanding is that it
> makes no sense whatsoever on immutable objects (as in, it isn't
> guaranteed to do so).
>
> Consequently, a python implementation could also cache tuples.
> Re-using tuples might sound unusual, but there are special cases that
> start to sound reasonable, such as caching the empty tuple, or
> copy-propagating a tuple unpack & repack.  The language spec is very
> light on what is allowed to be a 'different object', and what is
> *suggested* by cpython's int caching behaviour is that the behaviour
> of 'is' for language-provided immutable objects can't be relied upon
> in any way, shape or form.
>
> Pypy hasn't matched cpython's behaviour with ints here in a long time,
> so it obviously doesn't matter.
>
> On another note: what Alex talks about as being two different cases
> are just one with the small int optimisation [...]

python -c 'import this'|egrep 'rules|purity|hard'  #;-)

From arigo at tunes.org  Fri Jul  8 16:07:07 2011
From: arigo at tunes.org (Armin Rigo)
Date: Fri, 8 Jul 2011 16:07:07 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu>
Message-ID:

Hi all,

On Fri, Jul 8, 2011 at 12:04 PM, Cesare Di Mauro wrote:
> So some care is needed when using is. It's safe for some trivial objects
> (None, False, True, Ellipsis) and, I think, with user-defined classes'
> instances, but not for everything.

The problem is more acute with id(). Some standard library modules like copy.py *need* a working id() for any object, including immutable ones, because CPython has no identity dict.

After some discussion with Carl Friedrich, here is the best we could think of. Say that "a is b" is redefined as being True for two equal immutable objects of the same type. Then we want the following property to always hold: "a is b" if and only if "id(a) == id(b)".
We can do that by having id(x) return either a regular integer or a long.

Let's call for the rest of this discussion an "immutable object" an object of type int, long or float. If 'x' is not an immutable object, then id(x) can return the same value as it does now. If 'x' is an immutable object, then we compute from it a long value that does *not* fit in 32- or 64-bit, and that includes some tagging to make sure that immutable objects of different types have different id's. For example, id(7) would be (2**32 + (7<<3) + 1).

Such a solution should make two common use cases of id() work:

1. as keys in a dictionary, to implement an identity dict, like in copy.py: in this case we take the id() of random objects including immutable ones, but only expect the result to work as keys in the dictionary. Getting arbitrarily-sized longs is not a problem here.

2. more contrived examples involve taking the id() of some instance, sending it around as a (word-sized) integer, and when getting it back retrieving the original instance from some dict. I don't expect people to do that with immutable objects, only their own custom instances. That's why it should be enough if id(x) returns a regular, 32- or 64-bit integer for *non-immutable* objects.

A bientôt,

Armin.

From amauryfa at gmail.com  Fri Jul  8 16:14:37 2011
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Fri, 8 Jul 2011 16:14:37 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu>
Message-ID:

2011/7/8 Cesare Di Mauro:
> I fully agree. It's not an issue, but an implementation-specific detail
> which programmers must not assume to be always true.
> [...]

But the problem here is not object cache, but preservation of object identity, which is quite different. Python containers are supposed to keep the objects you put inside:

    myList.append(x)
    assert myList[-1] is x

    myDict[x] = 1
    for key in myDict:
        if key is x:
            ...

--
Amaury Forgeot d'Arc

From fijall at gmail.com  Fri Jul  8 16:17:08 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Fri, 8 Jul 2011 16:17:08 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu>
Message-ID:

On Fri, Jul 8, 2011 at 4:14 PM, Amaury Forgeot d'Arc wrote:
> 2011/7/8 Cesare Di Mauro:
> [...]
>
> But the problem here is not object cache, but preservation of object
> identity, which is quite different.
> Python containers are supposed to keep the objects you put inside:

[citation needed] array.array does not for one

>     myList.append(x)
>     assert myList[-1] is x
>
>     myDict[x] = 1
>     for key in myDict:
>         if key is x:
>             ...

also dict doesn't work if you overwrite the key:

    d = {1003: None}
    x = 1003
    d[x] = None
    d.keys()[0] is x

From cfbolz at gmx.de  Fri Jul  8 16:50:05 2011
From: cfbolz at gmx.de (Carl Friedrich Bolz)
Date: Fri, 08 Jul 2011 16:50:05 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu>
Message-ID: <4E17191D.6010700@gmx.de>

On 07/08/2011 04:14 PM, Amaury Forgeot d'Arc wrote:
> 2011/7/8 Cesare Di Mauro:
> [...]
>
> But the problem here is not object cache, but preservation of object
> identity, which is quite different.

I think in the end id is the hard problem. Object identity can be fixed. We could say that "a is b" uses equality for primitive objects. However, then id needs to be fixed too, so that the following property holds:

    "a is b" if and only if "id(a) == id(b)"

This is the harder part.

Carl Friedrich

From anto.cuni at gmail.com  Fri Jul  8 17:10:58 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Fri, 08 Jul 2011 17:10:58 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To: <4E17191D.6010700@gmx.de>
References: <20110708075857.GF20287@merlinux.eu> <4E17191D.6010700@gmx.de>
Message-ID: <4E171E02.40202@gmail.com>

On 08/07/11 16:50, Carl Friedrich Bolz wrote:
> I think in the end id is the hard problem. Object identity can be fixed. We
> could say that "a is b" uses equality for primitive objects.

what about starting to think about solutions only when we are sure that it's actually a problem?

I don't expect any real world code to rely on this behavior (or, if it does, I'm prepared to call it broken :-))

From arigo at tunes.org  Fri Jul  8 17:19:17 2011
From: arigo at tunes.org (Armin Rigo)
Date: Fri, 8 Jul 2011 17:19:17 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To: <4E171E02.40202@gmail.com>
References: <20110708075857.GF20287@merlinux.eu> <4E17191D.6010700@gmx.de> <4E171E02.40202@gmail.com>
Message-ID:

Hi Anto,

On Fri, Jul 8, 2011 at 5:10 PM, Antonio Cuni wrote:
> what about starting to think about solutions only when we are sure that it's
> actually a problem?
>
> I don't expect any real world code to rely on this behavior (or, if it does,
> I'm prepared to call it broken :-))

As I said in my previous e-mail, I think that e.g. copy.py relies on such behavior, and more generally any Python code that has to use id() to emulate an identity-dict --- as broken as that approach is, just because CPython thought that identity-dicts were unnecessary.

It may not actually be a problem right now in copy.py, but still, I fear that it's a problem waiting to hurt us.

A bientôt,

Armin.
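To make the scheme Armin describes above concrete, here is a rough sketch (illustrative only, not PyPy source): ordinary objects keep their word-sized id, while ints/longs/floats get an out-of-range long derived from the value, with a per-type tag in the low bits. The names and the exact tag layout are assumptions for illustration.

    import struct

    TAG_INT, TAG_FLOAT = 1, 2

    def proposed_id(x, _default_id=id):
        if isinstance(x, (int, long)):
            # e.g. proposed_id(7) == 2**32 + (7<<3) + 1; negative values
            # would need extra care in a real implementation.
            return 2**32 + (x << 3) + TAG_INT
        if isinstance(x, float):
            # derive the id from the bit pattern of the float
            bits = struct.unpack('<Q', struct.pack('<d', x))[0]
            return 2**32 + (bits << 3) + TAG_FLOAT
        return _default_id(x)          # mutable objects: unchanged

    assert proposed_id(7) == proposed_id(3 + 4)   # equal values, equal id
    assert proposed_id(7) != proposed_id(7.0)     # different types differ

With such an id(), the id-keyed memo pattern keeps working for immutables, at the price of sometimes getting a long instead of a plain int.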
From anto.cuni at gmail.com  Fri Jul  8 17:40:21 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Fri, 08 Jul 2011 17:40:21 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu> <4E17191D.6010700@gmx.de> <4E171E02.40202@gmail.com>
Message-ID: <4E1724E5.2030902@gmail.com>

On 08/07/11 17:19, Armin Rigo wrote:
> As I said in my previous e-mail, I think that e.g. copy.py relies on
> such behavior, and more generally any Python code that has to use id()
> to emulate an identity-dict --- as broken as that approach is, just
> because CPython thought that identity-dicts were unnecessary.
>
> It may not actually be a problem right now in copy.py, but still, I
> fear that it's a problem waiting to hurt us.

I don't think that copy.py relies on this behavior (even after looking at the code, but I might be wrong).

My point is that we are not breaking id(); identity dicts emulated using id() continue to work as normal. The only assumption that we are breaking is this one, which has nothing to do with identity dicts or id():

    d = {}
    d[x] = None
    assert d.keys()[0] is x

As fijal points out, the semantics of this particular behavior are already half-broken, even in CPython. E.g., in the following example we never remove anything from the dictionary, nevertheless a key "disappears":

    x = 1003
    d = {x: None}
    assert d.keys()[0] is x
    d[1000+3] = None
    assert d.keys()[0] is x # BOOM

ciao,
Anto

From exarkun at twistedmatrix.com  Fri Jul  8 17:04:50 2011
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Fri, 08 Jul 2011 15:04:50 -0000
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu>
Message-ID: <20110708150450.3761.405889280.divmod.xquotient.121@localhost.localdomain>

On 02:17 pm, fijall at gmail.com wrote:
>On Fri, Jul 8, 2011 at 4:14 PM, Amaury Forgeot d'Arc wrote:
>>2011/7/8 Cesare Di Mauro:
>>[...]
>>
>>But the problem here is not object cache, but preservation of object
>>identity, which is quite different.
>>Python containers are supposed to keep the objects you put inside:
>
>[citation needed] array.array does not for one

Yes, and array.array is weird. :)

It either exists as a memory optimization (ie, I don't want objects) or a way to directly lay out memory (to pass to a C API). Either way, you can't put arbitrary objects into it either - so it's already a little special, even if you disregard the fact that it doesn't preserve the identity of the objects you can put into it.

However, you're right. It exists, and it has this non-identity-preserving behavior. Is it a good thing, though? Or just an accident of how someone tried to let CPython be faster for some types of problems?

>>    myList.append(x)
>>    assert myList[-1] is x
>>
>>    myDict[x] = 1
>>    for key in myDict:
>>        if key is x:
>>            ...
>
>also dict doesn't work if you overwrite the key:
>
>    d = {1003: None}
>    x = 1003
>    d[x] = None
>    d.keys()[0] is x

This doesn't invalidate the original point, as far as I can tell. It just demonstrates again that you can have two instances of 1003. Whether dict guarantees to always use the new key or the old key when an update is made is a separate question.

I think it would be better if object identity didn't depend on this mysterious quality of "immutability". The language is easier to understand (particularly for new programmers) if one can talk about objects and references without having to also explain that _some_ data types are represented using things that are sort of like objects but not quite (and worse if it depends on what types the JIT feels like playing with in any particular version of the interpreter).

Jean-Paul

From arigo at tunes.org  Fri Jul  8 17:59:03 2011
From: arigo at tunes.org (Armin Rigo)
Date: Fri, 8 Jul 2011 17:59:03 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To: <4E1724E5.2030902@gmail.com>
References: <20110708075857.GF20287@merlinux.eu> <4E17191D.6010700@gmx.de> <4E171E02.40202@gmail.com> <4E1724E5.2030902@gmail.com>
Message-ID:

Hi Anto,

On Fri, Jul 8, 2011 at 5:40 PM, Antonio Cuni wrote:
> My point is that we are not breaking id(); identity dicts emulated using id()
> continue to work as normal.

Well, the argument goes like this: *if* we think that the problems like the one you describe are to be fixed, then we really need to hack at "is", and *then* we have a problem with id() as well, and we have to fix it as I described.

You can also try to argue that nothing is broken so far, neither in "is" nor in id() nor your examples. I fear that we are going to end up seeing more and more cases where users rely on the current CPython behavior, particularly because we're going to expose such issues more and more over time as we add new optimizations. But I may be wrong and it may be enough to document it in cpython-differences.rst.

> x = 1003
> d = {x: None}
> assert d.keys()[0] is x
> d[1000+3] = None
> assert d.keys()[0] is x # BOOM

This is the wrong example: in this case, CPython guarantees that it will not crash. The semantics are not really half-broken, just not written out clearly: when you add an object to a dict, if there is another key already in there that compares equal, then the existing key is kept; it is not replaced by the new key. (Both CPython and PyPy agree to this rule.)

A bientôt,

Armin.

From anto.cuni at gmail.com  Fri Jul  8 18:37:55 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Fri, 08 Jul 2011 18:37:55 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu> <4E17191D.6010700@gmx.de> <4E171E02.40202@gmail.com> <4E1724E5.2030902@gmail.com>
Message-ID: <4E173263.2060603@gmail.com>

On 08/07/11 17:59, Armin Rigo wrote:
> I fear that we are going to end up seeing more and more cases where
> users rely on the current CPython behavior, particularly because we're
> going to expose such issues more and more over time as we add new
> optimizations.  But I may be wrong and it may be enough to document it
> in cpython-differences.rst.

yes, the whole point of my position is that I don't think there is much code around relying on this behavior.
Thus, I propose to wait a bit and see how many people complain, before fixing what might be a non-issue.

>> x = 1003
>> d = {x: None}
>> assert d.keys()[0] is x
>> d[1000+3] = None
>> assert d.keys()[0] is x # BOOM
>
> This is the wrong example: in this case, CPython guarantees that it
> will not crash.  The semantics are not really half-broken, just not
> written out clearly: when you add an object to a dict, if there is
> another key already in there that compares equal, then the existing
> key is kept; it is not replaced by the new key.  (Both CPython and
> PyPy agree to this rule.)

ouch, I should write tests for my emails :-)

From arigo at tunes.org  Fri Jul  8 23:40:04 2011
From: arigo at tunes.org (Armin Rigo)
Date: Fri, 8 Jul 2011 23:40:04 +0200
Subject: [pypy-dev] Floating point computation
In-Reply-To:
References:
Message-ID:

Hi,

On Thu, Jul 7, 2011 at 10:59 AM, Maciej Fijalkowski wrote:
>> The Translation and JIT backend for PyPy may be able to allow Python
>> programmers to use SIMD instructions directly from Python.
>
> We kind of want to do that automatically for numpy operations.

Can all basic SIMD instructions be mapped to numpy array operations? If so, why doesn't numpy already use them? (Or does it?)

And if not, then is there some existing C extension module for CPython that exposes them somehow? Of course CPython cannot hope for impressive speedups with such an extension module, but such a module in PyPy could be JITted and give massive speed benefits.

A bientôt,

Armin.

From william.leslie.ttg at gmail.com  Sat Jul  9 05:20:53 2011
From: william.leslie.ttg at gmail.com (William ML Leslie)
Date: Sat, 9 Jul 2011 13:20:53 +1000
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu>
Message-ID:

On 8 July 2011 19:52, Armin Rigo wrote:
> Hi William,

Hi Armin, everybody,

> On Fri, Jul 8, 2011 at 10:31 AM, William ML Leslie wrote:
>> On another note: what Alex talks about as being two different cases
>> are just one with the small int optimisation [...]
>
> No, because e.g. of longs.  You don't have the same issue if you use
> longs instead of ints in this particular example, but more generally
> the issue exists too.
>
> A first note is that it's impossible to satisfy all three of Alex's
> criteria in general: we would like id(x) to be a unique word-sized
> number determined only by the value of 'x' [...] but if the
> possible 'x'es are all possible word-sized integers, then it's
> impossible to satisfy this, just because there are too many of them.
> The problem only gets "more impossible" if we include all long objects
> in 'x'.

That id(x) be a word-sized value uniquely determined by the value of x is impossible, yes, as the following program should demonstrate:

    while True:
        id([])

We create an infinite number of objects here, and if each one had to have a unique word-sized id, we'd exhaust that space pretty quickly.

What id() does provide is that as long as there is a *reference* to the object, it returns a constant and unique integer (which we are assuming should be a word-sized integer).
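A small illustration of that liveness contract (not from the thread itself; CPython behaviour shown for the unsafe case):

    x = []
    y = x
    assert id(x) == id(y)      # same live object: always true

    a, b = [], []
    assert id(a) != id(b)      # two simultaneously live objects: ids must differ

    # With temporaries nothing is promised: the first [] can be freed
    # before the second is created, so the allocator may reuse the
    # address and report equal ids for *different* objects.
    print id([]) == id([])     # often True on CPython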
As later emails on this list suggested, it would be better if the semantics were relaxed to "the id should be preserved", meaning that placement of an integer or string into a container should allow it to have the same id on retrieval. New objects resulting from operations such as multiplication or string concatenation need not have the same id as other objects with the same value - if we did that for strings, it would have serious consequences for parallel code.

What I was suggesting is that, since every live object must be encoded in an object reference somewhere, object references should be at least good enough to suggest an id.

My point about small integers wasn't really about word-sized ints, I was talking about the smallint optimisation, which as I understood it, boxes small app-level integers in the C backend. It does this by shifting integers left one bit and bitwise-oring with 1; encoding the integer into the reference. "By definition", as Carl put it, there are never more objects represented in this way than we can fit in a reasonable, bounded id size. The suggestion then is to use the value of the object reference cast as a word-sized integer as the id of the object for integers so encoded, and calculate the id of other objects in the usual fashion. This happens to work for larger integers and floats, because the id will be preserved as long as a reference to them exists by their boxedness.

Floats could use a similar mechanism to integers, eg, their bit representation shifted left two and bitwise-ored with 2. That does mean that id() is no longer word-sized, but it does not make it unbounded.

> The problem is not new, it is just a bit more apparent.  For example,
> already in pypy 1.5 we have:
>
> >>>> a = A()
> >>>> d = a.__dict__
> >>>> s = 'foobar'
> >>>> d[s] = 5
> >>>> id(s)
> 163588812
> >>>> id(d.keys()[0])
> 163609508
> >>>> id(d.keys()[0])
> 163609520

What is keeping us from using the underlying rpython string as the basis for id? I guess there is nowhere obvious to store the id, or there is the need to store the id in the gc copy phase. In that way, it'd make for an ugly special case.

On 9 July 2011 00:07, Armin Rigo wrote:
> After some discussion with Carl Friedrich, here is the best we could
> think of.  Say that "a is b" is redefined as being True for two equal
> immutable objects of the same type.  Then we want the following
> property to always hold: "a is b" if and only if "id(a) == id(b)".

I would prefer to weaken this just a little: x is y iff id(x) == id(y) WHEN x and y are live for the duration of the equality. This counters cases such as:

    id([]) == id([])

which can certainly happen under any reasonable definition of id.

> We can do that by having id(x) return either a regular integer or a long.
> [...] If 'x' is an immutable object, then we compute from it a long value
> that does *not* fit in 32- or 64-bit, and that includes some tagging to
> make sure that immutable objects of different types have different id's.

This makes the range of id() unbounded, where its domain is bounded. I don't feel comfortable with it, although I know that isn't much of an objection.

> Such a solution should make two common use cases of id() work:
>
> 1. as keys in a dictionary, to implement an identity dict, like in
> copy.py: in this case we take the id() of random objects including
> immutable ones, but only expect the result to work as keys in the
> dictionary.  Getting arbitrarily-sized longs is not a problem here.

Speaking of, maybe it'd be easier to attempt to get the identity dict into the language proper.

William Leslie

From stefan_ml at behnel.de  Sat Jul  9 06:26:46 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 09 Jul 2011 06:26:46 +0200
Subject: [pypy-dev] PyPy is much slower than CPython example / question
In-Reply-To:
References: <20110708003048.GA8477@ubuntu> <20110708010518.3761.1343748147.divmod.xquotient.52@localhost.localdomain>
Message-ID:

Armin Rigo, 08.07.2011 11:18:
> Before attacking the problem with the JIT, we should understand better
> why PyPy is 4-8 times slower than CPython.  Normally you'd expect the
> factor to be at most 2.  I suppose the answer is that our
> itertools.repeat() is bad for some reason.

You shouldn't forget that itertools contains extremely well optimised C code. I recently tried to (algorithmically) optimise the pure Python implementations from the CPython docs for Cython, and ended up with timings between 10% faster and 50% slower. It's actually hard to reach up to the raw power of hand tuned C code here.

http://blog.behnel.de/index.php?p=163
http://blog.behnel.de/index.php?p=185

Given that some of the functions don't execute anything more than a couple of assembly instructions per cycle, you'll have to send a really well tuned high level implementation into the race to even come close. I would expect that PyPy's implementation simply wasn't written for that.

That being said, it would be nice to have an itertools micro benchmark for the common Python test suite.

Stefan

From alex.gaynor at gmail.com  Sat Jul  9 06:40:18 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Fri, 8 Jul 2011 21:40:18 -0700
Subject: [pypy-dev] PyPy is much slower than CPython example / question
In-Reply-To:
References: <20110708003048.GA8477@ubuntu> <20110708010518.3761.1343748147.divmod.xquotient.52@localhost.localdomain>
Message-ID:

I'm not too sure what could be wrong with it; it's rather short:
https://bitbucket.org/pypy/pypy/src/default/pypy/module/itertools/interp_itertools.py#cl-85

Alex

On Fri, Jul 8, 2011 at 2:18 AM, Armin Rigo wrote:
> Hi Alex,
>
> Before attacking the problem with the JIT, we should understand better
> why PyPy is 4-8 times slower than CPython. [...]

--
"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

From arigo at tunes.org  Sat Jul  9 10:17:33 2011
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 9 Jul 2011 10:17:33 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu>
Message-ID:

Hi,

On Sat, Jul 9, 2011 at 5:20 AM, William ML Leslie wrote:
> My point about small integers (...)

I think that your point about small integers is broken (even assuming that smallints are enabled by default, which is not the case).
It means that we'd get an id() function which "behaves" as long as taking the id() of a 31-bit signed integer, but doesn't behave as expected for full 32-bit integers, nor for longs, nor for floats.

>> This happens to work for larger integers and floats, because the id
>> will be preserved as long as a reference to them exists by their
>> boxedness.
>> Floats could use a similar mechanism to integers, eg, their bit
>> representation shifted left two and bitwise-ored with 2.

I don't understand these two sentences, because they seem to say the exact opposite of each other...

>> That does mean that id() is no longer word-sized, but it does not make it
>> unbounded.

The "unbounded" part in my e-mail was about longs. Obviously if you are computing id(x) where x is in some finite set (like ints or floats), then the results are in some finite set too.

> What is keeping us from using the underlying rpython string as the
> basis for id?

This is probably a good enough workaround for strings and unicodes.

> Speaking of, maybe it'd be easier to attempt to get the identity dict
> into the language proper.

We tried at some point, but python-dev refused the idea. Maybe the idea has more chances for approval now that we can really show with performance numbers that it's a deep issue, as opposed to just waving our hands. Feel free to try again. In the meantime I've decided, at least myself, to stick with the approach that Python is whatever is in 2.7, and that we have to work around such issues instead of fixing them properly in the language.

A bientôt,

Armin.

From osadchiy.ilya at gmail.com  Sat Jul  9 11:19:26 2011
From: osadchiy.ilya at gmail.com (Ilya Osadchiy)
Date: Sat, 9 Jul 2011 12:19:26 +0300
Subject: [pypy-dev] Floating point computation
Message-ID:

>>> The Translation and JIT backend for PyPy may be able to allow Python
>>> programmers to use SIMD instructions directly from Python.
>>
>> We kind of want to do that automatically for numpy operations.
>
> Can all basic SIMD instructions be mapped to numpy array operations?
> If so, why doesn't numpy already use them?  (Or does it?)

I think CPython's numpy may be compiled with auto-vectorization and therefore use SIMD. It also has an optional dependency on LAPACK, and AFAIK most LAPACK implementations use SIMD.

> And if not, then is there some existing C extension module for CPython
> that exposes them somehow?  Of course CPython cannot hope for
> impressive speedups with such an extension module, but such a module
> in PyPy could be JITted and give massive speed benefits.

I think that JITting element-wise operations to SIMD assembly is feasible. Logically you can see it like:

    normal JIT => loop unroll => instruction reordering
               => substitute N instructions of same type with 1 SIMD instruction

Of course there are problems (loads and stores need special treatment, not every instruction has a SIMD equivalent, register allocation differs, etc.) But I think the bigger problem is JITting SIMD for things like matrix multiply and FFT, which are more than a single loop, and where the order of calculations may seriously differ between scalar and SIMD versions.

From neppord at gmail.com  Sat Jul  9 19:24:08 2011
From: neppord at gmail.com (Samuel Ytterbrink)
Date: Sat, 9 Jul 2011 19:24:08 +0200
Subject: [pypy-dev] integrity of code
Message-ID:

Hi!

I was just thinking and then asked myself... Does writing things in RPython and then compiling the code with translate and a C compiler make it as hard to read as a C program? Thinking that Python sometimes has a hard time being a commercial tool for app making: companies don't want end users to be able to read their code.

Am I correct, and if so, are there any improvements you could do to make the interpreter read a more 'secret' version of Python code, some kind of binary code (or does the bytecode work for this)?

Hope this is not a stupid or foolish topic.

--
//Samuel Ytterbrink

From ademan555 at gmail.com  Sun Jul 10 00:14:08 2011
From: ademan555 at gmail.com (Dan Roberts)
Date: Sat, 9 Jul 2011 15:14:08 -0700
Subject: [pypy-dev] integrity of code
In-Reply-To:
References:
Message-ID:

Hi Samuel,

If I understand your question correctly, you are a bit confused about RPython. Only the interpreter is translated to C and then compiled. The source code to the interpreter is obviously not secret at all; we all work on it and share it on bitbucket. User code might be secret, but it is not passed through the translation toolchain. In the case of Python on PyPy the user code is compiled to a bytecode very similar to CPython's bytecode. I would say this bytecode is quite readable, so it's significantly more readable than optimized x86 output. I suppose it would be possible to add some obfuscation to the bytecode but I don't know how one would do that effectively without modifying the entire bytecode interpreter and compiler, which would be an undertaking... I may have misunderstood you though, feel free to correct me.

Cheers,
Dan

On Jul 9, 2011 10:24 AM, "Samuel Ytterbrink" wrote:
> Hi!
>
> I was just thinking and then asked myself... Does writing things in RPython
> and then compiling the code with translate and a C compiler make it as hard
> to read as a C program? [...]

From bokr at oz.net  Sun Jul 10 15:48:08 2011
From: bokr at oz.net (Bengt Richter)
Date: Sun, 10 Jul 2011 15:48:08 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu>
Message-ID:

On 07/09/2011 10:17 AM Armin Rigo wrote:
> Hi,
>
> On Sat, Jul 9, 2011 at 5:20 AM, William ML Leslie wrote:
>> My point about small integers (...)
>
> I think that your point about small integers is broken (even assuming
> that smallints are enabled by default, which is not the case). [...]
>
> We tried at some point, but python-dev refused the idea.  Maybe the
> idea has more chances for approval now that we can really show with
> performance numbers that it's a deep issue, as opposed to just waving
> our hands.  Feel free to try again.  In the meantime I've decided, at
> least myself, to stick with the approach that Python is whatever is in
> 2.7, and that we have to work around such issues instead of fixing
> them properly in the language.

Does that mean you have to follow the vagaries of the 2.7 compiler optimizations (or lack of them, as the case may be) that make for results like these? (note fresh 2.7.2 build below ;-) [& BTW kudos to those who made it easy with the tarball and config, make]

    [10:30 ~]$ python
    Python 2.7.2 (default, Jul  8 2011, 23:38:53)
    [GCC 4.1.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from ut.miscutil import disev,disex
    >>> id(2000) == id(2000)
    True
    >>> id(2000) == id(20*100)
    False
    >>> disev('id(2000) == id(2000)')
      1           0 LOAD_NAME                0 (id)
                  3 LOAD_CONST               0 (2000)
                  6 CALL_FUNCTION            1
                  9 LOAD_NAME                0 (id)
                 12 LOAD_CONST               0 (2000)
                 15 CALL_FUNCTION            1
                 18 COMPARE_OP               2 (==)
                 21 RETURN_VALUE
    >>> disev('id(2000) == id(20*100)')
      1           0 LOAD_NAME                0 (id)
                  3 LOAD_CONST               0 (2000)
                  6 CALL_FUNCTION            1
                  9 LOAD_NAME                0 (id)
                 12 LOAD_CONST               3 (2000)
                 15 CALL_FUNCTION            1
                 18 COMPARE_OP               2 (==)
                 21 RETURN_VALUE
    >>>

Notice that 20*100 was folded to 2000, but a different constant was generated, hence, presumably, the different id. Will the next update of 2.7 do the same?

BTW, disev above is just from a grabbag module of mine, amounting to

    def disev(s):
        import dis
        return dis.dis(compile(s,'disev','eval'))

It doesn't have to be about integers:

    >>> id([]) == id([])
    True
    >>> id(list()) == id(list())
    False
    >>> disev('id([]) == id([])')
      1           0 LOAD_NAME                0 (id)
                  3 BUILD_LIST               0
                  6 CALL_FUNCTION            1
                  9 LOAD_NAME                0 (id)
                 12 BUILD_LIST               0
                 15 CALL_FUNCTION            1
                 18 COMPARE_OP               2 (==)
                 21 RETURN_VALUE
    >>> disev('id(list()) == id(list())')
      1           0 LOAD_NAME                0 (id)
                  3 LOAD_NAME                1 (list)
                  6 CALL_FUNCTION            0
                  9 CALL_FUNCTION            1
                 12 LOAD_NAME                0 (id)
                 15 LOAD_NAME                1 (list)
                 18 CALL_FUNCTION            0
                 21 CALL_FUNCTION            1
                 24 COMPARE_OP               2 (==)
                 27 RETURN_VALUE
    >>>

Of course, the is operator call will keep both references alive on the stack, so whereas you get

    >>> id([]) == id([])
    True
    >>> id([]), id([])
    (3082823052L, 3082823052L)

Keeping the arguments alive simultaneously gives

    >>> (lambda x,y:(id(x),id(y)))([],[])
    (3082823052L, 3082881516L)
    >>> (lambda x,y:(id(x)==id(y)))([],[])
    False
    >>>

... assuming the lambda above approximates what `is' does.

Using id comparisons instead of `is' can look superficially a bit weird:

    >>> id([1]) == id([2])
    True
    >>>

Bottom line: id does not seem to have a complete abstract definition[1][2], so how may one define a 'correct' implementation? ISTM id now (CPython 2.7.2) is more like a version-dependent debugger function that can be useful in app hacks, but with caveats ;-)

Regards,
Bengt Richter

[1] http://docs.python.org/reference/datamodel.html#index-824
[2] http://docs.python.org/library/functions.html#id

PS.
How does the following cachingid.py compare to your idea of what id should ideally do? (not tested except example) (Obviously it is just a concept implementation, not resource efficient)

    ________________________________________
    refval = id # for using old id
    def id(x, objcache = {}):
        if type(x) in (int,float,bool,tuple,str,unicode): # etc
            t = (type(x), x)
            return refval(objcache.setdefault(t, t)[1])
        else:
            return refval(x) # as now? XXX what is abstract meaning of id(some_mutable)?
    ________________________________________

It tries to return the ordinary id of the first instance of equivalent immutable objects passed to it, and otherwise uses the old id, which I'm not finished thinking about ;-)

At least it fixes the difference between id(2000) and id(20*100):

    >>> oldid=id
    >>> from ut.cachingid import id # ut is just my utility collection
    >>> oldid(2000)==oldid(2000)
    True
    >>> oldid(2000)==oldid(20*100)
    False
    >>> id(2000)==id(2000)
    True
    >>> id(2000)==id(20*100)
    True
    >>> id.func_defaults
    ({(<type 'int'>, 2000): (<type 'int'>, 2000)},)
    >>>

From arigo at tunes.org  Sun Jul 10 16:09:11 2011
From: arigo at tunes.org (Armin Rigo)
Date: Sun, 10 Jul 2011 16:09:11 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu>
Message-ID:

Hi Bengt,

On Sun, Jul 10, 2011 at 3:48 PM, Bengt Richter wrote:
> >>> id([1]) == id([2])
> True

As pointed out by Carl Friedrich, the real definition of "id" is:

    * if x and y are two variables, then "x is y" <=> "id(x) == id(y)".

That's why in any Python implementation,

    >>> x=[1]; y=[2]; id(x) == id(y)

must return False, but not necessarily id([1]) == id([2]).

A bientôt,

Armin.

From bokr at oz.net  Sun Jul 10 18:08:41 2011
From: bokr at oz.net (Bengt Richter)
Date: Sun, 10 Jul 2011 18:08:41 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To:
References: <20110708075857.GF20287@merlinux.eu>
Message-ID: <4E19CE89.6070905@oz.net>

On 07/10/2011 04:09 PM Armin Rigo wrote:
> Hi Bengt,
>
> On Sun, Jul 10, 2011 at 3:48 PM, Bengt Richter wrote:
>> >>> id([1]) == id([2])
>> True

True, I did write that. The key word in my line before that was `superficially' ;-)

> As pointed out by Carl Friedrich, the real definition of "id" is:
>
> * if x and y are two variables, then "x is y" <=> "id(x) == id(y)".

So conceptually, id is about "variables" -- not objects? Isn't a "variable" just a special case of an expression (yielding a reference to a result object, like other expressions)?

ISTM that the key thing above is that "if x and y are two variables" the two (or one if same) referenced objects are guaranteed to be live at the same time. So they can't be in the same "location" unless they are the same object. Someone else mentioned this liveness issue too.

> That's why in any Python implementation,
>
>>>> x=[1]; y=[2]; id(x) == id(y)
>
> must return False, but not necessarily id([1]) == id([2]).

The definition with "two variables" seems unnecessarily restrictive, unless I am wrong that

    x=[1]; id(x) == id([2])

must also return False, since the x reference guarantees that the [2] cannot be created in the same "location" as the [1] held by x.

Something about returning location from id reminds me of the need for automatic closure creation in another context. Letting the expression result die and returning a kind of pointer to where the result object *was* seems like a dangling pointer problem, except I guess you can't dereference an id value (without hackery).
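(The "hackery" does exist on CPython specifically, since there id() is the object's address; this is a well-known ctypes recipe, shown here purely as an illustration and only safe while the object is still alive:

    import ctypes

    x = ['still', 'alive']
    addr = id(x)
    same = ctypes.cast(addr, ctypes.py_object).value
    assert same is x
    # If x had died in the meantime, the cast would read freed memory:
    # exactly the dangling-id problem being discussed here.

No such trick is portable to PyPy, where id() is not an address.)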
Maybe id should raise an exception if the argument referenced only has a ref count of 1 (i.e., just the reference from the argument list)?

Or else let id be a class and return a minimal instance only binding the passed object, and customize the compare ops to take into account type diffs etc.? Then there would be no id values without corresponding objects, and id values used in expressions would die a natural death, along with their references to their objects -- whether "variables" or expressions.

Sorry to belabor the obvious ;-)

Regards,
Bengt

From lac at openend.se  Sun Jul 10 21:13:26 2011
From: lac at openend.se (Laura Creighton)
Date: Sun, 10 Jul 2011 21:13:26 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To: Message from Bengt Richter of "Sun, 10 Jul 2011 18:08:41 +0200." <4E19CE89.6070905@oz.net>
References: <20110708075857.GF20287@merlinux.eu> <4E19CE89.6070905@oz.net>
Message-ID: <201107101913.p6AJDQm6027255@theraft.openend.se>

What do we want to happen when somebody -- say in a C extension -- takes the id of an object that is scheduled to be removed when the gc next runs?

Laura

From bokr at oz.net  Mon Jul 11 12:29:34 2011
From: bokr at oz.net (Bengt Richter)
Date: Mon, 11 Jul 2011 12:29:34 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To: <201107101913.p6AJDQm6027255@theraft.openend.se>
References: <20110708075857.GF20287@merlinux.eu> <4E19CE89.6070905@oz.net> <201107101913.p6AJDQm6027255@theraft.openend.se>
Message-ID:

On 07/10/2011 09:13 PM Laura Creighton wrote:
> What do we want to happen when somebody -- say in a C extension -- takes the id of an object
> that is scheduled to be removed when the gc next runs?

IMO taking the id should increment the object ref counter and prevent the garbage collection, until the id value itself is garbage collected. The obvious way would be to make id an object that keeps a reference to the object whose id it represents. See below[1] for an example (just for discussion illustration).

Of course in low level access, conventions of ownership can sometimes safely optimize away actual ref incr/decr, but it sounds like your example proposes "taking an id" after the ref count has gone to zero. That's like doing a reinterpret_cast to integer of a malloc pointer after the area's been freed, and expecting it to mean something. It should be an enforced no-no, if you ask me.

The example id attempts to make all equivalent immutables of the same type have the same id, by taking advantage of dict's key comparison properties and the .setdefault method. To get something like the current id, with just the object reference to make an id hold its object, use IdHolder in place of Id. But then you can get all kinds of ids for the same immutable value, as with the old id.
def __eq__(self, other): if type(self) != type(other): raise TypeError('Id instances can only be compared with each other,' 'not to "%s" instances.'% type(other).__name__) tobj=type(self.obj) tother=type(other.obj) if tobj != tother: return False return self.id() == other.id() def __repr__(self): return ''%(self.obj, self.id(), self.refval(self.obj)) def __str__(self): return ''%(self.obj,) class IdHolder(object): refval = id # old id function def __init__(self, obj): self.obj = obj self.id = self.refval(obj) def __eq__(self, other): if type(self) != type(other): raise TypeError('IdHolder instances can only be compared with each other,' ' not to "%s" instances.'% type(other).__name__) return self.id == other.id def __repr__(self): return ''%(self.obj, self.id) def __str__(self): return ''%(self.obj,) _______________________________________________________________________________ Python 2.7.2 (default, Jul 8 2011, 23:38:53) [GCC 4.1.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> oldid=id >>> from ut.idstuff import IdHolder as id >>> from ut.idstuff import Id as idk # with caching for equal (type,value) immutables >>> oldid(2000),oldid(2000),oldid(20*100) (136189164, 136189164, 136189176) >>> id(2000),id(2000),id(20*100) # no k cacheing (, , ) >>> idk(2000),idk(2000),idk(20*100) # with k cacheing (, , ) >>> >>> oldid([]),oldid([]) # dangling pointer value returned (3083895948L, 3083895948L) >>> id([]),id([]) # pointer kept live (, ) >>> idk([]),idk([]) # pointer kept live, constant caching n/a (, ) >>> Have fun ;-) Regards Bengt Richter > > Laura = From william.leslie.ttg at gmail.com Mon Jul 11 13:30:38 2011 From: william.leslie.ttg at gmail.com (William ML Leslie) Date: Mon, 11 Jul 2011 21:30:38 +1000 Subject: [pypy-dev] Object identity and dict strategies In-Reply-To: References: <20110708075857.GF20287@merlinux.eu> Message-ID: On 9 July 2011 18:17, Armin Rigo wrote: > Hi, > > On Sat, Jul 9, 2011 at 5:20 AM, William ML Leslie > wrote: >> My point about small integers (...) > > I think that your point about small integers is broken (even assuming > that smallints are enabled by default, which is not the case). ?It > means that we'd get an id() function which "behaves" as long as taking > the id() of a 31-bit signed integer, and then doesn't behave as > expected neither for full 32-bit integers, nor for longs, nor for > floats. > >> ... > > I don't understand these two sentences, because they seem to say the > exact opposite of each other... For a non-smallint integer or float bound to x, "x is x" is tautological by virtue of x being represented by the same instance. There may be other objects with the same value, but I don't see why that must imply that they be the same object - why x == y must imply x is y for x and y of the same immutable type. It might make the identity dict in copy.py use slightly less memory, but it would make *much* more sense to optimise the specific use case according to the defined semantics of id(). copy.py is already effectively "broken" by the behaviour of non-cached ints on cpython; so copy.py is no excuse to break id() in pypy, which is a much more fundamental concept. >>?That does mean that id() is no longer word-sized, but it does not make it >> unbounded. > > The "unbounded" part in my e-mail was about longs. ?Obviously if you > are computing id(x) where x is in some finite set (like ints or > floats), then the results are in some finite set too. 
I'm not disagreeing with you there, but id has a *universally* finite range already, and imao, to change this is to make an annoying change to the language. >> Speaking of, maybe it'd be easier to attempt to get the identity dict >> into the language proper. > > We tried at some point, but python-dev refused the idea. ?Maybe the > idea has more chances for approval now that we can really show with > performance numbers that it's a deep issue, as opposed to just wave > our hands. ?Feel free to try again. ?In the meantime I've decided, at > least myself, to stick with the approach that Python is whatever is in > 2.7, and that we have to work around such issues instead of fixing > them properly in the language. Ok. It would be little help to existing code. -- William Leslie From william.leslie.ttg at gmail.com Mon Jul 11 13:36:02 2011 From: william.leslie.ttg at gmail.com (William ML Leslie) Date: Mon, 11 Jul 2011 21:36:02 +1000 Subject: [pypy-dev] Object identity and dict strategies In-Reply-To: References: <20110708075857.GF20287@merlinux.eu> <4E19CE89.6070905@oz.net> <201107101913.p6AJDQm6027255@theraft.openend.se> Message-ID: On 11 July 2011 20:29, Bengt Richter wrote: > On 07/10/2011 09:13 PM Laura Creighton wrote: >> >> What do we want to happen when somebody -- say in a C extension -- takes >> the id of an object >> that is scheduled to be removed when the gc next runs? > > IMO taking the id should increment the object ref counter > and prevent the garbage collection, until the id value itself is garbage > collected. This significantly changes the meaning of id() in a way that will break existing code. If you want an object reference, just use one. If you want them to be persistent, build a dictionary from id to object. You can already do this yourself in pure python, and it doesn't have the side-effect of bloating id(). Otherwise, such a suggestion should go through the usual process for such a significant change to a language primitive. -- William Leslie From bokr at oz.net Mon Jul 11 15:21:21 2011 From: bokr at oz.net (Bengt Richter) Date: Mon, 11 Jul 2011 15:21:21 +0200 Subject: [pypy-dev] Object identity and dict strategies In-Reply-To: References: <20110708075857.GF20287@merlinux.eu> <4E19CE89.6070905@oz.net> <201107101913.p6AJDQm6027255@theraft.openend.se> Message-ID: On 07/11/2011 01:36 PM William ML Leslie wrote: > On 11 July 2011 20:29, Bengt Richter wrote: >> On 07/10/2011 09:13 PM Laura Creighton wrote: >>> >>> What do we want to happen when somebody -- say in a C extension -- takes >>> the id of an object >>> that is scheduled to be removed when the gc next runs? >> >> IMO taking the id should increment the object ref counter >> and prevent the garbage collection, until the id value itself is garbage >> collected. > > This significantly changes the meaning of id() in a way that will > break existing code. > Do you have an example of existing code that depends on the integer-cast value of a dangling pointer?? Or do you mean that id's must be allowed to be compared == to integers, which my example prohibits? (I didn't define __cmp__, BTW, just lazy ;-) > If you want an object reference, just use one. If you want them to be > persistent, build a dictionary from id to object. Yes, dictionary is one way to bind an object and thus make sure its id is valid. 
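A minimal version of the id-to-object registry William describes might look like this (an illustrative sketch, not a library API); holding the object in the dict is precisely what keeps its id valid:

    _registry = {}

    def checkout(obj):
        token = id(obj)
        _registry[token] = obj     # keep obj alive, so its id stays valid
        return token

    def redeem(token):
        return _registry[token]

    def release(token):
        del _registry[token]       # after this, the token may be reused

    t = checkout([1, 2, 3])
    assert redeem(t) == [1, 2, 3]
    release(t)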
But it would be overkill to use a dictionary to guarantee object id persistence just for the duration of an expression such as id(x.a) == id(y.a) It might be unusual, but the .a could be a property returning a dynamic value and you might be testing to see if the two return the same object, as they might if the the property get caches such values. Perhaps it is a test to verify that you have the caching version of the app. Artificial example perhaps, but current id could give false results, with id just pointing to the same dead temp space. That's an example of code that would *fail* with the current id, and be ok with id as either of Id or IdHold. As it stands, the integer returned by id persists during evaluation of an expression at least, but its validity does not necessarily last with the value even that long, as we see from the perverse (but easily explained) example: >>> id([0]) == id([1]) True >>> id([0]), id([1]) (3084230700L, 3084230700L) So at a minimum, I would think the documentation should say that an id call may return a value implicitly referencing garbage, besides hinting that there may be peculiarities about id-ing some objects. > You can already do > this yourself in pure python, and it doesn't have the side-effect of > bloating id(). My examples *are* in pure python ;-) > > Otherwise, such a suggestion should go through the usual process for > such a significant change to a language primitive. > Sure, but I only really want to understand the real (well, *intended* ;-) meaning of the id function, so I am putting forth illustrative examples to identify aspects of its current and possible behavior. Also, a new id could live alongside the old ;-) Regards, Bengt Richter > -- > William Leslie From william.leslie.ttg at gmail.com Mon Jul 11 16:21:15 2011 From: william.leslie.ttg at gmail.com (William ML Leslie) Date: Tue, 12 Jul 2011 00:21:15 +1000 Subject: [pypy-dev] Object identity and dict strategies In-Reply-To: References: <20110708075857.GF20287@merlinux.eu> <4E19CE89.6070905@oz.net> <201107101913.p6AJDQm6027255@theraft.openend.se> Message-ID: On 11 July 2011 23:21, Bengt Richter wrote: > On 07/11/2011 01:36 PM William ML Leslie wrote: >> >> On 11 July 2011 20:29, Bengt Richter ?wrote: >>> >>> On 07/10/2011 09:13 PM Laura Creighton wrote: >>>> >>>> What do we want to happen when somebody -- say in a C extension -- takes >>>> the id of an object >>>> that is scheduled to be removed when the gc next runs? >>> >>> IMO taking the id should increment the object ref counter >>> and prevent the garbage collection, until the id value itself is garbage >>> collected. >> >> This significantly changes the meaning of id() in a way that will >> break existing code. >> > Do you have an example of existing code that depends on the integer-cast > value of a dangling pointer?? I mean that id creating a reference will break existing code. id() has always returned an integer, and the existence of some integer in some python code has never prevented some otherwise unrelated object from being collected. Existing code will not make sure that it cleans up the return value of id(), as nowhere has id() ever kept a reference to the object passed in. I know that you are suggesting that id returns something that is /not/ an integer, but that is also a language change. People have always been able to assume that they can % format ids as decimals or hexadecimals. > Or do you mean that id's must be allowed to be compared == to integers, > which my example prohibits? 
> (I didn't define __cmp__, BTW, just lazy ;-)

I mean that id creating a reference will break existing code. id() has always returned an integer, and the existence of some integer in some python code has never prevented some otherwise unrelated object from being collected. Existing code will not make sure that it cleans up the return value of id(), as nowhere has id() ever kept a reference to the object passed in.

I know that you are suggesting that id returns something that is /not/ an integer, but that is also a language change. People have always been able to assume that they can % format ids as decimals or hexadecimals.

> Or do you mean that id's must be allowed to be compared == to integers,
> which my example prohibits? (I didn't define __cmp__, BTW, just lazy ;-)

Good, __cmp__ has been deprecated for over 10 years now.

>> If you want an object reference, just use one.  If you want them to be
>> persistent, build a dictionary from id to object.
>
> Yes, dictionary is one way to bind an object and thus make sure its id is
> valid.
>
> But it would be overkill to use a dictionary to guarantee object id
> persistence just for the duration of an expression such as id(x.a) == id(y.a)

But id is not about persistence. The lack of persistence is one of its key features.

That said, I do think id()'s current behaviour is overkill. I just don't think we can change it in a way that will fit existing usage. And cleaning it up properly is far too much work.

>> You can already do
>> this yourself in pure python, and it doesn't have the side-effect of
>> bloating id().
>
> My examples *are* in pure python ;-)

As is copy.py. We've seen several examples on this thread where you can build additional features on top of what id() gives you without changing id(). So we've no need to break id() in any of the ways that have been suggested here.

>> Otherwise, such a suggestion should go through the usual process for
>> such a significant change to a language primitive.
>
> Sure, but I only really want to understand the real (well, *intended* ;-)
> meaning of the id function, so I am putting forth illustrative examples
> to identify aspects of its current and possible behavior.

The notion of identity is important in any stateful language. Referential equivalence, which is a slightly more complicated (yet much better defined) idea, says that x and y are equivalent when no operation can tell the difference between the two objects. 'is' is an approximation that is at least accurate for mutability of python objects. In order for x to "be" y, assignments like x.__class__ = Foo must have exactly the same effect as y.__class__ = Foo. You could presumably write a type in the implementation language that was in no way discernable from the real x, but if x is y, you *know* there is no difference.

What id() does is it attempts to distil 'the thing compared' when 'is' is used. On cpython, it just returned the integer value of the pointer to the object, because on cpython that is cheap and does the job (and hey, it *is* the thing compared when you do 'is' on cpython).

On pypy, things are slightly more complicated. Pypy is written in python, which has no concept of pointers. It translates to the JVM and the (safe) CLI, neither of which have a direct analogue of the pointer. And even when C or LLVM is used as the backend, objects may move around in memory. Having id() return different values after a collection cycle would be very confusing. So, pypy implements its own, quite clever mechanism for creating ids. It is described in a blog post, if you'd like to read it.

The definition of id(), according to docs.python.org, is:

    Return the "identity" of an object. This is an integer (or long integer)
    which is guaranteed to be unique and constant for this object during its
    lifetime. Two objects with non-overlapping lifetimes may have the same
    id() value.

> Also, a new id could live alongside the old ;-)

It's just that the problems you are attempting to fix are already solved, and they are only vaguely related to what a python programmer understands id() to mean. If, according to cpython, "1003 is not 1000 + 3", then programmers can't rely on any excellent new behaviour for id() *anyway*.
OTOH, the "identity may not even be preserved for primitive types" issue is an observable difference to cpython and is fixable, even if it is a silly thing to rely on. -- William Leslie From william.leslie.ttg at gmail.com Mon Jul 11 16:23:36 2011 From: william.leslie.ttg at gmail.com (William ML Leslie) Date: Tue, 12 Jul 2011 00:23:36 +1000 Subject: [pypy-dev] Object identity and dict strategies In-Reply-To: References: <20110708075857.GF20287@merlinux.eu> <4E19CE89.6070905@oz.net> <201107101913.p6AJDQm6027255@theraft.openend.se> Message-ID: On 12 July 2011 00:21, William ML Leslie wrote: > Referential equivalence, which is a slightly more complicated (yet > much better defined) idea says that x and y are equivalent when no > operation can tell the difference between the two objects. Ack, sorry. I meant Referential Transparency; which is much more googleable! -- William Leslie From fijall at gmail.com Tue Jul 12 01:20:48 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Tue, 12 Jul 2011 01:20:48 +0200 Subject: [pypy-dev] Benchmarks Message-ID: Hi I'm a bit worried with our current benchmarks state. We have around 4 benchmarks that had reasonable slowdowns recently and we keep putting new features that speed up other things. How can we even say we have actually fixed the original issue? Can we have a policy of not merging new performance features before having a story why benchmarks got slower? Current list: http://speed.pypy.org/timeline/?exe=1&base=none&ben=spectral-norm&env=tannit&revs=50 http://speed.pypy.org/timeline/?exe=1&base=none&ben=spitfire&env=tannit&revs=50 This is a good example why we should not work the way we work now: http://speed.pypy.org/timeline/?exe=1&base=none&ben=slowspitfire&env=tannit&revs=200 There was an issue, then the issue was fixed, but apparently not quite (7th of June is quite a bit slower than 25th of May) and then recently we introduced something that make it faster alltogether. Can we even fish the original issue? http://speed.pypy.org/timeline/?exe=1&base=none&ben=bm_mako&env=tannit&revs=200 http://speed.pypy.org/timeline/?exe=1&base=none&ben=nbody_modified&env=tannit&revs=50 (is it relevant or just noise?) http://speed.pypy.org/timeline/?exe=1&base=none&ben=telco&env=tannit&revs=50 Cheers, fijal From anto.cuni at gmail.com Tue Jul 12 01:29:30 2011 From: anto.cuni at gmail.com (Antonio Cuni) Date: Tue, 12 Jul 2011 01:29:30 +0200 Subject: [pypy-dev] Benchmarks In-Reply-To: References: Message-ID: <4E1B875A.9080702@gmail.com> On 12/07/11 01:20, Maciej Fijalkowski wrote: > Hi > > I'm a bit worried with our current benchmarks state. We have around 4 > benchmarks that had reasonable slowdowns recently and we keep putting > new features that speed up other things. How can we even say we have > actually fixed the original issue? Can we have a policy of not merging > new performance features before having a story why benchmarks got > slower? I think we really need to have support for branches on codespeed. Then, we can have a policy that a branch can be merged only if none of the benchmarks slows down. 
I heard that someone is working on it, but nothing concrete AFAIK, so I'm
considering doing the work myself (although I would prefer to work on
something more in the core :-/)

ciao,
Anto

From alex.gaynor at gmail.com  Tue Jul 12 01:31:38 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Mon, 11 Jul 2011 16:31:38 -0700
Subject: [pypy-dev] Benchmarks
In-Reply-To: <4E1B875A.9080702@gmail.com>
References: 
	<4E1B875A.9080702@gmail.com>
Message-ID: 

On Mon, Jul 11, 2011 at 4:29 PM, Antonio Cuni wrote:

> On 12/07/11 01:20, Maciej Fijalkowski wrote:
> > Hi
> >
> > I'm a bit worried about our current benchmarks state. We have around 4
> > benchmarks that had reasonable slowdowns recently and we keep putting
> > new features that speed up other things. How can we even say we have
> > actually fixed the original issue? Can we have a policy of not merging
> > new performance features before having a story about why benchmarks got
> > slower?
>
> I think we really need to have support for branches on codespeed. Then, we
> can
> have a policy that a branch can be merged only if none of the benchmarks
> slows
> down.
>
> I heard that someone is working on it, but nothing concrete AFAIK, so I'm
> considering doing the work myself (although I would prefer to work on
> something more in the core :-/)
>
> ciao,
> Anto
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> http://mail.python.org/mailman/listinfo/pypy-dev
>

Isn't there a GSoC on that?

Anyway +1 from me, if there's a regression it needs to be fixed, or
reverted.

Alex

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

From ademan555 at gmail.com  Tue Jul 12 02:51:25 2011
From: ademan555 at gmail.com (Dan Roberts)
Date: Mon, 11 Jul 2011 17:51:25 -0700
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
	<4E1B875A.9080702@gmail.com>
Message-ID: 

+1 as well, but we need to make sure we can actually identify performance
regressions. I don't know if currently we can draw any conclusions from
significant drops in benchmark performance. +100 for benchmarking branches.
I think there's a tangentially related GSoC (moving toward speed.python.org
and being multi-interpreter), but I don't remember seeing a blog post
announcing the accepted projects.
Cheers,
Dan
On Jul 11, 2011 4:32 PM, "Alex Gaynor" wrote:
> On Mon, Jul 11, 2011 at 4:29 PM, Antonio Cuni wrote:
>
>> On 12/07/11 01:20, Maciej Fijalkowski wrote:
>> > Hi
>> >
>> > I'm a bit worried about our current benchmarks state. We have around 4
>> > benchmarks that had reasonable slowdowns recently and we keep putting
>> > new features that speed up other things. How can we even say we have
>> > actually fixed the original issue? Can we have a policy of not merging
>> > new performance features before having a story about why benchmarks got
>> > slower?
>>
>> I think we really need to have support for branches on codespeed. Then, we
>> can
>> have a policy that a branch can be merged only if none of the benchmarks
>> slows
>> down.
>>
>> I heard that someone is working on it, but nothing concrete AFAIK, so I'm
>> considering doing the work myself (although I would prefer to work on
>> something more in the core :-/)
>>
>> ciao,
>> Anto
>> _______________________________________________
>> pypy-dev mailing list
>> pypy-dev at python.org
>> http://mail.python.org/mailman/listinfo/pypy-dev
>>
>
> Isn't there a GSoC on that?
>
> Anyway +1 from me, if there's a regression it needs to be fixed, or
> reverted.
>
> Alex
>
> --
> "I disapprove of what you say, but I will defend to the death your right to
> say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
> "The people's good is the highest law." -- Cicero

From fijall at gmail.com  Tue Jul 12 09:01:01 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 12 Jul 2011 09:01:01 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: <4E1B875A.9080702@gmail.com>
References: 
	<4E1B875A.9080702@gmail.com>
Message-ID: 

On Tue, Jul 12, 2011 at 1:29 AM, Antonio Cuni wrote:
> On 12/07/11 01:20, Maciej Fijalkowski wrote:
>> Hi
>>
>> I'm a bit worried about our current benchmarks state. We have around 4
>> benchmarks that had reasonable slowdowns recently and we keep putting
>> new features that speed up other things. How can we even say we have
>> actually fixed the original issue? Can we have a policy of not merging
>> new performance features before having a story about why benchmarks got
>> slower?
>
> I think we really need to have support for branches on codespeed. Then, we can
> have a policy that a branch can be merged only if none of the benchmarks slows
> down.

I'll follow up on the branches, but the issue is a bit irrelevant - we
still have performance regressions by trunk checkins as well.

>
> I heard that someone is working on it, but nothing concrete AFAIK, so I'm
> considering doing the work myself (although I would prefer to work on
> something more in the core :-/)
>
> ciao,
> Anto
>

From anto.cuni at gmail.com  Tue Jul 12 09:06:33 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Tue, 12 Jul 2011 09:06:33 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
	<4E1B875A.9080702@gmail.com>
Message-ID: <4E1BF279.6000009@gmail.com>

On 12/07/11 09:01, Maciej Fijalkowski wrote:
> I'll follow up on the branches, but the issue is a bit irrelevant - we
> still have performance regressions by trunk checkins as well.

it's not irrelevant.
It won't solve the current issues, but it will help avoid having
more in the future.

ciao,
Anto
_______________________________________________
pypy-dev mailing list
pypy-dev at python.org
http://mail.python.org/mailman/listinfo/pypy-dev

From tobami at googlemail.com  Tue Jul 12 09:16:25 2011
From: tobami at googlemail.com (Miquel Torres)
Date: Tue, 12 Jul 2011 09:16:25 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: <4E1BF279.6000009@gmail.com>
References: 
	<4E1B875A.9080702@gmail.com>
	<4E1BF279.6000009@gmail.com>
Message-ID: 

Branches are already implemented. speed.pypy.org just needs to be
upgraded to Codespeed 0.8.x and its data migrated.

That has not yet been done because we wanted to move to ep.io, which
has not yet happened, and we are working on speed.python.org and
somehow things have stalled.

Migrating current speed.pypy.org (I mean keeping it on the current
server but upgrading) would not take me very long. But it is on the
way out...

2011/7/12 Antonio Cuni :
> On 12/07/11 09:01, Maciej Fijalkowski wrote:
>> I'll follow up on the branches, but the issue is a bit irrelevant - we
>> still have performance regressions by trunk checkins as well.
>
> it's not irrelevant. It won't solve the current issues, but it will help
> avoid having more in the future.
>
> ciao,
> Anto
>

From fijall at gmail.com  Tue Jul 12 09:19:10 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 12 Jul 2011 09:19:10 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
	<4E1B875A.9080702@gmail.com>
	<4E1BF279.6000009@gmail.com>
Message-ID: 

On Tue, Jul 12, 2011 at 9:16 AM, Miquel Torres wrote:
> Branches are already implemented. speed.pypy.org just needs to be
> upgraded to Codespeed 0.8.x and its data migrated.

I can help you with that, but that means having local changes from
speed.pypy.org committed somewhere else.

>
> That has not yet been done because we wanted to move to ep.io, which
> has not yet happened, and we are working on speed.python.org and
> somehow things have stalled.

speed.python.org is completely irrelevant here.

>
> Migrating current speed.pypy.org (I mean keeping it on the current
> server but upgrading) would not take me very long. But it is on the
> way out...

We either should do the ep.io move (data has been imported up until
very recently) or upgrade now.

From anto.cuni at gmail.com  Tue Jul 12 09:46:00 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Tue, 12 Jul 2011 09:46:00 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
	<4E1B875A.9080702@gmail.com>
	<4E1BF279.6000009@gmail.com>
Message-ID: <4E1BFBB8.7040808@gmail.com>

On 12/07/11 09:19, Maciej Fijalkowski wrote:
> On Tue, Jul 12, 2011 at 9:16 AM, Miquel Torres wrote:
>> Branches are already implemented. speed.pypy.org just needs to be
>> upgraded to Codespeed 0.8.x and its data migrated.

oh, this is very cool, thank you :-)

> I can help you with that, but that means having local changes from
> speed.pypy.org committed somewhere else.

what are the local changes about?

> We either should do the ep.io move (data has been imported up until
> very recently) or upgrade now.

+1 for doing something, whatever the something is. I am willing to help,
although I'm not sure what is needed.

ciao,
Anto

From holger at merlinux.eu  Tue Jul 12 09:53:48 2011
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Jul 2011 07:53:48 +0000
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
	<4E1B875A.9080702@gmail.com>
	<4E1BF279.6000009@gmail.com>
Message-ID: <20110712075348.GI12151@merlinux.eu>

On Tue, Jul 12, 2011 at 09:19 +0200, Maciej Fijalkowski wrote:
> On Tue, Jul 12, 2011 at 9:16 AM, Miquel Torres wrote:
> > Branches are already implemented. speed.pypy.org just needs to be
> > upgraded to Codespeed 0.8.x and its data migrated.
>
> I can help you with that, but that means having local changes from
> speed.pypy.org committed somewhere else.
>
> >
> > That has not yet been done because we wanted to move to ep.io, which
> > has not yet happened, and we are working on speed.python.org and
> > somehow things have stalled.
> speed.python.org is completely irrelevant here.

Probably i am missing something but couldn't speed.pypy.org and
speed.python.org mostly merge?  (maybe except for the front page and
default comparisons/settings but those could be easily hosted wherever)

best,
holger

> > Migrating current speed.pypy.org (I mean keeping it on the current
> > server but upgrading) would not take me very long. But it is on the
> > way out...
>
> We either should do the ep.io move (data has been imported up until
> very recently) or upgrade now.

From tobami at googlemail.com  Tue Jul 12 10:53:52 2011
From: tobami at googlemail.com (Miquel Torres)
Date: Tue, 12 Jul 2011 10:53:52 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: <20110712075348.GI12151@merlinux.eu>
References: 
	<4E1B875A.9080702@gmail.com>
	<4E1BF279.6000009@gmail.com>
	<20110712075348.GI12151@merlinux.eu>
Message-ID: 

They *are* being merged. The question here is to have branches *now*,
as we can't know how long it will take for speed.python.org to be
online.

All things considered, I think the way to go would be to keep current
speed.pypy.org hosting if at all possible until PyPy moves to
speed.python.org. If that can be done, we can upgrade the current
setup in no time.

2011/7/12 holger krekel :
> On Tue, Jul 12, 2011 at 09:19 +0200, Maciej Fijalkowski wrote:
>> On Tue, Jul 12, 2011 at 9:16 AM, Miquel Torres wrote:
>> > Branches are already implemented. speed.pypy.org just needs to be
>> > upgraded to Codespeed 0.8.x and its data migrated.
>>
>> I can help you with that, but that means having local changes from
>> speed.pypy.org committed somewhere else.
>>
>> >
>> > That has not yet been done because we wanted to move to ep.io, which
>> > has not yet happened, and we are working on speed.python.org and
>> > somehow things have stalled.
>>
>> speed.python.org is completely irrelevant here.
>
> Probably i am missing something but couldn't speed.pypy.org and
> speed.python.org mostly merge?  (maybe except for the front page and
> default comparisons/settings but those could be easily hosted wherever)
>
> best,
> holger
>
>> > Migrating current speed.pypy.org (I mean keeping it on the current
>> > server but upgrading) would not take me very long. But it is on the
>> > way out...
>>
>> We either should do the ep.io move (data has been imported up until
>> very recently) or upgrade now.

From holger at merlinux.eu  Tue Jul 12 13:37:39 2011
From: holger at merlinux.eu (holger krekel)
Date: Tue, 12 Jul 2011 11:37:39 +0000
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
	<4E1B875A.9080702@gmail.com>
	<4E1BF279.6000009@gmail.com>
	<20110712075348.GI12151@merlinux.eu>
Message-ID: <20110712113737.GK12151@merlinux.eu>

On Tue, Jul 12, 2011 at 10:53 +0200, Miquel Torres wrote:
> They *are* being merged. The question here is to have branches *now*,
> as we can't know how long it will take for speed.python.org to be
> online.

yes, makes sense.  Better invest your time in getting branch benchmarking
to work right now, it's probably the best support for pypy development ATM.

> All things considered, I think the way to go would be to keep current
> speed.pypy.org hosting if at all possible until PyPy moves to
> speed.python.org. If that can be done, we can upgrade the current
> setup in no time.

This is fine by me - i am still paying for the underlying machine and so
it's not fine indefinitely.  OTOH people could just go to our pypy.org
donation page and i could ask for some hosting costs at some point (*).

best,
holger

(*) then again, who knows what the current global currencies will be worth
in a few years or months, anyway.  Better put money to good use ;)

> 2011/7/12 holger krekel :
> > On Tue, Jul 12, 2011 at 09:19 +0200, Maciej Fijalkowski wrote:
> >> On Tue, Jul 12, 2011 at 9:16 AM, Miquel Torres wrote:
> >> > Branches are already implemented. speed.pypy.org just needs to be
> >> > upgraded to Codespeed 0.8.x and its data migrated.
> >>
> >> I can help you with that, but that means having local changes from
> >> speed.pypy.org committed somewhere else.
> >>
> >> >
> >> > That has not yet been done because we wanted to move to ep.io, which
> >> > has not yet happened, and we are working on speed.python.org and
> >> > somehow things have stalled.
> >>
> >> speed.python.org is completely irrelevant here.
> >
> > Probably i am missing something but couldn't speed.pypy.org and
> > speed.python.org mostly merge?  (maybe except for the front page and
> > default comparisons/settings but those could be easily hosted wherever)
> >
> > best,
> > holger
> >
> >> > Migrating current speed.pypy.org (I mean keeping it on the current
> >> > server but upgrading) would not take me very long. But it is on the
> >> > way out...
> >>
> >> We either should do the ep.io move (data has been imported up until
> >> very recently) or upgrade now.
> >

From andrew.rustytub at gmail.com  Tue Jul 12 18:12:25 2011
From: andrew.rustytub at gmail.com (Andrew Evans)
Date: Tue, 12 Jul 2011 09:12:25 -0700
Subject: [pypy-dev] Compiling RPython with ctypes
Message-ID: 

Hello I am running into some snags developing an executable in pypy with
translate.py and wondering what the best method for me to do this with my
code would be. I am developing an Exploit Framework, and one of the features
I would like to have is compiling an executable. I am wondering if this is
even possible with my code, but it does run in the PyPy Interpreter.

As an example I would like to compile this code

import array
from ctypes import *
myCode = array.array('b',
"\xda\xde\xd9\x74\x24\xf4\xb8\x22\xd2\x27\x7a\x29\xc9\xb1\x4b"
"\x5b\x31\x43\x1a\x83\xeb\xfc\x03\x43\x16\xe2\xd7\x3b\xbc\x7a"
"\x17\xbc\x95\x4b\xd7\xd8\x92\xec\xe7\xa5\x65\x94\x08\x2d\x25"
"\x69\x9d\x41\xba\xdc\x2a\xe1\xca\xf7\x25\xe2\xca\x07\xbe\xa2"
"\xfe\x8a\x80\x5e\x74\xd4\x3c\xc1\x49\xb5\xb7\x91\x69\x12\x4c"
"\x2c\x4e\xd1\x06\xaa\xd6\xe4\x4c\x3f\x6c\xff\x1b\x1a\x51\xfe"
"\xf0\x78\xa5\x49\x8d\x4b\x4d\x48\x7f\x82\xae\x7a\xbf\x19\xfc"
"\xf9\xff\x96\xfa\xc0\x30\x5b\x04\x04\x25\x90\x3d\xf6\x9d\x71"
"\x37\xe7\x56\xdb\x93\xe6\x83\xba\x50\xe4\x18\xc8\x3d\xe9\x9f"
"\x25\x4a\x15\x14\xb8\xa5\x9f\x6e\x9f\x29\xc1\xad\x72\x01\x53"
"\xd9\x27\x5d\xac\xe6\xb1\xa5\xd2\xdc\xca\xa9\xd4\xdc\x4b\x6e"
"\xd0\xdc\x4b\x71\xe0\x12\x3e\x97\xd1\x42\xd8\x57\xd6\x92\x43"
"\xa9\x5c\x9c\x0d\x8e\x83\xd3\x70\xc2\x4c\x13\x73\x1b\xc4\xf6"
"\x9b\x43\x29\x07\xa4\xfd\x17\x1c\xb9\xa0\x1a\x9f\x3a\xd4\xd4"
"\xde\x82\xee\x16\xe0\x04\x07\xa0\x1f\xfb\x28\x26\xd1\x5f\xe6"
"\x79\xbd\x0c\xf7\x2f\x39\x82\xc7\x80\xbe\xb1\xcf\xc8\xad\xc5"
"\x2f\xf7\x4e\x57\xb4\x26\xf5\xdf\x51\x17\xda\x7c\xba\x39\x41"
"\xf7\x9a\xb0\xfa\x92\xa8\x1a\x8f\x39\x2e\x2e\x06\xa6\x80\xf0"
"\xb5\x16\x8f\x9b\x65\x78\x2e\x38\x01\xa6\x96\xe6\xe9\xc8\xb3"
"\x92\xc9\x78\x53\x38\x68\xed\xcc\xcc\x05\x98\x62\x11\xb8\x06"
"\xee\x38\x54\xae\x83\xce\xda\x51\x10\x40\x68\xe1\xf8\xed\xe9"
"\x66\x8c\x78\x95\x58\x4e\x54\x34\xfd\xea\xaa")
buffer = myCode.buffer_info()[0]
my_callback = CFUNCTYPE(c_int)

I would appreciate any suggestions

*cheers

and ty :D

From alex.gaynor at gmail.com  Tue Jul 12 18:30:48 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Tue, 12 Jul 2011 09:30:48 -0700
Subject: [pypy-dev] Compiling RPython with ctypes
In-Reply-To: 
References: 
Message-ID: 

On Tue, Jul 12, 2011 at 9:12 AM, Andrew Evans wrote:

> Hello I am running into some snags developing an executable in pypy with
> translate.py and wondering what the best method for me to do this with my
> code would be. I am developing an Exploit Framework, and one of the features
> I would like to have is compiling an executable.
I am wondering if this is
> even possible with my code, but it does run in the PyPy Interpreter.
>
> As an example I would like to compile this code
>
> import array
> from ctypes import *
> myCode = array.array('b',
>     "\xda\xde\xd9\x74\x24\xf4\xb8\x22\xd2\x27\x7a\x29\xc9\xb1\x4b"
>     [... shellcode bytes snipped; see the original message above ...]
>     "\x66\x8c\x78\x95\x58\x4e\x54\x34\xfd\xea\xaa")
> buffer = myCode.buffer_info()[0]
> my_callback = CFUNCTYPE(c_int)
>
> I would appreciate any suggestions
>
> *cheers
>
> and ty :D
>

RPython cannot be used to translate arbitrary code; neither ctypes nor
array is RPython.  If you want to translate, you'd need to explicitly
malloc the typed memory and use rffi for external calls.

Alex

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

From amauryfa at gmail.com  Tue Jul 12 18:34:49 2011
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 12 Jul 2011 18:34:49 +0200
Subject: [pypy-dev] Compiling RPython with ctypes
In-Reply-To: 
References: 
Message-ID: 

2011/7/12 Andrew Evans 

> Hello I am running into some snags developing an executable in pypy with
> translate.py and wondering what the best method for me to do this with my
> code would be. I am developing an Exploit Framework, and one of the features
> I would like to have is compiling an executable. I am wondering if this is
> even possible with my code, but it does run in the PyPy Interpreter.
>

ctypes is not RPython at all. But you can use the "lltype" and "rffi"
modules. Your code will certainly use things like:

    from pypy.rpython.lltypesystem import lltype, rffi

    FUNCTYPE = lltype.Ptr(lltype.FuncType([], rffi.INT))
    ptr = rffi.str2charp(machineCode)       # returns a "char*" pointer
    my_callback = rffi.cast(FUNCTYPE, ptr)
    my_callback()  # beware!

-- 
Amaury Forgeot d'Arc
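For readers trying this at home, here is a rough sketch of how such a
fragment might sit in a translatable program.  Only the target()/entry_point()
convention comes from translate.py; everything else is illustrative, and note
that str2charp gives ordinary (non-executable) C memory, so actually jumping
into shellcode would additionally need an executable mapping (e.g. something
along the lines of pypy.rlib.rmmap), which is not shown here:

    # sketch only -- assumes pypy's rffi/lltype; not a working payload runner
    from pypy.rpython.lltypesystem import lltype, rffi

    FUNCTYPE = lltype.Ptr(lltype.FuncType([], rffi.INT))

    def entry_point(argv):
        machine_code = "\xc3"                 # placeholder byte ("ret")
        ptr = rffi.str2charp(machine_code)    # C buffer, but not PROT_EXEC
        func = rffi.cast(FUNCTYPE, ptr)
        # func()  # would crash: the buffer is not mapped executable
        rffi.free_charp(ptr)
        return 0

    def target(driver, args):
        # translate.py looks for this function; it returns the entry point
        return entry_point, None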
From arigo at tunes.org  Tue Jul 12 20:01:20 2011
From: arigo at tunes.org (Armin Rigo)
Date: Tue, 12 Jul 2011 20:01:20 +0200
Subject: [pypy-dev] Compiling RPython with ctypes
In-Reply-To: 
References: 
Message-ID: 

Hi Andrew,

On Tue, Jul 12, 2011 at 6:34 PM, Amaury Forgeot d'Arc wrote:
>> Hello I am running into some snags developing an executable in pypy with
>> translate.py and wondering what the best method for me to do this with my
>> code would be.

In addition to the previous answer, there is also the classical
"warning" answer: you are embarking on the endless (and mostly
entirely pointless) fun of converting your whole program from normal
Python code to RPython code.  Unless that's really, *really* what you
want, you should instead look at tools like py2exe, or write your
custom pypy/translator/goal/app_main.py to include all your Python
code pre-imported.

A bientôt,

Armin.

From arigo at tunes.org  Tue Jul 12 20:31:29 2011
From: arigo at tunes.org (Armin Rigo)
Date: Tue, 12 Jul 2011 20:31:29 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: 

Hi,

On Tue, Jul 12, 2011 at 1:20 AM, Maciej Fijalkowski wrote:
> http://speed.pypy.org/timeline/?exe=1&base=none&ben=spitfire&env=tannit&revs=50

This was introduced by the changes we did to the GC to better support
resizing big lists: 1bb155fd266f and 324a8265e420.  I will look.

A bientôt,

Armin.

From alex.gaynor at gmail.com  Tue Jul 12 20:36:22 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Tue, 12 Jul 2011 11:36:22 -0700
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: 

I just investigated the spitfire regression, it appears to have been caused
by 27df060341f0 (merg non-null-app-dict branch)

Alex

On Tue, Jul 12, 2011 at 11:31 AM, Armin Rigo wrote:

> Hi,
>
> On Tue, Jul 12, 2011 at 1:20 AM, Maciej Fijalkowski
> wrote:
>
> > http://speed.pypy.org/timeline/?exe=1&base=none&ben=spitfire&env=tannit&revs=50
>
> This was introduced by the changes we did to the GC to better support
> resizing big lists: 1bb155fd266f and 324a8265e420.  I will look.
>
>
> A bientôt,
>
> Armin.
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> http://mail.python.org/mailman/listinfo/pypy-dev
>

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

From arigo at tunes.org  Tue Jul 12 20:45:44 2011
From: arigo at tunes.org (Armin Rigo)
Date: Tue, 12 Jul 2011 20:45:44 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: 

Hi Alex,

On Tue, Jul 12, 2011 at 8:36 PM, Alex Gaynor wrote:
> I just investigated the spitfire regression, it appears to have been caused
> by 27df060341f0 (merg non-null-app-dict branch)

That's not the conclusion I arrived at.  The speed.pypy.org website
says that revision 27df060341f0 was still fast, and I have on tannit64
a pypy-c-jit from revision 76b06820d08b which is slow.  The interval
contains the two changes I described in the previous e-mail, plus a
few other details that are really unlikely to be the cause, but not
the merge of any branch.

Note that by "interval" I mean "all changes in one revision and not in
the other".  I'm using arigo/hack/hg/hglog here, which builds a
command line like "hg log -r 'reverse(ancestors(%s) and not
ancestors(%s))'".

A bientôt,

Armin.
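For reference, the interval being described can be listed directly with the
two revisions mentioned above (a usage example, not taken verbatim from the
thread):

    $ hg log -r "reverse(ancestors(76b06820d08b) and not ancestors(27df060341f0))"

i.e. everything in the slow revision's history that is not in the fast one's,
newest first.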
From alex.gaynor at gmail.com  Tue Jul 12 20:50:05 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Tue, 12 Jul 2011 11:50:05 -0700
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: 

speed.pypy.org shows 27df (45168) as being fast, but 058e (45254) as being
slow, I narrowed it down to 45168:45205, and there's only one reasonable
commit in that range.

http://paste.pocoo.org/show/437119/ are my benchmark runs (64-bit, local
laptop), and the hg log which seems to correspond:
http://paste.pocoo.org/show/437121/

Alex

On Tue, Jul 12, 2011 at 11:45 AM, Armin Rigo wrote:

> Hi Alex,
>
> On Tue, Jul 12, 2011 at 8:36 PM, Alex Gaynor
> wrote:
> > I just investigated the spitfire regression, it appears to have been caused
> > by 27df060341f0 (merg non-null-app-dict branch)
>
> That's not the conclusion I arrived at.  The speed.pypy.org website
> says that revision 27df060341f0 was still fast, and I have on tannit64
> a pypy-c-jit from revision 76b06820d08b which is slow.  The interval
> contains the two changes I described in the previous e-mail, plus a
> few other details that are really unlikely to be the cause, but not
> the merge of any branch.
>
> Note that by "interval" I mean "all changes in one revision and not in
> the other".  I'm using arigo/hack/hg/hglog here, which builds a
> command line like "hg log -r 'reverse(ancestors(%s) and not
> ancestors(%s))'".
>
>
> A bientôt,
>
> Armin.
>

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

From arigo at tunes.org  Tue Jul 12 21:06:40 2011
From: arigo at tunes.org (Armin Rigo)
Date: Tue, 12 Jul 2011 21:06:40 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: 

Hi Alex,

On Tue, Jul 12, 2011 at 8:50 PM, Alex Gaynor wrote:
> speed.pypy.org shows 27df (45168) as being fast, but 058e (45254) as being
> slow, I narrowed it down to 45168:45205, and there's only one reasonable
> commit in that range.

The trick is that the two revisions I identify as culprit are (on
tannit) revision numbers r45155 and r45156, i.e. sequentially before
r45168 (which has on my repo on tannit the number r45170).
We have
this structure, all on the "default" branch:

        r45176
           |
        r45174
       /      \
  r45156       \
      |        r45170
  r45155       /
       \      /
        r45154

This means that you'll miss the two revisions r45155 and r45156 if you
do 45168:45205.  In other words it's always subtly wrong to work with
plain intervals of revision numbers in hg...  The speed.pypy.org site measured
45170 to be still fast, but I measured all of 45176, 45156 and 45155
to be slow.  This is why I claim that the culprit is r45155.

A bientôt,

Armin.

From alex.gaynor at gmail.com  Tue Jul 12 21:10:59 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Tue, 12 Jul 2011 12:10:59 -0700
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: 

Now I'm really confused, what sha does that correspond to?
https://bitbucket.org/pypy/pypy/changeset/45155 seems to indicate that it
is a rather boring commit on a branch?

Alex

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

From alex.gaynor at gmail.com  Tue Jul 12 21:28:14 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Tue, 12 Jul 2011 12:28:14 -0700
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: 

Ahhhh, I found why i was getting nonsense information, `hg log -rA -rB`
does NOT do what I wanted, you need to do `hg log -rA..B`!

Alex

On Tue, Jul 12, 2011 at 12:10 PM, Alex Gaynor wrote:

> On Tue, Jul 12, 2011 at 12:06 PM, Armin Rigo wrote:
>
>> Hi Alex,
>>
>> On Tue, Jul 12, 2011 at 8:50 PM, Alex Gaynor
>> wrote:
>> > speed.pypy.org shows 27df (45168) as being fast, but 058e (45254) as
>> being
>> > slow, I narrowed it down to 45168:45205, and there's only one reasonable
>> > commit in that range.
>>
>> The trick is that the two revisions I identify as culprit are (on
>> tannit) revision numbers r45155 and r45156, i.e. sequentially before
>> r45168 (which has on my repo on tannit the number r45170).  We have
>> this structure, all on the "default" branch:
>>
>>         r45176
>>            |
>>         r45174
>>        /      \
>>   r45156       \
>>       |        r45170
>>   r45155       /
>>        \      /
>>         r45154
>>
>> This means that you'll miss the two revisions r45155 and r45156 if you
>> do 45168:45205.  In other words it's always subtly wrong to work with
>> plain intervals of revision numbers in hg...  The speed.pypy.org site measured
>> 45170 to be still fast, but I measured all of 45176, 45156 and 45155
>> to be slow.  This is why I claim that the culprit is r45155.
>>
>>
>> A bientôt,
>>
>> Armin.
>>
>
> Now I'm really confused, what sha does that correspond to?
> https://bitbucket.org/pypy/pypy/changeset/45155 seems to indicate that it
> is a rather boring commit on a branch?
>
> Alex
>

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

From fijall at gmail.com  Tue Jul 12 22:04:11 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 12 Jul 2011 22:04:11 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: 

>
>         r45176
>            |
>         r45174
>        /      \
>   r45156       \
>       |        r45170
>   r45155       /
>        \      /
>         r45154
>

On a completely unrelated topic, I'm always impressed by how you pull off
making ascii art meaningful in an email conversation :)

From anto.cuni at gmail.com  Wed Jul 13 00:03:31 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Wed, 13 Jul 2011 00:03:31 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: <4E1CC4B3.7000906@gmail.com>

On 12/07/11 21:28, Alex Gaynor wrote:
> Ahhhh, I found why i was getting nonsense information, `hg log -rA
> -rB` does NOT do what I wanted, you need to do `hg log -rA..B`!

uhm, I usually do "hg log -rA:B".  Is it the same as -rA..B or is it again
subtly different?

From alex.gaynor at gmail.com  Wed Jul 13 00:05:49 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Tue, 12 Jul 2011 15:05:49 -0700
Subject: [pypy-dev] Benchmarks
In-Reply-To: <4E1CC4B3.7000906@gmail.com>
References: 
	<4E1CC4B3.7000906@gmail.com>
Message-ID: 

I *think* they are the same, but I really have no idea.

Alex

On Tue, Jul 12, 2011 at 3:03 PM, Antonio Cuni wrote:

> On 12/07/11 21:28, Alex Gaynor wrote:
> > Ahhhh, I found why i was getting nonsense information, `hg log -rA
> > -rB` does NOT do what I wanted, you need to do `hg log -rA..B`!
>
> uhm, I usually do "hg log -rA:B".  Is it the same as -rA..B or is it again
> subtly
> different?
>

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

From pjenvey at underboss.org  Wed Jul 13 00:43:15 2011
From: pjenvey at underboss.org (Philip Jenvey)
Date: Tue, 12 Jul 2011 15:43:15 -0700
Subject: [pypy-dev] Benchmarks
In-Reply-To: <4E1CC4B3.7000906@gmail.com>
References: 
	<4E1CC4B3.7000906@gmail.com>
Message-ID: <7A1EAC7A-9608-4421-9DC5-CED30968947E@underboss.org>

On Jul 12, 2011, at 3:03 PM, Antonio Cuni wrote:

> On 12/07/11 21:28, Alex Gaynor wrote:
>> Ahhhh, I found why i was getting nonsense information, `hg log -rA
>> -rB` does NOT do what I wanted, you need to do `hg log -rA..B`!
>
> uhm, I usually do "hg log -rA:B".  Is it the same as -rA..B or is it again
> subtly different?

Subtly different, hg help revset

-- 
Philip Jenvey

From alex.gaynor at gmail.com  Wed Jul 13 00:47:11 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Tue, 12 Jul 2011 15:47:11 -0700
Subject: [pypy-dev] Benchmarks
In-Reply-To: <7A1EAC7A-9608-4421-9DC5-CED30968947E@underboss.org>
References: 
	<4E1CC4B3.7000906@gmail.com>
	<7A1EAC7A-9608-4421-9DC5-CED30968947E@underboss.org>
Message-ID: 

I don't know why I'm surprised.

Alex

On Tue, Jul 12, 2011 at 3:43 PM, Philip Jenvey wrote:

>
> On Jul 12, 2011, at 3:03 PM, Antonio Cuni wrote:
>
> > On 12/07/11 21:28, Alex Gaynor wrote:
> >> Ahhhh, I found why i was getting nonsense information, `hg log -rA
> >> -rB` does NOT do what I wanted, you need to do `hg log -rA..B`!
> >
> > uhm, I usually do "hg log -rA:B".  Is it the same as -rA..B or is it again
> > subtly
> > different?
>
> Subtly different, hg help revset
>
> --
> Philip Jenvey
>

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
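For reference, here is how the three spellings that came up in this subthread
differ, per `hg help revset` (A and B stand for revisions):

    hg log -r A:B     # revision-number range: revs numbered A through B,
                      # whatever their ancestry -- the subtly wrong one
    hg log -r A..B    # DAG range: ancestors of B that are also descendants of A
    hg log -r "ancestors(B) and not ancestors(A)"
                      # everything in B's history that is not in A's

Note the cross-tool trap: Mercurial's A..B is a DAG range, while git's A..B
means the third form.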
From bokr at oz.net  Wed Jul 13 11:37:01 2011
From: bokr at oz.net (Bengt Richter)
Date: Wed, 13 Jul 2011 11:37:01 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To: 
References: <20110708075857.GF20287@merlinux.eu>
	<4E19CE89.6070905@oz.net>
	<201107101913.p6AJDQm6027255@theraft.openend.se>
Message-ID: 

On 07/11/2011 04:21 PM William ML Leslie wrote:
> On 11 July 2011 23:21, Bengt Richter wrote:
>> On 07/11/2011 01:36 PM William ML Leslie wrote:
>>>
>>> On 11 July 2011 20:29, Bengt Richter wrote:
>>>>
>>>> On 07/10/2011 09:13 PM Laura Creighton wrote:
>>>>>
>>>>> What do we want to happen when somebody -- say in a C extension -- takes
>>>>> the id of an object
>>>>> that is scheduled to be removed when the gc next runs?
>>>>
>>>> IMO taking the id should increment the object ref counter
>>>> and prevent the garbage collection, until the id value itself is garbage
>>>> collected.
>>>
>>> This significantly changes the meaning of id() in a way that will
>>> break existing code.
>>>
>> Do you have an example of existing code that depends on the integer-cast
>> value of a dangling pointer?
>
> I mean that id creating a reference will break existing code.  id()
> has always returned an integer, and the existence of some integer in
> some python code has never prevented some otherwise unrelated object
> from being collected.  Existing code will not make sure that it cleans
> up the return value of id(), as nowhere has id() ever kept a reference
> to the object passed in.

Ok, d'oh ;-/  I was focused on making sure the id value "referred" to an
existing live object *when returned from id* (it is of course live when
passed to id while bound in id's argument -- but if that is the *only*
binding, then the object is *guaranteed* to be garbage when id returns the
integer, and thus that integer is IMO meaningless except as a debugging peek
at implementation, and it would be an *error* for a program to depend on
its value.)

 [10:12 ~]$ python -c 'import this'|grep -A1 Errors
 Errors should never pass silently.
 Unless explicitly silenced.

You are right that existing code could and some probably would break if id
guarantees validity of the integer by holding the object, so I will go with
the first alternative I mentioned in my reply to Armin, and focus on
preventing return of the id of garbage rather than the "or else..." option
which is impractical and is likely to break code, as you say.

> Letting the expression result die and returning a kind of pointer
> to where the result object *was* seems like a dangling pointer problem,
> except I guess you can't dereference an id value (without hackery).
>
> Maybe id should raise an exception if the argument referenced only has
> a ref count of 1 (i.e., just the reference from the argument list)?
>
> Or else let id be a class and return a minimal instance only binding
> the passed object, and customize the compare ops to take into account
> type diffs etc.?  Then there would be no id values without corresponding
> objects, and id values used in expressions would die a natural death,
> along with their references to their objects -- whether "variables"
> or expressions.
>
> Sorry to belabor the obvious ;-)

Rather than an exception, perhaps returning a None would suffice, analogous
to a null pointer where no valid pointer can be returned.  That should be
cheap.  It could also be used in answer to Laura's question, to which I only
proposed the impractical id object.
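As a strawman, the "return None for garbage" idea can even be approximated in
today's CPython with a refcount check.  This is CPython-specific (pypy has no
reference counts to inspect), and the baseline count below is illustrative:
measure it on your interpreter before trusting it:

    import sys

    def safe_id(obj):
        # For an object passed in as a bare temporary, getrefcount() here
        # typically sees only the call machinery's references (our local
        # binding plus getrefcount's own argument).  Anything above that
        # baseline means a caller-visible reference survives our return.
        # The exact baseline varies with CPython version and call style.
        if sys.getrefcount(obj) <= 2:
            return None          # the argument would be garbage on return
        return id(obj)

With this, safe_id([0]) returns None while x = [0]; safe_id(x) returns a
normal id, which is the distinction being discussed in this thread.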
> I know that you are suggesting that id returns something that is /not/
> an integer, but that is also a language change.  People have always
> been able to assume that they can % format ids as decimals or
> hexadecimals.

I thought of subclassing int, but was reaching for an id abstraction more
than a practical thing, sorry.  But never mind the id-as-object idea for
current python ;-)

>> Or do you mean that id's must be allowed to be compared == to integers,
>> which my example prohibits? (I didn't define __cmp__, BTW, just lazy ;-)
>
> Good, __cmp__ has been deprecated for over 10 years now.
>
The only sensible sort on id's I can think of off hand would be if id's
carried a time stamp.

>>> If you want an object reference, just use one.  If you want them to be
>>> persistent, build a dictionary from id to object.
>>
>> Yes, a dictionary is one way to bind an object and thus make sure its id is
>> valid.
>>
>> But it would be overkill to use a dictionary to guarantee object id
>> persistence
>> just for the duration of an expression such as id(x.a) == id(y.a)
>
> But id is not about persistence.  The lack of persistence is one of its
> key features.
>
> That said, I do think id()'s current behaviour is overkill.  I just
> don't think we can change it in a way that will fit existing usage.
> And cleaning it up properly is far too much work.
>
How about just returning None when id sees an object which no other code
will be able to see when id returns (hence making the integer the id of
garbage)?

>
> The definition of id(), according to docs.python.org, is:
>
>     Return the "identity" of an object. This is an integer (or long
>     integer) which is guaranteed to be unique and constant for this object
>     during its lifetime. Two objects with non-overlapping lifetimes may
>     have the same id() value.
>
Hm, I couldn't find that, googling site:python.org, nor at
site:docs.python.org.  Maybe from a non-current version of the docs?
But never mind.

>> Also, a new id could live alongside the old ;-)
>
> It's just that the problems you are attempting to fix are already
> solved, and they are only vaguely related to what a python programmer
> understands id() to mean.  If, according to cpython, "1003 is not 1000
> + 3", then programmers can't rely on any excellent new behaviour for
> id() *anyway*.

My question to Armin was whether doing what cpython 2.7 does meant following
the vagaries of possible optimizations.  E.g., if space for constants were
slightly modified, cpython would return False for "1003 is not 1000 + 3".
1000+3 is apparently already folded to a constant 1003, but apparently local
constants are currently allowed to be duplicated, as you see in the
disassembly of your example:

 >>> from ut.miscutil import disev
 >>> 1003 is not 1000 + 3
 True
 >>> disev("1003 is not 1000 + 3")
   1           0 LOAD_CONST               0 (1003)
               3 LOAD_CONST               3 (1003)
               6 COMPARE_OP               9 (is not)
               9 RETURN_VALUE

It would seem you could generate quite a few equivalent constants:

 >>> disev('[1000+3,1000+3,1000+3,1000+3,1000+3]')
   1           0 LOAD_CONST               2 (1003)
               3 LOAD_CONST               3 (1003)
               6 LOAD_CONST               4 (1003)
               9 LOAD_CONST               5 (1003)
              12 LOAD_CONST               6 (1003)
              15 BUILD_LIST               5
              18 RETURN_VALUE

which sooner or later someone will probably find a reason to optimize for
space, and what does that mean for the *"language"* definition of id?

>
> OTOH, the "identity may not even be preserved for primitive types"
> issue is an observable difference to cpython and is fixable, even if
> it is a silly thing to rely on.
>
Apparently the folding of expressions yielding e.g.
small integers involves generating a reference to the single instance.

Hm.  I downloaded pypy and it does optimize constant storage for
1003 is 1000+3

 [11:03 ~]$ pypy
 pypy: /usr/lib/libcrypto.so.0.9.8: no version information available (required by pypy)
 pypy: /usr/lib/libssl.so.0.9.8: no version information available (required by pypy)
 Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:38)
 [PyPy 1.5.0-alpha0 with GCC 4.4.3] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 And now for something completely different: ``psyco eats one brain per inch
 of progress''
 >>>>
 >>>> 1003 is 1000+3
 True
 >>>> from ut.miscutil import disev
 >>>> disev('1003 is 1000+3')
   1           0 LOAD_CONST               0 (1003)
               3 LOAD_CONST               0 (1003)
               6 COMPARE_OP               8 (is)
               9 RETURN_VALUE
 >>>>

Let's see what the id values are:

 >>>> id(1003), id(1000+3)
 (-1216202084, -1216202084)
 >>>> disev('id(1003), id(1000+3)')
   1           0 LOAD_NAME                0 (id)
               3 LOAD_CONST               0 (1003)
               6 CALL_FUNCTION            1
               9 LOAD_NAME                0 (id)
              12 LOAD_CONST               0 (1003)
              15 CALL_FUNCTION            1
              18 BUILD_TUPLE              2
              21 RETURN_VALUE
 >>>>

Vs cpython 2.7.2:

 >>> id(1003), id(1000+3)  # different garbage ;-)
 (136814932, 136814848)
 >>> disev('id(1003), id(1000+3)  # different garbage ;-)')
   1           0 LOAD_NAME                0 (id)
               3 LOAD_CONST               0 (1003)
               6 CALL_FUNCTION            1
               9 LOAD_NAME                0 (id)
              12 LOAD_CONST               3 (1003)
              15 CALL_FUNCTION            1
              18 BUILD_TUPLE              2
              21 RETURN_VALUE
 >>>

Of course, the id's are all still id's of garbage locations
once returned from id ;-)

So how about returning None instead of id's of garbage,
or raising an exception?  Would that not be pythonic?

Regards,
Bengt Richter

From arigo at tunes.org  Wed Jul 13 15:42:55 2011
From: arigo at tunes.org (Armin Rigo)
Date: Wed, 13 Jul 2011 15:42:55 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: <4E1CC4B3.7000906@gmail.com>
References: 
	<4E1CC4B3.7000906@gmail.com>
Message-ID: 

Hi,

On Wed, Jul 13, 2011 at 12:03 AM, Antonio Cuni wrote:
> uhm, I usually do "hg log -rA:B".  Is it the same as -rA..B or is it again subtly
> different?

...and both are different from what I find to be the *really* useful
information, which is: "which checkins are contained in B but not in
A"?  For that you have to use -r 'ancestors(B) and not ancestors(A)',
or -r 'reverse(same as before)' if you prefer the most-recent-first order.

A bientôt,

Armin.

From sungsuha at gmail.com  Wed Jul 13 16:30:21 2011
From: sungsuha at gmail.com (Seung Soo, Ha)
Date: Wed, 13 Jul 2011 23:30:21 +0900
Subject: [pypy-dev] bounties for pypy
In-Reply-To: 
References: <201106281751.p5SHpixC014361@theraft.openend.se>
	<201106290753.p5T7rMKh002901@theraft.openend.se>
	<4E0EEC0C.5010809@gmail.com>
Message-ID: 

Has anyone considered?

http://www.fossfactory.org

There are a lot of projects, and the money involved makes me believe that
they are worth considering.
http://www.fossfactory.org/browse.php

From lac at openend.se  Wed Jul 13 16:40:13 2011
From: lac at openend.se (Laura Creighton)
Date: Wed, 13 Jul 2011 16:40:13 +0200
Subject: [pypy-dev] bounties for pypy
In-Reply-To: Message from "Seung Soo, Ha" of "Wed, 13 Jul 2011 23:30:21 +0900."
References: <201106281751.p5SHpixC014361@theraft.openend.se>
	<201106290753.p5T7rMKh002901@theraft.openend.se>
	<4E0EEC0C.5010809@gmail.com>
Message-ID: <201107131440.p6DEeDNG012338@theraft.openend.se>

In a message of Wed, 13 Jul 2011 23:30:21 +0900, "Seung Soo, Ha" writes:
>Has anyone considered?
>
>http://www.fossfactory.org
>
>There are a lot of projects, and the money involved makes me believe that
>they are worth considering.
>http://www.fossfactory.org/browse.php

They weren't one of the sites I have looked at, but I will start looking
at it now.  Thank you.

Laura

From bokr at oz.net  Wed Jul 13 18:47:34 2011
From: bokr at oz.net (Bengt Richter)
Date: Wed, 13 Jul 2011 18:47:34 +0200
Subject: [pypy-dev] Object identity and dict strategies
In-Reply-To: 
References: <20110708075857.GF20287@merlinux.eu>
	<4E19CE89.6070905@oz.net>
	<201107101913.p6AJDQm6027255@theraft.openend.se>
Message-ID: 

On 07/13/2011 11:37 AM Bengt Richter wrote:
...
>
> Hm.  I downloaded pypy and it does optimize constant storage for 1003 is 1000+3
>
...
>
> Of course, the id's are all still id's of garbage locations
> once returned from id ;-)
>
> So how about returning None instead of id's of garbage,
> or raising an exception?  Would that not be pythonic?
>

Hm, other than practicality beating purity ;-/

Sorry to be commenting on myself, but a further thought:

In a way, a constant could be considered a specially-named immutable
variable, e.g., "1003" "names" 1003, so one could consider the id of an
arbitrary constant (even if perhaps only "named" and its value referenced
in bytecode due to constant folding from source expressions) to be the id
of a live object.

But I still think there will be examples of id arguments that will turn to
garbage as soon as id returns -- rendering the id erroneous for any use
besides a debugging peek at memory usage.

I.e., there will be temp objects with no persistence beyond the scope of
the argument use within id, other than while ahead in a race with garbage
collection.

Is there an easy way to check on whether the argument *only* has the one
reference from id's arg list?  When is a ref count not available?  Do
simple atomic constants have them?  E.g., small integers vs bigger?  And
True, False, None?  ()?  and []?

Regards,
Bengt Richter

From sdouche at gmail.com  Thu Jul 14 01:21:11 2011
From: sdouche at gmail.com (Sebastien Douche)
Date: Thu, 14 Jul 2011 01:21:11 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
	<4E1CC4B3.7000906@gmail.com>
Message-ID: 

On Wed, Jul 13, 2011 at 15:42, Armin Rigo wrote:
> ...and both are different from what I find to be the *really* useful
> information, which is: "which checkins are contained in B but not in
> A"?  For that you have to use -r 'ancestors(B) and not ancestors(A)',

I'm not sure I understand: with Git you can easily see the new commits
since the last fetch:

git log origin/master..master

Same command as:

git log master ^origin/master

(a and not b).  I think it's the same thing with hg (minus
inclusion/exclusion limits).  Am I wrong?
-- 
Sebastien Douche
Twitter : @sdouche

From anto.cuni at gmail.com  Fri Jul 15 09:17:35 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Fri, 15 Jul 2011 09:17:35 +0200
Subject: [pypy-dev] Düsseldorf sprint at the end of august?
Message-ID: <4E1FE98F.7030906@gmail.com>

Hi all,
I know that there is a vague plan to have a sprint in Düsseldorf in the last
week of august, to coincide with the end of Eurostars.

Can we make it more concrete, decide the dates, and announce it, please?
It's very likely that I'll go on vacation in the 3rd week of august, so it
would be nice to know the dates to do a proper travel plan.

ciao,
Anto

From anto.cuni at gmail.com  Fri Jul 15 10:45:04 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Fri, 15 Jul 2011 10:45:04 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: <4E1FFE10.2010807@gmail.com>

On 12/07/11 01:20, Maciej Fijalkowski wrote:
> Hi
>
> I'm a bit worried about our current benchmarks state. We have around 4
> benchmarks that had reasonable slowdowns recently
[cut]

Ok, let's try to make a summary of what we discovered about benchmark
regressions.

>
> Current list:
>
> http://speed.pypy.org/timeline/?exe=1&base=none&ben=spectral-norm&env=tannit&revs=50

this is weird.  Maybe I did something wrong (like picking the wrong revisions
to test), but I cannot reproduce the slowdown.  This is on my machine, 32-bit
chroot:

$ ./pypy-c-jit-45360-867e8ffff7a8-linux/bin/pypy ~/pypy/benchmarks/own/spectral-norm.py --take_geo_mean
0.0256132620823

$ ./pypy-c-jit-45373-582113929b62-linux/bin/pypy ~/pypy/benchmarks/own/spectral-norm.py --take_geo_mean
0.0256120378632

$ ./pypy-c-jit-45412-3476b9be3cec-linux/bin/pypy ~/pypy/benchmarks/own/spectral-norm.py --take_geo_mean
0.025725552797

on tannit, I get similar results (a bit faster than on my machine, but all
the same).

> http://speed.pypy.org/timeline/?exe=1&base=none&ben=spitfire&env=tannit&revs=50

I think that Armin investigated this, and the outcome was that it's because of
the changes we did in the GC during the sprint.  Armin, do you confirm?
Do we have a solution?

> This is a good example why we should not work the way we work now:
>
> http://speed.pypy.org/timeline/?exe=1&base=none&ben=slowspitfire&env=tannit&revs=200
>
> There was an issue, then the issue was fixed, but apparently not quite
> (7th of June is quite a bit slower than 25th of May) and then recently
> we introduced something that makes it faster altogether. Can we even
> fish out the original issue?

did anybody look at this?

> http://speed.pypy.org/timeline/?exe=1&base=none&ben=bm_mako&env=tannit&revs=200

I investigated this.  The culprit is 5b62f71347c8, in particular the change to
policy.py which enables tracing inside lltypesystem.rbuilder.

With this patch, mako is fast again:
http://paste.pocoo.org/show/439257/

The weird thing is that in jit-summary there is no obvious difference between
the runs, apart from the total time:
http://paste.pocoo.org/show/439263/

Alex, do you feel like investigating more?

> http://speed.pypy.org/timeline/?exe=1&base=none&ben=nbody_modified&env=tannit&revs=50
> (is it relevant or just noise?)
>
> http://speed.pypy.org/timeline/?exe=1&base=none&ben=telco&env=tannit&revs=50

I did not look into these two.  Anybody?
ciao,
Anto

From arigo at tunes.org  Fri Jul 15 13:44:15 2011
From: arigo at tunes.org (Armin Rigo)
Date: Fri, 15 Jul 2011 13:44:15 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: <4E1FFE10.2010807@gmail.com>
References: <4E1FFE10.2010807@gmail.com>
Message-ID: 

Hi,

On Fri, Jul 15, 2011 at 10:45 AM, Antonio Cuni wrote:
> I think that Armin investigated this, and the outcome was that it's
> because of the changes we did in the GC during the sprint. Armin, do you
> confirm? Do we have a solution?

I confirm, and still have no solution. I tried to devise one (see
PYPY_GC_LOSTCARD), which actually helps a bit on this and at least one
other benchmark, but this particular benchmark is still much slower than
two weeks ago.

A bientôt,

Armin.

From romain.py at gmail.com  Fri Jul 15 16:58:48 2011
From: romain.py at gmail.com (Romain Guillebert)
Date: Fri, 15 Jul 2011 16:58:48 +0200
Subject: [pypy-dev] Düsseldorf sprint at the end of august?
In-Reply-To: <4E1FE98F.7030906@gmail.com>
References: <4E1FE98F.7030906@gmail.com>
Message-ID: <20110715145848.GB7924@ubuntu>

Hi

Just to let you know that I will come, unless the sprint occurs during
the week of the 31st of August (I have an oral exam that I already
postponed in order to attend the Genova sprint).

Cheers
Romain

From van.lindberg at gmail.com  Fri Jul 15 19:09:30 2011
From: van.lindberg at gmail.com (VanL)
Date: Fri, 15 Jul 2011 12:09:30 -0500
Subject: [pypy-dev] Sandboxing questions
Message-ID: <4E20744A.8080202@gmail.com>

I have a couple questions about the sandboxing feature:

- Currently this is a two-process model, but early on the assertion was
made that this could be done in a single process, perhaps but not
necessarily separated into two OS-level threads. Is this (still?) true?
What would you need to invoke to create such a pypy?

- How granular can the control on imported/run functions be? Can you have
a full interpreter that does everything, or an interpreter that allows
socket access and that's it?

Thanks,

Van

From fijall at gmail.com  Fri Jul 15 20:50:59 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Fri, 15 Jul 2011 20:50:59 +0200
Subject: [pypy-dev] Sandboxing questions
In-Reply-To: <4E20744A.8080202@gmail.com>
References: <4E20744A.8080202@gmail.com>
Message-ID: 

On Fri, Jul 15, 2011 at 7:09 PM, VanL wrote:
> I have a couple questions about the sandboxing feature:
>
> - Currently this is a two-process model, but early on the assertion was
> made that this could be done in a single process, perhaps but not
> necessarily separated into two OS-level threads. Is this (still?) true?
> What would you need to invoke to create such a pypy?

By design, a single-process setup is slightly less secure: if you, say,
find a way to corrupt random memory, you can also modify the other
interpreter in the same process. It's still only very slightly less
secure though. The sandboxing approach should work quite nicely; the hard
part would be to get multiple interpreters running in a single process.
It's quite a bit of work, but I would not expect it to be overly hard to
do. It requires quite a bit of pypy knowledge though.

>
> - How granular can the control on imported/run functions be? Can you have
> a full interpreter that does everything, or an interpreter that allows
> socket access and that's it?

It's very granular. Besides memory and CPU limits, you also control every
single call that would normally be a C call, like read, write or stat,
and you can implement arbitrary custom behavior for those functions.
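To make that concrete, here is a toy self-contained sketch of the trusted
side of such a two-process setup. This is *not* PyPy's actual protocol or
API (the real controller lives in pypy/translator/sandbox/ and marshals
the calls over stdin/stdout); it just shows the shape of "every would-be
C call becomes a request to a policy function":

    # Toy sketch only: read one json-encoded (funcname, args) request
    # per line from the sandboxed child and decide what it does.
    import json
    import subprocess

    ALLOWED = set(['read', 'write', 'stat'])

    def handle(func, args):
        # the policy: decide what each would-be C call actually does
        if func not in ALLOWED:
            raise OSError("%s() is not permitted in the sandbox" % func)
        # ...arbitrary custom behavior goes here (virtual files, etc.)...
        return 0

    def run_sandboxed(argv):
        proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE)
        for line in proc.stdout:   # one request per line, in this toy protocol
            func, args = json.loads(line)
            try:
                reply = {'result': handle(func, args)}
            except OSError, e:
                reply = {'error': str(e)}
            proc.stdin.write(json.dumps(reply) + '\n')
            proc.stdin.flush()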
>
> Thanks,
>
> Van
>
>
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> http://mail.python.org/mailman/listinfo/pypy-dev
>

From van.lindberg at gmail.com  Fri Jul 15 21:05:29 2011
From: van.lindberg at gmail.com (VanL)
Date: Fri, 15 Jul 2011 14:05:29 -0500
Subject: [pypy-dev] Sandboxing questions
In-Reply-To: 
References: <4E20744A.8080202@gmail.com>
Message-ID: <4E208F79.1010701@gmail.com>

On 7/15/2011 1:50 PM, Maciej Fijalkowski wrote:
> By design, a single-process setup is slightly less secure: if you, say,
> find a way to corrupt random memory, you can also modify the other
> interpreter in the same process. It's still only very slightly less
> secure though. The sandboxing approach should work quite nicely; the hard
> part would be to get multiple interpreters running in a single process.
> It's quite a bit of work, but I would not expect it to be overly hard to
> do. It requires quite a bit of pypy knowledge though.
>

Could you describe a little bit more about "quite a bit of work, but...
[not] overly hard to do"? What would it take, and where would someone get
started?

Thanks,

Van

From fijall at gmail.com  Fri Jul 15 21:31:30 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Fri, 15 Jul 2011 21:31:30 +0200
Subject: [pypy-dev] Sandboxing questions
In-Reply-To: <4E208F79.1010701@gmail.com>
References: <4E20744A.8080202@gmail.com> <4E208F79.1010701@gmail.com>
Message-ID: 

On Fri, Jul 15, 2011 at 9:05 PM, VanL wrote:
> On 7/15/2011 1:50 PM, Maciej Fijalkowski wrote:
>> By design, a single-process setup is slightly less secure [...]
>
> Could you describe a little bit more about "quite a bit of work, but...
> [not] overly hard to do"? What would it take, and where would someone get
> started?

Heh, I was kind of hoping to avoid having to answer that :-)

You essentially need two things in order to achieve it:

* have two interpreters in one executable (provided the sandboxes don't
have to be separated from each other), one constructed with the
sandboxing options, the other without. This is something that I would
describe as "run around and make it work", probably starting with having
either two copies of the functions or just two copies of the object
spaces.

* change the sandboxing transformation to call some RPython-level API
instead of reading/writing standard output. Also provide the other end of
this API. As of now, the transformation walks all the graphs and changes
external calls into special calls that get rendered as standard-output
writes and standard-input reads.

I know this is kind of hand-waving about what has to be done. I would
probably start with having two interpreters in one executable, probably
by having two object spaces.
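To be a bit more concrete about the second point, the transformed graphs
would call a pair of hooks along these lines (completely hypothetical
names; nothing like this exists yet):

    # Hypothetical sketch only: what the sandbox transformation could
    # emit instead of the standard-output write / standard-input read.
    def sandbox_external_call(name, marshalled_args):
        # hand the intercepted external call to an in-process controller
        return _sandbox_controller.handle(name, marshalled_args)

    def set_sandbox_controller(controller):
        global _sandbox_controller
        _sandbox_controller = controller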
Cheers,
fijal

From fijall at gmail.com  Fri Jul 15 21:31:49 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Fri, 15 Jul 2011 21:31:49 +0200
Subject: [pypy-dev] Sandboxing questions
In-Reply-To: 
References: <4E20744A.8080202@gmail.com> <4E208F79.1010701@gmail.com>
Message-ID: 

On Fri, Jul 15, 2011 at 9:31 PM, Maciej Fijalkowski wrote:
> On Fri, Jul 15, 2011 at 9:05 PM, VanL wrote:
>> Could you describe a little bit more about "quite a bit of work, but...
>> [not] overly hard to do"? What would it take, and where would someone get
>> started?
>
> Heh, I was kind of hoping to avoid having to answer that :-)
>
> You essentially need two things in order to achieve it:
> [...]
>
> I know this is kind of hand-waving about what has to be done. I would
> probably start with having two interpreters in one executable, probably
> by having two object spaces.
>
> Cheers,
> fijal

And if I may ask, what are you trying to achieve?

Cheers,
fijal

From alex.gaynor at gmail.com  Fri Jul 15 22:10:32 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Fri, 15 Jul 2011 13:10:32 -0700
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: <4E1FFE10.2010807@gmail.com>
Message-ID: 

On Fri, Jul 15, 2011 at 4:44 AM, Armin Rigo wrote:

> Hi,
>
> On Fri, Jul 15, 2011 at 10:45 AM, Antonio Cuni wrote:
> > I think that Armin investigated this, and the outcome was that it's
> > because of the changes we did in the GC during the sprint. Armin, do
> > you confirm? Do we have a solution?
>
> I confirm, and still have no solution. I tried to devise one (see
> PYPY_GC_LOSTCARD), which actually helps a bit on this and at least one
> other benchmark, but this particular benchmark is still much slower than
> two weeks ago.
>
> A bientôt,
>
> Armin.
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> http://mail.python.org/mailman/listinfo/pypy-dev
>

I can own the mako one; ATM I have no clue though.

Alex

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From van.lindberg at gmail.com  Sat Jul 16 00:32:34 2011
From: van.lindberg at gmail.com (VanL)
Date: Fri, 15 Jul 2011 17:32:34 -0500
Subject: [pypy-dev] Sandboxing questions
In-Reply-To: 
References: <4E20744A.8080202@gmail.com> <4E208F79.1010701@gmail.com>
Message-ID: <4E20C002.2080601@gmail.com>

On 7/15/2011 2:31 PM, Maciej Fijalkowski wrote:
>
> I know this is kind of hand-waving about what has to be done. I would
> probably start with having two interpreters in one executable, probably
> by having two object spaces.
>
> Cheers,
> fijal
>
> And if I may ask, what are you trying to achieve?
>

Two (or more) interpreters in one executable. :)

I was pondering Armin's recent announcement that he thinks STM is the way
to kill the GIL. I don't think the problem is the GIL; I think the
problem is that we have only one.

I think that a better (read: closer-term, and more likely to be
performant) answer is to create multiple interpreters, *each with their
own GIL, each in their own thread,* and connect them via channels
(essentially a pair of queues). I already knew about multiple object
spaces and PyPy's sandboxing; I thought this would be the easiest way to
play with that idea.

Note that this is not Erlang-style processes - this is closer to
appdomains (from .NET), although the communication is inspired by
Erlang+Go.

From arigo at tunes.org  Sat Jul 16 12:13:11 2011
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 16 Jul 2011 12:13:11 +0200
Subject: [pypy-dev] Sandboxing questions
In-Reply-To: <4E20C002.2080601@gmail.com>
References: <4E20744A.8080202@gmail.com> <4E208F79.1010701@gmail.com> <4E20C002.2080601@gmail.com>
Message-ID: 

Hi,

On Sat, Jul 16, 2011 at 12:32 AM, VanL wrote:
> I think that a better (read: closer-term, and more likely to be
> performant) answer is to create multiple interpreters, *each with their
> own GIL, each in their own thread,* and connect them via channels
> (essentially a pair of queues).

That's hand-waving away the real question: what can you pass over
channels?
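To make the question concrete, a channel would presumably look something
like this (a hypothetical sketch, only to pin down the question):

    class Channel(object):
        def send(self, obj):       # which objects are legal here?
            raise NotImplementedError
        def receive(self):         # and which interpreter owns the result?
            raise NotImplementedError

The whole problem is what 'obj' is allowed to be.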
If the interpreters are supposed to be completely separated, then you can
only pass strings, and the result looks exactly like separated processes.
You can extend it to pass tuples and other simple data structures, but
that's the same as extending the cross-process communication protocol. If
on the other hand you can pass arbitrary random objects, then you have
the issue that the objects are not really owned by one interpreter or the
other; I don't really think it can be made to work in the current model
of the object space reference. Even if we manage, we'd end up again with
the issue of concurrent changes to shared objects, which is the core
problem to solve in any case --- either in your approach or with STM or
with fine-grained locking.

A bientôt,

Armin.

From van.lindberg at gmail.com  Sat Jul 16 22:37:46 2011
From: van.lindberg at gmail.com (VanL)
Date: Sat, 16 Jul 2011 15:37:46 -0500
Subject: [pypy-dev] Sandboxing questions
In-Reply-To: 
References: <4E20744A.8080202@gmail.com> <4E208F79.1010701@gmail.com> <4E20C002.2080601@gmail.com>
Message-ID: 

On Jul 16, 2011 5:13 AM, "Armin Rigo" wrote:
>
> That's hand-waving away the real question: what can you pass over
> channels? If the interpreters are supposed to be completely separated,
> then you can only pass strings, and the result looks exactly like
> separated processes. [...] Even if we manage, we'd end up again with the
> issue of concurrent changes to shared objects, which is the core problem
> to solve in any case --- either in your approach or with STM or with
> fine-grained locking.

My intention was to proceed in four steps:

First, allow the passing of any immutable type. This is about the same as
multiprocessing, but you could do it without incurring the
serialization/deserialization overhead.

Second, allow the passing of mutable types with copy-on-write semantics.
Note that this would all be a sync through the queues.

Third, allow memory views or classes in a sending object space/thread to
expose read-only access to another object space/thread. The shared
objects would need to be explicitly declared, probably using something
similar to the POSH semantics.

Fourth, allow read-write access to items that were explicitly declared to
be shared. One object space would be the owner of any particular object;
if another object space wanted to access and modify that object, it would
need to acquire the GIL for the owning object space to do so.

Your STM work could eventually make acquiring the GIL for the owning
object space unnecessary, but in the nearer term, I think that the
semantics above would work.

For example, assume object spaces A, B, and C, each in their own thread,
each with their own GIL. From the perspective of space A, B and C both
look like opaque extensions. When space B wants to access something in
space A, it needs to acquire GIL A. The existing GIL semantics mediate
accesses to the state of space A.

Part of what is interesting is that the spaces are completely
independent, so you can open a socket in space A that reads and writes
strings to that socket. The socket only exists in space A, so other
spaces either don't see it (if it is not declared shared) or they have to
acquire the GIL for space A to read or write to it. Similarly, space B
can load up some modules or extensions that only exist in space B.

So perhaps space A handles I/O through the socket it owns, and then sends
requests/responses through the channels to spaces B...N for processing.
Let's say that some of these are processor intensive; it doesn't matter.
There is no shared state between the spaces/threads unless explicit
synchronization is required and asked for by the programmer. You can peg
one thread/space without affecting the others.

Thanks,

Van
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From fijall at gmail.com  Sat Jul 16 22:44:06 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sat, 16 Jul 2011 22:44:06 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: 

On Tue, Jul 12, 2011 at 1:20 AM, Maciej Fijalkowski wrote:
> Hi
>
> I'm a bit worried with our current benchmarks state. We have around 4
> benchmarks that had reasonable slowdowns recently and we keep putting
> in new features that speed up other things. How can we even say we have
> actually fixed the original issue?
> Can we have a policy of not merging
> new performance features before having a story about why benchmarks got
> slower?
>
> Current list:
>
> http://speed.pypy.org/timeline/?exe=1&base=none&ben=spectral-norm&env=tannit&revs=50

this fixed itself, recent runs are fast again (and anto could not
reproduce it at all)

> http://speed.pypy.org/timeline/?exe=1&base=none&ben=spitfire&env=tannit&revs=50

armin will have a look one day

> This is a good example why we should not work the way we work now:
>
> http://speed.pypy.org/timeline/?exe=1&base=none&ben=slowspitfire&env=tannit&revs=200

do we say "meh" and go on?

> There was an issue, then the issue was fixed, but apparently not quite
> (7th of June is quite a bit slower than 25th of May) and then recently
> we introduced something that made it faster altogether. Can we even
> fish out the original issue?
>
> http://speed.pypy.org/timeline/?exe=1&base=none&ben=bm_mako&env=tannit&revs=200

no clue so far?

> http://speed.pypy.org/timeline/?exe=1&base=none&ben=nbody_modified&env=tannit&revs=50
> (is it relevant or just noise?)

just noise maybe?

> http://speed.pypy.org/timeline/?exe=1&base=none&ben=telco&env=tannit&revs=50

nobody looked

> Cheers,
> fijal

From alex.gaynor at gmail.com  Sat Jul 16 23:04:02 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Sat, 16 Jul 2011 14:04:02 -0700
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: 

On Sat, Jul 16, 2011 at 1:44 PM, Maciej Fijalkowski wrote:

> On Tue, Jul 12, 2011 at 1:20 AM, Maciej Fijalkowski wrote:
> > Hi
> >
> > I'm a bit worried with our current benchmarks state. [...]
> >
> > http://speed.pypy.org/timeline/?exe=1&base=none&ben=bm_mako&env=tannit&revs=200
>
> no clue so far?
>

No clue indeed; the traces don't change at all, and the only change is
the inlining of something that is never called in that code. If that
causes a slowdown, I claim we are broken elsewhere, and this is a
symptom, not a new problem.

> >
> > http://speed.pypy.org/timeline/?exe=1&base=none&ben=nbody_modified&env=tannit&revs=50
> > (is it relevant or just noise?)
>
> just noise maybe?
> >
> > http://speed.pypy.org/timeline/?exe=1&base=none&ben=telco&env=tannit&revs=50
>
> nobody looked
>
> Cheers,
> fijal
>
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> http://mail.python.org/mailman/listinfo/pypy-dev
>

Alex

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bokr at oz.net  Sun Jul 17 11:33:18 2011
From: bokr at oz.net (Bengt Richter)
Date: Sun, 17 Jul 2011 11:33:18 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: 

On 07/16/2011 10:44 PM Maciej Fijalkowski wrote:
> On Tue, Jul 12, 2011 at 1:20 AM, Maciej Fijalkowski wrote:
>> Hi
>>
>> I'm a bit worried with our current benchmarks state. We have around 4
>> benchmarks that had reasonable slowdowns recently and we keep putting
>> in new features that speed up other things. How can we even say we have
>> actually fixed the original issue? [...]
>>
>> http://speed.pypy.org/timeline/?exe=1&base=none&ben=spectral-norm&env=tannit&revs=50
>
> this fixed itself, recent runs are fast again (and anto could not
> reproduce it at all)
>

I am wondering what was sharing the tannit xeon with the actual code
being benchmarked.

Are timing values stored via stdout to file(s)? How are they buffered?
Could the process of allocating file storage space on a disk have reached
a nonlinear overhead hump that was passed, and somehow impacted the
benchmark temporarily, so that it would seem to have "fixed itself"?

I am thinking of file-system work, e.g. clobbering warm caches in some
temporarily systematic way due to convoying[1] between benchmark I/O and
background file-system activity. Even though user time is not system
time, some CPU and cache resources must be shared.

I wonder what a dedicated SSD disk with large pre-allocated open files
could do to normalize effects, if the disk were reformatted anew for each
run?

[1] http://en.wikipedia.org/wiki/Lock_convoy

Regards,
Bengt Richter

From arigo at tunes.org  Sun Jul 17 11:43:04 2011
From: arigo at tunes.org (Armin Rigo)
Date: Sun, 17 Jul 2011 11:43:04 +0200
Subject: [pypy-dev] Sandboxing questions
In-Reply-To: 
References: <4E20744A.8080202@gmail.com> <4E208F79.1010701@gmail.com> <4E20C002.2080601@gmail.com>
Message-ID: 

Hi VanL,

On Sat, Jul 16, 2011 at 10:37 PM, VanL wrote:
> (...) There is no shared state between the spaces/threads unless explicit
> synchronization is required and asked for by the programmer. You can peg one
> thread/space without affecting the others.

Sure, feel free to try this out. It requires careful language-level
design, like a new API with which new Python programs must be written.
That's why I personally prefer my approach, because it is
implementation-only, without needing any changes to existing programs.

A bientôt,

Armin.

From fijall at gmail.com  Sun Jul 17 22:15:19 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sun, 17 Jul 2011 22:15:19 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: 

I think, to summarize, we're good now, except spitfire, which is to be
investigated by armin.

The new thing about go is a bit of "we touched the world".
Because the unoptimized traces are now shorter, less gets aborted, less
gets run based on functions, and it's slower. Setting trace_limit=6000
makes it fast again. I guess "too bad" is the answer.

From arigo at tunes.org  Mon Jul 18 10:15:37 2011
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 18 Jul 2011 10:15:37 +0200
Subject: [pypy-dev] win64
Message-ID: 

Hi Christian,

A quick poll about the 'win64 test' branch. Is it going fine? If
ll2ctypes is continuing to give you trouble, I'd suggest moving on; keep
in mind that ll2ctypes is only used for testing. I would recommend that
your priority should be to pass the tests in pypy/rpython/test/ and
pypy/translator/c/test/, and when that is done, move on to
pypy/rlib/test/ and pypy/module/*/test/.

A bientôt,

Armin.

From tismer at stackless.com  Mon Jul 18 10:30:59 2011
From: tismer at stackless.com (Christian Tismer)
Date: Mon, 18 Jul 2011 10:30:59 +0200
Subject: [pypy-dev] win64
In-Reply-To: 
References: 
Message-ID: <4E23EF43.7030806@stackless.com>

On 7/18/11 10:15 AM, Armin Rigo wrote:
> Hi Christian,
>
> A quick poll about the 'win64 test' branch. Is it going fine? If
> ll2ctypes is continuing to give you trouble, I'd suggest moving on; keep
> in mind that ll2ctypes is only used for testing. I would recommend that
> your priority should be to pass the tests in pypy/rpython/test/ and
> pypy/translator/c/test/, and when that is done, move on to
> pypy/rlib/test/ and pypy/module/*/test/.

Hi Armin,

It is going fine; ll2ctypes has been working for quite a while.
I'm working through rpython and some things in rlib at the moment.

For the last 10 days I was ill with flu, which slowed me down, and I'm in
big trouble with my family: a divorce, finding a new apartment for
myself, finding a way for my son to finish his school, and a lot of
fighting with my wife. So I can't guarantee steady work right now, but
I'm not blocked and will bring this to an end. If I get stuck, I will
certainly ask you.

cheers - chris

-- 
Christian Tismer             :^)
tismerysoft GmbH             :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9A     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 802 86 56  mobile +49 173 24 18 776  fax +49 30 80 90 57 05
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/

From anto.cuni at gmail.com  Mon Jul 18 12:28:23 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Mon, 18 Jul 2011 12:28:23 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: 
Message-ID: <4E240AC7.9010708@gmail.com>

On 17/07/11 22:15, Maciej Fijalkowski wrote:
> I think, to summarize, we're good now, except spitfire, which is to be
> investigated by armin.
>
> The new thing about go is a bit of "we touched the world". Because the
> unoptimized traces are now shorter, less gets aborted, less gets run
> based on functions, and it's slower. Setting trace_limit=6000 makes it
> fast again. I guess "too bad" is the answer.
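(I assume that concretely means running it with e.g. "pypy --jit
trace_limit=6000 own/go.py", using the same --jit param=value syntax I
use below.)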
This made me wonder how effective our "compile the loops" idea is in
practice, so I benchmarked translate.py first with the default settings,
and then with just the function jitting:

pypy ./translate.py -Ojit

[Timer] Timings:
[Timer] annotate                   --- 453.0 s
[Timer] rtype_lltype               --- 310.7 s
[Timer] pyjitpl_lltype             --- 392.6 s
[Timer] backendopt_lltype          --- 132.8 s
[Timer] stackcheckinsertion_lltype ---  32.1 s
[Timer] database_c                 --- 197.1 s
[Timer] source_c                   --- 252.0 s
[Timer] compile_c                  --- 707.6 s
[Timer] ===========================================
[Timer] Total:                     --- 2478.0 s

pypy --jit threshold=-1 ./translate -Ojit

[Timer] Timings:
[Timer] annotate                   --- 486.2 s
[Timer] rtype_lltype               --- 297.8 s
[Timer] pyjitpl_lltype             --- 396.6 s
[Timer] backendopt_lltype          --- 128.7 s
[Timer] stackcheckinsertion_lltype ---  32.6 s
[Timer] database_c                 --- 190.0 s
[Timer] source_c                   --- 240.0 s
[Timer] compile_c                  --- 594.6 s
[Timer] ===========================================
[Timer] Total:                     --- 2366.5 s

As you can see, if we ignore the time spent in compile_c, the total time
is almost the same.

What can we conclude? That "compiling the loops" is ineffective and we
only care about compiling single functions? :-(

ciao,
Anto

From arigo at tunes.org  Mon Jul 18 13:58:27 2011
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 18 Jul 2011 13:58:27 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: <4E240AC7.9010708@gmail.com>
References: <4E240AC7.9010708@gmail.com>
Message-ID: 

Hi Anto,

On Mon, Jul 18, 2011 at 12:28 PM, Antonio Cuni wrote:
> What can we conclude? That "compiling the loops" is ineffective and we only
> care about compiling single functions? :-(

Or, conversely, that compiling single functions is ineffective and we
only care about compiling the loops? No.

I expect that on a large and messy program like translate.py, after a
while, either approach should be fine. Still, there are cases where one
or the other approach is better. If you want an obvious example where
compiling loops is better, write a function that runs a loop a large
number of times, but is itself called only a few times.

A bientôt,

Armin.

From cfbolz at gmx.de  Mon Jul 18 14:10:39 2011
From: cfbolz at gmx.de (Carl Friedrich Bolz)
Date: Mon, 18 Jul 2011 14:10:39 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: <4E240AC7.9010708@gmail.com>
Message-ID: <4E2422BF.3090600@gmx.de>

On 07/18/2011 01:58 PM, Armin Rigo wrote:
> Hi Anto,
>
> On Mon, Jul 18, 2011 at 12:28 PM, Antonio Cuni wrote:
>> What can we conclude? That "compiling the loops" is ineffective and we only
>> care about compiling single functions? :-(
>
> Or, conversely, that compiling single functions is ineffective and we
> only care about compiling the loops? No.
>
> I expect that on a large and messy program like translate.py,

offtopic, but I still want to point out that translate.py is indeed
terribly messy. I've been reading traces some more, and it's quite
scary. E.g. lltype._struct.__init__ has a CALL_FUNCTION bytecode that
needs more than 800 trace operations, even after optimization :-).

Carl Friedrich

From anto.cuni at gmail.com  Mon Jul 18 14:25:15 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Mon, 18 Jul 2011 14:25:15 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: <4E240AC7.9010708@gmail.com>
Message-ID: <4E24262B.8050200@gmail.com>

On 18/07/11 13:58, Armin Rigo wrote:
> Or, conversely, that compiling single functions is ineffective and we
> only care about compiling the loops? No.
>
> I expect that on a large and messy program like translate.py, after a
> while, either approach should be fine. Still, there are cases where one
> or the other approach is better. If you want an obvious example where
> compiling loops is better, write a function that runs a loop a large
> number of times, but is itself called only a few times.

yes, but this was an answer to fijal's comment that "go" is now slower
because the trace_limit is too high, so we compile more loops and fewer
functions than before.

I know that you can easily write examples in which one approach is much
better than the other, but it's also true that usually in complex
programs we are less effective than in small ones. For example,
translate.py does not use much recursion, and I would expect "loop-based"
compilation to be more effective in such a program. But maybe my
expectation is naive, I don't know :-).

ciao,
Anto

From anto.cuni at gmail.com  Mon Jul 18 14:34:47 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Mon, 18 Jul 2011 14:34:47 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: <4E2422BF.3090600@gmx.de>
References: <4E240AC7.9010708@gmail.com> <4E2422BF.3090600@gmx.de>
Message-ID: <4E242867.5080503@gmail.com>

On 18/07/11 14:10, Carl Friedrich Bolz wrote:
> offtopic, but I still want to point out that translate.py is indeed
> terribly messy. I've been reading traces some more, and it's quite
> scary. E.g. lltype._struct.__init__ has a CALL_FUNCTION bytecode that
> needs more than 800 trace operations, even after optimization :-).

yes, I agree that translate.py is messy :-).

Also, we have a speedup of ~2-2.5x, which is more or less what you would
expect by "just" removing the interpretation overhead. It probably
indicates that we have laaaarge room for improvements, but I suppose that
we already knew that :-)

From arigo at tunes.org  Mon Jul 18 15:12:47 2011
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 18 Jul 2011 15:12:47 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: <4E242867.5080503@gmail.com>
References: <4E240AC7.9010708@gmail.com> <4E2422BF.3090600@gmx.de> <4E242867.5080503@gmail.com>
Message-ID: 

Hi Anto,

On Mon, Jul 18, 2011 at 2:34 PM, Antonio Cuni wrote:
> Also, we have a speedup of ~2-2.5x, which is more or less what you would
> expect by "just" removing the interpretation overhead. It probably
> indicates that we have laaaarge room for improvements, but I suppose that
> we already knew that :-)

No, I would argue that it's already 2x better than "just" removing the
interpretation overhead, because PyPy-no-jit is up to 2x slower than
CPython to start with :-)

Armin

From fijall at gmail.com  Mon Jul 18 21:27:40 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Mon, 18 Jul 2011 21:27:40 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: 
References: <4E240AC7.9010708@gmail.com>
Message-ID: 

On Mon, Jul 18, 2011 at 1:58 PM, Armin Rigo wrote:
> Hi Anto,
>
> On Mon, Jul 18, 2011 at 12:28 PM, Antonio Cuni wrote:
>> What can we conclude? That "compiling the loops" is ineffective and we only
>> care about compiling single functions? :-(
>
> Or, conversely, that compiling single functions is ineffective and we
> only care about compiling the loops? No.
>
> I expect that on a large and messy program like translate.py, after a
> while, either approach should be fine. Still, there are cases where
> one or the other approach is better.
> If you want an obvious example
> where compiling loops is better, write a function that runs a loop a
> large number of times, but is itself called only a few times.
>
> A bientôt,
>
> Armin.

Or hakan's video editing code ;-)

What's also worth considering is how to get stuff optimized even if we
don't have loops (but I guess carl has already started).
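For the archives, Armin's obvious example spelled out is something like:

    def f():
        total = 0
        for i in xrange(10 ** 7):   # a very hot loop...
            total += i
        return total

    print f()                       # ...in a function called exactly once

A function-entry counter never sees f() called often enough to compile
it; a loop counter catches the for loop almost immediately.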
From fijall at gmail.com  Mon Jul 18 21:30:01 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Mon, 18 Jul 2011 21:30:01 +0200
Subject: [pypy-dev] Benchmarks
In-Reply-To: <4E242867.5080503@gmail.com>
References: <4E240AC7.9010708@gmail.com> <4E2422BF.3090600@gmx.de> <4E242867.5080503@gmail.com>
Message-ID: 

On Mon, Jul 18, 2011 at 2:34 PM, Antonio Cuni wrote:
> On 18/07/11 14:10, Carl Friedrich Bolz wrote:
>> offtopic, but I still want to point out that translate.py is indeed
>> terribly messy. [...]
>
> yes, I agree that translate.py is messy :-).
>
> Also, we have a speedup of ~2-2.5x, which is more or less what you would
> expect by "just" removing the interpretation overhead. It probably
> indicates that we have laaaarge room for improvements, but I suppose that
> we already knew that :-)

[citation needed]

I think the 2x comes from 1999 or so. It must be benchmarked again to
prove it. I think there are various clues that this number has changed
(maybe even dropped).

From tobami at googlemail.com  Mon Jul 18 21:37:12 2011
From: tobami at googlemail.com (Miquel Torres)
Date: Mon, 18 Jul 2011 21:37:12 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
Message-ID: 

Hi,

speed.pypy.org currently shows a very encouraging performance picture for
PyPy: it is "infinite times faster than CPython". No, it is not yet April
1st.

Codespeed creates the front page plots using the latest tested
revision... which currently has no data for pypy-c-jit (32 bits). There
is already a ticket for that issue, as well as one to fix the Changes
view letting you choose a revision with no data. The assumption was that
a revision would be tested for all executables of a given project, which
is no longer the case for PyPy.

Still, even though there will be a fix, it may be a good idea to test all
exes for the same revision to ease comparisons.

Cheers,
Miquel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: speed_of_light.png
Type: image/png
Size: 51984 bytes
Desc: not available
URL: 

From anto.cuni at gmail.com  Mon Jul 18 21:43:08 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Mon, 18 Jul 2011 21:43:08 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
In-Reply-To: 
References: 
Message-ID: <4E248CCC.6090800@gmail.com>

On 18/07/11 21:37, Miquel Torres wrote:
> Hi,
>
> speed.pypy.org currently shows a very encouraging performance picture for
> PyPy: it is "infinite times faster than CPython". No, it is not yet April
> 1st.

hooray! We finally finished pypy :-)

> Codespeed creates the front page plots using the latest tested
> revision... which currently has no data for pypy-c-jit (32 bits). [...]
>
> Still, even though there will be a fix, it may be a good idea to test all
> exes for the same revision to ease comparisons.

this is not completely easy, because buildbot just pulls and updates to
the latest revision. If someone pushes in between the two runs, the
revision is different.

A quick workaround would be to force buildbot to update to a more
specific revision. E.g., "the highest revision of today at 00:00", or
something like this. This should ensure that we have the same revision
for all our benchmarks/tests.

ciao,
Anto

From tobami at googlemail.com  Mon Jul 18 22:36:04 2011
From: tobami at googlemail.com (Miquel Torres)
Date: Mon, 18 Jul 2011 22:36:04 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
In-Reply-To: <4E248CCC.6090800@gmail.com>
References: <4E248CCC.6090800@gmail.com>
Message-ID: 

Ok, it is fixed now.

AND branch support is in. Results saved with a branch other than
"default" will be available in the comparison view. I've still got to
talk to fijal to see how we should benchmark branches...

Ciao!
Miquel

2011/7/18 Antonio Cuni:
> [...]

From chef at ghum.de  Mon Jul 18 23:30:26 2011
From: chef at ghum.de (Massa, Harald Armin)
Date: Mon, 18 Jul 2011 23:30:26 +0200
Subject: [pypy-dev] just saw: speed.pypy.org reports 4x faster
Message-ID: 

I recommend to wrap up the code and release it with the subtitle "the 4
times faster release".

Best wishes,

Harald

-- 
GHUM GmbH
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607

Amtsgericht Stuttgart, HRB 734971
-
persuadere.
et programmare
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From fijall at gmail.com  Mon Jul 18 23:32:28 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Mon, 18 Jul 2011 23:32:28 +0200
Subject: [pypy-dev] just saw: speed.pypy.org reports 4x faster
In-Reply-To: 
References: 
Message-ID: 

On Mon, Jul 18, 2011 at 11:30 PM, Massa, Harald Armin wrote:
> I recommend to wrap up the code and release it with the subtitle "the 4
> times faster release".

I like codenames more. Like PyPy 1.6 "Kickass Panda".

> Best wishes,
> Harald
> --
> GHUM GmbH
> Harald Armin Massa
> Spielberger Straße 49
> 70435 Stuttgart
> 0173/9409607
>
> Amtsgericht Stuttgart, HRB 734971
> -
> persuadere.
> et programmare
>
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> http://mail.python.org/mailman/listinfo/pypy-dev
>

From phyo.arkarlwin at gmail.com  Mon Jul 18 23:46:49 2011
From: phyo.arkarlwin at gmail.com (Phyo Arkar)
Date: Tue, 19 Jul 2011 04:16:49 +0630
Subject: [pypy-dev] just saw: speed.pypy.org reports 4x faster
In-Reply-To: 
References: 
Message-ID: 

So is this really happening in 1.6, or is it a speed.pypy bug?

On 7/19/11, Maciej Fijalkowski wrote:
> On Mon, Jul 18, 2011 at 11:30 PM, Massa, Harald Armin wrote:
>> I recommend to wrap up the code and release it with the subtitle "the 4
>> times faster release".
>
> I like codenames more. Like PyPy 1.6 "Kickass Panda".
> [...]

From tobami at googlemail.com  Tue Jul 19 09:22:24 2011
From: tobami at googlemail.com (Miquel Torres)
Date: Tue, 19 Jul 2011 09:22:24 +0200
Subject: [pypy-dev] just saw: speed.pypy.org reports 4x faster
In-Reply-To: 
References: 
Message-ID: 

@Phyo: it's because revision 45700 shows a large improvement for telco
and spitfire_stringio (40%+), which makes for an average improvement of
over 6%. See the detailed changes table:
http://speed.pypy.org/changes/?tre=10&rev=45700%3Ac3294f3c5888&exe=1&env=1

Due to Maciej's commit (merge numpy-str-repr), or a benchmarking anomaly?

Miquel

2011/7/18 Phyo Arkar:
> [...]

From anto.cuni at gmail.com  Tue Jul 19 09:32:11 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Tue, 19 Jul 2011 09:32:11 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
In-Reply-To: 
References: <4E248CCC.6090800@gmail.com>
Message-ID: <4E2532FB.4030006@gmail.com>

On 18/07/11 22:36, Miquel Torres wrote:
> Ok, it is fixed now.
>
> AND branch support is in. Results saved with a branch other than
> "default" will be available in the comparison view. I've still got to
> talk to fijal to see how we should benchmark branches...

wow, all of this is very cool, thank you!
However, there was a problem with the uploading of the results tonight :-/
http://buildbot.pypy.org/builders/jit-benchmark-linux-x86-32/builds/791/steps/shell_7/logs/stdio

Do you know what it could be?

ciao,
Anto

From alexandervpetrov at gmail.com  Wed Jul 20 08:50:20 2011
From: alexandervpetrov at gmail.com (Alexander Petrov)
Date: Wed, 20 Jul 2011 09:50:20 +0300
Subject: [pypy-dev] PyPy is much slower than CPython example / question
In-Reply-To: 
References: <20110708003048.GA8477@ubuntu> <20110708010518.3761.1343748147.divmod.xquotient.52@localhost.localdomain>
Message-ID: 

Thanks to all for the answers. Some small remarks.

I remember writing such code (using itertools.repeat with slices) after
spending nearly a day measuring the performance of various syntactical
constructs, usually in several algorithms dealing with numbers... So I
was trying to produce a sort of Python coding style for myself in that
area: which constructs, in which situations, gave a stable speed
improvement while still being readable and pythonic. After that I used
such things in all my projects, e.g. using slice-with-repeat to assign a
constant, in preference to a for-loop or list comprehension.

Calculating primes with a naive Eratosthenes sieve is an example of an
algorithm where a big part of the computation has to go beyond the
processor cache, so I suppose its performance is more characteristic of
the language tools than of very-low-level compiler optimization or
hardware architecture. Frankly speaking, it was the second piece of code
that I tested with PyPy - and I was surprised by the results :)

Another example I discovered where PyPy is slower is big-number
arithmetic (e.g. try to generate the members of some recurrently defined
sequence, such as Fibonacci, in a simple loop).

===============

So at this time I haven't come to any kind of decision about PyPy.

On one hand, in most of the cases with straightforward code/algorithms
and "common" syntax constructs there was a significant speed improvement.

But on the other hand, for the cases where the source Python code was
"optimized" or "hacked", the execution time was sometimes better,
sometimes an order of magnitude worse... and sometimes it was the cause
of this topic's discussion. :) That is not a bad thing in general; the
bad thing is that these speed degradations happened unexpectedly for me.
IMO they are the most interesting (and may be the only interesting) thing
from the PyPy user's viewpoint: "where and in what cases can one expect
bottlenecks". Is there any documented collection of such artifacts? It
could be exceptionally useful.
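For concreteness, the slice-with-repeat style I mean is along these lines
(a sketch, not my exact benchmark):

    import itertools

    def primes_upto(n):
        sieve = [True] * (n + 1)
        sieve[:2] = itertools.repeat(False, 2)   # 0 and 1 are not prime
        for i in xrange(2, int(n ** 0.5) + 1):
            if sieve[i]:
                # extended slice assignment needs an exact length
                count = len(sieve[i*i::i])
                sieve[i*i::i] = itertools.repeat(False, count)
        return [i for i, flag in enumerate(sieve) if flag]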
Alex.

On Sat, Jul 9, 2011 at 07:40, Alex Gaynor wrote:
> I'm not too sure what could be wrong with it, it's rather short:
> https://bitbucket.org/pypy/pypy/src/default/pypy/module/itertools/interp_itertools.py#cl-85
> Alex
>
> On Fri, Jul 8, 2011 at 2:18 AM, Armin Rigo wrote:
>> Hi Alex,
>>
>> Before attacking the problem with the JIT, we should understand better
>> why PyPy is 4-8 times slower than CPython. Normally you'd expect the
>> factor to be at most 2. I suppose the answer is that our
>> itertools.repeat() is bad for some reason.
>>
>> A bientôt,
>>
>> Armin.
>
> --
> "I disapprove of what you say, but I will defend to the death your right to
> say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
> "The people's good is the highest law." -- Cicero

From davidf at sjsoft.com  Wed Jul 20 09:25:20 2011
From: davidf at sjsoft.com (David Fraser)
Date: Wed, 20 Jul 2011 02:25:20 -0500 (CDT)
Subject: [pypy-dev] PyPy is much slower than CPython example / question
In-Reply-To: 
Message-ID: <05cd6948-2877-4935-a6a3-c7b7ecf9689e@jackdaw.local>

On Wednesday, July 20, 2011, at 8:50:20 AM, "Alexander Petrov" wrote:
> [snip]
> So at this time I haven't come to any kind of decision about PyPy.
> [...]
> IMO they are the most interesting (and may be the only interesting) thing
> from the PyPy user's viewpoint: "where and in what cases can one expect
> bottlenecks". Is there any documented collection of such artifacts? It
> could be exceptionally useful.

That's exactly what I would like. I also experimented with some simple
tests and came out with PyPy being twice as slow as CPython. A wiki page
which documents the current areas of slowness, and potential workarounds,
would be fantastic. I know these things can be improved in the future,
sometimes quickly, but it seems like the know-how about handling them in
the meantime isn't written down anywhere...

David

From fijall at gmail.com  Wed Jul 20 10:27:24 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Wed, 20 Jul 2011 10:27:24 +0200
Subject: [pypy-dev] PyPy is much slower than CPython example / question
In-Reply-To: <05cd6948-2877-4935-a6a3-c7b7ecf9689e@jackdaw.local>
References: <05cd6948-2877-4935-a6a3-c7b7ecf9689e@jackdaw.local>
Message-ID: 

On Wed, Jul 20, 2011 at 9:25 AM, David Fraser wrote:
> That's exactly what I would like. I also experimented with some simple
> tests and came out with PyPy being twice as slow as CPython. A wiki page
> which documents the current areas of slowness, and potential workarounds,
> would be fantastic. I know these things can be improved in the future,
> sometimes quickly, but it seems like the know-how about handling them in
> the meantime isn't written down anywhere...
>
> David

https://bitbucket.org/pypy/pypy/wiki/JitFriendliness

From cfbolz at gmx.de  Wed Jul 20 10:49:17 2011
From: cfbolz at gmx.de (Carl Friedrich Bolz)
Date: Wed, 20 Jul 2011 10:49:17 +0200
Subject: [pypy-dev] PyPy is much slower than CPython example / question
In-Reply-To: 
References: <20110708003048.GA8477@ubuntu> <20110708010518.3761.1343748147.divmod.xquotient.52@localhost.localdomain>
Message-ID: <4E26968D.90105@gmx.de>

On 07/20/2011 08:50 AM, Alexander Petrov wrote:
> So at this time I haven't come to any kind of decision about PyPy.
>
> On one hand, in most of the cases with straightforward code/algorithms
> and "common" syntax constructs there was a significant speed improvement.
>
> But on the other hand, for the cases where the source Python code was
> "optimized" or "hacked", the execution time was sometimes better,
> sometimes an order of magnitude worse... and sometimes it was the cause
> of this topic's discussion. :) That is not a bad thing in general; the
> bad thing is that these speed degradations happened unexpectedly for me.
> IMO they are the most interesting (and may be the only interesting) thing
> from the PyPy user's viewpoint: "where and in what cases can one expect
> bottlenecks". Is there any documented collection of such artifacts? It
> could be exceptionally useful.

Another point is that PyPy considers these slowdowns to be performance
bugs that should be fixed. So the situation you describe will likely
change in the future. Of course some performance unpredictability will
always remain; that goes with a JIT somewhat.

Carl Friedrich

From davidf at sjsoft.com  Wed Jul 20 12:36:20 2011
From: davidf at sjsoft.com (David Fraser)
Date: Wed, 20 Jul 2011 05:36:20 -0500 (CDT)
Subject: [pypy-dev] PyPy is much slower than CPython example / question
In-Reply-To: 
Message-ID: 

On Wednesday, July 20, 2011 at 10:27:24 AM, Maciej Fijalkowski wrote:
> On Wed, Jul 20, 2011 at 9:25 AM, David Fraser wrote:
> > That's exactly what I would like. I also experimented with some simple
> > tests and came out with PyPy being twice as slow as CPython. A wiki page
> > which documents the current areas of slowness, and potential
> > workarounds, would be fantastic. [...]
> >
> > David
>
> https://bitbucket.org/pypy/pypy/wiki/JitFriendliness

Fantastic, thank you...
David

From tobami at googlemail.com  Wed Jul 20 15:56:41 2011
From: tobami at googlemail.com (Miquel Torres)
Date: Wed, 20 Jul 2011 15:56:41 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
In-Reply-To: <4E2532FB.4030006@gmail.com>
References: <4E248CCC.6090800@gmail.com> <4E2532FB.4030006@gmail.com>
Message-ID: 

sorry, I'm in the middle of two busy days and couldn't look at it yet.
Will do as soon as I can.

2011/7/19 Antonio Cuni:
> On 18/07/11 22:36, Miquel Torres wrote:
> [...]
>
> However, there was a problem with the uploading of the results tonight :-/
> http://buildbot.pypy.org/builders/jit-benchmark-linux-x86-32/builds/791/steps/shell_7/logs/stdio
>
> Do you know what it could be?
>
> ciao,
> Anto

From fijall at gmail.com  Wed Jul 20 16:01:42 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Wed, 20 Jul 2011 16:01:42 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
In-Reply-To: 
References: <4E248CCC.6090800@gmail.com> <4E2532FB.4030006@gmail.com>
Message-ID: 

On Wed, Jul 20, 2011 at 3:56 PM, Miquel Torres wrote:
> sorry, I'm in the middle of two busy days and couldn't look at it yet.
> Will do as soon as I can.

I fixed it. Having a working "changes" tab would be cool, but it is not
immediately needed.

From tobami at googlemail.com  Wed Jul 20 22:01:10 2011
From: tobami at googlemail.com (Miquel Torres)
Date: Wed, 20 Jul 2011 22:01:10 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
In-Reply-To: 
References: <4E248CCC.6090800@gmail.com> <4E2532FB.4030006@gmail.com>
Message-ID: 

> I fixed it. Having a working "changes" tab would be cool, but it is not
> immediately needed.

Thanks Maciej. It was just a case of adding the new branch parameter to
the result dictionary, as you did in pypy/benchmarks/src/saveresults. I
wanted to save you the time to look it up and change it, but well, I
couldn't.

Regarding the broken changes table, it is obviously a related problem,
together with not making the user browse through empty data. All of those
issues are caused by benchmarking some of the executables with one
revision, and others with different revisions. Each can be more or less
solved with a couple of lines of code, but it would introduce quite a bit
of overhead, so we need to consider (test/benchmark) the possible
solution carefully.

Could you imagine implementing Antonio's suggestion? Citing:
> A quick workaround would be to force buildbot to update to a more specific
> revision. E.g., "the highest revision of today at 00:00", or something like
> this.
Sounds good to me.

Miquel

From stefan_ml at behnel.de Thu Jul 21 07:17:21 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 21 Jul 2011 07:17:21 +0200
Subject: [pypy-dev] just saw: speed.pypy.org reports 4x faster
Message-ID:

Massa, Harald Armin, 18.07.2011 23:30:
> I recommend to wrap the code and release it with the subtitle "the 4
> times faster release"

Just nitpicking here, but you shouldn't forget that any given set of
benchmarks can only ever be an arbitrary one. If you change the current
set, you can rightfully make any claim from "PyPy is 100x faster than
CPython on average" to "CPython is substantially faster than PyPy".
Taking the average is a nice addition right below the graphs on
speed.pypy.org, but it makes no sense at all without this context.

Stefan

From fijall at gmail.com Thu Jul 21 09:03:16 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 21 Jul 2011 09:03:16 +0200
Subject: [pypy-dev] just saw: speed.pypy.org reports 4x faster
Message-ID:

On Thu, Jul 21, 2011 at 7:17 AM, Stefan Behnel wrote:
> Just nitpicking here, but you shouldn't forget that any given set of
> benchmarks can only ever be an arbitrary one. [snip]

Of course :)

No single number will ever describe the speed of something as complex
as a Python interpreter (unless the number is 42, of course). However,
there were requests to reduce it to one number, and here we have one. I
don't even think "4x faster now!" is a good marketing slogan. What's
significant is that it's getting faster with each release.

Cheers,
fijal

From anto.cuni at gmail.com Thu Jul 21 09:09:15 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Thu, 21 Jul 2011 09:09:15 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
References: <4E248CCC.6090800@gmail.com> <4E2532FB.4030006@gmail.com>
Message-ID: <4E27D09B.8040204@gmail.com>

Hi Miquel, Maciek, all,

On 20/07/11 22:01, Miquel Torres wrote:
> Thanks Maciej. It was just a case of adding the new branch parameter
> to the result dictionary, as you did in
> pypy/benchmarks/src/saveresults

I think that there was another issue: currently we pass a revision
number like 12345:aabbccddee, but codespeed complained that it's not a
valid hg revision (it was checking that it's exactly 40 chars long). I
think that fijal fixed it, not sure how.

> Regarding the broken changes table, it is obviously a related problem,
> together with the user having to browse through empty data. All of
> those issues are caused by benchmarking some of the executables with
> one revision and others with different revisions. Each can be more or
> less solved with a couple of lines of code, but it would introduce
> quite a bit of overhead, so we need to consider (test/benchmark) the
> possible solution carefully.

I think that codespeed should fix this behavior sooner or later; the
current one looks broken to me. However, I'm fine with having just a
workaround at the moment.

> Could you imagine implementing Antonio's suggestion? Citing:
>> A quick workaround would be to force buildbot to update to a more
>> specific revision. E.g., "the highest revision of today at 00:00", or
>> something like this.

unfortunately, it's not as simple. In mercurial there is no easy way to
update to a specific date/time ("hg up --date" does not consider
branches, so you might end up in a different branch than default).

Moreover, we want to be able to manually kick off a benchmark run just
for e.g. 32 bit but not for 64, so the workaround would not work in
this case.

I propose a new workaround: instead of having pypy-c and pypy-c-64 both
in the "tannit" environment, what about having tannit-32 and tannit-64?
I think this would fix the issues, at the cost of not being able to
have both 32 and 64 bit plots in the same graph.

What do you think?

ciao,
Anto

From fijall at gmail.com Thu Jul 21 09:13:33 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 21 Jul 2011 09:13:33 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
In-Reply-To: <4E27D09B.8040204@gmail.com>
References: <4E248CCC.6090800@gmail.com> <4E2532FB.4030006@gmail.com> <4E27D09B.8040204@gmail.com>
Message-ID:

On Thu, Jul 21, 2011 at 9:09 AM, Antonio Cuni wrote:
> I think that there was another issue: currently we pass a revision
> number like 12345:aabbccddee, but codespeed complained that it's not a
> valid hg revision (it was checking that it's exactly 40 chars long).
> I think that fijal fixed it, not sure how.

By commenting out a check. Checking for an exact length is nonsense,
since we might pass 5 digits one day...

> I think that codespeed should fix this behavior sooner or later; the
> current one looks broken to me.
> However, I'm fine with having just a workaround at the moment.
>
> I propose a new workaround: instead of having pypy-c and pypy-c-64
> both in the "tannit" environment, what about having tannit-32 and
> tannit-64? I think this would fix the issues, at the cost of not being
> able to have both 32 and 64 bit plots in the same graph.

I think having them on the same graph is more important than having
"changes" show the correct things. I might give it a go if nobody else
wants to.

From anto.cuni at gmail.com Thu Jul 21 09:24:55 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Thu, 21 Jul 2011 09:24:55 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
References: <4E248CCC.6090800@gmail.com> <4E2532FB.4030006@gmail.com> <4E27D09B.8040204@gmail.com>
Message-ID: <4E27D447.3070807@gmail.com>

On 21/07/11 09:13, Maciej Fijalkowski wrote:
> I think having them on the same graph is more important than having
> "changes" show the correct things. I might give it a go if nobody else
> wants to.

Not sure. Having them in the same graph is important only to quickly
spot cases in which one backend is much slower than the other, which
seems not to be the case.

On the other hand, to spot regressions it's enough to have them in two
separate graphs.

ciao,
Anto

From tobami at googlemail.com Thu Jul 21 10:01:03 2011
From: tobami at googlemail.com (Miquel Torres)
Date: Thu, 21 Jul 2011 10:01:03 +0200
Subject: [pypy-dev] just saw: speed.pypy.org reports 4x faster
Message-ID:

Right. I think that the geometric average is useful mostly and
primarily to gauge how PyPy is improving over time, and much less to
know whether it has yet reached the speed of light as compared to
CPython ;-)
From tobami at googlemail.com Thu Jul 21 10:02:57 2011
From: tobami at googlemail.com (Miquel Torres)
Date: Thu, 21 Jul 2011 10:02:57 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
In-Reply-To: <4E27D447.3070807@gmail.com>
Message-ID:

Alternatively, the timeline view could allow displaying several
environments at the same time...

2011/7/21 Antonio Cuni:
> Not sure. Having them in the same graph is important only to quickly
> spot cases in which one backend is much slower than the other, which
> seems not to be the case.
>
> On the other hand, to spot regressions it's enough to have them in two
> separate graphs.

From anto.cuni at gmail.com Thu Jul 21 13:52:30 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Thu, 21 Jul 2011 13:52:30 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
Message-ID: <4E2812FE.8000407@gmail.com>

On 21/07/11 10:02, Miquel Torres wrote:
> Alternatively, the timeline view could allow displaying several
> environments at the same time...

I think this would work. How hard is it to implement?
Maciek, any opinion?

ciao,
Anto

From fijall at gmail.com Thu Jul 21 14:11:22 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 21 Jul 2011 14:11:22 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
In-Reply-To: <4E2812FE.8000407@gmail.com>
Message-ID:

On Thu, Jul 21, 2011 at 1:52 PM, Antonio Cuni wrote:
> I think this would work. How hard is it to implement?
> Maciek, any opinion?

No, not much opinion from me.

From tobami at googlemail.com Thu Jul 21 20:45:40 2011
From: tobami at googlemail.com (Miquel Torres)
Date: Thu, 21 Jul 2011 20:45:40 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
Message-ID:

Ok, not much time right now, but we will look into it.

From berend.deschouwer at ucs-software.co.za Fri Jul 22 12:42:31 2011
From: berend.deschouwer at ucs-software.co.za (Berend De Schouwer)
Date: Fri, 22 Jul 2011 12:42:31 +0200
Subject: [pypy-dev] pypy array module memory leak?
Message-ID: <4E295417.9000903@ucs-software.co.za>

The following program has constant (+- 10 MB) memory usage in CPython,
but it quickly leaks massive amounts of memory in pypy.

It simply assigns one slice of an array to another slice. The length of
the array remains constant (so there's nothing to delete from inside
the program).

pypy-1.5 (pypy-c-jit-43780-b590cf6de419-linux)

import array
import time

a = array.array('i', 'a' * 100000)
for i in range(1, 100000):
    a[1:1000] = a[2001:3000]
    time.sleep(0.0001)  # So you can go check mem usage.

I apologise for the disclaimer.

From arigo at tunes.org Fri Jul 22 13:20:26 2011
From: arigo at tunes.org (Armin Rigo)
Date: Fri, 22 Jul 2011 13:20:26 +0200
Subject: [pypy-dev] pypy array module memory leak?
In-Reply-To: <4E295417.9000903@ucs-software.co.za>
References: <4E295417.9000903@ucs-software.co.za>
Message-ID:

Hi Berend,

On Fri, Jul 22, 2011 at 12:42 PM, Berend De Schouwer wrote:
> The following program has constant (+- 10 MB) memory usage in CPython,
> but it quickly leaks massive amounts of memory in pypy.

In theory it's not a leak, because the memory is eventually freed. You
can see this by adding "import gc; gc.collect()" in the loop; then the
memory usage remains stable. Of course in practice it's bad.

The issue is that in order to trigger a collection, we count the size
of the allocated objects --- but we ignore the fact that the temporary
array.array created by a[2001:3000] also has an attached raw-malloced
array of 1000 integers. Instead we just count the base size of the
array object, which is 4 or 5 words. As a result we grossly
misestimate the time at which we need to trigger the next collection.

It needs to be fixed, but I'm not sure exactly how to do it generally
(as opposed to just fixing array.array, which doesn't help all other
similar situations).

A bientôt,

Armin.

From Ben.Young at sungard.com Fri Jul 22 13:52:05 2011
From: Ben.Young at sungard.com (Ben.Young at sungard.com)
Date: Fri, 22 Jul 2011 12:52:05 +0100
Subject: [pypy-dev] pypy array module memory leak?
References: <4E295417.9000903@ucs-software.co.za>
Message-ID: <01781CA2CC22B145B230504679ECF48C03A8B658@EMEA-EXCHANGE03.internal.sungard.corp>

Hi Armin,

.NET has an AddMemoryPressure call for just this situation. Perhaps the
PyPy GCs need a similar thing?

Thanks,
Ben
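To make Armin's diagnosis concrete: each pass through the loop builds a
temporary array whose large raw buffer the collection trigger never
sees, which is why an explicit gc.collect() keeps usage flat. A sketch
of the instrumented loop (the collection interval here is an assumption
for illustration, not code from the thread):

import array
import gc

a = array.array('i', 'a' * 100000)
for i in range(1, 100000):
    # a[2001:3000] creates a temporary array.array; the GC counts only
    # its few-words header, not the ~1000-integer raw-malloced buffer
    # behind it, so it underestimates how much memory is really in use.
    a[1:1000] = a[2001:3000]
    if i % 1000 == 0:
        gc.collect()  # forcing a collection keeps memory usage stable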
From berend.deschouwer at ucs-software.co.za Fri Jul 22 14:46:02 2011
From: berend.deschouwer at ucs-software.co.za (Berend De Schouwer)
Date: Fri, 22 Jul 2011 14:46:02 +0200
Subject: [pypy-dev] pypy array module memory leak?
References: <4E295417.9000903@ucs-software.co.za>
Message-ID: <4E29710A.6080803@ucs-software.co.za>

On 22/07/2011 13:20, Armin Rigo wrote:
> In theory it's not a leak, because the memory is eventually freed.
> You can see this by adding "import gc; gc.collect()" in the loop; then
> the memory usage remains stable.

Confirmed. It's expensive, though, so I'm running it in a separate
thread. The program in question can eat a few gigs in a few seconds, so
I've got to run it every second.

> It needs to be fixed, but I'm not sure exactly how to do it generally
> (as opposed to just fixing array.array, which doesn't help all other
> similar situations).

I'd appreciate pointers to fixing array.array, if possible.

gc.collect() every second still eats about 500 MB too much RAM, so I'm
looking at 600 vs. 100 MB RAM. At least it's running.

From aidembb at yahoo.com Sat Jul 23 07:18:52 2011
From: aidembb at yahoo.com (Roger Flores)
Date: Fri, 22 Jul 2011 22:18:52 -0700 (PDT)
Subject: [pypy-dev] binascii.crc32() OverflowError?
Message-ID: <1311398332.64833.YahooMailNeo@web45508.mail.sp1.yahoo.com>

Hello all. I'm getting an OverflowError only when I run my program in
pypy. I've simplified it to a couple of lines:

binasciiproblem.py:

import binascii

value = 0
new_value = 'a'
value = binascii.crc32(new_value, value) & 0xffffffff
value = binascii.crc32(new_value, value) & 0xffffffff

Python 2.7.1: runs without problems.

PyPy 1.5 and pypy-c-jit-45752-e490f90f2aa1-linux64.tar.bz2 (7/20
nightly):

Traceback (most recent call last):
  File "app_main.py", line 53, in run_toplevel
  File "binasciiproblem.py", line 6, in <module>
    value = binascii.crc32(new_value, value) & 0xffffffff
OverflowError: long int too large to convert to int

Ideas? Is this a problem in my code noticed by PyPy, or a PyPy library
API issue? I've been toying around with a PPM compressor and could sure
use PyPy's speed boosts for the longer runs!

Thanks,

-Roger

From arigo at tunes.org Sat Jul 23 10:58:53 2011
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 23 Jul 2011 10:58:53 +0200
Subject: [pypy-dev] binascii.crc32() OverflowError?
In-Reply-To: <1311398332.64833.YahooMailNeo@web45508.mail.sp1.yahoo.com>
References: <1311398332.64833.YahooMailNeo@web45508.mail.sp1.yahoo.com>
Message-ID:

Hi,

On Sat, Jul 23, 2011 at 7:18 AM, Roger Flores wrote:
> value = binascii.crc32(new_value, value) & 0xffffffff
> value = binascii.crc32(new_value, value) & 0xffffffff

Thanks for the report. This code doesn't run on top of CPython 2.5
either, but it does work on top of CPython 2.7. We already had troubles
deciding the type of the return value of crc32(), as it's not
consistent on CPython. Now I see that we have troubles with the
argument too :-)

It seems that crc32() and a few similar functions actually accept
integers of *any* size as argument, and just truncate them. I'll fix
PyPy to also accept integers of any size.

A bientôt,

Armin.

From arigo at tunes.org Sat Jul 23 11:05:12 2011
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 23 Jul 2011 11:05:12 +0200
Subject: [pypy-dev] Benchmarks
References: <4E240AC7.9010708@gmail.com>
Message-ID:

Hi Maciek,

On Mon, Jul 18, 2011 at 9:27 PM, Maciej Fijalkowski wrote:
> What's also worth considering is how to get stuff optimized even if we
> don't have loops (but I guess carl has already started)

I'm unsure what you mean here. The function_threshold stuff you did is
exactly that, no?

A bientôt,

Armin.

From fijall at gmail.com Sat Jul 23 11:10:31 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sat, 23 Jul 2011 11:10:31 +0200
Subject: [pypy-dev] Benchmarks
Message-ID:

On Sat, Jul 23, 2011 at 11:05 AM, Armin Rigo wrote:
> I'm unsure what you mean here. The function_threshold stuff you did is
> exactly that, no?

Hey. No, I meant something else. Can you show up on IRC to discuss? :)

From fijall at gmail.com Sat Jul 23 12:39:15 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sat, 23 Jul 2011 12:39:15 +0200
Subject: [pypy-dev] Benchmarks
Message-ID:

On Sat, Jul 23, 2011 at 11:05 AM, Armin Rigo wrote:
> I'm unsure what you mean here. The function_threshold stuff you did is
> exactly that, no?

A verbose answer: the function_threshold is about that, but the
optimization level is also much lower if we can't do loop-invariant
code motion. One example is global lookups, where carl (or his student)
is working on eliminating guards even if we don't do LICM. This is what
I meant by "optimizing" a no-loop scenario.
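For readers following along: loop-invariant code motion (LICM) is the
trace optimization that hoists repeated work out of a loop. Written by
hand at the source level it corresponds to something like the sketch
below (the function and variable names are made up purely for
illustration); the discussion above is about getting the same effect
when there is no loop to hoist out of:

import math

def norms(vectors):
    # Looking up the global name "math" and its "sqrt" attribute is
    # loop-invariant, so it can be done once outside the loop; the
    # JIT's LICM pass performs the equivalent transformation on traces.
    sqrt = math.sqrt
    result = []
    for x, y in vectors:
        result.append(sqrt(x * x + y * y))
    return result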
From arigo at tunes.org Sat Jul 23 22:41:03 2011
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 23 Jul 2011 22:41:03 +0200
Subject: [pypy-dev] Benchmarks
Message-ID:

Hi,

On Sat, Jul 23, 2011 at 12:39 PM, Maciej Fijalkowski wrote:
> A verbose answer: the function_threshold is about that, but the
> optimization level is also much lower if we can't do loop-invariant
> code motion. One example is global lookups, where carl (or his
> student) is working on eliminating guards even if we don't do LICM.
> This is what I meant by "optimizing" a no-loop scenario.

Advanced. :-)

Armin

From fijall at gmail.com Sun Jul 24 10:20:01 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sun, 24 Jul 2011 10:20:01 +0200
Subject: [pypy-dev] Benchmarks
References: <4E1FFE10.2010807@gmail.com>
Message-ID:

On Sun, Jul 24, 2011 at 10:12 AM, Armin Rigo wrote:
> I found and fixed the bug in 82e5051d55c3 (and, to a lesser extent, in
> 6b4eb34c6091). [snip]
>
> I can also confirm that both chaos in 3f7e and go in e911 are randomly
> slowed down by jit tracer improvements, and would become fast again
> with a smaller --jit trace_limit...

That should probably be addressed (maybe?), so can we decide we're done
with regressions?

Cheers,
fijal

From arigo at tunes.org Sun Jul 24 10:12:47 2011
From: arigo at tunes.org (Armin Rigo)
Date: Sun, 24 Jul 2011 10:12:47 +0200
Subject: [pypy-dev] Benchmarks
References: <4E1FFE10.2010807@gmail.com>
Message-ID:

Hi all,

On Fri, Jul 15, 2011 at 1:44 PM, Armin Rigo wrote:
> On Fri, Jul 15, 2011 at 10:45 AM, Antonio Cuni wrote:
>> I think that armin investigated this, and the outcome was that it's
>> because of the changes we did in the GC during the sprint. Armin, do
>> you confirm? Do we have a solution?
>
> I confirm, and still have no solution. I tried to devise a solution
> (see PYPY_GC_LOSTCARD), which actually helps a bit on this and at
> least one other benchmark, but this particular benchmark is still much
> slower than two weeks ago.

I found and fixed the bug in 82e5051d55c3 (and, to a lesser extent, in
6b4eb34c6091). I also did 4c0d2555caa8: generate from the jit backend
assembler code to set the GC card bit inline, using just 4-5
instructions, instead of in the write barrier call. Finally I reverted
PYPY_GC_LOSTCARD because it sounds like a hack and isn't really
compatible with 4c0d2555caa8. All in all it gives good results.

I can also confirm that both chaos in 3f7e and go in e911 are randomly
slowed down by jit tracer improvements, and would become fast again
with a smaller --jit trace_limit...

A bientôt,

Armin.
From anto.cuni at gmail.com Sun Jul 24 19:06:10 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Sun, 24 Jul 2011 19:06:10 +0200
Subject: [pypy-dev] [pypy-commit] pypy default: signal.signal() crashes with ValueError when called from a
In-Reply-To: <20110724164707.DB19A8208C@wyvern.cs.uni-duesseldorf.de>
References: <20110724164707.DB19A8208C@wyvern.cs.uni-duesseldorf.de>
Message-ID: <4E2C5102.6090207@gmail.com>

Hi Armin,

On 24/07/11 18:47, arigo wrote:
> Author: Armin Rigo
> Branch:
> Changeset: r45943:824b72bb6b45
> Date: 2011-07-24 17:25 +0200
> http://bitbucket.org/pypy/pypy/changeset/824b72bb6b45/
>
> Log: signal.signal() crashes with ValueError when called from a
>      non-main thread...?
>
> diff --git a/lib_pypy/pyrepl/unix_console.py b/lib_pypy/pyrepl/unix_console.py
[cut]

Note that pyrepl exists as an independent repo and that the one in PyPy
is supposed to be a copy of it (similar to what we do in py.test):
https://bitbucket.org/pypy/pyrepl

For example, it is automatically installed by pdb++ on top of cpython
to have the colored tab completion.

ciao,
Anto

From berend.deschouwer at ucs-software.co.za Mon Jul 25 08:59:18 2011
From: berend.deschouwer at ucs-software.co.za (Berend De Schouwer)
Date: Mon, 25 Jul 2011 08:59:18 +0200
Subject: [pypy-dev] pypy array module memory leak?
In-Reply-To: <4E295417.9000903@ucs-software.co.za>
References: <4E295417.9000903@ucs-software.co.za>
Message-ID: <4E2D1446.50606@ucs-software.co.za>

I've replaced a = array.array() with a = [0] * 10000 and it's faster,
and doesn't eat all the RAM. For those who care :)

On 22/07/2011 12:42, Berend De Schouwer wrote:
> The following program has constant (+- 10 MB) memory usage in CPython,
> but it quickly leaks massive amounts of memory in pypy.
[snip]
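For illustration, the interim workaround Berend describes earlier in
the thread -- forcing collections from a separate thread, roughly once
per second -- might look like the sketch below; the interval and the
daemon flag are assumptions, not his actual code:

import gc
import threading
import time

def collect_periodically(interval=1.0):
    # Force a full collection at a fixed interval, so the raw-malloced
    # buffers that the GC's trigger heuristic undercounts get freed
    # promptly instead of piling up.
    while True:
        gc.collect()
        time.sleep(interval)

collector = threading.Thread(target=collect_periodically)
collector.daemon = True  # don't keep the process alive just for this
collector.start()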
From lac at openend.se Mon Jul 25 09:38:44 2011
From: lac at openend.se (Laura Creighton)
Date: Mon, 25 Jul 2011 09:38:44 +0200
Subject: [pypy-dev] Bitbucket offering to donate servers, hosting ...
Message-ID: <201107250738.p6P7ci4X032329@theraft.openend.se>

This showed up on the infrastructure mailing list. I'd like a really
nice windows machine always available for benchmarking, and another one
to test on. What would other people like? Make a list and we can ask
Charles McLaughlin if the offer is still open. As far as I know the
OSU/OSL hosting is linux only.

Laura (still reading 10 days of email)

------- Forwarded Message

Date: Tue, 19 Jul 2011 07:08:31 -0700
From: Aahz
To: infrastructure at python.org, psf at python.org

Also cc'ing the board due to the recent discussion about OSU/OSL. I
figured it wasn't my place to tell Charles about that.

In case it's not clear from the e-mail address, it appears that
Bitbucket was acquired by Atlassian. One idea: maybe
Atlassian/Bitbucket would be willing to donate hardware to the OSU/OSL
hosting, as long as we give them a nice public thanks.

----- Forwarded message from Aahz -----

> Date: Tue, 19 Jul 2011 07:03:49 -0700
> From: Aahz
> To: Charles McLaughlin
> Cc: pydotorg-www at python.org
> Subject: Re: [pydotorg-www] Volunteer
> Organization: The Cat & Dragon
>
> On Mon, Jul 18, 2011, Charles McLaughlin wrote:
>>
>> I noticed python.org was down a couple times recently and I wondered
>> if you guys need any more volunteers to help with infrastructure.
>> I'm the sysadmin at bitbucket.org. No guarantees without knowing what
>> I'm getting myself into, but in addition to some of my time we might
>> also be able to donate servers, hosting, bandwidth, etc if that would
>> help.
>
> Thanks for the offer, we already have good hosting donated by XS4ALL.
>
> Volunteer time is of course more precious and we never have enough. I
> think probably the best place to discuss your offer would be the
> Infrastructure Committee; I'm forwarding it to them. You can also
> reach them directly at infrastructure at python.org
> --
> Aahz (aahz at pythoncraft.com)  <*>  http://www.pythoncraft.com/
>
> "If you don't know what your program is supposed to do, you'd better
> not start writing it."  --Dijkstra

----- End forwarded message -----

--
Aahz (aahz at pythoncraft.com)  <*>  http://www.pythoncraft.com/

"If you don't know what your program is supposed to do, you'd better
not start writing it."  --Dijkstra
------- End of Forwarded Message

From ian at ianozsvald.com Mon Jul 25 11:00:28 2011
From: ian at ianozsvald.com (Ian Ozsvald)
Date: Mon, 25 Jul 2011 10:00:28 +0100
Subject: [pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)
Message-ID:

Dear all, I've published v0.2 of my High Performance Python tutorial
write-up from the session I ran at EuroPython:
http://ianozsvald.com/2011/07/25/high-performance-python-tutorial-v0-2-from-europython-2011/

Antonio - you asked earlier if the 'expanded math' version of the
Mandelbrot solver (using doubles rather than complex numbers) would be
faster - I've timed it and it is a bit faster with a nightly build of
PyPy, but nowhere near as fast as ShedSkin's generated C output
(details below).

Maciej - thanks for pointing me at the numpy module. I've added a tiny
section showing numpy in PyPy but I haven't converted the Mandelbrot
solver to use it (even finishing v0.2 took longer than I'd thought).
I'm hoping that some more exposure in the report might bring in more
volunteers from outside.

Here's a clip from the report in the PyPy section:

"By running pypy pure_python.py 1000 1000 on my MacBook it takes 5.9
seconds; running pypy pure_python_2.py 1000 1000 takes 4.9 seconds.
(Ian - the only difference in pure_python_2.py is that local
dereferences in the tight loop are moved outside the loop, causing
fewer dereference operations)

As an additional test (not shown in the graphs) I ran pypy shedskin2.py
1000 1000, which runs the expanded math version of the shedskin variant
below (this replaces complex numbers with floats and expands abs to
avoid the square root). The shedskin2.py result takes 3.2 seconds
(which is still much slower than the 0.4s version compiled using
shedskin)."

The pure_python src is here:
https://github.com/ianozsvald/EuroPython2011_HighPerformanceComputing/tree/master/mandelbrot/python

shedskin2.py is available here:
https://github.com/ianozsvald/EuroPython2011_HighPerformanceComputing/tree/master/mandelbrot/shedskin

I haven't tested whether the warm-up periods for PyPy are significant;
possibly they account for much of the difference between ShedSkin and
PyPy? I want to revisit this but for the next few weeks I have to go
back to other projects.

I hope the report brings in some new folk for PyPy,
Ian.

--
Ian Ozsvald (A.I. researcher, screencaster)
ian at IanOzsvald.com

http://IanOzsvald.com
http://SocialTiesApp.com/
http://MorConsulting.com/
http://blog.AICookbook.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald
From fijall at gmail.com Mon Jul 25 11:08:12 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Mon, 25 Jul 2011 11:08:12 +0200
Subject: [pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)
Message-ID:

On Mon, Jul 25, 2011 at 11:00 AM, Ian Ozsvald wrote:
> I haven't tested whether the warm-up periods for PyPy are significant;
> possibly they account for much of the difference between ShedSkin and
> PyPy? I want to revisit this but for the next few weeks I have to go
> back to other projects.

Most of the difference comes from the fact that you're using lists and
not, say, array.array (or a numpy array), so the storage is not
optimized. ShedSkin doesn't allow you to store different types in a
list. We'll make it fast one day even if you use a list, but indeed,
using array.array would make it much faster.

Cheers,
fijal

From ian at ianozsvald.com Mon Jul 25 12:42:13 2011
From: ian at ianozsvald.com (Ian Ozsvald)
Date: Mon, 25 Jul 2011 11:42:13 +0100
Subject: [pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)
Message-ID:

Ah! Ok, I've noted array.array (along with running bigger tests to
check for JIT warm-up overheads). Hopefully I'll get some more time in
a few weeks to play with variants. Cheers!
i.

--
Ian Ozsvald (A.I. researcher, screencaster)
ian at IanOzsvald.com
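To make Maciej's suggestion concrete, switching the Mandelbrot result
storage from a plain list to array.array could look like the sketch
below; the variable name and the 'i' type code are assumptions for
illustration, not taken from the tutorial code:

import array

size = 1000 * 1000

# Plain list: every element is a boxed Python object.
output = [0] * size

# array.array: a flat buffer of unboxed machine integers, which the
# JIT can read and write without allocating wrapper objects.
output = array.array('i', [0]) * size

for i in range(size):
    output[i] = i % 256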
From arigo at tunes.org Mon Jul 25 13:38:51 2011
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 25 Jul 2011 13:38:51 +0200
Subject: [pypy-dev] [pypy-commit] pypy default: signal.signal() crashes with ValueError when called from a
In-Reply-To: <4E2C5102.6090207@gmail.com>
References: <20110724164707.DB19A8208C@wyvern.cs.uni-duesseldorf.de> <4E2C5102.6090207@gmail.com>
Message-ID:

Hi Anto,

On Sun, Jul 24, 2011 at 7:06 PM, Antonio Cuni wrote:
> Note that pyrepl exists as an independent repo and that the one in
> PyPy is supposed to be a copy of it (similar to what we do in
> py.test): https://bitbucket.org/pypy/pyrepl

Thanks for the reminder! I copied the diff in pypy/pyrepl with a
checkin message that is a bit more verbose.
A bientôt,

Armin.

From roberto at unbit.it Mon Jul 25 13:42:30 2011
From: roberto at unbit.it (Roberto De Ioris)
Date: Mon, 25 Jul 2011 13:42:30 +0200 (CEST)
Subject: [pypy-dev] work on libpypy.so
Message-ID: <49c7123ff5c207e38d3c0170e041223b.squirrel@manage.unbit.it>

Hi everyone, my company has a bunch of C apps (mainly the uWSGI
application server container and a scriptable DNS server) embedding
CPython in their core.

We would like to start using PyPy, so our need is having a workable
libpypy.so implementation (we do not need it to be compatible with the
CPython API; we can rewrite those parts, especially in uWSGI, without
problems).

I have searched for some work already done, but it looks like no one is
pushing it. I am volunteering to work on this area (if no one is
already doing it). So if there are no objections, I will start posting
my proposals (and code) in the next few days.

--
Roberto De Ioris
http://unbit.it

From arigo at tunes.org Mon Jul 25 14:31:59 2011
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 25 Jul 2011 14:31:59 +0200
Subject: [pypy-dev] work on libpypy.so
In-Reply-To: <49c7123ff5c207e38d3c0170e041223b.squirrel@manage.unbit.it>
References: <49c7123ff5c207e38d3c0170e041223b.squirrel@manage.unbit.it>
Message-ID:

Hi Roberto,

On Mon, Jul 25, 2011 at 1:42 PM, Roberto De Ioris wrote:
> We would like to start using PyPy, so our need is having a workable
> libpypy.so implementation (we do not need it to be compatible with the
> CPython API; we can rewrite those parts, especially in uWSGI, without
> problems).

Indeed, there is nothing like that so far. We need to think out the
kind of C-level API that makes sense for libpypy.so to expose. There
are two options: either we tweak cpyext a little bit to expose the
CPython C API, or we design some higher-performance PyPy-only C API...
For the latter case the first issue is how to store a reference to a
PyPy object in C, given that it moves around --- maybe with an
indirection, which would be a bit similar to what cpyext does, but
could be done with much better performance if we are free to design
the API.

Feel free to come to our IRC channel to discuss things (#pypy on
irc.freenode.net).

A bientôt,

Armin.

From amauryfa at gmail.com Mon Jul 25 14:36:44 2011
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Mon, 25 Jul 2011 14:36:44 +0200
Subject: [pypy-dev] work on libpypy.so
In-Reply-To: <49c7123ff5c207e38d3c0170e041223b.squirrel@manage.unbit.it>
References: <49c7123ff5c207e38d3c0170e041223b.squirrel@manage.unbit.it>
Message-ID:

Hi,

2011/7/25 Roberto De Ioris:
> Hi everyone, my company has a bunch of C apps (mainly the uWSGI
> application server container and a scriptable DNS server) embedding
> CPython in their core.
>
> We would like to start using PyPy, so our need is having a workable
> libpypy.so implementation [snip]

Good idea! I suppose you already noticed the "--shared" option that
already builds a libpypy-c.so. I've never tried it on Linux, but it
works at least on Windows, and exposes all C API functions from cpyext.

Still missing is the Py_Initialize() function... Could it be as simple
as a single call to space.startup()?
--
Amaury Forgeot d'Arc

From stefan_ml at behnel.de Mon Jul 25 16:40:36 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 25 Jul 2011 16:40:36 +0200
Subject: [pypy-dev] work on libpypy.so
References: <49c7123ff5c207e38d3c0170e041223b.squirrel@manage.unbit.it>
Message-ID:

Armin Rigo, 25.07.2011 14:31:
> On Mon, Jul 25, 2011 at 1:42 PM, Roberto De Ioris wrote:
>> We would like to start using PyPy, so our need is having a workable
>> libpypy.so implementation (we do not need it to be compatible with
>> the CPython API; we can rewrite those parts, especially in uWSGI,
>> without problems).
>
> Indeed, there is nothing like that so far. We need to think out the
> kind of C-level API that makes sense for libpypy.so to expose.

Wouldn't it make sense to use the Cython port for PyPy here? Usually,
embedding just means that there is a two-way bridge between Python and
C (or a C-compatible language). Cython basically gives you this, and
allows you to design your C-level API any way you want.

However, this means that it has to generate a C module that exports the
public C names in one way or another. That won't work with the current
approach, which only "compiles" down to Python code that uses ctypes
(i.e. the other side). But it shouldn't be too hard to come up with a
way to pass calls from the outside C world into some kind of PyPy API
that dispatches them to the appropriate Cython-backed implementation.

Basically, the external C code would only see a thin generated wrapper
for each exported function (or even object) which maps the incoming
parameters to ctypes objects, and the complete implementation of the
user-provided wrapper then runs in PyPy using ctypes.

That would allow the PyPy core developers to design an appropriate
public convenience C API for PyPy, and at the same time allow users to
design their own specific C API for their own code, both using the same
tool.

Stefan

From tobami at googlemail.com Mon Jul 25 21:09:34 2011
From: tobami at googlemail.com (Miquel Torres)
Date: Mon, 25 Jul 2011 21:09:34 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
Message-ID:

Btw., in any case you can start saving results in separate
environments. That will at least make "changes" work right away, and it
does make sense to have "tannit 32 bits" and "tannit 64 bits".

Miquel

2011/7/21 Miquel Torres:
> Ok, not much time right now, but we will look into it.
From anto.cuni at gmail.com Tue Jul 26 14:09:54 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Tue, 26 Jul 2011 14:09:54 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
Message-ID: <4E2EAE92.8040905@gmail.com>

On 25/07/11 21:09, Miquel Torres wrote:
> Btw., in any case you can start saving results in separate
> environments. That will at least make "changes" work right away, and
> it does make sense to have "tannit 32 bits" and "tannit 64 bits".

ok, I did it. If everything goes well, we should have two separate
"tannit" and "tannit-64" environments from tomorrow.

ciao,
Anto

From tobami at googlemail.com Tue Jul 26 18:41:06 2011
From: tobami at googlemail.com (Miquel Torres)
Date: Tue, 26 Jul 2011 18:41:06 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
In-Reply-To: <4E2EAE92.8040905@gmail.com>
Message-ID:

Very nice!

From anto.cuni at gmail.com Wed Jul 27 11:44:04 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Wed, 27 Jul 2011 11:44:04 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
Message-ID: <4E2FDDE4.3050109@gmail.com>

Hi Miquel,

On 25/07/11 21:09, Miquel Torres wrote:
> Btw., in any case you can start saving results in separate
> environments. That will at least make "changes" work right away, and
> it does make sense to have "tannit 32 bits" and "tannit 64 bits".

are you sure that having two separate environments will fix the
"changes" page? I tried to add "tannit-64" (no results yet, a build is
running right now), but in the dropdown menu of the Changes page I can
see all the revisions that are also in "tannit". I'd have expected
those two to be completely separate.

ciao,
Anto

From fijall at gmail.com Tue Jul 26 09:51:08 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 26 Jul 2011 09:51:08 +0200
Subject: [pypy-dev] [pypy-commit] pypy default: Remove some more importants, and reorganize some others.
In-Reply-To: <20110726022505.DFD10829C2@wyvern.cs.uni-duesseldorf.de>
References: <20110726022505.DFD10829C2@wyvern.cs.uni-duesseldorf.de>
Message-ID:

On Tue, Jul 26, 2011 at 4:25 AM, alex_gaynor wrote:
> Author: Alex Gaynor
> Branch:
> Changeset: r45994:95649f1056c0
> Date: 2011-07-25 18:41 -0700
> http://bitbucket.org/pypy/pypy/changeset/95649f1056c0/
>
> Log: Remove some more importants, and reorganize some others.
> Please don't remove anything important :)

From tobami at googlemail.com Thu Jul 28 11:04:56 2011
From: tobami at googlemail.com (Miquel Torres)
Date: Thu, 28 Jul 2011 11:04:56 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
In-Reply-To: <4E2FDDE4.3050109@gmail.com>
Message-ID:

Hi Antonio,

I'm afraid you are right; the solution I proposed makes no sense. Sorry
I gave you a wrong answer. A revision is unique for a project (well,
now for a branch of a project), and thus revisions are not separated by
environment. Codespeed was not really designed with revisions in mind
that sometimes have results and sometimes not. To solve that, revisions
would need to depend on an executable as well, or we could introduce a
check so that the revision list is tailored to a particular exe, but it
would be ugly.

There is a way, though, to "solve" it right now: separate the
executables into two different projects, pypy32 and pypy64, instead of
different environments. The revision list shown does change on-the-fly
depending on the project the selected exe belongs to.

Cheers,
Miquel

2011/7/27 Antonio Cuni:
> are you sure that having two separate environments will fix the
> "changes" page? I tried to add "tannit-64" (no results yet, a build is
> running right now), but in the dropdown menu of the Changes page I can
> see all the revisions that are also in "tannit". I'd have expected
> those two to be completely separate.

From anto.cuni at gmail.com Thu Jul 28 12:27:21 2011
From: anto.cuni at gmail.com (Antonio Cuni)
Date: Thu, 28 Jul 2011 12:27:21 +0200
Subject: [pypy-dev] PyPy at last infinitely fast
Message-ID: <4E313989.3030907@gmail.com>

Hi Miquel,

On 28/07/11 11:04, Miquel Torres wrote:
> I'm afraid you are right; the solution I proposed makes no sense.
> Sorry I gave you a wrong answer. A revision is unique for a project
> (well, now for a branch of a project), and thus revisions are not
> separated by environment. Codespeed was not really designed with
> revisions in mind that sometimes have results and sometimes not. To
> solve that, revisions would need to depend on an executable as well,
> or we could introduce a check so that the revision list is tailored to
> a particular exe, but it would be ugly.

I can think of a semi-ugly workaround; I don't know how close it is to
a working solution. If I understand correctly, the numbers which are
displayed in the "changes" page are precomputed and saved into
"Reports" by create_report_if_enough_data. I can see that the function
finds the last_results to compare with simply by doing last_revs[1].
What about changing this logic to pick the latest rev which actually
contains at least one result for the current environment?

> There is a way, though, to "solve" it right now: separate the
> executables into two different projects, pypy32 and pypy64, instead of
> different environments.
> There is a way though to "solve" it right now. Separate executables > into two different projects. pypy32, pypy64, instead of different > environments. The revision list shown does change on-the-fly > depending on the project the selected exe belongs to. uhm, I don't see any option to select a different project from within speed.pypy.org. Is it simply because there is only one? Or would having another project mean visiting a completely different webpage? Moreover, I wonder how this problem relates to the upcoming speed.python.org: will every interpreter (cpython, pypy, jython, etc.) be a separate project? Will it be possible to compare results of different interpreters? ciao, Anto From fijall at gmail.com Thu Jul 28 18:27:05 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 28 Jul 2011 18:27:05 +0200 Subject: [pypy-dev] [pypy-commit] pypy default: don't raise a warning if the restype is not set, and add a test to check that we hit the fastpath even in that case In-Reply-To: <20110728141150.20CE882110@wyvern.cs.uni-duesseldorf.de> References: <20110728141150.20CE882110@wyvern.cs.uni-duesseldorf.de> Message-ID: > > Log: don't raise a warning if the restype is not set, and add a test to > check that we hit the fastpath even in that case Uh, can we revert this? I think a runtime warning when restype is not set is an excellent idea from the correctness perspective. From arigo at tunes.org Thu Jul 28 18:51:48 2011 From: arigo at tunes.org (Armin Rigo) Date: Thu, 28 Jul 2011 18:51:48 +0200 Subject: [pypy-dev] [pypy-commit] pypy default: don't raise a warning if the restype is not set, and add a test to check that we hit the fastpath even in that case In-Reply-To: References: <20110728141150.20CE882110@wyvern.cs.uni-duesseldorf.de> Message-ID: Hi Maciej, On Thu, Jul 28, 2011 at 6:27 PM, Maciej Fijalkowski wrote: > Uh, can we revert this? I think a runtime warning when restype is not > set is an excellent idea from the correctness perspective. I'm not sure I agree with this argument. The point I made on irc was that for functions that return "void", it's common to not worry about setting restype to None, because it just works anyway. You get a c_long() object with a random value and just ignore it. It would be completely optimized away by the JIT, too. So yes, I see the point from a theoretical correctness perspective, but I think I care more about not flooding users with RuntimeWarnings in the case of ctypes... À bientôt, Armin.
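[Editor's note: a minimal ctypes sketch of the point Armin makes above, using libc's void function srand() on Linux as a stand-in; none of this code is from the thread.]

import ctypes, ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

junk = libc.srand(42)      # restype never declared: the call "returns" a
print junk                 # meaningless integer, which callers just ignore

libc.srand.restype = None  # the strictly correct declaration for void
assert libc.srand(42) is None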
From Tom_Roche at pobox.com Sat Jul 30 18:54:29 2011 From: Tom_Roche at pobox.com (Tom Roche) Date: Sat, 30 Jul 2011 12:54:29 -0400 Subject: [pypy-dev] downloads page, was: PyPy is now available in Fedora Message-ID: <87ei17g14q.fsf@pobox.com> David Malcolm Mon Jan 3 23:13:38 CET 2011 > I've packaged pypy in RPM form for the Fedora distribution and RPMs are available @, e.g., http://rpm.pbone.net/index.php3/stat/4/idpl/15770592/dir/rawhide/com/pypy-1.5-1.fc16.i686.rpm.html Shouldn't this be linked to, or at least mentioned, on http://pypy.org/download.html ? Feel free to forward if this is not the appropriate venue for this suggestion. fwiw, Tom Roche From fijall at gmail.com Sat Jul 30 19:16:57 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sat, 30 Jul 2011 19:16:57 +0200 Subject: [pypy-dev] downloads page, was: PyPy is now available in Fedora In-Reply-To: <87ei17g14q.fsf@pobox.com> References: <87ei17g14q.fsf@pobox.com> Message-ID: On Sat, Jul 30, 2011 at 6:54 PM, Tom Roche wrote: > > David Malcolm Mon Jan 3 23:13:38 CET 2011 >> I've packaged pypy in RPM form for the Fedora distribution > > and RPMs are available @, e.g., > > http://rpm.pbone.net/index.php3/stat/4/idpl/15770592/dir/rawhide/com/pypy-1.5-1.fc16.i686.rpm.html > > Shouldn't this be linked to, or at least mentioned, on > > http://pypy.org/download.html > > ? Feel free to forward if this is not the appropriate venue for this suggestion. > It totally should (and here is a correct venue). I'll add it From arigo at tunes.org Sat Jul 30 23:50:22 2011 From: arigo at tunes.org (Armin Rigo) Date: Sat, 30 Jul 2011 23:50:22 +0200 Subject: [pypy-dev] pypy array module memory leak? In-Reply-To: <4E2D1446.50606@ucs-software.co.za> References: <4E295417.9000903@ucs-software.co.za> <4E2D1446.50606@ucs-software.co.za> Message-ID: Hi Berend, I think I fixed the original problem too. See the longish checkin message of e7121092d73f. À bientôt, Armin. From Tom_Roche at pobox.com Sun Jul 31 00:17:34 2011 From: Tom_Roche at pobox.com (Tom Roche) Date: Sat, 30 Jul 2011 18:17:34 -0400 Subject: [pypy-dev] newbie needs pypy setup help Message-ID: <874o23fm69.fsf@pobox.com> (Apologies if - this is not the appropriate venue for this question, but I don't see any other pypy lists, and I'm finding the other docs a bit inscrutable. - I sound too inexperienced or unpythonic: I've spent most of my coding life in java and perl. ) Please advise how to configure pypy to run other python code. Why I ask: I've started running a model implemented in python. Unfortunately a run on "normal" python 2.6.x or 2.7.x requires - 130 min on my ubuntu laptop (on which working would be more convenient) - 55 min on the best build machine to which I currently have access However I have read that this model runs 5x faster under pypy, so I wanna get me that!
Unfortunately my current ubuntu me at it:~$ lsb_release -ds > Ubuntu 10.04.3 LTS # yes, I am planning to upgrade Real Soon Now me at it:~$ uname -rv > 2.6.32-33-generic #70-Ubuntu SMP Thu Jul 7 21:13:52 UTC 2011 is too down-level to install the available pypy RPM, so I instead did URI="https://bitbucket.org/pypy/pypy/downloads/pypy-1.5-linux64.tar.bz2" TMP_DIR_ROOT="/tmp/pypy" for CMD in \ "rm -fr ${TMP_DIR_ROOT}" \ "mkdir -p ${TMP_DIR_ROOT}" \ "pushd ${TMP_DIR_ROOT}" \ "wget -O - ${URI} | tar xvjf -" \ "ls -alh" \ "popd" \ ; do echo -e "${CMD}" eval "${CMD}" done # check path names, then TMP_DIR_ROOT="/tmp/pypy/pypy-c-jit-43780-b590cf6de419-linux64" TARGET_DIR_ROOT="/opt/pypy-c-jit-1.5.0-alpha0" TARGET_PYPY_EXEC="${TARGET_DIR_ROOT}/bin/pypy" USR_PYPY_EXEC="/usr/local/bin/pypy" for CMD in \ "sudo mkdir -p ${TARGET_DIR_ROOT}" \ "sudo cp -r ${TMP_DIR_ROOT}/* ${TARGET_DIR_ROOT}/" \ "sudo chmod 755 ${TARGET_DIR_ROOT}/" \ "sudo chmod 755 ${TARGET_DIR_ROOT}/bin/" \ "sudo chmod 755 ${TARGET_PYPY_EXEC}" \ "sudo ls -al ${TARGET_PYPY_EXEC}" \ "sudo ln -s ${TARGET_PYPY_EXEC} ${USR_PYPY_EXEC}" \ "sudo ls -al ${USR_PYPY_EXEC}" \ "${USR_PYPY_EXEC} --version" \ "which pypy" \ "pypy --version" \ ; do echo -e "${CMD}" eval "${CMD}" done The good news is, I can now me at it:~$ ls -al $(which pypy) > lrwxrwxrwx 1 root root 37 2011-07-30 16:06 /usr/local/bin/pypy -> /opt/pypy-c-jit-1.5.0-alpha0/bin/pypy me at it:~$ pypy --version > Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:34) > [PyPy 1.5.0-alpha0 with GCC 4.4.3] But when I try to actually *run* the @#$%^&! thing, it spews: me at it:~$ pypy > debug: WARNING: library path not found, using compiled-in sys.path and sys.prefix will be unset > 'import site' failed > Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:34) > [PyPy 1.5.0-alpha0 with GCC 4.4.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > debug: OperationError: > debug: operror-type: ImportError > debug: operror-value: No module named _pypy_interact me at it:~$ pypy -c 'import sys; print sys.path' > debug: WARNING: library path not found, using compiled-in sys.path and sys.prefix will be unset > 'import site' failed > ['', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib_pypy', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/modified-2.7', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/2.7', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/modified-2.7/lib-tk', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/2.7/lib-tk', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/2.7/plat-linux2'] What do I need to do to fix its library path? Is this where translation is required? (Again, please note that I *have read* http://codespeak.net/pypy/dist/pypy/doc/getting-started-python.html but I'm finding difficulty understanding what one must do to *use* pypy, vs what one must do to *hack* pypy, which is not my usecase.) TIA, Tom Roche From drsalists at gmail.com Sun Jul 31 01:15:08 2011 From: drsalists at gmail.com (Dan Stromberg) Date: Sat, 30 Jul 2011 16:15:08 -0700 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: <874o23fm69.fsf@pobox.com> References: <874o23fm69.fsf@pobox.com> Message-ID: Any particular reason not to use the tarball on the pypy website? There might be a silly permission issue to fix, but it's much easier than what appears below. 
On Sat, Jul 30, 2011 at 3:17 PM, Tom Roche wrote: > > (Apologies if > > - this is not the appropriate venue for this question, but I don't see > any other pypy lists, and I'm finding the other docs a bit > inscrutable. > > - I sound too inexperienced or unpythonic: I've spent most of my coding > life in java and perl. > > ) > Please advise how to configure pypy to run other python code. Why I ask: > > I've started running a model implemented in python. Unfortunately a run > on "normal" python 2.6.x or 2.7.x requires > > - 130 min on my ubuntu laptop (on which working would be more convenient) > - 55 min on the best build machine to which I currently have access > > However I have read that this model runs 5x faster under pypy, so I > wanna get me that! Unfortunately my current ubuntu > > me at it:~$ lsb_release -ds > > Ubuntu 10.04.3 LTS # yes, I am planning to upgrade Real Soon Now > me at it:~$ uname -rv > > 2.6.32-33-generic #70-Ubuntu SMP Thu Jul 7 21:13:52 UTC 2011 > > is too down-level to install the available pypy RPM, so I instead did > > URI="https://bitbucket.org/pypy/pypy/downloads/pypy-1.5-linux64.tar.bz2" > TMP_DIR_ROOT="/tmp/pypy" > for CMD in \ > "rm -fr ${TMP_DIR_ROOT}" \ > "mkdir -p ${TMP_DIR_ROOT}" \ > "pushd ${TMP_DIR_ROOT}" \ > "wget -O - ${URI} | tar xvjf -" \ > "ls -alh" \ > "popd" \ > ; do > echo -e "${CMD}" > eval "${CMD}" > done > # check path names, then > TMP_DIR_ROOT="/tmp/pypy/pypy-c-jit-43780-b590cf6de419-linux64" > TARGET_DIR_ROOT="/opt/pypy-c-jit-1.5.0-alpha0" > TARGET_PYPY_EXEC="${TARGET_DIR_ROOT}/bin/pypy" > USR_PYPY_EXEC="/usr/local/bin/pypy" > for CMD in \ > "sudo mkdir -p ${TARGET_DIR_ROOT}" \ > "sudo cp -r ${TMP_DIR_ROOT}/* ${TARGET_DIR_ROOT}/" \ > "sudo chmod 755 ${TARGET_DIR_ROOT}/" \ > "sudo chmod 755 ${TARGET_DIR_ROOT}/bin/" \ > "sudo chmod 755 ${TARGET_PYPY_EXEC}" \ > "sudo ls -al ${TARGET_PYPY_EXEC}" \ > "sudo ln -s ${TARGET_PYPY_EXEC} ${USR_PYPY_EXEC}" \ > "sudo ls -al ${USR_PYPY_EXEC}" \ > "${USR_PYPY_EXEC} --version" \ > "which pypy" \ > "pypy --version" \ > ; do > echo -e "${CMD}" > eval "${CMD}" > done > > The good news is, I can now > > me at it:~$ ls -al $(which pypy) > > lrwxrwxrwx 1 root root 37 2011-07-30 16:06 /usr/local/bin/pypy -> > /opt/pypy-c-jit-1.5.0-alpha0/bin/pypy > me at it:~$ pypy --version > > Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:34) > > [PyPy 1.5.0-alpha0 with GCC 4.4.3] > > But when I try to actually *run* the @#$%^&! thing, it spews: > > me at it:~$ pypy > > debug: WARNING: library path not found, using compiled-in sys.path and > sys.prefix will be unset > > 'import site' failed > > Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:34) > > [PyPy 1.5.0-alpha0 with GCC 4.4.3] on linux2 > > Type "help", "copyright", "credits" or "license" for more information. 
> > debug: OperationError: > > debug: operror-type: ImportError > > debug: operror-value: No module named _pypy_interact > me at it:~$ pypy -c 'import sys; print sys.path' > > debug: WARNING: library path not found, using compiled-in sys.path and > sys.prefix will be unset > > 'import site' failed > > ['', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib_pypy', > '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/modified-2.7', > '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/2.7', > '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/modified-2.7/lib-tk', > '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/2.7/lib-tk', > '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/2.7/plat-linux2'] > > What do I need to do to fix its library path? Is this where translation > is required? (Again, please note that I *have read* > > http://codespeak.net/pypy/dist/pypy/doc/getting-started-python.html > > but I'm finding difficulty understanding what one must do to *use* pypy, > vs what one must do to *hack* pypy, which is not my usecase.) > > TIA, Tom Roche > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > http://mail.python.org/mailman/listinfo/pypy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fijall at gmail.com Sun Jul 31 01:31:47 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sun, 31 Jul 2011 01:31:47 +0200 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: <874o23fm69.fsf@pobox.com> References: <874o23fm69.fsf@pobox.com> Message-ID: [snip] It's about pypy not finding its library. I don't really know how those RPMs are built, but usually putting pypy in the same directory tree as its library (like in checkout) works. I agree we should maybe make some document that makes it more explicit
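[Editor's note: a sketch of the directory-tree point Maciej makes above, not PyPy's actual startup code. The idea is that the binary resolves the real location of its executable and expects lib-python/ and lib_pypy/ in the same tree, which is why a symlink from /usr/local/bin works while a copied binary does not.]

import os

def find_stdlib_root(executable):
    real = os.path.realpath(executable)            # follows the symlink
    root = os.path.dirname(os.path.dirname(real))  # e.g. /opt/pypy-1.5/
    for sub in ("lib-python", "lib_pypy"):
        if not os.path.isdir(os.path.join(root, sub)):
            return None  # the "library path not found" warning seen above
    return root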
From Tom_Roche at pobox.com Sun Jul 31 02:32:50 2011 From: Tom_Roche at pobox.com (Tom Roche) Date: Sat, 30 Jul 2011 20:32:50 -0400 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: References: <874o23fm69.fsf@pobox.com> Message-ID: <871ux7ffwt.fsf@pobox.com> Tom Roche Sat, 30 Jul 2011 18:17:34 -0400 >>> my current ubuntu >>> me at it:~$ lsb_release -ds >>> > Ubuntu 10.04.3 LTS # yes, I am planning to upgrade Real Soon Now >>> me at it:~$ uname -rv >>> > 2.6.32-33-generic #70-Ubuntu SMP Thu Jul 7 21:13:52 UTC 2011 >>> is too down-level to install the available pypy RPM, so I instead >>> [downloaded and installed] >>> URI="https://bitbucket.org/pypy/pypy/downloads/pypy-1.5-linux64.tar.bz2" Dan Stromberg Sat, 30 Jul 2011 16:15:08 -0700 >> Any particular reason not to use the tarball on the pypy website? That is in fact the "Linux binary (64bit)" from http://pypy.org/download.html#default-with-a-jit-compiler Maciej Fijalkowski Sun, 31 Jul 2011 01:31:47 +0200 > I don't really know how those RPMs are built, Again, please note: I could not use the RPM, so instead used the tarball. Your assistance is appreciated, Tom Roche From Tom_Roche at pobox.com Sun Jul 31 03:58:36 2011 From: Tom_Roche at pobox.com (Tom Roche) Date: Sat, 30 Jul 2011 21:58:36 -0400 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: <871ux7ffwt.fsf@pobox.com> References: <871ux7ffwt.fsf@pobox.com> <874o23fm69.fsf@pobox.com> Message-ID: <87y5zfdxdf.fsf@pobox.com> Dan Stromberg Sat, 30 Jul 2011 16:15:08 -0700 > There might be a silly permission issue Or more than one. I noticed me at it:~$ pushd ${TARGET_DIR_ROOT} # i.e. where I'm copying pypy to, in /opt me at it:/opt/pypy-c-jit-1.5.0-alpha0$ ls -alh > total 440K > drwxr-xr-x 8 root root 4.0K 2011-07-30 17:00 . > drwxr-xr-x 8 root root 4.0K 2011-07-30 16:06 .. > drwxr-xr-x 2 root root 4.0K 2011-07-30 16:45 bin > drwxr-xr-x 2 root root 4.0K 2011-07-30 17:00 build > drwx------ 2 root root 4.0K 2011-07-30 16:06 include > drwx------ 7 root root 4.0K 2011-07-30 16:44 lib_pypy > drwx------ 4 root root 4.0K 2011-07-30 16:06 lib-python > -rw------- 1 root root 5.8K 2011-07-30 16:06 LICENSE > -rw-r--r-- 1 root root 748 2011-07-30 16:06 README > drwx------ 5 root root 4.0K 2011-07-30 16:45 site-packages which seems wrong (even to me :-) So I whacked that install (below) and rebuilt with the following: please note * the chmod's * again, that I'm using the "Linux binary (64bit)" tarball (as previously) URI="https://bitbucket.org/pypy/pypy/downloads/pypy-1.5-linux64.tar.bz2" TMP_DIR_ROOT="/tmp/pypy/pypy-c-jit-43780-b590cf6de419-linux64" TARGET_DIR_ROOT="/opt/pypy-c-jit-1.5.0-alpha0" TARGET_PYPY_EXEC="${TARGET_DIR_ROOT}/bin/pypy" USR_PYPY_EXEC="/usr/local/bin/pypy" for CMD in \ "rm -fr $(dirname ${TMP_DIR_ROOT})" \ "mkdir -p ${TMP_DIR_ROOT}" \ "pushd $(dirname ${TMP_DIR_ROOT})" \ "wget -O - ${URI} | tar xvjf -" \ "ls -alh" \ "popd" \ "sudo rm -fr ${TARGET_DIR_ROOT}" \ "sudo mkdir -p ${TARGET_DIR_ROOT}" \ "sudo cp -r ${TMP_DIR_ROOT}/* ${TARGET_DIR_ROOT}/" \ "sudo chmod a+rx ${TARGET_DIR_ROOT}/" \ "pushd ${TARGET_DIR_ROOT}" \ "find -maxdepth 1 -type f | xargs sudo chmod a+r" \ "find -maxdepth 1 -type d | grep -ve '\.$' | xargs sudo chmod a+rx" \ "sudo chmod a+rx ${TARGET_PYPY_EXEC}" \ "sudo ls -al ${TARGET_PYPY_EXEC}" \ "sudo ln -s ${TARGET_PYPY_EXEC} ${USR_PYPY_EXEC}" \ "which pypy" \ "sudo ls -al ${USR_PYPY_EXEC}" \ "${USR_PYPY_EXEC} --version" \ "sudo ls -al ${TARGET_DIR_ROOT}" \ "pypy --version" \ "pypy" \ "pypy -c 'import sys; print sys.path'" \ "popd" \ ; do echo -e "$ ${CMD}" eval "${CMD}" done The tail of the resulting output is > $ sudo ls -al /opt/pypy-c-jit-1.5.0-alpha0 > total 40 > drwxr-xr-x 7 root root 4096 2011-07-30 21:44 . > drwxr-xr-x 8 root root 4096 2011-07-30 21:44 .. > drwxr-xr-x 2 root root 4096 2011-07-30 21:44 bin > drwxr-xr-x 2 root root 4096 2011-07-30 21:44 include > drwxr-xr-x 7 root root 4096 2011-07-30 21:44 lib_pypy > drwxr-xr-x 4 root root 4096 2011-07-30 21:44 lib-python > -rw-r--r-- 1 root root 5921 2011-07-30 21:44 LICENSE > -rw-r--r-- 1 root root 748 2011-07-30 21:44 README > drwxr-xr-x 2 root root 4096 2011-07-30 21:44 site-packages Those permissions look better than above/before, and > $ pypy --version > Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:34) > [PyPy 1.5.0-alpha0 with GCC 4.4.3] still looks good, but > $ pypy > 'import site' failed > Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:34) > [PyPy 1.5.0-alpha0 with GCC 4.4.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > debug: OperationError: > debug: operror-type: ImportError > debug: operror-value: No module named _pypy_interact > is still hosed!
though at least pypy is now finding its library path: > $ pypy -c 'import sys; print sys.path' > 'import site' failed > ['', '/opt/pypy-c-jit-1.5.0-alpha0/lib_pypy', '/opt/pypy-c-jit-1.5.0-alpha0/lib-python/modified-2.7', '/opt/pypy-c-jit-1.5.0-alpha0/lib-python/2.7', '/opt/pypy-c-jit-1.5.0-alpha0/lib-python/modified-2.7/lib-tk', '/opt/pypy-c-jit-1.5.0-alpha0/lib-python/2.7/lib-tk', '/opt/pypy-c-jit-1.5.0-alpha0/lib-python/2.7/plat-linux2'] Which it wasn't before: Tom Roche Sat, 30 Jul 2011 18:17:34 -0400 >> me at it:~$ pypy -c 'import sys; print sys.path' >> > debug: WARNING: library path not found, using compiled-in sys.path and sys.prefix will be unset >> > 'import site' failed >> > ['', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib_pypy', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/modified-2.7', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/2.7', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/modified-2.7/lib-tk', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/2.7/lib-tk', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/2.7/plat-linux2'] So I'm wondering, what else needs done in order to make this work? E.g., how to fix the "'import site' failed"? TIA, Tom Roche From Tom_Roche at pobox.com Sun Jul 31 04:14:54 2011 From: Tom_Roche at pobox.com (Tom Roche) Date: Sat, 30 Jul 2011 22:14:54 -0400 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: References: <87y5zfdxdf.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <874o23fm69.fsf@pobox.com> Message-ID: <87vcujdwm9.fsf@pobox.com> Ken Watford Sat, 30 Jul 2011 21:18:23 -0400 > sudo chmod -R go+rX /opt/pypy-c-jit-1.5.0-alpha0 That's it! So the tarball install that works (for me) is URI="https://bitbucket.org/pypy/pypy/downloads/pypy-1.5-linux64.tar.bz2" TMP_DIR_ROOT="/tmp/pypy/pypy-c-jit-43780-b590cf6de419-linux64" TARGET_DIR_ROOT="/opt/pypy-c-jit-1.5.0-alpha0" TARGET_PYPY_EXEC="${TARGET_DIR_ROOT}/bin/pypy" USR_PYPY_EXEC="/usr/local/bin/pypy" for CMD in \ "rm -fr $(dirname ${TMP_DIR_ROOT})" \ "mkdir -p ${TMP_DIR_ROOT}" \ "pushd $(dirname ${TMP_DIR_ROOT})" \ "wget -O - ${URI} | tar xvjf -" \ "ls -alh" \ "popd" \ "sudo rm -fr ${TARGET_DIR_ROOT}" \ "sudo mkdir -p ${TARGET_DIR_ROOT}" \ "sudo cp -r ${TMP_DIR_ROOT}/* ${TARGET_DIR_ROOT}/" \ "sudo chmod -R go+rX ${TARGET_DIR_ROOT}/" \ "sudo ln -s ${TARGET_PYPY_EXEC} ${USR_PYPY_EXEC}" \ "which pypy" \ "sudo ls -al ${USR_PYPY_EXEC}" \ "${USR_PYPY_EXEC} --version" \ "sudo ls -al ${TARGET_DIR_ROOT}" \ "pypy --version" \ "pypy" \ "popd" \ ; do echo -e "$ ${CMD}" eval "${CMD}" done which tails > $ pypy > Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:34) > [PyPy 1.5.0-alpha0 with GCC 4.4.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > And now for something completely different: ``the future has just begun'' > >>>> Thanks! Tom Roche From drsalists at gmail.com Sun Jul 31 04:15:52 2011 From: drsalists at gmail.com (Dan Stromberg) Date: Sat, 30 Jul 2011 19:15:52 -0700 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: <87y5zfdxdf.fsf@pobox.com> References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <87y5zfdxdf.fsf@pobox.com> Message-ID: I usually: 1) chown it all to root. 2) chmod all the directories and bin/pypy to 755 3) chmod all the regular files except bin/pypy to 644 ...and then I end up with a usable pypy. Or I build it myself - that doesn't give weird permissions problems. 
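[Editor's note: a Python sketch of Dan's steps 1)-3) above, assuming it runs as root on the unpacked tree; the os.walk approach and the path below are the editor's, not Dan's.]

import os

def fix_tree(root):
    for dirpath, dirnames, filenames in os.walk(root):
        os.chown(dirpath, 0, 0)   # step 1: everything owned by root
        os.chmod(dirpath, 0755)   # step 2: directories get rwxr-xr-x
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.chown(path, 0, 0)
            # steps 2-3: bin/pypy stays executable, plain files become 644
            os.chmod(path, 0755 if name == "pypy" else 0644)

fix_tree("/opt/pypy-c-jit-1.5.0-alpha0")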
After building it, I've been using this script to do the install: #!/bin/bash #where=/usr/local/pypy-trunk-2010-11-20 if [ "$where" = "" ] then echo Need a '$where' to be a prefix path 1>&2 exit 1 fi if [ -d "$where" ] then echo "$where already exists" 1>&2 exit 1 fi # This is for building it: # cd pypy/translator/goal # python translate.py --opt=jit targetpypystandalone.py if [ -f "pypy-c" ] then : else echo "Sorry, I don't see a pypy-c in the CWD" exit 1 fi export where mkdir -p "$where"/bin cp pypy-c "$where"/bin/pypy cd ../../.. mkdir "$where"/include cp include/* "$where"/include/. mkdir "$where"/lib-python (cd lib-python/ && tar cflS - .) | (cd "$where"/lib-python && tar xfp -) mkdir -p "$where"/lib_pypy (cd lib_pypy && tar cflS - .) | (cd "$where"/lib_pypy && tar xfp -) #(cd translator/sandbox && tar cflS - .) | (cd "$where"/lib_pypy && tar xfp -) find "$where" -name .svn -print | xargs rm -rf On Sat, Jul 30, 2011 at 6:58 PM, Tom Roche wrote: > > Dan Stromberg Sat, 30 Jul 2011 16:15:08 -0700 > > There might be a silly permission issue > > Or more than one. I noticed > > me at it:~$ pushd ${TARGET_DIR_ROOT} # i.e. where I'm copying pypy to, in > /opt > me at it:/opt/pypy-c-jit-1.5.0-alpha0$ ls -alh > > total 440K > > drwxr-xr-x 8 root root 4.0K 2011-07-30 17:00 . > > drwxr-xr-x 8 root root 4.0K 2011-07-30 16:06 .. > > drwxr-xr-x 2 root root 4.0K 2011-07-30 16:45 bin > > drwxr-xr-x 2 root root 4.0K 2011-07-30 17:00 build > > drwx------ 2 root root 4.0K 2011-07-30 16:06 include > > drwx------ 7 root root 4.0K 2011-07-30 16:44 lib_pypy > > drwx------ 4 root root 4.0K 2011-07-30 16:06 lib-python > > -rw------- 1 root root 5.8K 2011-07-30 16:06 LICENSE > > -rw-r--r-- 1 root root 748 2011-07-30 16:06 README > > drwx------ 5 root root 4.0K 2011-07-30 16:45 site-packages > > which seems wrong (even to me :-) So I whacked that install (below) and > rebuilt with the following: please note > > * the chmod's > > * again, that I'm using the "Linux binary (64bit)" tarball (as previously) > > URI="https://bitbucket.org/pypy/pypy/downloads/pypy-1.5-linux64.tar.bz2" > TMP_DIR_ROOT="/tmp/pypy/pypy-c-jit-43780-b590cf6de419-linux64" > TARGET_DIR_ROOT="/opt/pypy-c-jit-1.5.0-alpha0" > TARGET_PYPY_EXEC="${TARGET_DIR_ROOT}/bin/pypy" > USR_PYPY_EXEC="/usr/local/bin/pypy" > for CMD in \ > "rm -fr $(dirname ${TMP_DIR_ROOT})" \ > "mkdir -p ${TMP_DIR_ROOT}" \ > "pushd $(dirname ${TMP_DIR_ROOT})" \ > "wget -O - ${URI} | tar xvjf -" \ > "ls -alh" \ > "popd" \ > "sudo rm -fr ${TARGET_DIR_ROOT}" \ > "sudo mkdir -p ${TARGET_DIR_ROOT}" \ > "sudo cp -r ${TMP_DIR_ROOT}/* ${TARGET_DIR_ROOT}/" \ > "sudo chmod a+rx ${TARGET_DIR_ROOT}/" \ > "pushd ${TARGET_DIR_ROOT}" \ > "find -maxdepth 1 -type f | xargs sudo chmod a+r" \ > "find -maxdepth 1 -type d | grep -ve '\.$' | xargs sudo chmod a+rx" \ > "sudo chmod a+rx ${TARGET_PYPY_EXEC}" \ > "sudo ls -al ${TARGET_PYPY_EXEC}" \ > "sudo ln -s ${TARGET_PYPY_EXEC} ${USR_PYPY_EXEC}" \ > "which pypy" \ > "sudo ls -al ${USR_PYPY_EXEC}" \ > "${USR_PYPY_EXEC} --version" \ > "sudo ls -al ${TARGET_DIR_ROOT}" \ > "pypy --version" \ > "pypy" \ > "pypy -c 'import sys; print sys.path'" \ > "popd" \ > ; do > echo -e "$ ${CMD}" > eval "${CMD}" > done > > The tail of the resulting output is > > > $ sudo ls -al /opt/pypy-c-jit-1.5.0-alpha0 > > total 40 > > drwxr-xr-x 7 root root 4096 2011-07-30 21:44 . > > drwxr-xr-x 8 root root 4096 2011-07-30 21:44 .. 
> > drwxr-xr-x 2 root root 4096 2011-07-30 21:44 bin > > drwxr-xr-x 2 root root 4096 2011-07-30 21:44 include > > drwxr-xr-x 7 root root 4096 2011-07-30 21:44 lib_pypy > > drwxr-xr-x 4 root root 4096 2011-07-30 21:44 lib-python > > -rw-r--r-- 1 root root 5921 2011-07-30 21:44 LICENSE > > -rw-r--r-- 1 root root 748 2011-07-30 21:44 README > > drwxr-xr-x 2 root root 4096 2011-07-30 21:44 site-packages > > Those permissions look better than above/before, and > > > $ pypy --version > > Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:34) > > [PyPy 1.5.0-alpha0 with GCC 4.4.3] > > still looks good, but > > > $ pypy > > 'import site' failed > > Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:34) > > [PyPy 1.5.0-alpha0 with GCC 4.4.3] on linux2 > > Type "help", "copyright", "credits" or "license" for more information. > > debug: OperationError: > > debug: operror-type: ImportError > > debug: operror-value: No module named _pypy_interact > > is still hosed! though at least pypy is now finding its library path: > > > $ pypy -c 'import sys; print sys.path' > > 'import site' failed > > ['', '/opt/pypy-c-jit-1.5.0-alpha0/lib_pypy', > '/opt/pypy-c-jit-1.5.0-alpha0/lib-python/modified-2.7', > '/opt/pypy-c-jit-1.5.0-alpha0/lib-python/2.7', > '/opt/pypy-c-jit-1.5.0-alpha0/lib-python/modified-2.7/lib-tk', > '/opt/pypy-c-jit-1.5.0-alpha0/lib-python/2.7/lib-tk', > '/opt/pypy-c-jit-1.5.0-alpha0/lib-python/2.7/plat-linux2'] > > Which it wasn't before: > > Tom Roche Sat, 30 Jul 2011 18:17:34 -0400 > >> me at it:~$ pypy -c 'import sys; print sys.path' > >> > debug: WARNING: library path not found, using compiled-in sys.path and > sys.prefix will be unset > >> > 'import site' failed > >> > ['', '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib_pypy', > '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/modified-2.7', > '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/2.7', > '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/modified-2.7/lib-tk', > '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/2.7/lib-tk', > '/home/buildslave/bot64/pypy-c-jit-linux-x86-64/build/lib-python/2.7/plat-linux2'] > > So I'm wondering, what else needs done in order to make this work? E.g., > how to fix the "'import site' failed"? > > TIA, Tom Roche > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > http://mail.python.org/mailman/listinfo/pypy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From exarkun at twistedmatrix.com Sun Jul 31 04:27:49 2011 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Sun, 31 Jul 2011 02:27:49 -0000 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: <871ux7ffwt.fsf@pobox.com> References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> Message-ID: <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> On 12:32 am, tom_roche at pobox.com wrote: > >Tom Roche Sat, 30 Jul 2011 18:17:34 -0400 >>>>my current ubuntu > >>>>me at it:~$ lsb_release -ds >>>> > Ubuntu 10.04.3 LTS # yes, I am planning to upgrade Real Soon Now >>>>me at it:~$ uname -rv >>>> > 2.6.32-33-generic #70-Ubuntu SMP Thu Jul 7 21:13:52 UTC 2011 > >>>>is too down-level to install the available pypy RPM, so I instead >>>>[downloaded and installed] > >>>>URI="https://bitbucket.org/pypy/pypy/downloads/pypy-1.5-linux64.tar.bz2" > >Dan Stromberg Sat, 30 Jul 2011 16:15:08 -0700 >>>Any particular reason not to use the tarball on the pypy website? 
> >That is in fact the "Linux binary (64bit)" from >http://pypy.org/download.html#default-with-a-jit-compiler > >Maciej Fijalkowski Sun, 31 Jul 2011 01:31:47 +0200 >>I don't really know how those RPMs are built, > >Again, please note: I could not use the RPM, so instead used the tarball. Just unpack the tarball. Don't do all that copying of stuff. Jean-Paul From drsalists at gmail.com Sun Jul 31 05:09:58 2011 From: drsalists at gmail.com (Dan Stromberg) Date: Sat, 30 Jul 2011 20:09:58 -0700 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> Message-ID: On Sat, Jul 30, 2011 at 7:27 PM, wrote: > Just unpack the tarball. Don't do all that copying of stuff. > > Jean-Paul > > I believe it's more than a matter of just unpacking a tar archive. The tar archives have had permissions problems long enough that I suspect there's a bug in the build/release process. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fijall at gmail.com Sun Jul 31 19:48:26 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sun, 31 Jul 2011 19:48:26 +0200 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> Message-ID: On Sun, Jul 31, 2011 at 5:09 AM, Dan Stromberg wrote: > > On Sat, Jul 30, 2011 at 7:27 PM, wrote: >> >> Just unpack the tarball. Don't do all that copying of stuff. >> >> Jean-Paul > > I believe it's more than a matter of just unpacking a tar archive. The tar > archives have had permissions problems long enough that I suspect there's a > bug in the build/release process. Precisely what didn't work? I run nightlies just fine by downloading and running them From kwatford at gmail.com Sun Jul 31 19:59:55 2011 From: kwatford at gmail.com (Ken Watford) Date: Sun, 31 Jul 2011 13:59:55 -0400 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> Message-ID: > Precisely what didn't work? I run nightlies just fine by downloading > and running them The permissions are fine if you're the owner of the files. If another user (say, root) untars them, other users may not have appropriate read/execute permissions. From Tom_Roche at pobox.com Sun Jul 31 21:03:34 2011 From: Tom_Roche at pobox.com (Tom Roche) Date: Sun, 31 Jul 2011 15:03:34 -0400 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> Message-ID: <87mxfue0hl.fsf@pobox.com> Maciej Fijalkowski Sun, 31 Jul 2011 19:48:26 +0200 >> Precisely what didn't work? Ken Watford Sun Jul 31 19:59:55 CEST 2011 > The permissions are fine if you're the owner of the files. If another > user (say, root) untars them, e.g., to install FHS-ly > other users may not have appropriate read/execute permissions.
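[Editor's note: one way to see the problem Ken describes is to list the permission bits stored in the release tarball itself, since tar preserves them verbatim when root extracts; the local filename is assumed.]

import tarfile

tf = tarfile.open("pypy-1.5-linux64.tar.bz2")
for member in tf:
    if member.mode & 0044 != 0044:  # no read bit for group/other
        print oct(member.mode), member.name
tf.close()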
FWIW, Tom Roche From fijall at gmail.com Sun Jul 31 21:07:17 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sun, 31 Jul 2011 21:07:17 +0200 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: <87mxfue0hl.fsf@pobox.com> References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> <87mxfue0hl.fsf@pobox.com> Message-ID: On Sun, Jul 31, 2011 at 9:03 PM, Tom Roche wrote: > > Maciej Fijalkowski Sun, 31 Jul 2011 19:48:26 +0200 >>> Precisely what didn't work? > > Ken Watford Sun Jul 31 19:59:55 CEST 2011 >> The permissions are fine if you're the owner of the files. If another >> user (say, root) untars them, > > e.g., to install FHS-ly Ok, as far as I understood, the tarballs are for extracting in-place not for installs without serious massaging > >> other users may not have appropriate read/execute permissions. > > FWIW, Tom Roche > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > http://mail.python.org/mailman/listinfo/pypy-dev > From fijall at gmail.com Sun Jul 31 21:07:39 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sun, 31 Jul 2011 21:07:39 +0200 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> <87mxfue0hl.fsf@pobox.com> Message-ID: On Sun, Jul 31, 2011 at 9:07 PM, Maciej Fijalkowski wrote: > On Sun, Jul 31, 2011 at 9:03 PM, Tom Roche wrote: >> >> Maciej Fijalkowski Sun, 31 Jul 2011 19:48:26 +0200 >>>> Precisely what didn't work? >> >> Ken Watford Sun Jul 31 19:59:55 CEST 2011 >>> The permissions are fine if you're the owner of the files. If another >>> user (say, root) untars them, >> >> e.g., to install FHS-ly > > Ok, as far as I understood, the tarballs are for extracting in-place > not for installs without serious massaging > Ah and btw, pypy used to be in debian but got removed From Tom_Roche at pobox.com Sun Jul 31 21:28:51 2011 From: Tom_Roche at pobox.com (Tom Roche) Date: Sun, 31 Jul 2011 15:28:51 -0400 Subject: [pypy-dev] pypy for debian? Message-ID: <87k4aydzbg.fsf@pobox.com> Anyone packaging pypy for debian? Why I ask: I'm putting a FOSS project on ShiningPanda, which runs Debian 6 (squeeze), for CI. After seeing the speedup from pypy on my own hardware, I'd like to use pypy on SP, but they don't currently offer that option. Hence I'd like to ask SP to provide pypy along with the CPython versions they provide (rather than hafta provide it myself in each build environment I use, which would add to my setup time, defeating the purpose), but before I did that, I'd like to get some idea of how much work would be entailed, i.e., how much pain I'd be asking them to endure. Adopting pypy would presumably be trivial for SP if there were a pypy .deb, or even a PPA, but I'm not seeing anything like that after ubuntu hardy. Am I missing something? I see RH (appears to be) regularly packaging pypy, so I'm hoping someone's doing this for debian. If not, there's always `alien`.
TIA, Tom Roche From Tom_Roche at pobox.com Sun Jul 31 21:44:55 2011 From: Tom_Roche at pobox.com (Tom Roche) Date: Sun, 31 Jul 2011 15:44:55 -0400 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> <87mxfue0hl.fsf@pobox.com> Message-ID: <87hb62dyko.fsf@pobox.com> Maciej Fijalkowski Sun, 31 Jul 2011 19:48:26 +0200 >>>> Precisely what didn't work? Ken Watford Sun Jul 31 19:59:55 CEST 2011 >>> The permissions are fine if you're the owner of the files. If another >>> user (say, root) untars them, Tom Roche Sun, 31 Jul 2011 15:03:34 -0400 >> e.g., to install FHS-ly >>> other users may not have appropriate read/execute permissions. Maciej Fijalkowski Sun, 31 Jul 2011 21:07:17 +0200 > as far as I understood, the tarballs are for [extracting] in-place not > for installs without serious massaging Not so much massage is required (e.g., my bash script from http://mail.python.org/pipermail/pypy-dev/2011-July/007893.html does it, as does Dan Stromberg's @ http://mail.python.org/pipermail/pypy-dev/2011-July/007894.html ) but It Would Be Nice for new adopters if the download page documented things like this, esp if intentional. Currently it says only http://pypy.org/download.html#installing > Installing > All versions are packaged in a tar.bz2 or zip file. When uncompressed, > they run in-place. For now you can uncompress them either somewhere in > your home directory or, say, in /opt, and if you want, put a symlink > from somewhere like /usr/local/bin/pypy to /path/to/pypy-1.5/bin/pypy. > Do not move or copy the executable pypy outside the tree ? put a > symlink to it, otherwise it will not find its libraries. FWIW, Tom Roche From fijall at gmail.com Sun Jul 31 21:47:16 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sun, 31 Jul 2011 21:47:16 +0200 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: <87hb62dyko.fsf@pobox.com> References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> <87mxfue0hl.fsf@pobox.com> <87hb62dyko.fsf@pobox.com> Message-ID: On Sun, Jul 31, 2011 at 9:44 PM, Tom Roche wrote: > > Maciej Fijalkowski Sun, 31 Jul 2011 19:48:26 +0200 >>>>> Precisely what didn't work? > > Ken Watford Sun Jul 31 19:59:55 CEST 2011 >>>> The permissions are fine if you're the owner of the files. If another >>>> user (say, root) untars them, > > Tom Roche Sun, 31 Jul 2011 15:03:34 -0400 >>> e.g., to install FHS-ly > >>>> other users may not have appropriate read/execute permissions. > > Maciej Fijalkowski Sun, 31 Jul 2011 21:07:17 +0200 >> as far as I understood, the tarballs are for [extracting] in-place not >> for installs without serious massaging > > Not so much massage is required (e.g., my bash script from > > http://mail.python.org/pipermail/pypy-dev/2011-July/007893.html > > does it, as does Dan Stromberg's @ > > http://mail.python.org/pipermail/pypy-dev/2011-July/007894.html > > ) but It Would Be Nice for new adopters if the download page documented things like this, esp if intentional. Currently it says only > > http://pypy.org/download.html#installing >> Installing > >> All versions are packaged in a tar.bz2 or zip file. When uncompressed, >> they run in-place. 
For now you can uncompress them either somewhere in >> your home directory or, say, in /opt, and if you want, put a symlink >> from somewhere like /usr/local/bin/pypy to /path/to/pypy-1.5/bin/pypy. >> Do not move or copy the executable pypy outside the tree - put a >> symlink to it, otherwise it will not find its libraries. > > FWIW, Tom Roche > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > http://mail.python.org/mailman/listinfo/pypy-dev > Cool, feel like providing a patch? Cheers, fijal From fijall at gmail.com Sun Jul 31 21:47:48 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sun, 31 Jul 2011 21:47:48 +0200 Subject: [pypy-dev] pypy for debian? In-Reply-To: <87k4aydzbg.fsf@pobox.com> References: <87k4aydzbg.fsf@pobox.com> Message-ID: On Sun, Jul 31, 2011 at 9:28 PM, Tom Roche wrote: > > Anyone packaging pypy for debian? Why I ask: > > I'm putting a FOSS project on ShiningPanda, which runs Debian 6 (squeeze), for CI. After seeing the speedup from pypy on my own hardware, I'd like to use pypy on SP, but they don't currently offer that option. Hence I'd like to ask SP to provide pypy along with the CPython versions they provide (rather than hafta provide it myself in each build environment I use, which would add to my setup time, defeating the purpose), but before I did that, I'd like to get some idea of how much work would be entailed, i.e., how much pain I'd be asking them to endure. > > Adopting pypy would presumably be trivial for SP if there were a pypy .deb, or even a PPA, but I'm not seeing anything like that after ubuntu hardy. Am I missing something? I see RH (appears to be) regularly packaging pypy, so I'm hoping someone's doing this for debian. If not, there's always `alien`. I think what's actually needed is a PPA maintainer (or a debian champion). Cheers, fijal From Tom_Roche at pobox.com Sun Jul 31 22:03:12 2011 From: Tom_Roche at pobox.com (Tom Roche) Date: Sun, 31 Jul 2011 16:03:12 -0400 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> <87mxfue0hl.fsf@pobox.com> <87hb62dyko.fsf@pobox.com> Message-ID: <87ei16dxq7.fsf@pobox.com> Maciej Fijalkowski Sun, 31 Jul 2011 21:07:17 +0200 >>> as far as I understood, the tarballs are for extracting in-place not >>> for installs without serious massaging Tom Roche Sun, 31 Jul 2011 15:44:55 -0400 >> Not so much massage is required (e.g., my bash script from >> http://mail.python.org/pipermail/pypy-dev/2011-July/007893.html >> does it, as does Dan Stromberg's @ >> http://mail.python.org/pipermail/pypy-dev/2011-July/007894.html >> ) but It Would Be Nice for new adopters if the download page >> documented things like this, esp if intentional. Maciej Fijalkowski Sun, 31 Jul 2011 21:47:16 +0200 > Cool, feel like providing a patch? To the build or the doc? Regarding the latter, have youse thought about a wiki, or other vehicle for user-contributed doc? Minimally, you could point to http://wiki.python.org/moin/PyPy and have folks work there. Currently it says http://wiki.python.org/moin/PyPy > Please refer to the PyPy home page and development site (which appear to be the same page) > for more information about this Python implementation. which (IMHO, YMMV) kinda discourages putting more information on that wiki.
FWIW, Tom Roche From fijall at gmail.com Sun Jul 31 22:06:36 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sun, 31 Jul 2011 22:06:36 +0200 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: <87ei16dxq7.fsf@pobox.com> References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> <87mxfue0hl.fsf@pobox.com> <87hb62dyko.fsf@pobox.com> <87ei16dxq7.fsf@pobox.com> Message-ID: On Sun, Jul 31, 2011 at 10:03 PM, Tom Roche wrote: > > Maciej Fijalkowski Sun, 31 Jul 2011 21:07:17 +0200 >>>> as far as I understood, the tarballs are for extracting in-place not >>>> for installs without serious massaging > > Tom Roche Sun, 31 Jul 2011 15:44:55 -0400 > >>> Not so much massage is required (e.g., my bash script from > >>> http://mail.python.org/pipermail/pypy-dev/2011-July/007893.html > >>> does it, as does Dan Stromberg's @ > >>> http://mail.python.org/pipermail/pypy-dev/2011-July/007894.html > >>> ) but It Would Be Nice for new adopters if the download page >>> documented things like this, esp if intentional. > > Maciej Fijalkowski Sun, 31 Jul 2011 21:47:16 +0200 >> Cool, feel like providing a patch? > > To the build or the doc? To the doc. We have a wiki on bitbucket (apparently underused) > > Regarding the latter, have youse thought about a wiki, or other vehicle for user-contributed doc? Minimally, you could point to > > http://wiki.python.org/moin/PyPy > > and have folks work there. Currently it says > > http://wiki.python.org/moin/PyPy >> Please refer to the PyPy home page and development site > > (which appear to be the same page) > >> for more information about this Python implementation. > > which (IMHO, YMMV) kinda discourages putting more information on that wiki.
> > FWIW, Tom Roche > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > http://mail.python.org/mailman/listinfo/pypy-dev > From janzert at janzert.com Sun Jul 31 22:18:32 2011 From: janzert at janzert.com (Janzert) Date: Sun, 31 Jul 2011 16:18:32 -0400 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: <87hb62dyko.fsf@pobox.com> References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> <87mxfue0hl.fsf@pobox.com> <87hb62dyko.fsf@pobox.com> Message-ID: Since I haven't seen it mentioned yet. The issue with permissions is supposed to already be fixed in trunk - https://bugs.pypy.org/issue772 so the next release should resolve that problem for system wide installs. Janzert From Tom_Roche at pobox.com Sun Jul 31 23:05:10 2011 From: Tom_Roche at pobox.com (Tom Roche) Date: Sun, 31 Jul 2011 17:05:10 -0400 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> <87mxfue0hl.fsf@pobox.com> <87hb62dyko.fsf@pobox.com> <87ei16dxq7.fsf@pobox.com> Message-ID: <878vreduux.fsf@pobox.com> Janzert Sun, 31 Jul 2011 16:18:32 -0400 > The issue with permissions is supposed to already be fixed in trunk Great! > https://bugs.pypy.org/issue772 > so the next release should resolve that problem for system wide installs. By "next release," do you mean 1.6? (BTW, to show bona fides, and since it needed done, I updated http://wiki.python.org/moin/PyPy to include 1.5, recycling material from the blog.) For when is that planned? Tom Roche Sun, 31 Jul 2011 15:44:55 -0400 >>>> It Would Be Nice for new adopters if the download page documented >>>> [install details], esp if intentional. Maciej Fijalkowski Sun, 31 Jul 2011 21:47:16 +0200 >>> Cool, feel like providing a patch? Tom Roche Sun, 31 Jul 2011 16:03:12 -0400 >> To the build or the doc? Maciej Fijalkowski Sun, 31 Jul 2011 22:06:36 +0200 > To the doc. >> have youse thought about a wiki, or other vehicle for user-contributed doc? > We have a [public] wiki on bitbucket (apparently underused) not to mention underexposed: I had no idea it was there. (pypy.org links to readthedocs, but that's a private wiki? So you have pypy.org, the blog, readthedocs, the bitbucket wiki, ... lot to manage.) But now I see https://bitbucket.org/pypy/pypy/wiki/Home I could put something in there, and you could point to that? From fijall at gmail.com Sun Jul 31 23:13:17 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sun, 31 Jul 2011 23:13:17 +0200 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: <878vreduux.fsf@pobox.com> References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> <87mxfue0hl.fsf@pobox.com> <87hb62dyko.fsf@pobox.com> <87ei16dxq7.fsf@pobox.com> <878vreduux.fsf@pobox.com> Message-ID: On Sun, Jul 31, 2011 at 11:05 PM, Tom Roche wrote: > > Janzert Sun, 31 Jul 2011 16:18:32 -0400 >> The issue with permissions is supposed to already be fixed in trunk > > Great! > >> https://bugs.pypy.org/issue772 > >> so the next release should resolve that problem for system wide installs. > > By "next release," do you mean 1.6? (BTW, to show bona fides, and since it needed done, I updated > > http://wiki.python.org/moin/PyPy > > to include 1.5, recycling material from the blog.) For when is that planned? 
> > Tom Roche Sun, 31 Jul 2011 15:44:55 -0400 >>>>> It Would Be Nice for new adopters if the download page documented >>>>> [install details], esp if intentional. > > Maciej Fijalkowski Sun, 31 Jul 2011 21:47:16 +0200 >>>> Cool, feel like providing a patch? > > Tom Roche Sun, 31 Jul 2011 16:03:12 -0400 >>> To the build or the doc? > > Maciej Fijalkowski Sun, 31 Jul 2011 22:06:36 +0200 >> To the doc. > >>> have youse thought about a wiki, or other vehicle for user-contributed doc? > >> We have a [public] wiki on bitbucket (apparently underused) > > not to mention underexposed: I had no idea it was there. (pypy.org links to readthedocs, but that's a private wiki? So you have pypy.org, the blog, readthedocs, the bitbucket wiki, ... lot to manage.) ?But now I see > > https://bitbucket.org/pypy/pypy/wiki/Home > > I could put something in there, and you could point to that? I guess this wiki is a bit half-hearted. Ideally we would like a diff from pypy.org download page to what you want to put info there. pypy.org source is on bitbucket (in pypy/pypy.org). From fijall at gmail.com Sun Jul 31 23:14:02 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sun, 31 Jul 2011 23:14:02 +0200 Subject: [pypy-dev] newbie needs pypy setup help In-Reply-To: <878vreduux.fsf@pobox.com> References: <874o23fm69.fsf@pobox.com> <871ux7ffwt.fsf@pobox.com> <20110731022749.2280.942749776.divmod.xquotient.228@localhost.localdomain> <87mxfue0hl.fsf@pobox.com> <87hb62dyko.fsf@pobox.com> <87ei16dxq7.fsf@pobox.com> <878vreduux.fsf@pobox.com> Message-ID: > By "next release," do you mean 1.6? (BTW, to show bona fides, and since it needed done, I updated > > http://wiki.python.org/moin/PyPy > > to include 1.5, recycling material from the blog.) For when is that planned? > "Soon". Feature freeze is probably tomorrow