From santagada at gmail.com Mon Oct 3 04:46:51 2016 From: santagada at gmail.com (Leonardo Santagada) Date: Mon, 3 Oct 2016 10:46:51 +0200 Subject: [pypy-dev] Sorting structs In-Reply-To: References: Message-ID: Can you start by having a stable benchmark? Instead of calling random have a reverse counter to create your list in reverse order. That will at least get somewhat similar work on both languages (javascript random is notoriously bad at being random). Maybe the difference in performance is all from the difference in the distribution of the numbers. Also try bigger/smaller lists to see how they behave. (also warming up both jit engines might be necessary) On Mon, Sep 26, 2016 at 10:57 PM, Tuom Larsen wrote: > Dear list! > > I stumbled upon a problem and I was wondering if someone would by so > kind and explain to me what is going on. > > The problem is sorting 2 000 000 points by `x` coordinate: > > from random import random > from time import time > > class point(object): > def __init__(self, x, y): > self.x, self.y = x, y > > data = [point(random(), random()) for i in range(2000000)] > > t = time() > data.sort(key=lambda p:p.x) > print time() - t > > on my machine in runs 8.74s under PyPy 5.4.1 on MacOS. I then try to > sort the points in JavaScript: > > var data = []; > for (var i=0; i<2000000; i++) { > data.push({x:Math.random(), y:Math.random()}); > } > > console.time('sorting'); > data.sort(function(a, b) { return a.x - b.x; }); > console.timeEnd('sorting'); > > and it runs in 3.09s under V8 5.3. > > I was just wondering, why is it nearly 3x slower under PyPy than it is > under V8? Is there any way I could make the code run faster? > > Thank you in advance! > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > https://mail.python.org/mailman/listinfo/pypy-dev > -- Leonardo Santagada -------------- next part -------------- An HTML attachment was scrubbed... URL: From santagada at gmail.com Mon Oct 3 04:53:41 2016 From: santagada at gmail.com (Leonardo Santagada) Date: Mon, 3 Oct 2016 10:53:41 +0200 Subject: [pypy-dev] improve error message when missing 'self' in method definition In-Reply-To: References: <8453e56c-5971-9b73-dea0-e8a75f7d9d85@gmx.de> Message-ID: I've already proposed something like this a looong time ago and guido even said it is a good idea. I never got around to implementing it. How difficult would be to port this to CPython? The patch seems very concise On Wed, Sep 28, 2016 at 8:17 AM, Maciej Fijalkowski wrote: > On Tue, Sep 27, 2016 at 8:33 PM, Ryan Gonzalez wrote: > > Have you considered bringing this up on python-ideas, too? > > python-idea is generally quite a hostile place. That said, if you > think it's worth your effort to submit it there, feel free to do so, > just the core pypy devs feel their time is better spent elsewhere than > arguing on python-ideas > > > > > On Tue, Sep 27, 2016 at 12:19 PM, Carl Friedrich Bolz > wrote: > >> > >> Hi all, > >> > >> I read this paper today about common mistakes that Python beginners > >> make: > >> > >> > >> https://www.researchgate.net/publication/307088989_Some_ > Trouble_with_Transparency_An_Analysis_of_Student_Errors_ > with_Object-oriented_Python > >> > >> The most common one by far is forgetting the "self" parameter in the > >> method definition (which also still happens to me regularly). The error > >> message is not particularly enlightening, if you don't quite understand > >> the explicit self in Python. 
> >> > >> > >> So I wonder whether we should print a better error message, something > >> like this: > >> > >> $ cat m.py > >> class A(object): > >> def f(x): > >> return self.x > >> A().f(1) > >> > >> $ pypy m.py > >> Traceback (application-level): > >> File "m.py", line 4 in > >> A().f(1) > >> TypeError: f() takes exactly 1 argument (2 given). Did you forget 'self' > >> in the function definition? > >> > >> > >> It's a bit the question how clever we would like this to be to reduce > >> false positives, see the attached patch for a very simple approach. > >> > >> Anyone have opinions? > >> > >> Cheers, > >> > >> Carl Friedrich > >> > >> _______________________________________________ > >> pypy-dev mailing list > >> pypy-dev at python.org > >> https://mail.python.org/mailman/listinfo/pypy-dev > >> > > > > > > > > -- > > Ryan > > [ERROR]: Your autotools build scripts are 200 lines longer than your > > program. Something?s wrong. > > http://kirbyfan64.github.io/ > > > > > > _______________________________________________ > > pypy-dev mailing list > > pypy-dev at python.org > > https://mail.python.org/mailman/listinfo/pypy-dev > > > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > https://mail.python.org/mailman/listinfo/pypy-dev > -- Leonardo Santagada -------------- next part -------------- An HTML attachment was scrubbed... URL: From tuom.larsen at gmail.com Mon Oct 3 09:53:32 2016 From: tuom.larsen at gmail.com (Tuom Larsen) Date: Mon, 3 Oct 2016 15:53:32 +0200 Subject: [pypy-dev] Sorting structs In-Reply-To: References: Message-ID: Thanks for the reply! I don't think a reverse counter is better because TimSort (used in PyPy?) is able to recognise the pattern [1] and do better than other kinds of sorts. And almost for a year [2] now, V8 implements random as XorShift128+ which very good. I also tried different sizes and the timings still show similar difference. [1] http://c2.com/cgi/wiki?TimSort [2] https://bugs.chromium.org/p/chromium/issues/detail?id=559024 On Mon, Oct 3, 2016 at 10:46 AM, Leonardo Santagada wrote: > Can you start by having a stable benchmark? Instead of calling random have a > reverse counter to create your list in reverse order. That will at least get > somewhat similar work on both languages (javascript random is notoriously > bad at being random). > > Maybe the difference in performance is all from the difference in the > distribution of the numbers. Also try bigger/smaller lists to see how they > behave. (also warming up both jit engines might be necessary) > > > On Mon, Sep 26, 2016 at 10:57 PM, Tuom Larsen wrote: >> >> Dear list! >> >> I stumbled upon a problem and I was wondering if someone would by so >> kind and explain to me what is going on. >> >> The problem is sorting 2 000 000 points by `x` coordinate: >> >> from random import random >> from time import time >> >> class point(object): >> def __init__(self, x, y): >> self.x, self.y = x, y >> >> data = [point(random(), random()) for i in range(2000000)] >> >> t = time() >> data.sort(key=lambda p:p.x) >> print time() - t >> >> on my machine in runs 8.74s under PyPy 5.4.1 on MacOS. I then try to >> sort the points in JavaScript: >> >> var data = []; >> for (var i=0; i<2000000; i++) { >> data.push({x:Math.random(), y:Math.random()}); >> } >> >> console.time('sorting'); >> data.sort(function(a, b) { return a.x - b.x; }); >> console.timeEnd('sorting'); >> >> and it runs in 3.09s under V8 5.3. 
>> >> I was just wondering, why is it nearly 3x slower under PyPy than it is >> under V8? Is there any way I could make the code run faster? >> >> Thank you in advance! >> _______________________________________________ >> pypy-dev mailing list >> pypy-dev at python.org >> https://mail.python.org/mailman/listinfo/pypy-dev > > > > > -- > > Leonardo Santagada From cfbolz at gmx.de Mon Oct 3 13:01:05 2016 From: cfbolz at gmx.de (Carl Friedrich Bolz) Date: Mon, 3 Oct 2016 19:01:05 +0200 Subject: [pypy-dev] improve error message when missing 'self' in method definition In-Reply-To: References: <8453e56c-5971-9b73-dea0-e8a75f7d9d85@gmx.de> Message-ID: <59a42401-bb93-8986-08ee-daf645b478c6@gmx.de> On 03/10/16 10:53, Leonardo Santagada wrote: > I've already proposed something like this a looong time ago and guido > even said it is a good idea. I never got around to implementing it. How > difficult would be to port this to CPython? The patch seems very concise Ah, cool. Do you have a link to the mailing list discussion? Anyway, the implementation is going to be slightly more complex, to reduce the false positive rate. So I have no idea how easy it would be to port to CPython. Cheers, Carl Friedrich From cfbolz at gmx.de Mon Oct 3 13:28:09 2016 From: cfbolz at gmx.de (Carl Friedrich Bolz) Date: Mon, 3 Oct 2016 19:28:09 +0200 Subject: [pypy-dev] Sorting structs In-Reply-To: References: Message-ID: Hi Tuom, The problem is the key=... argument to sort, which isn't optimized well. This kind of code is much faster: wrapped_data = [(p.x, p) for p in data] wrapped_data.sort() data = [it[1] for it in wrapped_data] We should fix this problem. I've created an issue so it doesn't get lost: https://bitbucket.org/pypy/pypy/issues/2410/listsort-key-is-slow Thanks for the report! Cheers, Carl Friedrich On 26/09/16 22:57, Tuom Larsen wrote: > Dear list! > > I stumbled upon a problem and I was wondering if someone would by so > kind and explain to me what is going on. > > The problem is sorting 2 000 000 points by `x` coordinate: > > from random import random > from time import time > > class point(object): > def __init__(self, x, y): > self.x, self.y = x, y > > data = [point(random(), random()) for i in range(2000000)] > > t = time() > data.sort(key=lambda p:p.x) > print time() - t > > on my machine in runs 8.74s under PyPy 5.4.1 on MacOS. I then try to > sort the points in JavaScript: > > var data = []; > for (var i=0; i<2000000; i++) { > data.push({x:Math.random(), y:Math.random()}); > } > > console.time('sorting'); > data.sort(function(a, b) { return a.x - b.x; }); > console.timeEnd('sorting'); > > and it runs in 3.09s under V8 5.3. > > I was just wondering, why is it nearly 3x slower under PyPy than it is > under V8? Is there any way I could make the code run faster? > > Thank you in advance! > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > https://mail.python.org/mailman/listinfo/pypy-dev > From tuom.larsen at gmail.com Mon Oct 3 17:49:17 2016 From: tuom.larsen at gmail.com (Tuom Larsen) Date: Mon, 3 Oct 2016 23:49:17 +0200 Subject: [pypy-dev] Sorting structs In-Reply-To: References: Message-ID: Hello Carl, thanks a lot for the clarification and for creating the ticket! On Mon, Oct 3, 2016 at 7:28 PM, Carl Friedrich Bolz wrote: > Hi Tuom, > > The problem is the key=... argument to sort, which isn't optimized > well. 
This kind of code is much faster: > > wrapped_data = [(p.x, p) for p in data] > wrapped_data.sort() > data = [it[1] for it in wrapped_data] > > We should fix this problem. I've created an issue so it doesn't get > lost: > > https://bitbucket.org/pypy/pypy/issues/2410/listsort-key-is-slow > > Thanks for the report! > > Cheers, > > Carl Friedrich > > > On 26/09/16 22:57, Tuom Larsen wrote: >> Dear list! >> >> I stumbled upon a problem and I was wondering if someone would by so >> kind and explain to me what is going on. >> >> The problem is sorting 2 000 000 points by `x` coordinate: >> >> from random import random >> from time import time >> >> class point(object): >> def __init__(self, x, y): >> self.x, self.y = x, y >> >> data = [point(random(), random()) for i in range(2000000)] >> >> t = time() >> data.sort(key=lambda p:p.x) >> print time() - t >> >> on my machine in runs 8.74s under PyPy 5.4.1 on MacOS. I then try to >> sort the points in JavaScript: >> >> var data = []; >> for (var i=0; i<2000000; i++) { >> data.push({x:Math.random(), y:Math.random()}); >> } >> >> console.time('sorting'); >> data.sort(function(a, b) { return a.x - b.x; }); >> console.timeEnd('sorting'); >> >> and it runs in 3.09s under V8 5.3. >> >> I was just wondering, why is it nearly 3x slower under PyPy than it is >> under V8? Is there any way I could make the code run faster? >> >> Thank you in advance! >> _______________________________________________ >> pypy-dev mailing list >> pypy-dev at python.org >> https://mail.python.org/mailman/listinfo/pypy-dev >> > > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > https://mail.python.org/mailman/listinfo/pypy-dev From santagada at gmail.com Tue Oct 4 07:52:46 2016 From: santagada at gmail.com (Leonardo Santagada) Date: Tue, 4 Oct 2016 13:52:46 +0200 Subject: [pypy-dev] improve error message when missing 'self' in method definition In-Reply-To: <59a42401-bb93-8986-08ee-daf645b478c6@gmx.de> References: <8453e56c-5971-9b73-dea0-e8a75f7d9d85@gmx.de> <59a42401-bb93-8986-08ee-daf645b478c6@gmx.de> Message-ID: On Mon, Oct 3, 2016 at 7:01 PM, Carl Friedrich Bolz wrote: > Ah, cool. Do you have a link to the mailing list discussion? Oh I really couldn't find it, maybe my memory is failing me (i did found a discussion from 2007 about something related to that (internationalization of messages, but there is no input from Guido and the discussion didn't go anywhere) I can take it to python-ideas and see it through. -- Leonardo Santagada -------------- next part -------------- An HTML attachment was scrubbed... URL: From firxen at gmail.com Sat Oct 8 06:59:09 2016 From: firxen at gmail.com (Jeremy Thurgood) Date: Sat, 8 Oct 2016 12:59:09 +0200 Subject: [pypy-dev] Bump DARWIN_VERSION_MIN to 10.7? Message-ID: Hi, I recently tried to build RevDB on OS X and the build failed because __thread isn't supported on OS X 10.6 which RPython specifies as the minimum supported version. I've updated DARWIN_VERSION_MIN to 10.7 in the reverse-debugger branch[1], because otherwise it doesn't build at all on OS X. Armin suggested that there would be a performance benefit to increasing DARWIN_VERSION_MIN for pypy as well (along with adding darwin to the SUPPORT__THREAD list). I would like to make this change, but I don't want to break pypy for anyone who may still need OS X 10.6 support. 
As far as I can tell, OS X 10.6 is unsupported (last release in 2011) but still available as a necessary upgrade step from older versions that don't support the app store to newer versions that are only available through the app store. I also have an old laptop running OS X 10.8, and several applications (including Chrome) have dropped support for that recently. [1] https://bitbucket.org/pypy/pypy/commits/a740348ea339c358601d9029bc6811f56c4d71ff Thanks, --jerith From dannym at scratchpost.org Sat Oct 8 08:39:11 2016 From: dannym at scratchpost.org (Danny Milosavljevic) Date: Sat, 8 Oct 2016 14:39:11 +0200 Subject: [pypy-dev] pypy3.3: building extensions: environment variables CC etc - not honored Message-ID: <20161008143911.100bd295@scratchpost.org> Hi, I'm trying to package pypy3.3 for the Guix distribution. In the course of that I found that pypy3.3-v5.2.0-alpha1-src/lib-python/3/distutils/sysconfig_pypy.py doesn't honor the compiler environment variables ("CC" etc) like sysconfig_cpython.py does. Is that on purpose? If not, find attached a patch which makes it honor the variables (I have tested it and it works). Cheers, Danny -------------- next part -------------- A non-text attachment was scrubbed... Name: pypy3.3-fix-compiler.patch Type: text/x-patch Size: 1385 bytes Desc: not available URL: From planrichi at gmail.com Sun Oct 9 10:51:23 2016 From: planrichi at gmail.com (Richard Plangger) Date: Sun, 9 Oct 2016 16:51:23 +0200 Subject: [pypy-dev] pypy3.3 release Message-ID: <5e701517-e31e-45a4-e8f7-0be7d5b0e23d@gmail.com> Hello, I have prepared an alpha release for the python 3.3 branching from the py3k branch. As far as I understood this means that the development will now continue towards python 3.5 in the py3.5 branch. Should we rename the py3k branch to py3.3 (branch py3.3 from py3k and close py3k)? I have noticed that the arm buildbots that build the raring pypy executables are offline since ~August. Can someone restart them? I will reupload the builds as soon as one of them come up again. Cheers, Richard -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From cfbolz at gmx.de Sun Oct 9 13:21:03 2016 From: cfbolz at gmx.de (Carl Friedrich Bolz) Date: Sun, 9 Oct 2016 19:21:03 +0200 Subject: [pypy-dev] pypy3.3: building extensions: environment variables CC etc - not honored In-Reply-To: <20161008143911.100bd295@scratchpost.org> References: <20161008143911.100bd295@scratchpost.org> Message-ID: <2d72285c-6456-1a9b-cf6d-f9858f43bb4f@gmx.de> On 08/10/16 14:39, Danny Milosavljevic wrote: > Hi, > > I'm trying to package pypy3.3 for the Guix distribution. > > In the course of that I found that > pypy3.3-v5.2.0-alpha1-src/lib-python/3/distutils/sysconfig_pypy.py > doesn't honor the compiler environment variables ("CC" etc) like > sysconfig_cpython.py does. Is that on purpose? If not, find attached > a patch which makes it honor the variables (I have tested it and it > works). Hi Danny, thanks for the patch. Would you please file an issue so it doesn't get lost? Cheers, Carl Friedrich From phyo.arkarlwin at gmail.com Mon Oct 10 05:33:21 2016 From: phyo.arkarlwin at gmail.com (Phyo Arkar) Date: Mon, 10 Oct 2016 09:33:21 +0000 Subject: [pypy-dev] pypy3.3 release In-Reply-To: <5e701517-e31e-45a4-e8f7-0be7d5b0e23d@gmail.com> References: <5e701517-e31e-45a4-e8f7-0be7d5b0e23d@gmail.com> Message-ID: Richard , thank you very much for moving directly to 3.5! 
On Sun, Oct 9, 2016 at 9:22 PM Richard Plangger wrote: > Hello, > > I have prepared an alpha release for the python 3.3 branching from the > py3k branch. As far as I understood this means that the development will > now continue towards python 3.5 in the py3.5 branch. Should we rename > the py3k branch to py3.3 (branch py3.3 from py3k and close py3k)? > > I have noticed that the arm buildbots that build the raring pypy > executables are offline since ~August. Can someone restart them? I will > reupload the builds as soon as one of them come up again. > > Cheers, > Richard > > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > https://mail.python.org/mailman/listinfo/pypy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Oct 17 04:08:37 2016 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 17 Oct 2016 01:08:37 -0700 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators Message-ID: Hi all, I've been poking at an idea for changing how 'for' loops work to hopefully make them work better for pypy and async/await code. I haven't taken it to python-ideas yet -- this is its first public outing, actually -- but since it directly addresses pypy GC issues I thought I'd send around a draft to see what you think. (E.g., would this be something that makes your life easier?) Thanks, -n ---- Abstract ======== We propose to extend the iterator protocol with a new ``__(a)iterclose__`` slot, which is called automatically on exit from ``(async) for`` loops, regardless of how they exit. This allows for convenient, deterministic cleanup of resources held by iterators without reliance on the garbage collector. This is especially important (and urgent) for asynchronous generators. Note ==== In practical terms, the proposal here is divided into two separate parts: the handling of async iterators, which should ideally be implemented ASAP, and the handling of regular iterators, which is a larger but more relaxed project that won't even start until 3.7. But since the changes are closely related, and we probably don't want to end up with async iterators and regular iterators diverging in the long run, it seems useful to look at them together. Background and motivation ========================= Python iterables often hold resources which require cleanup. For example: ``file`` objects need to be closed; the `WSGI spec `_ adds a ``close`` method on top of the regular iterator protocol and demands that consumers call it at the appropriate time (though forgetting to do so is a `frequent source of bugs `_); and PEP 342 (based on PEP 325) extended generator objects to add a ``close`` method to allow generators to clean up after themselves. Generally, objects that need to clean up after themselves define a ``__del__`` method to ensure that this cleanup will happen eventually, when the object is garbage collected. However, relying on the garbage collector for cleanup like this causes serious problems in at least two cases: - In Python implementations that do not use reference counting (e.g. PyPy, Jython), calls to ``__del__`` may be arbitrarily delayed, yet many situations depend on *prompt* cleanup of resources. Delayed cleanup produces problems like crashes due to file descriptor exhaustion, or WSGI timing middleware that collects bogus times. - Async generators (PEP 525) can only perform cleanup under the supervision of the appropriate coroutine runner. 
``__del__`` doesn't have access to the coroutine runner; indeed, the coroutine runner might be garbage collected before the generator object. So relying on the garbage collector is effectively impossible without some kind of language extension. (PEP 525 does provide such an extension, but it has serious flaws; see the "rejected alternatives" section below.) The usual recommendation, therefore, is to avoid the garbage collector by using ``with`` blocks. For example, this code opens a file but relies on the garbage collector to close it:: def read_newline_separated_json(path): for line in open(path): yield json.loads(line) for document in read_newline_separated_json(path): ... and recent versions of CPython will point this out by issuing a ``ResourceWarning``, nudging us to fix it by adding a ``with`` block:: def read_newline_separated_json(path): with open(path) as file_handle: # <-- new for line in file_handle: yield json.loads(line) for document in read_newline_separated_json(path): # <-- outer for loop ... But there's a subtlety here, caused by the interaction of ``with`` blocks and generators. ``with`` blocks are Python's main tool for ensuring cleanup, and they're a powerful one, because they pin the lifetime of a resource to the lifetime of a stack frame. But in the case of generators, we need a ``with`` block to ensure that the stack frame is cleaned up! In this case, adding the ``with`` block *is* enough to shut up the ``ResourceWarning``, but this is misleading -- the file object cleanup here is still dependent on the garbage collector. The ``with`` block will only be unwound when the ``read_newline_separated_json`` generator is closed. If the outer ``for`` loop runs to completion then the cleanup will happen immediately; but if this loop is terminated early by a ``break`` or an exception, then the ``with`` block won't fire until the generator object is garbage collected. The correct solution requires that all *users* of this API wrap every ``for`` loop in its own ``with`` block:: with closing(read_newline_separated_json(path)) as genobj: for document in genobj: ... This gets even worse if we consider the idiom of decomposing a complex pipeline into multiple nested generators:: def read_users(path): with closing(read_newline_separated_json(path)) as gen: for document in gen: yield User.from_json(document) def users_in_group(path, group): with closing(read_users(path)) as gen: for user in gen: if user.group == group: yield user In general if you have N nested generators then you need N+1 ``with`` blocks to clean up 1 file. And good defensive programming would suggest that any time we use a generator, we should assume the possibility that there could be at least one ``with`` block somewhere in its (potentially transitive) call stack, either now or in the future, and thus always wrap it in a ``with``. But in practice, basically nobody does this, because it's awful and programmers would rather write buggy code than tiresome repetitive code. Is this worth fixing? Previously I would have argued yes, but that it was a low priority -- until the advent of async generators, which makes this problem much more urgent. Async generators cannot do cleanup *at all* without some mechanism for deterministic cleanup that people will actually use, and async generators are particularly likely to hold resources like file descriptors. (After all, if they weren't doing I/O, they'd be generators, not async generators.) 
And if we don't get it right now when async generators are first rolling out, then it'll be much harder to fix later. The proposal itself is simple in concept: add a ``__(a)iterclose__`` method to the iterator protocol, and have (async) ``for`` loops call it when the loop is exited. Effectively, we're taking the current cumbersome idiom (``with`` block + ``for`` loop) and merging them together into a fancier ``for``. Rejected alternatives ===================== PEP 525 asyncgen hooks ---------------------- PEP 525 proposes a `set of global hooks managed by new ``sys.{get/set}_asyncgen_hooks()`` functions `_, which event loops are intended to register to take control of async generator finalization. The stated goal is that "the end user does not need to care about the finalization problem, and everything just works". Unfortunately, though, the approach has a number of downsides: - It adds substantial complexity: we have new global interpreter state, and new public API in asyncio (``loop.shutdown_asyncgens()``) that users have to remember to call at the appropriate time. - The ``firstiter`` hook has to be able to uniquely identify what coroutine runner is being used at any given moment, and the ``finalizer`` hook has to be able to take a generator object and figure out which coroutine runner was supervising it initially. These requirements introduce surprisingly complicated couplings and potential constraints on future designs. For example, one might plausibly want to start several OS threads, and run a separate asyncio event loop in each -- ``asyncio.BaseEventLoopPolicy`` takes some trouble to support exactly this use case. But once there are multiple event loops running simultaneously, the hooks have the problem of somehow matching up each generator to its corresponding event loop. For ``firstiter`` this isn't so bad -- we can assume that the thread where ``firstiter`` is called is matches the thread whose event loop we want -- but ``finalizer`` is trickier, since the generator might be collected in a different thread than it started in. The code currently in the asyncio master branch doesn't consider this situation at all. If you try, what will happen is that whichever event loop starts up last will run the finalizers for all threads, which will probably blow up spectacularly. The current implementation is also broken if the following sequence of events occurs: 1. start a loop 2. firstiter(agen) invoked 3. stop the loop, but forget to call ``loop.shutdown_asyncgens()``. (NB: No existing asyncio programs call ``loop.shutdown_asyncgens()``, and it's never called automatically.) 4. create a new loop 5. finalizer(agen) invoked -- now the new loop will happily attempt to execute agen.aclose() These issues with the current implementation are fixable (XX FIXME file bugs), but they give a sense of how tricky this API is. It gets worse: suppose I want to run an asyncio event loop in one thread and a twisted reactor loop in another (e.g., to take advantage of twisted functionality that hasn't yet been ported to run on top of asyncio, with communication between the threads using ``call_soon_threadsafe`` / ``callFromThread``). Now the two event loops have to fight over the hooks. Curio currently doesn't even have the concept of a global event loop. A more obscure case arises with libraries like `async_generator `_, which runs code under a "proxy" coroutine runner that handles some yields itself while forwarding others on to the real event loop. 
Here it is the *inner* coroutine runner that should be used for calling ``aclose``, not the outer one, but there is no way for the hooks to know this. Though obviously this isn't a problem for async_generator itself since it's obsoleted by PEP 525, and it's not clear whether this technique has other use cases. But on the other hand, maybe we should try to keep our options open; we have so little experience with async/await that it's hard to say what clever tricks will turn out to be important. Basically the point is, these hooks have extremely delicate semantics and it's not at all clear that we know how to deal with all the situations they cause. - The new semantics aren't part of the abstract async iterator protocol, but are instead tied `specifically to the async generator concrete type `_. If you have an async iterator implemented using a class, like:: class MyAIterator: def __anext__(): ... then you can't refactor this into an async generator without changing the semantics, and vice-versa. This seems very unpythonic. (It also leaves open the question of what exactly class-based async iterators are supposed to do, given that since they face exactly the same cleanup problems as async generators.) And then assuming we manage to avoid the problems above, the best-case payoff is that we get GC semantics for async generators. So after all that it's still effectively a CPython-only feature (!!), and even there it has poor ergonomics, e.g., if ``aclose`` raises an error then it will get lost. In practice, code that wants to be portable across Python implementations or handle exceptions reliably will still have to write things like:: with aclosing(get_newline_separated_json(url)) as agen: async for document in agen: ... just like it would if the asyncgen hooks didn't exist. By comparison, the present proposal is straightforward and understandable, requires no global state or global coordination between coroutine runners, works equally well for generators and other iterators, works on PyPy, gives properly propagating exceptions by default, etc. Always inject resources, and do all cleanup at the top level ------------------------------------------------------------ It was suggested on python-dev (XX find link) that a pattern to avoid these problems is to always pass resources in from above, e.g. ``read_newline_separated_json`` should take a file object rather than a path, with cleanup handled at the top level:: def read_newline_separated_json(file_handle): for line in file_handle: yield json.loads(line) def read_users(file_handle): for document in read_newline_separated_json(file_handle): yield User.from_json(document) with open(path) as file_handle: for user in read_users(file_handle): ... This works well in simple cases; here it lets us avoid the "N+1 problem". But unfortunately, it breaks down quickly when things get more complex. Consider if instead of reading from a file, our generator was processing the body returned by an HTTP GET request -- while handling redirects and authentication via OAUTH. Then we'd really want the sockets to be managed down inside our HTTP client library, not at the top level. Plus there are other cases where ``finally`` blocks embedded inside generators are important in their own right: db transaction management, emitting logging information during cleanup (one of the major motivating use cases for WSGI ``close``), and so forth. 
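For concreteness, here is a minimal sketch of such a generator, using only
the standard-library ``sqlite3`` module (the database path and query are
purely illustrative). Under the present proposal, the ``finally`` block
below would run as soon as the consuming ``for`` loop exits -- whether by
exhaustion, ``break``, or an exception -- without the caller having to
wrap anything::

    import sqlite3

    def iter_rows(db_path, query):
        conn = sqlite3.connect(db_path)
        try:
            # Hand each row to the caller lazily.
            for row in conn.execute(query):
                yield row
        finally:
            # Cleanup that today only happens promptly if the caller
            # exhausts the generator or remembers to wrap it in closing().
            conn.close()
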
Specification: final state ========================== This section describes where we want to eventually end up, though there are some backwards compatibility issues that mean we can't jump directly here. A later section describes the transition plan. Guiding principles ------------------ Generally, ``__(a)iterclose__`` implementations should: - be idempotent, - perform any cleanup that is appropriate on the assumption that the iterator will not be used again after ``__(a)iterclose__`` is called. In particular, once ``__(a)iterclose__`` has been called then calling ``__(a)next__`` produces undefined behavior. And generally, any code which starts iterating through an iterable with the intention of exhausting it, should arrange to make sure that ``__(a)iterclose__`` is eventually called, whether or not the iterator is actually exhausted. Changes to iteration -------------------- The core proposal is the change in behavior of ``for`` loops. Given this Python code:: for VAR in ITERABLE: LOOP-BODY else: ELSE-BODY we desugar to the equivalent of:: _iter = iter(ITERABLE) _iterclose = getattr(type(_iter), "__iterclose__", lambda: None) try: traditional-for VAR in _iter: LOOP-BODY else: ELSE-BODY finally: _iterclose(_iter) where the "traditional-for statement" here is meant as a shorthand for the classic 3.5-and-earlier ``for`` loop semantics. Besides the top-level ``for`` statement, Python also contains several other places where iterators are consumed. For consistency, these should call ``__iterclose__`` as well using semantics equivalent to the above. This includes: - ``for`` loops inside comprehensions - ``*`` unpacking - functions which accept and fully consume iterables, like ``list(it)``, ``tuple(it)``, ``itertools.product(it1, it2, ...)``, and others. Changes to async iteration -------------------------- We also make the analogous changes to async iteration constructs, except that the new slot is called ``__aiterclose__``, and it's an async method that gets ``await``\ed. Modifications to basic iterator types ------------------------------------- Generator objects (including those created by generator comprehensions): - ``__iterclose__`` calls ``self.close()`` - ``__del__`` calls ``self.close()`` (same as now), and additionally issues a ``ResourceWarning`` if ``aclose`` has not been called. This warning is hidden by default, but can be enabled for those who want to make sure they aren't inadverdantly relying on CPython-specific GC semantics. Async generator objects (including those created by async generator comprehensions): - ``__aiterclose__`` calls ``self.aclose()`` - ``__del__`` issues a ``RuntimeWarning`` if ``aclose`` has not been called, since this probably indicates a latent bug, similar to the "coroutine never awaited" warning. OPEN QUESTION: should file objects implement ``__iterclose__`` to close the file? On the one hand this would make this change more disruptive; on the other hand people really like writing ``for line in open(...): ...``, and if we get used to iterators taking care of their own cleanup then it might become very weird if files don't. New convenience functions ------------------------- The ``itertools`` module gains a new iterator wrapper that can be used to selectively disable the new ``__iterclose__`` behavior:: # XX FIXME: I feel like there might be a better name for this one? 
class protect(iterable): def __init__(self, iterable): self._it = iter(iterable) def __iter__(self): return self def __next__(self): return next(self._it) def __iterclose__(self): # Swallow __iterclose__ without passing it on pass Example usage (assuming that file objects implements ``__iterclose__``):: with open(...) as handle: # Iterate through the same file twice: for line in itertools.protect(handle): ... handle.seek(0) for line in itertools.protect(handle): ... The ``operator`` module gains two new functions, with semantics equivalent to the following:: def iterclose(it): if hasattr(type(it), "__iterclose__"): type(it).__iterclose__(it) async def aiterclose(ait): if hasattr(type(ait), "__aiterclose__"): await type(ait).__aiterclose__(ait) These are particularly useful when implementing the changes in the next section: __iterclose__ implementations for iterator wrappers --------------------------------------------------- Python ships a number of iterator types that act as wrappers around other iterators: ``map``, ``zip``, ``itertools.accumulate``, ``csv.reader``, and others. These iterators should define a ``__iterclose__`` method which calls ``__iterclose__`` in turn on their underlying iterators. For example, ``map`` could be implemented as:: class map: def __init__(self, fn, *iterables): self._fn = fn self._iters = [iter(iterable) for iterable in iterables] def __iter__(self): return self def __next__(self): return self._fn(*[next(it) for it in self._iters]) def __iterclose__(self): for it in self._iters: operator.iterclose(it) In some cases this requires some subtlety; for example, ```itertools.tee`` `_ should not call ``__iterclose__`` on the underlying iterator until it has been called on *all* of the clone iterators. Example / Rationale ------------------- The payoff for all this is that we can now write straightforward code like:: def read_newline_separated_json(path): for line in open(path): yield json.loads(line) and be confident that the file will receive deterministic cleanup *without the end-user having to take any special effort*, even in complex cases. For example, consider this silly pipeline:: list(map(lambda key: key.upper(), doc["key"] for doc in read_newline_separated_json(path))) If our file contains a document where ``doc["key"]`` turns out to be an integer, then the following sequence of events will happen: 1. ``key.upper()`` raises an ``AttributeError``, which propagates out of the ``map`` and triggers the implicit ``finally`` block inside ``list``. 2. The ``finally`` block in ``list`` calls ``__iterclose__()`` on the map object. 3. ``map.__iterclose__()`` calls ``__iterclose__()`` on the generator comprehension object. 4. This injects a ``GeneratorExit`` exception into the generator comprehension body, which is currently suspended inside the comprehension's ``for`` loop body. 5. The exception propagates out of the ``for`` loop, triggering the ``for`` loop's implicit ``finally`` block, which calls ``__iterclose__`` on the generator object representing the call to ``read_newline_separated_json``. 6. This injects an inner ``GeneratorExit`` exception into the body of ``read_newline_separated_json``, currently suspended at the ``yield``. 7. The inner ``GeneratorExit`` propagates out of the ``for`` loop, triggering the ``for`` loop's implicit ``finally`` block, which calls ``__iterclose__()`` on the file object. 8. The file object is closed. 9. 
The inner ``GeneratorExit`` resumes propagating, hits the boundary of the generator function, and causes ``read_newline_separated_json``'s ``__iterclose__()`` method to return successfully. 10. Control returns to the generator comprehension body, and the outer ``GeneratorExit`` continues propagating, allowing the comprehension's ``__iterclose__()`` to return successfully. 11. The rest of the ``__iterclose__()`` calls unwind without incident, back into the body of ``list``. 12. The original ``AttributeError`` resumes propagating. (The details above assume that we implement ``file.__iterclose__``; if not then add a ``with`` block to ``read_newline_separated_json`` and essentially the same logic goes through.) Of course, from the user's point of view, this can be simplified down to just: 1. ``int.upper()`` raises an ``AttributeError`` 1. The file object is closed. 2. The ``AttributeError`` propagates out of ``list`` So we've accomplished our goal of making this "just work" without the user having to think about it. Specification: how to get there from here ========================================= While the majority of existing ``for`` loops will continue to produce identical results, the proposed changes will produce backwards-incompatible behavior in some cases. Example:: def read_csv_with_header(lines_iterable): lines_iterator = iter(lines_iterable) # Used to be correct; now needs an itertools.protect() here: for line in lines_iterator: column_names = line.strip().split("\t") break for line in lines_iterator: values = line.strip().split("\t") record = dict(zip(column_names, values)) yield record Specifically, the incompatibility happens when all of these factors come together: - The automatic calling of ``__(a)iterclose__`` is enabled - The iterable did not previously define ``__(a)iterclose__`` - The iterable does now define ``__(a)iterclose__`` - The iterable is re-used after the ``for`` loop exits So the problem is how to manage this transition, and those are the levers we have to work with. First, observe that the only async iterables where we propose to add ``__aiterclose__`` are async generators, and there is currently no existing code using async generators (though this will start changing very soon), so the async changes do not produce any backwards incompatibilities. (There is existing code using async iterators, but using the new async for loop on an old async iterator is harmless, because old async iterators don't have ``__aiterclose__``.) In addition, PEP 525 was accepted on a provisional basis, and async generators are by far the biggest beneficiary of this PEP's proposed changes. Therefore, I think we should strongly consider enabling ``__aiterclose__`` for ``async for`` loops and async generators ASAP, ideally for 3.6.0 or 3.6.1. For the non-async world, things are harder, but here's a potential transition path: In 3.7: Our goal is that existing unsafe code will start emitting warnings, while those who want to opt-in to the future can do that immediately: - We immediately add all the ``__iterclose__`` methods described above. - If ``from __future__ import iterclose`` is in effect, then ``for`` loops and ``*`` unpacking call ``__iterclose__`` as specified above. - If the future is *not* enabled, then ``for`` loops and ``*`` unpacking do *not* call ``__iterclose__``. But they do call some other method instead, e.g. ``__iterclose_warning__``. - Similarly, functions like ``list`` use stack introspection (!!) 
to check whether their direct caller has ``__future__.iterclose`` enabled, and use this to decide whether to call ``__iterclose__`` or ``__iterclose_warning__``. - For all the wrapper iterators, we also add ``__iterclose_warning__`` methods that forward to the ``__iterclose_warning__`` method of the underlying iterator or iterators. - For generators (and files, if we decide to do that), ``__iterclose_warning__`` is defined to set an internal flag, and other methods on the object are modified to check for this flag. If they find the flag set, they issue a ``PendingDeprecationWarning`` to inform the user that in the future this sequence would have led to a use-after-close situation and the user should use ``protect()``. In 3.8: - Switch from ``PendingDeprecationWarning`` to ``DeprecationWarning`` In 3.9: - Enable the future unconditionally and remove all the ``__iterclose_warning__`` stuff. I believe that this satisfies the normal requirements for this kind of transition -- opt-in initially, with warnings targeted precisely to the cases that will be effected, and a long deprecation cycle. Probably the most controversial / risky part of this is the use of stack introspection to make the iterable-consuming functions sensitive to a ``__future__`` setting, though I haven't thought of any situation where it would actually go wrong yet... -- Nathaniel J. Smith -- https://vorpus.org From oscar.j.benjamin at gmail.com Mon Oct 17 08:04:46 2016 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 17 Oct 2016 13:04:46 +0100 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: References: Message-ID: On 17 October 2016 at 09:08, Nathaniel Smith wrote: > Hi all, > > I've been poking at an idea for changing how 'for' loops work to > hopefully make them work better for pypy and async/await code. I > haven't taken it to python-ideas yet -- this is its first public > outing, actually -- but since it directly addresses pypy GC issues I > thought I'd send around a draft to see what you think. (E.g., would > this be something that makes your life easier?) To be clear, I'm not a PyPy dev so I'm just answering from a general Python perspective here. > Always inject resources, and do all cleanup at the top level > ------------------------------------------------------------ > > It was suggested on python-dev (XX find link) that a pattern to avoid > these problems is to always pass resources in from above, e.g. > ``read_newline_separated_json`` should take a file object rather than > a path, with cleanup handled at the top level:: I suggested this and I still think that it is the best idea. > def read_newline_separated_json(file_handle): > for line in file_handle: > yield json.loads(line) > > def read_users(file_handle): > for document in read_newline_separated_json(file_handle): > yield User.from_json(document) > > with open(path) as file_handle: > for user in read_users(file_handle): > ... > > This works well in simple cases; here it lets us avoid the "N+1 > problem". But unfortunately, it breaks down quickly when things get > more complex. Consider if instead of reading from a file, our > generator was processing the body returned by an HTTP GET request -- > while handling redirects and authentication via OAUTH. Then we'd > really want the sockets to be managed down inside our HTTP client > library, not at the top level. 
Plus there are other cases where > ``finally`` blocks embedded inside generators are important in their > own right: db transaction management, emitting logging information > during cleanup (one of the major motivating use cases for WSGI > ``close``), and so forth. I haven't written the kind of code that you're describing so I can't say exactly how I would do it. I imagine though that helpers could be used to solve some of the problems that you're referring to though. Here's a case I do know where the above suggestion is awkward: def concat(filenames): for filename in filenames: with open(filename) as inputfile: yield from inputfile for line in concat(filenames): ... It's still possible to safely handle this use case by creating a helper though. fileinput.input almost does what you want: with fileinput.input(filenames) as lines: for line in lines: ... Unfortunately if filenames is empty this will default to sys.stdin so it's not perfect but really I think introducing useful helpers for common cases (rather than core language changes) should be considered as the obvious solution here. Generally it would have been better if the discussion for PEP 525 has focussed more on helping people to debug/fix dependence on __del__ rather than trying to magically fix broken code. > New convenience functions > ------------------------- > > The ``itertools`` module gains a new iterator wrapper that can be used > to selectively disable the new ``__iterclose__`` behavior:: > > # XX FIXME: I feel like there might be a better name for this one? > class protect(iterable): > def __init__(self, iterable): > self._it = iter(iterable) > > def __iter__(self): > return self > > def __next__(self): > return next(self._it) > > def __iterclose__(self): > # Swallow __iterclose__ without passing it on > pass > > Example usage (assuming that file objects implements ``__iterclose__``):: > > with open(...) as handle: > # Iterate through the same file twice: > for line in itertools.protect(handle): > ... > handle.seek(0) > for line in itertools.protect(handle): > ... It would be much simpler to reverse this suggestion and say let's introduce a helper that selectively *enables* the new behaviour you're proposing i.e.: for line in itertools.closeafter(open(...)): ... if not line.startswith('#'): break # <--------------- file gets closed here Then we can leave (async) for loops as they are and there are no backward compatbility problems etc. -- Oscar From william.leslie.ttg at gmail.com Mon Oct 17 10:04:01 2016 From: william.leslie.ttg at gmail.com (William ML Leslie) Date: Tue, 18 Oct 2016 01:04:01 +1100 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: References: Message-ID: Have you considered Custodians, a-la racket? I suspect that adding resources to and finalising custodians requires less defensiveness than marking all iterables as resources, but I've yet to see someone implement them in python. https://docs.racket-lang.org/reference/custodians.html -- William Leslie Notice: Likely much of this email is, by the nature of copyright, covered under copyright law. You absolutely MAY reproduce any part of it in accordance with the copyright law of the nation you are reading this in. Any attempt to DENY YOU THOSE RIGHTS would be illegal without prior contractual agreement. 
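As a rough illustration of the idea, a custodian-style helper in Python
might look something like the sketch below; the names and semantics are
only an approximation of Racket's model, not a faithful port. The point is
that cleanup responsibility moves to a single object with an explicit
shutdown, rather than being attached to every individual iterator:

    import contextlib

    class Custodian:
        """Collects closeable resources so that a single shutdown call
        releases all of them."""

        def __init__(self):
            self._resources = []

        def manage(self, resource):
            self._resources.append(resource)
            return resource

        def shutdown_all(self):
            # Close in reverse registration order; tolerate resources
            # that were already closed.
            while self._resources:
                with contextlib.suppress(Exception):
                    self._resources.pop().close()

    custodian = Custodian()
    lines = custodian.manage(open("example.txt"))  # illustrative path
    try:
        for line in lines:
            pass  # process each line
    finally:
        custodian.shutdown_all()
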
From mmskeen at gmail.com Mon Oct 17 18:27:55 2016 From: mmskeen at gmail.com (Michael Skeen) Date: Mon, 17 Oct 2016 22:27:55 +0000 Subject: [pypy-dev] PyPy Architectural Patterns & Quality Attributes for Research Message-ID: <001a1144a2ce597369053f171695@google.com> Hello PyPy Community, I am part of an undergraduate research group focusing on software architecture patterns and quality attributes at Utah Valley University. We recently analyzed the work published on PyPy in the Architecture of Open Source Applications (AOSA) and referenced it in a paper we presented at the 13th Working IEEE/IFIP Conference on Software Architecture (WICSA), as attached.? As a part of our continuing research we wish to validate our architectural analysis for PyPy with the current developers. We would like to know if we are missing any patterns or quality attributes that may have been included in PyPy , or if there are any we listed that aren?t used. Any additional comment on these topics you might have would also, of course, be welcome. We believe we found the following software architectural patterns in this application: Pattern Name | Is This Found in the Architecture? (yes / no / don't know) | Comments (optional) Interpreter Layers Pipes & Filters Virtual Machine Other? We also identified the following quality attributes: Attribute Name | Is This Found in the Architecture? | Comments (optional) Extensibility Performance Flexibility Other? For your convenience, we have a complete list below of the patterns and quality attributes we referred to when conducting our research. To clarify, we are specifically studying architectural patterns, rather than design patterns such as the GoF patterns. Architectural Patterns Considered Quality Attributes Considered Active Repository Scalability Batch Usability Blackboard Extensibility Broker Performance Client Server Portability Event System Flexibility Explicit Invocation Reliability Implicit Invocation Maintainability Indirection Layer Security Interceptor Testability Interpreter Capacity Layers Cost Master and Commander Legality Microkernel Modularity Model View Controller Robustness Peer to Peer Pipes and Filters Plugin Presentation Abstraction Control Publish Subscribe Reflection Rule-Based System Shared Repository Simple Repository State Based Virtual Machine Please respond by October 25th, if possible. Thank you for considering our request, and for your continued work on PyPy . Sincerely, Michael Skeen, with Erich Gubler, Danielle Skinner, Brandon Leishman, Neil Harrison, Ph.D. (advisor) Reference: Neil B. Harrison, Erich Gubler, Danielle Skinner, "Software Architecture Pattern Morphology in Open-Source Systems", WICSA , 2016, 2016 13th Working IEEE/IFIP Conference on Software Architecture (WICSA), 2016 13th Working IEEE/IFIP Conference on Software Architecture (WICSA) 2016, pp. 91-98, doi:10.1109/WICSA.2016.8 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PID4110571 (Morphology).pdf Type: application/pdf Size: 231943 bytes Desc: not available URL: From armin.rigo at gmail.com Tue Oct 18 04:01:51 2016 From: armin.rigo at gmail.com (Armin Rigo) Date: Tue, 18 Oct 2016 10:01:51 +0200 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: References: Message-ID: Hi, On 17 October 2016 at 10:08, Nathaniel Smith wrote: > thought I'd send around a draft to see what you think. 
(E.g., would > this be something that makes your life easier?) As a general rule, PyPy's GC behavior is similar to CPython's if we tweak the program to start a chain of references at a self-referential object. So for example, consider that the outermost loop of a program takes the objects like the async generators, and stores them inside such an object: class A: def __init__(self, ref): self.ref = ref self.myself = self and then immediately forget that A instance. Then both this A instance and everything it refers to is kept alive until the next cyclic GC occurs. PyPy just always exhibits that behavior instead of only when you start with reference cycles. So the real issue should not be "how to so something that will make PyPy happy", or not only---it should be "how to do something that will make CPython happy even in case of reference cycles". If you don't, then arguably CPython is slightly broken. Yes, anything that can reduce file descriptor leaks in Python sounds good to me. A bient?t, Armin. From njs at pobox.com Tue Oct 18 19:24:36 2016 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 18 Oct 2016 16:24:36 -0700 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: References: Message-ID: Hi Oscar, Thanks for the comments! Can I ask that you hold onto them until I post to python-ideas, though? (Should be later today.) It's a discussion worth having, but if we have it here then we'll just end up having to repeat it there anyway :-). -n On Mon, Oct 17, 2016 at 5:04 AM, Oscar Benjamin wrote: > On 17 October 2016 at 09:08, Nathaniel Smith wrote: >> Hi all, >> >> I've been poking at an idea for changing how 'for' loops work to >> hopefully make them work better for pypy and async/await code. I >> haven't taken it to python-ideas yet -- this is its first public >> outing, actually -- but since it directly addresses pypy GC issues I >> thought I'd send around a draft to see what you think. (E.g., would >> this be something that makes your life easier?) > > To be clear, I'm not a PyPy dev so I'm just answering from a general > Python perspective here. > >> Always inject resources, and do all cleanup at the top level >> ------------------------------------------------------------ >> >> It was suggested on python-dev (XX find link) that a pattern to avoid >> these problems is to always pass resources in from above, e.g. >> ``read_newline_separated_json`` should take a file object rather than >> a path, with cleanup handled at the top level:: > > I suggested this and I still think that it is the best idea. > >> def read_newline_separated_json(file_handle): >> for line in file_handle: >> yield json.loads(line) >> >> def read_users(file_handle): >> for document in read_newline_separated_json(file_handle): >> yield User.from_json(document) >> >> with open(path) as file_handle: >> for user in read_users(file_handle): >> ... >> >> This works well in simple cases; here it lets us avoid the "N+1 >> problem". But unfortunately, it breaks down quickly when things get >> more complex. Consider if instead of reading from a file, our >> generator was processing the body returned by an HTTP GET request -- >> while handling redirects and authentication via OAUTH. Then we'd >> really want the sockets to be managed down inside our HTTP client >> library, not at the top level. 
Plus there are other cases where >> ``finally`` blocks embedded inside generators are important in their >> own right: db transaction management, emitting logging information >> during cleanup (one of the major motivating use cases for WSGI >> ``close``), and so forth. > > I haven't written the kind of code that you're describing so I can't > say exactly how I would do it. I imagine though that helpers could be > used to solve some of the problems that you're referring to though. > Here's a case I do know where the above suggestion is awkward: > > def concat(filenames): > for filename in filenames: > with open(filename) as inputfile: > yield from inputfile > > for line in concat(filenames): > ... > > It's still possible to safely handle this use case by creating a > helper though. fileinput.input almost does what you want: > > with fileinput.input(filenames) as lines: > for line in lines: > ... > > Unfortunately if filenames is empty this will default to sys.stdin so > it's not perfect but really I think introducing useful helpers for > common cases (rather than core language changes) should be considered > as the obvious solution here. Generally it would have been better if > the discussion for PEP 525 has focussed more on helping people to > debug/fix dependence on __del__ rather than trying to magically fix > broken code. > >> New convenience functions >> ------------------------- >> >> The ``itertools`` module gains a new iterator wrapper that can be used >> to selectively disable the new ``__iterclose__`` behavior:: >> >> # XX FIXME: I feel like there might be a better name for this one? >> class protect(iterable): >> def __init__(self, iterable): >> self._it = iter(iterable) >> >> def __iter__(self): >> return self >> >> def __next__(self): >> return next(self._it) >> >> def __iterclose__(self): >> # Swallow __iterclose__ without passing it on >> pass >> >> Example usage (assuming that file objects implements ``__iterclose__``):: >> >> with open(...) as handle: >> # Iterate through the same file twice: >> for line in itertools.protect(handle): >> ... >> handle.seek(0) >> for line in itertools.protect(handle): >> ... > > It would be much simpler to reverse this suggestion and say let's > introduce a helper that selectively *enables* the new behaviour you're > proposing i.e.: > > for line in itertools.closeafter(open(...)): > ... > if not line.startswith('#'): > break # <--------------- file gets closed here > > Then we can leave (async) for loops as they are and there are no > backward compatbility problems etc. > > -- > Oscar > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > https://mail.python.org/mailman/listinfo/pypy-dev -- Nathaniel J. Smith -- https://vorpus.org From naylor.b.david at gmail.com Thu Oct 20 13:19:42 2016 From: naylor.b.david at gmail.com (David Naylor) Date: Thu, 20 Oct 2016 19:19:42 +0200 Subject: [pypy-dev] pypy-config Message-ID: <2192620.DiRuYLhWm2@dragon.local> Hi Some software depends on python-config however PyPy does not provide an equivalent pypy-config. Are there any plans/workarounds to provide such a bin? Regards D -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 949 bytes Desc: This is a digitally signed message part. 
URL: From matti.picus at gmail.com Thu Oct 20 14:59:03 2016 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 20 Oct 2016 21:59:03 +0300 Subject: [pypy-dev] pypy-config In-Reply-To: <2192620.DiRuYLhWm2@dragon.local> References: <2192620.DiRuYLhWm2@dragon.local> Message-ID: <4d5a8a3b-2a67-3b17-58fa-00695a111071@gmail.com> On 20/10/16 20:19, David Naylor wrote: > Hi > > Some software depends on python-config however PyPy does not provide an > equivalent pypy-config. Are there any plans/workarounds to provide such a > bin? > > Regards > > D I actually ran into this yesterday, as I tried to build wxPython which uses SIP whose build system uses python-config. AFAICT, python-config is provided by the downstream package maintainer. For instance, in debian it is provided by their python-dev package. Since it is not an integral part of python, I'm not sure it should be an integral part of pypy, but it is trivial to copy-and-modify. FWIW, I am attatching the one I used, YMMV. Also, note that in building wxPython I discovered that we do not yet support enough of the C-API to use SIP, so I assume PyQT/PySide will fail as well. The issue is in siplib.c's sipWrapperType_alloc, since we do not yet (may not ever?) support overriding tp_alloc. Matti -------------- next part -------------- #!/bin/sh exit_with_usage () { echo "Usage: $0 --prefix|--exec-prefix|--includes|--libs|--cflags|--ldflags|--extension-suffix|--help|--configdir" exit $1 } if [ "$1" = "" ] ; then exit_with_usage 1 fi # Returns the actual prefix where this script was installed to. installed_prefix () { local RESULT=$(dirname $(cd $(dirname "$1") && pwd -P)) if [ $(which readlink) ] ; then RESULT=$(readlink -f "$RESULT") fi echo $RESULT } prefix_build="/usr" prefix_real=$(installed_prefix "$0") # Use sed to fix paths from their built to locations to their installed to locations. prefix=$(echo "$prefix_build" | sed "s#$prefix_build#$prefix_real#") exec_prefix_build="${prefix}" exec_prefix=$(echo "$exec_prefix_build" | sed "s#$exec_prefix_build#$prefix_real#") includedir=$(echo "${prefix}/include" | sed "s#$prefix_build#$prefix_real#") libdir=$(echo "${exec_prefix}/lib" | sed "s#$prefix_build#$prefix_real#") CFLAGS=$(echo "-Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security " | sed "s#$prefix_build#$prefix_real#") VERSION="2.7" LIBM="-lm" LIBC="" SYSLIBS="$LIBM $LIBC" ABIFLAGS="" MULTIARCH="x86_64-linux-gnu" LIBS="-lpypy-c -lpthread -ldl -lutil $SYSLIBS" BASECFLAGS=" -fno-strict-aliasing" LDLIBRARY="libpython${VERSION}${DEBUG_EXT}.a" LINKFORSHARED="-Xlinker -export-dynamic -Wl,-O1 -Wl,-Bsymbolic-functions" OPT="-DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes" PY_ENABLE_SHARED="0" LIBDEST=${prefix}/lib/python${VERSION} LIBPL=${LIBDEST}/config-${MULTIARCH}${ABIFLAGS} SO="${ABIFLAGS}.so" PYTHONFRAMEWORK="" INCDIR="-I$includedir" PLATINCDIR="" # Scan for --help or unknown argument. 
for ARG in $* do case $ARG in --help) exit_with_usage 0 ;; --prefix|--exec-prefix|--includes|--libs|--cflags|--ldflags|--extension-suffix|--configdir) ;; *) exit_with_usage 1 ;; esac done for ARG in $* do case $ARG in --prefix) echo "$prefix" ;; --exec-prefix) echo "$exec_prefix" ;; --includes) echo "$INCDIR" "$PLATINCDIR" ;; --cflags) echo "$INCDIR $PLATINCDIR $BASECFLAGS $CFLAGS $OPT" ;; --libs) echo "$LIBS" ;; --ldflags) LINKFORSHAREDUSED= if [ -z "$PYTHONFRAMEWORK" ] ; then LINKFORSHAREDUSED=$LINKFORSHARED fi LIBPLUSED= if [ "$PY_ENABLE_SHARED" = "0" ] ; then LIBPLUSED="-L$LIBPL" fi echo "$LIBPLUSED -L$libdir $LIBS $LINKFORSHAREDUSED" ;; --extension-suffix) echo "$SO" ;; --configdir) echo "$LIBPL" ;; esac done From yury at shurup.com Thu Oct 20 15:18:14 2016 From: yury at shurup.com (Yury V. Zaytsev) Date: Thu, 20 Oct 2016 21:18:14 +0200 (CEST) Subject: [pypy-dev] pypy-config In-Reply-To: <4d5a8a3b-2a67-3b17-58fa-00695a111071@gmail.com> References: <2192620.DiRuYLhWm2@dragon.local> <4d5a8a3b-2a67-3b17-58fa-00695a111071@gmail.com> Message-ID: On Thu, 20 Oct 2016, Matti Picus wrote: > AFAICT, python-config is provided by the downstream package maintainer. > For instance, in debian it is provided by their python-dev package. > Since it is not an integral part of python, I'm not sure it should be an > integral part of pypy, but it is trivial to copy-and-modify. Hi Matti, FYI, this used to be the case a long while ago, but python-config has been integrated into Python 2.5 since ~2006: https://mail.python.org/pipermail/patches/2006-April/019478.html -- Sincerely yours, Yury V. Zaytsev From sid.kshatriya at gmail.com Thu Oct 20 15:50:07 2016 From: sid.kshatriya at gmail.com (Sidharth Kshatriya) Date: Fri, 21 Oct 2016 01:20:07 +0530 Subject: [pypy-dev] Dontbug: A reversible debugger for PHP (similar in concept to RevDB for Python/PyPy) Message-ID: Dear All, There have been some interesting blogs about RevDB a reversible debugger for Python on the PyPy blog. I'd like to tell you about Dontbug, a reversible debugger for PHP that I recently released. Like RevDB, it allows you to debug forwards and backwards -- but in PHP. See: https://github.com/sidkshatriya/dontbug For a short (1m35s) demo video: https://www.youtube.com/watch?v=DA76z77KtY0 Why am I talking about this in a PyPy mailing list :-) ? Firstly, because I think reverse debuggers for dynamic languages are relatively rare -- so its a good idea that we know about each other! Secondly, the fact that there are more and more reversible debuggers for various languages every year means that reverse debugging is definitely entering the mainstream. We could be at an inflexion point here! Hope you guys find Dontbug interesting! Thanks, Sidharth -------------- next part -------------- An HTML attachment was scrubbed... URL: From fijall at gmail.com Fri Oct 21 01:28:42 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 21 Oct 2016 07:28:42 +0200 Subject: [pypy-dev] Dontbug: A reversible debugger for PHP (similar in concept to RevDB for Python/PyPy) In-Reply-To: References: Message-ID: Hi Sidharth I see dontbug is based on rr - I would like to know how well rr works for you. We've tried using rr for pypy and it didn't work as advertised. On the other hand it seems the project is moving fast, so maybe it works these days On Thu, Oct 20, 2016 at 9:50 PM, Sidharth Kshatriya wrote: > Dear All, > > There have been some interesting blogs about RevDB a reversible debugger for > Python on the PyPy blog. 
> > I'd like to tell you about Dontbug, a reversible debugger for PHP that I > recently released. Like RevDB, it allows you to debug forwards and backwards > -- but in PHP. > > See: > https://github.com/sidkshatriya/dontbug > > For a short (1m35s) demo video: > https://www.youtube.com/watch?v=DA76z77KtY0 > > Why am I talking about this in a PyPy mailing list :-) ? Firstly, because I > think reverse debuggers for dynamic languages are relatively rare -- so its > a good idea that we know about each other! Secondly, the fact that there are > more and more reversible debuggers for various languages every year means > that reverse debugging is definitely entering the mainstream. We could be at > an inflexion point here! > > Hope you guys find Dontbug interesting! > > Thanks, > > Sidharth > > > > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > https://mail.python.org/mailman/listinfo/pypy-dev > From sid.kshatriya at gmail.com Fri Oct 21 02:36:09 2016 From: sid.kshatriya at gmail.com (Sidharth Kshatriya) Date: Fri, 21 Oct 2016 12:06:09 +0530 Subject: [pypy-dev] Dontbug: A reversible debugger for PHP (similar in concept to RevDB for Python/PyPy) In-Reply-To: References: Message-ID: Hi Maciej, Yes, Dontbug is built on top of RR. Mozilla/RR can be finicky at times but overall I had a very good experience with it. I developed Dontbug on Ubuntu 16.04 and I found RR to be pretty robust on that distro at least. I did encounter a serious regression once (I was on a bleeding edge commit) but it was addressed quickly after I filed a ticket. RR runs Travis tests on Ubuntu 14.04 so we can be sure about that distro also. I also see references to Fedora in RR documentation so I'm guessing RR should be good on that distribution also. The reason I mention all these distros is that your experience with RR will often depend on the specific Linux kernel version and the gdb debugger version you're using (and to a slightly lesser extend the specific distro). RR tends to use a lot of hairy/advanced features like ptrace, seccomp-bpf, CPU performance counters etc. and the internal implementation of these in the kernel tends to subtly change over time (or suffer bugs). So you can often run into problems on very recent distros. For instance there is currently an outstanding ticket for test failures on Ubuntu 16.10. But as mentioned, the developers tend to address these quickly. As long as you're on a mainstream non-bleeding edge distro, I would think that RR should work fine for you. The Mozilla folks use RR to debug Firefox which is a hugely complex application (as you can imagine) and this gives me confidence about the overall correctness of RR. Coming to PyPy I would suggest you try RR out again. As a start, try it out on Ubuntu 16.04. If you run into problems do post a ticket on the RR project. I also noticed some references to UndoDB usage on the PyPy project. How has your team's experience been with UndoDB+PyPy in general? It would interesting to learn about your experiences there... Thanks, Sidharth On Fri, Oct 21, 2016 at 10:58 AM, Maciej Fijalkowski wrote: > Hi Sidharth > > I see dontbug is based on rr - I would like to know how well rr works > for you. We've tried using rr for pypy and it didn't work as > advertised. 
On the other hand it seems the project is moving fast, so > maybe it works these days > > On Thu, Oct 20, 2016 at 9:50 PM, Sidharth Kshatriya > wrote: > > Dear All, > > > > There have been some interesting blogs about RevDB a reversible debugger > for > > Python on the PyPy blog. > > > > I'd like to tell you about Dontbug, a reversible debugger for PHP that I > > recently released. Like RevDB, it allows you to debug forwards and > backwards > > -- but in PHP. > > > > See: > > https://github.com/sidkshatriya/dontbug > > > > For a short (1m35s) demo video: > > https://www.youtube.com/watch?v=DA76z77KtY0 > > > > Why am I talking about this in a PyPy mailing list :-) ? Firstly, > because I > > think reverse debuggers for dynamic languages are relatively rare -- so > its > > a good idea that we know about each other! Secondly, the fact that there > are > > more and more reversible debuggers for various languages every year means > > that reverse debugging is definitely entering the mainstream. We could > be at > > an inflexion point here! > > > > Hope you guys find Dontbug interesting! > > > > Thanks, > > > > Sidharth > > > > > > > > _______________________________________________ > > pypy-dev mailing list > > pypy-dev at python.org > > https://mail.python.org/mailman/listinfo/pypy-dev > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hubo at jiedaibao.com Fri Oct 21 10:13:45 2016 From: hubo at jiedaibao.com (hubo) Date: Fri, 21 Oct 2016 22:13:45 +0800 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: References: Message-ID: <580A2295.7010004@jiedaibao.com> Well I'm really shocked to find out what I thought was a "automatic close" is really the ref-couting GC of CPython, means that a lot of my code breaks in PyPy... It really becomes a big problem after iterators heavily used in Python nowadays. Some builtin functions like zip, map, filter return iterators in Python 3 instead of lists in Python 2, means invisible bugs for code ported from Python 2, like zip(my_generator(), my_other_generator()) may leave the iterators open if exited from a for loop. Even in Python 2, functions in itertools may create these bugs. In CPython, this kind of code will work because of the ref-counting GC, so it is not obvious in CPython, but they break in PyPy. I'm wondering since a ref-counting GC implemention is not possible for PyPy, is it possible to hack on the for loop to make it "try to" collect the generator? That may really save a lot of lives. If the generator is still referenced after the for loop, it may be the programmer's fault for not calling close(), but loop through a returned value is something different - sometimes you even do not know if it is a generator. 2016-10-21 hubo ????Armin Rigo ?????2016-10-18 16:01 ???Re: [pypy-dev] RFC: draft idea for making for loops automatically close iterators ????"Nathaniel Smith" ???"PyPy Developer Mailing List" Hi, On 17 October 2016 at 10:08, Nathaniel Smith wrote: > thought I'd send around a draft to see what you think. (E.g., would > this be something that makes your life easier?) As a general rule, PyPy's GC behavior is similar to CPython's if we tweak the program to start a chain of references at a self-referential object. 
So for example, consider that the outermost loop of a program takes the objects like the async generators, and stores them inside such an object: class A: def __init__(self, ref): self.ref = ref self.myself = self and then immediately forget that A instance. Then both this A instance and everything it refers to is kept alive until the next cyclic GC occurs. PyPy just always exhibits that behavior instead of only when you start with reference cycles. So the real issue should not be "how to so something that will make PyPy happy", or not only---it should be "how to do something that will make CPython happy even in case of reference cycles". If you don't, then arguably CPython is slightly broken. Yes, anything that can reduce file descriptor leaks in Python sounds good to me. A bient?t, Armin. _______________________________________________ pypy-dev mailing list pypy-dev at python.org https://mail.python.org/mailman/listinfo/pypy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex0player at gmail.com Fri Oct 21 10:20:08 2016 From: alex0player at gmail.com (Alex S.) Date: Fri, 21 Oct 2016 17:20:08 +0300 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: <580A2295.7010004@jiedaibao.com> References: <580A2295.7010004@jiedaibao.com> Message-ID: That?s a good point, as it means there?s probably no safe & portable way to ensure that kind of stuff. ?Trying to collect? something doesn?t really fall short of an actual collection, I believe (finding referers is hard). But I believe iterclose() defined appropriately on derived iterators would solve that?.. > 21 ???. 2016 ?., ? 17:13, hubo ???????(?): > > Well I'm really shocked to find out what I thought was a "automatic close" is really the ref-couting GC of CPython, means that a lot of my code breaks in PyPy... > It really becomes a big problem after iterators heavily used in Python nowadays. Some builtin functions like zip, map, filter return iterators in Python 3 instead of lists in Python 2, means invisible bugs for code ported from Python 2, like zip(my_generator(), my_other_generator()) may leave the iterators open if exited from a for loop. Even in Python 2, functions in itertools may create these bugs. > In CPython, this kind of code will work because of the ref-counting GC, so it is not obvious in CPython, but they break in PyPy. > > I'm wondering since a ref-counting GC implemention is not possible for PyPy, is it possible to hack on the for loop to make it "try to" collect the generator? That may really save a lot of lives. If the generator is still referenced after the for loop, it may be the programmer's fault for not calling close(), but loop through a returned value is something different - sometimes you even do not know if it is a generator. > > 2016-10-21 > hubo -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Oct 21 19:13:52 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 22 Oct 2016 10:13:52 +1100 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: <580A2295.7010004@jiedaibao.com> References: <580A2295.7010004@jiedaibao.com> Message-ID: <20161021231351.GM22471@ando.pearwood.info> On Fri, Oct 21, 2016 at 10:13:45PM +0800, hubo wrote: > Well I'm really shocked to find out what I thought was a "automatic > close" is really the ref-couting GC of CPython, means that a lot of my > code breaks in PyPy... But does it really? 
If you've run your code in PyPy, and it obviously, clearly breaks, then why are you so shocked? You should have already known this. (Unless this is your first time running your code under PyPy.) But if your code runs under PyPy, with no crashes, no exceptions, no failures caused by running out of file descriptors... then you can't really say your code is broken. What does it matter if your application doesn't close the files until exit, if you only open three files and the application never runs for more than two seconds? I'd like to get a good idea of how often this is an actual problem, causing scripts and applications to fail when run in PyPy. Actual failures, not just wasting a file descriptor or three. > I'm wondering since a ref-counting GC implemention is not possible for > PyPy, is it possible to hack on the for loop to make it "try to" > collect the generator? That may really save a lot of lives. Saving lives? That's a bit of an exaggeration, isn't it? There is a big discussion going on over on the Python-Ideas mailing list, and exaggerated, over-the-top responses aren't going to help this proposal's case. Already people have said this issue is only a problem for PyPy, so it's PyPy's problem to fix. -- Steve From sid.kshatriya at gmail.com Sat Oct 22 03:10:39 2016 From: sid.kshatriya at gmail.com (Sidharth Kshatriya) Date: Sat, 22 Oct 2016 12:40:39 +0530 Subject: [pypy-dev] Dontbug: A reversible debugger for PHP (similar in concept to RevDB for Python/PyPy) In-Reply-To: References: Message-ID: A small note related to Travis mentioned in my last email: It turns out, Travis was actually *not* running any rr tests. It seems Travis was only there to make sure rr built correctly. Tests could not be run because the VMs on which Travis was running don't support PMU virtualization. The situation has now changed. There is a buildbot on Digital ocean that is now running tests. Digital Ocean VMs support PMU virtualization. See: https://mail.mozilla.org/pipermail/rr-dev/2016-October/000424.html https://mail.mozilla.org/pipermail/rr-dev/2016-October/000420.html TL;DR: rr has proper CI now (but its not via Travis) Thanks, Sidharth On Fri, Oct 21, 2016 at 12:06 PM, Sidharth Kshatriya < sid.kshatriya at gmail.com> wrote: > Hi Maciej, > > Yes, Dontbug is built on top of RR. Mozilla/RR can be finicky at times but > overall I had a very good experience with it. > > I developed Dontbug on Ubuntu 16.04 and I found RR to be pretty robust on > that distro at least. I did encounter a serious regression once (I was on a > bleeding edge commit) but it was addressed quickly after I filed a ticket. > RR runs Travis tests on Ubuntu 14.04 so we can be sure about that distro > also. I also see references to Fedora in RR documentation so I'm guessing > RR should be good on that distribution also. > > The reason I mention all these distros is that your experience with RR > will often depend on the specific Linux kernel version and the gdb debugger > version you're using (and to a slightly lesser extend the specific distro). > RR tends to use a lot of hairy/advanced features like ptrace, seccomp-bpf, > CPU performance counters etc. and the internal implementation of these in > the kernel tends to subtly change over time (or suffer bugs). So you can > often run into problems on very recent distros. For instance there is > currently an outstanding ticket for test failures on Ubuntu 16.10. > > But as mentioned, the developers tend to address these quickly. 
As long as > you're on a mainstream non-bleeding edge distro, I would think that RR > should work fine for you. The Mozilla folks use RR to debug Firefox which > is a hugely complex application (as you can imagine) and this gives me > confidence about the overall correctness of RR. > > Coming to PyPy I would suggest you try RR out again. As a start, try it > out on Ubuntu 16.04. If you run into problems do post a ticket on the RR > project. > > I also noticed some references to UndoDB usage on the PyPy project. How > has your team's experience been with UndoDB+PyPy in general? It would > interesting to learn about your experiences there... > > Thanks, > > Sidharth > > > On Fri, Oct 21, 2016 at 10:58 AM, Maciej Fijalkowski > wrote: > >> Hi Sidharth >> >> I see dontbug is based on rr - I would like to know how well rr works >> for you. We've tried using rr for pypy and it didn't work as >> advertised. On the other hand it seems the project is moving fast, so >> maybe it works these days >> >> On Thu, Oct 20, 2016 at 9:50 PM, Sidharth Kshatriya >> wrote: >> > Dear All, >> > >> > There have been some interesting blogs about RevDB a reversible >> debugger for >> > Python on the PyPy blog. >> > >> > I'd like to tell you about Dontbug, a reversible debugger for PHP that I >> > recently released. Like RevDB, it allows you to debug forwards and >> backwards >> > -- but in PHP. >> > >> > See: >> > https://github.com/sidkshatriya/dontbug >> > >> > For a short (1m35s) demo video: >> > https://www.youtube.com/watch?v=DA76z77KtY0 >> > >> > Why am I talking about this in a PyPy mailing list :-) ? Firstly, >> because I >> > think reverse debuggers for dynamic languages are relatively rare -- so >> its >> > a good idea that we know about each other! Secondly, the fact that >> there are >> > more and more reversible debuggers for various languages every year >> means >> > that reverse debugging is definitely entering the mainstream. We could >> be at >> > an inflexion point here! >> > >> > Hope you guys find Dontbug interesting! >> > >> > Thanks, >> > >> > Sidharth >> > >> > >> > >> > _______________________________________________ >> > pypy-dev mailing list >> > pypy-dev at python.org >> > https://mail.python.org/mailman/listinfo/pypy-dev >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From william.leslie.ttg at gmail.com Sat Oct 22 04:21:42 2016 From: william.leslie.ttg at gmail.com (William ML Leslie) Date: Sat, 22 Oct 2016 19:21:42 +1100 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: <580A2295.7010004@jiedaibao.com> References: <580A2295.7010004@jiedaibao.com> Message-ID: *shrug* The real solution to relying on implementation-defined behaviour isn't to emulate that behavour through even more pervasive user-visible (and user-breakable) generator madness, but to use static analysis tools as part of the development process that can detect and report common violations of linearity (especially *local* violations, which seem to be very common and easy to detect). Do any of the existing tools do such checks? On 22 October 2016 at 01:13, hubo wrote: > Well I'm really shocked to find out what I thought was a "automatic close" > is really the ref-couting GC of CPython, means that a lot of my code breaks > in PyPy... > It really becomes a big problem after iterators heavily used in Python > nowadays. 
Some builtin functions like zip, map, filter return iterators in > Python 3 instead of lists in Python 2, means invisible bugs for code ported > from Python 2, like zip(my_generator(), my_other_generator()) may leave the > iterators open if exited from a for loop. Even in Python 2, functions in > itertools may create these bugs. > In CPython, this kind of code will work because of the ref-counting GC, so > it is not obvious in CPython, but they break in PyPy. > > I'm wondering since a ref-counting GC implemention is not possible for PyPy, > is it possible to hack on the for loop to make it "try to" collect the > generator? That may really save a lot of lives. If the generator is still > referenced after the for loop, it may be the programmer's fault for not > calling close(), but loop through a returned value is something different - > sometimes you even do not know if it is a generator. > > 2016-10-21 > ________________________________ > hubo > ________________________________ > > ????Armin Rigo > ?????2016-10-18 16:01 > ???Re: [pypy-dev] RFC: draft idea for making for loops automatically close > iterators > ????"Nathaniel Smith" > ???"PyPy Developer Mailing List" > > Hi, > > On 17 October 2016 at 10:08, Nathaniel Smith wrote: >> thought I'd send around a draft to see what you think. (E.g., would >> this be something that makes your life easier?) > > As a general rule, PyPy's GC behavior is similar to CPython's if we > tweak the program to start a chain of references at a self-referential > object. So for example, consider that the outermost loop of a program > takes the objects like the async generators, and stores them inside > such an object: > > class A: > def __init__(self, ref): > self.ref = ref > self.myself = self > > and then immediately forget that A instance. Then both this A > instance and everything it refers to is kept alive until the next > cyclic GC occurs. PyPy just always exhibits that behavior instead of > only when you start with reference cycles. > > So the real issue should not be "how to so something that will make > PyPy happy", or not only---it should be "how to do something that will > make CPython happy even in case of reference cycles". If you don't, > then arguably CPython is slightly broken. > > Yes, anything that can reduce file descriptor leaks in Python sounds good to > me. > > > A bient?t, > > Armin. > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > https://mail.python.org/mailman/listinfo/pypy-dev > > > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > https://mail.python.org/mailman/listinfo/pypy-dev > -- William Leslie Notice: Likely much of this email is, by the nature of copyright, covered under copyright law. You absolutely MAY reproduce any part of it in accordance with the copyright law of the nation you are reading this in. Any attempt to DENY YOU THOSE RIGHTS would be illegal without prior contractual agreement. From william.leslie.ttg at gmail.com Sat Oct 22 05:14:09 2016 From: william.leslie.ttg at gmail.com (William ML Leslie) Date: Sat, 22 Oct 2016 20:14:09 +1100 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: References: <580A2295.7010004@jiedaibao.com> Message-ID: A quick definition: A resource is something, typically represented as a program object, whose timely disposal is relevant for the correct operation of the program. 
Typically, it can't be closed too late (in the case of files, the process may end before all writes are flushed), but it often can't be closed too soon. This has a few obvious corallaries: * If the correct functioning of the program relies on prompt disposal, then something must be responsible for that disposal. If you place it in a container and hand that container off to a library without linearity or borrow checking, you're saying you have no interest in its disposal. * The property of "being a resource" is transitive over ownership (Hopwood's Corallary?): if X requires explicit disposal, and that disposal is managed by Y, then Y is also a resource (proof of this is easy via contradiction). Typically, there is no special magic for Y; the documentation for Y must either describe how to dispose of it, or whatever code constructed Y must also maintain control over the lifetime of X. * Are files/sockets open for reading a resource? They typically aren't, especially on GNU and BSD derived systems. Some programs will never run out of them because they will never open enough. At the same time, given that "the correct operation of the program" is specific to the actual program and its usage, the answer isn't typically clear. The first point we should notice is that explicit lifetime management is /required/ - it's a direct consequence of the definition of resource! On 22 October 2016 at 01:20, Alex S. wrote: > That?s a good point, as it means there?s probably no safe & portable way to > ensure that kind of stuff. ?Trying to collect? something doesn?t really fall > short of an actual collection, I believe (finding referers is hard). > But I believe iterclose() defined appropriately on derived iterators would > solve that?.. That now places the onus on every user of iterators to ensure that their iterators are either consumed or closed; but many of the iterators we create that are resources are then handed to library functions which just can't be relied upon to maintain resource transitivity. The second point, then, is that requiring everyone to rewrite any code they have that makes or consumes generators or iterables is probably not tractible. Even if everyone decided that would be better than fixing their use of files, you'd still stumble across library code that didn't make the effort. We might have been able to do something like this if we'd had something like (dynamic-wind) when generators first arrived in python, but it's probably beyond us now without a good helping of SA and/or ugly magic. So it really is easier to rewrite file usage than it is to rewrite iterator usage, especially if we can only detect and fix a handful of obvious cases in the runtime. -- William Leslie Notice: Likely much of this email is, by the nature of copyright, covered under copyright law. You absolutely MAY reproduce any part of it in accordance with the copyright law of the nation you are reading this in. Any attempt to DENY YOU THOSE RIGHTS would be illegal without prior contractual agreement. From armin.rigo at gmail.com Sat Oct 22 19:23:25 2016 From: armin.rigo at gmail.com (Armin Rigo) Date: Sun, 23 Oct 2016 01:23:25 +0200 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: <20161021231351.GM22471@ando.pearwood.info> References: <580A2295.7010004@jiedaibao.com> <20161021231351.GM22471@ando.pearwood.info> Message-ID: Hi Steven, On 22 October 2016 at 01:13, Steven D'Aprano wrote: > Saving lives? That's a bit of an exaggeration, isn't it? 
> > There is a big discussion going on over on the Python-Ideas mailing > list, and exaggerated, over-the-top responses aren't going to help this > proposal's case. Already people have said this issue is only a problem > for PyPy, so it's PyPy's problem to fix. Why did CPython add ResourceWarning when a file is not explicitly closed in the first place? That's because relying on reference counting to close files has been judged a bad programming practice by python-dev. The reasons for this judgement are along the lines of "because it breaks on every non-refcounted implementation". It's a good move from PyPy's point of view, and it is something that we implemented on PyPy2 too. Now the present discussion is about a similar case. Based on past experience, people are going to say first "it's not really important", then "it works fine in CPython", and finally in a few years python-dev is going to realize that it's maybe more important than what they originally thought. Nathaniel is trying to do the right thing from the start instead, so deserves a +1. Now *how* to do it exactly is a bit unclear, and largely involves language design questions (which pypy-dev is not the best place to address). A bient?t, Armin. From armin.rigo at gmail.com Sat Oct 22 19:28:14 2016 From: armin.rigo at gmail.com (Armin Rigo) Date: Sun, 23 Oct 2016 01:28:14 +0200 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: References: <580A2295.7010004@jiedaibao.com> <20161021231351.GM22471@ando.pearwood.info> Message-ID: Hi again, On 23 October 2016 at 01:23, Armin Rigo wrote: > then "it works fine in CPython" I forgot the usual "if you don't have references from a cycle" here. That is, it works fine in CPython unless your object graph is in some shape that is hard to predict, relatively rare, and can be created internally by the implementation (or its next version). Not very sane to *rely* on that for programs to work, imho. A bient?t, Armin. From tkn0313 at corp.netease.com Wed Oct 19 23:58:48 2016 From: tkn0313 at corp.netease.com (=?UTF-8?B?5omL5ri45byV5pOO57uEfOeUsOWHrw==?=) Date: Thu, 20 Oct 2016 11:58:48 +0800 Subject: [pypy-dev] pypy c++ embedding multithread GIL Message-ID: <981478195.1141378.1476935929420.JavaMail.tkn0313@corp.netease.com> My program is programed mixing with python and c++. Program runs in pypy, using c++ extending( mylib.so). mylib.so is multi-threaded. How could I get pypy GIL to ensure sync when c++ calls python function ? My program runs flow as example: Is there any other api or my use is wrong ? python code: def work_callback(): pass def work(): lib.cpp_work() c++ code: (thread func, multi thread will call cpp_work) void cpp_work() { //how to get pypy GIL ( I ues PyGILState_Ensure but blocked here for ever) work_callback() //call python func //release GIL (PyGILState_Release(state) } 2016-10-20 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tiankai2416 at 126.com Fri Oct 21 03:03:43 2016 From: tiankai2416 at 126.com (=?GBK?B?t+fUxg==?=) Date: Fri, 21 Oct 2016 15:03:43 +0800 (CST) Subject: [pypy-dev] PyGILState_Ensure will deadlock Message-ID: <7af0679.6631.157e60d71d9.Coremail.tiankai2416@126.com> We have an extension module which works fine with CPython but deadlocks every time in PyPy , same as the links below, pypy version is the latest (v5.4.1) Is this issue been fixed or other reasons lead to this ? 
https://bitbucket.org/pypy/pypy/issues/1778 -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Oct 23 12:04:24 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 24 Oct 2016 03:04:24 +1100 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: References: <580A2295.7010004@jiedaibao.com> <20161021231351.GM22471@ando.pearwood.info> Message-ID: <20161023160424.GS22471@ando.pearwood.info> On Sun, Oct 23, 2016 at 01:23:25AM +0200, Armin Rigo wrote: > Hi Steven, > > On 22 October 2016 at 01:13, Steven D'Aprano wrote: > > Saving lives? That's a bit of an exaggeration, isn't it? > > > > There is a big discussion going on over on the Python-Ideas mailing > > list, and exaggerated, over-the-top responses aren't going to help this > > proposal's case. Already people have said this issue is only a problem > > for PyPy, so it's PyPy's problem to fix. > > Why did CPython add ResourceWarning when a file is not explicitly > closed in the first place? That's because relying on reference > counting to close files has been judged a bad programming practice by > python-dev. The reasons for this judgement are along the lines of > "because it breaks on every non-refcounted implementation". It's a > good move from PyPy's point of view, and it is something that we > implemented on PyPy2 too. > > Now the present discussion is about a similar case. Based on past > experience, people are going to say first "it's not really important", > then "it works fine in CPython", and finally in a few years python-dev > is going to realize that it's maybe more important than what they > originally thought. You are probably correct. On the Python-Ideas list, I've changed from mild opposition to mild support. I don't think it will be as disruptive as some people fear, and I think it probably will of benefit. But it would be nice to have some good, concrete examples of how it will help, rather than exaggerated claims of everyone's code being broken and saving lives. (I'm sure Hubo didn't *literally* mean people die from this, but still, it is hard to take people seriously when they exaggerate in this way unless they are clearly doing it in fun.) > Nathaniel is trying to do the right thing from the start instead, so > deserves a +1. I agree. -- Steve From hubo at jiedaibao.com Mon Oct 24 00:03:19 2016 From: hubo at jiedaibao.com (hubo) Date: Mon, 24 Oct 2016 12:03:19 +0800 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: <20161021231351.GM22471@ando.pearwood.info> References: <580A2295.7010004@jiedaibao.com> <20161021231351.GM22471@ando.pearwood.info> Message-ID: <580D8805.2060904@jiedaibao.com> Well the code don't appear to be broken, in fact they have been working well for serveral months with PyPy, it only means the design is broken - there are chances that iterators are not closed correctly in certain circumstances which leads to unpredictable behaviors. It might not be critical for small scripts but may be quite critical for services that must keep running for a long time, which is what PyPy is for. Files may not be the most critical problem, the real problem is LOCK - when you use with on a lock, there are chances that it never unlocks. 
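A minimal sketch of that failure mode (the names are made up for
illustration, and it assumes a garbage collector without reference
counting, as in PyPy):

    import threading

    lock = threading.Lock()

    def locked_records(records):
        # illustrative sketch only: the lock is held across every yield
        with lock:
            for record in records:
                yield record

    for record in locked_records(range(10)):
        break
    # The generator is abandoned while still suspended inside the "with"
    # block. CPython's reference counting closes it, and so releases the
    # lock, as soon as the loop is left; without reference counting the
    # lock stays held until some later GC cycle finalizes the generator.

Holding on to the generator and calling close() on it explicitly, or
iterating it under contextlib.closing(), releases the lock
deterministically on any implementation.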
As far as I know quite a lot of softwares use generators on network programmings because it is convenient to process the callbacks with generators, and it is not so unusual to "call" another generator method or recurse on itself. When the connection is suddenly shutdown, the connection manager closes the generator - but not the generators called inside. Python 3 is using this as the standard programming model of asyncio, it may also suffer but not that much, because yield from seems to close the iterator automatically because it reraises the exception inside. 2016-10-24 hubo ????Steven D'Aprano ?????2016-10-22 07:13 ???Re: [pypy-dev] RFC: draft idea for making for loops automatically close iterators ????"pypy-dev" ??? On Fri, Oct 21, 2016 at 10:13:45PM +0800, hubo wrote: > Well I'm really shocked to find out what I thought was a "automatic > close" is really the ref-couting GC of CPython, means that a lot of my > code breaks in PyPy... But does it really? If you've run your code in PyPy, and it obviously, clearly breaks, then why are you so shocked? You should have already known this. (Unless this is your first time running your code under PyPy.) But if your code runs under PyPy, with no crashes, no exceptions, no failures caused by running out of file descriptors... then you can't really say your code is broken. What does it matter if your application doesn't close the files until exit, if you only open three files and the application never runs for more than two seconds? I'd like to get a good idea of how often this is an actual problem, causing scripts and applications to fail when run in PyPy. Actual failures, not just wasting a file descriptor or three. > I'm wondering since a ref-counting GC implemention is not possible for > PyPy, is it possible to hack on the for loop to make it "try to" > collect the generator? That may really save a lot of lives. Saving lives? That's a bit of an exaggeration, isn't it? There is a big discussion going on over on the Python-Ideas mailing list, and exaggerated, over-the-top responses aren't going to help this proposal's case. Already people have said this issue is only a problem for PyPy, so it's PyPy's problem to fix. -- Steve _______________________________________________ pypy-dev mailing list pypy-dev at python.org https://mail.python.org/mailman/listinfo/pypy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From armin.rigo at gmail.com Mon Oct 24 02:42:02 2016 From: armin.rigo at gmail.com (Armin Rigo) Date: Mon, 24 Oct 2016 08:42:02 +0200 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: <580D8805.2060904@jiedaibao.com> References: <580A2295.7010004@jiedaibao.com> <20161021231351.GM22471@ando.pearwood.info> <580D8805.2060904@jiedaibao.com> Message-ID: Hi, On 24 October 2016 at 06:03, hubo wrote: > long time, which is what PyPy is for. Files may not be the most critical > problem, the real problem is LOCK - when you use with on a lock, there are > chances that it never unlocks. Bah. I would say that it makes the whole "with" statement pointless in case you're using "await" somewhere inside it. In my own honest opinion it should not be permitted to do that at all---like, it should be a SyntaxError to use "await" inside a "with". (This opinion is based on a very vague understanding of async/await in the first place, so please take it with a grain of salt.) Just my 2 cents. 
Armin From armin.rigo at gmail.com Mon Oct 24 02:51:55 2016 From: armin.rigo at gmail.com (Armin Rigo) Date: Mon, 24 Oct 2016 08:51:55 +0200 Subject: [pypy-dev] PyGILState_Ensure will deadlock In-Reply-To: <7af0679.6631.157e60d71d9.Coremail.tiankai2416@126.com> References: <7af0679.6631.157e60d71d9.Coremail.tiankai2416@126.com> Message-ID: Hi, On 21 October 2016 at 09:03, ?? wrote: > We have an extension module which works fine with CPython but deadlocks > every time in PyPy , same as the links below, pypy version is the latest > (v5.4.1) > Is this issue been fixed or other reasons lead to this ? > > https://bitbucket.org/pypy/pypy/issues/1778 As explained in the issue, the original bug has been resolved. Your use case is probably subtly different. We need to see an example that fails, or the original code of the module you're talking about. A bient?t, Armin. From njs at pobox.com Mon Oct 24 03:24:42 2016 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 24 Oct 2016 00:24:42 -0700 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: References: <580A2295.7010004@jiedaibao.com> <20161021231351.GM22471@ando.pearwood.info> <580D8805.2060904@jiedaibao.com> Message-ID: On Sun, Oct 23, 2016 at 11:42 PM, Armin Rigo wrote: > Hi, > > On 24 October 2016 at 06:03, hubo wrote: >> long time, which is what PyPy is for. Files may not be the most critical >> problem, the real problem is LOCK - when you use with on a lock, there are >> chances that it never unlocks. > > Bah. I would say that it makes the whole "with" statement pointless > in case you're using "await" somewhere inside it. In my own honest > opinion it should not be permitted to do that at all---like, it should > be a SyntaxError to use "await" inside a "with". (This opinion is > based on a very vague understanding of async/await in the first place, > so please take it with a grain of salt.) Interesting historical tidbit: until PEP 342, it actually was a syntax error to yield inside try, or at least inside try/finally. Actually relevant modern answer: 'await' inside 'with' turns out to be OK, because async functions are always run by some supervisor (like an event loop), and so the solution is that the supervisor guarantees as part of its semantics that it will always run async functions to completion. (You're right that this isn't obvious though -- it took very confused thread on the async-sig mailing list to recognize the problem and solution.) Since event loops are already rocket science and you only need one of them, requiring them to take some care here isn't a big deal. The problem with (async) generators is that everyone who uses them also has to make some similar guarantee (i.e., they must be run to completion, either by exhausting them or calling .close()), but they get used all over the place in random user code, and it's not reasonable to ask users to figure this out every time they write a 'for' loop. -n -- Nathaniel J. Smith -- https://vorpus.org From armin.rigo at gmail.com Mon Oct 24 03:37:56 2016 From: armin.rigo at gmail.com (Armin Rigo) Date: Mon, 24 Oct 2016 09:37:56 +0200 Subject: [pypy-dev] pypy c++ embedding multithread GIL In-Reply-To: <981478195.1141378.1476935929420.JavaMail.tkn0313@corp.netease.com> References: <981478195.1141378.1476935929420.JavaMail.tkn0313@corp.netease.com> Message-ID: Hi, On 20 October 2016 at 05:58, ?????|?? 
wrote: > c++ code: (thread func, multi thread will call cpp_work) > void cpp_work() > { > //how to get pypy GIL ( I ues PyGILState_Ensure but blocked here for ever) > work_callback() //call python func > //release GIL (PyGILState_Release(state) > } You don't explain how ``work_callback()`` calls the Python function. Maybe it is a CFFI callback? In that case, the GIL is acquired automatically and you should not call any of the PyXxx functions from the CPython C API. A bient?t, Armin. From hubo at jiedaibao.com Mon Oct 24 08:17:30 2016 From: hubo at jiedaibao.com (hubo) Date: Mon, 24 Oct 2016 20:17:30 +0800 Subject: [pypy-dev] RFC: draft idea for making for loops automatically close iterators In-Reply-To: References: <580A2295.7010004@jiedaibao.com> <20161021231351.GM22471@ando.pearwood.info> <580D8805.2060904@jiedaibao.com> Message-ID: <580DFBD8.1080602@jiedaibao.com> Well, with scope and try...finally... are really important in asynchronized programming, so I'm afraid we cannot just drop it. There isn't a replacement for an iterator to safely dispose some resources; and now we know that EVEN with and finally are not really safe... It turns out that what we love most and trust most harm us most :( 2016-10-24 hubo ????Nathaniel Smith ?????2016-10-24 15:24 ???Re: [pypy-dev] RFC: draft idea for making for loops automatically close iterators ????"Armin Rigo" ???"hubo","PyPy Developer Mailing List" On Sun, Oct 23, 2016 at 11:42 PM, Armin Rigo wrote: > Hi, > > On 24 October 2016 at 06:03, hubo wrote: >> long time, which is what PyPy is for. Files may not be the most critical >> problem, the real problem is LOCK - when you use with on a lock, there are >> chances that it never unlocks. > > Bah. I would say that it makes the whole "with" statement pointless > in case you're using "await" somewhere inside it. In my own honest > opinion it should not be permitted to do that at all---like, it should > be a SyntaxError to use "await" inside a "with". (This opinion is > based on a very vague understanding of async/await in the first place, > so please take it with a grain of salt.) Interesting historical tidbit: until PEP 342, it actually was a syntax error to yield inside try, or at least inside try/finally. Actually relevant modern answer: 'await' inside 'with' turns out to be OK, because async functions are always run by some supervisor (like an event loop), and so the solution is that the supervisor guarantees as part of its semantics that it will always run async functions to completion. (You're right that this isn't obvious though -- it took very confused thread on the async-sig mailing list to recognize the problem and solution.) Since event loops are already rocket science and you only need one of them, requiring them to take some care here isn't a big deal. The problem with (async) generators is that everyone who uses them also has to make some similar guarantee (i.e., they must be run to completion, either by exhausting them or calling .close()), but they get used all over the place in random user code, and it's not reasonable to ask users to figure this out every time they write a 'for' loop. -n -- Nathaniel J. Smith -- https://vorpus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From yury at shurup.com Tue Oct 25 15:49:39 2016 From: yury at shurup.com (Yury V. Zaytsev) Date: Tue, 25 Oct 2016 21:49:39 +0200 (CEST) Subject: [pypy-dev] Colo in Germany for the Windows build slave? 
Message-ID: Hi there, I've been running the Windows build slave for the last ~1.5 years, but all good things come to an end, and I've been informed that I must shutdown the machine by the end of the year (that was the bad news). The "good" news is that I've been promised that I can have the hardware, so, in theory, if I can find colo to host it (Germany, preferably Cologne area), I can replace the hard drives, reshuffle the RAM, reinstall the system and plug it back in. If memory serves me well, it's an old Dell PowerEdge R710 rack mount server (2U). It used to run other VMs unrelated to the PyPy project, but I only plan to use it to host build VMs for OSS projects that I'm involved in after "decommissioning". Any takers? I think it would be best to talk via private email to avoid annoying list subscribers. -- Sincerely yours, Yury V. Zaytsev From saudalwasli at gmail.com Wed Oct 26 05:19:20 2016 From: saudalwasli at gmail.com (Saud Alwasly) Date: Wed, 26 Oct 2016 09:19:20 +0000 Subject: [pypy-dev] Performance degradation in the latest pypy Message-ID: Dear pypy-dev representative I have noticed severe performance degradation after upgrading pypy from pypy4.0.1 to pypy5.4.1. I attached the call graph for the same script running on both: it seems that the copy module is an issue in the new version. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: diagram_pypy5.png Type: image/png Size: 253657 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: diagram_pypy4.png Type: image/png Size: 361635 bytes Desc: not available URL: From armin.rigo at gmail.com Wed Oct 26 09:07:37 2016 From: armin.rigo at gmail.com (Armin Rigo) Date: Wed, 26 Oct 2016 15:07:37 +0200 Subject: [pypy-dev] Performance degradation in the latest pypy In-Reply-To: References: Message-ID: Hi Saud, On 26 October 2016 at 11:19, Saud Alwasly wrote: > I have noticed severe performance degradation after upgrading pypy from > pypy4.0.1 to pypy5.4.1. > > I attached the call graph for the same script running on both: > it seems that the copy module is an issue in the new version. Please give us an example of runnable code that shows the problem. We (or at least I) can't do anything with a call graph. A bient?t, Armin. From saudalwasli at gmail.com Thu Oct 27 18:42:27 2016 From: saudalwasli at gmail.com (Saud Alwasly) Date: Thu, 27 Oct 2016 22:42:27 +0000 Subject: [pypy-dev] Performance degradation in the latest pypy In-Reply-To: References: Message-ID: Sure, here it is. you can run *Simulation.py* On Wed, Oct 26, 2016 at 9:08 AM Armin Rigo wrote: > Hi Saud, > > On 26 October 2016 at 11:19, Saud Alwasly wrote: > > I have noticed severe performance degradation after upgrading pypy from > > pypy4.0.1 to pypy5.4.1. > > > > I attached the call graph for the same script running on both: > > it seems that the copy module is an issue in the new version. > > Please give us an example of runnable code that shows the problem. We > (or at least I) can't do anything with a call graph. > > > A bient?t, > > Armin. > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: BuggySctipt.zip Type: application/zip Size: 112064 bytes Desc: not available URL: From fijall at gmail.com Fri Oct 28 10:57:55 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 28 Oct 2016 16:57:55 +0200 Subject: [pypy-dev] Performance degradation in the latest pypy In-Reply-To: References: Message-ID: (vmprof)[brick:~/Downloads/dir] $ python Simulate.py Traceback (most recent call last): File "Simulate.py", line 9, in from pypyplotPKG.Ploter import Graph, plot2, hist, bar, bar2 File "/Users/dev/Downloads/dir/pypyplotPKG/Ploter.py", line 1, in import pypyplotPKG.pypyplot as plt File "/Users/dev/Downloads/dir/pypyplotPKG/pypyplot.py", line 3, in from shareddataPKG import SharedData as SharedData ImportError: No module named shareddataPKG On Fri, Oct 28, 2016 at 12:42 AM, Saud Alwasly wrote: > Sure, here it is. > you can run Simulation.py > > On Wed, Oct 26, 2016 at 9:08 AM Armin Rigo wrote: >> >> Hi Saud, >> >> On 26 October 2016 at 11:19, Saud Alwasly wrote: >> > I have noticed severe performance degradation after upgrading pypy from >> > pypy4.0.1 to pypy5.4.1. >> > >> > I attached the call graph for the same script running on both: >> > it seems that the copy module is an issue in the new version. >> >> Please give us an example of runnable code that shows the problem. We >> (or at least I) can't do anything with a call graph. >> >> >> A bient?t, >> >> Armin. > > > _______________________________________________ > pypy-dev mailing list > pypy-dev at python.org > https://mail.python.org/mailman/listinfo/pypy-dev > From fijall at gmail.com Sat Oct 29 09:22:31 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sat, 29 Oct 2016 15:22:31 +0200 Subject: [pypy-dev] Performance degradation in the latest pypy In-Reply-To: References: Message-ID: Hi Saud copy.copy is a lot slower in new pypy. Minimal example to reproduce it: import copy, sys class A(object): def __init__(self, a, b, c): self.a = a self.b = b self.c = c self.d = a self.e = b a = [A(1, 2, 3), A(4, 5, 6), A(6, 7, 8)] def f(): for i in range(int(sys.argv[1])): copy.copy(a[i % 3]) f() (run it with 1 000 000) The quick workaround for you is to write a copy function manually, since you use it in one specific place, that should make things a lot faster. In the meantime we'll look into the fix On Sat, Oct 29, 2016 at 8:02 AM, Saud Alwasly wrote: > here is the new zip file including the missing package > > > On Fri, Oct 28, 2016 at 10:58 AM Maciej Fijalkowski > wrote: >> >> (vmprof)[brick:~/Downloads/dir] $ python Simulate.py >> >> Traceback (most recent call last): >> >> File "Simulate.py", line 9, in >> >> from pypyplotPKG.Ploter import Graph, plot2, hist, bar, bar2 >> >> File "/Users/dev/Downloads/dir/pypyplotPKG/Ploter.py", line 1, in >> >> >> import pypyplotPKG.pypyplot as plt >> >> File "/Users/dev/Downloads/dir/pypyplotPKG/pypyplot.py", line 3, in >> >> >> from shareddataPKG import SharedData as SharedData >> >> ImportError: No module named shareddataPKG >> >> On Fri, Oct 28, 2016 at 12:42 AM, Saud Alwasly >> wrote: >> > Sure, here it is. >> > you can run Simulation.py >> > >> > On Wed, Oct 26, 2016 at 9:08 AM Armin Rigo wrote: >> >> >> >> Hi Saud, >> >> >> >> On 26 October 2016 at 11:19, Saud Alwasly >> >> wrote: >> >> > I have noticed severe performance degradation after upgrading pypy >> >> > from >> >> > pypy4.0.1 to pypy5.4.1. >> >> > >> >> > I attached the call graph for the same script running on both: >> >> > it seems that the copy module is an issue in the new version. 
>> >> >> >> Please give us an example of runnable code that shows the problem. We >> >> (or at least I) can't do anything with a call graph. >> >> >> >> >> >> A bient?t, >> >> >> >> Armin. >> > >> > >> > _______________________________________________ >> > pypy-dev mailing list >> > pypy-dev at python.org >> > https://mail.python.org/mailman/listinfo/pypy-dev >> > From saudalwasli at gmail.com Sat Oct 29 02:02:31 2016 From: saudalwasli at gmail.com (Saud Alwasly) Date: Sat, 29 Oct 2016 06:02:31 +0000 Subject: [pypy-dev] Performance degradation in the latest pypy In-Reply-To: References: Message-ID: here is the new zip file including the missing package On Fri, Oct 28, 2016 at 10:58 AM Maciej Fijalkowski wrote: > (vmprof)[brick:~/Downloads/dir] $ python Simulate.py > > Traceback (most recent call last): > > File "Simulate.py", line 9, in > > from pypyplotPKG.Ploter import Graph, plot2, hist, bar, bar2 > > File "/Users/dev/Downloads/dir/pypyplotPKG/Ploter.py", line 1, in > > > import pypyplotPKG.pypyplot as plt > > File "/Users/dev/Downloads/dir/pypyplotPKG/pypyplot.py", line 3, in > > > from shareddataPKG import SharedData as SharedData > > ImportError: No module named shareddataPKG > > On Fri, Oct 28, 2016 at 12:42 AM, Saud Alwasly > wrote: > > Sure, here it is. > > you can run Simulation.py > > > > On Wed, Oct 26, 2016 at 9:08 AM Armin Rigo wrote: > >> > >> Hi Saud, > >> > >> On 26 October 2016 at 11:19, Saud Alwasly > wrote: > >> > I have noticed severe performance degradation after upgrading pypy > from > >> > pypy4.0.1 to pypy5.4.1. > >> > > >> > I attached the call graph for the same script running on both: > >> > it seems that the copy module is an issue in the new version. > >> > >> Please give us an example of runnable code that shows the problem. We > >> (or at least I) can't do anything with a call graph. > >> > >> > >> A bient?t, > >> > >> Armin. > > > > > > _______________________________________________ > > pypy-dev mailing list > > pypy-dev at python.org > > https://mail.python.org/mailman/listinfo/pypy-dev > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: BuggySctipt.zip Type: application/zip Size: 135828 bytes Desc: not available URL: From matti.picus at gmail.com Sun Oct 30 14:45:37 2016 From: matti.picus at gmail.com (Matti Picus) Date: Sun, 30 Oct 2016 20:45:37 +0200 Subject: [pypy-dev] Fwd: [SUPPORT] [Bitbucket Cloud Support] users are issuing spam pull requests [BBS-42969] In-Reply-To: References: Message-ID: I filed an issue in PyPy's name with Atlassian about spam pull requests on the PyPy hg repo, it is annoying to decline them. Here is the text of the issue (reference BBS-42969): "We have been getting pull requests like this one https://bitbucket.org/pypy/pypy/pull-requests/492/niconico-translator/diff Pure garbage. How can we prevent this?" Matti From planrichi at gmail.com Mon Oct 31 11:20:41 2016 From: planrichi at gmail.com (Richard Plangger) Date: Mon, 31 Oct 2016 16:20:41 +0100 Subject: [pypy-dev] SSL module Py3.5, Github? Bitbucket? Message-ID: Hi, I'm currenlty working on the ssl stdlib replacement we agreed to implement during for py3.5. I'm unsure what the current rule for new project is. Github? Bitbucket? VMProf was initiated on Github, what did we learn from that experiment? Is the bar lower for new contributions? Cheers, Richard -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: 

From matti.picus at gmail.com  Mon Oct 31 12:58:50 2016
From: matti.picus at gmail.com (Matti Picus)
Date: Mon, 31 Oct 2016 18:58:50 +0200
Subject: [pypy-dev] Fwd: [SUPPORT] [Bitbucket Cloud Support] users are issuing spam pull requests [BBS-42969]
In-Reply-To: 
References: 
Message-ID: 

On 30/10/16 20:45, Matti Picus wrote:
> I filed an issue in PyPy's name with Atlassian about spam pull
> requests on the PyPy hg repo, it is annoying to decline them.
>
> Here is the text of the issue (reference BBS-42969):
>
> "We have been getting pull requests like this one
>
> https://bitbucket.org/pypy/pypy/pull-requests/492/niconico-translator/diff
>
> Pure garbage. How can we prevent this?"
>
> Matti
>
I received this response from Atlassian, so I guess it pays to try to
report abusers:

"Hi Mattip,
Thank you for the report. These PRs were indeed created by spam users
and I went ahead and removed the user accounts completely.
Please, feel free to let us know if you need anything further.
Cheers,"

From yashwardhan.singh at intel.com  Mon Oct 31 17:28:49 2016
From: yashwardhan.singh at intel.com (Singh, Yashwardhan)
Date: Mon, 31 Oct 2016 21:28:49 +0000
Subject: [pypy-dev] PGO Optimized Binary
Message-ID: <0151F66FF725AC42A760DA612754C5F8198E093B@ORSMSX104.amr.corp.intel.com>

Hi All,

We applied a compiler-assisted optimization technique called PGO
(Profile Guided Optimization) while building PyPy, and found that
performance improved by up to 22.4% on the Grand Unified Python
Benchmark (GUPB) from "hg clone https://hg.python.org/benchmarks".
The result table below shows that the majority of the 51
micro-benchmarks got a performance boost, while 8 showed a performance
regression.

Benchmark              Baseline      PGO    Perf Delta %
hg_startup               0.0160   0.0124            22.4
2to3                     6.1157   5.1978            15.0
html5lib                 4.9263   4.1961            14.8
formatted_logging        0.0463   0.0399            13.9
regex_v8                 0.1394   0.1206            13.5
simple_logging           0.0328   0.0289            11.9
html5lib_warmup          2.5411   2.2939             9.7
bzr_startup              0.0686   0.0621             9.6
unpack_sequence          0.0001   0.0001             8.6
normal_startup           0.8694   0.7983             8.2
regex_compile            0.0707   0.0657             7.0
json_load                0.2924   0.2734             6.5
fastpickle               1.7315   1.6290             5.9
tornado_http             0.0707   0.0665             5.8
pickle_list              1.8614   1.7897             3.9
slowunpickle             0.0260   0.0250             3.8
slowpickle               0.0336   0.0323             3.7
telco                    0.0194   0.0187             3.7
pathlib                  0.0171   0.0165             3.2
go                       0.1069   0.1036             3.1
slowspitfire             0.2624   0.2547             2.9
etree_generate           0.1037   0.1008             2.8
silent_logging           0.0000   0.0000             2.8
pickle_dict              3.2698   3.1796             2.8
spambayes                0.0581   0.0566             2.6
startup_nosite           0.5691   0.5549             2.5
chameleon_v2             2.7629   2.7009             2.2
etree_parse              0.5610   0.5505             1.9
etree_process            0.0725   0.0712             1.9
regex_effbot             0.0377   0.0371             1.7
fastunpickle             0.8521   0.8382             1.6
float                    0.0171   0.0169             0.9
pidigits                 0.3833   0.3801             0.8
call_method_unknown      0.0123   0.0122             0.6
hexiom2                 15.8354  15.7533             0.5
etree_iterparse          0.2102   0.2094             0.4
chaos                    0.0089   0.0088             0.2
spectral_norm            0.0099   0.0099             0.2
call_simple              0.0102   0.0102             0.1
mako_v2                  0.0204   0.0204             0.1
fannkuch                 0.2262   0.2260             0.1
unpickle_list            0.6448   0.6449             0.0
call_method_slots        0.0106   0.0106             0.0
call_method              0.0106   0.0106            -0.1
raytrace                 0.0210   0.0210            -0.2
richards                 0.0042   0.0043            -1.6
json_dump_v2             0.9288   0.9501            -2.3
django_v3                0.0551   0.0570            -3.4
meteor_contest           0.0984   0.1021            -3.8
nbody                    0.0446   0.0463            -3.8
nqueens                  0.0498   0.0525            -5.4
Average                                              3.6

We'd like to get some input on how to contribute our optimization
recipe to the PyPy dev tree, perhaps by creating an item in the PyPy
issue tracker?
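
For reference, the general shape of a GCC-style PGO build is a
three-step flow: build an instrumented binary, run a training workload,
then rebuild using the recorded profile. A rough sketch, driven from
Python (the flags are standard GCC options, but the make invocation,
binary name and training script are placeholders, not the exact recipe
behind the numbers above):

import subprocess

def run(cmd):
    # Echo and execute one build step, stopping on failure.
    print(" ".join(cmd))
    subprocess.check_call(cmd)

# 1. Build an instrumented binary that writes profile data when executed.
run(["make", "CFLAGS=-O2 -fprofile-generate", "LDFLAGS=-fprofile-generate"])
# 2. Exercise it on a representative workload to collect .gcda profiles.
run(["./pypy-c", "training_workload.py"])
# 3. Rebuild so the compiler can optimize hot paths using the profiles.
run(["make", "CFLAGS=-O2 -fprofile-use -fprofile-correction",
     "LDFLAGS=-fprofile-use"])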
In addition, we would also appreciate any other benchmarks or
real-world workloads we could use as alternatives to evaluate this.

Thanks,
Yash
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 