From rdias at suse.com Mon Dec 10 08:42:17 2018
From: rdias at suse.com (Ricardo Dias)
Date: Mon, 10 Dec 2018 13:42:17 +0000
Subject: [Cython] Python subinterpreters support problem in v0.29
Message-ID: <58473d38-7608-7020-934d-16ca54c1f808@suse.com>

Hi Cython developers,

The recent Cython 0.29 release introduced a commit [1] that
hinders the usage of Python subinterpreters.

I discovered this the hard way when suddenly a component I was working
on started to crash. The component in question is the ceph-mgr daemon
from the Ceph project [2].

Python subinterpreters are the basic building block for the
plugin/module architecture of ceph-mgr. Each "manager module" runs in
its own Python subinterpreter. Furthermore, all Python bindings for the
client libraries of Ceph, such as librados, librbd, libcephfs, and
librgw, are implemented as Cython modules, and in the particular case of
librados, all ceph-mgr plugin modules import the rados Cython module
upon initialization.

In practice, with Cython 0.29 we can only load one module, because
all subsequent modules refuse to load.

After discovering this issue, we "temporarily" worked around it by
restricting the version of Cython as a dependency [3]. But we don't want
to keep this restriction indefinitely and would prefer a fix from the
Cython side.

Do you think it's feasible to implement a flag to disable the safeguard
introduced in [1]? That way we could re-enable subinterpreters at our
own risk.

[1] https://github.com/cython/cython/commit/7e27c7cd51a2f048cd6d3c246740cd977f8d2e50
[2] https://github.com/ceph/ceph
[3] https://github.com/ceph/ceph/pull/25328

--
Ricardo Dias
Senior Software Engineer - Storage Team
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)

From stefan_ml at behnel.de Tue Dec 11 14:39:32 2018
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 11 Dec 2018 20:39:32 +0100
Subject: [Cython] Python subinterpreters support problem in v0.29
In-Reply-To: <58473d38-7608-7020-934d-16ca54c1f808@suse.com>
References: <58473d38-7608-7020-934d-16ca54c1f808@suse.com>
Message-ID: <2b2c91fb-7421-5e4f-93e0-21b5c9f2eb70@behnel.de>

Ricardo Dias wrote on 10.12.18 at 14:42:
> The recent Cython 0.29 release introduced a commit [1] that
> hinders the usage of Python subinterpreters.
>
> I discovered this the hard way when suddenly a component I was working
> on started to crash. The component in question is the ceph-mgr daemon
> from the Ceph project [2].
>
> Python subinterpreters are the basic building block for the
> plugin/module architecture of ceph-mgr. Each "manager module" runs in
> its own Python subinterpreter. Furthermore, all Python bindings for the
> client libraries of Ceph, such as librados, librbd, libcephfs, and
> librgw, are implemented as Cython modules, and in the particular case of
> librados, all ceph-mgr plugin modules import the rados Cython module
> upon initialization.
>
> In practice, with Cython 0.29 we can only load one module, because
> all subsequent modules refuse to load.
>
> After discovering this issue, we "temporarily" worked around it by
> restricting the version of Cython as a dependency [3]. But we don't want
> to keep this restriction indefinitely and would prefer a fix from the
> Cython side.
>
> Do you think it's feasible to implement a flag to disable the safeguard
> introduced in [1]? That way we could re-enable subinterpreters at our
> own risk.
>
> [1] https://github.com/cython/cython/commit/7e27c7cd51a2f048cd6d3c246740cd977f8d2e50
> [2] https://github.com/ceph/ceph
> [3] https://github.com/ceph/ceph/pull/25328

My guess is that your modules just silently leaked object references and
memory with the previous Cython versions. That is why we now inserted a
guard that detects cases where the module init function is executed
multiple times, which would overwrite the state of the previous run. The
shared library of an extension module is only loaded once, so any global C
state is shared for the entire process, regardless of how often CPython
calls the module init function.

I am surprised that your setup didn't crash in any way. Could you explain a
bit more how you are using this feature? Are the different subinterpreters
running in parallel or sequentially? The ceph repo looks huge. Any pointers
where I should start looking?

I actually wonder if we could at least support sequential usages through
the module cleanup mechanism. Once a module is cleaned up and all global
objects freed, calling the module init function again should be ok.

Apart from that, here is the feature ticket for module specific global state:

https://github.com/cython/cython/issues/2343

Stefan

From njs at vorpus.org Tue Dec 11 15:07:27 2018
From: njs at vorpus.org (Nathaniel Smith)
Date: Tue, 11 Dec 2018 12:07:27 -0800
Subject: [Cython] Python subinterpreters support problem in v0.29
In-Reply-To:
References: <58473d38-7608-7020-934d-16ca54c1f808@suse.com>
Message-ID:

(resending to cython-devel list since my first attempt bounced)

On Tue, Dec 11, 2018, 11:56 Nathaniel Smith wrote:
> FYI - you should be aware that subinterpreters are poorly tested (AFAIK
> ceph is the third project to try using them, ever), not well supported in
> general, and there has even been some discussion of removing them from
> CPython. For example, numpy has never supported being used with
> subinterpreters, and currently has no plans to fix this.
> I suspect other extension modules are in similar positions, but since the
> bugs that subinterpreters trigger are often really hard to detect or
> debug, no one really knows.

From rdias at suse.com Tue Dec 11 17:16:55 2018
From: rdias at suse.com (Ricardo Dias)
Date: Tue, 11 Dec 2018 22:16:55 +0000
Subject: [Cython] Python subinterpreters support problem in v0.29
In-Reply-To: <2b2c91fb-7421-5e4f-93e0-21b5c9f2eb70@behnel.de>
References: <58473d38-7608-7020-934d-16ca54c1f808@suse.com> <2b2c91fb-7421-5e4f-93e0-21b5c9f2eb70@behnel.de>
Message-ID:

On 11/12/18 19:39, Stefan Behnel wrote:
> Ricardo Dias wrote on 10.12.18 at 14:42:
>> The recent Cython 0.29 release introduced a commit [1] that
>> hinders the usage of Python subinterpreters.
>>
>> I discovered this the hard way when suddenly a component I was working
>> on started to crash. The component in question is the ceph-mgr daemon
>> from the Ceph project [2].
>>
>> Python subinterpreters are the basic building block for the
>> plugin/module architecture of ceph-mgr. Each "manager module" runs in
>> its own Python subinterpreter. Furthermore, all Python bindings for the
>> client libraries of Ceph, such as librados, librbd, libcephfs, and
>> librgw, are implemented as Cython modules, and in the particular case of
>> librados, all ceph-mgr plugin modules import the rados Cython module
>> upon initialization.
>>
>> In practice, with Cython 0.29 we can only load one module, because
>> all subsequent modules refuse to load.
>>
>> After discovering this issue, we "temporarily" worked around it by
>> restricting the version of Cython as a dependency [3]. But we don't want
>> to keep this restriction indefinitely and would prefer a fix from the
>> Cython side.
>>
>> Do you think it's feasible to implement a flag to disable the safeguard
>> introduced in [1]? That way we could re-enable subinterpreters at our
>> own risk.
>>
>> [1] https://github.com/cython/cython/commit/7e27c7cd51a2f048cd6d3c246740cd977f8d2e50
>> [2] https://github.com/ceph/ceph
>> [3] https://github.com/ceph/ceph/pull/25328
>
> My guess is that your modules just silently leaked object references and
> memory with the previous Cython versions. That is why we now inserted a
> guard that detects cases where the module init function is executed
> multiple times, which would overwrite the state of the previous run. The
> shared library of an extension module is only loaded once, so any global C
> state is shared for the entire process, regardless of how often CPython
> calls the module init function.

I assume that the problem with subinterpreters occurs when a Cython
module declares some static/global variables, which might cause
undesirable side-effects upon module loading in several subinterpreters.

I believe the Cython modules that we develop in Ceph do not declare any
global state, and therefore the modules have been working well when
loaded by several subinterpreters.

>
> I am surprised that your setup didn't crash in any way. Could you explain a
> bit more how you are using this feature? Are the different subinterpreters
> running in parallel or sequentially? The ceph repo looks huge. Any pointers
> where I should start looking?

The subinterpreters are run in parallel. Basically we have a single
process, the ceph-mgr daemon, that creates one subinterpreter per mgr
plugin (a plugin is basically a pure Python module) that it finds in a
specific location.
All these plugins import the "rados" Cython module to be able to
talk with the Ceph cluster.

The C++ code that manages the subinterpreters can be found at:
https://github.com/ceph/ceph/tree/master/src/mgr

More specifically, in the files PyModule.* and PyModuleRegistry.*:
https://github.com/ceph/ceph/blob/master/src/mgr/PyModule.cc#L324

>
> I actually wonder if we could at least support sequential usages through
> the module cleanup mechanism. Once a module is cleaned up and all global
> objects freed, calling the module init function again should be ok.
>
> Apart from that, here is the feature ticket for module specific global state:
>
> https://github.com/cython/cython/issues/2343
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> https://mail.python.org/mailman/listinfo/cython-devel
>

--
Ricardo Dias
Senior Software Engineer - Storage Team
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)

From stefan_ml at behnel.de Wed Dec 12 13:37:49 2018
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 12 Dec 2018 19:37:49 +0100
Subject: [Cython] Python subinterpreters support problem in v0.29
In-Reply-To:
References: <58473d38-7608-7020-934d-16ca54c1f808@suse.com> <2b2c91fb-7421-5e4f-93e0-21b5c9f2eb70@behnel.de>
Message-ID: <48d25581-e079-742e-02d8-e09ff257f2d6@behnel.de>

Ricardo Dias wrote on 11.12.18 at 23:16:
> On 11/12/18 19:39, Stefan Behnel wrote:
>> Ricardo Dias wrote on 10.12.18 at 14:42:
>>> The recent Cython 0.29 release introduced a commit [1] that
>>> hinders the usage of Python subinterpreters.
>>>
>>> I discovered this the hard way when suddenly a component I was working
>>> on started to crash.
>>> The component in question is the ceph-mgr daemon
>>> from the Ceph project [2].
>>>
>>> Python subinterpreters are the basic building block for the
>>> plugin/module architecture of ceph-mgr. Each "manager module" runs in
>>> its own Python subinterpreter. Furthermore, all Python bindings for the
>>> client libraries of Ceph, such as librados, librbd, libcephfs, and
>>> librgw, are implemented as Cython modules, and in the particular case of
>>> librados, all ceph-mgr plugin modules import the rados Cython module
>>> upon initialization.
>>>
>>> In practice, with Cython 0.29 we can only load one module, because
>>> all subsequent modules refuse to load.
>>>
>>> After discovering this issue, we "temporarily" worked around it by
>>> restricting the version of Cython as a dependency [3]. But we don't want
>>> to keep this restriction indefinitely and would prefer a fix from the
>>> Cython side.
>>>
>>> Do you think it's feasible to implement a flag to disable the safeguard
>>> introduced in [1]? That way we could re-enable subinterpreters at our
>>> own risk.
>>>
>>> [1] https://github.com/cython/cython/commit/7e27c7cd51a2f048cd6d3c246740cd977f8d2e50
>>> [2] https://github.com/ceph/ceph
>>> [3] https://github.com/ceph/ceph/pull/25328
>>
>> My guess is that your modules just silently leaked object references and
>> memory with the previous Cython versions. That is why we now inserted a
>> guard that detects cases where the module init function is executed
>> multiple times, which would overwrite the state of the previous run. The
>> shared library of an extension module is only loaded once, so any global C
>> state is shared for the entire process, regardless of how often CPython
>> calls the module init function.
>
> I assume that the problem with subinterpreters occurs when a Cython
> module declares some static/global variables, which might cause
> undesirable side-effects upon module loading in several subinterpreters.
>
> I believe the Cython modules that we develop in Ceph do not declare any
> global state, and therefore the modules have been working well when
> loaded by several subinterpreters.

This question already came up recently in the cython-users mailing list
(where I'd say it belongs). Since I couldn't find a web version of my reply
anywhere outside of google-groups, I'll copy my reply below:

"""
> We use a lot of scratch
> interpreters to keep independent tasks isolated from one another which is
> when I ran into the error.

In theory, PEP-489 would allow this.

https://www.python.org/dev/peps/pep-0489/

In practice, it's not that easy, because avoiding global state requires a
lot of work and makes some things slower, especially access to module
globals. It also cannot be done in normal C in all cases, because global
cdef functions simply do not have access to non-static module globals,
since you cannot pass an additional context into them without changing
their signature. Imagine a (statically defined global) C callback function
that tries to do a type check against a (module/runtime instance specific)
extension type. Which is the right type to check against in that case?
Depending on how such a function gets called, there might not even be a
thread-local to recover its global module context from.

These things could still be done by generating module specific C functions
at runtime, but then you're really leaving the platform independent sector
of C.

These problems are not specific to Cython. Most CPython extension modules
are not prepared to work with multiple interpreters, mostly for the same
reasons. These issues can be worked around in some cases by carefully
crafting the global state in a way that allows it to be shared across
interpreters, but this is such a special case that Cython rather assumes
that the module init code is not safe to be re-executed. And the code that
Cython generates internally is also far from PEP-489 clean.
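
To make the re-initialisation problem above a bit more concrete, here is a toy model in plain Python. It is only an illustration under stated assumptions, not what Cython generates in C: the class `SharedLibraryModel`, its `module_init` method and the `guard` switch are hypothetical names, and a single attribute stands in for the static C globals of a shared library that is loaded once per process.

```python
class SharedLibraryModel:
    """Toy stand-in for the shared library of an extension module.

    The library is loaded once per process, so there is exactly one copy
    of its global state, no matter how many interpreters import it.
    """

    def __init__(self, guard=True):
        self.guard = guard        # hypothetical switch: refuse re-initialisation
        self.global_state = None  # stands in for static C globals
        self.init_calls = 0

    def module_init(self, interpreter_name):
        """Model of the module init function that runs on each import."""
        self.init_calls += 1
        if self.guard and self.global_state is not None:
            # Guarded behaviour: fail loudly instead of clobbering state.
            owner = self.global_state["owner"]
            raise ImportError(f"module already initialised by {owner!r}")
        # Unguarded behaviour: the previous run's state is silently replaced.
        self.global_state = {"owner": interpreter_name, "cache": []}
        return self.global_state


def demo():
    """Return (state_was_overwritten, guard_raised) for the two behaviours."""
    unguarded = SharedLibraryModel(guard=False)
    first = unguarded.module_init("subinterpreter-1")
    first["cache"].append("object owned by subinterpreter-1")
    unguarded.module_init("subinterpreter-2")
    # The first interpreter's state is no longer the module's state.
    state_was_overwritten = unguarded.global_state is not first

    guarded = SharedLibraryModel(guard=True)
    guarded.module_init("subinterpreter-1")
    try:
        guarded.module_init("subinterpreter-2")
        guard_raised = False
    except ImportError:
        guard_raised = True
    return state_was_overwritten, guard_raised
```

Calling `demo()` exercises both behaviours: without the guard the second init replaces the first run's state, and with the guard the second init is refused.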

I started doing some work towards getting rid of globals here:

https://github.com/cython/cython/pull/1919

And, what a surprise, it's not easy. It turned out that PEP-489 isn't
really enough here, so PEP-573 was written to investigate the details and
resolve the remaining issues.

https://www.python.org/dev/peps/pep-0573/

Very long story short: Cython detects reloading now and prevents it, rather
than crashing in arbitrary places or leaking resources. The situation will
probably improve over time (and help is always appreciated), but that's how
things are now.
"""

Stefan

From newt0311 at gmail.com Sat Dec 29 09:02:57 2018
From: newt0311 at gmail.com (Prakhar Goel)
Date: Sat, 29 Dec 2018 09:02:57 -0500
Subject: [Cython] Debugging Cython programs with PDB
Message-ID:

I wanted to test the waters on this idea.

The idea is to allow debugging Cython programs with PDB. This relies
on making call-backs to trigger the sys.settrace functionality every now
and then. It is very similar to how profiling is handled so I figured
that a bunch of the infra for this is already present. It involves a
lot of overhead so it would only be enabled for a special debug mode
but could be very useful in that capacity since it would allow using
identical tools for both the Python and Cython modules. Additionally
other people have built debugging tools on top of sys.settrace so some
of the more advanced debugging facilities would also become at least
partially available for Cython modules.

It would be a fair bit of work. I'm not asking you to do this work of
course! Just looking for some feedback here. The rough pieces as far
as I can tell:

> Adding in a flag for a special debug mode that calls Python's trace functions appropriately.

> code-gen for all the trace calls.

> Ideally we want some kind of wrapped call-frame that exports the current set of variables. Cython knows quite a bit about the variables currently in scope thanks to the cdef declarations.
Ideally we should export these over to the Python side (with appropriate wrappers perhaps for the raw C structs?). This is probably the most work but I'm hoping that getting an MVP here wouldn't be too hard...

Thoughts? Comments? Suggestions?

Thanks.
--
________________________
Warm Regards
Prakhar Goel

From stefan_ml at behnel.de Mon Dec 31 05:28:25 2018
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 31 Dec 2018 11:28:25 +0100
Subject: [Cython] Debugging Cython programs with PDB
In-Reply-To:
References:
Message-ID: <5909a102-5cd0-4e04-1ca9-cb0abe371d0e@behnel.de>

Hi!

Nice idea.

Prakhar Goel wrote on 29.12.18 at 15:02:
> I wanted to test the waters on this idea.
>
> The idea is to allow debugging Cython programs with PDB. This relies
> on making call-backs to trigger the sys.settrace functionality every now
> and then. It is very similar to how profiling is handled so I figured
> that a bunch of the infra for this is already present. It involves a
> lot of overhead so it would only be enabled for a special debug mode
> but could be very useful in that capacity since it would allow using
> identical tools for both the Python and Cython modules. Additionally
> other people have built debugging tools on top of sys.settrace so some
> of the more advanced debugging facilities would also become at least
> partially available for Cython modules.
>
> It would be a fair bit of work. I'm not asking you to do this work of
> course! Just looking for some feedback here. The rough pieces as far
> as I can tell:
>
>> Adding in a flag for a special debug mode that calls Python's trace functions appropriately.
>
>> code-gen for all the trace calls.

Most of this should already be in place. I wouldn't want a special flag for
it, just extend the existing tracing support (which works for coverage
analysis, mostly).

>> Ideally we want some kind of wrapped call-frame that exports the current set of variables.
>> Cython knows quite a bit about the variables currently in scope thanks to the cdef declarations.

Not only those, it actually knows all defined names in the current scope
(except for the global scope, which is extensible at runtime).

>> Ideally we should export these over to the Python side (with appropriate wrappers perhaps for the raw C structs?). This is probably the most work but I'm hoping that getting an MVP here wouldn't be too hard...

There is support for locals(), which does most of what you need here.

Regarding frames, Cython's tracing code uses frames already. I think the
important steps would be:

1) Make sure there is only one frame per function execution. Currently,
there are cases where we use one per source code line, which is an ugly
hack for lack of proper Cython line number reporting support. This can be
achieved by finalising the lnotab support in PR-93, and then cleaning up
the way we create code objects and frames for tracing.

https://github.com/cython/cython/pull/93

2) Extend the meta-data that the code objects (which we already create,
just mostly empty) provide for each of the functions, since that is where
PDB gets its introspection details from.

3) Make the frame refer to the current "locals", preferably in a way that
only generates the dict on request, not for all frames. In the worst case,
that could be something to enable with a C compile time define, as we do
for tracing in general.

4) Trial and error fixing. :)

So, yeah, there is a bit of work involved, but it seems doable and worth
doing. Are you interested in giving this a try?

Stefan

From erik.m.bray at gmail.com Mon Dec 31 05:58:13 2018
From: erik.m.bray at gmail.com (E. Madison Bray)
Date: Mon, 31 Dec 2018 11:58:13 +0100
Subject: [Cython] Debugging Cython programs with PDB
In-Reply-To:
References:
Message-ID:

On Mon, Dec 31, 2018 at 10:55 AM Prakhar Goel wrote:
>
> I wanted to test the waters on this idea.
>
> The idea is to allow debugging Cython programs with PDB.
> This relies
> on making call-backs to trigger the sys.settrace functionality every now
> and then. It is very similar to how profiling is handled so I figured
> that a bunch of the infra for this is already present. It involves a
> lot of overhead so it would only be enabled for a special debug mode
> but could be very useful in that capacity since it would allow using
> identical tools for both the Python and Cython modules. Additionally
> other people have built debugging tools on top of sys.settrace so some
> of the more advanced debugging facilities would also become at least
> partially available for Cython modules.
>
> It would be a fair bit of work. I'm not asking you to do this work of
> course! Just looking for some feedback here. The rough pieces as far
> as I can tell:
>
> > Adding in a flag for a special debug mode that calls Python's trace functions appropriately.
>
> > code-gen for all the trace calls.
>
> > Ideally we want some kind of wrapped call-frame that exports the current set of variables. Cython knows quite a bit about the variables currently in scope thanks to the cdef declarations. Ideally we should export these over to the Python side (with appropriate wrappers perhaps for the raw C structs?). This is probably the most work but I'm hoping that getting an MVP here wouldn't be too hard...
>
> Thoughts? Comments? Suggestions?

I have had the exact same idea before, and I believe it to be possible,
but as you say non-trivial. But at least you're not the only one to have
the idea so it can't be completely crazy :)

If you can get the basics working, you might also want to look into
supporting it via Python 3.7's sys.breakpointhook:
https://www.python.org/dev/peps/pep-0553/
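
For reference, a minimal sketch of the sys.breakpointhook mechanism from PEP 553, in plain Python rather than Cython. breakpoint() calls whatever is installed as sys.breakpointhook, and the hook can reach the caller's frame and its locals, which is the kind of information a debugger such as PDB works from. The names `recording_hook`, `compute` and `run_demo` are made up for this example, and `sys._getframe` is a CPython implementation detail; whether a Cython-compiled function could expose an equivalent frame is exactly the open work discussed in this thread.

```python
import sys

captured = []


def recording_hook(*args, **kwargs):
    """Replacement hook: record where breakpoint() was called from."""
    # Frame 0 is this hook; frame 1 is the code that called breakpoint().
    frame = sys._getframe(1)
    captured.append((frame.f_code.co_name, dict(frame.f_locals)))


def compute(x):
    y = x * 2
    breakpoint()  # dispatches to sys.breakpointhook (Python 3.7+)
    return y + 1


def run_demo():
    """Run compute() with the recording hook installed, then restore."""
    original = sys.breakpointhook
    sys.breakpointhook = recording_hook
    try:
        return compute(20)
    finally:
        sys.breakpointhook = original
```

After `run_demo()`, `captured` holds the name of the calling function together with a snapshot of its local variables, without any interactive debugger being started.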