From ncoghlan at gmail.com Mon Jul 6 04:11:57 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 6 Jul 2015 12:11:57 +1000
Subject: [Import-SIG] Bundling importlib2 with Python 2.7.x?
Message-ID:

What do folks think of the idea of proposing bundling importlib2 with
Python 2.7.x (via pip), such that issues like
https://bitbucket.org/pypa/setuptools/issue/250/ can be addressed by
telling people to enable the Python 3 style import system?

That is, Python 2 would still use the legacy import system by default,
but Python 3 style imports would just be an "import importlib2;
importlib2.install_import_hooks()" away?

The main risk I see with the idea is projects deciding to install
those hooks as a side effect of their own import.

Regards,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From donald at stufft.io Mon Jul 6 04:23:09 2015
From: donald at stufft.io (Donald Stufft)
Date: Sun, 5 Jul 2015 22:23:09 -0400
Subject: [Import-SIG] Bundling importlib2 with Python 2.7.x?
In-Reply-To:
References:
Message-ID:

On July 5, 2015 at 10:18:07 PM, Nick Coghlan (ncoghlan at gmail.com) wrote:
> What do folks think of the idea of proposing bundling importlib2 with
> Python 2.7.x (via pip), such that issues like
> https://bitbucket.org/pypa/setuptools/issue/250/ can be addressed by
> telling people to enable the Python 3 style import system?
>
> That is, Python 2 would still use the legacy import system by default,
> but Python 3 style imports would just be an "import importlib2;
> importlib2.install_import_hooks()" away?
>
> The main risk I see with the idea is projects deciding to install
> those hooks as a side effect of their own import.
>

If you're already installing a third party module, isn't installing
a second third party module a pretty small amount of additional work?
IOW, bundling pip made sense because of the bootstrapping problems,
but once you have pip, directing people to depend on importlib2 isn't
very hard if they're already installing hypotheticalthingthatwoulduseit?

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From ncoghlan at gmail.com Mon Jul 6 04:31:38 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 6 Jul 2015 12:31:38 +1000
Subject: [Import-SIG] Bundling importlib2 with Python 2.7.x?
In-Reply-To:
References:
Message-ID:

On 6 July 2015 at 12:23, Donald Stufft wrote:
>
> On July 5, 2015 at 10:18:07 PM, Nick Coghlan (ncoghlan at gmail.com) wrote:
>> What do folks think of the idea of proposing bundling importlib2 with
>> Python 2.7.x (via pip), such that issues like
>> https://bitbucket.org/pypa/setuptools/issue/250/ can be addressed by
>> telling people to enable the Python 3 style import system?
>>
>> That is, Python 2 would still use the legacy import system by default,
>> but Python 3 style imports would just be an "import importlib2;
>> importlib2.install_import_hooks()" away?
>>
>> The main risk I see with the idea is projects deciding to install
>> those hooks as a side effect of their own import.
>>
>
> If you're already installing a third party module, isn't installing
> a second third party module a pretty small amount of additional work?
>
> IOW, bundling pip made sense because of the bootstrapping problems,
> but once you have pip, directing people to depend on importlib2 isn't
> very hard if they're already installing hypotheticalthingthatwoulduseit?

Aye, a docs-only approach could work, and would definitely be easier
to maintain. It may just be a matter of pushing in that direction on
the PyPA side of things, by considering "requires importlib2 on Python
2.7" to be a reasonable requirement for getting some kinds of
operations to work smoothly.

Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From erik.m.bray at gmail.com Mon Jul 6 17:57:57 2015
From: erik.m.bray at gmail.com (Erik Bray)
Date: Mon, 6 Jul 2015 11:57:57 -0400
Subject: [Import-SIG] Bundling importlib2 with Python 2.7.x?
In-Reply-To:
References:
Message-ID:

On Sun, Jul 5, 2015 at 10:31 PM, Nick Coghlan wrote:
> On 6 July 2015 at 12:23, Donald Stufft wrote:
>>
>> On July 5, 2015 at 10:18:07 PM, Nick Coghlan (ncoghlan at gmail.com) wrote:
>>> What do folks think of the idea of proposing bundling importlib2 with
>>> Python 2.7.x (via pip), such that issues like
>>> https://bitbucket.org/pypa/setuptools/issue/250/ can be addressed by
>>> telling people to enable the Python 3 style import system?
>>>
>>> That is, Python 2 would still use the legacy import system by default,
>>> but Python 3 style imports would just be an "import importlib2;
>>> importlib2.install_import_hooks()" away?
>>>
>>> The main risk I see with the idea is projects deciding to install
>>> those hooks as a side effect of their own import.
>>>
>>
>> If you're already installing a third party module, isn't installing
>> a second third party module a pretty small amount of additional work?
>>
>> IOW, bundling pip made sense because of the bootstrapping problems,
>> but once you have pip, directing people to depend on importlib2 isn't
>> very hard if they're already installing hypotheticalthingthatwoulduseit?
>
> Aye, a docs-only approach could work, and would definitely be easier
> to maintain. It may just be a matter of pushing in that direction on
> the PyPA side of things, by considering "requires importlib2 on Python
> 2.7" to be a reasonable requirement for getting some kinds of
> operations to work smoothly.
I'm not exactly sure I follow--the types of operations we're talking
about involve installation of packages, and I can't go telling users
"you have to install importlib2 in order for installation of these
packages to not hose up your system" when they're installing
distributions that have namespace packages.

I think that this issue affects more than just `./setup.py develop` /
`pip install --editable`. It also affects installation of namespace
packages with or without --single-version-externally-managed (that is
as eggs or not). Having importlib2 to natively support new-style
namespace packages would be a boon, and pulling it in along with pip
seems like it would be straightforward to support.

Alternatively I would argue to just include it with setuptools.

Erik

From donald at stufft.io Mon Jul 6 18:10:27 2015
From: donald at stufft.io (Donald Stufft)
Date: Mon, 6 Jul 2015 12:10:27 -0400
Subject: [Import-SIG] Bundling importlib2 with Python 2.7.x?
In-Reply-To:
References:
Message-ID:

On July 6, 2015 at 11:57:59 AM, Erik Bray (erik.m.bray at gmail.com) wrote:
> On Sun, Jul 5, 2015 at 10:31 PM, Nick Coghlan wrote:
> > On 6 July 2015 at 12:23, Donald Stufft wrote:
> >>
> >> On July 5, 2015 at 10:18:07 PM, Nick Coghlan (ncoghlan at gmail.com) wrote:
> >>> What do folks think of the idea of proposing bundling importlib2 with
> >>> Python 2.7.x (via pip), such that issues like
> >>> https://bitbucket.org/pypa/setuptools/issue/250/ can be addressed by
> >>> telling people to enable the Python 3 style import system?
> >>>
> >>> That is, Python 2 would still use the legacy import system by default,
> >>> but Python 3 style imports would just be an "import importlib2;
> >>> importlib2.install_import_hooks()" away?
> >>>
> >>> The main risk I see with the idea is projects deciding to install
> >>> those hooks as a side effect of their own import.
> >>>
> >>
> >> If you're already installing a third party module, isn't installing
> >> a second third party module a pretty small amount of additional work?
> >>
> >> IOW, bundling pip made sense because of the bootstrapping problems,
> >> but once you have pip, directing people to depend on importlib2 isn't
> >> very hard if they're already installing hypotheticalthingthatwoulduseit?
> >
> > Aye, a docs-only approach could work, and would definitely be easier
> > to maintain. It may just be a matter of pushing in that direction on
> > the PyPA side of things, by considering "requires importlib2 on Python
> > 2.7" to be a reasonable requirement for getting some kinds of
> > operations to work smoothly.
>
> I'm not exactly sure I follow--the types of operations we're talking
> about involve installation of packages, and I can't go telling users
> "you have to install importlib2 in order for installation of these
> packages to not hose up your system" when they're installing
> distributions that have namespace packages.
>
> I think that this issue affects more than just `./setup.py develop` /
> `pip install --editable`. It also affects installation of namespace
> packages with or without --single-version-externally-managed (that is
> as eggs or not). Having importlib2 to natively support new-style
> namespace packages would be a boon, and pulling it in along with pip
> seems like it would be straightforward to support.
>
> Alternatively I would argue to just include it with setuptools.
>
> Erik
>

If you're installing using an actual package manager like pip,
easy_install, apt-get, yum, whatever the instructions won't change.
You'll need to ``pip install foo.bar`` and the fact it depends on
importlib2 is something the installer takes care of.

You'd need to tell anyone who is using implicit namespace packages
that they need to depend on importlib2 on Python 2, and you'd need
to tell anyone doing a "manual" installation (e.g.
they download a tarball from PyPI and install it manually) to get importlib2.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From erik.m.bray at gmail.com Mon Jul 6 19:08:16 2015
From: erik.m.bray at gmail.com (Erik Bray)
Date: Mon, 6 Jul 2015 13:08:16 -0400
Subject: [Import-SIG] Bundling importlib2 with Python 2.7.x?
In-Reply-To:
References:
Message-ID:

On Mon, Jul 6, 2015 at 12:10 PM, Donald Stufft wrote:
>
>
> On July 6, 2015 at 11:57:59 AM, Erik Bray (erik.m.bray at gmail.com) wrote:
>> On Sun, Jul 5, 2015 at 10:31 PM, Nick Coghlan wrote:
>> > On 6 July 2015 at 12:23, Donald Stufft wrote:
>> >>
>> >> On July 5, 2015 at 10:18:07 PM, Nick Coghlan (ncoghlan at gmail.com) wrote:
>> >>> What do folks think of the idea of proposing bundling importlib2 with
>> >>> Python 2.7.x (via pip), such that issues like
>> >>> https://bitbucket.org/pypa/setuptools/issue/250/ can be addressed by
>> >>> telling people to enable the Python 3 style import system?
>> >>>
>> >>> That is, Python 2 would still use the legacy import system by default,
>> >>> but Python 3 style imports would just be an "import importlib2;
>> >>> importlib2.install_import_hooks()" away?
>> >>>
>> >>> The main risk I see with the idea is projects deciding to install
>> >>> those hooks as a side effect of their own import.
>> >>>
>> >>
>> >> If you're already installing a third party module, isn't installing
>> >> a second third party module a pretty small amount of additional work?
>> >>
>> >> IOW, bundling pip made sense because of the bootstrapping problems,
>> >> but once you have pip, directing people to depend on importlib2 isn't
>> >> very hard if they're already installing hypotheticalthingthatwoulduseit?
>> >
>> > Aye, a docs-only approach could work, and would definitely be easier
>> > to maintain.
>> > It may just be a matter of pushing in that direction on
>> > the PyPA side of things, by considering "requires importlib2 on Python
>> > 2.7" to be a reasonable requirement for getting some kinds of
>> > operations to work smoothly.
>>
>> I'm not exactly sure I follow--the types of operations we're talking
>> about involve installation of packages, and I can't go telling users
>> "you have to install importlib2 in order for installation of these
>> packages to not hose up your system" when they're installing
>> distributions that have namespace packages.
>>
>> I think that this issue affects more than just `./setup.py develop` /
>> `pip install --editable`. It also affects installation of namespace
>> packages with or without --single-version-externally-managed (that is
>> as eggs or not). Having importlib2 to natively support new-style
>> namespace packages would be a boon, and pulling it in along with pip
>> seems like it would be straightforward to support.
>>
>> Alternatively I would argue to just include it with setuptools.
>>
>> Erik
>>
>
> If you're installing using an actual package manager like pip,
> easy_install, apt-get, yum, whatever the instructions won't change.
> You'll need to ``pip install foo.bar`` and the fact it depends on
> importlib2 is something the installer takes care of.

Okay--it wasn't clear to me that you were suggesting adding
importlib2 as an install_requires, but that would be fine.

> You'd need to tell anyone who is using implicit namespace packages
> that they need to depend on importlib2 on Python 2, and you'd need
> to tell anyone doing a "manual" installation (e.g. they download a
> tarball from PyPI and install it manually) to get importlib2.

Got it--that's pretty reasonable I think.
Erik

From encukou at gmail.com Sun Jul 26 12:39:21 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Sun, 26 Jul 2015 12:39:21 +0200
Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters
Message-ID:

Hello,
This is a follow-up to PEP 489 and discussions regarding per-module
data and PyState_FindModule.
It turned out to be quite the rabbit hole. Apologies for the long
mail, I hope it ends up sufficiently clear.

Using single-phase initialization (the pre-PEP 489 solution),
extension modules are effectively singletons - there's up to one
instance of a particular module in any given subinterpreter. Cython
modules only allow one instance *per process*.

Using the new multiple-phase init, one can create several modules from
one PyModuleDef - either (again) one per subinterpreter, or for
testing purposes. As per the goal of PEP 489, this brings extension
modules closer to how Python modules behave.

The problem is that classes defined in a module don't have a reference
to the module object.
For example, the _csv module defines the classes "reader" and "Error".
The code in "reader" needs to have access to "Error" in order to raise
exceptions. (The Error class here is just an example of module state;
things like _csv's global field size limit or Cython globals also need
to be accessed from classes.)

With the traditional single-phase init, to access module state, one
can use PyState_FindModule, which queries a per-subinterpreter mapping
of PyModuleDef to module object. This obviously assumes one module per
subinterpreter, which is a limitation that PEP 489 currently avoids.
Bringing this limitation back would probably be the easiest solution
to the problem I'm describing here; this has been discussed in the
form of "singleton modules" [0], and postponed in hopes of a better
solution.

So, what options are there for methods of extension classes to get a
hold of the module object (or module state)?
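[Editorial sketch] The per-subinterpreter lookup described above can be modelled in a few lines of Python. This is only an illustration of the "one module per definition per interpreter" assumption: `ModuleDef`, `Interpreter`, `add_module` and `find_module` are hypothetical names invented here, not CPython APIs, and `None` stands in for the NULL that the C function PyState_FindModule returns when nothing is registered.

```python
import types


class ModuleDef:
    """Hypothetical stand-in for a C-level PyModuleDef."""

    def __init__(self, name):
        self.name = name


class Interpreter:
    """Hypothetical stand-in for the state of one subinterpreter."""

    def __init__(self):
        # One entry per definition: this is the "singleton" assumption.
        self._modules_by_def = {}

    def add_module(self, module_def, module):
        # In this model a second registration replaces the first one.
        self._modules_by_def[module_def] = module

    def find_module(self, module_def):
        # Models the lookup; None plays the role of a NULL result.
        return self._modules_by_def.get(module_def)


csv_def = ModuleDef("_csv")
interp_a = Interpreter()
interp_b = Interpreter()

first = types.ModuleType("_csv")
second = types.ModuleType("_csv")  # e.g. a second instance built for testing

interp_a.add_module(csv_def, first)
interp_a.add_module(csv_def, second)  # the mapping can only remember one

found_in_a = interp_a.find_module(csv_def)
found_in_b = interp_b.find_module(csv_def)  # never loaded in this interpreter
```

In the model, the mapping has no way to represent two module instances created from the same definition, and a lookup made in an interpreter that never loaded the module yields nothing at all.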
For static classes, it's not possible to store a reference to the
module, because multiple modules can use a single static class.
Also, static classes won't behave well with multiple subinterpreters:
if an object of such a class is passed into a subinterpreter that
doesn't have the corresponding module object loaded,
PyState_FindModule will fail. And PyState_FindModule failure tends to
have very nasty consequences - there's really not much you can do when
you get a NULL, and most modules don't even check.
And even if PyState_FindModule succeeds, in a "foreign" subinterpreter
it arguably won't find the "correct" module instance. So, singleton
modules (or the pre-PEP489 status quo) aren't a good answer.

So it seems that extension modules that need per-module state need to
use heap types. And the heap types need a reference to "their" module.
And methods of those types need to be called with the class that
defined them.
This would be possible with regular methods. But, consider for example
the tp_iternext signature:

    PyObject* myobj_iternext(PyObject *self)

There's no good way for this function to get a reference to the class
it belongs to.
`Py_TYPE(self)` might be a subclass. The best way I can think of is
walking the MRO until I get to a class with tp_iter (or a class
created from "my" known PyType_Spec), but one of the requirements on
module state is that it needs to be efficient, so I'd rather avoid
walking a list.

That's where I'm currently stuck. Does anyone have any ideas/comments
on this problem?

[0] https://mail.python.org/pipermail/import-sig/2015-April/000946.html

From ncoghlan at gmail.com Sun Jul 26 14:50:37 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 26 Jul 2015 22:50:37 +1000
Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters
In-Reply-To:
References:
Message-ID:

On 26 July 2015 at 20:39, Petr Viktorin wrote:
> So it seems that extension modules that need per-module state need to
> use heap types.
> And the heap types need a reference to "their" module.
> And methods of those types need to be called with the class that
> defined them.
> This would be possible with regular methods. But, consider for example
> the tp_iternext signature:
>
>     PyObject* myobj_iternext(PyObject *self)
>
> There's no good way for this function to get a reference to the class
> it belongs to.
> `Py_TYPE(self)` might be a subclass. The best way I can think of is
> walking the MRO until I get to a class with tp_iter (or a class
> created from "my" known PyType_Spec), but one of the requirements on
> module state is that it needs to be efficient, so I'd rather avoid
> walking a list.
>
> That's where I'm currently stuck. Does anyone have any ideas/comments
> on this problem?

(I'm assuming I'm going to be retreading ground you've already covered
in your own investigations here, but I want to make sure we're at
least thinking along the same lines)

Let's start by assuming the following constraints:

* we can add new standard function signatures
* we can add new calling convention flags
* we *can't* change slot signatures

Tackling the easy problem first, the new standard function signatures could be:

    PyObject* (*PyCMethod)(PyObject *module, PyObject *self, PyObject *args)

    PyObject* (*PyCMethodWithKeywords)(PyObject *module, PyObject *self,
                                       PyObject *args, PyObject *kwds)

The new calling conventions would be METH_VARARGS_METHOD,
METH_KEYWORDS_METHOD and METH_NOARGS_METHOD (probably implemented as a
single new flag like METH_MODULE that these set).

The key difference between the *_METHOD conventions and their existing
PyCFunction counterparts is that when you use the latter for methods
on a class, the class instance is passed in *instead of* the module
reference, while with this change, methods on a class would receive
the instance *in addition to* the module reference.

To facilitate this, type objects would also need to gain a new
__module__ attribute.
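[Editorial sketch] The calling convention proposed above, where a method receives the module in addition to the instance, can be modelled in pure Python with a descriptor. Everything here is an assumption made for illustration: `ModuleMethod`, `make_module` and the `owning_module` class attribute are hypothetical names (Python-level `__module__` is already a name string, so a separate attribute is used), and nothing below is an existing or proposed CPython API.

```python
import types


class ModuleMethod:
    """Hypothetical descriptor that calls func(module, self, *args, **kwds)."""

    def __init__(self, func):
        self.func = func
        self.defining_class = None

    def __set_name__(self, owner, name):
        # Remember the class whose body contains this method.
        self.defining_class = owner

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Always use the defining class, never type(instance).
        module = self.defining_class.owning_module

        def bound(*args, **kwds):
            return self.func(module, instance, *args, **kwds)

        return bound


def make_module(name, field_limit):
    """Build an independent module instance with its own state and class."""
    module = types.ModuleType(name)
    module.field_limit = field_limit

    class Reader:
        owning_module = module

        @ModuleMethod
        def limit(mod, self):
            # Module state arrives as an argument instead of a global.
            return mod.field_limit

    module.Reader = Reader
    return module


mod_a = make_module("fake_csv", 10)
mod_b = make_module("fake_csv", 99)


class SubReader(mod_a.Reader):
    """A subclass defined elsewhere still reaches mod_a's state."""
```

Two module instances built from the same code keep separate state, and a subclass instance still resolves to the module of the class that defined the method.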
Ignoring slots, extension modules written for Python 3.6+ could then
just use the PyCFunction calling conventions for module level
functions, and the new PyCMethod ones for actual methods on extension
classes, and things should "just work". Extension modules (including
Cython) that needed to maintain compatibility with older versions
could implement wrappers that used PyState_FindModule to pass in the
appropriate module name and use those in combination with single-phase
initialisation on older versions that didn't support the new call
signatures.

For the slot case, where we can't change the function signature to
accept the module object directly, I'm wondering if we could take a
leaf out of the decimal module's book and define the notion of a
thread local "active module", together with a way to automatically
define slot wrappers that manage the active module. The latter might
look something like:

    PyObject* PyType_FromSpecInModule(PyType_Spec* spec, PyModule* module,
                                      int* wrapped_slot_ids)

With the following consequences:

* the newly defined type would have its __module__ attribute set appropriately
* the slots named in the NULL terminated "wrapped_slot_ids" array
  would be replaced with wrappers that pushed the given module onto the
  active module stack, called the function supplied in the type spec,
  and popped the active module off again (as a possible optimisation,
  there could potentially be a counter for how many times the currently
  active module had been pushed, rather than actually pushing the same
  pointer multiple times)

That then gets us to your original hard question, which is "How would
the slot wrappers look up the correct module?". There, I think the
definition time "fixup_slots" operations in the type machinery may
help: this is the code where the function pointers are copied from the
base classes to the slots in the class currently being defined.
If there was a way of flagging "module aware" slots at type definition
time, then that same code (or an equivalent loop run later on) could
be used to populate a mapping from slot IDs to the appropriate module
object.

The fastest and simplest way I can think of to do the module object
lookup would be to have a C level PyObject* array keyed by the
PyType_Slot slot IDs - finding the right module would then be a matter
of having predefined wrappers for each slot that looked up the
appropriate slot ID to get both the module to activate and the
function pointer for the actual slot implementation. Any type defined
using PyType_FromSpecInModule with a non-NULL "wrapped_slot_ids" would
incur the same memory cost in terms of the size of the type object
itself.

Even though the memory hit for making an extension type module aware
would be constant using that approach, the runtime speed hit would
still only affect the specifically wrapped slots that were flagged as
needing the active module state to be updated around the call.

There'd be a lot of devils in the details of making such a scheme
work, and we'd want to quantify the impact of converting a slot
definition from a singleton implementation to a subinterpreter
friendly implementation, but I'm not seeing anything fundamentally
unworkable about the above approach. It makes me nervous from a
maintainability perspective (typeobject.c and function calls are
already hairy, and this would make both of them worse), but if the
pay-off is substantially improved subinterpreter support, I think it
will be worth it (especially if Eric is able to manage the trick of
allowing subinterpreters to run concurrently on different cores)

Regards,
Nick.
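[Editorial sketch] The thread-local "active module" idea above can be tried out in pure Python. This is a model under stated assumptions rather than the proposed C implementation: `current_module`, `wrap_slot` and `make_counter_type` are hypothetical names, and a plain list per thread stands in for the active module stack.

```python
import threading
import types

_active = threading.local()


def _stack():
    # Each thread gets its own stack of active modules.
    if not hasattr(_active, "stack"):
        _active.stack = []
    return _active.stack


def current_module():
    """Return the innermost active module, or None outside a wrapped slot."""
    stack = _stack()
    return stack[-1] if stack else None


def wrap_slot(module, func):
    """Wrap func so that `module` is active for the duration of the call."""

    def wrapper(*args, **kwds):
        stack = _stack()
        stack.append(module)
        try:
            return func(*args, **kwds)
        finally:
            # Pop even if the slot implementation raised.
            stack.pop()

    return wrapper


def make_counter_type(module):
    """Create a type whose __next__ reads state from the active module."""

    def counter_next(self):
        # Same signature as an ordinary __next__: no module argument.
        self.value += current_module().step
        return self.value

    namespace = {"value": 0, "__next__": wrap_slot(module, counter_next)}
    return type("Counter", (), namespace)


mod_a = types.ModuleType("demo")
mod_a.step = 1
mod_b = types.ModuleType("demo")
mod_b.step = 5

a = make_counter_type(mod_a)()
b = make_counter_type(mod_b)()
results = [next(a), next(b), next(a)]
```

The slot implementation keeps its fixed signature and still reaches the right per-module state; the cost in this model is one push and one pop around each wrapped call.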
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From encukou at gmail.com Sun Jul 26 15:49:10 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Sun, 26 Jul 2015 15:49:10 +0200
Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters
In-Reply-To:
References:
Message-ID:

On Sun, Jul 26, 2015 at 2:50 PM, Nick Coghlan wrote:
> On 26 July 2015 at 20:39, Petr Viktorin wrote:
>> So it seems that extension modules that need per-module state need to
>> use heap types. And the heap types need a reference to "their" module.
>> And methods of those types need to be called with the class that
>> defined them.
>> This would be possible with regular methods. But, consider for example
>> the tp_iternext signature:
>>
>>     PyObject* myobj_iternext(PyObject *self)
>>
>> There's no good way for this function to get a reference to the class
>> it belongs to.
>> `Py_TYPE(self)` might be a subclass. The best way I can think of is
>> walking the MRO until I get to a class with tp_iter (or a class
>> created from "my" known PyType_Spec), but one of the requirements on
>> module state is that it needs to be efficient, so I'd rather avoid
>> walking a list.
>>
>> That's where I'm currently stuck. Does anyone have any ideas/comments
>> on this problem?
>
> (I'm assuming I'm going to be retreading ground you've already covered
> in your own investigations here, but I want to make sure we're at
> least thinking along the same lines)
>
> Let's start by assuming the following constraints:
>
> * we can add new standard function signatures
> * we can add new calling convention flags
> * we *can't* change slot signatures
>
> Tackling the easy problem first, the new standard function signatures could be:
>
>     PyObject* (*PyCMethod)(PyObject *module, PyObject *self, PyObject *args)
>
>     PyObject* (*PyCMethodWithKeywords)(PyObject *module, PyObject
> *self, PyObject *args, PyObject *kwds)
>
> The new calling conventions would be METH_VARARGS_METHOD,
> METH_KEYWORDS_METHOD and METH_NOARGS_METHOD (probably implemented as a
> single new flag like METH_MODULE that these set).
>
> The key difference between the *_METHOD conventions and their existing
> PyCFunction counterparts is that when you use the latter for methods
> on a class, the class instance is passed in *instead of* the module
> reference, while with this change, methods on a class would receive
> the instance *in addition to* the module reference.
>
> To facilitate this, type objects would also need to gain a new
> __module__ attribute.
>
> Ignoring slots, extension modules written for Python 3.6+ could then
> just use the PyCFunction calling conventions for module level
> functions, and the new PyCMethod ones for actual methods on extension
> classes, and things should "just work". Extension modules (including
> Cython) that needed to maintain compatibility with older versions
> could implement wrappers that used PyState_FindModule to pass in the
> appropriate module name and use those in combination with single-phase
> initialisation on older versions that didn't support the new call
> signatures.
Yes, that's pretty much what I had in mind when I said "This would be
possible with regular methods" :)
Rather than the module, I'd pass in the defining class, and let the
method look up __module__ itself. But that's pretty minor.

> For the slot case, where we can't change the function signature to
> accept the module object directly, I'm wondering if we could take a
> leaf out of the decimal module's book and define the notion of a
> thread local "active module", together with a way to automatically
> define slot wrappers that manage the active module. The latter might
> look something like:
>
>     PyObject* PyType_FromSpecInModule(PyType_Spec* spec, PyModule*
> module, int* wrapped_slot_ids)
>
> With the following consequences:
>
> * the newly defined type would have its __module__ attribute set appropriately
> * the slots named in the NULL terminated "wrapped_slot_ids" array
> would be replaced with wrappers that pushed the given module onto the
> active module stack, called the function supplied in the type spec,
> and popped the active module off again (as a possible optimisation,
> there could potentially be a counter for how many times the currently
> active module had been pushed, rather than actually pushing the same
> pointer multiple times)
>
> That then gets us to your original hard question, which is "How would
> the slot wrappers look up the correct module?". There, I think the
> definition time "fixup_slots" operations in the type machinery may
> help: this is the code where the function pointers are copied from the
> base classes to the slots in the class currently being defined. If
> there was a way of flagging "module aware" slots at type definition
> time, then that same code (or an equivalent loop run later on) could
> be used to populate a mapping from slot IDs to the appropriate module
> object.
>
> The fastest and simplest way I can think of to do the module object
> lookup would be to have a C level PyObject* array keyed by the
> PyType_Slot slot IDs - finding the right module would then be a matter
> of having predefined wrappers for each slot that looked up the
> appropriate slot ID to get both the module to activate and the
> function pointer for the actual slot implementation. Any type defined
> using PyType_FromSpecInModule with a non-NULL "wrapped_slot_ids" would
> incur the same memory cost in terms of the size of the type object
> itself.
>
> Even though the memory hit for making an extension type module aware
> would be constant using that approach, the runtime speed hit would
> still only affect the specifically wrapped slots that were flagged as
> needing the active module state to be updated around the call.
>
> There'd be a lot of devils in the details of making such a scheme
> work, and we'd want to quantify the impact of converting a slot
> definition from a singleton implementation to a subinterpreter
> friendly implementation, but I'm not seeing anything fundamentally
> unworkable about the above approach. It makes me nervous from a
> maintainability perspective (typeobject.c and function calls are
> already hairy, and this would make both of them worse), but if the
> pay-off is substantially improved subinterpreter support, I think it
> will be worth it (especially if Eric is able to manage the trick of
> allowing subinterpreters to run concurrently on different cores)

That does sound doable, even if it is a pretty arcane workaround. It
should at least do as a proof of concept, to allow exploring this
space further.
Thank you!
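[Editorial sketch] The variation mentioned earlier in this reply, handing a method its defining class rather than the module, can be compared with the MRO walk from the original message in a small Python model. The names `find_defining_class`, `SPEC`, `_spec` and `owning_module` are hypothetical and exist only for this illustration; `SPEC` stands in for "my" known PyType_Spec.

```python
import types


def find_defining_class(instance, marker):
    """Walk the MRO of type(instance) until a class created from `marker`."""
    for cls in type(instance).__mro__:
        # vars() looks only at the class's own namespace, not inherited names.
        if vars(cls).get("_spec") is marker:
            return cls
    return None


SPEC = object()  # hypothetical stand-in for a known PyType_Spec
module = types.ModuleType("demo")
module.greeting = "hello"


class Base:
    _spec = SPEC
    owning_module = module

    def greet_via_mro(self):
        # Slow path: linear in the depth of the inheritance chain.
        return find_defining_class(self, SPEC).owning_module.greeting

    def greet_with_class(self, defining_class):
        # Fast path: the caller hands over the defining class directly.
        return defining_class.owning_module.greeting


class Child(Base):
    owning_module = None  # a subclass may shadow anything it inherits


child = Child()
```

Starting from `type(child)` alone gives the subclass, whose attributes may differ; either the walk or an explicitly passed defining class is needed to get back to the module that owns the method.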
From stefan_ml at behnel.de Mon Jul 27 08:44:08 2015
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 27 Jul 2015 08:44:08 +0200
Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters
In-Reply-To:
References:
Message-ID:

Petr Viktorin wrote on 26.07.2015 at 12:39:
> This is a follow-up to PEP 489 and discussions regarding per-module
> data and PyState_FindModule.
> It turned out to be quite the rabbit hole. Apologies for the long
> mail, I hope it ends up sufficiently clear.
>
> Using single-phase initialization (the pre-PEP 489 solution),
> extension modules are effectively singletons - there's up to one
> instance of a particular module in any given subinterpreter. Cython
> modules only allow one instance *per process*.
>
> Using the new multiple-phase init, one can create several modules from
> one PyModuleDef - either (again) one per subinterpreter, or for
> testing purposes. As per the goal of PEP 489, this brings extension
> modules closer to how Python modules behave.
>
> The problem is that classes defined in a module don't have a reference
> to the module object.
> [lots of other tricky details stripped]

Sorry for cutting it short here, but isn't this a hint that linking the new
initialisation to subinterpreter support might not be a good idea? I mean,
there are a couple of advantages of the new initialisation scheme, e.g. for
relative imports etc., which are completely unrelated to subinterpreters.
And yet, the PEP suggests that supporting the new module setup scheme
should indicate that subinterpreters are supported as well. Given how
complex the support seems to be for any non-trivial module, linking the two
use cases might end up preventing us from getting any benefit at all out of
this for quite some time.
Stefan

From ncoghlan at gmail.com Mon Jul 27 16:07:30 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 28 Jul 2015 00:07:30 +1000
Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters
In-Reply-To:
References:
Message-ID:

On 27 July 2015 at 16:44, Stefan Behnel wrote:
> Petr Viktorin wrote on 26.07.2015 at 12:39:
>> This is a follow-up to PEP 489 and discussions regarding per-module
>> data and PyState_FindModule.
>> It turned out to be quite the rabbit hole. Apologies for the long
>> mail, I hope it ends up sufficiently clear.
>>
>> Using single-phase initialization (the pre-PEP 489 solution),
>> extension modules are effectively singletons - there's up to one
>> instance of a particular module in any given subinterpreter. Cython
>> modules only allow one instance *per process*.
>>
>> Using the new multiple-phase init, one can create several modules from
>> one PyModuleDef - either (again) one per subinterpreter, or for
>> testing purposes. As per the goal of PEP 489, this brings extension
>> modules closer to how Python modules behave.
>>
>> The problem is that classes defined in a module don't have a reference
>> to the module object.
>> [lots of other tricky details stripped]
>
> Sorry for cutting it short here, but isn't this a hint that linking the new
> initialisation to subinterpreter support might not be a good idea? I mean,
> there are a couple of advantages of the new initialisation scheme, e.g. for
> relative imports etc., which are completely unrelated to subinterpreters.
> And yet, the PEP suggests that supporting the new module setup scheme
> should indicate that subinterpreters are supported as well. Given how
> complex the support seems to be for any non-trivial module, linking the two
> use cases might end up preventing us from getting any benefit at all out of
> this for quite some time.
It shouldn't be too complicated when there aren't any custom slot implementations involved - regular methods would just get a slightly different signature where the defining class is also passed in. However, I still suspect you're right that we'll end up wanting to offer "singleton mode" to allow folks to do a two step upgrade of extension modules, first to multi-phase initialisation, and then to supporting subinterpreters and reloading. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Tue Jul 28 15:52:46 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 28 Jul 2015 15:52:46 +0200 Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters References: Message-ID: <20150728155246.77fdcfe9@fsol> On Sun, 26 Jul 2015 12:39:21 +0200 Petr Viktorin wrote: > > So, what options are there for methods of extension classes to get a > hold of the module object (or module state)? The "obvious" answer is to use the C equivalent of `sys.modules[__name__]`. It should always do the right thing. However, it's also quite inefficient and cumbersome to write. Regards Antoine. From encukou at gmail.com Tue Jul 28 19:54:29 2015 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 28 Jul 2015 19:54:29 +0200 Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters In-Reply-To: References: Message-ID: On Mon, Jul 27, 2015 at 4:07 PM, Nick Coghlan wrote: > On 27 July 2015 at 16:44, Stefan Behnel wrote: >> Petr Viktorin wrote on 26.07.2015 at 12:39: >>> This is a follow-up to PEP 489 and discussions regarding per-module >>> data PyState_FindModule. >>> It turned out to be quite the rabbit hole. Apologies for the long >>> mail, I hope it ends up sufficiently clear. >>> >>> Using single-phase initialization (the pre-PEP 489 solution), >>> extension modules are effectively singletons - there's up to one >>> instance of a particular module in any given subinterpreter.
Cython >>> modules only allow one instance *per process*. >>> >>> Using the new multiple-phase init, one can create several modules from >>> one PyModuleDef - either (again) one per subinterpreter, or for >>> testing purposes. As per the goal of PEP 489, this brings extension >>> modules closer to how Python modules behave. >>> >>> The problem is that classes defined in a module don't have a reference >>> to the module object. >>> [lots of other tricky details stripped] >> >> Sorry for cutting it short here, but isn't this a hint that linking the new >> initialisation to subinterpreter support might not be a good idea? I mean, >> there are a couple of advantages of the new initialisation scheme, e.g. for >> relative imports etc., which are completely unrelated to subinterpreters. >> And yet, the PEP suggests that supporting the new module setup scheme >> should indicate that subinterpreters are supported as well. Given how >> complex the support seems to be for any non-trivial module, linking the two >> use cases might end up preventing us from getting any benefit at all out of >> this for quite some time. > > It shouldn't be too complicated when there aren't any custom slot > implementations involved - regular methods would just get a slightly > different signature where the defining class is also passed in. > > However, I still suspect you're right that we'll end up wanting to > offer "singleton mode" to allow folks to do a two step upgrade of > extension modules, first to multi-phase initialisation, and then to > supporting subinterpreters and reloading. Right, I'm starting to regret dropping the singleton flag so easily. But, for better or for worse, both singleton modules and a PyState_FindModule replacement are now material for 3.6.
From encukou at gmail.com Tue Jul 28 19:48:23 2015 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 28 Jul 2015 19:48:23 +0200 Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters In-Reply-To: <20150728155246.77fdcfe9@fsol> References: <20150728155246.77fdcfe9@fsol> Message-ID: On Tue, Jul 28, 2015 at 3:52 PM, Antoine Pitrou wrote: > On Sun, 26 Jul 2015 12:39:21 +0200 > Petr Viktorin wrote: >> >> So, what options are there for methods of extension classes to get a >> hold of the module object (or module state)? > > The "obvious" answer is to use the C equivalent of > `sys.modules[__name__]`. It should always do the right thing. However, > it's also quite inefficient and cumbersome to write. We don't have access to globals, so we can't use __name__. We can't use a hardcoded module name either: the fully-qualified name depends on the loading mechanism and/or location of the shared library in the filesystem. (Extension modules usually only hard-code the last part of the module name, without the package prefix.) Another problem is that "sys.modules" can be modified by the user. For Python programs, where you just get a LookupError when something isn't found, looking things up in sys.modules is OK. But here we're retrieving C-level state: when the wrong module is found (or nothing is found at all), a segfault is one of the better things that can happen. From ericsnowcurrently at gmail.com Wed Jul 29 19:53:49 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 29 Jul 2015 11:53:49 -0600 Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters In-Reply-To: References: Message-ID: On Jul 26, 2015 4:39 AM, "Petr Viktorin" wrote: > So it seems that extension modules that need per-module state need to > use heap types. And the heap types need a reference to "their" module. > And methods of those types need to be called with the class that > defined them. > This would be possible with regular methods.
But, consider for example > the tp_iternext signature: > > PyObject* myobj_iternext(PyObject *self) > > There's no good way for this function to get a reference to the class > it belongs to. > `Py_TYPE(self)` might be a subclass. The best way I can think of is > walking the MRO until I get to a class with tp_iter (or a class > created from "my" known PyType_Spec), but one of the requirements on > module state is that it needs to be efficient, so I'd rather avoid > walking a list. One thing I've considered for several years now, and perhaps even proposed at some point (around PEP 451?), is adding "__origin__" to objects, indicating where the object came from. "Where" would be the object (or its qualname?) associated with the scope in which the first object was created. For example, for classes this would be the module (or class/func for nested ones). Likewise, the class for methods. Something like __origin__ would help make the actual class/module explicit. I expect it would be sufficiently efficient. __qualname__ gets you something similar but less efficiently. Is __qualname__ set for extension types/functions? Note that __origin__ provides other non-import benefits as well. An alternative, could the module intra-dependencies be bound where needed? For example, with _csv could Error be added to reader.__dict__ (i.e. bound to reader)? -eric From encukou at gmail.com Wed Jul 29 20:05:38 2015 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 29 Jul 2015 20:05:38 +0200 Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters In-Reply-To: References: Message-ID: On Wed, Jul 29, 2015 at 7:53 PM, Eric Snow wrote: > > On Jul 26, 2015 4:39 AM, "Petr Viktorin" wrote: >> So it seems that extension modules that need per-module state need to >> use heap types. And the heap types need a reference to "their" module.
>> And methods of those types need to be called with the class that >> defined them. >> This would be possible with regular methods. But, consider for example >> the tp_iternext signature: >> >> PyObject* myobj_iternext(PyObject *self) >> >> There's no good way for this function to get a reference to the class >> it belongs to. >> `Py_TYPE(self)` might be a subclass. The best way I can think of is >> walking the MRO until I get to a class with tp_iter (or a class >> created from "my" known PyType_Spec), but one of the requirements on >> module state is that it needs to be efficient, so I'd rather avoid >> walking a list. > > One thing I've considered for several years now, and perhaps even proposed > at some point (around PEP 451?), is adding "__origin__" to objects, > indicating where the object came from. "Where" would be the object (or its > qualname?) associated with the scope in which the first object was created. > For example, for classes this would be the module (or class/func for nested > ones). Likewise, the class for methods. > > Something like __origin__ would help make the actual class/module explicit. > I expect it would be sufficiently efficient. __qualname__ gets you > something similar but less efficiently. Is __qualname__ set for extension > types/functions? Note that __origin__ provides other non-import benefits as > well. One thing to watch out for is reference cycles. Having a hard ref to a module is fine; modules aren't designed to be unloaded frequently so having them only collected by a full GC run is OK. But I think having each nested function link to the outside function would create too many reference cycles. And it would keep the outer function alive for the lifetime of the inner one. > An alternative, could the module intra-dependencies be bound where needed? > For example, with _csv could Error be added to reader.__dict__ (i.e. bound > to reader)? Putting a reference to the module on *classes* is not a problem.
Getting a reference to the module from normal methods should also not be that hard. The hard part is special methods, like tp_iter, which only get "self" as an argument, at the C level. There's no information about which (super)class the method is defined in. From ericsnowcurrently at gmail.com Wed Jul 29 20:57:21 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 29 Jul 2015 12:57:21 -0600 Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters In-Reply-To: References: Message-ID: On Jul 29, 2015 12:05 PM, "Petr Viktorin" wrote: > > On Wed, Jul 29, 2015 at 7:53 PM, Eric Snow wrote: > > An alternative, could the module intra-dependencies be bound where needed? > > For example, with _csv could Error be added to reader.__dict__ (i.e. bound > > to reader)? > > Putting a reference to the module on *classes* is not a problem. > Getting a reference to the module from normal methods should also not > be that hard. > > The hard part is special methods, like tp_iter, which only get "self" > as an argument, at the C level. There's no information about which > (super)class the method is defined in. The slot methods would have to do `type(self).<"global">`, which is what any other method would do. So in the relevant _csv.reader methods we would use the equivalent "type(self).Error" where needed. Let me make sure I understand the problem before I say anything more. :) In Python each function has __globals__ and __closure__ that provide externally scoped objects for use within the function. In C extension functions neither exists. Is that right? (Or is that sort of what PyState_FindModule is supposed to facilitate?) If so then that's similar to how, in Python, objects in class definitions are not exposed to other objects in the same definition. 
Both of the following result in NameErrors:

    class Spam:
        class Ham:
            OKAY = True
        class Eggs:
            OKAY = Ham.OKAY  # NameError: Ham is not visible here

    class Counter:
        DEFAULT = 0
        def next(self):
            try:
                return self.count
            except AttributeError:
                return DEFAULT  # NameError: class scope is not searched

    Counter().next()

The current solution for both Python class definition scope and C extension functions is basically the same then, right? Since you can't rely on a lookup mechanism you must explicitly bind the objects to places where they can be accessed. In the case of methods (including type slots), the module-scoped ("global") objects would have to be bound to the class (where the methods will have access to them through self). Sure, subclasses could override the class attrs, but that shouldn't be a problem. However, it sounds like you're suggesting that PyState_FindModule should be fixed, replaced, or supplemented, assuming I've understood correctly that it's part of the problem here. I'd say it depends on the actual impact of the lack of implicit Python-level scoping/lookup in C extension functions/methods. -eric From encukou at gmail.com Wed Jul 29 22:01:42 2015 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 29 Jul 2015 22:01:42 +0200 Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters In-Reply-To: References: Message-ID: On Wed, Jul 29, 2015 at 8:57 PM, Eric Snow wrote: > On Jul 29, 2015 12:05 PM, "Petr Viktorin" wrote: >> >> On Wed, Jul 29, 2015 at 7:53 PM, Eric Snow wrote: >> > An alternative, could the module intra-dependencies be bound where needed? >> > For example, with _csv could Error be added to reader.__dict__ (i.e. bound >> > to reader)? >> >> Putting a reference to the module on *classes* is not a problem. >> Getting a reference to the module from normal methods should also not >> be that hard. >> >> The hard part is special methods, like tp_iter, which only get "self" >> as an argument, at the C level. There's no information about which >> (super)class the method is defined in.
> > The slot methods would have to do `type(self).<"global">`, which is > what any other method would do. So in the relevant _csv.reader > methods we would use the equivalent "type(self).Error" where needed. We need the class that defines the method. type(self) might return a subclass of that. So we either need to walk the MRO until the defining class is found, or use Nick's mechanism and record the defining class for each relevant special method. > Let me make sure I understand the problem before I say anything more. > :) In Python each function has __globals__ and __closure__ that > provide externally scoped objects for use within the function. In C > extension functions neither exists. Is that right? (Or is that sort > of what PyState_FindModule is supposed to facilitate?) For extension functions, you can choose from a variety of calling mechanisms that evolved over the years: positional args only vs. keyword args; and functions/normal methods/class methods depending on the "special" argument (instance/class) they get. Adding an extra class argument wouldn't be too much of a problem, and neither would populating it. The problem is with special methods, like tp_iter whose signature is: PyObject *iter(PyObject *self) with no way to pass in the defining class or module. As I said above, type(self) will not always give the class where tp_iter was defined - and we need the module of the defining class. That is the missing link. Getting a module reference to everything except the special methods is relatively straightforward. [...] > However, it sounds like you're suggesting that PyState_FindModule > should be fixed, replaced, or supplemented, assuming I've understood > correctly that it's part of the problem here. I'd say it depends on > the actual impact of the lack of implicit Python-level scoping/lookup > in C extension functions/methods.
PyState_FindModule bypasses the need for passing a module reference around: you give it a PyModuleDef (static data from which a module is constructed), and from that it looks up the corresponding module in current subinterpreter state. This assumes the same set of modules is loaded in every subinterpreter. (Or at least that the required module is loaded in the current subinterpreter.) If we want to support subinterpreters safely, PyState_FindModule must go. From ericsnowcurrently at gmail.com Wed Jul 29 23:00:21 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 29 Jul 2015 15:00:21 -0600 Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters In-Reply-To: References: Message-ID: On Wed, Jul 29, 2015 at 2:01 PM, Petr Viktorin wrote: > On Wed, Jul 29, 2015 at 8:57 PM, Eric Snow wrote: >> The slot methods would have to do `type(self).<"global">`, which is >> what any other method would do. So in the relevant _csv.reader >> methods we would use the equivalent "type(self).Error" where needed. > > We need the class that defines the method. type(self) might return a > subclass of that. So we either need to walk the MRO until the defining > class is found, or use Nick's mechanism and record the defining class > for each relevant special method. If you explicitly bind the module-scoped object to the class that needs it, then the methods of that class can access it. There's no need for anything more complicated (even for type slot methods). In Python it would look like this:

    class Error(Exception):
        ...

    class Spam:
        Error = Error
        def fail(self):
            # Look up Error on Spam instead of from globals().
            raise type(self).Error()

We could even get more generic about it:

    class Spam:
        __globals__ = globals()
        def fail(self):
            raise self.__globals__["Error"]()

Obviously it's not that simple if we are trying to provide an implicit Python-style scoping lookup for C extension functions/methods...
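For comparison, the "walk the MRO until the defining class is found" option from the quoted reply can be sketched in pure Python. This is only an illustration: `defining_class` is a hypothetical helper, not an existing API, and a C slot function would have to do the equivalent walk over `tp_mro`.

```python
def defining_class(obj, method_name):
    """Return the first class in obj's MRO whose own namespace defines method_name.

    Hypothetical helper for illustration; it is not part of CPython.
    """
    for klass in type(obj).__mro__:
        if method_name in vars(klass):
            return klass
    raise AttributeError(method_name)


class Spam:
    def __iter__(self):
        return iter(())


class Eggs(Spam):
    pass


eggs = Eggs()
# type(eggs) is the subclass, while the method itself lives on Spam.
assert type(eggs) is Eggs
assert defining_class(eggs, "__iter__") is Spam
```

The walk is linear in the depth of the class hierarchy, which is the cost the efficiency requirement on module state is concerned with.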
>> However, it sounds like you're suggesting that PyState_FindModule >> should be fixed, replaced, or supplemented, assuming I've understood >> correctly that it's part of the problem here. I'd say it depends on >> the actual impact of the lack of implicit Python-level scoping/lookup >> in C extension functions/methods. > > PyState_FindModule bypasses the need for passing a module reference > around: you give it a PyModuleDef (static data from which a module is > constructed), and from that it looks up the corresponding module in > current subinterpreter state. > This assumes the same set of modules is loaded in every > subinterpreter. (Or at least that the required module is loaded in the > current subinterpreter.) > If we want to support subinterpreters safely, PyState_FindModule must go. Hence we *are* looking for any alternative lookup mechanism (effectively an equivalent to Python's scoping lookup). So my question is, is it worth it relative to C extension functions/methods? -eric From ncoghlan at gmail.com Thu Jul 30 05:30:46 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 30 Jul 2015 13:30:46 +1000 Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters In-Reply-To: References: Message-ID: On 30 July 2015 at 07:00, Eric Snow wrote: > On Wed, Jul 29, 2015 at 2:01 PM, Petr Viktorin wrote: >> On Wed, Jul 29, 2015 at 8:57 PM, Eric Snow wrote: >>> The slot methods would have to do `type(self).<"global">`, which is >>> what any other method would do. So in the relevant _csv.reader >>> methods we would use the equivalent "type(self).Error" where needed. >> >> We need the class that defines the method. type(self) might return a >> subclass of that. So we either need to walk the MRO until the defining >> class is found, or use Nick's mechanism and record the defining class >> for each relevant special method. > > If you explicitly bind the module-scoped object to the class that > needs it, then the methods of that class can access it. 
This subsequent discussion meant I realised that with the reverse lookup from slot method -> defining class stored on type objects, and a reference to the defining module similarly stored on type objects, then we can dispense entirely with my more complex "active module" idea - the slot implementation will always have access to the type, and could traverse from there to the defining module as needed. Now, the *reason* we want to enable access to the defining module is because we want to minimise the barrier to migrating from singleton extension modules to subinterpreter friendly modules, which means providing a way to do a fast lookup of the defining module given only a slot ID and a type instance. For full compatibility, we can't get away without offering *some* way of doing that - consider cases where folks want to allow rebinding (rather than mutation) of module level attributes and have that affect the behaviour of special methods. However, we also want to offer a compelling replacement for caching things in C level static variables. Eric's suggestion made me realise there may be a better way to go about addressing those *performance* related aspects of this problem: what if instead of focusing on providing fast access to the Python level module object, we instead focused on providing an indexed *PyObject pointer cache* on type instances for use by extension module method implementations? A "__typeslots__" as it were? If __typeslots__ was defined, it would be a tuple of slot field names. The named fields at the Python level would become descriptors accessing indexed slots in an underlying C level array of PyObject pointers (unlike instance slots, we could allocate this array separately from the type object, since type objects are already quite sprawling beasts, but don't generally exist in the kinds of numbers that instances do). 
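As a rough pure-Python emulation of that idea (nothing like this exists in CPython; `TypeSlotsMeta`, `_typeslot_index` and `_typeslot_array` are names made up for this sketch, and it uses metaclass attribute hooks where the real thing would use descriptors over a C array):

```python
class TypeSlotsMeta(type):
    """Hypothetical sketch: every class gets its own indexed object cache."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Slot names are inherited, so a subclass shares its base's layout.
        names = tuple(getattr(cls, "__typeslots__", ()))
        cls._typeslot_index = {n: i for i, n in enumerate(names)}
        # The array is allocated separately, one per type object.
        cls._typeslot_array = [None] * len(names)

    def __getattr__(cls, name):
        # Only reached when normal attribute lookup on the class fails.
        index = vars(cls).get("_typeslot_index", {}).get(name)
        if index is None:
            raise AttributeError(name)
        return vars(cls)["_typeslot_array"][index]

    def __setattr__(cls, name, value):
        index = vars(cls).get("_typeslot_index", {}).get(name)
        if index is None:
            super().__setattr__(name, value)
        else:
            vars(cls)["_typeslot_array"][index] = value


class Reader(metaclass=TypeSlotsMeta):
    __typeslots__ = ("module", "error_type")


class SubReader(Reader):
    pass


Reader.error_type = ValueError    # cached on the defining type
SubReader.error_type = KeyError   # the derived type has its own cache
```

In this sketch each type keeps a separate array, so a value cached on `SubReader` does not disturb the one on `Reader`, and an unset slot simply reads as `None`.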
Then if a particular extension module needs fast access to the module, or to any module attribute (and didn't need to worry about lazy lookup), then they could cache it in a typeslot on first use, and thereafter look it up by index. That way, we could walk the type hierarchy to find the defining class on first look up, and then cache it on the derived type when done. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From encukou at gmail.com Thu Jul 30 10:01:53 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 30 Jul 2015 10:01:53 +0200 Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters In-Reply-To: References: Message-ID: On Wed, Jul 29, 2015 at 11:00 PM, Eric Snow wrote: > On Wed, Jul 29, 2015 at 2:01 PM, Petr Viktorin wrote: >> On Wed, Jul 29, 2015 at 8:57 PM, Eric Snow wrote: >>> The slot methods would have to do `type(self).<"global">`, which is >>> what any other method would do. So in the relevant _csv.reader >>> methods we would use the equivalent "type(self).Error" where needed. >> >> We need the class that defines the method. type(self) might return a >> subclass of that. So we either need to walk the MRO until the defining >> class is found, or use Nick's mechanism and record the defining class >> for each relevant special method. > > If you explicitly bind the module-scoped object to the class that > needs it, then the methods of that class can access it. There's no > need for anything more complicated (even for type slot methods). In > Python it would look like this:
>
>     class Error(Exception):
>         ...
>
>     class Spam:
>         Error = Error
>         def fail(self):
>             # Look up Error on Spam instead of from globals().
>             raise type(self).Error()
>
> We could even get more generic about it:
>
>     class Spam:
>         __globals__ = globals()
>         def fail(self):
>             raise self.__globals__["Error"]()
>
> Obviously it's not that simple if we are trying to provide an implicit > Python-style scoping lookup for C extension functions/methods...

The problem is here:

base.py:

    class Error(Exception):
        "Base error"

    class Spam:
        __globals__ = globals()
        def fail(self):
            raise self.__globals__["Error"]()

other.py:

    from . import base

    Error = "different error"

    class Eggs(base.Spam):
        __globals__ = globals()

    Eggs().fail()

Some kind of namespacing is needed here - fail() needs to know that it was defined in Spam, not Eggs. This information is not passed to special methods when they're called, and their signatures can't be extended. Also, module state lookup needs to be fast - MRO walking or dict lookup would probably be too slow. >>> However, it sounds like you're suggesting that PyState_FindModule >>> should be fixed, replaced, or supplemented, assuming I've understood >>> correctly that it's part of the problem here. I'd say it depends on >>> the actual impact of the lack of implicit Python-level scoping/lookup >>> in C extension functions/methods. >> >> PyState_FindModule bypasses the need for passing a module reference >> around: you give it a PyModuleDef (static data from which a module is >> constructed), and from that it looks up the corresponding module in >> current subinterpreter state. >> This assumes the same set of modules is loaded in every >> subinterpreter. (Or at least that the required module is loaded in the >> current subinterpreter.) >> If we want to support subinterpreters safely, PyState_FindModule must go. > > Hence we *are* looking for any alternative lookup mechanism > (effectively an equivalent to Python's scoping lookup). So my > question is, is it worth it relative to C extension functions/methods?
> > -eric From encukou at gmail.com Thu Jul 30 10:13:49 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 30 Jul 2015 10:13:49 +0200 Subject: [Import-SIG] On singleton modules, heap types, and subinterpreters In-Reply-To: References: Message-ID: On Thu, Jul 30, 2015 at 5:30 AM, Nick Coghlan wrote: > On 30 July 2015 at 07:00, Eric Snow wrote: >> On Wed, Jul 29, 2015 at 2:01 PM, Petr Viktorin wrote: >>> On Wed, Jul 29, 2015 at 8:57 PM, Eric Snow wrote: >>>> The slot methods would have to do `type(self).<"global">`, which is >>>> what any other method would do. So in the relevant _csv.reader >>>> methods we would use the equivalent "type(self).Error" where needed. >>> >>> We need the class that defines the method. type(self) might return a >>> subclass of that. So we either need to walk the MRO until the defining >>> class is found, or use Nick's mechanism and record the defining class >>> for each relevant special method. >> >> If you explicitly bind the module-scoped object to the class that >> needs it, then the methods of that class can access it. > > This subsequent discussion meant I realised that with the reverse > lookup from slot method -> defining class stored on type objects, and > a reference to the defining module similarly stored on type objects, > then we can dispense entirely with my more complex "active module" > idea - the slot implementation will always have access to the type, > and could traverse from there to the defining module as needed. > > Now, the *reason* we want to enable access to the defining module is > because we want to minimise the barrier to migrating from singleton > extension modules to subinterpreter friendly modules, which means > providing a way to do a fast lookup of the defining module given only > a slot ID and a type instance. 
> > For full compatibility, we can't get away without offering *some* way > of doing that - consider cases where folks want to allow rebinding > (rather than mutation) of module level attributes and have that affect > the behaviour of special methods. > > However, we also want to offer a compelling replacement for caching > things in C level static variables. > > Eric's suggestion made me realise there may be a better way to go > about addressing those *performance* related aspects of this problem: > what if instead of focusing on providing fast access to the Python > level module object, we instead focused on providing an indexed > *PyObject pointer cache* on type instances for use by extension module > method implementations? A "__typeslots__" as it were? > > If __typeslots__ was defined, it would be a tuple of slot field names. > The named fields at the Python level would become descriptors > accessing indexed slots in an underlying C level array of PyObject > pointers (unlike instance slots, we could allocate this array > separately from the type object, since type objects are already quite > sprawling beasts, but don't generally exist in the kinds of numbers > that instances do). > > Then if a particular extension module needs fast access to the module, > or to any module attribute (and didn't need to worry about lazy > lookup), then they could cache it in a typeslot on first use, and > thereafter look it up by index. > > That way, we could walk the type hierarchy to find the defining class > on first look up, and then cache it on the derived type when done. It sounds like classes with index-based __typeslots__ would be incompatible for multiple inheritance. We're trying to help Cython classes be more like Python ones, so I think that's something to watch out for. Anyway, it's a good idea. But it's an addition; the original problem needs to be solved anyway :(
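The multiple-inheritance concern has a precedent in instance-level `__slots__`, which CPython already refuses to combine when two bases each contribute their own non-empty layout. The snippet below only demonstrates that existing behaviour; whether an index-based `__typeslots__` would hit the same restriction is an assumption by analogy.

```python
class A:
    __slots__ = ("a",)


class B:
    __slots__ = ("b",)


try:
    # Both bases define their own fixed slot layout, so CPython cannot
    # merge them into a single instance layout for C.
    class C(A, B):
        pass
except TypeError as exc:
    conflict = str(exc)
else:
    conflict = None

print(conflict)
```

Ordinary Python classes without `__slots__` combine freely, which is the behaviour an index-based scheme would need to preserve.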