From markflorisson88 at gmail.com Sun Oct 2 12:38:23 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 2 Oct 2011 11:38:23 +0100 Subject: [Cython] buffer bug Message-ID: Hey, I'm unable to login in trac, but I found a bug in the buffer support: cimport cython cimport numpy as np @cython.boundscheck(False) @cython.wraparound(False) cdef void func(np.ndarray[np.float32_t, ndim=2] a) nogil: pass This calls __Pyx_GetBufferAndValidate, which needs the GIL. When I get the last failing tests fixed (introduced after rebasing on the latest master) for memoryviews, should be transform the current buffer support to memoryviews before doing a release? The only incompatibility I see is that readonly buffers are not supported. On the other hand it might be a good idea to wait with that, in case there are any bugs. We don't want to break everyone's existing code. Opinions? Mark From d.s.seljebotn at astro.uio.no Sun Oct 2 13:04:26 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 02 Oct 2011 13:04:26 +0200 Subject: [Cython] buffer bug In-Reply-To: References: Message-ID: <4E88453A.6090702@astro.uio.no> On 10/02/2011 12:38 PM, mark florisson wrote: > Hey, > > I'm unable to login in trac, but I found a bug in the buffer support: > > cimport cython > cimport numpy as np > > @cython.boundscheck(False) > @cython.wraparound(False) > cdef void func(np.ndarray[np.float32_t, ndim=2] a) nogil: > pass > > This calls __Pyx_GetBufferAndValidate, which needs the GIL. Hmm. I thought buffers were disallowed as arguments to cdef functions? > When I get the last failing tests fixed (introduced after rebasing on > the latest master) for memoryviews, should be transform the current > buffer support to memoryviews before doing a release? The only > incompatibility I see is that readonly buffers are not supported. Do you mean readonly memoryviews? I'm not sure how much of an issue it is. NumPy arrays support being readonly, but it is not straightforward to make a NumPy array so. Eventually I guess "const int[:]" should be supported; one could do so even without allowing const anywhere else. > On the other hand it might be a good idea to wait with that, in case > there are any bugs. We don't want to break everyone's existing code. > Opinions? I think this is mostly a question of how much time you have to work on it. Transforming buffer support into memoryviews would be a new feature branch, and whether that branch is merged into next release depends on the timing for the next release I'd say. I don't think a new release has to happen in the meantime, if you want to make it before, all the better! Dag Sverre From markflorisson88 at gmail.com Sun Oct 2 13:13:05 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 2 Oct 2011 12:13:05 +0100 Subject: [Cython] buffer bug In-Reply-To: <4E88453A.6090702@astro.uio.no> References: <4E88453A.6090702@astro.uio.no> Message-ID: On 2 October 2011 12:04, Dag Sverre Seljebotn wrote: > On 10/02/2011 12:38 PM, mark florisson wrote: >> >> Hey, >> >> I'm unable to login in trac, but I found a bug in the buffer support: >> >> cimport cython >> cimport numpy as np >> >> @cython.boundscheck(False) >> @cython.wraparound(False) >> cdef void func(np.ndarray[np.float32_t, ndim=2] a) nogil: >> ? ? pass >> >> This calls __Pyx_GetBufferAndValidate, which needs the GIL. > > Hmm. I thought buffers were disallowed as arguments to cdef functions? Ah, perhaps they are, but I didn't get any error message. >> When I get the last failing tests fixed (introduced after rebasing on >> the latest master) for memoryviews, should be transform the current >> buffer support to memoryviews before doing a release? The only >> incompatibility I see is that readonly buffers are not supported. > > Do you mean readonly memoryviews? > > I'm not sure how much of an issue it is. NumPy arrays support being > readonly, but it is not straightforward to make a NumPy array so. > > Eventually I guess "const int[:]" should be supported; one could do so even > without allowing const anywhere else. Right, readonly memoryviews, I think the current buffer support supports not asking for a buffer with PyBUF_WRITABLE (when there is no item assignment in that function). Memoryviews cannot make the same assumption because they can be passed anywhere without requesting a new buffer, so they always insert PyBUF_WRITABLE in the flags when requesting the buffer. >> On the other hand it might be a good idea to wait with that, in case >> there are any bugs. We don't want to break everyone's existing code. >> Opinions? > > I think this is mostly a question of how much time you have to work on it. > Transforming buffer support into memoryviews would be a new feature branch, > and whether that branch is merged into next release depends on the timing > for the next release I'd say. I don't think a new release has to happen in > the meantime, if you want to make it before, all the better! > > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > Ok, sounds good. Let's see what happens, I'm probably going to be quite busy, but the weather forecast also mentioned rain... :) From vitja.makarov at gmail.com Sun Oct 2 19:52:23 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sun, 2 Oct 2011 21:52:23 +0400 Subject: [Cython] CyFunction refactoring plan In-Reply-To: References: <4E8556E7.7050007@behnel.de> Message-ID: 2011/9/30 mark florisson : > On 30 September 2011 07:47, Vitja Makarov wrote: >> 2011/9/30 Vitja Makarov : >>> 2011/9/30 Robert Bradshaw : >>>> On Thu, Sep 29, 2011 at 10:43 PM, Stefan Behnel wrote: >>>>> Vitja Makarov, 30.09.2011 06:41: >>>>>> >>>>>> 2011/9/28 Vitja Makarov: >>>>>>> >>>>>>> I tried to build simple plan for ongoing cython function refactoring >>>>>>> >>>>>>> * Replace assignment synthesis with SingleAssignmentNode, where LHS is >>>>>>> NameNode and RHS is PyCFunctionNode >>>>>>> * Split function body into python wrapper and C function >>>>>>> http://wiki.cython.org/enhancements/generators#Pythonfunctionrefactoring >>>>>>> >>>>>>> Then we can implement some features and optimizations: >>>>>>> >>>>>>> * Reduce difference between cdef and def functions >>>>>>> * Store runtime evaluated default values inside CyFunction, ticket #674 >>>>>>> * Implement no-args super(), ticket #696 >>>>>>> * Function call inlining >>>>>> >>>>>> If nobody don't mind I would start with first one. >>>> >>>> I would love to see this happen. >>>> >>>>> Please go ahead. :) >>>>> >>>>> Note that you will encounter some problems when enabling name assignments >>>>> for all named functions. I tried that at least once and it "didn't work", >>>>> but I didn't take the time yet to investigate them further. >>>>> >>>>> I assume you are going to work on this in your own repo? >>>> >>>> Please also coordinate with Mark's work on function dispatching for >>>> fused types. >>>> >>> >>> I assume that that fused type functions are cdef ones so I think that >>> should be easy to merge. >>> On the other hand it's better to have Mark's branch merged into master. >>> >>> Mark, what is the state of your fused types branch? >>> Is it possible to break it into smaller parts to ease reviewing and merging? >>> >> >> It seems I meant memview branch not fusedtypes. > > There are 2 pending branches, _memview_rebase, which has support for > memoryviews, and fusedtypes. The former is ready for merge, it's > waiting to be reviewed. The fused types branch needs to subclass > CyFunction (it basically modified the old binding function). There was > also some duplicate functionality there, so I thought it'd be easier > and more convenient to use the utility code loading there. > > Since it's not a strict dependency and it will be blocking progress, I > will try to find some time to get it merge-ready for master. > > But no, it does cdef, cpdef and def methods, and it has some changes > to all function nodes (FuncdefNode, CFuncdefNode and DefNode). These > changes shouldn't be major though, but the logic in FusedFuncdefNode > does differentiate between all the different functions in order to > support them. Feel free to ask me about specifics any time. > I've moved def node assignment synthesis into DefNodeAssignmentSynthesis transformation. https://github.com/vitek/cython/commit/efacfed3c9cd8216b6c2100073a9df809b76675a Instead of moving defnode into PyCFunctionNode I've inserted assignment statement right after defnode. This is much more easy and seems ok to me. -- vitja. From markflorisson88 at gmail.com Sun Oct 2 20:21:40 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 2 Oct 2011 19:21:40 +0100 Subject: [Cython] CyFunction refactoring plan In-Reply-To: References: <4E8556E7.7050007@behnel.de> Message-ID: On 2 October 2011 18:52, Vitja Makarov wrote: > 2011/9/30 mark florisson : >> On 30 September 2011 07:47, Vitja Makarov wrote: >>> 2011/9/30 Vitja Makarov : >>>> 2011/9/30 Robert Bradshaw : >>>>> On Thu, Sep 29, 2011 at 10:43 PM, Stefan Behnel wrote: >>>>>> Vitja Makarov, 30.09.2011 06:41: >>>>>>> >>>>>>> 2011/9/28 Vitja Makarov: >>>>>>>> >>>>>>>> I tried to build simple plan for ongoing cython function refactoring >>>>>>>> >>>>>>>> * Replace assignment synthesis with SingleAssignmentNode, where LHS is >>>>>>>> NameNode and RHS is PyCFunctionNode >>>>>>>> * Split function body into python wrapper and C function >>>>>>>> http://wiki.cython.org/enhancements/generators#Pythonfunctionrefactoring >>>>>>>> >>>>>>>> Then we can implement some features and optimizations: >>>>>>>> >>>>>>>> * Reduce difference between cdef and def functions >>>>>>>> * Store runtime evaluated default values inside CyFunction, ticket #674 >>>>>>>> * Implement no-args super(), ticket #696 >>>>>>>> * Function call inlining >>>>>>> >>>>>>> If nobody don't mind I would start with first one. >>>>> >>>>> I would love to see this happen. >>>>> >>>>>> Please go ahead. :) >>>>>> >>>>>> Note that you will encounter some problems when enabling name assignments >>>>>> for all named functions. I tried that at least once and it "didn't work", >>>>>> but I didn't take the time yet to investigate them further. >>>>>> >>>>>> I assume you are going to work on this in your own repo? >>>>> >>>>> Please also coordinate with Mark's work on function dispatching for >>>>> fused types. >>>>> >>>> >>>> I assume that that fused type functions are cdef ones so I think that >>>> should be easy to merge. >>>> On the other hand it's better to have Mark's branch merged into master. >>>> >>>> Mark, what is the state of your fused types branch? >>>> Is it possible to break it into smaller parts to ease reviewing and merging? >>>> >>> >>> It seems I meant memview branch not fusedtypes. >> >> There are 2 pending branches, _memview_rebase, which has support for >> memoryviews, and fusedtypes. The former is ready for merge, it's >> waiting to be reviewed. The fused types branch needs to subclass >> CyFunction (it basically modified the old binding function). There was >> also some duplicate functionality there, so I thought it'd be easier >> and more convenient to use the utility code loading there. >> >> Since it's not a strict dependency and it will be blocking progress, I >> will try to find some time to get it merge-ready for master. >> >> But no, it does cdef, cpdef and def methods, and it has some changes >> to all function nodes (FuncdefNode, CFuncdefNode and DefNode). These >> changes shouldn't be major though, but the logic in FusedFuncdefNode >> does differentiate between all the different functions in order to >> support them. Feel free to ask me about specifics any time. >> > > I've moved def node assignment synthesis into > DefNodeAssignmentSynthesis transformation. > > https://github.com/vitek/cython/commit/efacfed3c9cd8216b6c2100073a9df809b76675a > > Instead of moving defnode into PyCFunctionNode I've inserted > assignment statement right after defnode. > This is much more easy and seems ok to me. > > -- > vitja. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > Ah, I thought you were going to wait until fused types were merged. In any case, this doesn't look like it will give too many conflicts, but there will be a few. I'm currently moving CyFunction to a utility code file and making a FusedFunction subclass. From vitja.makarov at gmail.com Sun Oct 2 20:44:34 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sun, 2 Oct 2011 22:44:34 +0400 Subject: [Cython] CyFunction refactoring plan In-Reply-To: References: <4E8556E7.7050007@behnel.de> Message-ID: 2011/10/2 mark florisson : > On 2 October 2011 18:52, Vitja Makarov wrote: >> 2011/9/30 mark florisson : >>> On 30 September 2011 07:47, Vitja Makarov wrote: >>>> 2011/9/30 Vitja Makarov : >>>>> 2011/9/30 Robert Bradshaw : >>>>>> On Thu, Sep 29, 2011 at 10:43 PM, Stefan Behnel wrote: >>>>>>> Vitja Makarov, 30.09.2011 06:41: >>>>>>>> >>>>>>>> 2011/9/28 Vitja Makarov: >>>>>>>>> >>>>>>>>> I tried to build simple plan for ongoing cython function refactoring >>>>>>>>> >>>>>>>>> * Replace assignment synthesis with SingleAssignmentNode, where LHS is >>>>>>>>> NameNode and RHS is PyCFunctionNode >>>>>>>>> * Split function body into python wrapper and C function >>>>>>>>> http://wiki.cython.org/enhancements/generators#Pythonfunctionrefactoring >>>>>>>>> >>>>>>>>> Then we can implement some features and optimizations: >>>>>>>>> >>>>>>>>> * Reduce difference between cdef and def functions >>>>>>>>> * Store runtime evaluated default values inside CyFunction, ticket #674 >>>>>>>>> * Implement no-args super(), ticket #696 >>>>>>>>> * Function call inlining >>>>>>>> >>>>>>>> If nobody don't mind I would start with first one. >>>>>> >>>>>> I would love to see this happen. >>>>>> >>>>>>> Please go ahead. :) >>>>>>> >>>>>>> Note that you will encounter some problems when enabling name assignments >>>>>>> for all named functions. I tried that at least once and it "didn't work", >>>>>>> but I didn't take the time yet to investigate them further. >>>>>>> >>>>>>> I assume you are going to work on this in your own repo? >>>>>> >>>>>> Please also coordinate with Mark's work on function dispatching for >>>>>> fused types. >>>>>> >>>>> >>>>> I assume that that fused type functions are cdef ones so I think that >>>>> should be easy to merge. >>>>> On the other hand it's better to have Mark's branch merged into master. >>>>> >>>>> Mark, what is the state of your fused types branch? >>>>> Is it possible to break it into smaller parts to ease reviewing and merging? >>>>> >>>> >>>> It seems I meant memview branch not fusedtypes. >>> >>> There are 2 pending branches, _memview_rebase, which has support for >>> memoryviews, and fusedtypes. The former is ready for merge, it's >>> waiting to be reviewed. The fused types branch needs to subclass >>> CyFunction (it basically modified the old binding function). There was >>> also some duplicate functionality there, so I thought it'd be easier >>> and more convenient to use the utility code loading there. >>> >>> Since it's not a strict dependency and it will be blocking progress, I >>> will try to find some time to get it merge-ready for master. >>> >>> But no, it does cdef, cpdef and def methods, and it has some changes >>> to all function nodes (FuncdefNode, CFuncdefNode and DefNode). These >>> changes shouldn't be major though, but the logic in FusedFuncdefNode >>> does differentiate between all the different functions in order to >>> support them. Feel free to ask me about specifics any time. >>> >> >> I've moved def node assignment synthesis into >> DefNodeAssignmentSynthesis transformation. >> >> https://github.com/vitek/cython/commit/efacfed3c9cd8216b6c2100073a9df809b76675a >> >> Instead of moving defnode into PyCFunctionNode I've inserted >> assignment statement right after defnode. >> This is much more easy and seems ok to me. >> >> -- >> vitja. >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > > Ah, I thought you were going to wait until fused types were merged. In > any case, this doesn't look like it will give too many conflicts, but > there will be a few. Yeah, I just had a free time and decided to try. I think fused types should be merged first. > I'm currently moving CyFunction to a utility code > file and making a FusedFunction subclass. > That's cool! Btw, have you seen utility code related bug in hudson it happens only with py2.4? -- vitja. From markflorisson88 at gmail.com Sun Oct 2 20:57:08 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 2 Oct 2011 19:57:08 +0100 Subject: [Cython] CyFunction refactoring plan In-Reply-To: References: <4E8556E7.7050007@behnel.de> Message-ID: On 2 October 2011 19:44, Vitja Makarov wrote: > 2011/10/2 mark florisson : >> On 2 October 2011 18:52, Vitja Makarov wrote: >>> 2011/9/30 mark florisson : >>>> On 30 September 2011 07:47, Vitja Makarov wrote: >>>>> 2011/9/30 Vitja Makarov : >>>>>> 2011/9/30 Robert Bradshaw : >>>>>>> On Thu, Sep 29, 2011 at 10:43 PM, Stefan Behnel wrote: >>>>>>>> Vitja Makarov, 30.09.2011 06:41: >>>>>>>>> >>>>>>>>> 2011/9/28 Vitja Makarov: >>>>>>>>>> >>>>>>>>>> I tried to build simple plan for ongoing cython function refactoring >>>>>>>>>> >>>>>>>>>> * Replace assignment synthesis with SingleAssignmentNode, where LHS is >>>>>>>>>> NameNode and RHS is PyCFunctionNode >>>>>>>>>> * Split function body into python wrapper and C function >>>>>>>>>> http://wiki.cython.org/enhancements/generators#Pythonfunctionrefactoring >>>>>>>>>> >>>>>>>>>> Then we can implement some features and optimizations: >>>>>>>>>> >>>>>>>>>> * Reduce difference between cdef and def functions >>>>>>>>>> * Store runtime evaluated default values inside CyFunction, ticket #674 >>>>>>>>>> * Implement no-args super(), ticket #696 >>>>>>>>>> * Function call inlining >>>>>>>>> >>>>>>>>> If nobody don't mind I would start with first one. >>>>>>> >>>>>>> I would love to see this happen. >>>>>>> >>>>>>>> Please go ahead. :) >>>>>>>> >>>>>>>> Note that you will encounter some problems when enabling name assignments >>>>>>>> for all named functions. I tried that at least once and it "didn't work", >>>>>>>> but I didn't take the time yet to investigate them further. >>>>>>>> >>>>>>>> I assume you are going to work on this in your own repo? >>>>>>> >>>>>>> Please also coordinate with Mark's work on function dispatching for >>>>>>> fused types. >>>>>>> >>>>>> >>>>>> I assume that that fused type functions are cdef ones so I think that >>>>>> should be easy to merge. >>>>>> On the other hand it's better to have Mark's branch merged into master. >>>>>> >>>>>> Mark, what is the state of your fused types branch? >>>>>> Is it possible to break it into smaller parts to ease reviewing and merging? >>>>>> >>>>> >>>>> It seems I meant memview branch not fusedtypes. >>>> >>>> There are 2 pending branches, _memview_rebase, which has support for >>>> memoryviews, and fusedtypes. The former is ready for merge, it's >>>> waiting to be reviewed. The fused types branch needs to subclass >>>> CyFunction (it basically modified the old binding function). There was >>>> also some duplicate functionality there, so I thought it'd be easier >>>> and more convenient to use the utility code loading there. >>>> >>>> Since it's not a strict dependency and it will be blocking progress, I >>>> will try to find some time to get it merge-ready for master. >>>> >>>> But no, it does cdef, cpdef and def methods, and it has some changes >>>> to all function nodes (FuncdefNode, CFuncdefNode and DefNode). These >>>> changes shouldn't be major though, but the logic in FusedFuncdefNode >>>> does differentiate between all the different functions in order to >>>> support them. Feel free to ask me about specifics any time. >>>> >>> >>> I've moved def node assignment synthesis into >>> DefNodeAssignmentSynthesis transformation. >>> >>> https://github.com/vitek/cython/commit/efacfed3c9cd8216b6c2100073a9df809b76675a >>> >>> Instead of moving defnode into PyCFunctionNode I've inserted >>> assignment statement right after defnode. >>> This is much more easy and seems ok to me. >>> >>> -- >>> vitja. >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >>> >> >> Ah, I thought you were going to wait until fused types were merged. In >> any case, this doesn't look like it will give too many conflicts, but >> there will be a few. > > Yeah, I just had a free time and decided to try. I think fused types > should be merged first. > >> I'm currently moving CyFunction to a utility code >> file and making a FusedFunction subclass. >> > > That's cool! Btw, have you seen utility code related bug in hudson it > happens only with py2.4? Yeah I'll fix that, thanks for pointing it out, I don't have a 2.4 build myself. I think it's not eating unicode keys for keyword arguments. > -- > vitja. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Sun Oct 2 23:39:24 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 2 Oct 2011 22:39:24 +0100 Subject: [Cython] CyFunction refactoring plan In-Reply-To: References: <4E8556E7.7050007@behnel.de> Message-ID: On 2 October 2011 19:44, Vitja Makarov wrote: > 2011/10/2 mark florisson : >> On 2 October 2011 18:52, Vitja Makarov wrote: >>> 2011/9/30 mark florisson : >>>> On 30 September 2011 07:47, Vitja Makarov wrote: >>>>> 2011/9/30 Vitja Makarov : >>>>>> 2011/9/30 Robert Bradshaw : >>>>>>> On Thu, Sep 29, 2011 at 10:43 PM, Stefan Behnel wrote: >>>>>>>> Vitja Makarov, 30.09.2011 06:41: >>>>>>>>> >>>>>>>>> 2011/9/28 Vitja Makarov: >>>>>>>>>> >>>>>>>>>> I tried to build simple plan for ongoing cython function refactoring >>>>>>>>>> >>>>>>>>>> * Replace assignment synthesis with SingleAssignmentNode, where LHS is >>>>>>>>>> NameNode and RHS is PyCFunctionNode >>>>>>>>>> * Split function body into python wrapper and C function >>>>>>>>>> http://wiki.cython.org/enhancements/generators#Pythonfunctionrefactoring >>>>>>>>>> >>>>>>>>>> Then we can implement some features and optimizations: >>>>>>>>>> >>>>>>>>>> * Reduce difference between cdef and def functions >>>>>>>>>> * Store runtime evaluated default values inside CyFunction, ticket #674 >>>>>>>>>> * Implement no-args super(), ticket #696 >>>>>>>>>> * Function call inlining >>>>>>>>> >>>>>>>>> If nobody don't mind I would start with first one. >>>>>>> >>>>>>> I would love to see this happen. >>>>>>> >>>>>>>> Please go ahead. :) >>>>>>>> >>>>>>>> Note that you will encounter some problems when enabling name assignments >>>>>>>> for all named functions. I tried that at least once and it "didn't work", >>>>>>>> but I didn't take the time yet to investigate them further. >>>>>>>> >>>>>>>> I assume you are going to work on this in your own repo? >>>>>>> >>>>>>> Please also coordinate with Mark's work on function dispatching for >>>>>>> fused types. >>>>>>> >>>>>> >>>>>> I assume that that fused type functions are cdef ones so I think that >>>>>> should be easy to merge. >>>>>> On the other hand it's better to have Mark's branch merged into master. >>>>>> >>>>>> Mark, what is the state of your fused types branch? >>>>>> Is it possible to break it into smaller parts to ease reviewing and merging? >>>>>> >>>>> >>>>> It seems I meant memview branch not fusedtypes. >>>> >>>> There are 2 pending branches, _memview_rebase, which has support for >>>> memoryviews, and fusedtypes. The former is ready for merge, it's >>>> waiting to be reviewed. The fused types branch needs to subclass >>>> CyFunction (it basically modified the old binding function). There was >>>> also some duplicate functionality there, so I thought it'd be easier >>>> and more convenient to use the utility code loading there. >>>> >>>> Since it's not a strict dependency and it will be blocking progress, I >>>> will try to find some time to get it merge-ready for master. >>>> >>>> But no, it does cdef, cpdef and def methods, and it has some changes >>>> to all function nodes (FuncdefNode, CFuncdefNode and DefNode). These >>>> changes shouldn't be major though, but the logic in FusedFuncdefNode >>>> does differentiate between all the different functions in order to >>>> support them. Feel free to ask me about specifics any time. >>>> >>> >>> I've moved def node assignment synthesis into >>> DefNodeAssignmentSynthesis transformation. >>> >>> https://github.com/vitek/cython/commit/efacfed3c9cd8216b6c2100073a9df809b76675a >>> >>> Instead of moving defnode into PyCFunctionNode I've inserted >>> assignment statement right after defnode. >>> This is much more easy and seems ok to me. >>> >>> -- >>> vitja. >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >>> >> >> Ah, I thought you were going to wait until fused types were merged. In >> any case, this doesn't look like it will give too many conflicts, but >> there will be a few. > > Yeah, I just had a free time and decided to try. I think fused types > should be merged first. If you want you can rebase your branch on https://github.com/markflorisson88/cython/tree/fusedmerge, I'm not going to rebase that branch. It needs a few more fixes though. >> I'm currently moving CyFunction to a utility code >> file and making a FusedFunction subclass. >> > > That's cool! Btw, have you seen utility code related bug in hudson it > happens only with py2.4? > > -- > vitja. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From vitja.makarov at gmail.com Sun Oct 2 23:52:54 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Mon, 3 Oct 2011 01:52:54 +0400 Subject: [Cython] CyFunction refactoring plan In-Reply-To: References: <4E8556E7.7050007@behnel.de> Message-ID: 2011/10/3 mark florisson : > On 2 October 2011 19:44, Vitja Makarov wrote: >> 2011/10/2 mark florisson : >>> On 2 October 2011 18:52, Vitja Makarov wrote: >>>> 2011/9/30 mark florisson : >>>>> On 30 September 2011 07:47, Vitja Makarov wrote: >>>>>> 2011/9/30 Vitja Makarov : >>>>>>> 2011/9/30 Robert Bradshaw : >>>>>>>> On Thu, Sep 29, 2011 at 10:43 PM, Stefan Behnel wrote: >>>>>>>>> Vitja Makarov, 30.09.2011 06:41: >>>>>>>>>> >>>>>>>>>> 2011/9/28 Vitja Makarov: >>>>>>>>>>> >>>>>>>>>>> I tried to build simple plan for ongoing cython function refactoring >>>>>>>>>>> >>>>>>>>>>> * Replace assignment synthesis with SingleAssignmentNode, where LHS is >>>>>>>>>>> NameNode and RHS is PyCFunctionNode >>>>>>>>>>> * Split function body into python wrapper and C function >>>>>>>>>>> http://wiki.cython.org/enhancements/generators#Pythonfunctionrefactoring >>>>>>>>>>> >>>>>>>>>>> Then we can implement some features and optimizations: >>>>>>>>>>> >>>>>>>>>>> * Reduce difference between cdef and def functions >>>>>>>>>>> * Store runtime evaluated default values inside CyFunction, ticket #674 >>>>>>>>>>> * Implement no-args super(), ticket #696 >>>>>>>>>>> * Function call inlining >>>>>>>>>> >>>>>>>>>> If nobody don't mind I would start with first one. >>>>>>>> >>>>>>>> I would love to see this happen. >>>>>>>> >>>>>>>>> Please go ahead. :) >>>>>>>>> >>>>>>>>> Note that you will encounter some problems when enabling name assignments >>>>>>>>> for all named functions. I tried that at least once and it "didn't work", >>>>>>>>> but I didn't take the time yet to investigate them further. >>>>>>>>> >>>>>>>>> I assume you are going to work on this in your own repo? >>>>>>>> >>>>>>>> Please also coordinate with Mark's work on function dispatching for >>>>>>>> fused types. >>>>>>>> >>>>>>> >>>>>>> I assume that that fused type functions are cdef ones so I think that >>>>>>> should be easy to merge. >>>>>>> On the other hand it's better to have Mark's branch merged into master. >>>>>>> >>>>>>> Mark, what is the state of your fused types branch? >>>>>>> Is it possible to break it into smaller parts to ease reviewing and merging? >>>>>>> >>>>>> >>>>>> It seems I meant memview branch not fusedtypes. >>>>> >>>>> There are 2 pending branches, _memview_rebase, which has support for >>>>> memoryviews, and fusedtypes. The former is ready for merge, it's >>>>> waiting to be reviewed. The fused types branch needs to subclass >>>>> CyFunction (it basically modified the old binding function). There was >>>>> also some duplicate functionality there, so I thought it'd be easier >>>>> and more convenient to use the utility code loading there. >>>>> >>>>> Since it's not a strict dependency and it will be blocking progress, I >>>>> will try to find some time to get it merge-ready for master. >>>>> >>>>> But no, it does cdef, cpdef and def methods, and it has some changes >>>>> to all function nodes (FuncdefNode, CFuncdefNode and DefNode). These >>>>> changes shouldn't be major though, but the logic in FusedFuncdefNode >>>>> does differentiate between all the different functions in order to >>>>> support them. Feel free to ask me about specifics any time. >>>>> >>>> >>>> I've moved def node assignment synthesis into >>>> DefNodeAssignmentSynthesis transformation. >>>> >>>> https://github.com/vitek/cython/commit/efacfed3c9cd8216b6c2100073a9df809b76675a >>>> >>>> Instead of moving defnode into PyCFunctionNode I've inserted >>>> assignment statement right after defnode. >>>> This is much more easy and seems ok to me. >>>> >>>> -- >>>> vitja. >>>> _______________________________________________ >>>> cython-devel mailing list >>>> cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>>> >>> >>> Ah, I thought you were going to wait until fused types were merged. In >>> any case, this doesn't look like it will give too many conflicts, but >>> there will be a few. >> >> Yeah, I just had a free time and decided to try. I think fused types >> should be merged first. > > If you want you can rebase your branch on > https://github.com/markflorisson88/cython/tree/fusedmerge, I'm not > going to rebase that branch. It needs a few more fixes though. > Ok. I'll try tomorrow. -- vitja. From markflorisson88 at gmail.com Tue Oct 4 23:19:08 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 4 Oct 2011 22:19:08 +0100 Subject: [Cython] Utilities, cython.h, libcython Message-ID: Hey, I briefly mentioned something about this in a pull request, but maybe it deserves some actual discussion on the ML. So I propose that after fused types gets merged we try to move as many utility codes as possible to their utility code files (unless they are used in pending pull requests or other branches). Preferably this will be done in one or a few commits. How should we split up the work, any volunteers? Perhaps people who wrote certain utilities also want to move them? In that case, we should start a new branch and then merge that into master when it's done. We could actually move things before fused types get merged, as long as we don't touch binding_cfunc_utility_code. Before we go there, Stefan, do we still want to implement the header .ini style which can list dependencies and such? I personally don't care very much about it, but memoryviews and the utility loaders are merged so if someone wants to take up that job, it'd be good to do before moving the utilities. Another issue is that Cython compile time is increasing with the addition of control flow and cython utilities. If you use fused types you're also going to combinatorially add more compile time. I'm sure this came up earlier, but I really think we should have a libcython and a cython.h. libcython (a shared library) should contain any common Cython-specific code not meant to be inlined, and cython.h any types, macros and inline functions etc. This will decrease Cython and C compile time, and will also make executables smaller. This could be enabled using a command line option to Cython, as well as with distutils, eventually we may decide to make it the default (lets figure that out later). Preferably libcython.so would be installed alongside libpython.so and cython.h inside the Python include directory. Assuming multiple versions of Cython and multiple Python installations, we'd need to come up with a versioning scheme for either. We could also provide a static library there, for users who want to link and ship a compiled and statically linked version of their code. For a local Cython that isn't built, we can ignore the header and shared library option and issue a warning or some such. Lastly, I think we also should figure out a way to serialize Entry objects from CythonUtilities, which could easily and swiftly be loaded when creating the cython scope. It's quite a pain to declare all entries for utilities you write manually, so what I mostly did was parse the utility up to and including AnalyseDeclarationsTransform, and then retrieve the entries from there. Thoughts? Mark From robertwb at math.washington.edu Wed Oct 5 02:46:18 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 4 Oct 2011 17:46:18 -0700 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: Message-ID: On Tue, Oct 4, 2011 at 2:19 PM, mark florisson wrote: > Hey, > > I briefly mentioned something about this in a pull request, but maybe > it deserves some actual discussion on the ML. > > So I propose that after fused types gets merged we try to move as many > utility codes as possible to their utility code files (unless they are > used in pending pull requests or other branches). Preferably this will > be done in one or a few commits. How should we split up the work, any > volunteers? Perhaps people who wrote certain utilities also want to > move them? In that case, we should start a new branch and then merge > that into master when it's done. > We could actually move things before fused types get merged, as long > as we don't touch binding_cfunc_utility_code. +1 to moving towards this, but I don't see the urgency or need to do it all at once (though if there's going to be a big push, lets coordinate on a wiki or trac). > Before we go there, Stefan, do we still want to implement the header > .ini style which can list dependencies and such? I personally don't > care very much about it, but memoryviews and the utility loaders are > merged so if someone wants to take up that job, it'd be good to do > before moving the utilities. > > Another issue is that Cython compile time is increasing with the > addition of control flow and cython utilities. If you use fused types > you're also going to combinatorially add more compile time. Yeah, this was especially obvious with, e.g. cython.compile(...). (In particular, some utility code was being parsed before it could even figure out whether it needed to do a full re-compile...) > I'm sure > this came up earlier, but I really think we should have a libcython > and a cython.h. libcython (a shared library) should contain any common > Cython-specific code not meant to be inlined, and cython.h any types, > macros and inline functions etc. This will decrease Cython and C > compile time, and will also make executables smaller. +1. Yes, we talked about this earlier, but nothing concrete was planned. It's probably worth a CEP, if anything to have a concrete plan recorded somewhere other than a series of mailing list threads (though discussion tends to work best here). > This could be > enabled using a command line option to Cython, as well as with > distutils, eventually we may decide to make it the default (lets > figure that out later). Preferably libcython.so would be installed > alongside libpython.so and cython.h inside the Python include > directory. Assuming multiple versions of Cython and multiple Python > installations, we'd need to come up with a versioning scheme for > either. I would propose a cython.h file that sits in Cython/Compiler/Include (or similar), as a first step. The .pyx -> .c pass could be configured to copy this to a specific location (for shipping just the generated .c files). One option is to build the shared library as a companion _cython_x_y_z.so module which, while not as efficient as linking at the C level, would probably be much simpler to implement in a cross-platform way. (This perhaps merits some benchmarks, but the main contents is likely to be things like shared classes and objects.) Actually linking .so files from modules that cimport each other would be a nice feature down the road anyways. Again, the associated .c file could be (optionally) generated/copied during the .pyx -> .c step. Installation would determine if the required module exists, and if not build and install it. > We could also provide a static library there, for users who want to > link and ship a compiled and statically linked version of their code. > For a local Cython that isn't built, we can ignore the header and > shared library option and issue a warning or some such. > > Lastly, I think we also should figure out a way to serialize Entry > objects from CythonUtilities, which could easily and swiftly be loaded > when creating the cython scope. It's quite a pain to declare all > entries for utilities you write manually, so what I mostly did was > parse the utility up to and including AnalyseDeclarationsTransform, > and then retrieve the entries from there. This would be really nice too. Way back in the day I did some work with trying to pickle full module scopes, but that soon became too painful as there are so many far-reaching references. Pickling individual Entries and re-building modules will probably be a more tractable goal. Eventually, I'd like to see a way to cache the full pxd pipeline. - Robert From stefan_ml at behnel.de Wed Oct 5 09:16:24 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 05 Oct 2011 09:16:24 +0200 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: Message-ID: <4E8C0448.6010204@behnel.de> mark florisson, 04.10.2011 23:19: > So I propose that after fused types gets merged we try to move as many > utility codes as possible to their utility code files (unless they are > used in pending pull requests or other branches). Preferably this will > be done in one or a few commits. How should we split up the work I would propose that new utility code gets moved out into utility files right away (if doable, given the current state of the infrastructure), and that existing utility code gets moves when it gets modified or when someone feels like it. Until we really get to the point of wanting to create a separate shared library etc., there's no need to hurry with the move. > We could actually move things before fused types get merged, as long > as we don't touch binding_cfunc_utility_code. Another reason not to hurry, right? > Before we go there, Stefan, do we still want to implement the header > .ini style which can list dependencies and such? I think we'll eventually need that, but that also depends a bit on the question whether we want to (or can) build a shared library or not. See below. > Another issue is that Cython compile time is increasing with the > addition of control flow and cython utilities. If you use fused types > you're also going to combinatorially add more compile time. I don't see that locally - a compiled Cython is hugely fast for me. In comparison, the C compiler literally takes ages to compile the result. An external shared library may or may not help with both - in particular, it is not clear to me what makes the C compiler slow. If the compile time is dominated by the number of inlined functions (which is not unlikely), a shared library + header file will not make a difference. > I'm sure > this came up earlier, but I really think we should have a libcython > and a cython.h. libcython (a shared library) should contain any common > Cython-specific code not meant to be inlined, and cython.h any types, > macros and inline functions etc. This has a couple of implications though. In order to support this on the user side, we have to build one shared library per installed package in order to avoid any Cython versioning issues. Just installing a versioned "libcython_x.y.z.so" globally isn't enough, especially during development, but also at deployment time. Different packages may use different CFLAGS or Cython options, which may have an impact on the result. Encoding all possible factors in the file name will be cumbersome and may mean that we still end up with a number of installed Cython libraries that correlates with the number of installed Cython based packages. Next, we may not know at build time which set of Cython modules is in the package. This may be less of an issue if we rely on "cythonize()" in setup.py to compile all modules before hand (assuming that the user doesn't call it twice, once for *.pyx, once for *.py, for example), but even if we know all modules, we'd still have to figure out the complete set of utility code used by all modules in order to build an adapted library with only the necessary code used in the package. So we'd always end up with a complete library with all utility code, which is only really interesting for larger packages with several Cython modules. I agree with Robert that a CEP would be needed for this, both for clearing up the implications and actual use cases (I know that Sage is a reasonable use case, but it's also a rather special case). > This will decrease Cython and C > compile time, and will also make executables smaller. I don't see how this actually impacts executables. However, a self-contained executable is a value in itself. > This could be > enabled using a command line option to Cython, as well as with > distutils, eventually we may decide to make it the default (lets > figure that out later). Preferably libcython.so would be installed > alongside libpython.so and cython.h inside the Python include > directory. I don't see this happening. It's easy for Python (there is only one Python running at a time, with one libpython loaded), but it's a lot less safe for different versions of a Cython library that are used by different modules inside of the running Python. For example, we'd have to version all visible symbols in operating systems with flat namespaces, in order to support loading multiple versions of the library. > Lastly, I think we also should figure out a way to serialize Entry > objects from CythonUtilities, which could easily and swiftly be loaded > when creating the cython scope. It's quite a pain to declare all > entries for utilities you write manually Why would you declare them manually? I thought everything would be moved out into the utility code files? > so what I mostly did was > parse the utility up to and including AnalyseDeclarationsTransform, > and then retrieve the entries from there. Sounds like a drawback regarding the processing time, but may still be a reasonable way to do it. I would expect that it won't be hard to pickle the resulting dict of entries into a cache file and rebuild it only when one of the utility files changes. Stefan From robertwb at math.washington.edu Wed Oct 5 09:38:26 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 5 Oct 2011 00:38:26 -0700 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: <4E8C0448.6010204@behnel.de> References: <4E8C0448.6010204@behnel.de> Message-ID: On Wed, Oct 5, 2011 at 12:16 AM, Stefan Behnel wrote: > mark florisson, 04.10.2011 23:19: >> >> So I propose that after fused types gets merged we try to move as many >> utility codes as possible to their utility code files (unless they are >> used in pending pull requests or other branches). Preferably this will >> be done in one or a few commits. How should we split up the work > > I would propose that new utility code gets moved out into utility files > right away (if doable, given the current state of the infrastructure), and > that existing utility code gets moves when it gets modified or when someone > feels like it. Until we really get to the point of wanting to create a > separate shared library etc., there's no need to hurry with the move. > > >> We could actually move things before fused types get merged, as long >> as we don't touch binding_cfunc_utility_code. > > Another reason not to hurry, right? > > >> Before we go there, Stefan, do we still want to implement the header >> .ini style which can list dependencies and such? > > I think we'll eventually need that, but that also depends a bit on the > question whether we want to (or can) build a shared library or not. See > below. > > >> Another issue is that Cython compile time is increasing with the >> addition of control flow and cython utilities. If you use fused types >> you're also going to combinatorially add more compile time. > > I don't see that locally - a compiled Cython is hugely fast for me. In > comparison, the C compiler literally takes ages to compile the result. An > external shared library may or may not help with both - in particular, it is > not clear to me what makes the C compiler slow. If the compile time is > dominated by the number of inlined functions (which is not unlikely), a > shared library + header file will not make a difference. > > >> I'm sure >> this came up earlier, but I really think we should have a libcython >> and a cython.h. libcython (a shared library) should contain any common >> Cython-specific code not meant to be inlined, and cython.h any types, >> macros and inline functions etc. > > This has a couple of implications though. In order to support this on the > user side, we have to build one shared library per installed package in > order to avoid any Cython versioning issues. Just installing a versioned > "libcython_x.y.z.so" globally isn't enough, especially during development, > but also at deployment time. Different packages may use different CFLAGS or > Cython options, which may have an impact on the result. Encoding all > possible factors in the file name will be cumbersome and may mean that we > still end up with a number of installed Cython libraries that correlates > with the number of installed Cython based packages. That's a good point. Perhaps an easier first target is to have one "libcython" per package (with a randomized or project-specific name). Longer-term, I think the goal of one libcython per version is a reasonable one, for deployment at least. Exceptional packages (e.g. that require a special set of CFLAGS rather than the ones Python was built with) can either bundle their own or forgo any sharing of code as it is done now, and features that can't be easily normalized across (cython and c) compilation options would remain in project-specific generated .c files. > Next, we may not know at build time which set of Cython modules is in the > package. This may be less of an issue if we rely on "cythonize()" in > setup.py to compile all modules before hand (assuming that the user doesn't > call it twice, once for *.pyx, once for *.py, for example), but even if we > know all modules, we'd still have to figure out the complete set of utility > code used by all modules in order to build an adapted library with only the > necessary code used in the package. So we'd always end up with a complete > library with all utility code, which is only really interesting for larger > packages with several Cython modules. Yes, I'm thinking we would create relatively complete libraries, though if we did things on a per package level perhaps we could do some pruning. We could still conditionally put some of the utility code (especially the rarely used or shared stuff) into each module. > I agree with Robert that a CEP would be needed for this, both for clearing > up the implications and actual use cases (I know that Sage is a reasonable > use case, but it's also a rather special case). > > >> This will decrease Cython and C >> compile time, and will also make executables smaller. > > I don't see how this actually impacts executables. However, a self-contained > executable is a value in itself. As an example, we're starting to have full utility types, e.g. for generators and or CyFunction. Lots of the utility code (e.g. loading modules, raising exceptions, etc.) could be shared as well. For something like Sage that could be a significant savings, and it could be a big boon for cython.inline as well. >> This could be >> enabled using a command line option to Cython, as well as with >> distutils, eventually we may decide to make it the default (lets >> figure that out later). Preferably libcython.so would be installed >> alongside libpython.so and cython.h inside the Python include >> directory. > > I don't see this happening. It's easy for Python (there is only one Python > running at a time, with one libpython loaded), but it's a lot less safe for > different versions of a Cython library that are used by different modules > inside of the running Python. For example, we'd have to version all visible > symbols in operating systems with flat namespaces, in order to support > loading multiple versions of the library. Which is another advantage to "linking" via the cimport mechanisms. >> Lastly, I think we also should figure out a way to serialize Entry >> objects from CythonUtilities, which could easily and swiftly be loaded >> when creating the cython scope. It's quite a pain to declare all >> entries for utilities you write manually > > Why would you declare them manually? I thought everything would be moved out > into the utility code files? > > >> so what I mostly did was >> parse the utility up to and including AnalyseDeclarationsTransform, >> and then retrieve the entries from there. > > Sounds like a drawback regarding the processing time, but may still be a > reasonable way to do it. I would expect that it won't be hard to pickle the > resulting dict of entries into a cache file and rebuild it only when one of > the utility files changes. +1 It'd be great to be able to do this for the many .pxd files in Sage as well. Parsing .pxd files is a huge portion of the compilation of the Sage library. - Robert From robertwb at math.washington.edu Wed Oct 5 09:45:25 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 5 Oct 2011 00:45:25 -0700 Subject: [Cython] [cython-users] Re: callback function pointer problem In-Reply-To: <588dc249-8f0b-49f2-bf42-23978ea95ddf@email.android.com> References: <4E835336.1060800@gmail.com> <4E8398B5.6050905@gmail.com> <4E8421D0.5010007@gmail.com> <4E844BD0.4040207@gmail.com> <4E845B24.6060102@astro.uio.no> <4E845BAE.307@astro.uio.no> <4E8460E8.5050701@gmail.com> <588dc249-8f0b-49f2-bf42-23978ea95ddf@email.android.com> Message-ID: On Fri, Sep 30, 2011 at 2:14 PM, Dag Sverre Seljebotn wrote: > Are you saying that when coercing a struct to an object, one would copy > scalar fields by value but reference array fields? -1, that would be > confusing. Either the whole struct through a view, or copy it all. +1 > It bothers me that structs are passed by value in Cython, but it seems > impossible to change that now. (i.e, once upon a time one could have > required the use of a copy method to do a struct assignment and give a > syntax error otherwise, which would have worked nicer with Python > semantics). Of course, to do otherwise would have resulted in "pure C" code behaving very differently from C and messy issues like "cdef int f(struct_type a)" either meaning different things in an extern block or not mapping to the "obvious" C signature. On this note, eventually I would like coerce structs (and unions, enums) to auto-generated wrapper classes, visible in the Python module namespace if one declares them as "cpdef struct ..." (even if they're extern). - Robert From markflorisson88 at gmail.com Wed Oct 5 15:52:19 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 5 Oct 2011 14:52:19 +0100 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: Message-ID: On 5 October 2011 01:46, Robert Bradshaw wrote: > On Tue, Oct 4, 2011 at 2:19 PM, mark florisson > wrote: >> Hey, >> >> I briefly mentioned something about this in a pull request, but maybe >> it deserves some actual discussion on the ML. >> >> So I propose that after fused types gets merged we try to move as many >> utility codes as possible to their utility code files (unless they are >> used in pending pull requests or other branches). Preferably this will >> be done in one or a few commits. How should we split up the work, any >> volunteers? Perhaps people who wrote certain utilities also want to >> move them? In that case, we should start a new branch and then merge >> that into master when it's done. >> We could actually move things before fused types get merged, as long >> as we don't touch binding_cfunc_utility_code. > > +1 to moving towards this, but I don't see the urgency or need to do > it all at once (though if there's going to be a big push, lets > coordinate on a wiki or trac). Hm, perhaps there is no strict need to hurry, as long as we take care not to modify utilities after they have been moved. The wiki could be great for that, but I personally don't keep track of everyone's branches, so I don't know which utility is modified by whom (if at all), so strictly speaking (to avoid painful merges) I'd have to ask everyone each time I wanted to move something, or dig through everyone's branches. >> Before we go there, Stefan, do we still want to implement the header >> .ini style which can list dependencies and such? I personally don't >> care very much about it, but memoryviews and the utility loaders are >> merged so if someone wants to take up that job, it'd be good to do >> before moving the utilities. >> >> Another issue is that Cython compile time is increasing with the >> addition of control flow and cython utilities. If you use fused types >> you're also going to combinatorially add more compile time. > > Yeah, this was especially obvious with, e.g. cython.compile(...). (In > particular, some utility code was being parsed before it could even > figure out whether it needed to do a full re-compile...) > >> I'm sure >> this came up earlier, but I really think we should have a libcython >> and a cython.h. libcython (a shared library) should contain any common >> Cython-specific code not meant to be inlined, and cython.h any types, >> macros and inline functions etc. This will decrease Cython and C >> compile time, and will also make executables smaller. > > +1. Yes, we talked about this earlier, but nothing concrete was > planned. It's probably worth a CEP, if anything to have a concrete > plan recorded somewhere other than a series of mailing list threads > (though discussion tends to work best here). > >> This could be >> enabled using a command line option to Cython, as well as with >> distutils, eventually we may decide to make it the default (lets >> figure that out later). Preferably libcython.so would be installed >> alongside libpython.so and cython.h inside the Python include >> directory. Assuming multiple versions of Cython and multiple Python >> installations, we'd need to come up with a versioning scheme for >> either. > > I would propose a cython.h file that sits in Cython/Compiler/Include > (or similar), as a first step. The .pyx -> .c pass could be configured > to copy this to a specific location (for shipping just the generated > .c files). That would be fine as well. It might be convenient for users in that case if we could provide a cython.get_include() in addition to the distutils hooks, and a cython-config script. > One option is to build the shared library as a companion > _cython_x_y_z.so module which, while not as efficient as linking at > the C level, would probably be much simpler to implement in a > cross-platform way. (This perhaps merits some benchmarks, but the main > contents is likely to be things like shared classes and objects.) > Actually linking .so files from modules that cimport each other would > be a nice feature down the road anyways. Again, the associated .c file > could be (optionally) generated/copied during the .pyx -> .c step. > Installation would determine if the required module exists, and if not > build and install it. Hm, that's a really good idea. I think the only overhead would be the capsule unpacking and pointer duplication, but that shouldn't suddenly be an issue. That means we don't have to do any versioning of the libraries and the symbols to avoid clashes in a flat namespaces as Stefan mentioned. >> We could also provide a static library there, for users who want to >> link and ship a compiled and statically linked version of their code. >> For a local Cython that isn't built, we can ignore the header and >> shared library option and issue a warning or some such. >> >> Lastly, I think we also should figure out a way to serialize Entry >> objects from CythonUtilities, which could easily and swiftly be loaded >> when creating the cython scope. It's quite a pain to declare all >> entries for utilities you write manually, so what I mostly did was >> parse the utility up to and including AnalyseDeclarationsTransform, >> and then retrieve the entries from there. > > This would be really nice too. Way back in the day I did some work > with trying to pickle full module scopes, but that soon became too > painful as there are so many far-reaching references. Pickling > individual Entries and re-building modules will probably be a more > tractable goal. Eventually, I'd like to see a way to cache the full > pxd pipeline. > > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Wed Oct 5 15:53:24 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 5 Oct 2011 14:53:24 +0100 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: <4E8C0448.6010204@behnel.de> References: <4E8C0448.6010204@behnel.de> Message-ID: On 5 October 2011 08:16, Stefan Behnel wrote: > mark florisson, 04.10.2011 23:19: >> >> So I propose that after fused types gets merged we try to move as many >> utility codes as possible to their utility code files (unless they are >> used in pending pull requests or other branches). Preferably this will >> be done in one or a few commits. How should we split up the work > > I would propose that new utility code gets moved out into utility files > right away (if doable, given the current state of the infrastructure), and > that existing utility code gets moves when it gets modified or when someone > feels like it. Until we really get to the point of wanting to create a > separate shared library etc., there's no need to hurry with the move. > > >> We could actually move things before fused types get merged, as long >> as we don't touch binding_cfunc_utility_code. > > Another reason not to hurry, right? > > >> Before we go there, Stefan, do we still want to implement the header >> .ini style which can list dependencies and such? > > I think we'll eventually need that, but that also depends a bit on the > question whether we want to (or can) build a shared library or not. See > below. > > >> Another issue is that Cython compile time is increasing with the >> addition of control flow and cython utilities. If you use fused types >> you're also going to combinatorially add more compile time. > > I don't see that locally - a compiled Cython is hugely fast for me. In > comparison, the C compiler literally takes ages to compile the result. An > external shared library may or may not help with both - in particular, it is > not clear to me what makes the C compiler slow. If the compile time is > dominated by the number of inlined functions (which is not unlikely), a > shared library + header file will not make a difference. > Have you tried with the memoryviews merged? e.g. if I have this code: from libc.stdlib cimport malloc cdef int[:] slice = malloc(sizeof(int) * 10) [0] [14:45] ~ ? time cython test.pyx cython test.pyx 2.61s user 0.08s system 99% cpu 2.695 total [0] [14:45] ~ ? time zsh compile zsh compile 1.88s user 0.06s system 99% cpu 1.946 total where 'compile' is the script that invoked the same gcc command distutils uses. As you can see it took more than 2.5 seconds to compile this code (simply because the memoryview utilities get included). The C compiler does it quite a lot faster here. This obviously depends largely on your code, you get probably have it the other way around as well. >> I'm sure >> this came up earlier, but I really think we should have a libcython >> and a cython.h. libcython (a shared library) should contain any common >> Cython-specific code not meant to be inlined, and cython.h any types, >> macros and inline functions etc. > > This has a couple of implications though. In order to support this on the > user side, we have to build one shared library per installed package in > order to avoid any Cython versioning issues. Just installing a versioned > "libcython_x.y.z.so" globally isn't enough, especially during development, > but also at deployment time. Different packages may use different CFLAGS or > Cython options, which may have an impact on the result. Encoding all > possible factors in the file name will be cumbersome and may mean that we > still end up with a number of installed Cython libraries that correlates > with the number of installed Cython based packages. Hm, I think the CFLAGS are important so long as they are compatible with Python. When the user compiles a Cython extension module with extra CFLAGS, this doesn't affect libpython. Similarly, the Cython utilities are really not the user's responsibility, so libcython doesn't need to be compiled with the same flags as the extension module. If still wanted, the user could either recompile python with different CFLAGS (which means libcython will get those as well), or not use libcython at all. CFLAGS should really only pertain to user code, not to the Cython library, which the user shouldn't be concerned about. > Next, we may not know at build time which set of Cython modules is in the > package. This may be less of an issue if we rely on "cythonize()" in > setup.py to compile all modules before hand (assuming that the user doesn't > call it twice, once for *.pyx, once for *.py, for example), but even if we > know all modules, we'd still have to figure out the complete set of utility > code used by all modules in order to build an adapted library with only the > necessary code used in the package. So we'd always end up with a complete > library with all utility code, which is only really interesting for larger > packages with several Cython modules. > I agree with Robert that a CEP would be needed for this, both for clearing > up the implications and actual use cases (I know that Sage is a reasonable > use case, but it's also a rather special case). > > >> This will decrease Cython and C >> compile time, and will also make executables smaller. > > I don't see how this actually impacts executables. However, a self-contained > executable is a value in itself. > > >> This could be >> enabled using a command line option to Cython, as well as with >> distutils, eventually we may decide to make it the default (lets >> figure that out later). Preferably libcython.so would be installed >> alongside libpython.so and cython.h inside the Python include >> directory. > > I don't see this happening. It's easy for Python (there is only one Python > running at a time, with one libpython loaded), but it's a lot less safe for > different versions of a Cython library that are used by different modules > inside of the running Python. For example, we'd have to version all visible > symbols in operating systems with flat namespaces, in order to support > loading multiple versions of the library. > > >> Lastly, I think we also should figure out a way to serialize Entry >> objects from CythonUtilities, which could easily and swiftly be loaded >> when creating the cython scope. It's quite a pain to declare all >> entries for utilities you write manually > > Why would you declare them manually? I thought everything would be moved out > into the utility code files? > Right, the code is in the utility files. However, the cython scope needs to have the entries of the classes and functions of the utilities. e.g. the user may write cimport cython cdef cython.array myobject For this to work, we need an 'array' entry, which we don't have yet, as the utility code will be parsed at code generation time if an entry of that utility code (which doesn't exist yet!) is used. >> so what I mostly did was >> parse the utility up to and including AnalyseDeclarationsTransform, >> and then retrieve the entries from there. > > Sounds like a drawback regarding the processing time, but may still be a > reasonable way to do it. I would expect that it won't be hard to pickle the > resulting dict of entries into a cache file and rebuild it only when one of > the utility files changes. Exactly. I'm not sure about pickle though, but the details don't matter. Pickle is certainly easy as long as you don't change your interface (which we most certainly will, though). > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Wed Oct 5 15:54:02 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 5 Oct 2011 14:54:02 +0100 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: <4E8C0448.6010204@behnel.de> Message-ID: On 5 October 2011 08:38, Robert Bradshaw wrote: > On Wed, Oct 5, 2011 at 12:16 AM, Stefan Behnel wrote: >> mark florisson, 04.10.2011 23:19: >>> >>> So I propose that after fused types gets merged we try to move as many >>> utility codes as possible to their utility code files (unless they are >>> used in pending pull requests or other branches). Preferably this will >>> be done in one or a few commits. How should we split up the work >> >> I would propose that new utility code gets moved out into utility files >> right away (if doable, given the current state of the infrastructure), and >> that existing utility code gets moves when it gets modified or when someone >> feels like it. Until we really get to the point of wanting to create a >> separate shared library etc., there's no need to hurry with the move. >> >> >>> We could actually move things before fused types get merged, as long >>> as we don't touch binding_cfunc_utility_code. >> >> Another reason not to hurry, right? >> >> >>> Before we go there, Stefan, do we still want to implement the header >>> .ini style which can list dependencies and such? >> >> I think we'll eventually need that, but that also depends a bit on the >> question whether we want to (or can) build a shared library or not. See >> below. >> >> >>> Another issue is that Cython compile time is increasing with the >>> addition of control flow and cython utilities. If you use fused types >>> you're also going to combinatorially add more compile time. >> >> I don't see that locally - a compiled Cython is hugely fast for me. In >> comparison, the C compiler literally takes ages to compile the result. An >> external shared library may or may not help with both - in particular, it is >> not clear to me what makes the C compiler slow. If the compile time is >> dominated by the number of inlined functions (which is not unlikely), a >> shared library + header file will not make a difference. >> >> >>> I'm sure >>> this came up earlier, but I really think we should have a libcython >>> and a cython.h. libcython (a shared library) should contain any common >>> Cython-specific code not meant to be inlined, and cython.h any types, >>> macros and inline functions etc. >> >> This has a couple of implications though. In order to support this on the >> user side, we have to build one shared library per installed package in >> order to avoid any Cython versioning issues. Just installing a versioned >> "libcython_x.y.z.so" globally isn't enough, especially during development, >> but also at deployment time. Different packages may use different CFLAGS or >> Cython options, which may have an impact on the result. Encoding all >> possible factors in the file name will be cumbersome and may mean that we >> still end up with a number of installed Cython libraries that correlates >> with the number of installed Cython based packages. > > That's a good point. Perhaps an easier first target is to have one > "libcython" per package (with a randomized or project-specific name). > Longer-term, I think the goal of one libcython per version is a > reasonable one, for deployment at least. Exceptional packages (e.g. > that require a special set of CFLAGS rather than the ones Python was > built with) can either bundle their own or forgo any sharing of code > as it is done now, and features that can't be easily normalized across > (cython and c) compilation options would remain in project-specific > generated .c files. > >> Next, we may not know at build time which set of Cython modules is in the >> package. This may be less of an issue if we rely on "cythonize()" in >> setup.py to compile all modules before hand (assuming that the user doesn't >> call it twice, once for *.pyx, once for *.py, for example), but even if we >> know all modules, we'd still have to figure out the complete set of utility >> code used by all modules in order to build an adapted library with only the >> necessary code used in the package. So we'd always end up with a complete >> library with all utility code, which is only really interesting for larger >> packages with several Cython modules. > > Yes, I'm thinking we would create relatively complete libraries, > though if we did things on a per package level perhaps we could do > some pruning. We could still conditionally put some of the utility > code (especially the rarely used or shared stuff) into each module. Yeah that would be nice. I actually think we shouldn't do anything on a per-package level, only a bunch of modules with related stuff (conversion utilities/exception raising etc in one module, buffer/memoryview utilities in another etc). We've been living with huge files since now, I don't think we suddenly need to actively start pruning for a little bit of memory. I think the module approach would also be easy to implement, as the infrastructure for external cdef functions/classes importing/exporting is already there. >> I agree with Robert that a CEP would be needed for this, both for clearing >> up the implications and actual use cases (I know that Sage is a reasonable >> use case, but it's also a rather special case). >> >> >>> This will decrease Cython and C >>> compile time, and will also make executables smaller. >> >> I don't see how this actually impacts executables. However, a self-contained >> executable is a value in itself. > > As an example, we're starting to have full utility types, e.g. for > generators and or CyFunction. Lots of the utility code (e.g. loading > modules, raising exceptions, etc.) could be shared as well. For > something like Sage that could be a significant savings, and it could > be a big boon for cython.inline as well. > >>> This could be >>> enabled using a command line option to Cython, as well as with >>> distutils, eventually we may decide to make it the default (lets >>> figure that out later). Preferably libcython.so would be installed >>> alongside libpython.so and cython.h inside the Python include >>> directory. >> >> I don't see this happening. It's easy for Python (there is only one Python >> running at a time, with one libpython loaded), but it's a lot less safe for >> different versions of a Cython library that are used by different modules >> inside of the running Python. For example, we'd have to version all visible >> symbols in operating systems with flat namespaces, in order to support >> loading multiple versions of the library. > > Which is another advantage to "linking" via the cimport mechanisms. > >>> Lastly, I think we also should figure out a way to serialize Entry >>> objects from CythonUtilities, which could easily and swiftly be loaded >>> when creating the cython scope. It's quite a pain to declare all >>> entries for utilities you write manually >> >> Why would you declare them manually? I thought everything would be moved out >> into the utility code files? >> >> >>> so what I mostly did was >>> parse the utility up to and including AnalyseDeclarationsTransform, >>> and then retrieve the entries from there. >> >> Sounds like a drawback regarding the processing time, but may still be a >> reasonable way to do it. I would expect that it won't be hard to pickle the >> resulting dict of entries into a cache file and rebuild it only when one of >> the utility files changes. > > +1 > > It'd be great to be able to do this for the many .pxd files in Sage as > well. Parsing .pxd files is a huge portion of the compilation of the > Sage library. > > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Wed Oct 5 16:18:11 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 5 Oct 2011 15:18:11 +0100 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: <4E8C0448.6010204@behnel.de> Message-ID: On 5 October 2011 14:54, mark florisson wrote: > On 5 October 2011 08:38, Robert Bradshaw wrote: >> On Wed, Oct 5, 2011 at 12:16 AM, Stefan Behnel wrote: >>> mark florisson, 04.10.2011 23:19: >>>> >>>> So I propose that after fused types gets merged we try to move as many >>>> utility codes as possible to their utility code files (unless they are >>>> used in pending pull requests or other branches). Preferably this will >>>> be done in one or a few commits. How should we split up the work >>> >>> I would propose that new utility code gets moved out into utility files >>> right away (if doable, given the current state of the infrastructure), and >>> that existing utility code gets moves when it gets modified or when someone >>> feels like it. Until we really get to the point of wanting to create a >>> separate shared library etc., there's no need to hurry with the move. >>> >>> >>>> We could actually move things before fused types get merged, as long >>>> as we don't touch binding_cfunc_utility_code. >>> >>> Another reason not to hurry, right? >>> >>> >>>> Before we go there, Stefan, do we still want to implement the header >>>> .ini style which can list dependencies and such? >>> >>> I think we'll eventually need that, but that also depends a bit on the >>> question whether we want to (or can) build a shared library or not. See >>> below. >>> >>> >>>> Another issue is that Cython compile time is increasing with the >>>> addition of control flow and cython utilities. If you use fused types >>>> you're also going to combinatorially add more compile time. >>> >>> I don't see that locally - a compiled Cython is hugely fast for me. In >>> comparison, the C compiler literally takes ages to compile the result. An >>> external shared library may or may not help with both - in particular, it is >>> not clear to me what makes the C compiler slow. If the compile time is >>> dominated by the number of inlined functions (which is not unlikely), a >>> shared library + header file will not make a difference. >>> >>> >>>> I'm sure >>>> this came up earlier, but I really think we should have a libcython >>>> and a cython.h. libcython (a shared library) should contain any common >>>> Cython-specific code not meant to be inlined, and cython.h any types, >>>> macros and inline functions etc. >>> >>> This has a couple of implications though. In order to support this on the >>> user side, we have to build one shared library per installed package in >>> order to avoid any Cython versioning issues. Just installing a versioned >>> "libcython_x.y.z.so" globally isn't enough, especially during development, >>> but also at deployment time. Different packages may use different CFLAGS or >>> Cython options, which may have an impact on the result. Encoding all >>> possible factors in the file name will be cumbersome and may mean that we >>> still end up with a number of installed Cython libraries that correlates >>> with the number of installed Cython based packages. >> >> That's a good point. Perhaps an easier first target is to have one >> "libcython" per package (with a randomized or project-specific name). >> Longer-term, I think the goal of one libcython per version is a >> reasonable one, for deployment at least. Exceptional packages (e.g. >> that require a special set of CFLAGS rather than the ones Python was >> built with) can either bundle their own or forgo any sharing of code >> as it is done now, and features that can't be easily normalized across >> (cython and c) compilation options would remain in project-specific >> generated .c files. >> >>> Next, we may not know at build time which set of Cython modules is in the >>> package. This may be less of an issue if we rely on "cythonize()" in >>> setup.py to compile all modules before hand (assuming that the user doesn't >>> call it twice, once for *.pyx, once for *.py, for example), but even if we >>> know all modules, we'd still have to figure out the complete set of utility >>> code used by all modules in order to build an adapted library with only the >>> necessary code used in the package. So we'd always end up with a complete >>> library with all utility code, which is only really interesting for larger >>> packages with several Cython modules. >> >> Yes, I'm thinking we would create relatively complete libraries, >> though if we did things on a per package level perhaps we could do >> some pruning. We could still conditionally put some of the utility >> code (especially the rarely used or shared stuff) into each module. > > Yeah that would be nice. I actually think we shouldn't do anything on > a per-package level, only a bunch of modules with related stuff > (conversion utilities/exception raising etc in one module, > buffer/memoryview utilities in another etc). We've been living with > huge files since now, I don't think we suddenly need to actively start > pruning for a little bit of memory. > > I think the module approach would also be easy to implement, as the > infrastructure for external cdef functions/classes importing/exporting > is already there. > >>> I agree with Robert that a CEP would be needed for this, both for clearing >>> up the implications and actual use cases (I know that Sage is a reasonable >>> use case, but it's also a rather special case). >>> >>> >>>> This will decrease Cython and C >>>> compile time, and will also make executables smaller. >>> >>> I don't see how this actually impacts executables. However, a self-contained >>> executable is a value in itself. >> >> As an example, we're starting to have full utility types, e.g. for >> generators and or CyFunction. Lots of the utility code (e.g. loading >> modules, raising exceptions, etc.) could be shared as well. For >> something like Sage that could be a significant savings, and it could >> be a big boon for cython.inline as well. >> >>>> This could be >>>> enabled using a command line option to Cython, as well as with >>>> distutils, eventually we may decide to make it the default (lets >>>> figure that out later). Preferably libcython.so would be installed >>>> alongside libpython.so and cython.h inside the Python include >>>> directory. >>> >>> I don't see this happening. It's easy for Python (there is only one Python >>> running at a time, with one libpython loaded), but it's a lot less safe for >>> different versions of a Cython library that are used by different modules >>> inside of the running Python. For example, we'd have to version all visible >>> symbols in operating systems with flat namespaces, in order to support >>> loading multiple versions of the library. >> >> Which is another advantage to "linking" via the cimport mechanisms. >> >>>> Lastly, I think we also should figure out a way to serialize Entry >>>> objects from CythonUtilities, which could easily and swiftly be loaded >>>> when creating the cython scope. It's quite a pain to declare all >>>> entries for utilities you write manually >>> >>> Why would you declare them manually? I thought everything would be moved out >>> into the utility code files? >>> >>> >>>> so what I mostly did was >>>> parse the utility up to and including AnalyseDeclarationsTransform, >>>> and then retrieve the entries from there. >>> >>> Sounds like a drawback regarding the processing time, but may still be a >>> reasonable way to do it. I would expect that it won't be hard to pickle the >>> resulting dict of entries into a cache file and rebuild it only when one of >>> the utility files changes. >> >> +1 >> >> It'd be great to be able to do this for the many .pxd files in Sage as >> well. Parsing .pxd files is a huge portion of the compilation of the >> Sage library. >> >> - Robert >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > I expect it will also speed up the test runner quite a lot, which takes forever, as there are lots of small doctests. On an unrelated note, it'd be great if we could run individual doctests in parallel, I know py.test can do that, maybe nosetests as well. It'd be great if there was a plugin that supported cython (as well as C extension modules) that could run them, and an additional plugin that could make it work with our various test modes and directives. From ndbecker2 at gmail.com Wed Oct 5 17:09:55 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 05 Oct 2011 11:09:55 -0400 Subject: [Cython] scons support Message-ID: I have no idea why this doesn't work for me. Looking at http://www.mail-archive.com/cython-dev at codespeak.net/msg09540.html scons --version SCons by Steven Knight et al.: script: v2.1.0.r5357[MODIFIED], 2011/09/09 21:31:03, by bdeegan on ubuntu engine: v2.1.0.r5357[MODIFIED], 2011/09/09 21:31:03, by bdeegan on ubuntu engine path: ['/usr/lib/scons/SCons'] ------------------------------------------------ cyenv = Environment(PYEXT_USE_DISTUTILS=True) cyenv.Tool("pyext") cyenv.Tool("cython") import numpy cyenv.Append(PYEXTINCPATH=[numpy.get_include()]) cyenv.Replace(CYTHONFLAGS=['--cplus']) #cyenv.Replace(CXXFILESUFFIX='.cpp') #cyenv.Replace(CYTHONCFILESUFFIX='.cpp') cyenv.PythonExtension ('trellis_enc', ['trellis_enc.py']) ----------------------------------------------------- gives: cython --cplus -o trellis_enc.c trellis_enc.pyx gcc -pthread -o trellis_enc.os -c -fPIC -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall - Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -I/usr/include/python2.7 - I/usr/lib64/python2.7/site-packages/numpy/core/include trellis_enc.c gcc -pthread -shared -o trellis_enc.so trellis_enc.os Which is OK, except it used '.c' instead of '.cpp' but if I try: ------------------------------------------------ cyenv = Environment(PYEXT_USE_DISTUTILS=True) cyenv.Tool("pyext") cyenv.Tool("cython") import numpy cyenv.Append(PYEXTINCPATH=[numpy.get_include()]) cyenv.Replace(CYTHONFLAGS=['--cplus']) cyenv.Replace(CXXFILESUFFIX='.cpp') cyenv.Replace(CYTHONCFILESUFFIX='.cpp') cyenv.PythonExtension ('trellis_enc', ['trellis_enc.py']) ----------------------------------------------------- cython --cplus -o trellis_enc.cpp trellis_enc.pyx o trellis_enc.os -c -I/usr/include/python2.7 -I/usr/lib64/python2.7/site- packages/numpy/core/include trellis_enc.cpp sh: o: command not found The 'gcc' command got completely mangled. ??? From greg.ewing at canterbury.ac.nz Thu Oct 6 00:41:09 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 06 Oct 2011 11:41:09 +1300 Subject: [Cython] [cython-users] Re: callback function pointer problem In-Reply-To: References: <4E835336.1060800@gmail.com> <4E8398B5.6050905@gmail.com> <4E8421D0.5010007@gmail.com> <4E844BD0.4040207@gmail.com> <4E845B24.6060102@astro.uio.no> <4E845BAE.307@astro.uio.no> <4E8460E8.5050701@gmail.com> <588dc249-8f0b-49f2-bf42-23978ea95ddf@email.android.com> Message-ID: <4E8CDD05.2080102@canterbury.ac.nz> Robert Bradshaw wrote: > On this note, eventually I would like coerce structs (and unions, > enums) to auto-generated wrapper classes, visible in the Python module > namespace if one declares them as "cpdef struct ..." Would these wrapper classes contain a copy of the struct, or would they reference the struct? If they reference it, there would be issues with the lifetime of the referenced data. -- Greg From robertwb at math.washington.edu Thu Oct 6 02:05:21 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 5 Oct 2011 17:05:21 -0700 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: Message-ID: On Wednesday, October 5, 2011, mark florisson wrote: > On 5 October 2011 01:46, Robert Bradshaw > > wrote: > > On Tue, Oct 4, 2011 at 2:19 PM, mark florisson > > > wrote: > >> Hey, > >> > >> I briefly mentioned something about this in a pull request, but maybe > >> it deserves some actual discussion on the ML. > >> > >> So I propose that after fused types gets merged we try to move as many > >> utility codes as possible to their utility code files (unless they are > >> used in pending pull requests or other branches). Preferably this will > >> be done in one or a few commits. How should we split up the work, any > >> volunteers? Perhaps people who wrote certain utilities also want to > >> move them? In that case, we should start a new branch and then merge > >> that into master when it's done. > >> We could actually move things before fused types get merged, as long > >> as we don't touch binding_cfunc_utility_code. > > > > +1 to moving towards this, but I don't see the urgency or need to do > > it all at once (though if there's going to be a big push, lets > > coordinate on a wiki or trac). > > Hm, perhaps there is no strict need to hurry, as long as we take care > not to modify utilities after they have been moved. The wiki could be > great for that, but I personally don't keep track of everyone's > branches, so I don't know which utility is modified by whom (if at > all), so strictly speaking (to avoid painful merges) I'd have to ask > everyone each time I wanted to move something, or dig through > everyone's branches. > I was proposing that everyone lists the utility code sections that are likely to cause merge conflicts on a wiki page, and the rest are fair game. > >> Before we go there, Stefan, do we still want to implement the header > >> .ini style which can list dependencies and such? I personally don't > >> care very much about it, but memoryviews and the utility loaders are > >> merged so if someone wants to take up that job, it'd be good to do > >> before moving the utilities. > >> > >> Another issue is that Cython compile time is increasing with the > >> addition of control flow and cython utilities. If you use fused types > >> you're also going to combinatorially add more compile time. > > > > Yeah, this was especially obvious with, e.g. cython.compile(...). (In > > particular, some utility code was being parsed before it could even > > figure out whether it needed to do a full re-compile...) > > > >> I'm sure > >> this came up earlier, but I really think we should have a libcython > >> and a cython.h. libcython (a shared library) should contain any common > >> Cython-specific code not meant to be inlined, and cython.h any types, > >> macros and inline functions etc. This will decrease Cython and C > >> compile time, and will also make executables smaller. > > > > +1. Yes, we talked about this earlier, but nothing concrete was > > planned. It's probably worth a CEP, if anything to have a concrete > > plan recorded somewhere other than a series of mailing list threads > > (though discussion tends to work best here). > > > >> This could be > >> enabled using a command line option to Cython, as well as with > >> distutils, eventually we may decide to make it the default (lets > >> figure that out later). Preferably libcython.so would be installed > >> alongside libpython.so and cython.h inside the Python include > >> directory. Assuming multiple versions of Cython and multiple Python > >> installations, we'd need to come up with a versioning scheme for > >> either. > > > > I would propose a cython.h file that sits in Cython/Compiler/Include > > (or similar), as a first step. The .pyx -> .c pass could be configured > > to copy this to a specific location (for shipping just the generated > > .c files). > > That would be fine as well. It might be convenient for users in that > case if we could provide a cython.get_include() in addition to the > distutils hooks, and a cython-config script. > For sure. We could also have a cython.get_shared_library() (common_code? cython_module?) which would return an Extension object to build. > > One option is to build the shared library as a companion > > _cython_x_y_z.so module which, while not as efficient as linking at > > the C level, would probably be much simpler to implement in a > > cross-platform way. (This perhaps merits some benchmarks, but the main > > contents is likely to be things like shared classes and objects.) > > Actually linking .so files from modules that cimport each other would > > be a nice feature down the road anyways. Again, the associated .c file > > could be (optionally) generated/copied during the .pyx -> .c step. > > Installation would determine if the required module exists, and if not > > build and install it. > > Hm, that's a really good idea. I think the only overhead would be the > capsule unpacking and pointer duplication, but that shouldn't suddenly > be an issue. That means we don't have to do any versioning of the > libraries and the symbols to avoid clashes in a flat namespaces as > Stefan mentioned. > I'm not sure what the overhead is, if any, in calling function pointers vs. actually linking things together at the C level (which is essentially the same idea, but perhaps addresses are resolved at library load time rather than requiring a dereference on each call?) > >> We could also provide a static library there, for users who want to > >> link and ship a compiled and statically linked version of their code. > >> For a local Cython that isn't built, we can ignore the header and > >> shared library option and issue a warning or some such. > >> > >> Lastly, I think we also should figure out a way to serialize Entry > >> objects from CythonUtilities, which could easily and swiftly be loaded > >> when creating the cython scope. It's quite a pain to declare all > >> entries for utilities you write manually, so what I mostly did was > >> parse the utility up to and including AnalyseDeclarationsTransform, > >> and then retrieve the entries from there. > > > > This would be really nice too. Way back in the day I did some work > > with trying to pickle full module scopes, but that soon became too > > painful as there are so many far-reaching references. Pickling > > individual Entries and re-building modules will probably be a more > > tractable goal. Eventually, I'd like to see a way to cache the full > > pxd pipeline. > > > > - Robert > > _______________________________________________ > > cython-devel mailing list > > cython-devel at python.org > > http://mail.python.org/mailman/listinfo/cython-devel > > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertwb at math.washington.edu Thu Oct 6 02:05:21 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 5 Oct 2011 17:05:21 -0700 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: <4E8C0448.6010204@behnel.de> Message-ID: On Wednesday, October 5, 2011, mark florisson wrote: > On 5 October 2011 08:16, Stefan Behnel > > wrote: > > mark florisson, 04.10.2011 23:19: > >> > >> So I propose that after fused types gets merged we try to move as many > >> utility codes as possible to their utility code files (unless they are > >> used in pending pull requests or other branches). Preferably this will > >> be done in one or a few commits. How should we split up the work > > > > I would propose that new utility code gets moved out into utility files > > right away (if doable, given the current state of the infrastructure), > and > > that existing utility code gets moves when it gets modified or when > someone > > feels like it. Until we really get to the point of wanting to create a > > separate shared library etc., there's no need to hurry with the move. > > > > > >> We could actually move things before fused types get merged, as long > >> as we don't touch binding_cfunc_utility_code. > > > > Another reason not to hurry, right? > > > > > >> Before we go there, Stefan, do we still want to implement the header > >> .ini style which can list dependencies and such? > > > > I think we'll eventually need that, but that also depends a bit on the > > question whether we want to (or can) build a shared library or not. See > > below. > > > > > >> Another issue is that Cython compile time is increasing with the > >> addition of control flow and cython utilities. If you use fused types > >> you're also going to combinatorially add more compile time. > > > > I don't see that locally - a compiled Cython is hugely fast for me. In > > comparison, the C compiler literally takes ages to compile the result. An > > external shared library may or may not help with both - in particular, it > is > > not clear to me what makes the C compiler slow. If the compile time is > > dominated by the number of inlined functions (which is not unlikely), a > > shared library + header file will not make a difference. > > > > Have you tried with the memoryviews merged? e.g. if I have this code: > > from libc.stdlib cimport malloc > cdef int[:] slice = malloc(sizeof(int) * 10) > > [0] [14:45] ~ ? time cython test.pyx > cython test.pyx 2.61s user 0.08s system 99% cpu 2.695 total > [0] [14:45] ~ ? time zsh compile > zsh compile 1.88s user 0.06s system 99% cpu 1.946 total > > where 'compile' is the script that invoked the same gcc command > distutils uses. As you can see it took more than 2.5 seconds to > compile this code (simply because the memoryview utilities get > included). The C compiler does it quite a lot faster here. This > obviously depends largely on your code, you get probably have it the > other way around as well. > Anything we can do to cache/dedupe things here would be great. > >> I'm sure > >> this came up earlier, but I really think we should have a libcython > >> and a cython.h. libcython (a shared library) should contain any common > >> Cython-specific code not meant to be inlined, and cython.h any types, > >> macros and inline functions etc. > > > > This has a couple of implications though. In order to support this on the > > user side, we have to build one shared library per installed package in > > order to avoid any Cython versioning issues. Just installing a versioned > > "libcython_x.y.z.so" globally isn't enough, especially during > development, > > but also at deployment time. Different packages may use different CFLAGS > or > > Cython options, which may have an impact on the result. Encoding all > > possible factors in the file name will be cumbersome and may mean that we > > still end up with a number of installed Cython libraries that correlates > > with the number of installed Cython based packages. > > Hm, I think the CFLAGS are important so long as they are compatible > with Python. When the user compiles a Cython extension module with > extra CFLAGS, this doesn't affect libpython. Similarly, the Cython > utilities are really not the user's responsibility, so libcython > doesn't need to be compiled with the same flags as the extension > module. If still wanted, the user could either recompile python with > different CFLAGS (which means libcython will get those as well), or > not use libcython at all. CFLAGS should really only pertain to user > code, not to the Cython library, which the user shouldn't be concerned > about. > > > Next, we may not know at build time which set of Cython modules is in the > > package. This may be less of an issue if we rely on "cythonize()" in > > setup.py to compile all modules before hand (assuming that the user > doesn't > > call it twice, once for *.pyx, once for *.py, for example), but even if > we > > know all modules, we'd still have to figure out the complete set of > utility > > code used by all modules in order to build an adapted library with only > the > > necessary code used in the package. So we'd always end up with a complete > > library with all utility code, which is only really interesting for > larger > > packages with several Cython modules. > > I agree with Robert that a CEP would be needed for this, both for > clearing > > up the implications and actual use cases (I know that Sage is a > reasonable > > use case, but it's also a rather special case). > > > > > >> This will decrease Cython and C > >> compile time, and will also make executables smaller. > > > > I don't see how this actually impacts executables. However, a > self-contained > > executable is a value in itself. > > > > > >> This could be > >> enabled using a command line option to Cython, as well as with > >> distutils, eventually we may decide to make it the default (lets > >> figure that out later). Preferably libcython.so would be installed > >> alongside libpython.so and cython.h inside the Python include > >> directory. > > > > I don't see this happening. It's easy for Python (there is only one > Python > > running at a time, with one libpython loaded), but it's a lot less safe > for > > different versions of a Cython library that are used by different modules > > inside of the running Python. For example, we'd have to version all > visible > > symbols in operating systems with flat namespaces, in order to support > > loading multiple versions of the library. > > > > > >> Lastly, I think we also should figure out a way to serialize Entry > >> objects from CythonUtilities, which could easily and swiftly be loaded > >> when creating the cython scope. It's quite a pain to declare all > >> entries for utilities you write manually > > > > Why would you declare them manually? I thought everything would be moved > out > > into the utility code files? > > > > Right, the code is in the utility files. However, the cython scope > needs to have the entries of the classes and functions of the > utilities. e.g. the user may write > > cimport cython > > cdef cython.array myobject > > For this to work, we need an 'array' entry, which we don't have yet, > as the utility code will be parsed at code generation time if an entry > of that utility code (which doesn't exist yet!) is used. > > >> so what I mostly did was > >> parse the utility up to and including AnalyseDeclarationsTransform, > >> and then retrieve the entries from there. > > > > Sounds like a drawback regarding the processing time, but may still be a > > reasonable way to do it. I would expect that it won't be hard to pickle > the > > resulting dict of entries into a cache file and rebuild it only when one > of > > the utility files changes. > > Exactly. I'm not sure about pickle though, but the details don't > matter. Pickle is certainly easy as long as you don't change your > interface (which we most certainly will, though). > > We can version the cache to handle this. - Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Thu Oct 6 08:46:51 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 06 Oct 2011 08:46:51 +0200 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: <4E8C0448.6010204@behnel.de> Message-ID: <4E8D4EDB.2090009@behnel.de> mark florisson, 05.10.2011 15:53: > On 5 October 2011 08:16, Stefan Behnel wrote: >> mark florisson, 04.10.2011 23:19: >>> Another issue is that Cython compile time is increasing with the >>> addition of control flow and cython utilities. If you use fused types >>> you're also going to combinatorially add more compile time. >> >> I don't see that locally - a compiled Cython is hugely fast for me. In >> comparison, the C compiler literally takes ages to compile the result. An >> external shared library may or may not help with both - in particular, it is >> not clear to me what makes the C compiler slow. If the compile time is >> dominated by the number of inlined functions (which is not unlikely), a >> shared library + header file will not make a difference. > > Have you tried with the memoryviews merged? No. I didn't expect the difference to be quite that large. > e.g. if I have this code: > > from libc.stdlib cimport malloc > cdef int[:] slice = malloc(sizeof(int) * 10) > > [0] [14:45] ~ ? time cython test.pyx > cython test.pyx 2.61s user 0.08s system 99% cpu 2.695 total > [0] [14:45] ~ ? time zsh compile > zsh compile 1.88s user 0.06s system 99% cpu 1.946 total > > where 'compile' is the script that invoked the same gcc command > distutils uses. As you can see it took more than 2.5 seconds to > compile this code (simply because the memoryview utilities get > included). Ok, that hints at serious performance problems. Could you profile it to see where the issues are? Is it more that the code is loaded from an external file? Or the fact that more utility code is parsed than necessary? It's certainly not obvious why the inclusion of static code, even from an external file, should make any difference. That being said, it's not we were lacking the infrastructure for making Python code run faster ... >>> I'm sure >>> this came up earlier, but I really think we should have a libcython >>> and a cython.h. libcython (a shared library) should contain any common >>> Cython-specific code not meant to be inlined, and cython.h any types, >>> macros and inline functions etc. >> >> This has a couple of implications though. In order to support this on the >> user side, we have to build one shared library per installed package in >> order to avoid any Cython versioning issues. Just installing a versioned >> "libcython_x.y.z.so" globally isn't enough, especially during development, >> but also at deployment time. Different packages may use different CFLAGS or >> Cython options, which may have an impact on the result. Encoding all >> possible factors in the file name will be cumbersome and may mean that we >> still end up with a number of installed Cython libraries that correlates >> with the number of installed Cython based packages. > > Hm, I think the CFLAGS are important so long as they are compatible > with Python. When the user compiles a Cython extension module with > extra CFLAGS, this doesn't affect libpython. Similarly, the Cython > utilities are really not the user's responsibility, so libcython > doesn't need to be compiled with the same flags as the extension > module. If still wanted, the user could either recompile python with > different CFLAGS (which means libcython will get those as well), or > not use libcython at all. CFLAGS should really only pertain to user > code, not to the Cython library, which the user shouldn't be concerned > about. Well, it's either the user or the OS distribution that installs (and potentially builds) the libraries. That already makes it two responsible entities for many systems that have to agree on what gets installed in what way. I'm just saying, don't underestimate the details in world wide deployments. Stefan From robertwb at math.washington.edu Thu Oct 6 09:50:17 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Thu, 6 Oct 2011 00:50:17 -0700 Subject: [Cython] [cython-users] Re: callback function pointer problem In-Reply-To: <4E8CDD05.2080102@canterbury.ac.nz> References: <4E835336.1060800@gmail.com> <4E8398B5.6050905@gmail.com> <4E8421D0.5010007@gmail.com> <4E844BD0.4040207@gmail.com> <4E845B24.6060102@astro.uio.no> <4E845BAE.307@astro.uio.no> <4E8460E8.5050701@gmail.com> <588dc249-8f0b-49f2-bf42-23978ea95ddf@email.android.com> <4E8CDD05.2080102@canterbury.ac.nz> Message-ID: On Wed, Oct 5, 2011 at 3:41 PM, Greg Ewing wrote: > Robert Bradshaw wrote: > >> On this note, eventually I would like coerce structs (and unions, >> enums) to auto-generated wrapper classes, visible in the Python module >> namespace if one declares them as "cpdef struct ..." > > Would these wrapper classes contain a copy of the struct, > or would they reference the struct? If they reference it, > there would be issues with the lifetime of the referenced > data. They'd contain a copy, which I also think would match expectations better as well. - Robert From markflorisson88 at gmail.com Thu Oct 6 11:45:55 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 6 Oct 2011 10:45:55 +0100 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: Message-ID: On 6 October 2011 01:05, Robert Bradshaw wrote: > On Wednesday, October 5, 2011, mark florisson wrote: >> >> On 5 October 2011 01:46, Robert Bradshaw >> wrote: >> > On Tue, Oct 4, 2011 at 2:19 PM, mark florisson >> > wrote: >> >> Hey, >> >> >> >> I briefly mentioned something about this in a pull request, but maybe >> >> it deserves some actual discussion on the ML. >> >> >> >> So I propose that after fused types gets merged we try to move as many >> >> utility codes as possible to their utility code files (unless they are >> >> used in pending pull requests or other branches). Preferably this will >> >> be done in one or a few commits. How should we split up the work, any >> >> volunteers? Perhaps people who wrote certain utilities also want to >> >> move them? In that case, we should start a new branch and then merge >> >> that into master when it's done. >> >> We could actually move things before fused types get merged, as long >> >> as we don't touch binding_cfunc_utility_code. >> > >> > +1 to moving towards this, but I don't see the urgency or need to do >> > it all at once (though if there's going to be a big push, lets >> > coordinate on a wiki or trac). >> >> Hm, perhaps there is no strict need to hurry, as long as we take care >> not to modify utilities after they have been moved. The wiki could be >> great for that, but I personally don't keep track of everyone's >> branches, so I don't know which utility is modified by whom (if at >> all), so strictly speaking (to avoid painful merges) I'd have to ask >> everyone each time I wanted to move something, or dig through >> everyone's branches. > > I was proposing that everyone lists the utility code sections that are > likely to cause merge conflicts on a wiki page, and the rest are fair game. Ah ok, that sounds good. >> >> >> Before we go there, Stefan, do we still want to implement the header >> >> .ini style which can list dependencies and such? I personally don't >> >> care very much about it, but memoryviews and the utility loaders are >> >> merged so if someone wants to take up that job, it'd be good to do >> >> before moving the utilities. >> >> >> >> Another issue is that Cython compile time is increasing with the >> >> addition of control flow and cython utilities. If you use fused types >> >> you're also going to combinatorially add more compile time. >> > >> > Yeah, this was especially obvious with, e.g. cython.compile(...). (In >> > particular, some utility code was being parsed before it could even >> > figure out whether it needed to do a full re-compile...) >> > >> >> I'm sure >> >> this came up earlier, but I really think we should have a libcython >> >> and a cython.h. libcython (a shared library) should contain any common >> >> Cython-specific code not meant to be inlined, and cython.h any types, >> >> macros and inline functions etc. This will decrease Cython and C >> >> compile time, and will also make executables smaller. >> > >> > +1. Yes, we talked about this earlier, but nothing concrete was >> > planned. It's probably worth a CEP, if anything to have a concrete >> > plan recorded somewhere other than a series of mailing list threads >> > (though discussion tends to work best here). >> > >> >> This could be >> >> enabled using a command line option to Cython, as well as with >> >> distutils, eventually we may decide to make it the default (lets >> >> figure that out later). Preferably libcython.so would be installed >> >> alongside libpython.so and cython.h inside the Python include >> >> directory. Assuming multiple versions of Cython and multiple Python >> >> installations, we'd need to come up with a versioning scheme for >> >> either. >> > >> > I would propose a cython.h file that sits in Cython/Compiler/Include >> > (or similar), as a first step. The .pyx -> .c pass could be configured >> > to copy this to a specific location (for shipping just the generated >> > .c files). >> >> That would be fine as well. It might be convenient for users in that >> case if we could provide a cython.get_include() in addition to the >> distutils hooks, and a cython-config script. > > For sure. We could also have a cython.get_shared_library() (common_code? > cython_module?) which would return an Extension object to build. > >> >> > One option is to build the shared library as a companion >> > _cython_x_y_z.so module which, while not as efficient as linking at >> > the C level, would probably be much simpler to implement in a >> > cross-platform way. (This perhaps merits some benchmarks, but the main >> > contents is likely to be things like shared classes and objects.) >> > Actually linking .so files from modules that cimport each other would >> > be a nice feature down the road anyways. Again, the associated .c file >> > could be (optionally) generated/copied during the .pyx -> .c step. >> > Installation would determine if the required module exists, and if not >> > build and install it. >> >> Hm, that's a really good idea. I think the only overhead would be the >> capsule unpacking and pointer duplication, but that shouldn't suddenly >> be an issue. That means we don't have to do any versioning of the >> libraries and the symbols to avoid clashes in a flat namespaces as >> Stefan mentioned. > > I'm not sure what the overhead is, if any, in calling function pointers vs. > actually linking things together at the C level (which is essentially the > same idea, but perhaps addresses are resolved at library load time rather > than requiring a dereference on each call?) I think there isn't any difference with dynamic linking and having a pointer. My understanding (of ELF shared libraries) is that the procedure lookup table will contain the actual address of the symbol (likely after the first reference to it has been made, it may have a stub that resolves the symbol and replaces it's own address with the actual address), which to me sounds like the same thing as a pointer. I think only static linking can prevent this, i.e. directly encode the static address into the call opcode, but I'm not an expert. >> >> >> We could also provide a static library there, for users who want to >> >> link and ship a compiled and statically linked version of their code. >> >> For a local Cython that isn't built, we can ignore the header and >> >> shared library option and issue a warning or some such. >> >> >> >> Lastly, I think we also should figure out a way to serialize Entry >> >> objects from CythonUtilities, which could easily and swiftly be loaded >> >> when creating the cython scope. It's quite a pain to declare all >> >> entries for utilities you write manually, so what I mostly did was >> >> parse the utility up to and including AnalyseDeclarationsTransform, >> >> and then retrieve the entries from there. >> > >> > This would be really nice too. Way back in the day I did some work >> > with trying to pickle full module scopes, but that soon became too >> > painful as there are so many far-reaching references. Pickling >> > individual Entries and re-building modules will probably be a more >> > tractable goal. Eventually, I'd like to see a way to cache the full >> > pxd pipeline. >> > >> > - Robert >> > _______________________________________________ >> > cython-devel mailing list >> > cython-devel at python.org >> > http://mail.python.org/mailman/listinfo/cython-devel >> > >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > > From markflorisson88 at gmail.com Thu Oct 6 11:46:20 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 6 Oct 2011 10:46:20 +0100 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: <4E8D4EDB.2090009@behnel.de> References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> Message-ID: On 6 October 2011 07:46, Stefan Behnel wrote: > mark florisson, 05.10.2011 15:53: >> >> On 5 October 2011 08:16, Stefan Behnel wrote: >>> >>> mark florisson, 04.10.2011 23:19: >>>> >>>> Another issue is that Cython compile time is increasing with the >>>> addition of control flow and cython utilities. If you use fused types >>>> you're also going to combinatorially add more compile time. >>> >>> I don't see that locally - a compiled Cython is hugely fast for me. In >>> comparison, the C compiler literally takes ages to compile the result. An >>> external shared library may or may not help with both - in particular, it >>> is >>> not clear to me what makes the C compiler slow. If the compile time is >>> dominated by the number of inlined functions (which is not unlikely), a >>> shared library + header file will not make a difference. >> >> Have you tried with the memoryviews merged? > > No. I didn't expect the difference to be quite that large. > > >> e.g. if I have this code: >> >> from libc.stdlib cimport malloc >> cdef int[:] slice = ? ?malloc(sizeof(int) * 10) >> >> [0] [14:45] ~ ?? time cython test.pyx >> cython test.pyx ?2.61s user 0.08s system 99% cpu 2.695 total >> [0] [14:45] ~ ?? time zsh compile >> zsh compile ?1.88s user 0.06s system 99% cpu 1.946 total >> >> where 'compile' is the script that invoked the same gcc command >> distutils uses. ?As you can see it took more than 2.5 seconds to >> compile this code (simply because the memoryview utilities get >> included). > > Ok, that hints at serious performance problems. Could you profile it to see > where the issues are? Is it more that the code is loaded from an external > file? Or the fact that more utility code is parsed than necessary? I haven't profiled it yet (I'll do that), but I'm fairly sure it's the parsing of Cython utility files (not the loading). Maybe Tempita also adds to the overhead, I'll find out. > It's certainly not obvious why the inclusion of static code, even from an > external file, should make any difference. > > That being said, it's not we were lacking the infrastructure for making > Python code run faster ... > Heh, indeed. In this case I think caching will solve all our problems. >>>> I'm sure >>>> this came up earlier, but I really think we should have a libcython >>>> and a cython.h. libcython (a shared library) should contain any common >>>> Cython-specific code not meant to be inlined, and cython.h any types, >>>> macros and inline functions etc. >>> >>> This has a couple of implications though. In order to support this on the >>> user side, we have to build one shared library per installed package in >>> order to avoid any Cython versioning issues. Just installing a versioned >>> "libcython_x.y.z.so" globally isn't enough, especially during >>> development, >>> but also at deployment time. Different packages may use different CFLAGS >>> or >>> Cython options, which may have an impact on the result. Encoding all >>> possible factors in the file name will be cumbersome and may mean that we >>> still end up with a number of installed Cython libraries that correlates >>> with the number of installed Cython based packages. >> >> Hm, I think the CFLAGS are important so long as they are compatible >> with Python. When the user compiles a Cython extension module with >> extra CFLAGS, this doesn't affect libpython. Similarly, the Cython >> utilities are really not the user's responsibility, so libcython >> doesn't need to be compiled with the same flags as the extension >> module. If still wanted, the user could either recompile python with >> different CFLAGS (which means libcython will get those as well), or >> not use libcython at all. CFLAGS should really only pertain to user >> code, not to the Cython library, which the user shouldn't be concerned >> about. > > Well, it's either the user or the OS distribution that installs (and > potentially builds) the libraries. That already makes it two responsible > entities for many systems that have to agree on what gets installed in what > way. I'm just saying, don't underestimate the details in world wide > deployments. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From vitja.makarov at gmail.com Thu Oct 6 22:56:43 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Fri, 7 Oct 2011 00:56:43 +0400 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> Message-ID: 2011/10/6 mark florisson : > On 6 October 2011 07:46, Stefan Behnel wrote: >> mark florisson, 05.10.2011 15:53: >>> >>> On 5 October 2011 08:16, Stefan Behnel wrote: >>>> >>>> mark florisson, 04.10.2011 23:19: >>>>> >>>>> Another issue is that Cython compile time is increasing with the >>>>> addition of control flow and cython utilities. If you use fused types >>>>> you're also going to combinatorially add more compile time. >>>> >>>> I don't see that locally - a compiled Cython is hugely fast for me. In >>>> comparison, the C compiler literally takes ages to compile the result. An >>>> external shared library may or may not help with both - in particular, it >>>> is >>>> not clear to me what makes the C compiler slow. If the compile time is >>>> dominated by the number of inlined functions (which is not unlikely), a >>>> shared library + header file will not make a difference. >>> >>> Have you tried with the memoryviews merged? >> >> No. I didn't expect the difference to be quite that large. >> >> >>> e.g. if I have this code: >>> >>> from libc.stdlib cimport malloc >>> cdef int[:] slice = ? ?malloc(sizeof(int) * 10) >>> >>> [0] [14:45] ~ ?? time cython test.pyx >>> cython test.pyx ?2.61s user 0.08s system 99% cpu 2.695 total >>> [0] [14:45] ~ ?? time zsh compile >>> zsh compile ?1.88s user 0.06s system 99% cpu 1.946 total >>> >>> where 'compile' is the script that invoked the same gcc command >>> distutils uses. ?As you can see it took more than 2.5 seconds to >>> compile this code (simply because the memoryview utilities get >>> included). >> >> Ok, that hints at serious performance problems. Could you profile it to see >> where the issues are? Is it more that the code is loaded from an external >> file? Or the fact that more utility code is parsed than necessary? > > I haven't profiled it yet (I'll do that), but I'm fairly sure it's the > parsing of Cython utility files (not the loading). Maybe Tempita also > adds to the overhead, I'll find out. > Compiling this regex gives 5ms instead of 10ms on my machine https://github.com/cython/cython/blob/master/Cython/Compiler/Code.py#L85 And on your example gives 3% speedup >> It's certainly not obvious why the inclusion of static code, even from an >> external file, should make any difference. >> >> That being said, it's not we were lacking the infrastructure for making >> Python code run faster ... >> > > Heh, indeed. In this case I think caching will solve all our problems. > >>>>> I'm sure >>>>> this came up earlier, but I really think we should have a libcython >>>>> and a cython.h. libcython (a shared library) should contain any common >>>>> Cython-specific code not meant to be inlined, and cython.h any types, >>>>> macros and inline functions etc. >>>> >>>> This has a couple of implications though. In order to support this on the >>>> user side, we have to build one shared library per installed package in >>>> order to avoid any Cython versioning issues. Just installing a versioned >>>> "libcython_x.y.z.so" globally isn't enough, especially during >>>> development, >>>> but also at deployment time. Different packages may use different CFLAGS >>>> or >>>> Cython options, which may have an impact on the result. Encoding all >>>> possible factors in the file name will be cumbersome and may mean that we >>>> still end up with a number of installed Cython libraries that correlates >>>> with the number of installed Cython based packages. >>> >>> Hm, I think the CFLAGS are important so long as they are compatible >>> with Python. When the user compiles a Cython extension module with >>> extra CFLAGS, this doesn't affect libpython. Similarly, the Cython >>> utilities are really not the user's responsibility, so libcython >>> doesn't need to be compiled with the same flags as the extension >>> module. If still wanted, the user could either recompile python with >>> different CFLAGS (which means libcython will get those as well), or >>> not use libcython at all. CFLAGS should really only pertain to user >>> code, not to the Cython library, which the user shouldn't be concerned >>> about. >> >> Well, it's either the user or the OS distribution that installs (and >> potentially builds) the libraries. That already makes it two responsible >> entities for many systems that have to agree on what gets installed in what >> way. I'm just saying, don't underestimate the details in world wide >> deployments. >> -- vitja. From markflorisson88 at gmail.com Thu Oct 6 23:02:24 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 6 Oct 2011 22:02:24 +0100 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> Message-ID: On 6 October 2011 21:56, Vitja Makarov wrote: > 2011/10/6 mark florisson : >> On 6 October 2011 07:46, Stefan Behnel wrote: >>> mark florisson, 05.10.2011 15:53: >>>> >>>> On 5 October 2011 08:16, Stefan Behnel wrote: >>>>> >>>>> mark florisson, 04.10.2011 23:19: >>>>>> >>>>>> Another issue is that Cython compile time is increasing with the >>>>>> addition of control flow and cython utilities. If you use fused types >>>>>> you're also going to combinatorially add more compile time. >>>>> >>>>> I don't see that locally - a compiled Cython is hugely fast for me. In >>>>> comparison, the C compiler literally takes ages to compile the result. An >>>>> external shared library may or may not help with both - in particular, it >>>>> is >>>>> not clear to me what makes the C compiler slow. If the compile time is >>>>> dominated by the number of inlined functions (which is not unlikely), a >>>>> shared library + header file will not make a difference. >>>> >>>> Have you tried with the memoryviews merged? >>> >>> No. I didn't expect the difference to be quite that large. >>> >>> >>>> e.g. if I have this code: >>>> >>>> from libc.stdlib cimport malloc >>>> cdef int[:] slice = ? ?malloc(sizeof(int) * 10) >>>> >>>> [0] [14:45] ~ ?? time cython test.pyx >>>> cython test.pyx ?2.61s user 0.08s system 99% cpu 2.695 total >>>> [0] [14:45] ~ ?? time zsh compile >>>> zsh compile ?1.88s user 0.06s system 99% cpu 1.946 total >>>> >>>> where 'compile' is the script that invoked the same gcc command >>>> distutils uses. ?As you can see it took more than 2.5 seconds to >>>> compile this code (simply because the memoryview utilities get >>>> included). >>> >>> Ok, that hints at serious performance problems. Could you profile it to see >>> where the issues are? Is it more that the code is loaded from an external >>> file? Or the fact that more utility code is parsed than necessary? >> >> I haven't profiled it yet (I'll do that), but I'm fairly sure it's the >> parsing of Cython utility files (not the loading). Maybe Tempita also >> adds to the overhead, I'll find out. >> > > Compiling this regex gives 5ms instead of 10ms on my machine > > https://github.com/cython/cython/blob/master/Cython/Compiler/Code.py#L85 > > And on your example gives 3% speedup > Sorry, which code gets you 10ms? Also, is this about loading + regex matching, or just about compiling the pattern? In any case, libcython would solve these issues. Profiling will still be useful though. >>> It's certainly not obvious why the inclusion of static code, even from an >>> external file, should make any difference. >>> >>> That being said, it's not we were lacking the infrastructure for making >>> Python code run faster ... >>> >> >> Heh, indeed. In this case I think caching will solve all our problems. >> >>>>>> I'm sure >>>>>> this came up earlier, but I really think we should have a libcython >>>>>> and a cython.h. libcython (a shared library) should contain any common >>>>>> Cython-specific code not meant to be inlined, and cython.h any types, >>>>>> macros and inline functions etc. >>>>> >>>>> This has a couple of implications though. In order to support this on the >>>>> user side, we have to build one shared library per installed package in >>>>> order to avoid any Cython versioning issues. Just installing a versioned >>>>> "libcython_x.y.z.so" globally isn't enough, especially during >>>>> development, >>>>> but also at deployment time. Different packages may use different CFLAGS >>>>> or >>>>> Cython options, which may have an impact on the result. Encoding all >>>>> possible factors in the file name will be cumbersome and may mean that we >>>>> still end up with a number of installed Cython libraries that correlates >>>>> with the number of installed Cython based packages. >>>> >>>> Hm, I think the CFLAGS are important so long as they are compatible >>>> with Python. When the user compiles a Cython extension module with >>>> extra CFLAGS, this doesn't affect libpython. Similarly, the Cython >>>> utilities are really not the user's responsibility, so libcython >>>> doesn't need to be compiled with the same flags as the extension >>>> module. If still wanted, the user could either recompile python with >>>> different CFLAGS (which means libcython will get those as well), or >>>> not use libcython at all. CFLAGS should really only pertain to user >>>> code, not to the Cython library, which the user shouldn't be concerned >>>> about. >>> >>> Well, it's either the user or the OS distribution that installs (and >>> potentially builds) the libraries. That already makes it two responsible >>> entities for many systems that have to agree on what gets installed in what >>> way. I'm just saying, don't underestimate the details in world wide >>> deployments. >>> > > > > -- > vitja. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From vitja.makarov at gmail.com Thu Oct 6 23:07:22 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Fri, 7 Oct 2011 01:07:22 +0400 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> Message-ID: 2011/10/7 mark florisson : > On 6 October 2011 21:56, Vitja Makarov wrote: >> 2011/10/6 mark florisson : >>> On 6 October 2011 07:46, Stefan Behnel wrote: >>>> mark florisson, 05.10.2011 15:53: >>>>> >>>>> On 5 October 2011 08:16, Stefan Behnel wrote: >>>>>> >>>>>> mark florisson, 04.10.2011 23:19: >>>>>>> >>>>>>> Another issue is that Cython compile time is increasing with the >>>>>>> addition of control flow and cython utilities. If you use fused types >>>>>>> you're also going to combinatorially add more compile time. >>>>>> >>>>>> I don't see that locally - a compiled Cython is hugely fast for me. In >>>>>> comparison, the C compiler literally takes ages to compile the result. An >>>>>> external shared library may or may not help with both - in particular, it >>>>>> is >>>>>> not clear to me what makes the C compiler slow. If the compile time is >>>>>> dominated by the number of inlined functions (which is not unlikely), a >>>>>> shared library + header file will not make a difference. >>>>> >>>>> Have you tried with the memoryviews merged? >>>> >>>> No. I didn't expect the difference to be quite that large. >>>> >>>> >>>>> e.g. if I have this code: >>>>> >>>>> from libc.stdlib cimport malloc >>>>> cdef int[:] slice = ? ?malloc(sizeof(int) * 10) >>>>> >>>>> [0] [14:45] ~ ?? time cython test.pyx >>>>> cython test.pyx ?2.61s user 0.08s system 99% cpu 2.695 total >>>>> [0] [14:45] ~ ?? time zsh compile >>>>> zsh compile ?1.88s user 0.06s system 99% cpu 1.946 total >>>>> >>>>> where 'compile' is the script that invoked the same gcc command >>>>> distutils uses. ?As you can see it took more than 2.5 seconds to >>>>> compile this code (simply because the memoryview utilities get >>>>> included). >>>> >>>> Ok, that hints at serious performance problems. Could you profile it to see >>>> where the issues are? Is it more that the code is loaded from an external >>>> file? Or the fact that more utility code is parsed than necessary? >>> >>> I haven't profiled it yet (I'll do that), but I'm fairly sure it's the >>> parsing of Cython utility files (not the loading). Maybe Tempita also >>> adds to the overhead, I'll find out. >>> >> >> Compiling this regex gives 5ms instead of 10ms on my machine >> >> https://github.com/cython/cython/blob/master/Cython/Compiler/Code.py#L85 >> >> And on your example gives 3% speedup >> > > Sorry, which code gets you 10ms? Also, is this about loading + regex > matching, or just about compiling the pattern? > I've added decorator for load_utilities_from_file that prints time for current call and total sum for this function and total gives 10ms. Btw that's not that much. > In any case, libcython would solve these issues. Profiling will still > be useful though. > >>>> It's certainly not obvious why the inclusion of static code, even from an >>>> external file, should make any difference. >>>> >>>> That being said, it's not we were lacking the infrastructure for making >>>> Python code run faster ... >>>> >>> >>> Heh, indeed. In this case I think caching will solve all our problems. >>> >>>>>>> I'm sure >>>>>>> this came up earlier, but I really think we should have a libcython >>>>>>> and a cython.h. libcython (a shared library) should contain any common >>>>>>> Cython-specific code not meant to be inlined, and cython.h any types, >>>>>>> macros and inline functions etc. >>>>>> >>>>>> This has a couple of implications though. In order to support this on the >>>>>> user side, we have to build one shared library per installed package in >>>>>> order to avoid any Cython versioning issues. Just installing a versioned >>>>>> "libcython_x.y.z.so" globally isn't enough, especially during >>>>>> development, >>>>>> but also at deployment time. Different packages may use different CFLAGS >>>>>> or >>>>>> Cython options, which may have an impact on the result. Encoding all >>>>>> possible factors in the file name will be cumbersome and may mean that we >>>>>> still end up with a number of installed Cython libraries that correlates >>>>>> with the number of installed Cython based packages. >>>>> >>>>> Hm, I think the CFLAGS are important so long as they are compatible >>>>> with Python. When the user compiles a Cython extension module with >>>>> extra CFLAGS, this doesn't affect libpython. Similarly, the Cython >>>>> utilities are really not the user's responsibility, so libcython >>>>> doesn't need to be compiled with the same flags as the extension >>>>> module. If still wanted, the user could either recompile python with >>>>> different CFLAGS (which means libcython will get those as well), or >>>>> not use libcython at all. CFLAGS should really only pertain to user >>>>> code, not to the Cython library, which the user shouldn't be concerned >>>>> about. >>>> >>>> Well, it's either the user or the OS distribution that installs (and >>>> potentially builds) the libraries. That already makes it two responsible >>>> entities for many systems that have to agree on what gets installed in what >>>> way. I'm just saying, don't underestimate the details in world wide >>>> deployments. >>>> >> >> >> >> -- >> vitja. >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > -- vitja. From vitja.makarov at gmail.com Thu Oct 6 23:12:10 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Fri, 7 Oct 2011 01:12:10 +0400 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> Message-ID: 2011/10/7 Vitja Makarov : > 2011/10/7 mark florisson : >> On 6 October 2011 21:56, Vitja Makarov wrote: >>> 2011/10/6 mark florisson : >>>> On 6 October 2011 07:46, Stefan Behnel wrote: >>>>> mark florisson, 05.10.2011 15:53: >>>>>> >>>>>> On 5 October 2011 08:16, Stefan Behnel wrote: >>>>>>> >>>>>>> mark florisson, 04.10.2011 23:19: >>>>>>>> >>>>>>>> Another issue is that Cython compile time is increasing with the >>>>>>>> addition of control flow and cython utilities. If you use fused types >>>>>>>> you're also going to combinatorially add more compile time. >>>>>>> >>>>>>> I don't see that locally - a compiled Cython is hugely fast for me. In >>>>>>> comparison, the C compiler literally takes ages to compile the result. An >>>>>>> external shared library may or may not help with both - in particular, it >>>>>>> is >>>>>>> not clear to me what makes the C compiler slow. If the compile time is >>>>>>> dominated by the number of inlined functions (which is not unlikely), a >>>>>>> shared library + header file will not make a difference. >>>>>> >>>>>> Have you tried with the memoryviews merged? >>>>> >>>>> No. I didn't expect the difference to be quite that large. >>>>> >>>>> >>>>>> e.g. if I have this code: >>>>>> >>>>>> from libc.stdlib cimport malloc >>>>>> cdef int[:] slice = ? ?malloc(sizeof(int) * 10) >>>>>> >>>>>> [0] [14:45] ~ ?? time cython test.pyx >>>>>> cython test.pyx ?2.61s user 0.08s system 99% cpu 2.695 total >>>>>> [0] [14:45] ~ ?? time zsh compile >>>>>> zsh compile ?1.88s user 0.06s system 99% cpu 1.946 total >>>>>> >>>>>> where 'compile' is the script that invoked the same gcc command >>>>>> distutils uses. ?As you can see it took more than 2.5 seconds to >>>>>> compile this code (simply because the memoryview utilities get >>>>>> included). >>>>> >>>>> Ok, that hints at serious performance problems. Could you profile it to see >>>>> where the issues are? Is it more that the code is loaded from an external >>>>> file? Or the fact that more utility code is parsed than necessary? >>>> >>>> I haven't profiled it yet (I'll do that), but I'm fairly sure it's the >>>> parsing of Cython utility files (not the loading). Maybe Tempita also >>>> adds to the overhead, I'll find out. >>>> >>> >>> Compiling this regex gives 5ms instead of 10ms on my machine >>> >>> https://github.com/cython/cython/blob/master/Cython/Compiler/Code.py#L85 >>> >>> And on your example gives 3% speedup >>> >> >> Sorry, which code gets you 10ms? Also, is this about loading + regex >> matching, or just about compiling the pattern? >> > > I've added decorator for load_utilities_from_file that prints time for > current call and total sum for this function and total gives 10ms. > > Btw that's not that much. > > Here is small comparison on compiling urllib.py with cython: ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python ../cython.py urllib.py real 0m1.699s user 0m1.650s sys 0m0.040s (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python ../cython.py urllib.py real 0m2.830s user 0m2.790s sys 0m0.030s It's about 1.5 times slower. -- vitja. From stefan_ml at behnel.de Fri Oct 7 09:41:34 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 07 Oct 2011 09:41:34 +0200 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> Message-ID: <4E8EAD2E.8040701@behnel.de> Vitja Makarov, 06.10.2011 23:12: > Here is small comparison on compiling urllib.py with cython: > > ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python > ../cython.py urllib.py > > real 0m1.699s > user 0m1.650s > sys 0m0.040s > (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python > ../cython.py urllib.py > > real 0m2.830s > user 0m2.790s > sys 0m0.030s > > > It's about 1.5 times slower. I assume this uses a compiled Cython? That's a pretty serious regression for plain Python code then. Again, this needs proper profiling. We may also want to disable certain steps in the pipeline based on the syntax features used. If a feature is not used that has its own (set of) visitors, we can disable them completely. Detection already happens based on the .pyx/.py distinction, but could additionally use a detector (e.g. in the post-parse phase) that sets up skip flags. One example is the closure building step, which could be skipped if there are no closures. Stefan From vitja.makarov at gmail.com Fri Oct 7 10:11:42 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Fri, 7 Oct 2011 12:11:42 +0400 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: <4E8EAD2E.8040701@behnel.de> References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> <4E8EAD2E.8040701@behnel.de> Message-ID: 2011/10/7 Stefan Behnel : > Vitja Makarov, 06.10.2011 23:12: >> >> Here is small comparison on compiling urllib.py with cython: >> >> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >> ../cython.py urllib.py >> >> real ? ?0m1.699s >> user ? ?0m1.650s >> sys ? ? 0m0.040s >> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >> ../cython.py urllib.py >> >> real ? ?0m2.830s >> user ? ?0m2.790s >> sys ? ? 0m0.030s >> >> >> It's about 1.5 times slower. > > I assume this uses a compiled Cython? That's a pretty serious regression for > plain Python code then. Again, this needs proper profiling. > No, that was pure python cython. > We may also want to disable certain steps in the pipeline based on the > syntax features used. If a feature is not used that has its own (set of) > visitors, we can disable them completely. Detection already happens based on > the .pyx/.py distinction, but could additionally use a detector (e.g. in the > post-parse phase) that sets up skip flags. One example is the closure > building step, which could be skipped if there are no closures. > One more think I've found is that many unused utilities are loaded. -- vitja. From vitja.makarov at gmail.com Fri Oct 7 18:01:02 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Fri, 7 Oct 2011 20:01:02 +0400 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> <4E8EAD2E.8040701@behnel.de> Message-ID: 2011/10/7 Vitja Makarov : > 2011/10/7 Stefan Behnel : >> Vitja Makarov, 06.10.2011 23:12: >>> >>> Here is small comparison on compiling urllib.py with cython: >>> >>> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >>> ../cython.py urllib.py >>> >>> real ? ?0m1.699s >>> user ? ?0m1.650s >>> sys ? ? 0m0.040s >>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >>> ../cython.py urllib.py >>> >>> real ? ?0m2.830s >>> user ? ?0m2.790s >>> sys ? ? 0m0.030s >>> >>> >>> It's about 1.5 times slower. >> >> I assume this uses a compiled Cython? That's a pretty serious regression for >> plain Python code then. Again, this needs proper profiling. >> > > No, that was pure python cython. > I've added return statement on top of CythonScope.test_cythonscope, now I have these timings: (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python ../cython.py urllib.py real 0m1.764s user 0m1.700s sys 0m0.060s >> We may also want to disable certain steps in the pipeline based on the >> syntax features used. If a feature is not used that has its own (set of) >> visitors, we can disable them completely. Detection already happens based on >> the .pyx/.py distinction, but could additionally use a detector (e.g. in the >> post-parse phase) that sets up skip flags. One example is the closure >> building step, which could be skipped if there are no closures. >> > > > One more think I've found is that many unused utilities are loaded. > -- vitja. From stefan_ml at behnel.de Sat Oct 8 09:03:50 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 08 Oct 2011 09:03:50 +0200 Subject: [Cython] compiler performance issue for extended utility code In-Reply-To: References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> <4E8EAD2E.8040701@behnel.de> Message-ID: <4E8FF5D6.4070104@behnel.de> Vitja Makarov, 07.10.2011 18:01: >> 2011/10/7 Stefan Behnel: >>> Vitja Makarov, 06.10.2011 23:12: >>>> >>>> Here is small comparison on compiling urllib.py with cython: >>>> >>>> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >>>> ../cython.py urllib.py >>>> >>>> real 0m1.699s >>>> user 0m1.650s >>>> sys 0m0.040s >>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >>>> ../cython.py urllib.py >>>> >>>> real 0m2.830s >>>> user 0m2.790s >>>> sys 0m0.030s >>>> >>>> >>>> It's about 1.5 times slower. >>> >>> That's a pretty serious regression for >>> plain Python code then. Again, this needs proper profiling. > > I've added return statement on top of CythonScope.test_cythonscope, > now I have these timings: > > (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python > ../cython.py urllib.py > > real 0m1.764s > user 0m1.700s > sys 0m0.060s Ok, then it's only a bug. "create_testscope" is on by default in Main.py, Context.__init__(). I don't know what it does exactly, but my guess is that the option should a) be off by default and b) should rather be passed in by the test runner as part of the compile options rather than being a parameter of the Context class. AFAICT, it's currently only used in TreeFragment.py, where it is being switched off explicitly for parsing code snippets. Stefan From markflorisson88 at gmail.com Sat Oct 8 11:22:12 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 8 Oct 2011 10:22:12 +0100 Subject: [Cython] compiler performance issue for extended utility code In-Reply-To: <4E8FF5D6.4070104@behnel.de> References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> <4E8EAD2E.8040701@behnel.de> <4E8FF5D6.4070104@behnel.de> Message-ID: On 8 October 2011 08:03, Stefan Behnel wrote: > Vitja Makarov, 07.10.2011 18:01: >>> >>> 2011/10/7 Stefan Behnel: >>>> >>>> Vitja Makarov, 06.10.2011 23:12: >>>>> >>>>> Here is small comparison on compiling urllib.py with cython: >>>>> >>>>> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >>>>> ../cython.py urllib.py >>>>> >>>>> real ? ?0m1.699s >>>>> user ? ?0m1.650s >>>>> sys ? ? 0m0.040s >>>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >>>>> ../cython.py urllib.py >>>>> >>>>> real ? ?0m2.830s >>>>> user ? ?0m2.790s >>>>> sys ? ? 0m0.030s >>>>> >>>>> >>>>> It's about 1.5 times slower. >>>> >>>> That's a pretty serious regression for >>>> plain Python code then. Again, this needs proper profiling. >> >> I've added return statement on top of CythonScope.test_cythonscope, >> now I have these timings: >> >> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >> ../cython.py urllib.py >> >> real ? ?0m1.764s >> user ? ?0m1.700s >> sys ? ? 0m0.060s > > Ok, then it's only a bug. "create_testscope" is on by default in Main.py, > Context.__init__(). I don't know what it does exactly, but my guess is that > the option should a) be off by default and b) should rather be passed in by > the test runner as part of the compile options rather than being a parameter > of the Context class. AFAICT, it's currently only used in TreeFragment.py, > where it is being switched off explicitly for parsing code snippets. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > It turns it off to avoid infinite recursion. This basically means that you cannot use stuf from the Cython scope in your Cython utilities. So in your Cython utilities, you have to declare the C version of it (which you declared with the @cname decorator). This is not really something that can just be avoided loading like this. Perhaps one solution could be to load the test scope when you do a lookup in the cython scope for which no entry is found. But really, libcython and serializing entries will solve all this, so I suppose the real question is, do we want to do a release before we support such functionality? Anyway, the cython scope lookup would be a simple hack worth a try. From vitja.makarov at gmail.com Sat Oct 8 14:10:53 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sat, 8 Oct 2011 16:10:53 +0400 Subject: [Cython] compiler performance issue for extended utility code In-Reply-To: References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> <4E8EAD2E.8040701@behnel.de> <4E8FF5D6.4070104@behnel.de> Message-ID: 2011/10/8 mark florisson : > On 8 October 2011 08:03, Stefan Behnel wrote: >> Vitja Makarov, 07.10.2011 18:01: >>>> >>>> 2011/10/7 Stefan Behnel: >>>>> >>>>> Vitja Makarov, 06.10.2011 23:12: >>>>>> >>>>>> Here is small comparison on compiling urllib.py with cython: >>>>>> >>>>>> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >>>>>> ../cython.py urllib.py >>>>>> >>>>>> real ? ?0m1.699s >>>>>> user ? ?0m1.650s >>>>>> sys ? ? 0m0.040s >>>>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >>>>>> ../cython.py urllib.py >>>>>> >>>>>> real ? ?0m2.830s >>>>>> user ? ?0m2.790s >>>>>> sys ? ? 0m0.030s >>>>>> >>>>>> >>>>>> It's about 1.5 times slower. >>>>> >>>>> That's a pretty serious regression for >>>>> plain Python code then. Again, this needs proper profiling. >>> >>> I've added return statement on top of CythonScope.test_cythonscope, >>> now I have these timings: >>> >>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >>> ../cython.py urllib.py >>> >>> real ? ?0m1.764s >>> user ? ?0m1.700s >>> sys ? ? 0m0.060s >> >> Ok, then it's only a bug. "create_testscope" is on by default in Main.py, >> Context.__init__(). I don't know what it does exactly, but my guess is that >> the option should a) be off by default and b) should rather be passed in by >> the test runner as part of the compile options rather than being a parameter >> of the Context class. AFAICT, it's currently only used in TreeFragment.py, >> where it is being switched off explicitly for parsing code snippets. >> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > > It turns it off to avoid infinite recursion. This basically means that > you cannot use stuf from the Cython scope in your Cython utilities. So > in your Cython utilities, you have to declare the C version of it > (which you declared with the @cname decorator). > > This is not really something that can just be avoided loading like > this. Perhaps one solution could be to load the test scope when you do > a lookup in the cython scope for which no entry is found. But really, > libcython and serializing entries will solve all this, so I suppose > the real question is, do we want to do a release before we support > such functionality? > Anyway, the cython scope lookup would be a simple hack worth a try. > Does utility code supports something like dependencies? And could that help here? I've also noticed that some utilities are loaded unconditionally perhaps it's better to introduce lazy loading. -- vitja. From stefan_ml at behnel.de Sat Oct 8 14:25:25 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 08 Oct 2011 14:25:25 +0200 Subject: [Cython] Any news from the IronPython port? In-Reply-To: References: <4E23D558.5000104@behnel.de> Message-ID: <4E904135.6090708@behnel.de> Robert Bradshaw, 19.07.2011 05:57: > On Mon, Jul 18, 2011 at 7:45 AM, Jason McCampbell wrote: >> Definitely not buried for good, though we haven't made a lot of changes >> recently. :) We used it for porting SciPy to .NET and re-wrote a large >> number of the SciPy C module implementations in Cython. It is generally >> stable and produces good code within the set of features that were needed >> (by no means has feature parity with the CPython version). >> In general, I have been quite happy with the results given that it is >> possible to generate interfaces for two Python implementations from a single >> source. Of course, it is not free. One can, in general, not take a >> NumPy-heavy Cython file and just generate source code for IronPython. >> Because IronPython and NumPy for .NET do not share any common C APIs we had >> to wrap some of the APIs and in other cases switch to using Python notation >> and/or call the new Python-independent NumPy core API (present only in the >> refactored version). >> Overall, I think it's a good start and holds some promise for generating >> re-targetable native wrappings, but there is still plenty of work to do to >> make it more accessible. >> Regards, >> Jason > > Thanks for the status update--is the code available somewhere (e.g. as > a forked git repo)? Is it something that would be worth merging, or at > this point is it mostly hacked up to just do what you need it to for > SciPy? The code is here: https://bitbucket.org/cwitty/cython-for-ironpython/overview No idea what the status is, but it hasn't been updated for a while. Stefan From markflorisson88 at gmail.com Sat Oct 8 15:18:27 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 8 Oct 2011 14:18:27 +0100 Subject: [Cython] compiler performance issue for extended utility code In-Reply-To: References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> <4E8EAD2E.8040701@behnel.de> <4E8FF5D6.4070104@behnel.de> Message-ID: On 8 October 2011 13:10, Vitja Makarov wrote: > 2011/10/8 mark florisson : >> On 8 October 2011 08:03, Stefan Behnel wrote: >>> Vitja Makarov, 07.10.2011 18:01: >>>>> >>>>> 2011/10/7 Stefan Behnel: >>>>>> >>>>>> Vitja Makarov, 06.10.2011 23:12: >>>>>>> >>>>>>> Here is small comparison on compiling urllib.py with cython: >>>>>>> >>>>>>> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >>>>>>> ../cython.py urllib.py >>>>>>> >>>>>>> real ? ?0m1.699s >>>>>>> user ? ?0m1.650s >>>>>>> sys ? ? 0m0.040s >>>>>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >>>>>>> ../cython.py urllib.py >>>>>>> >>>>>>> real ? ?0m2.830s >>>>>>> user ? ?0m2.790s >>>>>>> sys ? ? 0m0.030s >>>>>>> >>>>>>> >>>>>>> It's about 1.5 times slower. >>>>>> >>>>>> That's a pretty serious regression for >>>>>> plain Python code then. Again, this needs proper profiling. >>>> >>>> I've added return statement on top of CythonScope.test_cythonscope, >>>> now I have these timings: >>>> >>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python >>>> ../cython.py urllib.py >>>> >>>> real ? ?0m1.764s >>>> user ? ?0m1.700s >>>> sys ? ? 0m0.060s >>> >>> Ok, then it's only a bug. "create_testscope" is on by default in Main.py, >>> Context.__init__(). I don't know what it does exactly, but my guess is that >>> the option should a) be off by default and b) should rather be passed in by >>> the test runner as part of the compile options rather than being a parameter >>> of the Context class. AFAICT, it's currently only used in TreeFragment.py, >>> where it is being switched off explicitly for parsing code snippets. >>> >>> Stefan >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >>> >> >> It turns it off to avoid infinite recursion. This basically means that >> you cannot use stuf from the Cython scope in your Cython utilities. So >> in your Cython utilities, you have to declare the C version of it >> (which you declared with the @cname decorator). >> >> This is not really something that can just be avoided loading like >> this. Perhaps one solution could be to load the test scope when you do >> a lookup in the cython scope for which no entry is found. But really, >> libcython and serializing entries will solve all this, so I suppose >> the real question is, do we want to do a release before we support >> such functionality? >> Anyway, the cython scope lookup would be a simple hack worth a try. >> > > Does utility code supports something like dependencies? And could that > help here? Yeah they can have dependencies like normal UtilitieCodes. > I've also noticed that some utilities are loaded unconditionally > perhaps it's better to introduce lazy loading. Well, they shouldn't be. If they are it's generally a bug. I noticed that it happens in the test runner though, although it should create a fresh context with freshly initialized entries. > -- > vitja. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From vitja.makarov at gmail.com Sun Oct 9 07:12:33 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sun, 9 Oct 2011 09:12:33 +0400 Subject: [Cython] Can't login into my trac account In-Reply-To: References: Message-ID: Hi! Any news here? 2011/9/28 Robert Bradshaw : > I can't log in either, though I haven't had a chance to investigate. > > On Tue, Sep 27, 2011 at 9:41 AM, Vitja Makarov wrote: >> Hi! >> >> Today I found that I can't login into my trac account. Is that common >> problem or only mine? >> >> -- >> vitja. >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > -- vitja. From markflorisson88 at gmail.com Sun Oct 9 12:19:34 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 9 Oct 2011 11:19:34 +0100 Subject: [Cython] compiler performance issue for extended utility code In-Reply-To: References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> <4E8EAD2E.8040701@behnel.de> <4E8FF5D6.4070104@behnel.de> Message-ID: On 8 October 2011 10:22, mark florisson wrote: > On 8 October 2011 08:03, Stefan Behnel wrote: > > Vitja Makarov, 07.10.2011 18:01: > >>> > >>> 2011/10/7 Stefan Behnel: > >>>> > >>>> Vitja Makarov, 06.10.2011 23:12: > >>>>> > >>>>> Here is small comparison on compiling urllib.py with cython: > >>>>> > >>>>> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python > >>>>> ../cython.py urllib.py > >>>>> > >>>>> real 0m1.699s > >>>>> user 0m1.650s > >>>>> sys 0m0.040s > >>>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python > >>>>> ../cython.py urllib.py > >>>>> > >>>>> real 0m2.830s > >>>>> user 0m2.790s > >>>>> sys 0m0.030s > >>>>> > >>>>> > >>>>> It's about 1.5 times slower. > >>>> > >>>> That's a pretty serious regression for > >>>> plain Python code then. Again, this needs proper profiling. > >> > >> I've added return statement on top of CythonScope.test_cythonscope, > >> now I have these timings: > >> > >> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python > >> ../cython.py urllib.py > >> > >> real 0m1.764s > >> user 0m1.700s > >> sys 0m0.060s > > > > Ok, then it's only a bug. "create_testscope" is on by default in Main.py, > > Context.__init__(). I don't know what it does exactly, but my guess is > that > > the option should a) be off by default and b) should rather be passed in > by > > the test runner as part of the compile options rather than being a > parameter > > of the Context class. AFAICT, it's currently only used in > TreeFragment.py, > > where it is being switched off explicitly for parsing code snippets. > > > > Stefan > > _______________________________________________ > > cython-devel mailing list > > cython-devel at python.org > > http://mail.python.org/mailman/listinfo/cython-devel > > > > It turns it off to avoid infinite recursion. This basically means that > you cannot use stuf from the Cython scope in your Cython utilities. So > in your Cython utilities, you have to declare the C version of it > (which you declared with the @cname decorator). > > This is not really something that can just be avoided loading like > this. Perhaps one solution could be to load the test scope when you do > a lookup in the cython scope for which no entry is found. But really, > libcython and serializing entries will solve all this, so I suppose > the real question is, do we want to do a release before we support > such functionality? > Anyway, the cython scope lookup would be a simple hack worth a try. > I applied the hack, i.e. defer loading the scope until the first entry in the cython scope can't be found: https://github.com/markflorisson88/cython/commit/ad4cf6303d1bf8a81e3afccc9572559a34827a3b [0] [11:16] ~ ? time cython urllib.py # conditionally load scope cython urllib.py 2.75s user 0.14s system 99% cpu 2.893 total [0] [11:17] ~ ? time cython urllib.py # always load scope cython urllib.py 4.08s user 0.16s system 99% cpu 4.239 total -------------- next part -------------- An HTML attachment was scrubbed... URL: From markflorisson88 at gmail.com Sun Oct 9 14:11:56 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 9 Oct 2011 13:11:56 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers Message-ID: Hey, So far people have been enthusiastic about the cython.parallel features, I think we should introduce some new features. I propose the following, assume parallel has been imported from cython: with parallel.master(): this is executed in the master thread in a parallel (non-prange) section with parallel.single(): same as master, except any thread may do the execution An optional keyword argument 'nowait' specifies whether there will be a barrier at the end. The default is to wait. with parallel.task(): create a task to be executed by some thread in the team once a thread takes up the task it shall only be executed by that thread and no other thread (so the task will be tied to the thread) C variables will be firstprivate Python objects will be shared parallel.taskwait() # wait on any direct descendent tasks to finish with parallel.critical(): this section of code is mutually exclusive with other critical sections optional keyword argument 'name' specifies a name for the critical section, which means all sections with that name will exclude each other, but not critical sections with different names Note: all threads that encounter the section will execute it, just not at the same time with parallel.barrier(): all threads wait until everyone has reached the barrier either no one or everyone should encounter the barrier shared variables are flushed Unfortunately, gcc again manages to horribly break master and single constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll first file a bug report. Other (better) compilers like Portland (and I'm sure Intel) work fine. I suppose a warning in the documentation will suffice there. If we at some point implement vector/SIMD operations we could also try out the Fortran openmp workshare construct. What do you guys think? Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Sun Oct 9 14:18:08 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 09 Oct 2011 14:18:08 +0200 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: Message-ID: <4E919100.8020801@astro.uio.no> On 10/09/2011 02:11 PM, mark florisson wrote: > Hey, > > So far people have been enthusiastic about the cython.parallel features, > I think we should introduce some new features. I propose the following, Great!! I only have time for a very short feedback now, perhaps more will follow. > assume parallel has been imported from cython: > > with parallel.master(): > this is executed in the master thread in a parallel (non-prange) > section > > with parallel.single(): > same as master, except any thread may do the execution > > An optional keyword argument 'nowait' specifies whether there will be a > barrier at the end. The default is to wait. > > with parallel.task(): > create a task to be executed by some thread in the team > once a thread takes up the task it shall only be executed by that > thread and no other thread (so the task will be tied to the thread) > > C variables will be firstprivate > Python objects will be shared > > parallel.taskwait() # wait on any direct descendent tasks to finish Regarding tasks, I think this is mapping OpenMP too close to Python. Closures are excellent for the notion of a task, so I think something based on the futures API would work better. I realize that makes the mapping to OpenMP and implementation a bit more difficult, but I think it is worth it in the long run. > > with parallel.critical(): > this section of code is mutually exclusive with other critical sections > optional keyword argument 'name' specifies a name for the critical > section, > which means all sections with that name will exclude each other, > but not > critical sections with different names > > Note: all threads that encounter the section will execute it, just > not at the same time > > with parallel.barrier(): > all threads wait until everyone has reached the barrier > either no one or everyone should encounter the barrier > shared variables are flushed > > Unfortunately, gcc again manages to horribly break master and single > constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll > first file a bug report. Other (better) compilers like Portland (and I'm > sure Intel) work fine. I suppose a warning in the documentation will > suffice there. > > If we at some point implement vector/SIMD operations we could also try > out the Fortran openmp workshare construct. I'm starting to learn myself OpenCL as part of a course. It's very neat for some kinds of parallelism. What I'm saying is that at least of the case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking too early, but also look forward to coming architectures (e.g., AMD's GPU-and-CPU on same die design). Dag Sverre From d.s.seljebotn at astro.uio.no Sun Oct 9 14:57:36 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 09 Oct 2011 14:57:36 +0200 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: <4E919100.8020801@astro.uio.no> References: <4E919100.8020801@astro.uio.no> Message-ID: <4E919A40.2090001@astro.uio.no> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote: > On 10/09/2011 02:11 PM, mark florisson wrote: >> Hey, >> >> So far people have been enthusiastic about the cython.parallel features, >> I think we should introduce some new features. I propose the following, > > Great!! > > I only have time for a very short feedback now, perhaps more will follow. > >> assume parallel has been imported from cython: >> >> with parallel.master(): >> this is executed in the master thread in a parallel (non-prange) >> section >> >> with parallel.single(): >> same as master, except any thread may do the execution >> >> An optional keyword argument 'nowait' specifies whether there will be a >> barrier at the end. The default is to wait. I like if parallel.is_master(): ... explicit_barrier_somehow() # see below better as a Pythonization. One could easily support is_master to be used in other contexts as well, simply by assigning a status flag in the master block. Using an if-test flows much better with Python I feel, but that naturally lead to making the barrier explicit. But I like the barrier always being explicit, rather than having it as a predicate on all the different constructs like in OpenMP.... I'm less sure about single, since making it a function indicates one could use it in other contexts and the whole thing becomes too magic (since it's tied to the position of invocation). I'm tempted to suggest for _ in prange(1): ... as our syntax for single. >> >> with parallel.task(): >> create a task to be executed by some thread in the team >> once a thread takes up the task it shall only be executed by that >> thread and no other thread (so the task will be tied to the thread) >> >> C variables will be firstprivate >> Python objects will be shared >> >> parallel.taskwait() # wait on any direct descendent tasks to finish > > Regarding tasks, I think this is mapping OpenMP too close to Python. > Closures are excellent for the notion of a task, so I think something > based on the futures API would work better. I realize that makes the > mapping to OpenMP and implementation a bit more difficult, but I think > it is worth it in the long run. > >> >> with parallel.critical(): >> this section of code is mutually exclusive with other critical sections >> optional keyword argument 'name' specifies a name for the critical >> section, >> which means all sections with that name will exclude each other, >> but not >> critical sections with different names >> >> Note: all threads that encounter the section will execute it, just >> not at the same time Yes, this works well as a with-statement... ..except that it is slightly magic in that it binds to call position (unlike anything in Python). I.e. this would be more "correct", or at least Pythonic: with parallel.critical(__file__, __line__): ... >> >> with parallel.barrier(): >> all threads wait until everyone has reached the barrier >> either no one or everyone should encounter the barrier >> shared variables are flushed I have problems with requiring a noop with block... I'd much rather write parallel.barrier() However, that ties a function call to the place of invocation, and suggests that one could do if rand() > .5: barrier() else: i += 3 barrier() and have the same barrier in each case. Again, barrier(__file__, __line__) gets us purity at the cost of practicality. Another way is the pthreads approach (although one may have to use pthread rather then OpenMP to get it, unless there are named barriers?): barrier_a = parallel.barrier() barrier_b = parallel.barrier() with parallel: barrier_a.wait() if rand() > .5: barrier_b.wait() else: i += 3 barrier_b.wait() I'm really not sure here. >> >> Unfortunately, gcc again manages to horribly break master and single >> constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll >> first file a bug report. Other (better) compilers like Portland (and I'm >> sure Intel) work fine. I suppose a warning in the documentation will >> suffice there. >> >> If we at some point implement vector/SIMD operations we could also try >> out the Fortran openmp workshare construct. > > I'm starting to learn myself OpenCL as part of a course. It's very neat > for some kinds of parallelism. What I'm saying is that at least of the > case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking > too early, but also look forward to coming architectures (e.g., AMD's > GPU-and-CPU on same die design). > > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Sun Oct 9 15:28:24 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 9 Oct 2011 14:28:24 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: <4E919100.8020801@astro.uio.no> References: <4E919100.8020801@astro.uio.no> Message-ID: On 9 October 2011 13:18, Dag Sverre Seljebotn wrote: > > On 10/09/2011 02:11 PM, mark florisson wrote: >> >> Hey, >> >> So far people have been enthusiastic about the cython.parallel features, >> I think we should introduce some new features. I propose the following, > > Great!! > > I only have time for a very short feedback now, perhaps more will follow. > >> assume parallel has been imported from cython: >> >> with parallel.master(): >> ? ? this is executed in the master thread in a parallel (non-prange) >> section >> >> with parallel.single(): >> ? ?same as master, except any thread may do the execution >> >> An optional keyword argument 'nowait' specifies whether there will be a >> barrier at the end. The default is to wait. >> >> with parallel.task(): >> ? ? create a task to be executed by some thread in the team >> ? ? once a thread takes up the task it shall only be executed by that >> thread and no other thread (so the task will be tied to the thread) >> >> ? ? C variables will be firstprivate >> ? ? Python objects will be shared >> >> parallel.taskwait() # wait on any direct descendent tasks to finish > > Regarding tasks, I think this is mapping OpenMP too close to Python. Closures are excellent for the notion of a task, so I think something based on the futures API would work better. I realize that makes the mapping to OpenMP and implementation a bit more difficult, but I think it is worth it in the long run. Hmm, that would be cool as well. Something like parallel.submit_task(myclosure)? The problem I see with that is that parallel stuff can't have the GIL, and you can only have 'def' closures at the moment. I realize that you won't actually have to use closure support here though, and could just transform the inner function to OpenMP task code. This would maybe look inconsistent with other closures though, and you'd also have to restrict the use of such a closure to parallel.submit_task(). Anyway, perhaps you have a concrete proposal that addresses these problems. >> >> with parallel.critical(): >> ? ? this section of code is mutually exclusive with other critical sections >> ? ? optional keyword argument 'name' specifies a name for the critical >> section, >> ? ? which means all sections with that name will exclude each other, >> but not >> ? ? critical sections with different names >> >> ? ? Note: all threads that encounter the section will execute it, just >> not at the same time >> >> with parallel.barrier(): >> ? ? all threads wait until everyone has reached the barrier >> ? ? either no one or everyone should encounter the barrier >> ? ? shared variables are flushed >> >> Unfortunately, gcc again manages to horribly break master and single >> constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll >> first file a bug report. Other (better) compilers like Portland (and I'm >> sure Intel) work fine. I suppose a warning in the documentation will >> suffice there. >> >> If we at some point implement vector/SIMD operations we could also try >> out the Fortran openmp workshare construct. > > I'm starting to learn myself OpenCL as part of a course. It's very neat for some kinds of parallelism. What I'm saying is that at least of the case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking too early, but also look forward to coming architectures (e.g., AMD's GPU-and-CPU on same die design). Oh, definitely. The good thing is that code generation backends needn't be that hard. If you figure all semantics out in the Python code you could based on a backend load a different utility template as a string. It's probably not that easy, but the point is that as long as your code semantics don't prevent other backends, you keep your options open. In the end I want to be able to write a parallel program almost serially and have Cython compile it to OpenMP, MPI, GPU's or whatever else I need. At the same time I need to stay in touch with reality, so it's one step at a time :) > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Sun Oct 9 15:30:39 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 9 Oct 2011 14:30:39 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: <4E919A40.2090001@astro.uio.no> References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> Message-ID: On 9 October 2011 13:57, Dag Sverre Seljebotn wrote: > On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote: >> >> On 10/09/2011 02:11 PM, mark florisson wrote: >>> >>> Hey, >>> >>> So far people have been enthusiastic about the cython.parallel features, >>> I think we should introduce some new features. I propose the following, >> >> Great!! >> >> I only have time for a very short feedback now, perhaps more will follow. >> >>> assume parallel has been imported from cython: >>> >>> with parallel.master(): >>> this is executed in the master thread in a parallel (non-prange) >>> section >>> >>> with parallel.single(): >>> same as master, except any thread may do the execution >>> >>> An optional keyword argument 'nowait' specifies whether there will be a >>> barrier at the end. The default is to wait. > > I like > > if parallel.is_master(): > ? ?... > explicit_barrier_somehow() # see below > > better as a Pythonization. One could easily support is_master to be used in > other contexts as well, simply by assigning a status flag in the master > block. > > Using an if-test flows much better with Python I feel, but that naturally > lead to making the barrier explicit. But I like the barrier always being > explicit, rather than having it as a predicate on all the different > constructs like in OpenMP.... Hmm, that might mean you also want the barrier for a prange in a parallel to be explicit. I like the 'if' test though, although it wouldn't make sense for 'single'. > I'm less sure about single, since making it a function indicates one could > use it in other contexts and the whole thing becomes too magic (since it's > tied to the position of invocation). I'm tempted to suggest > > for _ in prange(1): > ? ?... > > as our syntax for single. I think that syntax is absolutely terrible :) Perhaps single is not so important and one can just use master instead (or, if really needed, master + a task with the actual work). >>> >>> with parallel.task(): >>> create a task to be executed by some thread in the team >>> once a thread takes up the task it shall only be executed by that >>> thread and no other thread (so the task will be tied to the thread) >>> >>> C variables will be firstprivate >>> Python objects will be shared >>> >>> parallel.taskwait() # wait on any direct descendent tasks to finish >> >> Regarding tasks, I think this is mapping OpenMP too close to Python. >> Closures are excellent for the notion of a task, so I think something >> based on the futures API would work better. I realize that makes the >> mapping to OpenMP and implementation a bit more difficult, but I think >> it is worth it in the long run. >> >>> >>> with parallel.critical(): >>> this section of code is mutually exclusive with other critical sections >>> optional keyword argument 'name' specifies a name for the critical >>> section, >>> which means all sections with that name will exclude each other, >>> but not >>> critical sections with different names >>> >>> Note: all threads that encounter the section will execute it, just >>> not at the same time > > Yes, this works well as a with-statement... > > ..except that it is slightly magic in that it binds to call position (unlike > anything in Python). I.e. this would be more "correct", or at least > Pythonic: > > with parallel.critical(__file__, __line__): > ? ?... > I'm not entirely sure what you mean here. Critical is really about the block contained within, not about a position in a file. Not all threads have to encounter the critical region, and not specifying a name means you exclude with *all other* unnamed critical sections (not just this one). >>> >>> with parallel.barrier(): >>> all threads wait until everyone has reached the barrier >>> either no one or everyone should encounter the barrier >>> shared variables are flushed > > I have problems with requiring a noop with block... > > I'd much rather write > > parallel.barrier() Although in OpenMP it doesn't have any associated code, but we could give it those semantics: apply the barrier at the end of the block of code. The con is that the barrier is at the top while it only affects leaving the block, you would write: with parallel.barrier(): if rand() > .5: ... else: ... # the barrier is here > However, that ties a function call to the place of invocation, and suggests > that one could do > > if rand() > .5: > ? ?barrier() > else: > ? ?i += 3 > ? ?barrier() > > and have the same barrier in each case. Again, > > barrier(__file__, __line__) > > gets us purity at the cost of practicality. In this case (unlike the critical construct), yes. I think a warning in the docs stating that either all or none of the threads must encounter the barrier should suffice. > Another way is the pthreads > approach (although one may have to use pthread rather then OpenMP to get it, > unless there are named barriers?): > > barrier_a = parallel.barrier() > barrier_b = parallel.barrier() > with parallel: > ? ?barrier_a.wait() > ? ?if rand() > .5: > ? ? ? ?barrier_b.wait() > ? ?else: > ? ? ? ?i += 3 > ? ? ? ?barrier_b.wait() > > > I'm really not sure here. I think we should really just say to the user: "dont do this". There are no named barriers, implementing this wouldn't be easy at all (in fact, I'm not sure you can specify sane semantics for this if you have more branches and some do not contain the same barrier). The block structure for barriers would help here, as blocks are inconvenient to write: if C: with barrier(): ... else: with barrier(): ... is just not nice to write, you would instead write with barrier(): if C: ... else: ... >>> >>> Unfortunately, gcc again manages to horribly break master and single >>> constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll >>> first file a bug report. Other (better) compilers like Portland (and I'm >>> sure Intel) work fine. I suppose a warning in the documentation will >>> suffice there. >>> >>> If we at some point implement vector/SIMD operations we could also try >>> out the Fortran openmp workshare construct. >> >> I'm starting to learn myself OpenCL as part of a course. It's very neat >> for some kinds of parallelism. What I'm saying is that at least of the >> case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking >> too early, but also look forward to coming architectures (e.g., AMD's >> GPU-and-CPU on same die design). >> >> Dag Sverre >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Sun Oct 9 15:39:45 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 9 Oct 2011 14:39:45 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> Message-ID: On 9 October 2011 14:30, mark florisson wrote: > On 9 October 2011 13:57, Dag Sverre Seljebotn > wrote: >> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote: >>> >>> On 10/09/2011 02:11 PM, mark florisson wrote: >>>> >>>> Hey, >>>> >>>> So far people have been enthusiastic about the cython.parallel features, >>>> I think we should introduce some new features. I propose the following, >>> >>> Great!! >>> >>> I only have time for a very short feedback now, perhaps more will follow. >>> >>>> assume parallel has been imported from cython: >>>> >>>> with parallel.master(): >>>> this is executed in the master thread in a parallel (non-prange) >>>> section >>>> >>>> with parallel.single(): >>>> same as master, except any thread may do the execution >>>> >>>> An optional keyword argument 'nowait' specifies whether there will be a >>>> barrier at the end. The default is to wait. >> >> I like >> >> if parallel.is_master(): >> ? ?... >> explicit_barrier_somehow() # see below >> >> better as a Pythonization. One could easily support is_master to be used in >> other contexts as well, simply by assigning a status flag in the master >> block. >> >> Using an if-test flows much better with Python I feel, but that naturally >> lead to making the barrier explicit. But I like the barrier always being >> explicit, rather than having it as a predicate on all the different >> constructs like in OpenMP.... > > Hmm, that might mean you also want the barrier for a prange in a > parallel to be explicit. I like the 'if' test though, although it > wouldn't make sense for 'single'. > >> I'm less sure about single, since making it a function indicates one could >> use it in other contexts and the whole thing becomes too magic (since it's >> tied to the position of invocation). I'm tempted to suggest >> >> for _ in prange(1): >> ? ?... >> >> as our syntax for single. > > I think that syntax is absolutely terrible :) Perhaps single is not so > important and one can just use master instead (or, if really needed, > master + a task with the actual work). > >>>> >>>> with parallel.task(): >>>> create a task to be executed by some thread in the team >>>> once a thread takes up the task it shall only be executed by that >>>> thread and no other thread (so the task will be tied to the thread) >>>> >>>> C variables will be firstprivate >>>> Python objects will be shared >>>> >>>> parallel.taskwait() # wait on any direct descendent tasks to finish >>> >>> Regarding tasks, I think this is mapping OpenMP too close to Python. >>> Closures are excellent for the notion of a task, so I think something >>> based on the futures API would work better. I realize that makes the >>> mapping to OpenMP and implementation a bit more difficult, but I think >>> it is worth it in the long run. >>> >>>> >>>> with parallel.critical(): >>>> this section of code is mutually exclusive with other critical sections >>>> optional keyword argument 'name' specifies a name for the critical >>>> section, >>>> which means all sections with that name will exclude each other, >>>> but not >>>> critical sections with different names >>>> >>>> Note: all threads that encounter the section will execute it, just >>>> not at the same time >> >> Yes, this works well as a with-statement... >> >> ..except that it is slightly magic in that it binds to call position (unlike >> anything in Python). I.e. this would be more "correct", or at least >> Pythonic: >> >> with parallel.critical(__file__, __line__): >> ? ?... >> > > I'm not entirely sure what you mean here. Critical is really about the > block contained within, not about a position in a file. Not all > threads have to encounter the critical region, and not specifying a > name means you exclude with *all other* unnamed critical sections (not > just this one). > >>>> >>>> with parallel.barrier(): >>>> all threads wait until everyone has reached the barrier >>>> either no one or everyone should encounter the barrier >>>> shared variables are flushed >> >> I have problems with requiring a noop with block... >> >> I'd much rather write >> >> parallel.barrier() > > Although in OpenMP it doesn't have any associated code, but we could > give it those semantics: apply the barrier at the end of the block of > code. The con is that the barrier is at the top while it only affects > leaving the block, you would write: > > with parallel.barrier(): > ? ?if rand() > .5: > ? ? ? ?... > ? ?else: > ? ? ? ?... > # the barrier is here > >> However, that ties a function call to the place of invocation, and suggests >> that one could do >> >> if rand() > .5: >> ? ?barrier() >> else: >> ? ?i += 3 >> ? ?barrier() >> >> and have the same barrier in each case. Again, >> >> barrier(__file__, __line__) >> >> gets us purity at the cost of practicality. > > In this case (unlike the critical construct), yes. I think a warning > in the docs stating that either all or none of the threads must > encounter the barrier should suffice. > >> Another way is the pthreads >> approach (although one may have to use pthread rather then OpenMP to get it, >> unless there are named barriers?): >> >> barrier_a = parallel.barrier() >> barrier_b = parallel.barrier() >> with parallel: >> ? ?barrier_a.wait() >> ? ?if rand() > .5: >> ? ? ? ?barrier_b.wait() >> ? ?else: >> ? ? ? ?i += 3 >> ? ? ? ?barrier_b.wait() >> >> >> I'm really not sure here. > > I think we should really just say to the user: "dont do this". There > are no named barriers, implementing this wouldn't be easy at all (in > fact, I'm not sure you can specify sane semantics for this if you have > more branches and some do not contain the same barrier). The block > structure for barriers would help here, as blocks are inconvenient to > write: > > if C: > ? ?with barrier(): ... > else: > ? ?with barrier(): ... > > is just not nice to write, you would instead write > > with barrier(): > ? ?if C: > ? ? ? ?... > ? ?else: > ? ? ? ?... This would also allow one to write with barrier(), master(): ... Basically it's up to the user to use it sensibly. Usually you want a barrier to ensure that you have a well-defined state set by some code. One could (correctly) only put the last line of such code in the with block, but it would make more sense to put all associated code in there. If there isn't really any associated code, you could just put 'pass' in the block. Does that make sense? I haven't even convinced myself of it yet. >>>> >>>> Unfortunately, gcc again manages to horribly break master and single >>>> constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll >>>> first file a bug report. Other (better) compilers like Portland (and I'm >>>> sure Intel) work fine. I suppose a warning in the documentation will >>>> suffice there. >>>> >>>> If we at some point implement vector/SIMD operations we could also try >>>> out the Fortran openmp workshare construct. >>> >>> I'm starting to learn myself OpenCL as part of a course. It's very neat >>> for some kinds of parallelism. What I'm saying is that at least of the >>> case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking >>> too early, but also look forward to coming architectures (e.g., AMD's >>> GPU-and-CPU on same die design). >>> >>> Dag Sverre >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > Of course, a 'with barrier():' means you can apply it anywhere: with parallel(): lots of code with barrier(): single line of code But the trick for readable programs would be to find the section of code that is From markflorisson88 at gmail.com Sun Oct 9 15:44:05 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 9 Oct 2011 14:44:05 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> Message-ID: On 9 October 2011 14:39, mark florisson wrote: > On 9 October 2011 14:30, mark florisson wrote: >> On 9 October 2011 13:57, Dag Sverre Seljebotn >> wrote: >>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote: >>>> >>>> On 10/09/2011 02:11 PM, mark florisson wrote: >>>>> >>>>> Hey, >>>>> >>>>> So far people have been enthusiastic about the cython.parallel features, >>>>> I think we should introduce some new features. I propose the following, >>>> >>>> Great!! >>>> >>>> I only have time for a very short feedback now, perhaps more will follow. >>>> >>>>> assume parallel has been imported from cython: >>>>> >>>>> with parallel.master(): >>>>> this is executed in the master thread in a parallel (non-prange) >>>>> section >>>>> >>>>> with parallel.single(): >>>>> same as master, except any thread may do the execution >>>>> >>>>> An optional keyword argument 'nowait' specifies whether there will be a >>>>> barrier at the end. The default is to wait. >>> >>> I like >>> >>> if parallel.is_master(): >>> ? ?... >>> explicit_barrier_somehow() # see below >>> >>> better as a Pythonization. One could easily support is_master to be used in >>> other contexts as well, simply by assigning a status flag in the master >>> block. >>> >>> Using an if-test flows much better with Python I feel, but that naturally >>> lead to making the barrier explicit. But I like the barrier always being >>> explicit, rather than having it as a predicate on all the different >>> constructs like in OpenMP.... >> >> Hmm, that might mean you also want the barrier for a prange in a >> parallel to be explicit. I like the 'if' test though, although it >> wouldn't make sense for 'single'. >> >>> I'm less sure about single, since making it a function indicates one could >>> use it in other contexts and the whole thing becomes too magic (since it's >>> tied to the position of invocation). I'm tempted to suggest >>> >>> for _ in prange(1): >>> ? ?... >>> >>> as our syntax for single. >> >> I think that syntax is absolutely terrible :) Perhaps single is not so >> important and one can just use master instead (or, if really needed, >> master + a task with the actual work). >> >>>>> >>>>> with parallel.task(): >>>>> create a task to be executed by some thread in the team >>>>> once a thread takes up the task it shall only be executed by that >>>>> thread and no other thread (so the task will be tied to the thread) >>>>> >>>>> C variables will be firstprivate >>>>> Python objects will be shared >>>>> >>>>> parallel.taskwait() # wait on any direct descendent tasks to finish >>>> >>>> Regarding tasks, I think this is mapping OpenMP too close to Python. >>>> Closures are excellent for the notion of a task, so I think something >>>> based on the futures API would work better. I realize that makes the >>>> mapping to OpenMP and implementation a bit more difficult, but I think >>>> it is worth it in the long run. >>>> >>>>> >>>>> with parallel.critical(): >>>>> this section of code is mutually exclusive with other critical sections >>>>> optional keyword argument 'name' specifies a name for the critical >>>>> section, >>>>> which means all sections with that name will exclude each other, >>>>> but not >>>>> critical sections with different names >>>>> >>>>> Note: all threads that encounter the section will execute it, just >>>>> not at the same time >>> >>> Yes, this works well as a with-statement... >>> >>> ..except that it is slightly magic in that it binds to call position (unlike >>> anything in Python). I.e. this would be more "correct", or at least >>> Pythonic: >>> >>> with parallel.critical(__file__, __line__): >>> ? ?... >>> >> >> I'm not entirely sure what you mean here. Critical is really about the >> block contained within, not about a position in a file. Not all >> threads have to encounter the critical region, and not specifying a >> name means you exclude with *all other* unnamed critical sections (not >> just this one). >> >>>>> >>>>> with parallel.barrier(): >>>>> all threads wait until everyone has reached the barrier >>>>> either no one or everyone should encounter the barrier >>>>> shared variables are flushed >>> >>> I have problems with requiring a noop with block... >>> >>> I'd much rather write >>> >>> parallel.barrier() >> >> Although in OpenMP it doesn't have any associated code, but we could >> give it those semantics: apply the barrier at the end of the block of >> code. The con is that the barrier is at the top while it only affects >> leaving the block, you would write: >> >> with parallel.barrier(): >> ? ?if rand() > .5: >> ? ? ? ?... >> ? ?else: >> ? ? ? ?... >> # the barrier is here >> >>> However, that ties a function call to the place of invocation, and suggests >>> that one could do >>> >>> if rand() > .5: >>> ? ?barrier() >>> else: >>> ? ?i += 3 >>> ? ?barrier() >>> >>> and have the same barrier in each case. Again, >>> >>> barrier(__file__, __line__) >>> >>> gets us purity at the cost of practicality. >> >> In this case (unlike the critical construct), yes. I think a warning >> in the docs stating that either all or none of the threads must >> encounter the barrier should suffice. >> >>> Another way is the pthreads >>> approach (although one may have to use pthread rather then OpenMP to get it, >>> unless there are named barriers?): >>> >>> barrier_a = parallel.barrier() >>> barrier_b = parallel.barrier() >>> with parallel: >>> ? ?barrier_a.wait() >>> ? ?if rand() > .5: >>> ? ? ? ?barrier_b.wait() >>> ? ?else: >>> ? ? ? ?i += 3 >>> ? ? ? ?barrier_b.wait() >>> >>> >>> I'm really not sure here. >> >> I think we should really just say to the user: "dont do this". There >> are no named barriers, implementing this wouldn't be easy at all (in >> fact, I'm not sure you can specify sane semantics for this if you have >> more branches and some do not contain the same barrier). The block >> structure for barriers would help here, as blocks are inconvenient to >> write: >> >> if C: >> ? ?with barrier(): ... >> else: >> ? ?with barrier(): ... >> >> is just not nice to write, you would instead write >> >> with barrier(): >> ? ?if C: >> ? ? ? ?... >> ? ?else: >> ? ? ? ?... > > This would also allow one to write > > with barrier(), master(): > ? ?... > > Basically it's up to the user to use it sensibly. Usually you want a > barrier to ensure that you have a well-defined state set by some code. > One could (correctly) only put the last line of such code in the with > block, but it would make more sense to put all associated code in > there. > > If there isn't really any associated code, you could just put 'pass' > in the block. > > Does that make sense? I haven't even convinced myself of it yet. > >>>>> >>>>> Unfortunately, gcc again manages to horribly break master and single >>>>> constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll >>>>> first file a bug report. Other (better) compilers like Portland (and I'm >>>>> sure Intel) work fine. I suppose a warning in the documentation will >>>>> suffice there. >>>>> >>>>> If we at some point implement vector/SIMD operations we could also try >>>>> out the Fortran openmp workshare construct. >>>> >>>> I'm starting to learn myself OpenCL as part of a course. It's very neat >>>> for some kinds of parallelism. What I'm saying is that at least of the >>>> case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking >>>> too early, but also look forward to coming architectures (e.g., AMD's >>>> GPU-and-CPU on same die design). >>>> >>>> Dag Sverre >>>> _______________________________________________ >>>> cython-devel mailing list >>>> cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>> >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >>> >> > > Of course, a 'with barrier():' means you can apply it anywhere: > > with parallel(): > ? ?lots of code > > ? ?with barrier(): > ? ? ? ?single line of code > > But the trick for readable programs would be to find the section of code that is > It seems I didn't finish my last mail. I wanted to say that readable programs would try to find a logical block of code which you're synchronizing on with the barrier. From stefan_ml at behnel.de Sun Oct 9 19:35:32 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 09 Oct 2011 19:35:32 +0200 Subject: [Cython] PyCon-DE wrap-up by Kay Hayen Message-ID: <4E91DB64.9050201@behnel.de> Hi, Kay Hayen wrote a blog post about his view of the first PyCon-DE, including a bit on the discussions I had with him about Nuitka. http://www.nuitka.net/blog/2011/10/pycon-de-2011-my-report/ It was interesting to see that Nuitka actually comes from the other side, meaning that it tries to be a pure Python compiler, but should at some point start to support (Python) type hints for the compiler. Cython made static types a language feature from the very beginning and is now fixing up the Python compatibility. So both systems will eventually become rather similar in what they achieve, with Cython being essentially a superset of the feature set of Nuitka due to its additional focus on talking to external libraries efficiently and supporting things like parallel loops or the PEP-3118 buffer interface. One of the impressions I took out of the technical discussions with Kay is that there isn't really a good reason why Cython should refuse to duplicate some of the inner mechanics of CPython for optimisation purposes. Nuitka appears to be somewhat more aggressive here, partly because Kay doesn't currently care all that much about portability (e.g. to Python 3). I was previously very opposed to that (you may remember my opposition to the list.pop() optimisation), but now I think that we have to fix up the generated code for each new major CPython release anyway, so it won't make a difference if we have to rework some more of the code because a bit of those inner workings changed. They sure won't change for released CPython versions anymore, and many implementation details are unlikely enough to change for years to come. It's good to continue to be considerate about such changes, but some of them may well bring another serious bit of performance without introducing real portability risks. Changes like the Unicode string restructuring in PEP-393 show that even relying on official and long standing parts of the C-API isn't enough to guarantee that code still works as expected in new releases, so we may just as well start digging deeper. Stefan From markflorisson88 at gmail.com Sun Oct 9 19:57:16 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 9 Oct 2011 18:57:16 +0100 Subject: [Cython] PyCon-DE wrap-up by Kay Hayen In-Reply-To: <4E91DB64.9050201@behnel.de> References: <4E91DB64.9050201@behnel.de> Message-ID: On 9 October 2011 18:35, Stefan Behnel wrote: > Hi, > > Kay Hayen wrote a blog post about his view of the first PyCon-DE, including > a bit on the discussions I had with him about Nuitka. > > http://www.nuitka.net/blog/2011/10/pycon-de-2011-my-report/ > > It was interesting to see that Nuitka actually comes from the other side, > meaning that it tries to be a pure Python compiler, but should at some point > start to support (Python) type hints for the compiler. Cython made static > types a language feature from the very beginning and is now fixing up the > Python compatibility. So both systems will eventually become rather similar > in what they achieve, with Cython being essentially a superset of the > feature set of Nuitka due to its additional focus on talking to external > libraries efficiently and supporting things like parallel loops or the > PEP-3118 buffer interface. > > One of the impressions I took out of the technical discussions with Kay is > that there isn't really a good reason why Cython should refuse to duplicate > some of the inner mechanics of CPython for optimisation purposes. Nuitka > appears to be somewhat more aggressive here, partly because Kay doesn't > currently care all that much about portability (e.g. to Python 3). Interesting. What kind of (significant) optimizations could be made by duplicating code? Do you want to duplicate entire functions or do you want to inline parts of those? I actually think we should not get too tied to CPython, e.g. what if PyPy gets a CPython compatible API, or possibly a subset like PEP 384? > I was previously very opposed to that (you may remember my opposition to the > list.pop() optimisation), but now I think that we have to fix up the > generated code for each new major CPython release anyway, so it won't make a > difference if we have to rework some more of the code because a bit of those > inner workings changed. They sure won't change for released CPython versions > anymore, and many implementation details are unlikely enough to change for > years to come. It's good to continue to be considerate about such changes, > but some of them may well bring another serious bit of performance without > introducing real portability risks. Changes like the Unicode string > restructuring in PEP-393 show that even relying on official and long > standing parts of the C-API isn't enough to guarantee that code still works > as expected in new releases, so we may just as well start digging deeper. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From jonovik at gmail.com Sun Oct 9 20:54:01 2011 From: jonovik at gmail.com (Jon Olav Vik) Date: Sun, 9 Oct 2011 20:54:01 +0200 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: <4E919A40.2090001@astro.uio.no> References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> Message-ID: On Sun, Oct 9, 2011 at 2:57 PM, Dag Sverre Seljebotn wrote: >>> with parallel.single(): >>> same as master, except any thread may do the execution >>> >>> An optional keyword argument 'nowait' specifies whether there will be a >>> barrier at the end. The default is to wait. > > I like > > if parallel.is_master(): > ? ?... > explicit_barrier_somehow() # see below > > better as a Pythonization. One could easily support is_master to be used in > other contexts as well, simply by assigning a status flag in the master > block. > > Using an if-test flows much better with Python I feel, but that naturally > lead to making the barrier explicit. But I like the barrier always being > explicit, rather than having it as a predicate on all the different > constructs like in OpenMP.... Personally, I think I'd prefer find context managers as a very readable way to deal with parallelism, similar to the "threading" module: http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement From markflorisson88 at gmail.com Sun Oct 9 21:01:00 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 9 Oct 2011 20:01:00 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> Message-ID: On 9 October 2011 19:54, Jon Olav Vik wrote: > On Sun, Oct 9, 2011 at 2:57 PM, Dag Sverre Seljebotn > wrote: >>>> with parallel.single(): >>>> same as master, except any thread may do the execution >>>> >>>> An optional keyword argument 'nowait' specifies whether there will be a >>>> barrier at the end. The default is to wait. >> >> I like >> >> if parallel.is_master(): >> ? ?... >> explicit_barrier_somehow() # see below >> >> better as a Pythonization. One could easily support is_master to be used in >> other contexts as well, simply by assigning a status flag in the master >> block. >> >> Using an if-test flows much better with Python I feel, but that naturally >> lead to making the barrier explicit. But I like the barrier always being >> explicit, rather than having it as a predicate on all the different >> constructs like in OpenMP.... > > Personally, I think I'd prefer find context managers as a very > readable way to deal with parallelism, similar to the "threading" > module: > > http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > Yeah it makes a lot of sense for mutual exclusion, but 'master' really means "only the master thread executes this peace of code, even though other threads encounter the same code", which is more akin to 'if' than 'with'. From jonovik at gmail.com Sun Oct 9 22:48:49 2011 From: jonovik at gmail.com (Jon Olav Vik) Date: Sun, 9 Oct 2011 22:48:49 +0200 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> Message-ID: On Sun, Oct 9, 2011 at 9:01 PM, mark florisson wrote: > On 9 October 2011 19:54, Jon Olav Vik wrote: >> Personally, I think I'd prefer context managers as a very >> readable way to deal with parallelism > > Yeah it makes a lot of sense for mutual exclusion, but 'master' really > means "only the master thread executes this peace of code, even though > other threads encounter the same code", which is more akin to 'if' > than 'with'. I see your point. However, another similarity with "with" statements as an encapsulated "try..finally" is when there's a barrier at the end of the block. I can live with some magic if it saves me from having a boilerplate line of "barrier" everywhere 8-) From markflorisson88 at gmail.com Sun Oct 9 23:27:37 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 9 Oct 2011 22:27:37 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> Message-ID: On 9 October 2011 21:48, Jon Olav Vik wrote: > On Sun, Oct 9, 2011 at 9:01 PM, mark florisson > wrote: >> On 9 October 2011 19:54, Jon Olav Vik wrote: >>> Personally, I think I'd prefer context managers as a very >>> readable way to deal with parallelism >> >> Yeah it makes a lot of sense for mutual exclusion, but 'master' really >> means "only the master thread executes this peace of code, even though >> other threads encounter the same code", which is more akin to 'if' >> than 'with'. > > I see your point. However, another similarity with "with" statements > as an encapsulated "try..finally" is when there's a barrier at the end > of the block. I can live with some magic if it saves me from having a > boilerplate line of "barrier" everywhere 8-) > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > Hm, indeed. I just noticed that unlike single constructs, master constructs don't have barriers. Both are also not allowed to be closely nested in worksharing constructs. I think the single directive is more useful with respect to tasks, e.g. have a single thread generate tasks and have other threads waiting at the barrier execute them. In that sense I suppose 'if parallel.is_master():' makes sense (no barrier, master thread) and 'with single():' (with barrier, any thread). We could still support single in prange though, if we simply have the master thread execute it ('if (omp_get_thread_num() == 0)') and put a barrier after the block. This makes me wonder what the point of master was supposed to be... From markflorisson88 at gmail.com Mon Oct 10 10:12:52 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 10 Oct 2011 09:12:52 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> Message-ID: On 9 October 2011 22:27, mark florisson wrote: > > On 9 October 2011 21:48, Jon Olav Vik wrote: > > On Sun, Oct 9, 2011 at 9:01 PM, mark florisson > > wrote: > >> On 9 October 2011 19:54, Jon Olav Vik wrote: > >>> Personally, I think I'd prefer context managers as a very > >>> readable way to deal with parallelism > >> > >> Yeah it makes a lot of sense for mutual exclusion, but 'master' really > >> means "only the master thread executes this peace of code, even though > >> other threads encounter the same code", which is more akin to 'if' > >> than 'with'. > > > > I see your point. However, another similarity with "with" statements > > as an encapsulated "try..finally" is when there's a barrier at the end > > of the block. I can live with some magic if it saves me from having a > > boilerplate line of "barrier" everywhere 8-) > > _______________________________________________ > > cython-devel mailing list > > cython-devel at python.org > > http://mail.python.org/mailman/listinfo/cython-devel > > > > Hm, indeed. I just noticed that unlike single constructs, master > constructs don't have barriers. Both are also not allowed to be > closely nested in worksharing constructs. I think the single directive > is more useful with respect to tasks, e.g. have a single thread > generate tasks and have other threads waiting at the barrier execute > them. In that sense I suppose 'if parallel.is_master():' makes sense > (no barrier, master thread) and 'with single():' (with barrier, any > thread). > > We could still support single in prange though, if we simply have the > master thread execute it ('if (omp_get_thread_num() == 0)') and put a > barrier after the block. This makes me wonder what the point of master > was supposed to be... Scratch that last part about master/single in parallel sections, it doesn't make sense. It only makes sense if you think of those sections as tasks you submit that would be immediately taken up by a (certain) thread. But that's not quite what it means. I do like 'if is_master()' and 'with single', though. Another thing we could support is arbitrary reductions. In OpenMP 3.1 you get reduction operators 'and', 'max' and 'min', but it wouldn't be hard to support arbitrary user functions. e.g. @cython.reduction cdef int func(int a, int b): ... for i in prange(...): a = func(a, b) I'm not sure how common this is though. You probably have your reduction data in an array so you're already using numpy so you'll likely already have your functionality. From stefan_ml at behnel.de Mon Oct 10 10:38:35 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 10 Oct 2011 10:38:35 +0200 Subject: [Cython] PyCon-DE wrap-up by Kay Hayen In-Reply-To: References: <4E91DB64.9050201@behnel.de> Message-ID: <4E92AF0B.6070905@behnel.de> mark florisson, 09.10.2011 19:57: > On 9 October 2011 18:35, Stefan Behnel wrote: >> One of the impressions I took out of the technical discussions with Kay is >> that there isn't really a good reason why Cython should refuse to duplicate >> some of the inner mechanics of CPython for optimisation purposes. Nuitka >> appears to be somewhat more aggressive here, partly because Kay doesn't >> currently care all that much about portability (e.g. to Python 3). > > Interesting. What kind of (significant) optimizations could be made by > duplicating code? Do you want to duplicate entire functions or do you > want to inline parts of those? I was mainly referring to things like direct access to type/object struct fields and little things like that. They can make a difference especially in loops, compared to calling into a generic C-API function. For example, we could have our own interned implementation of PyDict_Next(). I'm not very impressed by the performance of that C-API function - repeated calls to GetItem can be faster than looping over a dict with PyDict_Next()! That being said, I wasn't referring to any specific changes. It was more of a general remark about the invisible line that we currently draw in Cython. > I actually think we should not get too tied to CPython, e.g. what if > PyPy gets a CPython compatible API It already implements a part of the C-API: http://morepypy.blogspot.com/2010/04/using-cpython-extension-modules-with.html However, if we really want to support it at that level, there's likely more to do than just removing low-level optimisations. And that would take the normal route that we always use: macros and conditionally compiled inline functions. The mere fact that we try to support different targets doesn't mean that we should stop optimising for specific targets. The same is true for different versions of CPython, where we often use better optimisations in newer releases, without sacrificing backwards compatibility. Personally, I think that supporting PyPy at the Python level is a lot more interesting, although it may be easier to get it working at the cpyext level. > or possibly a subset like PEP 384? That's currently not very interesting since there are basically no C extensions around (generated or hand written) that restrict themselves to that API. Supporting it in Cython would mean that we have to rewrite huge parts of the generated C code. It's not even clear to me yet that we *can* implement all of Cython's features based on PEP 384. For example, fast indexing into lists and tuples is basically a no-no in the restricted C-API. There are tons of rather unexpected restrictions like this. Stefan From markflorisson88 at gmail.com Mon Oct 10 21:59:11 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 10 Oct 2011 20:59:11 +0100 Subject: [Cython] PyCon-DE wrap-up by Kay Hayen In-Reply-To: <4E92AF0B.6070905@behnel.de> References: <4E91DB64.9050201@behnel.de> <4E92AF0B.6070905@behnel.de> Message-ID: On 10 October 2011 09:38, Stefan Behnel wrote: > mark florisson, 09.10.2011 19:57: >> >> On 9 October 2011 18:35, Stefan Behnel wrote: >>> >>> One of the impressions I took out of the technical discussions with Kay >>> is >>> that there isn't really a good reason why Cython should refuse to >>> duplicate >>> some of the inner mechanics of CPython for optimisation purposes. Nuitka >>> appears to be somewhat more aggressive here, partly because Kay doesn't >>> currently care all that much about portability (e.g. to Python 3). >> >> Interesting. What kind of (significant) optimizations could be made by >> duplicating code? Do you want to duplicate entire functions or do you >> want to inline parts of those? > > I was mainly referring to things like direct access to type/object struct > fields and little things like that. Ah, I see. I suppose that if you do everything through Cython-specific macros it will be easy to change it at any time and it will make it easy to experiment with performance as well. > They can make a difference especially in > loops, compared to calling into a generic C-API function. For example, we > could have our own interned implementation of PyDict_Next(). I'm not very > impressed by the performance of that C-API function - repeated calls to > GetItem can be faster than looping over a dict with PyDict_Next()! > > That being said, I wasn't referring to any specific changes. It was more of > a general remark about the invisible line that we currently draw in Cython. > > >> I actually think we should not get too tied to CPython, e.g. what if >> PyPy gets a CPython compatible API > > It already implements a part of the C-API: > > http://morepypy.blogspot.com/2010/04/using-cpython-extension-modules-with.html > > However, if we really want to support it at that level, there's likely more > to do than just removing low-level optimisations. And that would take the > normal route that we always use: macros and conditionally compiled inline > functions. The mere fact that we try to support different targets doesn't > mean that we should stop optimising for specific targets. The same is true > for different versions of CPython, where we often use better optimisations > in newer releases, without sacrificing backwards compatibility. > > Personally, I think that supporting PyPy at the Python level is a lot more > interesting, although it may be easier to get it working at the cpyext > level. > Yeah it's certainly interesting. It might be hard to support things like cython.parallel and efficient buffer access though. I think releasing the GIL might not be very easy either, although perhaps that could be circumvented by factoring the entire nogil block out into a C function which you call with ctypes. >> or possibly a subset like PEP 384? > > That's currently not very interesting since there are basically no C > extensions around (generated or hand written) that restrict themselves to > that API. Supporting it in Cython would mean that we have to rewrite huge > parts of the generated C code. It's not even clear to me yet that we *can* > implement all of Cython's features based on PEP 384. For example, fast > indexing into lists and tuples is basically a no-no in the restricted C-API. > There are tons of rather unexpected restrictions like this. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From robertwb at math.washington.edu Tue Oct 11 08:11:06 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Mon, 10 Oct 2011 23:11:06 -0700 Subject: [Cython] PyCon-DE wrap-up by Kay Hayen In-Reply-To: <4E92AF0B.6070905@behnel.de> References: <4E91DB64.9050201@behnel.de> <4E92AF0B.6070905@behnel.de> Message-ID: Thanks for the update and link. Sounds like PyCon-DE went well. On Mon, Oct 10, 2011 at 1:38 AM, Stefan Behnel wrote: > mark florisson, 09.10.2011 19:57: >> >> On 9 October 2011 18:35, Stefan Behnel wrote: >>> >>> One of the impressions I took out of the technical discussions with Kay >>> is >>> that there isn't really a good reason why Cython should refuse to >>> duplicate >>> some of the inner mechanics of CPython for optimisation purposes. Nuitka >>> appears to be somewhat more aggressive here, partly because Kay doesn't >>> currently care all that much about portability (e.g. to Python 3). >> >> Interesting. What kind of (significant) optimizations could be made by >> duplicating code? Do you want to duplicate entire functions or do you >> want to inline parts of those? > > I was mainly referring to things like direct access to type/object struct > fields and little things like that. They can make a difference especially in > loops, compared to calling into a generic C-API function. For example, we > could have our own interned implementation of PyDict_Next(). I'm not very > impressed by the performance of that C-API function - repeated calls to > GetItem can be faster than looping over a dict with PyDict_Next()! > > That being said, I wasn't referring to any specific changes. It was more of > a general remark about the invisible line that we currently draw in Cython. CPython, especially the internals, is a slow enough moving target that I'm not too concerned about reaching into the internals if there is a clear benefit. If we're flexible enough to support 2.x and 3.x, I think we can handle 3.(x+1) when it comes. >> I actually think we should not get too tied to CPython, e.g. what if >> PyPy gets a CPython compatible API > > It already implements a part of the C-API: > > http://morepypy.blogspot.com/2010/04/using-cpython-extension-modules-with.html > > However, if we really want to support it at that level, there's likely more > to do than just removing low-level optimisations. And that would take the > normal route that we always use: macros and conditionally compiled inline > functions. The mere fact that we try to support different targets doesn't > mean that we should stop optimising for specific targets. +1 > The same is true > for different versions of CPython, where we often use better optimisations > in newer releases, without sacrificing backwards compatibility. > > Personally, I think that supporting PyPy at the Python level is a lot more > interesting, although it may be easier to get it working at the cpyext > level. > > >> or possibly a subset like PEP 384? > > That's currently not very interesting since there are basically no C > extensions around (generated or hand written) that restrict themselves to > that API. Supporting it in Cython would mean that we have to rewrite huge > parts of the generated C code. It's not even clear to me yet that we *can* > implement all of Cython's features based on PEP 384. For example, fast > indexing into lists and tuples is basically a no-no in the restricted C-API. > There are tons of rather unexpected restrictions like this. I agree, PEP 384 is a nice idea, but it seems to be a rather lot of work for an unclear/small benefit (compared to other stuff we could be doing.) - Robert From robertwb at math.washington.edu Wed Oct 12 09:55:55 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 12 Oct 2011 00:55:55 -0700 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: <4E919A40.2090001@astro.uio.no> References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> Message-ID: On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn wrote: > On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote: >> >> On 10/09/2011 02:11 PM, mark florisson wrote: >>> >>> Hey, >>> >>> So far people have been enthusiastic about the cython.parallel features, >>> I think we should introduce some new features. Excellent. I think this is going to become a killer feature like buffer support. >>> I propose the following, >> >> Great!! >> >> I only have time for a very short feedback now, perhaps more will follow. >> >>> assume parallel has been imported from cython: >>> >>> with parallel.master(): >>> this is executed in the master thread in a parallel (non-prange) >>> section >>> >>> with parallel.single(): >>> same as master, except any thread may do the execution >>> >>> An optional keyword argument 'nowait' specifies whether there will be a >>> barrier at the end. The default is to wait. > > I like > > if parallel.is_master(): > ? ?... > explicit_barrier_somehow() # see below > > better as a Pythonization. One could easily support is_master to be used in > other contexts as well, simply by assigning a status flag in the master > block. +1, the if statement feels a lot more natural. > Using an if-test flows much better with Python I feel, but that naturally > lead to making the barrier explicit. But I like the barrier always being > explicit, rather than having it as a predicate on all the different > constructs like in OpenMP.... > > I'm less sure about single, since making it a function indicates one could > use it in other contexts and the whole thing becomes too magic (since it's > tied to the position of invocation). I'm tempted to suggest > > for _ in prange(1): > ? ?... > > as our syntax for single. The idea here is that you want a block of code executed once, presumably by the first thread that gets here? I think this could also be handled by a if statement, perhaps "if parallel.first()" or something like that. Is there anything special about this construct that couldn't simply be done by flushing/checking a variable? >>> with parallel.task(): >>> create a task to be executed by some thread in the team >>> once a thread takes up the task it shall only be executed by that >>> thread and no other thread (so the task will be tied to the thread) >>> >>> C variables will be firstprivate >>> Python objects will be shared >>> >>> parallel.taskwait() # wait on any direct descendent tasks to finish >> >> Regarding tasks, I think this is mapping OpenMP too close to Python. >> Closures are excellent for the notion of a task, so I think something >> based on the futures API would work better. I realize that makes the >> mapping to OpenMP and implementation a bit more difficult, but I think >> it is worth it in the long run. It's almost as if you're reading my thoughts. There are much more natural task APIs, e.g. futures or the way the Python threading/multiprocessing does things. >>> with parallel.critical(): >>> this section of code is mutually exclusive with other critical sections >>> optional keyword argument 'name' specifies a name for the critical >>> section, >>> which means all sections with that name will exclude each other, >>> but not >>> critical sections with different names >>> >>> Note: all threads that encounter the section will execute it, just >>> not at the same time > > Yes, this works well as a with-statement... > > ..except that it is slightly magic in that it binds to call position (unlike > anything in Python). I.e. this would be more "correct", or at least > Pythonic: > > with parallel.critical(__file__, __line__): > ? ?... This feels a lot like a lock, which of course fits well with the with statement. >>> with parallel.barrier(): >>> all threads wait until everyone has reached the barrier >>> either no one or everyone should encounter the barrier >>> shared variables are flushed > > I have problems with requiring a noop with block... > > I'd much rather write > > parallel.barrier() > > However, that ties a function call to the place of invocation, and suggests > that one could do > > if rand() > .5: > ? ?barrier() > else: > ? ?i += 3 > ? ?barrier() > > and have the same barrier in each case. Again, > > barrier(__file__, __line__) > > gets us purity at the cost of practicality. Another way is the pthreads > approach (although one may have to use pthread rather then OpenMP to get it, > unless there are named barriers?): > > barrier_a = parallel.barrier() > barrier_b = parallel.barrier() > with parallel: > ? ?barrier_a.wait() > ? ?if rand() > .5: > ? ? ? ?barrier_b.wait() > ? ?else: > ? ? ? ?i += 3 > ? ? ? ?barrier_b.wait() > > > I'm really not sure here. I agree, the barrier doesn't seem like it belongs in a context. For example, it's ambiguous whether the block is supposed to proceed or succeed the barrier. I like the named barrier idea, but if that's not feasible we could perhaps use control flow to disallow conditionally calling barriers (or that every path calls the barrier (an equal number of times?)). >>> Unfortunately, gcc again manages to horribly break master and single >>> constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll >>> first file a bug report. Other (better) compilers like Portland (and I'm >>> sure Intel) work fine. I suppose a warning in the documentation will >>> suffice there. One can emit conditional #error pragmas in this case, though of course it's better to produce code that works correctly on all compilers. >>> If we at some point implement vector/SIMD operations we could also try >>> out the Fortran openmp workshare construct. >> >> I'm starting to learn myself OpenCL as part of a course. It's very neat >> for some kinds of parallelism. What I'm saying is that at least of the >> case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking >> too early, but also look forward to coming architectures (e.g., AMD's >> GPU-and-CPU on same die design). +1. I like the idea of providing more parallelism constructs, but rather than risk fixating on OpenMP's model, perhaps we should look at the problem we're trying to solve (e.g., what can't one do well now) and create (or more likely borrow) the right Pythonic API to do it. - Robert From robertwb at math.washington.edu Wed Oct 12 10:00:00 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 12 Oct 2011 01:00:00 -0700 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> Message-ID: On Mon, Oct 10, 2011 at 1:12 AM, mark florisson wrote: > On 9 October 2011 22:27, mark florisson wrote: >> >> On 9 October 2011 21:48, Jon Olav Vik wrote: >> > On Sun, Oct 9, 2011 at 9:01 PM, mark florisson >> > wrote: >> >> On 9 October 2011 19:54, Jon Olav Vik wrote: >> >>> Personally, I think I'd prefer context managers as a very >> >>> readable way to deal with parallelism >> >> >> >> Yeah it makes a lot of sense for mutual exclusion, but 'master' really >> >> means "only the master thread executes this peace of code, even though >> >> other threads encounter the same code", which is more akin to 'if' >> >> than 'with'. >> > >> > I see your point. However, another similarity with "with" statements >> > as an encapsulated "try..finally" is when there's a barrier at the end >> > of the block. I can live with some magic if it saves me from having a >> > boilerplate line of "barrier" everywhere 8-) >> > _______________________________________________ >> > cython-devel mailing list >> > cython-devel at python.org >> > http://mail.python.org/mailman/listinfo/cython-devel >> > >> >> Hm, indeed. I just noticed that unlike single constructs, master >> constructs don't have barriers. Both are also not allowed to be >> closely nested in worksharing constructs. I think the single directive >> is more useful with respect to tasks, e.g. have a single thread >> generate tasks and have other threads waiting at the barrier execute >> them. In that sense I suppose 'if parallel.is_master():' makes sense >> (no barrier, master thread) and 'with single():' (with barrier, any >> thread). >> >> We could still support single in prange though, if we simply have the >> master thread execute it ('if (omp_get_thread_num() == 0)') and put a >> barrier after the block. This makes me wonder what the point of master >> was supposed to be... > > Scratch that last part about master/single in parallel sections, it > doesn't make sense. It only makes sense if you think of those sections > as tasks you submit that would be immediately taken up by a (certain) > thread. But that's not quite what it means. I do like 'if is_master()' > and 'with single', though. > > Another thing we could support is arbitrary reductions. In OpenMP 3.1 > you get reduction operators 'and', 'max' and 'min', but it wouldn't be > hard to support arbitrary user functions. e.g. > > @cython.reduction > cdef int func(int a, int b): > ? ?... > > for i in prange(...): > ? ?a = func(a, b) Interesting idea. An alternative syntax could be a = cython.parallel.reduce(func, a, b) > I'm not sure how common this is though. You probably have your > reduction data in an array so you're already using numpy so you'll > likely already have your functionality. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From d.s.seljebotn at astro.uio.no Wed Oct 12 10:36:16 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 12 Oct 2011 10:36:16 +0200 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> Message-ID: <4E955180.1070601@astro.uio.no> On 10/12/2011 09:55 AM, Robert Bradshaw wrote: > On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn > wrote: >> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote: >>> >>> On 10/09/2011 02:11 PM, mark florisson wrote: >>>> >>>> Hey, >>>> >>>> So far people have been enthusiastic about the cython.parallel features, >>>> I think we should introduce some new features. > > Excellent. I think this is going to become a killer feature like > buffer support. > >>>> I propose the following, >>> >>> Great!! >>> >>> I only have time for a very short feedback now, perhaps more will follow. >>> >>>> assume parallel has been imported from cython: >>>> >>>> with parallel.master(): >>>> this is executed in the master thread in a parallel (non-prange) >>>> section >>>> >>>> with parallel.single(): >>>> same as master, except any thread may do the execution >>>> >>>> An optional keyword argument 'nowait' specifies whether there will be a >>>> barrier at the end. The default is to wait. >> >> I like >> >> if parallel.is_master(): >> ... >> explicit_barrier_somehow() # see below >> >> better as a Pythonization. One could easily support is_master to be used in >> other contexts as well, simply by assigning a status flag in the master >> block. > > +1, the if statement feels a lot more natural. > >> Using an if-test flows much better with Python I feel, but that naturally >> lead to making the barrier explicit. But I like the barrier always being >> explicit, rather than having it as a predicate on all the different >> constructs like in OpenMP.... >> >> I'm less sure about single, since making it a function indicates one could >> use it in other contexts and the whole thing becomes too magic (since it's >> tied to the position of invocation). I'm tempted to suggest >> >> for _ in prange(1): >> ... >> >> as our syntax for single. Just to be clear: My point was that the above implements single behaviour even now, without any extra effort. > > The idea here is that you want a block of code executed once, > presumably by the first thread that gets here? I think this could also > be handled by a if statement, perhaps "if parallel.first()" or > something like that. Is there anything special about this construct > that couldn't simply be done by flushing/checking a variable? Good point. I think there's a problem with OpenMP that it has too many primitives for similar things. I'm -1 on single -- either using a for loop or flag+flush is more to type, but more readable to people who don't know cython.parallel (look: Python even makes "self." explicit -- the bias in language design is clearly on readability rather than writability). I thought of "if is_first()" as well, but my problem is again that it binds to the location of the call. if foo: if parallel.is_first(): ... else: if parallel.is_first(): ... can not be refactored to: if parallel.is_first(): if foo: ... else: ... which I think is highly confusing for people who didn't write the code and don't know the details of cython.parallel. (Unlike is_master(), which works the same either way). I think we should aim for something that's as easy to read as possible for Python users with no cython.parallel knowledge. > >>>> with parallel.task(): >>>> create a task to be executed by some thread in the team >>>> once a thread takes up the task it shall only be executed by that >>>> thread and no other thread (so the task will be tied to the thread) >>>> >>>> C variables will be firstprivate >>>> Python objects will be shared >>>> >>>> parallel.taskwait() # wait on any direct descendent tasks to finish >>> >>> Regarding tasks, I think this is mapping OpenMP too close to Python. >>> Closures are excellent for the notion of a task, so I think something >>> based on the futures API would work better. I realize that makes the >>> mapping to OpenMP and implementation a bit more difficult, but I think >>> it is worth it in the long run. > > It's almost as if you're reading my thoughts. There are much more > natural task APIs, e.g. futures or the way the Python > threading/multiprocessing does things. > >>>> with parallel.critical(): >>>> this section of code is mutually exclusive with other critical sections >>>> optional keyword argument 'name' specifies a name for the critical >>>> section, >>>> which means all sections with that name will exclude each other, >>>> but not >>>> critical sections with different names >>>> >>>> Note: all threads that encounter the section will execute it, just >>>> not at the same time >> >> Yes, this works well as a with-statement... >> >> ..except that it is slightly magic in that it binds to call position (unlike >> anything in Python). I.e. this would be more "correct", or at least >> Pythonic: >> >> with parallel.critical(__file__, __line__): >> ... Mark: I stand corrected on this point. +1 on your critical proposal. > This feels a lot like a lock, which of course fits well with the with > statement. > >>>> with parallel.barrier(): >>>> all threads wait until everyone has reached the barrier >>>> either no one or everyone should encounter the barrier >>>> shared variables are flushed >> >> I have problems with requiring a noop with block... >> >> I'd much rather write >> >> parallel.barrier() >> >> However, that ties a function call to the place of invocation, and suggests >> that one could do >> >> if rand()> .5: >> barrier() >> else: >> i += 3 >> barrier() >> >> and have the same barrier in each case. Again, >> >> barrier(__file__, __line__) >> >> gets us purity at the cost of practicality. Another way is the pthreads >> approach (although one may have to use pthread rather then OpenMP to get it, >> unless there are named barriers?): >> >> barrier_a = parallel.barrier() >> barrier_b = parallel.barrier() >> with parallel: >> barrier_a.wait() >> if rand()> .5: >> barrier_b.wait() >> else: >> i += 3 >> barrier_b.wait() >> >> >> I'm really not sure here. > > I agree, the barrier doesn't seem like it belongs in a context. For > example, it's ambiguous whether the block is supposed to proceed or > succeed the barrier. I like the named barrier idea, but if that's not > feasible we could perhaps use control flow to disallow conditionally > calling barriers (or that every path calls the barrier (an equal > number of times?)). It is always an option to go beyond OpenMP. Pthread barriers are a lot more powerful in this way, and with pthread and Windows covered I think we should be good... IIUC, you can't have different path calling the barrier the same number of times, it's merely #pragma omp barrier and a seperate barrier statement gets another counter. Which is why I think it is not powerful enough and we should use pthreads. > +1. I like the idea of providing more parallelism constructs, but > rather than risk fixating on OpenMP's model, perhaps we should look at > the problem we're trying to solve (e.g., what can't one do well now) > and create (or more likely borrow) the right Pythonic API to do it. Also, quick and flexible message-passing between threads/processes through channels is becoming an increasingly popular concept. Go even has a seperate syntax for channel communication, and zeromq is becoming popular for distributed work. The is a problem Cython may need to solve here, since one currently has to use very low-level C to do it quickly (either zeromq or pthreads in most cases -- I guess, an OpenMP critical section would help in implementing a queue though). I wouldn't resist a builtin "channel" type in Cython (since we don't have full templating/generics, it would be the only way of sending typed data conveniently?). I ultimately feel things like that is more important than 100% coverage of the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit. Dag Sverre From d.s.seljebotn at astro.uio.no Wed Oct 12 10:49:09 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 12 Oct 2011 10:49:09 +0200 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: <4E955180.1070601@astro.uio.no> References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> Message-ID: <4E955485.7060809@astro.uio.no> On 10/12/2011 10:36 AM, Dag Sverre Seljebotn wrote: > On 10/12/2011 09:55 AM, Robert Bradshaw wrote: >> On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn >> wrote: >>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote: >>>> >>>> On 10/09/2011 02:11 PM, mark florisson wrote: >>>>> with parallel.critical(): >>>>> this section of code is mutually exclusive with other critical >>>>> sections >>>>> optional keyword argument 'name' specifies a name for the critical >>>>> section, >>>>> which means all sections with that name will exclude each other, >>>>> but not >>>>> critical sections with different names >>>>> >>>>> Note: all threads that encounter the section will execute it, just >>>>> not at the same time >>> On critical sections, I do feel string naming is rather un-Pythonic. I'd rather have lock_a = parallel.Mutex() lock_b = parallel.Mutex() with cython.parallel: with lock_a: ... with lock_b: ... This maps well to pthread mutexes, though much harder to map it to OpenMP... So my proposal is: a) parallel.Mutex() can take a string argument and then returns the same mutex each time for the same string, meaning you can do with parallel.Mutex("somename"): which maps directly to OpenMP. b) However, this does not make sense: with parallel.Mutex(): because each thread would instantiate a *seperate* mutex. So raise compiler error ("Redundant code, thread will never block on fresh mutex") c) However, one can use a default global Mutex instance: with parallel.global_mutex (mapping to an un-named critical in OpenMP) This seems to be simple enough to implement, and allows generalizing to the advanced case above later (probably using pthreads/Windows directly). Dag Sverre From robertwb at math.washington.edu Wed Oct 12 11:08:42 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 12 Oct 2011 02:08:42 -0700 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: <4E955180.1070601@astro.uio.no> References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> Message-ID: On Wed, Oct 12, 2011 at 1:36 AM, Dag Sverre Seljebotn wrote: > On 10/12/2011 09:55 AM, Robert Bradshaw wrote: >>> I'm less sure about single, since making it a function indicates one >>> could >>> use it in other contexts and the whole thing becomes too magic (since >>> it's >>> tied to the position of invocation). I'm tempted to suggest >>> >>> for _ in prange(1): >>> ? ?... >>> >>> as our syntax for single. > > Just to be clear: My point was that the above implements single behaviour > even now, without any extra effort. > >> >> The idea here is that you want a block of code executed once, >> presumably by the first thread that gets here? I think this could also >> be handled by a if statement, perhaps "if parallel.first()" or >> something like that. Is there anything special about this construct >> that couldn't simply be done by flushing/checking a variable? > > Good point. I think there's a problem with OpenMP that it has too many > primitives for similar things. > > I'm -1 on single -- either using a for loop or flag+flush is more to type, > but more readable to people who don't know cython.parallel (look: Python > even makes "self." explicit -- the bias in language design is clearly on > readability rather than writability). > > I thought of "if is_first()" as well, but my problem is again that it binds > to the location of the call. > > if foo: > ? ?if parallel.is_first(): > ? ? ? ?... > else: > ? ?if parallel.is_first(): > ? ? ? ?... > > can not be refactored to: > > if parallel.is_first(): > ? ?if foo: > ? ? ? ?... > ? ?else: > ? ? ? ?... > > which I think is highly confusing for people who didn't write the code and > don't know the details of cython.parallel. (Unlike is_master(), which works > the same either way). > > I think we should aim for something that's as easy to read as possible for > Python users with no cython.parallel knowledge. Exactly. This is what's so beautiful about prange. >>>>> with parallel.barrier(): >>>>> all threads wait until everyone has reached the barrier >>>>> either no one or everyone should encounter the barrier >>>>> shared variables are flushed >>> >>> I have problems with requiring a noop with block... >>> >>> I'd much rather write >>> >>> parallel.barrier() >>> >>> However, that ties a function call to the place of invocation, and >>> suggests >>> that one could do >>> >>> if rand()> ?.5: >>> ? ?barrier() >>> else: >>> ? ?i += 3 >>> ? ?barrier() >>> >>> and have the same barrier in each case. Again, >>> >>> barrier(__file__, __line__) >>> >>> gets us purity at the cost of practicality. Another way is the pthreads >>> approach (although one may have to use pthread rather then OpenMP to get >>> it, >>> unless there are named barriers?): >>> >>> barrier_a = parallel.barrier() >>> barrier_b = parallel.barrier() >>> with parallel: >>> ? ?barrier_a.wait() >>> ? ?if rand()> ?.5: >>> ? ? ? ?barrier_b.wait() >>> ? ?else: >>> ? ? ? ?i += 3 >>> ? ? ? ?barrier_b.wait() >>> >>> >>> I'm really not sure here. >> >> I agree, the barrier doesn't seem like it belongs in a context. For >> example, it's ambiguous whether the block is supposed to proceed or >> succeed the barrier. I like the named barrier idea, but if that's not >> feasible we could perhaps use control flow to disallow conditionally >> calling barriers (or that every path calls the barrier (an equal >> number of times?)). > > It is always an option to go beyond OpenMP. Pthread barriers are a lot more > powerful in this way, and with pthread and Windows covered I think we should > be good... > > IIUC, you can't have different path calling the barrier the same number of > times, it's merely > > #pragma omp barrier > > and a seperate barrier statement gets another counter. Makes sense, but this greatly restricts where we could use the OpenMP version. > Which is why I think > it is not powerful enough and we should use pthreads. > >> +1. I like the idea of providing more parallelism constructs, but >> rather than risk fixating on OpenMP's model, perhaps we should look at >> the problem we're trying to solve (e.g., what can't one do well now) >> and create (or more likely borrow) the right Pythonic API to do it. > > Also, quick and flexible message-passing between threads/processes through > channels is becoming an increasingly popular concept. Go even has a seperate > syntax for channel communication, and zeromq is becoming popular for > distributed work. > > The is a problem Cython may need to solve here, since one currently has to > use very low-level C to do it quickly (either zeromq or pthreads in most > cases -- I guess, an OpenMP critical section would help in implementing a > queue though). > > I wouldn't resist a builtin "channel" type in Cython (since we don't have > full templating/generics, it would be the only way of sending typed data > conveniently?). zeromq seems to be a nice level of abstraction--we could probably get far with a zeromq "overlay" module that didn't require the GIL. Or is the C API easy enough to use if we could provide convenient mechanisms to initialize the tasks/threads. I think perhaps the communication model could be solved by a library more easily than the treading model. > I ultimately feel things like that is more important than 100% coverage of > the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit. +1 Prange handles the (corse-grained) SIMD case nicely, and a task/futures model based on closures would I think flesh this out to the next level of generality (and complexity). - Robert From robertwb at math.washington.edu Wed Oct 12 11:20:11 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 12 Oct 2011 02:20:11 -0700 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: <4E955485.7060809@astro.uio.no> References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> <4E955485.7060809@astro.uio.no> Message-ID: On Wed, Oct 12, 2011 at 1:49 AM, Dag Sverre Seljebotn wrote: > On 10/12/2011 10:36 AM, Dag Sverre Seljebotn wrote: >> >> On 10/12/2011 09:55 AM, Robert Bradshaw wrote: >>> >>> On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn >>> wrote: >>>> >>>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote: >>>>> >>>>> On 10/09/2011 02:11 PM, mark florisson wrote: >>>>>> >>>>>> with parallel.critical(): >>>>>> this section of code is mutually exclusive with other critical >>>>>> sections >>>>>> optional keyword argument 'name' specifies a name for the critical >>>>>> section, >>>>>> which means all sections with that name will exclude each other, >>>>>> but not >>>>>> critical sections with different names >>>>>> >>>>>> Note: all threads that encounter the section will execute it, just >>>>>> not at the same time >>>> > > On critical sections, I do feel string naming is rather un-Pythonic. I'd > rather have > > lock_a = parallel.Mutex() > lock_b = parallel.Mutex() > with cython.parallel: > ? ?with lock_a: > ? ? ? ?... > ? ?with lock_b: > ? ? ? ?... > > This maps well to pthread mutexes, though much harder to map it to OpenMP... For this low level, perhaps people should just be using the pthreads library directly? Here I'm showing my ignorance: can that work with OpenMP spawned threads? (Maybe a compatibility layer is required for transparent Windows support.) Suppose one could write a context object that did not require the GIL, then one could do with MyContext(): ... in a nogil block, MyContext could be implemented by whoever on whatever thread library, no special language support required. > So my proposal is: > > ?a) parallel.Mutex() can take a string argument and then returns the same > mutex each time for the same string, meaning you can do > > with parallel.Mutex("somename"): > > which maps directly to OpenMP. > > ?b) However, this does not make sense: > > with parallel.Mutex(): > > because each thread would instantiate a *seperate* mutex. So raise compiler > error ("Redundant code, thread will never block on fresh mutex") > > ?c) However, one can use a default global Mutex instance: > > with parallel.global_mutex > > (mapping to an un-named critical in OpenMP) > > This seems to be simple enough to implement, and allows generalizing to the > advanced case above later (probably using pthreads/Windows directly). Alternatively, let parallel.Mutex() be the global mutex, with some other way of getting a new, unique mutex to pass around and use in multiple places. - Robert From d.s.seljebotn at astro.uio.no Wed Oct 12 11:24:45 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 12 Oct 2011 11:24:45 +0200 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> Message-ID: <4E955CDD.8060203@astro.uio.no> On 10/12/2011 11:08 AM, Robert Bradshaw wrote: > On Wed, Oct 12, 2011 at 1:36 AM, Dag Sverre Seljebotn >> I wouldn't resist a builtin "channel" type in Cython (since we don't have >> full templating/generics, it would be the only way of sending typed data >> conveniently?). > > zeromq seems to be a nice level of abstraction--we could probably get > far with a zeromq "overlay" module that didn't require the GIL. Or is > the C API easy enough to use if we could provide convenient mechanisms > to initialize the tasks/threads. I think perhaps the communication > model could be solved by a library more easily than the treading > model. Ah, zeromq even has an in-process transport, so should work nicely for multithreading as well. The main problem is that I'd like something like ctypedef struct Msg: int what double when cdef Msg msg cdef channel[Msg] mychan = channel[msg](blocking=True, in_process=True) with cython.parallel: ... if is_master(): mychan.send(what=1, when=2.3) else: msg = mychan.recv() Which one can't really do without either builtin support or templating support. One *could* implement it in C++... C-level API just sends char* around, e.g., int zmq_msg_init_data (zmq_msg_t *msg, void *data, size_t size, zmq_free_fn *ffn, void *hint); Dag Sverre From stefan_ml at behnel.de Wed Oct 12 14:03:14 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 12 Oct 2011 14:03:14 +0200 Subject: [Cython] Utilities, cython.h, libcython In-Reply-To: References: Message-ID: <4E958202.9000806@behnel.de> mark florisson, 06.10.2011 11:45: > On 6 October 2011 01:05, Robert Bradshaw wrote: >> I'm not sure what the overhead is, if any, in calling function pointers vs. >> actually linking things together at the C level (which is essentially the >> same idea, but perhaps addresses are resolved at library load time rather >> than requiring a dereference on each call?) > > I think there isn't any difference with dynamic linking and having a > pointer. My understanding (of ELF shared libraries) is that the > procedure lookup table will contain the actual address of the symbol > (likely after the first reference to it has been made, it may have a > stub that resolves the symbol and replaces it's own address with the > actual address), which to me sounds like the same thing as a pointer. > I think only static linking can prevent this, i.e. directly encode the > static address into the call opcode, but I'm not an expert. Even if it makes a slight difference that the CPU's branch prediction cannot cope with, it's still up to us to decide which code must be inside the module for performance reasons and which we can afford to move outside. Generally speaking, any code section that is large enough to be worth being moved into a separate library shouldn't notice any performance difference through an indirect call. Stefan From markflorisson88 at gmail.com Wed Oct 12 16:00:13 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 12 Oct 2011 15:00:13 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> <4E955485.7060809@astro.uio.no> Message-ID: On 12 October 2011 10:20, Robert Bradshaw wrote: > On Wed, Oct 12, 2011 at 1:49 AM, Dag Sverre Seljebotn > wrote: >> On 10/12/2011 10:36 AM, Dag Sverre Seljebotn wrote: >>> >>> On 10/12/2011 09:55 AM, Robert Bradshaw wrote: >>>> >>>> On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn >>>> wrote: >>>>> >>>>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote: >>>>>> >>>>>> On 10/09/2011 02:11 PM, mark florisson wrote: >>>>>>> >>>>>>> with parallel.critical(): >>>>>>> this section of code is mutually exclusive with other critical >>>>>>> sections >>>>>>> optional keyword argument 'name' specifies a name for the critical >>>>>>> section, >>>>>>> which means all sections with that name will exclude each other, >>>>>>> but not >>>>>>> critical sections with different names >>>>>>> >>>>>>> Note: all threads that encounter the section will execute it, just >>>>>>> not at the same time >>>>> >> >> On critical sections, I do feel string naming is rather un-Pythonic. I'd >> rather have >> >> lock_a = parallel.Mutex() >> lock_b = parallel.Mutex() >> with cython.parallel: >> ? ?with lock_a: >> ? ? ? ?... >> ? ?with lock_b: >> ? ? ? ?... >> >> This maps well to pthread mutexes, though much harder to map it to OpenMP... > > For this low level, perhaps people should just be using the pthreads > library directly? Here I'm showing my ignorance: can that work with > OpenMP spawned threads? (Maybe a compatibility layer is required for > transparent Windows support.) Suppose one could write a context object > that did not require the GIL, then one could do > > with MyContext(): > ? ... > > in a nogil block, MyContext could be implemented by whoever on > whatever thread library, no special language support required. Exactly, that's always possible. I myself very much like how critical works, but if you want a more Pythonic-looking mutex, it might be better to make that the user's burden. Otherwise we'd also have to give it a type, make it compatible with code that doesn't have the GIL, acquisition count it when passing it around, etc. If your program doesn't even have other Python threads running, you could even use 'with gil:' as a global synchronization. The only good thing about named and unnamed critical sections is really the convenience of writing it, and the resulting conciseness (which imho, if you know how critical works, only adds to the code readability). However, not providing parallel.Mutex would mean people probably want to resort to the goodies from the threading module, which would ironically not be impossible because you'd need to GIL to use them :) But we could recommend the PyThread_*_lock stuff in the documentation. >> So my proposal is: >> >> ?a) parallel.Mutex() can take a string argument and then returns the same >> mutex each time for the same string, meaning you can do >> >> with parallel.Mutex("somename"): >> >> which maps directly to OpenMP. >> >> ?b) However, this does not make sense: >> >> with parallel.Mutex(): >> >> because each thread would instantiate a *seperate* mutex. So raise compiler >> error ("Redundant code, thread will never block on fresh mutex") >> >> ?c) However, one can use a default global Mutex instance: >> >> with parallel.global_mutex >> >> (mapping to an un-named critical in OpenMP) >> >> This seems to be simple enough to implement, and allows generalizing to the >> advanced case above later (probably using pthreads/Windows directly). > > Alternatively, let parallel.Mutex() be the global mutex, with some > other way of getting a new, unique mutex to pass around and use in > multiple places. > > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Wed Oct 12 16:07:21 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 12 Oct 2011 15:07:21 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: <4E955180.1070601@astro.uio.no> References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> Message-ID: On 12 October 2011 09:36, Dag Sverre Seljebotn wrote: > On 10/12/2011 09:55 AM, Robert Bradshaw wrote: >> >> On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote: >>>> >>>> On 10/09/2011 02:11 PM, mark florisson wrote: >>>>> >>>>> Hey, >>>>> >>>>> So far people have been enthusiastic about the cython.parallel >>>>> features, >>>>> I think we should introduce some new features. >> >> Excellent. I think this is going to become a killer feature like >> buffer support. >> >>>>> I propose the following, >>>> >>>> Great!! >>>> >>>> I only have time for a very short feedback now, perhaps more will >>>> follow. >>>> >>>>> assume parallel has been imported from cython: >>>>> >>>>> with parallel.master(): >>>>> this is executed in the master thread in a parallel (non-prange) >>>>> section >>>>> >>>>> with parallel.single(): >>>>> same as master, except any thread may do the execution >>>>> >>>>> An optional keyword argument 'nowait' specifies whether there will be a >>>>> barrier at the end. The default is to wait. >>> >>> I like >>> >>> if parallel.is_master(): >>> ? ?... >>> explicit_barrier_somehow() # see below >>> >>> better as a Pythonization. One could easily support is_master to be used >>> in >>> other contexts as well, simply by assigning a status flag in the master >>> block. >> >> +1, the if statement feels a lot more natural. >> >>> Using an if-test flows much better with Python I feel, but that naturally >>> lead to making the barrier explicit. But I like the barrier always being >>> explicit, rather than having it as a predicate on all the different >>> constructs like in OpenMP.... >>> >>> I'm less sure about single, since making it a function indicates one >>> could >>> use it in other contexts and the whole thing becomes too magic (since >>> it's >>> tied to the position of invocation). I'm tempted to suggest >>> >>> for _ in prange(1): >>> ? ?... >>> >>> as our syntax for single. > > Just to be clear: My point was that the above implements single behaviour > even now, without any extra effort. Right I got that. In the same way you could use for _ in prange(0): pass to get a barrier. I'm just saying that it looks pretty weird. >> >> The idea here is that you want a block of code executed once, >> presumably by the first thread that gets here? I think this could also >> be handled by a if statement, perhaps "if parallel.first()" or >> something like that. Is there anything special about this construct >> that couldn't simply be done by flushing/checking a variable? > > Good point. I think there's a problem with OpenMP that it has too many > primitives for similar things. Definitely. > I'm -1 on single -- either using a for loop or flag+flush is more to type, > but more readable to people who don't know cython.parallel (look: Python > even makes "self." explicit -- the bias in language design is clearly on > readability rather than writability). > > I thought of "if is_first()" as well, but my problem is again that it binds > to the location of the call. > > if foo: > ? ?if parallel.is_first(): > ? ? ? ?... > else: > ? ?if parallel.is_first(): > ? ? ? ?... > > can not be refactored to: > > if parallel.is_first(): > ? ?if foo: > ? ? ? ?... > ? ?else: > ? ? ? ?... > > which I think is highly confusing for people who didn't write the code and > don't know the details of cython.parallel. (Unlike is_master(), which works > the same either way). > > I think we should aim for something that's as easy to read as possible for > Python users with no cython.parallel knowledge. That's a good point. I suppose single and master is not really needed, so just master ("is_master") could be sufficient there. >> >>>>> with parallel.task(): >>>>> create a task to be executed by some thread in the team >>>>> once a thread takes up the task it shall only be executed by that >>>>> thread and no other thread (so the task will be tied to the thread) >>>>> >>>>> C variables will be firstprivate >>>>> Python objects will be shared >>>>> >>>>> parallel.taskwait() # wait on any direct descendent tasks to finish >>>> >>>> Regarding tasks, I think this is mapping OpenMP too close to Python. >>>> Closures are excellent for the notion of a task, so I think something >>>> based on the futures API would work better. I realize that makes the >>>> mapping to OpenMP and implementation a bit more difficult, but I think >>>> it is worth it in the long run. >> >> It's almost as if you're reading my thoughts. There are much more >> natural task APIs, e.g. futures or the way the Python >> threading/multiprocessing does things. >> >>>>> with parallel.critical(): >>>>> this section of code is mutually exclusive with other critical sections >>>>> optional keyword argument 'name' specifies a name for the critical >>>>> section, >>>>> which means all sections with that name will exclude each other, >>>>> but not >>>>> critical sections with different names >>>>> >>>>> Note: all threads that encounter the section will execute it, just >>>>> not at the same time >>> >>> Yes, this works well as a with-statement... >>> >>> ..except that it is slightly magic in that it binds to call position >>> (unlike >>> anything in Python). I.e. this would be more "correct", or at least >>> Pythonic: >>> >>> with parallel.critical(__file__, __line__): >>> ? ?... > > Mark: I stand corrected on this point. +1 on your critical proposal. > >> This feels a lot like a lock, which of course fits well with the with >> statement. >> >>>>> with parallel.barrier(): >>>>> all threads wait until everyone has reached the barrier >>>>> either no one or everyone should encounter the barrier >>>>> shared variables are flushed >>> >>> I have problems with requiring a noop with block... >>> >>> I'd much rather write >>> >>> parallel.barrier() >>> >>> However, that ties a function call to the place of invocation, and >>> suggests >>> that one could do >>> >>> if rand()> ?.5: >>> ? ?barrier() >>> else: >>> ? ?i += 3 >>> ? ?barrier() >>> >>> and have the same barrier in each case. Again, >>> >>> barrier(__file__, __line__) >>> >>> gets us purity at the cost of practicality. Another way is the pthreads >>> approach (although one may have to use pthread rather then OpenMP to get >>> it, >>> unless there are named barriers?): >>> >>> barrier_a = parallel.barrier() >>> barrier_b = parallel.barrier() >>> with parallel: >>> ? ?barrier_a.wait() >>> ? ?if rand()> ?.5: >>> ? ? ? ?barrier_b.wait() >>> ? ?else: >>> ? ? ? ?i += 3 >>> ? ? ? ?barrier_b.wait() >>> >>> >>> I'm really not sure here. >> >> I agree, the barrier doesn't seem like it belongs in a context. For >> example, it's ambiguous whether the block is supposed to proceed or >> succeed the barrier. I like the named barrier idea, but if that's not >> feasible we could perhaps use control flow to disallow conditionally >> calling barriers (or that every path calls the barrier (an equal >> number of times?)). > > It is always an option to go beyond OpenMP. Pthread barriers are a lot more > powerful in this way, and with pthread and Windows covered I think we should > be good... > > IIUC, you can't have different path calling the barrier the same number of > times, it's merely > > #pragma omp barrier > > and a seperate barrier statement gets another counter. Which is why I think > it is not powerful enough and we should use pthreads. I don't think we should quite jump to that conclusion. Indeed openmp barriers may not do what we want, but I think you could implement barriers yourself (I haven't looked at an implementation, but I think a condition lock + OpenMP flush can do what you need). Implementing all this in pthreads wouldn't be trivial and it would also be hard to do portably for non-Posix systems, considering that most Cython developers don't know much about/care a lot about windows for instance. >> +1. I like the idea of providing more parallelism constructs, but >> rather than risk fixating on OpenMP's model, perhaps we should look at >> the problem we're trying to solve (e.g., what can't one do well now) >> and create (or more likely borrow) the right Pythonic API to do it. > > Also, quick and flexible message-passing between threads/processes through > channels is becoming an increasingly popular concept. Go even has a seperate > syntax for channel communication, and zeromq is becoming popular for > distributed work. > > The is a problem Cython may need to solve here, since one currently has to > use very low-level C to do it quickly (either zeromq or pthreads in most > cases -- I guess, an OpenMP critical section would help in implementing a > queue though). > > I wouldn't resist a builtin "channel" type in Cython (since we don't have > full templating/generics, it would be the only way of sending typed data > conveniently?). I'm not sure if we should introduce more syntax, but what about reusing arrays or memoryview slices? If you assign to elements or subslices you send messages, if you read them but don't have the data you get the messages (so the program which has the data will send it, etc). But really, I think this is a different beast all together. If you want to do this then you must be sure to cover all aspects, otherwise people will just use the respective libraries. I think if you really want this kind of thing on a cluster, you'd be using fortran anyway (maybe with co-arrays), and if you need to do distributed computing you'd be using zeromq directly. > I ultimately feel things like that is more important than 100% coverage of > the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit. Yeah I never wanted full OpenMP coverage, it's just the first (easiest) thing that comes to mind, it's easy to implement and if you're familiar with OpenMP, it makes sense. It would also be easier to support orphaned worksharing in the future, if we wanted. But I think that might just be even more confusing for people. > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Wed Oct 12 16:55:44 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 12 Oct 2011 15:55:44 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> Message-ID: On 12 October 2011 10:08, Robert Bradshaw wrote: > On Wed, Oct 12, 2011 at 1:36 AM, Dag Sverre Seljebotn > wrote: >> On 10/12/2011 09:55 AM, Robert Bradshaw wrote: >>>> I'm less sure about single, since making it a function indicates one >>>> could >>>> use it in other contexts and the whole thing becomes too magic (since >>>> it's >>>> tied to the position of invocation). I'm tempted to suggest >>>> >>>> for _ in prange(1): >>>> ? ?... >>>> >>>> as our syntax for single. >> >> Just to be clear: My point was that the above implements single behaviour >> even now, without any extra effort. >> >>> >>> The idea here is that you want a block of code executed once, >>> presumably by the first thread that gets here? I think this could also >>> be handled by a if statement, perhaps "if parallel.first()" or >>> something like that. Is there anything special about this construct >>> that couldn't simply be done by flushing/checking a variable? >> >> Good point. I think there's a problem with OpenMP that it has too many >> primitives for similar things. >> >> I'm -1 on single -- either using a for loop or flag+flush is more to type, >> but more readable to people who don't know cython.parallel (look: Python >> even makes "self." explicit -- the bias in language design is clearly on >> readability rather than writability). >> >> I thought of "if is_first()" as well, but my problem is again that it binds >> to the location of the call. >> >> if foo: >> ? ?if parallel.is_first(): >> ? ? ? ?... >> else: >> ? ?if parallel.is_first(): >> ? ? ? ?... >> >> can not be refactored to: >> >> if parallel.is_first(): >> ? ?if foo: >> ? ? ? ?... >> ? ?else: >> ? ? ? ?... >> >> which I think is highly confusing for people who didn't write the code and >> don't know the details of cython.parallel. (Unlike is_master(), which works >> the same either way). >> >> I think we should aim for something that's as easy to read as possible for >> Python users with no cython.parallel knowledge. > > Exactly. This is what's so beautiful about prange. > >>>>>> with parallel.barrier(): >>>>>> all threads wait until everyone has reached the barrier >>>>>> either no one or everyone should encounter the barrier >>>>>> shared variables are flushed >>>> >>>> I have problems with requiring a noop with block... >>>> >>>> I'd much rather write >>>> >>>> parallel.barrier() >>>> >>>> However, that ties a function call to the place of invocation, and >>>> suggests >>>> that one could do >>>> >>>> if rand()> ?.5: >>>> ? ?barrier() >>>> else: >>>> ? ?i += 3 >>>> ? ?barrier() >>>> >>>> and have the same barrier in each case. Again, >>>> >>>> barrier(__file__, __line__) >>>> >>>> gets us purity at the cost of practicality. Another way is the pthreads >>>> approach (although one may have to use pthread rather then OpenMP to get >>>> it, >>>> unless there are named barriers?): >>>> >>>> barrier_a = parallel.barrier() >>>> barrier_b = parallel.barrier() >>>> with parallel: >>>> ? ?barrier_a.wait() >>>> ? ?if rand()> ?.5: >>>> ? ? ? ?barrier_b.wait() >>>> ? ?else: >>>> ? ? ? ?i += 3 >>>> ? ? ? ?barrier_b.wait() >>>> >>>> >>>> I'm really not sure here. >>> >>> I agree, the barrier doesn't seem like it belongs in a context. For >>> example, it's ambiguous whether the block is supposed to proceed or >>> succeed the barrier. I like the named barrier idea, but if that's not >>> feasible we could perhaps use control flow to disallow conditionally >>> calling barriers (or that every path calls the barrier (an equal >>> number of times?)). >> >> It is always an option to go beyond OpenMP. Pthread barriers are a lot more >> powerful in this way, and with pthread and Windows covered I think we should >> be good... >> >> IIUC, you can't have different path calling the barrier the same number of >> times, it's merely >> >> #pragma omp barrier >> >> and a seperate barrier statement gets another counter. > > Makes sense, but this greatly restricts where we could use the OpenMP version. > >> Which is why I think >> it is not powerful enough and we should use pthreads. >> >>> +1. I like the idea of providing more parallelism constructs, but >>> rather than risk fixating on OpenMP's model, perhaps we should look at >>> the problem we're trying to solve (e.g., what can't one do well now) >>> and create (or more likely borrow) the right Pythonic API to do it. >> >> Also, quick and flexible message-passing between threads/processes through >> channels is becoming an increasingly popular concept. Go even has a seperate >> syntax for channel communication, and zeromq is becoming popular for >> distributed work. >> >> The is a problem Cython may need to solve here, since one currently has to >> use very low-level C to do it quickly (either zeromq or pthreads in most >> cases -- I guess, an OpenMP critical section would help in implementing a >> queue though). >> >> I wouldn't resist a builtin "channel" type in Cython (since we don't have >> full templating/generics, it would be the only way of sending typed data >> conveniently?). > > zeromq seems to be a nice level of abstraction--we could probably get > far with a zeromq "overlay" module that didn't require the GIL. Or is > the C API easy enough to use if we could provide convenient mechanisms > to initialize the tasks/threads. I think perhaps the communication > model could be solved by a library more easily than the treading > model. > >> I ultimately feel things like that is more important than 100% coverage of >> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit. > > +1 Prange handles the (corse-grained) SIMD case nicely, and a > task/futures model based on closures would I think flesh this out to > the next level of generality (and complexity). Futures are definitely nice. I suppose I think really like "inline futures", i.e. openmp tasks. I realize that futures may look more pythonic. However, as mentioned previously, I also see issues with that. When you submit a task then you expect a future object, which you might want to pass around. But we don't have the GIL for that. I personally feel that futures is something that should be done by a library (such as concurrent.futures in python 3.2), and inline tasks by a language. It also means I have to write an entire function or closure for perhaps only a few lines of code. I might also want to submit other functions that are not closures, or I might want to reuse my closures that are used for tasks and for something else. So what if my tasks contain more parallel constructs? e.g. what if I have a task closure that I return from my function that generates more tasks itself? Would you just execute them sequentially outside of the parallel construct, or would you simply disallow that? Also, do you restrict future "objects" to only the parallel section? Another problem is that you can only wait on tasks of your direct children. So what if I get access to my parent's future object (assuming you allow tasks to generate tasks), and then want the result of my parent? Or what if I store these future objects in an array or list and access them arbitrarily? You will only know at runtime which task to wait on, and openmp only has a static, lexical taskwait. I suppose my point is that without either a drastic rewrite (e.g., use pthreads instead of openmp) or quite a bit of contraints, I am unsure how futures would work here. Perhaps you guys have some concrete syntax and semantics proposals? > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From stefan_ml at behnel.de Wed Oct 12 21:52:07 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 12 Oct 2011 21:52:07 +0200 Subject: [Cython] PyCon-DE wrap-up by Kay Hayen In-Reply-To: References: <4E91DB64.9050201@behnel.de> <4E92AF0B.6070905@behnel.de> Message-ID: <4E95EFE7.1020500@behnel.de> Robert Bradshaw, 11.10.2011 08:11: > Thanks for the update and link. Sounds like PyCon-DE went well. More than that - here's my take on it: http://blog.behnel.de/index.php?p=188 Stefan From stefan_ml at behnel.de Thu Oct 13 07:10:06 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 13 Oct 2011 07:10:06 +0200 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: References: <4E930C72.8080303@behnel.de> Message-ID: <4E9672AE.6080905@behnel.de> mark florisson, 12.10.2011 23:46: >>>> On 10 October 2011 16:17, Stefan Behnel wrote: >>>>> Jenkins currently reports several failures, and this one seems to be >>>>> due to your tempita changes: >>>>> >>>> https://sage.math.washington.edu:8091/hudson/view/cython-devel/job/cython-devel-lxml-trunk/PYVERSION=py24/31/console >>>> >>>> Thanks! I'll try to fix that somewhere this week. We should really get to the habit of not pushing changes to the master branch that turn out to be broken in the personal branches, or, if they appear to be ok and only turn out to break the master branch *after* pushing them (which is ok, we have Jenkins to tell us), revert them if a fix cannot be applied shortly, i.e. within a day or two at most. It's very annoying when the master branch is broken for weeks in a row, especially since that means that it will keep attracting new failures due to the cover of already broken tests, which makes it much harder to pinpoint the commits that triggered them. >> Is it me or are other builds broken as well? >> >> I pushed a fix for the tempita thing, but it seems the entire py3k build is >> broken: >> >> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console It's not only the py3k tests, the build is broken in general. The problem here is that it only *shows* in the py3k tests because the Py2 builds do not bail out when one of the Cython modules fails to build. That needs fixing as well. > I just cannot reproduce that error on my system, let me investigate it > further. My guess was that it's due to the innocent looking change that Robert did to enable type inference for the GeneralCallNode. It seems that there was a bit more to do here. Stefan From stefan_ml at behnel.de Thu Oct 13 07:37:13 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 13 Oct 2011 07:37:13 +0200 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: <4E9672AE.6080905@behnel.de> References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> Message-ID: <4E967909.2040407@behnel.de> Stefan Behnel, 13.10.2011 07:10: > mark florisson, 12.10.2011 23:46: >>> Is it me or are other builds broken as well? >>> >>> I pushed a fix for the tempita thing, but it seems the entire py3k build is >>> broken: >>> >>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console >>> > > It's not only the py3k tests, the build is broken in general. The problem > here is that it only *shows* in the py3k tests because the Py2 builds do > not bail out when one of the Cython modules fails to build. That needs > fixing as well. > > >> I just cannot reproduce that error on my system, let me investigate it >> further. > > My guess was that it's due to the innocent looking change that Robert did > to enable type inference for the GeneralCallNode. It seems that there was a > bit more to do here. Now that I think about it - remember that the Jenkins builds use a source distribution to build, not a plain checkout. Maybe there's something wrong with the sdist? At least, I see several warnings about file patterns in MANIFEST.in that are not matched by any files: """ reading manifest template 'MANIFEST.in' warning: no files found matching '*.pyx' under directory 'Cython/Debugger/Tests' warning: no files found matching '*.pxd' under directory 'Cython/Debugger/Tests' warning: no files found matching '*.h' under directory 'Cython/Debugger/Tests' warning: no files found matching '*.pxd' under directory 'Cython/Utility' warning: no files found matching '*.h' under directory 'Cython/Utility' warning: no files found matching '.cpp' under directory 'Cython/Utility' """ https://sage.math.washington.edu:8091/hudson/job/cython-devel-sdist/678/console Also note that the build appears to choke on test utility code: """ Error compiling Cython file: ------------------------------------------------------------ ... cdef extern from *: cdef object __pyx_test_dep(object) @cname('__pyx_TestClass') cdef class TestClass(object): cdef public int value ^ ------------------------------------------------------------ TestClass:9:20: Compiler crash in AnalyseDeclarationsTransform """ https://sage.math.washington.edu:8091/hudson/job/cython-devel-build/56/PYVERSION=py3k/console Mark, didn't you disable the loading of any test code during 'normal' builds? Maybe there's something broken on that front? Stefan From vitja.makarov at gmail.com Thu Oct 13 08:03:55 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Thu, 13 Oct 2011 10:03:55 +0400 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: <4E9672AE.6080905@behnel.de> References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> Message-ID: 2011/10/13 Stefan Behnel : > mark florisson, 12.10.2011 23:46: >>>>> >>>>> On 10 October 2011 16:17, Stefan Behnel wrote: >>>>>> >>>>>> Jenkins currently reports several failures, and this one seems to be >>>>>> due to your tempita changes: >>>>>> >>>>> >>>>> https://sage.math.washington.edu:8091/hudson/view/cython-devel/job/cython-devel-lxml-trunk/PYVERSION=py24/31/console >>>>> >>>>> Thanks! I'll try to fix that somewhere this week. > > We should really get to the habit of not pushing changes to the master > branch that turn out to be broken in the personal branches, or, if they > appear to be ok and only turn out to break the master branch *after* pushing > them (which is ok, we have Jenkins to tell us), revert them if a fix cannot > be applied shortly, i.e. within a day or two at most. > > It's very annoying when the master branch is broken for weeks in a row, > especially since that means that it will keep attracting new failures due to > the cover of already broken tests, which makes it much harder to pinpoint > the commits that triggered them. > +1 > >>> Is it me or are other builds broken as well? >>> >>> I pushed a fix for the tempita thing, but it seems the entire py3k build >>> is >>> broken: >>> >>> >>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console > > It's not only the py3k tests, the build is broken in general. The problem > here is that it only *shows* in the py3k tests because the Py2 builds do not > bail out when one of the Cython modules fails to build. That needs fixing as > well. > > >> I just cannot reproduce that error on my system, let me investigate it >> further. > > My guess was that it's due to the innocent looking change that Robert did to > enable type inference for the GeneralCallNode. It seems that there was a bit > more to do here. > I found that tempita bug goes away if you change language_level to 2. -- vitja. From robertwb at math.washington.edu Thu Oct 13 09:26:37 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Thu, 13 Oct 2011 00:26:37 -0700 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: <4E9672AE.6080905@behnel.de> References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> Message-ID: On Wed, Oct 12, 2011 at 10:10 PM, Stefan Behnel wrote: > mark florisson, 12.10.2011 23:46: >>>>> >>>>> On 10 October 2011 16:17, Stefan Behnel wrote: >>>>>> >>>>>> Jenkins currently reports several failures, and this one seems to be >>>>>> due to your tempita changes: >>>>>> >>>>> >>>>> https://sage.math.washington.edu:8091/hudson/view/cython-devel/job/cython-devel-lxml-trunk/PYVERSION=py24/31/console >>>>> >>>>> Thanks! I'll try to fix that somewhere this week. > > We should really get to the habit of not pushing changes to the master > branch that turn out to be broken in the personal branches, or, if they > appear to be ok and only turn out to break the master branch *after* pushing > them (which is ok, we have Jenkins to tell us), revert them if a fix cannot > be applied shortly, i.e. within a day or two at most. > > It's very annoying when the master branch is broken for weeks in a row, > especially since that means that it will keep attracting new failures due to > the cover of already broken tests, which makes it much harder to pinpoint > the commits that triggered them. > > >>> Is it me or are other builds broken as well? >>> >>> I pushed a fix for the tempita thing, but it seems the entire py3k build >>> is >>> broken: >>> >>> >>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console > > It's not only the py3k tests, the build is broken in general. The problem > here is that it only *shows* in the py3k tests because the Py2 builds do not > bail out when one of the Cython modules fails to build. That needs fixing as > well. > > >> I just cannot reproduce that error on my system, let me investigate it >> further. > > My guess was that it's due to the innocent looking change that Robert did to > enable type inference for the GeneralCallNode. It seems that there was a bit > more to do here. This has been rolled back, but that didn't fix things... In other news, I finally set up a set of jenkins jobs for my github branch, because I agree it's super annoying to have a broken build for a long time. Still puzzled by this one though... - Robert From stefan_ml at behnel.de Thu Oct 13 10:05:09 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 13 Oct 2011 10:05:09 +0200 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> Message-ID: <4E969BB5.6060400@behnel.de> Robert Bradshaw, 13.10.2011 09:26: > On Wed, Oct 12, 2011 at 10:10 PM, Stefan Behnel wrote: >> mark florisson, 12.10.2011 23:46: >>>> Is it me or are other builds broken as well? >>>> >>>> I pushed a fix for the tempita thing, but it seems the entire py3k build >>>> is broken: >>>> >>>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console >> >> It's not only the py3k tests, the build is broken in general. The problem >> here is that it only *shows* in the py3k tests because the Py2 builds do not >> bail out when one of the Cython modules fails to build. That needs fixing as >> well. >> >> My guess was that it's due to the innocent looking change that Robert did to >> enable type inference for the GeneralCallNode. It seems that there was a bit >> more to do here. > > This has been rolled back, but that didn't fix things... Hmm, ok, sorry then. That's the kind of thing I meant when I said that it becomes hard to pinpoint bugs when things are broken already. That change was the only functional change before the build broke in Jenkins... Stefan From stefan_ml at behnel.de Thu Oct 13 10:53:48 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 13 Oct 2011 10:53:48 +0200 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: <4E9672AE.6080905@behnel.de> References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> Message-ID: <4E96A71C.1030504@behnel.de> Stefan Behnel, 13.10.2011 07:10: > mark florisson, 12.10.2011 23:46: >>> Is it me or are other builds broken as well? >>> >>> I pushed a fix for the tempita thing, but it seems the entire py3k build is >>> broken: >>> >>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console > > It's not only the py3k tests, the build is broken in general. I take that back. I thought I had seen failures in other versions, too, but that might have been in older builds. Currently, it is only broken in the py3k branch, which opens up the possibility that it has something to do with the large rewrites that recently went into CPython, specifically (but not necessarily limited to) the unicode changes for PEP393. I disabled the py3k builds for now and that at least gets the other builds through. I still see the tempita bug in Py2.4, though: https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py24/47/console Stefan From stefan_ml at behnel.de Thu Oct 13 11:01:34 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 13 Oct 2011 11:01:34 +0200 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> Message-ID: <4E96A8EE.4070701@behnel.de> Vitja Makarov, 13.10.2011 08:03: > I found that tempita bug goes away if you change language_level to 2. There's no language level configured in Py2.4, which fails. https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console Stefan From markflorisson88 at gmail.com Thu Oct 13 11:06:25 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 13 Oct 2011 10:06:25 +0100 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: <4E9672AE.6080905@behnel.de> References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> Message-ID: On 13 October 2011 06:10, Stefan Behnel wrote: > > mark florisson, 12.10.2011 23:46: >>>>> >>>>> On 10 October 2011 16:17, Stefan Behnel wrote: >>>>>> >>>>>> Jenkins currently reports several failures, and this one seems to be >>>>>> due to your tempita changes: >>>>>> >>>>> https://sage.math.washington.edu:8091/hudson/view/cython-devel/job/cython-devel-lxml-trunk/PYVERSION=py24/31/console >>>>> >>>>> Thanks! I'll try to fix that somewhere this week. > > We should really get to the habit of not pushing changes to the master branch that turn out to be broken in the personal branches, or, if they appear to be ok and only turn out to break the master branch *after* pushing them (which is ok, we have Jenkins to tell us), revert them if a fix cannot be applied shortly, i.e. within a day or two at most. > > It's very annoying when the master branch is broken for weeks in a row, especially since that means that it will keep attracting new failures due to the cover of already broken tests, which makes it much harder to pinpoint the commits that triggered them. > Yes I totally agree. The thing is that memoryviews on hudson were rebased on the latest master and my Jenkins was entirely blue. So I merged them, I don't recall checking the cython-devel-tests results, but I think it might have only been 2.4 failing with the tempita stuff. Unfortunately I only have a 2.3 build that is perpetually broken on Jenkins. At some point my fused types py3k build also got broken after merging stuff in from master. None of it is reproducible on my machine though. >>> Is it me or are other builds broken as well? >>> >>> I pushed a fix for the tempita thing, but it seems the entire py3k build is >>> broken: >>> >>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console > > It's not only the py3k tests, the build is broken in general. The problem here is that it only *shows* in the py3k tests because the Py2 builds do not bail out when one of the Cython modules fails to build. That needs fixing as well. > > >> I just cannot reproduce that error on my system, let me investigate it >> further. > > My guess was that it's due to the innocent looking change that Robert did to enable type inference for the GeneralCallNode. It seems that there was a bit more to do here. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Thu Oct 13 11:10:42 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 13 Oct 2011 10:10:42 +0100 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: <4E96A71C.1030504@behnel.de> References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A71C.1030504@behnel.de> Message-ID: On 13 October 2011 09:53, Stefan Behnel wrote: > Stefan Behnel, 13.10.2011 07:10: >> >> mark florisson, 12.10.2011 23:46: >>>> >>>> Is it me or are other builds broken as well? >>>> >>>> I pushed a fix for the tempita thing, but it seems the entire py3k build >>>> is >>>> broken: >>>> >>>> >>>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console >> >> It's not only the py3k tests, the build is broken in general. > > I take that back. I thought I had seen failures in other versions, too, but > that might have been in older builds. Currently, it is only broken in the > py3k branch, which opens up the possibility that it has something to do with > the large rewrites that recently went into CPython, specifically (but not > necessarily limited to) the unicode changes for PEP393. > > I disabled the py3k builds for now and that at least gets the other builds > through. I still see the tempita bug in Py2.4, though: > > https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py24/47/console > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > Should we just use stable CPython only? It's confusing to see failing test suites just because CPython might break (or even if it doesn't, you might be thinking it does). Tempita also works fine on my system, I pushed a fix for that. It seems there's a problem with the memoryview tests in 2.4 though, because the PyBUF_* flags aren't available there. I'll try to add a 2.4 build to my Jenkins. From stefan_ml at behnel.de Thu Oct 13 11:23:08 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 13 Oct 2011 11:23:08 +0200 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A71C.1030504@behnel.de> Message-ID: <4E96ADFC.3070706@behnel.de> mark florisson, 13.10.2011 11:10: > On 13 October 2011 09:53, Stefan Behnel wrote: >> Stefan Behnel, 13.10.2011 07:10: >>> mark florisson, 12.10.2011 23:46: >>>>> >>>>> Is it me or are other builds broken as well? >>>>> >>>>> I pushed a fix for the tempita thing, but it seems the entire py3k build >>>>> is broken: >>>>> >>>>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console >>> >>> It's not only the py3k tests, the build is broken in general. >> >> I take that back. I thought I had seen failures in other versions, too, but >> that might have been in older builds. Currently, it is only broken in the >> py3k branch, which opens up the possibility that it has something to do with >> the large rewrites that recently went into CPython, specifically (but not >> necessarily limited to) the unicode changes for PEP393. >> >> I disabled the py3k builds for now and that at least gets the other builds >> through. I still see the tempita bug in Py2.4, though: >> >> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py24/47/console > > Should we just use stable CPython only? It's confusing to see failing > test suites just because CPython might break (or even if it doesn't, > you might be thinking it does). Well, it's rare that CPython is *that* broken, and it's good for us to see quickly when it breaks because of our own code. It's also good if we can report bugs to python-dev before they consider everything fine because they lack a test. > Tempita also works fine on my system, I pushed a fix for that. It > seems there's a problem with the memoryview tests in 2.4 though, > because the PyBUF_* flags aren't available there. I'll try to add a > 2.4 build to my Jenkins. You should just copy the cython-devel jobs. They are much friendlier to set up and change. Stefan From vitja.makarov at gmail.com Thu Oct 13 11:53:01 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Thu, 13 Oct 2011 13:53:01 +0400 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: <4E96A8EE.4070701@behnel.de> References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> Message-ID: 2011/10/13 Stefan Behnel : > Vitja Makarov, 13.10.2011 08:03: >> >> I found that tempita bug goes away if you change language_level to 2. > > There's no language level configured in Py2.4, which fails. > > https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console > No, I mean language level 3 is set at top of the Code.py, when it's set to 2 Py2.4 build is okay. -- vitja. From markflorisson88 at gmail.com Thu Oct 13 11:56:42 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 13 Oct 2011 10:56:42 +0100 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> Message-ID: On 13 October 2011 10:53, Vitja Makarov wrote: > 2011/10/13 Stefan Behnel : >> Vitja Makarov, 13.10.2011 08:03: >>> >>> I found that tempita bug goes away if you change language_level to 2. >> >> There's no language level configured in Py2.4, which fails. >> >> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console >> > > No, I mean language level 3 is set at top of the Code.py, when it's > set to 2 Py2.4 build is okay. > > > -- > vitja. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > Ah, it doesn't take unicode keyword arguments. That should be fixed. From markflorisson88 at gmail.com Thu Oct 13 12:18:37 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 13 Oct 2011 11:18:37 +0100 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> Message-ID: On 13 October 2011 10:56, mark florisson wrote: > On 13 October 2011 10:53, Vitja Makarov wrote: >> 2011/10/13 Stefan Behnel : >>> Vitja Makarov, 13.10.2011 08:03: >>>> >>>> I found that tempita bug goes away if you change language_level to 2. >>> >>> There's no language level configured in Py2.4, which fails. >>> >>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console >>> >> >> No, I mean language level 3 is set at top of the Code.py, when it's >> set to 2 Py2.4 build is okay. >> >> >> -- >> vitja. >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > > Ah, it doesn't take unicode keyword arguments. That should be fixed. > Frankly, language level 3 is rather uncomfortable to deal with in python 2(.4). Any reason it's set to 3? I'll try reverting to 2 and pushing. From stefan_ml at behnel.de Thu Oct 13 13:44:28 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 13 Oct 2011 13:44:28 +0200 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> Message-ID: <4E96CF1C.1000400@behnel.de> mark florisson, 13.10.2011 12:18: > On 13 October 2011 10:56, mark florisson wrote: >> On 13 October 2011 10:53, Vitja Makarov wrote: >>> 2011/10/13 Stefan Behnel: >>>> Vitja Makarov, 13.10.2011 08:03: >>>>> >>>>> I found that tempita bug goes away if you change language_level to 2. >>>> >>>> There's no language level configured in Py2.4, which fails. >>>> >>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console >>> >>> No, I mean language level 3 is set at top of the Code.py, when it's >>> set to 2 Py2.4 build is okay. >>> >> Ah, it doesn't take unicode keyword arguments. That should be fixed. > > Frankly, language level 3 is rather uncomfortable to deal with in > python 2(.4). Well, without the parentheses, I presume ... > Any reason it's set to 3? Mainly for performance reasons, especially in Python 2. Py3 code tends to run faster in Cython due to more explicit semantics. In particular, we get unicode content in and write unicode content out, so using unicode literals in the source code right away saves a decoding step for each write or interpolation of a literal string in Python 2. It won't make a difference when running Cython in Python 3, but it saves a lot of unnecessary processing cycles in Py2, even though the difference may not be substantial over a complete run. It's just so convenient to switch the language level and let that shave off a bunch of processing overhead that I didn't see a reason not to do it. I doubt that it'll make a functional difference, though, so if it works better without that option, we may have to go back to Py2 compilation. Stefan From markflorisson88 at gmail.com Thu Oct 13 13:52:07 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 13 Oct 2011 12:52:07 +0100 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: <4E96CF1C.1000400@behnel.de> References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> <4E96CF1C.1000400@behnel.de> Message-ID: On 13 October 2011 12:44, Stefan Behnel wrote: > mark florisson, 13.10.2011 12:18: >> >> On 13 October 2011 10:56, mark florisson wrote: >>> >>> On 13 October 2011 10:53, Vitja Makarov wrote: >>>> >>>> 2011/10/13 Stefan Behnel: >>>>> >>>>> Vitja Makarov, 13.10.2011 08:03: >>>>>> >>>>>> I found that tempita bug goes away if you change language_level to 2. >>>>> >>>>> There's no language level configured in Py2.4, which fails. >>>>> >>>>> >>>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console >>>> >>>> No, I mean language level 3 is set at top of the Code.py, when it's >>>> set to 2 Py2.4 build is okay. >>>> >>> Ah, it doesn't take unicode keyword arguments. That should be fixed. >> >> Frankly, language level 3 is rather uncomfortable to deal with in >> python 2(.4). > > Well, without the parentheses, I presume ... Ah, it appears only 2.7 eats unicode keyword arguments. I wonder why the 2.5 and 2.6 builds didn't fail then. > >> Any reason it's set to 3? > > Mainly for performance reasons, especially in Python 2. Py3 code tends to > run faster in Cython due to more explicit semantics. In particular, we get > unicode content in and write unicode content out, so using unicode literals > in the source code right away saves a decoding step for each write or > interpolation of a literal string in Python 2. It won't make a difference > when running Cython in Python 3, but it saves a lot of unnecessary > processing cycles in Py2, even though the difference may not be substantial > over a complete run. It's just so convenient to switch the language level > and let that shave off a bunch of processing overhead that I didn't see a > reason not to do it. > > I doubt that it'll make a functional difference, though, so if it works > better without that option, we may have to go back to Py2 compilation. I see. Yeah it's sort of hard to fix, as I really need bytes in python 2 and really need unicode (str) in python 3, so I can neither write 'foo' nor b'foo' nor u'foo' with language level 3. BTW this is always a real problem in doctests too, as your bytestrings will suddenly be printed as b'foo' in python 3, which will fail your doctest. So to make it work you need to do explicit encoding/decoding to make it work everywhere. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Thu Oct 13 13:54:12 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 13 Oct 2011 12:54:12 +0100 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> <4E96CF1C.1000400@behnel.de> Message-ID: On 13 October 2011 12:52, mark florisson wrote: > On 13 October 2011 12:44, Stefan Behnel wrote: >> mark florisson, 13.10.2011 12:18: >>> >>> On 13 October 2011 10:56, mark florisson wrote: >>>> >>>> On 13 October 2011 10:53, Vitja Makarov wrote: >>>>> >>>>> 2011/10/13 Stefan Behnel: >>>>>> >>>>>> Vitja Makarov, 13.10.2011 08:03: >>>>>>> >>>>>>> I found that tempita bug goes away if you change language_level to 2. >>>>>> >>>>>> There's no language level configured in Py2.4, which fails. >>>>>> >>>>>> >>>>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console >>>>> >>>>> No, I mean language level 3 is set at top of the Code.py, when it's >>>>> set to 2 Py2.4 build is okay. >>>>> >>>> Ah, it doesn't take unicode keyword arguments. That should be fixed. >>> >>> Frankly, language level 3 is rather uncomfortable to deal with in >>> python 2(.4). >> >> Well, without the parentheses, I presume ... > > Ah, it appears only 2.7 eats unicode keyword arguments. I wonder why > the 2.5 and 2.6 builds didn't fail then. > >> >>> Any reason it's set to 3? >> >> Mainly for performance reasons, especially in Python 2. Py3 code tends to >> run faster in Cython due to more explicit semantics. In particular, we get >> unicode content in and write unicode content out, so using unicode literals >> in the source code right away saves a decoding step for each write or >> interpolation of a literal string in Python 2. It won't make a difference >> when running Cython in Python 3, but it saves a lot of unnecessary >> processing cycles in Py2, even though the difference may not be substantial >> over a complete run. It's just so convenient to switch the language level >> and let that shave off a bunch of processing overhead that I didn't see a >> reason not to do it. >> >> I doubt that it'll make a functional difference, though, so if it works >> better without that option, we may have to go back to Py2 compilation. > > I see. Yeah it's sort of hard to fix, as I really need bytes in python > 2 and really need unicode (str) in python 3, so I can neither write > 'foo' nor b'foo' nor u'foo' with language level 3. > > BTW this is always a real problem in doctests too, as your bytestrings > will suddenly be printed as b'foo' in python 3, which will fail your > doctest. So to make it work you need to do explicit encoding/decoding > to make it work everywhere. > >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > Anyway, I fixed the 2.4 build and cherry-picked the cython scope loading fix over from fused types, I'll push that to master. From stefan_ml at behnel.de Thu Oct 13 14:07:23 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 13 Oct 2011 14:07:23 +0200 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> <4E96CF1C.1000400@behnel.de> Message-ID: <4E96D47B.808@behnel.de> mark florisson, 13.10.2011 13:52: > On 13 October 2011 12:44, Stefan Behnel wrote: >> mark florisson, 13.10.2011 12:18: >>> On 13 October 2011 10:56, mark florisson wrote: >>>> On 13 October 2011 10:53, Vitja Makarov wrote: >>>>> 2011/10/13 Stefan Behnel: >>>>>> Vitja Makarov, 13.10.2011 08:03: >>>>>>> I found that tempita bug goes away if you change language_level to 2. >>>>>> >>>>>> There's no language level configured in Py2.4, which fails. >>>>>> >>>>>> >>>>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console >>>>> >>>>> No, I mean language level 3 is set at top of the Code.py, when it's >>>>> set to 2 Py2.4 build is okay. >>>>> >>>> Ah, it doesn't take unicode keyword arguments. That should be fixed. >>> >>> Frankly, language level 3 is rather uncomfortable to deal with in >>> python 2(.4). >> >> Well, without the parentheses, I presume ... > > Ah, it appears only 2.7 eats unicode keyword arguments. I wonder why > the 2.5 and 2.6 builds didn't fail then. Ah, right, I remember having to work around this in the C code at some point. That's where the "identifier" kind of string in Cython originated from. >>> Any reason it's set to 3? >> >> Mainly for performance reasons, especially in Python 2. Py3 code tends to >> run faster in Cython due to more explicit semantics. In particular, we get >> unicode content in and write unicode content out, so using unicode literals >> in the source code right away saves a decoding step for each write or >> interpolation of a literal string in Python 2. It won't make a difference >> when running Cython in Python 3, but it saves a lot of unnecessary >> processing cycles in Py2, even though the difference may not be substantial >> over a complete run. It's just so convenient to switch the language level >> and let that shave off a bunch of processing overhead that I didn't see a >> reason not to do it. >> >> I doubt that it'll make a functional difference, though, so if it works >> better without that option, we may have to go back to Py2 compilation. > > I see. Yeah it's sort of hard to fix, as I really need bytes in python > 2 and really need unicode (str) in python 3, so I can neither write > 'foo' nor b'foo' nor u'foo' with language level 3. You can either pass the keyword arguments explicitly in the code or use something like dict(foo=1) to get a dict of keyword arguments (also works in Py2.4). Cython will turn the names into identifier strings, i.e. bytes in Py2 and unicode in Py3, as required for keyword arguments. > BTW this is always a real problem in doctests too, as your bytestrings > will suddenly be printed as b'foo' in python 3, which will fail your > doctest. So to make it work you need to do explicit encoding/decoding > to make it work everywhere. I usually either wrap them in a helper function or, as you say, put a .decode() at the end. However, that fails to test explicitly for a byte string in Py2, as .decode() also tends to work for ASCII-only unicode strings there... Let's hope that Py2 won't take a decade to die. Stefan From vitja.makarov at gmail.com Thu Oct 13 20:33:34 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Thu, 13 Oct 2011 22:33:34 +0400 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> <4E96CF1C.1000400@behnel.de> Message-ID: 2011/10/13 mark florisson : > On 13 October 2011 12:52, mark florisson wrote: >> On 13 October 2011 12:44, Stefan Behnel wrote: >>> mark florisson, 13.10.2011 12:18: >>>> >>>> On 13 October 2011 10:56, mark florisson wrote: >>>>> >>>>> On 13 October 2011 10:53, Vitja Makarov wrote: >>>>>> >>>>>> 2011/10/13 Stefan Behnel: >>>>>>> >>>>>>> Vitja Makarov, 13.10.2011 08:03: >>>>>>>> >>>>>>>> I found that tempita bug goes away if you change language_level to 2. >>>>>>> >>>>>>> There's no language level configured in Py2.4, which fails. >>>>>>> >>>>>>> >>>>>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console >>>>>> >>>>>> No, I mean language level 3 is set at top of the Code.py, when it's >>>>>> set to 2 Py2.4 build is okay. >>>>>> >>>>> Ah, it doesn't take unicode keyword arguments. That should be fixed. >>>> >>>> Frankly, language level 3 is rather uncomfortable to deal with in >>>> python 2(.4). >>> >>> Well, without the parentheses, I presume ... >> >> Ah, it appears only 2.7 eats unicode keyword arguments. I wonder why >> the 2.5 and 2.6 builds didn't fail then. >> >>> >>>> Any reason it's set to 3? >>> >>> Mainly for performance reasons, especially in Python 2. Py3 code tends to >>> run faster in Cython due to more explicit semantics. In particular, we get >>> unicode content in and write unicode content out, so using unicode literals >>> in the source code right away saves a decoding step for each write or >>> interpolation of a literal string in Python 2. It won't make a difference >>> when running Cython in Python 3, but it saves a lot of unnecessary >>> processing cycles in Py2, even though the difference may not be substantial >>> over a complete run. It's just so convenient to switch the language level >>> and let that shave off a bunch of processing overhead that I didn't see a >>> reason not to do it. >>> >>> I doubt that it'll make a functional difference, though, so if it works >>> better without that option, we may have to go back to Py2 compilation. >> >> I see. Yeah it's sort of hard to fix, as I really need bytes in python >> 2 and really need unicode (str) in python 3, so I can neither write >> 'foo' nor b'foo' nor u'foo' with language level 3. >> >> BTW this is always a real problem in doctests too, as your bytestrings >> will suddenly be printed as b'foo' in python 3, which will fail your >> doctest. So to make it work you need to do explicit encoding/decoding >> to make it work everywhere. >> >>> Stefan >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >>> >> > > Anyway, I fixed the 2.4 build and cherry-picked the cython scope > loading fix over from fused types, I'll push that to master. Cool! But py3k pyregr is no red due to SIGSEGV, is that python problem: https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py3k/26/console -- vitja. From stefan_ml at behnel.de Thu Oct 13 21:22:33 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 13 Oct 2011 21:22:33 +0200 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> <4E96CF1C.1000400@behnel.de> Message-ID: <4E973A79.4030504@behnel.de> Vitja Makarov, 13.10.2011 20:33: > But py3k pyregr is no red due to SIGSEGV, is that python problem: > > https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py3k/26/console Not sure, but rather likely. The PEP393 implementation is still being worked on. That also makes it still a moving target to which the current code in Cython may not fit exactly. Worth investigating. I'll see if I find a bit of time for updating my local py3k installation this weekend to run some tests with it. Stefan From markflorisson88 at gmail.com Thu Oct 13 22:37:30 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 13 Oct 2011 21:37:30 +0100 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: <4E973A79.4030504@behnel.de> References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> <4E96CF1C.1000400@behnel.de> <4E973A79.4030504@behnel.de> Message-ID: On 13 October 2011 20:22, Stefan Behnel wrote: > Vitja Makarov, 13.10.2011 20:33: >> >> But py3k pyregr is no red due to SIGSEGV, is that python problem: >> >> >> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py3k/26/console > > Not sure, but rather likely. The PEP393 implementation is still being worked > on. That also makes it still a moving target to which the current code in > Cython may not fit exactly. Worth investigating. I'll see if I find a bit of > time for updating my local py3k installation this weekend to run some tests > with it. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > BTW, the recent test changes on Jenkins really made it a lot better than what we had before, thanks Stefan! It's now much easier to clone and configure jobs and see when things go awry. From stefan_ml at behnel.de Fri Oct 14 15:02:38 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 14 Oct 2011 15:02:38 +0200 Subject: [Cython] unexpected side-effect in cython utility code Message-ID: <4E9832EE.7010002@behnel.de> Hi, I started working on better malloc() support and wrote this code as a test to get going: """ cimport cython def test_malloc(int n): with cython.malloc(n*sizeof(int)) as m: for i in range(n): m[i] = i l = [ m[i] for i in range(n) ] return l """ Now, when I compile this normally, I get a compiler error about "malloc" not being a cython attribute. However, when I do the same in the test runner, it compiles without errors and crashes when trying to run the test. The code it generates for the 'with' statement above starts like this: """ __pyx_t_1 = PyObject_GetAttr(((PyObject *)malloc((__pyx_v_n * (sizeof(int))))), __pyx_n_s____exit__); /*...*/ """ It appears that something has declared malloc(). I'm pretty sure it's this code in UtilityCode.py: """ def declare_in_scope(self, dest_scope, used=False, cython_scope=None): """ Declare all entries from the utility code in dest_scope. Code will only be included for used entries. If module_name is given, declare the type entries with that name. """ tree = self.get_tree(entries_only=True, cython_scope=cython_scope) entries = tree.scope.entries entries.pop('__name__') entries.pop('__file__') entries.pop('__builtins__') entries.pop('__doc__') for name, entry in entries.iteritems(): entry.utility_code_definition = self entry.used = used """ Basically, it declares everything it finds except for an explicit blacklist. Bad design. As I argued before, it should use a whitelist in the utility code file instead, which specifically lists the names that should be public. Everything else should just be considered implementation details. Stefan From markflorisson88 at gmail.com Fri Oct 14 17:18:19 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 14 Oct 2011 16:18:19 +0100 Subject: [Cython] unexpected side-effect in cython utility code In-Reply-To: <4E9832EE.7010002@behnel.de> References: <4E9832EE.7010002@behnel.de> Message-ID: On 14 October 2011 14:02, Stefan Behnel wrote: > Hi, > > I started working on better malloc() support and wrote this code as a test > to get going: > > """ > cimport cython > > def test_malloc(int n): > ? ?with cython.malloc(n*sizeof(int)) as m: > ? ? ? ?for i in range(n): > ? ? ? ? ? ?m[i] = i > ? ? ? ?l = [ m[i] for i in range(n) ] > ? ?return l > """ > > Now, when I compile this normally, I get a compiler error about "malloc" not > being a cython attribute. However, when I do the same in the test runner, it > compiles without errors and crashes when trying to run the test. The code it > generates for the 'with' statement above starts like this: > > """ > ? ?__pyx_t_1 = PyObject_GetAttr(((PyObject *)malloc((__pyx_v_n * > (sizeof(int))))), __pyx_n_s____exit__); /*...*/ > """ > > It appears that something has declared malloc(). I'm pretty sure it's this > code in UtilityCode.py: > > """ > ? ?def declare_in_scope(self, dest_scope, used=False, cython_scope=None): > ? ? ? ?""" > ? ? ? ?Declare all entries from the utility code in dest_scope. Code will > ? ? ? ?only be included for used entries. If module_name is given, > ? ? ? ?declare the type entries with that name. > ? ? ? ?""" > ? ? ? ?tree = self.get_tree(entries_only=True, cython_scope=cython_scope) > > ? ? ? ?entries = tree.scope.entries > ? ? ? ?entries.pop('__name__') > ? ? ? ?entries.pop('__file__') > ? ? ? ?entries.pop('__builtins__') > ? ? ? ?entries.pop('__doc__') > > ? ? ? ?for name, entry in entries.iteritems(): > ? ? ? ? ? ?entry.utility_code_definition = self > ? ? ? ? ? ?entry.used = used > """ > > Basically, it declares everything it finds except for an explicit blacklist. > Bad design. As I argued before, it should use a whitelist in the utility > code file instead, which specifically lists the names that should be public. > Everything else should just be considered implementation details. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > That would indeed be better, I just never got around to it and I must admit that it was never my priority. About cython.malloc, wouldn't it be nicer if we had automatic (stack or heap allocated) arrays? e.g. def func(int n): cdef int array[n] I think you usually have homogenous data anyway. When you return, it simply goes out of scope like normal automatic variables, I see no clear advantage to 'with' here. From markflorisson88 at gmail.com Fri Oct 14 18:18:40 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 14 Oct 2011 17:18:40 +0100 Subject: [Cython] unexpected side-effect in cython utility code In-Reply-To: References: <4E9832EE.7010002@behnel.de> Message-ID: On 14 October 2011 16:18, mark florisson wrote: > On 14 October 2011 14:02, Stefan Behnel wrote: >> Hi, >> >> I started working on better malloc() support and wrote this code as a test >> to get going: >> >> """ >> cimport cython >> >> def test_malloc(int n): >> ? ?with cython.malloc(n*sizeof(int)) as m: >> ? ? ? ?for i in range(n): >> ? ? ? ? ? ?m[i] = i >> ? ? ? ?l = [ m[i] for i in range(n) ] >> ? ?return l >> """ >> >> Now, when I compile this normally, I get a compiler error about "malloc" not >> being a cython attribute. However, when I do the same in the test runner, it >> compiles without errors and crashes when trying to run the test. The code it >> generates for the 'with' statement above starts like this: >> >> """ >> ? ?__pyx_t_1 = PyObject_GetAttr(((PyObject *)malloc((__pyx_v_n * >> (sizeof(int))))), __pyx_n_s____exit__); /*...*/ >> """ >> >> It appears that something has declared malloc(). I'm pretty sure it's this >> code in UtilityCode.py: >> >> """ >> ? ?def declare_in_scope(self, dest_scope, used=False, cython_scope=None): >> ? ? ? ?""" >> ? ? ? ?Declare all entries from the utility code in dest_scope. Code will >> ? ? ? ?only be included for used entries. If module_name is given, >> ? ? ? ?declare the type entries with that name. >> ? ? ? ?""" >> ? ? ? ?tree = self.get_tree(entries_only=True, cython_scope=cython_scope) >> >> ? ? ? ?entries = tree.scope.entries >> ? ? ? ?entries.pop('__name__') >> ? ? ? ?entries.pop('__file__') >> ? ? ? ?entries.pop('__builtins__') >> ? ? ? ?entries.pop('__doc__') >> >> ? ? ? ?for name, entry in entries.iteritems(): >> ? ? ? ? ? ?entry.utility_code_definition = self >> ? ? ? ? ? ?entry.used = used >> """ >> >> Basically, it declares everything it finds except for an explicit blacklist. >> Bad design. As I argued before, it should use a whitelist in the utility >> code file instead, which specifically lists the names that should be public. >> Everything else should just be considered implementation details. >> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That would indeed be better, I just never got around to it and I must > admit that it was never my priority. > > About cython.malloc, wouldn't it be nicer if we had automatic (stack > or heap allocated) arrays? e.g. > > def func(int n): > ? ?cdef int array[n] > > I think you usually have homogenous data anyway. When you return, it > simply goes out of scope like normal automatic variables, I see no > clear advantage to 'with' here. > Actually these whitelists are really uncomfortable to work with (which is one of the reasons I didn't use them). I think a decorator like '@public' or some such would be nicer here. From robertwb at math.washington.edu Fri Oct 14 20:31:16 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Fri, 14 Oct 2011 11:31:16 -0700 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> Message-ID: On Wed, Oct 12, 2011 at 7:55 AM, mark florisson wrote: >>> I ultimately feel things like that is more important than 100% coverage of >>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit. >> >> +1 Prange handles the (corse-grained) SIMD case nicely, and a >> task/futures model based on closures would I think flesh this out to >> the next level of generality (and complexity). > > Futures are definitely nice. I suppose I think really like "inline > futures", i.e. openmp tasks. I realize that futures may look more > pythonic. However, as mentioned previously, I also see issues with > that. When you submit a task then you expect a future object, which > you might want to pass around. But we don't have the GIL for that. I > personally feel that futures is something that should be done by a > library (such as concurrent.futures in python 3.2), and inline tasks > by a language. It also means I have to write an entire function or > closure for perhaps only a few lines of code. > > I might also want to submit other functions that are not closures, or > I might want to reuse my closures that are used for tasks and for > something else. So what if my tasks contain more parallel constructs? > e.g. what if I have a task closure that I return from my function that > generates more tasks itself? Would you just execute them sequentially > outside of the parallel construct, or would you simply disallow that? > Also, do you restrict future "objects" to only the parallel section? > > Another problem is that you can only wait on tasks of your direct > children. So what if I get access to my parent's future object > (assuming you allow tasks to generate tasks), and then want the result > of my parent? > Or what if I store these future objects in an array or list and access > them arbitrarily? You will only know at runtime which task to wait on, > and openmp only has a static, lexical taskwait. > > I suppose my point is that without either a drastic rewrite (e.g., use > pthreads instead of openmp) or quite a bit of contraints, I am unsure > how futures would work here. Perhaps you guys have some concrete > syntax and semantics proposals? It feels to me that OpenMP tasks took a different model of parallelism and tried to force them into the OpenMP model/constraints, and so it'd be even more difficult to fit them into a nice pythonic interface. Perhaps to make progress on this front we need to have a concrete example to look at. I'm also wondering if the standard threading module (perhaps with overlay support) used with nogil functions would be sufficient--locking is required for handling the queues, etc. so the fact that the GIL is involved is not a big deal. It is possible that this won't scale to as small of work units, but the overhead should be minimal once your work unit is a sufficient size (which is probably quite small) and it's already implemented and well documented/used. As for critical and barrier, the notion of a critical block as a with statement is very useful. Creating/naming locks (rather than being implicit on the file/line number) is more powerful, but is a larger burden on the user and more difficult to support with the OpenMP backend. barrier, if supported, should be a function call not a context. Not as critical as with the tasks case, but a good example to see how it flows would be useful here as well. As for single, I see doing this manually does require boilerplate locking, so what about if cython.parallel.once(): # will return True once for a tread group. ... we could implement this via our own locking/checking/flushing to allow it to occur in arbitrary expressions, e.g. special_worker = cython.parallel.once() if special_worker: ... [common code] if special_worker: # single wouldn't work here ... - Robert From markflorisson88 at gmail.com Fri Oct 14 22:07:07 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 14 Oct 2011 21:07:07 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> Message-ID: On 14 October 2011 19:31, Robert Bradshaw wrote: > On Wed, Oct 12, 2011 at 7:55 AM, mark florisson > wrote: >>>> I ultimately feel things like that is more important than 100% coverage of >>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit. >>> >>> +1 Prange handles the (corse-grained) SIMD case nicely, and a >>> task/futures model based on closures would I think flesh this out to >>> the next level of generality (and complexity). >> >> Futures are definitely nice. I suppose I think really like "inline >> futures", i.e. openmp tasks. I realize that futures may look more >> pythonic. However, as mentioned previously, I also see issues with >> that. When you submit a task then you expect a future object, which >> you might want to pass around. But we don't have the GIL for that. I >> personally feel that futures is something that should be done by a >> library (such as concurrent.futures in python 3.2), and inline tasks >> by a language. It also means I have to write an entire function or >> closure for perhaps only a few lines of code. >> >> I might also want to submit other functions that are not closures, or >> I might want to reuse my closures that are used for tasks and for >> something else. So what if my tasks contain more parallel constructs? >> e.g. what if I have a task closure that I return from my function that >> generates more tasks itself? Would you just execute them sequentially >> outside of the parallel construct, or would you simply disallow that? >> Also, do you restrict future "objects" to only the parallel section? >> >> Another problem is that you can only wait on tasks of your direct >> children. So what if I get access to my parent's future object >> (assuming you allow tasks to generate tasks), and then want the result >> of my parent? >> Or what if I store these future objects in an array or list and access >> them arbitrarily? You will only know at runtime which task to wait on, >> and openmp only has a static, lexical taskwait. >> >> I suppose my point is that without either a drastic rewrite (e.g., use >> pthreads instead of openmp) or quite a bit of contraints, I am unsure >> how futures would work here. Perhaps you guys have some concrete >> syntax and semantics proposals? > > It feels to me that OpenMP tasks took a different model of parallelism > and tried to force them into the OpenMP model/constraints, and so it'd > be even more difficult to fit them into a nice pythonic interface. > Perhaps to make progress on this front we need to have a concrete > example to look at. I'm also wondering if the standard threading > module (perhaps with overlay support) used with nogil functions would > be sufficient--locking is required for handling the queues, etc. so > the fact that the GIL is involved is not a big deal. It is possible > that this won't scale to as small of work units, but the overhead > should be minimal once your work unit is a sufficient size (which is > probably quite small) and it's already implemented and well > documented/used. It's all definitely possible with normal threads, but the thing you lose is convenience and conciseness. For big problems the programmer might sum up the courage and effort to implement it, but typically you will just stick to a serial version. This is really where OpenMP is powerful, you can take a simple sequential piece of code and make it parallel with minimal effort and without having to restructure, rethink and rewrite your algorithms. Something like concurrent.futures is definitely nice, but most people cannot afford to mandate python 3.2 for their users. The most classical examples I can think of for tasks are 1) independent code sections, i.e. two or more pieces of code that don't depend on each other which you want to execute in parallel 2) traversal of some kind of custom data structure, like a tree or a linked list 3) some kind of other producer/consumer model e.g. using with task syntax: cdef postorder_traverse(tree *t): # bullet 1) and 2) with task: traverse(t.left) with task: traverse(t.right) taskwait() # wait until we traversed our subtrees use(t.data) cdef list_traverse(linkedlist *L): # bullet 2) with nogil, parallel(): if threadid() == 0: while L.next: with task: do_something(L.data) In the latter case we don't need a taskwait as we don't care about any particular order. Only one thread generates the tasks where the others just hit the barrier and see the tasks they can execute. The good thing is that the OpenMP runtime can decide at task generation point (not only at taskwait or barrier points!) decide to stop generating more tasks and start executing them. So you won't exhaust memory if you might have lots of tasks. > As for critical and barrier, the notion of a critical block as a with > statement is very useful. Creating/naming locks (rather than being > implicit on the file/line number) is more powerful, but is a larger > burden on the user and more difficult to support with the OpenMP > backend. Actually, as I mentioned before, critical sections do not at all depend on their line or file number. All they depend on their implicit or explicit name (the name is implicit when you simply omit it, so all unnamed critical sections exclude each other). Indeed, supporting creation of locks dynamically and allowing them to be passed around arbitrarily would be hard (and likely not worth the effort). Naming them is trivial though, which might not be incredibly pythonic but is very convenient, easy and readable. > barrier, if supported, should be a function call not a > context. Not as critical as with the tasks case, but a good example to > see how it flows would be useful here as well. I agree, it really doesn't have any associated code and trying to associate code with it is likely more confusing than meaningful. It was just an idea. Often you can rely on implicit barriers from e.g. prange, but not always. I can't think of any real-world example, but you usually need it to ensure that everyone gets a sane view on some shared data, e.g. with nogil, parallel(): array[threadid()] = func(threadid()) barrier() use array[threadid() + 1 % omp_num_threads()] # access data of some neighbour This is a rather contrived example, but (see below) it would be especially useful if you use single/master/once/first that sets some shared data everyone will operate on (for instance in a prange). To ensure the data is sane before you use it, you have to put the barrier to 1) ensure the data has been written and 2) that the data has been flushed. Basically, you'll always know when you need a barrier, but it's pretty hard to come up with a real-world example for it when you have to :) > As for single, I see doing this manually does require boilerplate > locking, so what about > > if cython.parallel.once(): ?# will return True once for a tread group. > ? ?... > > we could implement this via our own locking/checking/flushing to allow > it to occur in arbitrary expressions, e.g. > > special_worker = cython.parallel.once() > if special_worker: > ? ... > [common code] > if special_worker: ? # single wouldn't work here > ? ... > That looks OK. I've actually been thinking that if we have barriers we don't really need is_master(), once() or single() or anything. We already have threadid() and you usually don't care what thread gets there first, you only care about doing it once. So one could just write if parallel.threadid() == 0: ... parallel.barrier() # if required It might also be convenient to declare variables explicitly shared here, e.g. this code will not work: cdef int *buf with nogil, parallel.parallel(): if parallel.threadid() == 0: buf = ... parallel.barrier() # will will likely segfault, as buf is private because we assigned to it. It's only valid in thread 0 use buf[...] So basically you'd have to do something like (&buf)[0][...], which frankly looks pretty weird. However I do think such cases are rather uncommon. > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Fri Oct 14 22:18:14 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 14 Oct 2011 21:18:14 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> Message-ID: On 14 October 2011 21:07, mark florisson wrote: > On 14 October 2011 19:31, Robert Bradshaw wrote: >> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson >> wrote: >>>>> I ultimately feel things like that is more important than 100% coverage of >>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit. >>>> >>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a >>>> task/futures model based on closures would I think flesh this out to >>>> the next level of generality (and complexity). >>> >>> Futures are definitely nice. I suppose I think really like "inline >>> futures", i.e. openmp tasks. I realize that futures may look more >>> pythonic. However, as mentioned previously, I also see issues with >>> that. When you submit a task then you expect a future object, which >>> you might want to pass around. But we don't have the GIL for that. I >>> personally feel that futures is something that should be done by a >>> library (such as concurrent.futures in python 3.2), and inline tasks >>> by a language. It also means I have to write an entire function or >>> closure for perhaps only a few lines of code. >>> >>> I might also want to submit other functions that are not closures, or >>> I might want to reuse my closures that are used for tasks and for >>> something else. So what if my tasks contain more parallel constructs? >>> e.g. what if I have a task closure that I return from my function that >>> generates more tasks itself? Would you just execute them sequentially >>> outside of the parallel construct, or would you simply disallow that? >>> Also, do you restrict future "objects" to only the parallel section? >>> >>> Another problem is that you can only wait on tasks of your direct >>> children. So what if I get access to my parent's future object >>> (assuming you allow tasks to generate tasks), and then want the result >>> of my parent? >>> Or what if I store these future objects in an array or list and access >>> them arbitrarily? You will only know at runtime which task to wait on, >>> and openmp only has a static, lexical taskwait. >>> >>> I suppose my point is that without either a drastic rewrite (e.g., use >>> pthreads instead of openmp) or quite a bit of contraints, I am unsure >>> how futures would work here. Perhaps you guys have some concrete >>> syntax and semantics proposals? >> >> It feels to me that OpenMP tasks took a different model of parallelism >> and tried to force them into the OpenMP model/constraints, and so it'd >> be even more difficult to fit them into a nice pythonic interface. >> Perhaps to make progress on this front we need to have a concrete >> example to look at. I'm also wondering if the standard threading >> module (perhaps with overlay support) used with nogil functions would >> be sufficient--locking is required for handling the queues, etc. so >> the fact that the GIL is involved is not a big deal. It is possible >> that this won't scale to as small of work units, but the overhead >> should be minimal once your work unit is a sufficient size (which is >> probably quite small) and it's already implemented and well >> documented/used. > > It's all definitely possible with normal threads, but the thing you > lose is convenience and conciseness. For big problems the programmer > might sum up the courage and effort to implement it, but typically you > will just stick to a serial version. This is really where OpenMP is > powerful, you can take a simple sequential piece of code and make it > parallel with minimal effort and without having to restructure, > rethink and rewrite your algorithms. > > Something like concurrent.futures is definitely nice, but most people > cannot afford to mandate python 3.2 for their users. > > The most classical examples I can think of for tasks are > > 1) independent code sections, i.e. two or more pieces of code that > don't depend on each other which you want to execute in parallel > 2) traversal of some kind of custom data structure, like a tree or a linked list > 3) some kind of other producer/consumer model > > e.g. using with task syntax: > > cdef postorder_traverse(tree *t): # bullet 1) and 2) > ? ?with task: > ? ? ? ?traverse(t.left) > ? ?with task: > ? ? ? ?traverse(t.right) > > ? ?taskwait() # wait until we traversed our subtrees > ? ?use(t.data) > > cdef list_traverse(linkedlist *L): # bullet 2) > ? ?with nogil, parallel(): > ? ? ? ?if threadid() == 0: > ? ? ? ? ? ?while L.next: > ? ? ? ? ? ? ? ?with task: > ? ? ? ? ? ? ? ? ? ?do_something(L.data) > > In the latter case we don't need a taskwait as we don't care about any > particular order. Only one thread generates the tasks where the others > just hit the barrier and see the tasks they can execute. > > The good thing is that the OpenMP runtime can decide at task > generation point (not only at taskwait or barrier points!) decide to > stop generating more tasks and start executing them. So you won't > exhaust memory if you might have lots of tasks. > >> As for critical and barrier, the notion of a critical block as a with >> statement is very useful. Creating/naming locks (rather than being >> implicit on the file/line number) is more powerful, but is a larger >> burden on the user and more difficult to support with the OpenMP >> backend. > > Actually, as I mentioned before, critical sections do not at all > depend on their line or file number. All they depend on their implicit > or explicit name (the name is implicit when you simply omit it, so all > unnamed critical sections exclude each other). > Indeed, supporting creation of locks dynamically and allowing them to > be passed around arbitrarily would be hard (and likely not worth the > effort). Naming them is trivial though, which might not be incredibly > pythonic but is very convenient, easy and readable. > >> barrier, if supported, should be a function call not a >> context. Not as critical as with the tasks case, but a good example to >> see how it flows would be useful here as well. > > I agree, it really doesn't have any associated code and trying to > associate code with it is likely more confusing than meaningful. It > was just an idea. > Often you can rely on implicit barriers from e.g. prange, but not > always. I can't think of any real-world example, but you usually need > it to ensure that everyone gets a sane view on some shared data, e.g. > > with nogil, parallel(): > ? ?array[threadid()] = func(threadid()) > ? ?barrier() > ? ?use array[threadid() + 1 % omp_num_threads()] # access data of > some neighbour > > This is a rather contrived example, but (see below) it would be > especially useful if you use single/master/once/first that sets some > shared data everyone will operate on (for instance in a prange). To > ensure the data is sane before you use it, you have to put the barrier > to 1) ensure the data has been written and 2) that the data has been > flushed. > > Basically, you'll always know when you need a barrier, but it's pretty > hard to come up with a real-world example for it when you have to :) > >> As for single, I see doing this manually does require boilerplate >> locking, so what about >> >> if cython.parallel.once(): ?# will return True once for a tread group. >> ? ?... >> >> we could implement this via our own locking/checking/flushing to allow >> it to occur in arbitrary expressions, e.g. >> >> special_worker = cython.parallel.once() >> if special_worker: >> ? ... >> [common code] >> if special_worker: ? # single wouldn't work here >> ? ... >> > > That looks OK. I've actually been thinking that if we have barriers we > don't really need is_master(), once() or single() or anything. We > already have threadid() and you usually don't care what thread gets > there first, you only care about doing it once. So one could just > write > > if parallel.threadid() == 0: > ? ?... > > parallel.barrier() # if required > > It might also be convenient to declare variables explicitly shared > here, e.g. this code will not work: > > cdef int *buf > > with nogil, parallel.parallel(): > ? ?if parallel.threadid() == 0: > ? ? ? ?buf = ... > > ? ?parallel.barrier() > > ? ?# will will likely segfault, as buf is private because we assigned > to it. It's only valid in thread 0 > ? ?use buf[...] > > So basically you'd have to do something like (&buf)[0][...], which > frankly looks pretty weird. However I do think such cases are rather > uncommon. > >> - Robert >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > BTW, I think orphaned constructs might also really be worth our while. Suppose you have a piece of code: execute so many times: do some work we want to do in parallel compute something that needs to happen sequentially And also suppose that the two things we do in the loop might be factored out into separate functions. Now we could have a prange()/parallel() for the parallel work, but that means we have to start up a new parallel section every time. If we're unlucky, the user might also innocently release the GIL as well. There is a significant performance penalty to this, i.e. it would be vastly more efficient to do the following: with nogil, parallel(): do so many times: my_parallel_function() if threadid() == 0: compute something that needs to happen sequentially cdef void my_parallel_function(...): for i in prange(..., orphan=True): # workshare this loop with the other threads in the team ... This is not currently possible. Currently, every thread would call my_parallel_function, and every function call would do the same computations and not share any work. You can only currently avoid the overhead by writing all your code in the one function. Another possibility for a keyword argument is 'worksharing', but that would suggest normal prange()s don't share work. What do you guys think, is this too confusing for people? I think this is really reasonably common-ish situation. From stefan_ml at behnel.de Sat Oct 15 10:30:24 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 15 Oct 2011 10:30:24 +0200 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: <4E973A79.4030504@behnel.de> References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> <4E96CF1C.1000400@behnel.de> <4E973A79.4030504@behnel.de> Message-ID: <4E9944A0.8090409@behnel.de> Stefan Behnel, 13.10.2011 21:22: > Vitja Makarov, 13.10.2011 20:33: >> But py3k pyregr is no red due to SIGSEGV, is that python problem: >> >> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py3k/26/console > > Not sure, but rather likely. The PEP393 implementation is still being > worked on. That also makes it still a moving target to which the current > code in Cython may not fit exactly. Worth investigating. I'll see if I find > a bit of time for updating my local py3k installation this weekend to run > some tests with it. Given that the benchmark runs with the optimised CPython build still work, my guess is that this is a problem only with the debug builds of CPython, which we use in all test runs (and for good reason...). Stefan From vitja.makarov at gmail.com Sat Oct 15 11:07:35 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sat, 15 Oct 2011 13:07:35 +0400 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: <4E9944A0.8090409@behnel.de> References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> <4E96CF1C.1000400@behnel.de> <4E973A79.4030504@behnel.de> <4E9944A0.8090409@behnel.de> Message-ID: 2011/10/15 Stefan Behnel : > Stefan Behnel, 13.10.2011 21:22: >> >> Vitja Makarov, 13.10.2011 20:33: >>> >>> But py3k pyregr is no red due to SIGSEGV, is that python problem: >>> >>> >>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py3k/26/console >> >> Not sure, but rather likely. The PEP393 implementation is still being >> worked on. That also makes it still a moving target to which the current >> code in Cython may not fit exactly. Worth investigating. I'll see if I >> find >> a bit of time for updating my local py3k installation this weekend to run >> some tests with it. > > Given that the benchmark runs with the optimised CPython build still work, > my guess is that this is a problem only with the debug builds of CPython, > which we use in all test runs (and for good reason...). > It's something wrong with py2.7 pyregr build "'exec' currently requires a target mapping (globals/locals)" is still there after you merged my exec/eval branch. I can't reproduce it on localhost, actually it works just fine, I tried test_binop -- vitja. From vitja.makarov at gmail.com Sat Oct 15 11:17:12 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sat, 15 Oct 2011 13:17:12 +0400 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> <4E96CF1C.1000400@behnel.de> <4E973A79.4030504@behnel.de> <4E9944A0.8090409@behnel.de> Message-ID: 2011/10/15 Vitja Makarov : > 2011/10/15 Stefan Behnel : >> Stefan Behnel, 13.10.2011 21:22: >>> >>> Vitja Makarov, 13.10.2011 20:33: >>>> >>>> But py3k pyregr is no red due to SIGSEGV, is that python problem: >>>> >>>> >>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py3k/26/console >>> >>> Not sure, but rather likely. The PEP393 implementation is still being >>> worked on. That also makes it still a moving target to which the current >>> code in Cython may not fit exactly. Worth investigating. I'll see if I >>> find >>> a bit of time for updating my local py3k installation this weekend to run >>> some tests with it. >> >> Given that the benchmark runs with the optimised CPython build still work, >> my guess is that this is a problem only with the debug builds of CPython, >> which we use in all test runs (and for good reason...). >> > > It's something wrong with py2.7 pyregr build "'exec' currently > requires a target mapping (globals/locals)" > is still there after you merged my exec/eval branch. I can't reproduce > it on localhost, actually it works just fine, I tried test_binop > Right. That build was triggered by previous commits: https://sage.math.washington.edu:8091/hudson/job/cython-devel-sdist/changes?from=686&to=687 Pyregr shows regression from 11876 -> 11596 for last 3 builds https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py27/ -- vitja. From stefan_ml at behnel.de Sat Oct 15 11:17:07 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 15 Oct 2011 11:17:07 +0200 Subject: [Cython] test failure for cython-devel in Py2.4 In-Reply-To: References: <4E930C72.8080303@behnel.de> <4E9672AE.6080905@behnel.de> <4E96A8EE.4070701@behnel.de> <4E96CF1C.1000400@behnel.de> <4E973A79.4030504@behnel.de> <4E9944A0.8090409@behnel.de> Message-ID: <4E994F93.9040401@behnel.de> Vitja Makarov, 15.10.2011 11:07: > It's something wrong with py2.7 pyregr build "'exec' currently > requires a target mapping (globals/locals)" > is still there after you merged my exec/eval branch. I can't reproduce > it on localhost, actually it works just fine, I tried test_binop The tests simply haven't run yet. You can see that from the dependencies on the build page, e.g. https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py27/33/ Oh, and it would be good if you started a new ML thread to discuss a new topic. Stefan From vitja.makarov at gmail.com Sat Oct 15 11:26:26 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sat, 15 Oct 2011 13:26:26 +0400 Subject: [Cython] Pyregr regressions Message-ID: Hi! Recent commits to the master introduced pyregr regressions. You can see it here, just sort by age: https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py27/33/testReport/ Here is one example: ====================================================================== ERROR: runTest (__main__.CythonPyregrTestCase) compiling (c) and running test_pipes ---------------------------------------------------------------------- Traceback (most recent call last): File "runtests.py", line 679, in run self.runCompileTest() File "runtests.py", line 491, in runCompileTest self.test_directory, self.expect_errors, self.annotate) File "runtests.py", line 656, in compile self.assertEquals(None, unexpected_error) AssertionError: None != u"39:14: Object of type '' has no attribute 'open'" May be it's a good idea to check for pyregr regressions as well as for regular tests failures before merging into master? -- vitja. From stefan_ml at behnel.de Sat Oct 15 12:05:13 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 15 Oct 2011 12:05:13 +0200 Subject: [Cython] Pyregr regressions In-Reply-To: References: Message-ID: <4E995AD9.6040204@behnel.de> Vitja Makarov, 15.10.2011 11:26: > Recent commits to the master introduced pyregr regressions. You can > see it here, just sort by age: > > https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py27/33/testReport/ I fixed the ones I had introduced, thanks for noting. > Here is one example: > ====================================================================== > ERROR: runTest (__main__.CythonPyregrTestCase) > compiling (c) and running test_pipes > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "runtests.py", line 679, in run > self.runCompileTest() > File "runtests.py", line 491, in runCompileTest > self.test_directory, self.expect_errors, self.annotate) > File "runtests.py", line 656, in compile > self.assertEquals(None, unexpected_error) > AssertionError: None != u"39:14: Object of type '' has no > attribute 'open'" Not sure where that comes from, looks like a type inference bug. > May be it's a good idea to check for pyregr regressions as well as for > regular tests failures before merging into master? Well, sure. The problem is that it's much easier to see when a test turns from blue to yellow or red, than it is to see that a test turns from yellow to, well, yellow. I agree that it's generally worth looking through the results after a push and especially after a merge. Jenkins quite prominently complains about additional test failures in the build history: https://sage.math.washington.edu:8091/hudson/view/cython-devel/builds Throwing an eye on that page after the build/test jobs have run should help in spotting most regressions. Stefan From vitja.makarov at gmail.com Sat Oct 15 12:24:02 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sat, 15 Oct 2011 14:24:02 +0400 Subject: [Cython] Pyregr regressions In-Reply-To: <4E995AD9.6040204@behnel.de> References: <4E995AD9.6040204@behnel.de> Message-ID: 2011/10/15 Stefan Behnel : > Vitja Makarov, 15.10.2011 11:26: >> >> Recent commits to the master introduced pyregr regressions. You can >> see it here, just sort by age: >> >> >> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py27/33/testReport/ > > I fixed the ones I had introduced, thanks for noting. > > >> Here is one example: >> ====================================================================== >> ERROR: runTest (__main__.CythonPyregrTestCase) >> compiling (c) and running test_pipes >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> ? File "runtests.py", line 679, in run >> ? ? self.runCompileTest() >> ? File "runtests.py", line 491, in runCompileTest >> ? ? self.test_directory, self.expect_errors, self.annotate) >> ? File "runtests.py", line 656, in compile >> ? ? self.assertEquals(None, unexpected_error) >> AssertionError: None != u"39:14: Object of type '' has no >> attribute 'open'" > > Not sure where that comes from, looks like a type inference bug. > GIT bisect could help here: Error compiling Cython file: ------------------------------------------------------------ ... def testSimplePipe3(self): file(TESTFN, 'w').write('hello world #2') t = pipes.Template() t.append(s_command + ' < $IN', pipes.FILEIN_STDOUT) with t.open(TESTFN, 'r') as f: ^ ------------------------------------------------------------ /home/vitja/python/2.7/lib/python2.7/test/test_pipes.py:39:14: Object of type '' has no attribute 'open' 7445f6fcdf760215f0e472d79570a48e74382818 is the first bad commit commit 7445f6fcdf760215f0e472d79570a48e74382818 Author: Stefan Behnel Date: Fri Oct 14 21:25:31 2011 +0200 support for inlining the __enter__() method call in with statements :040000 040000 970f19cc0f9e377ccfcf6f8d154cdb21f4d86556 1e298400e802a9edaba5fee32020132e3d08056f M Cython :040000 040000 795b066ed29ee8cfd0680b7f504cf281ac8d8dbd a742911fe33b085fc484431714910e3fc263eece M tests bisect run success > >> May be it's a good idea to check for pyregr regressions as well as for >> regular tests failures before merging into master? > > Well, sure. The problem is that it's much easier to see when a test turns > from blue to yellow or red, than it is to see that a test turns from yellow > to, well, yellow. > > I agree that it's generally worth looking through the results after a push > and especially after a merge. Jenkins quite prominently complains about > additional test failures in the build history: > > https://sage.math.washington.edu:8091/hudson/view/cython-devel/builds > > Throwing an eye on that page after the build/test jobs have run should help > in spotting most regressions. > Pyregr tests are very helpful to find out that something is going wrong with your changes. -- vitja. From stefan_ml at behnel.de Sun Oct 16 20:46:03 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 16 Oct 2011 20:46:03 +0200 Subject: [Cython] compiler performance issue for extended utility code In-Reply-To: References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> <4E8EAD2E.8040701@behnel.de> <4E8FF5D6.4070104@behnel.de> Message-ID: <4E9B266B.7020008@behnel.de> mark florisson, 08.10.2011 15:18: > On 8 October 2011 13:10, Vitja Makarov wrote: >> I've also noticed that some utilities are loaded unconditionally >> perhaps it's better to introduce lazy loading. > > Well, they shouldn't be. If they are it's generally a bug. I noticed > that it happens in the test runner though, although it should create a > fresh context with freshly initialized entries. I recently ran only the couple of with-statement related tests through cProfile and it told me that it had spent something like 20 seconds in "builtin method sub()", i.e. doing completely useless string processing, followed by some 3 seconds or so for the rest of the compilation and test execution. That doesn't sound right. Stefan From markflorisson88 at gmail.com Sun Oct 16 20:51:26 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 16 Oct 2011 19:51:26 +0100 Subject: [Cython] compiler performance issue for extended utility code In-Reply-To: <4E9B266B.7020008@behnel.de> References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> <4E8EAD2E.8040701@behnel.de> <4E8FF5D6.4070104@behnel.de> <4E9B266B.7020008@behnel.de> Message-ID: Tempita uses re.sub to do the parsing. Most utilities are loaded at module-level, so perhaps we should use lazy loading like Vitja suggested. Are the cythonscope utilities loaded? On 16 October 2011 19:46, Stefan Behnel wrote: > mark florisson, 08.10.2011 15:18: >> >> On 8 October 2011 13:10, Vitja Makarov wrote: >>> >>> I've also noticed that some utilities are loaded unconditionally >>> perhaps it's better to introduce lazy loading. >> >> Well, they shouldn't be. If they are it's generally a bug. I noticed >> that it happens in the test runner though, although it should create a >> fresh context with freshly initialized entries. > > I recently ran only the couple of with-statement related tests through > cProfile and it told me that it had spent something like 20 seconds in > "builtin method sub()", i.e. doing completely useless string processing, > followed by some 3 seconds or so for the rest of the compilation and test > execution. That doesn't sound right. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Sun Oct 16 20:58:40 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 16 Oct 2011 19:58:40 +0100 Subject: [Cython] compiler performance issue for extended utility code In-Reply-To: References: <4E8C0448.6010204@behnel.de> <4E8D4EDB.2090009@behnel.de> <4E8EAD2E.8040701@behnel.de> <4E8FF5D6.4070104@behnel.de> <4E9B266B.7020008@behnel.de> Message-ID: On 16 October 2011 19:51, mark florisson wrote: > Tempita uses re.sub to do the parsing. Most utilities are loaded at > module-level, so perhaps we should use lazy loading like Vitja > suggested. Are the cythonscope utilities loaded? > > On 16 October 2011 19:46, Stefan Behnel wrote: >> mark florisson, 08.10.2011 15:18: >>> >>> On 8 October 2011 13:10, Vitja Makarov wrote: >>>> >>>> I've also noticed that some utilities are loaded unconditionally >>>> perhaps it's better to introduce lazy loading. >>> >>> Well, they shouldn't be. If they are it's generally a bug. I noticed >>> that it happens in the test runner though, although it should create a >>> fresh context with freshly initialized entries. >> >> I recently ran only the couple of with-statement related tests through >> cProfile and it told me that it had spent something like 20 seconds in >> "builtin method sub()", i.e. doing completely useless string processing, >> followed by some 3 seconds or so for the rest of the compilation and test >> execution. That doesn't sound right. >> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > Sorry for the previous accidental top-post. Most of these problems will go away if we get a libcython module and a cython.h header. In the meantime we could do the lazy stuff, it shouldn't be hard to implement. Maybe load it when any of the attributes get accessed and just wrap it. From stefan_ml at behnel.de Tue Oct 18 10:06:14 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 18 Oct 2011 10:06:14 +0200 Subject: [Cython] Cython-ctypes branch Message-ID: <4E9D3376.105@behnel.de> Hi Romain, I know your branch isn't "ready" in the sense that it's useful for the real world, but I'd like to find a way to get it merged, and to find a time frame for that. Otherwise, it will just bit-rot, which is certainly not what anyone wants. How would you judge your availability for this in the near future? I hope you're interested. :) For those who didn't follow the project, the branch lives here: https://github.com/hardshooter/CythonCTypesBackend The first thing that (IMHO, let's see if the others agree) needs to happen is that you should try to rebase it on the latest master branch. There were changes in the meantime that will not make this go clean. For example, the pipeline code was factored out of Main.py into a separate module Pipeline.py, so you will have to migrate your pipeline changes manually. That shouldn't be too hard, though, and it's the only major conflict that I currently anticipate. There's a test runner change in the master branch that will allow you to select the tested backends with a positive list, i.e. as in runtests.py --backends=c,cpp You'd want to add the ctypes backend here. The "--no-cpp" etc. set of switches become very unwieldy as new backends are added. You will also notice that Cython gained a couple of new features and syntax since you started, specifically fused types, an extended array syntax for memoryviews and parallel OpenMP loops. I'm not sure how (or even if) they will translate to the Python backend. I think all of them will need a dedicated implementation in some way, which is very unfortunate. But I don't think that has to bother us for the moment. I recreated the Jenkins build and test jobs for your branch: https://sage.math.washington.edu:8091/hudson/view/dev-romain/ There's currently a unit test failure in the build job that keeps me from trying the subsequent test runs. It looks trivial, though, so if you could push a fix, I can make sure the build and test jobs work as expected. That will give us an idea about the current status of your code. I also noticed that the ctypes_configure script is not Py3 clean, so we can't currently test your code on that platform. 2to3 may be able to do the job, but the package needs fixing upstream. Stefan From markflorisson88 at gmail.com Tue Oct 18 18:50:09 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 18 Oct 2011 17:50:09 +0100 Subject: [Cython] Cython-ctypes branch In-Reply-To: <4E9D3376.105@behnel.de> References: <4E9D3376.105@behnel.de> Message-ID: On 18 October 2011 09:06, Stefan Behnel wrote: > Hi Romain, > > I know your branch isn't "ready" in the sense that it's useful for the real > world, but I'd like to find a way to get it merged, and to find a time frame > for that. Otherwise, it will just bit-rot, which is certainly not what > anyone wants. I think you're more concerned about Cython playing a role in numpypy than in bit-rot :) I certainly agree though, it would be great to have some decent functionality in, if said functionality actually covers a large subset of the Cython language, otherwise users might be tempted to restrict themselves to certain functionality only. > How would you judge your availability for this in the near > future? I hope you're interested. :) > > For those who didn't follow the project, the branch lives here: > > https://github.com/hardshooter/CythonCTypesBackend > > The first thing that (IMHO, let's see if the others agree) needs to happen > is that you should try to rebase it on the latest master branch. There were > changes in the meantime that will not make this go clean. For example, the > pipeline code was factored out of Main.py into a separate module > Pipeline.py, so you will have to migrate your pipeline changes manually. > That shouldn't be too hard, though, and it's the only major conflict that I > currently anticipate. > > There's a test runner change in the master branch that will allow you to > select the tested backends with a positive list, i.e. as in > > ? ?runtests.py --backends=c,cpp > > You'd want to add the ctypes backend here. The "--no-cpp" etc. set of > switches become very unwieldy as new backends are added. > > You will also notice that Cython gained a couple of new features and syntax > since you started, specifically fused types, an extended array syntax for > memoryviews and parallel OpenMP loops. I'm not sure how (or even if) they > will translate to the Python backend. I think all of them will need a > dedicated implementation in some way, which is very unfortunate. But I don't > think that has to bother us for the moment. For OpenMP you might not actually need to do anything at all, it should already be supported in pure mode. Fused types and memoryviews are harder (as is the older buffer support). I'm not even sure if/how pypy's buffer support works. There is also support for pure-mode fused types, but to a very limited extend, i.e. you can do cython.fused_type(my-type-list) to create a fused type, but you don't actually generate any actual specializations unless you compile it with Cython. As for actual fused types support, I think you can replace-and-wrap fused functions at runtime with an instance of a generated FusedFunction class that is indexable and callable and does the necessary instance checks (in case of 'def' or 'cpdef'). You will also know at compile-time whether a certain cdef or cpdef call is valid, and you can basically do the same trick as we do in C: generate multiple specializations and choose which one to call. But seeing that compile time checks will ensure that you can only do an intersection of all possible operations, I think you might only need to do this for the case where you may either get a ctypes object or a normal python object (if you make weird combinations of types to fuse). In any case, I agree that leaving buffers and fused types for now in the ctypes support is probably the best idea. I'm not really familiar with RPython, but would it in any way be feasible to have Cython generate RPython code? That may just make things easier and more efficient to implement. > I recreated the Jenkins build and test jobs for your branch: > > https://sage.math.washington.edu:8091/hudson/view/dev-romain/ > > There's currently a unit test failure in the build job that keeps me from > trying the subsequent test runs. It looks trivial, though, so if you could > push a fix, I can make sure the build and test jobs work as expected. That > will give us an idea about the current status of your code. > > I also noticed that the ctypes_configure script is not Py3 clean, so we > can't currently test your code on that platform. 2to3 may be able to do the > job, but the package needs fixing upstream. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Tue Oct 18 18:56:23 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 18 Oct 2011 17:56:23 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: <4E955CDD.8060203@astro.uio.no> References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> <4E955CDD.8060203@astro.uio.no> Message-ID: On 12 October 2011 10:24, Dag Sverre Seljebotn wrote: > On 10/12/2011 11:08 AM, Robert Bradshaw wrote: >> >> On Wed, Oct 12, 2011 at 1:36 AM, Dag Sverre Seljebotn >>> >>> I wouldn't resist a builtin "channel" type in Cython (since we don't have >>> full templating/generics, it would be the only way of sending typed data >>> conveniently?). >> >> zeromq seems to be a nice level of abstraction--we could probably get >> far with a zeromq "overlay" module that didn't require the GIL. Or is >> the C API easy enough to use if we could provide convenient mechanisms >> to initialize the tasks/threads. I think perhaps the communication >> model could be solved by a library more easily than the treading >> model. > > Ah, zeromq even has an in-process transport, so should work nicely for > multithreading as well. > > The main problem is that I'd like something like > > ctypedef struct Msg: > ? ?int what > ? ?double when > > cdef Msg msg > cdef channel[Msg] mychan = channel[msg](blocking=True, in_process=True) > with cython.parallel: > ? ?... > ? ?if is_master(): > ? ? ? ?mychan.send(what=1, when=2.3) > ? ?else: > ? ? ? ?msg = mychan.recv() > > > Which one can't really do without either builtin support or templating > support. One *could* implement it in C++... > > C-level API just sends char* around, e.g., > > int zmq_msg_init_data (zmq_msg_t *msg, void *data, size_t size, zmq_free_fn > *ffn, void *hint); Actually I think fused types may be able to help here as well. E.g. you could specify 'send' and 'recv' as cdef methods that based on the type they get pack their data in a certain way (if you don't want to/cannot go for the char * + sizeof(MyType) way). This means you have to do have a branch in send and recv for every type you're going to use, though, but it might still be more convenient than writing different functions for every different type to pack your data. > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From stefan_ml at behnel.de Tue Oct 18 20:18:46 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 18 Oct 2011 20:18:46 +0200 Subject: [Cython] Cython-ctypes branch In-Reply-To: References: <4E9D3376.105@behnel.de> Message-ID: <4E9DC306.2030202@behnel.de> mark florisson, 18.10.2011 18:50: > On 18 October 2011 09:06, Stefan Behnel wrote: >> I know your branch isn't "ready" in the sense that it's useful for the real >> world, but I'd like to find a way to get it merged, and to find a time frame >> for that. Otherwise, it will just bit-rot, which is certainly not what >> anyone wants. > > I think you're more concerned about Cython playing a role in numpypy > than in bit-rot :) Both, in a way. But I think I'm really more concerned about the code dying from branch divergence. It's a really nice feature that's worth keeping and growing. Obviously, it also means it'll be more work for us to add language features, as we have an additional backend to support. But it's also a good proof that we're prepared to deal with that, and may eventually lead to more backends being added. > I certainly agree though, it would be great to have > some decent functionality in, if said functionality actually covers a > large subset of the Cython language, otherwise users might be tempted > to restrict themselves to certain functionality only. Well, it's a pretty experimental feature. Users should expect it to be limited in functionality and have bugs. If they want to use it, they'll have to accept some drawbacks for the time being. I think that's fine. >> You will also notice that Cython gained a couple of new features and syntax >> since you started, specifically fused types, an extended array syntax for >> memoryviews and parallel OpenMP loops. I'm not sure how (or even if) they >> will translate to the Python backend. I think all of them will need a >> dedicated implementation in some way, which is very unfortunate. But I don't >> think that has to bother us for the moment. > > For OpenMP you might not actually need to do anything at all, it > should already be supported in pure mode. Fused types and memoryviews > are harder (as is the older buffer support). I'm not even sure if/how > pypy's buffer support works. I have no idea. It doesn't even have to have such a feature. It's not required for language compliance, for one. > There is also support for pure-mode fused types, but to a very limited > extend, i.e. you can do cython.fused_type(my-type-list) to create a > fused type, but you don't actually generate any actual specializations > unless you compile it with Cython. > > As for actual fused types support, I think you can replace-and-wrap > fused functions at runtime with an instance of a generated > FusedFunction class that is indexable and callable and does the > necessary instance checks (in case of 'def' or 'cpdef'). You will also > know at compile-time whether a certain cdef or cpdef call is valid, > and you can basically do the same trick as we do in C: generate > multiple specializations and choose which one to call. But seeing that > compile time checks will ensure that you can only do an intersection > of all possible operations, I think you might only need to do this for > the case where you may either get a ctypes object or a normal python > object (if you make weird combinations of types to fuse). Yes, I was expecting problems when using C types. However, just because you pass different ctypes wrapped C values into generic code doesn't mean you have to split the function. It depends on what the function actually does, i.e. if the different types lead to different *Python* code. However, I guess that will quickly be the case as soon as you use typed variables in side of the function that need a ctypes initialisation in the generated code. > In any case, I agree that leaving buffers and fused types for now in > the ctypes support is probably the best idea. Certainly helps in getting this out of the door. > I'm not really familiar with RPython, but would it in any way be > feasible to have Cython generate RPython code? That may just make > things easier and more efficient to implement. I never used RPython either, but my guess is that this would be quite involved. You'd loose the more or less 1:1 mapping from Cython code to Python code and would have to replace some constructs or code patterns by different Python code. Anyway, I think PyPy can optimise Python code just fine. Stefan From romain.py at gmail.com Tue Oct 18 20:43:29 2011 From: romain.py at gmail.com (Romain Guillebert) Date: Tue, 18 Oct 2011 20:43:29 +0200 Subject: [Cython] Cython-ctypes branch In-Reply-To: <4E9DC306.2030202@behnel.de> References: <4E9D3376.105@behnel.de> <4E9DC306.2030202@behnel.de> Message-ID: <20111018184329.GA16314@hardshooter> Hi I'll try to do that this week, I agree that it's better to get this branch merged. Rpython isn't suitable at all for this kind of use case because you have to recompile the entire PyPy executable each time you change a library (long compile time and big memory consumption), loading modules is not trivial, the entire program must be type-inferable (which probably isn't the case of most Cython programs), global variables are considered constants, and I think (don't quote me) that the JIT doesn't work on rpython code. I have no idea on the speedup/slowdown though. Cheers Romain From stefan_ml at behnel.de Tue Oct 18 21:10:18 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 18 Oct 2011 21:10:18 +0200 Subject: [Cython] Cython-ctypes branch In-Reply-To: <20111018184329.GA16314@hardshooter> References: <4E9D3376.105@behnel.de> <4E9DC306.2030202@behnel.de> <20111018184329.GA16314@hardshooter> Message-ID: <4E9DCF1A.1060303@behnel.de> Romain Guillebert, 18.10.2011 20:43: > I'll try to do that this week, I agree that it's better to get this > branch merged. Cool. > Rpython isn't suitable at all for this kind of use case because you have > to recompile the entire PyPy executable each time you change a library > (long compile time and big memory consumption), loading modules is not > trivial, the entire program must be type-inferable (which probably isn't > the case of most Cython programs), global variables are considered > constants, Yes, that's about the kind of hassle that I expected. I heard a couple of PyPy developers report that it's not really fun to write code in RPython. > and I think (don't quote me) that the JIT doesn't work on > rpython code. Don't quote me either, but I think RPython basically *is* the JIT. > I have no idea on the speedup/slowdown though. That's secondary at best. The most important thing is to get it working, so that users can start to test their code against it. If it works out, it'll be a huge feature to be able to actually write code that is fast in both CPython and PyPy *and* that connects to external C code. Making it fast in PyPy is then up to them. Stefan From markflorisson88 at gmail.com Tue Oct 18 23:34:54 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 18 Oct 2011 22:34:54 +0100 Subject: [Cython] SIMD Message-ID: I'm copy/pasting this message to the ML with regard to previous discussion on cython-users and auto-vectorization (apparently my forwarded mail got rejected). Perhaps an approach as listed below would be easier than to generate Fortran (and deal with the pain of linking with it, distutils compatibility, forcing the user to install a fortran compiler etc). ------------ Forwarded Message Below ------------ Hello, With regards to the discussion on the Cython mail listing regarding SSE and vectorizing I have a unfinished project which might be of interest. The project wraps the Orc compiler ( http://code.entropywave.com/projects/orc/ ) which is a simplified assembly language to create cross platform thight loop code utilizing SMID architectures. With some simple test code for sin function approximation i get a speedup of 10x the corresponding numpy functions (Single threaded). By utilizing openmp it is possible to extend this to multiple threads and gain further speedups. The code is currently just a proof of concept and feel free to adopt and extend this code if wanted. Best regards Runar Tenfjord From robertwb at math.washington.edu Wed Oct 19 06:26:33 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 18 Oct 2011 21:26:33 -0700 Subject: [Cython] Cython-ctypes branch In-Reply-To: <4E9DCF1A.1060303@behnel.de> References: <4E9D3376.105@behnel.de> <4E9DC306.2030202@behnel.de> <20111018184329.GA16314@hardshooter> <4E9DCF1A.1060303@behnel.de> Message-ID: On Tue, Oct 18, 2011 at 12:10 PM, Stefan Behnel wrote: > Romain Guillebert, 18.10.2011 20:43: >> >> I'll try to do that this week, I agree that it's better to get this >> branch merged. > > Cool. Thanks! >> Rpython isn't suitable at all for this kind of use case because you have >> to recompile the entire PyPy executable each time you change a library >> (long compile time and big memory consumption), loading modules is not >> trivial, the entire program must be type-inferable (which probably isn't >> the case of most Cython programs), global variables are considered >> constants, > > Yes, that's about the kind of hassle that I expected. I heard a couple of > PyPy developers report that it's not really fun to write code in RPython. > > >> and I think (don't quote me) that the JIT doesn't work on >> rpython code. > > Don't quote me either, but I think RPython basically *is* the JIT. > > >> I have no idea on the speedup/slowdown though. > > That's secondary at best. The most important thing is to get it working, so > that users can start to test their code against it. If it works out, it'll > be a huge feature to be able to actually write code that is fast in both > CPython and PyPy *and* that connects to external C code. Making it fast in > PyPy is then up to them. +1. The primary benefit I see is being able to write modules that interact with external C code that works in both CPython and PyPy which would be a boon to both our communities. The fact that we're both interested in speed is not as big of a deal on this front, and I have no concerns that it will work itself out. - Robert From robertwb at math.washington.edu Wed Oct 19 07:01:43 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 18 Oct 2011 22:01:43 -0700 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> Message-ID: On Fri, Oct 14, 2011 at 1:07 PM, mark florisson wrote: > On 14 October 2011 19:31, Robert Bradshaw wrote: >> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson >> wrote: >>>>> I ultimately feel things like that is more important than 100% coverage of >>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit. >>>> >>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a >>>> task/futures model based on closures would I think flesh this out to >>>> the next level of generality (and complexity). >>> >>> Futures are definitely nice. I suppose I think really like "inline >>> futures", i.e. openmp tasks. I realize that futures may look more >>> pythonic. However, as mentioned previously, I also see issues with >>> that. When you submit a task then you expect a future object, which >>> you might want to pass around. But we don't have the GIL for that. I >>> personally feel that futures is something that should be done by a >>> library (such as concurrent.futures in python 3.2), and inline tasks >>> by a language. It also means I have to write an entire function or >>> closure for perhaps only a few lines of code. >>> >>> I might also want to submit other functions that are not closures, or >>> I might want to reuse my closures that are used for tasks and for >>> something else. So what if my tasks contain more parallel constructs? >>> e.g. what if I have a task closure that I return from my function that >>> generates more tasks itself? Would you just execute them sequentially >>> outside of the parallel construct, or would you simply disallow that? >>> Also, do you restrict future "objects" to only the parallel section? >>> >>> Another problem is that you can only wait on tasks of your direct >>> children. So what if I get access to my parent's future object >>> (assuming you allow tasks to generate tasks), and then want the result >>> of my parent? >>> Or what if I store these future objects in an array or list and access >>> them arbitrarily? You will only know at runtime which task to wait on, >>> and openmp only has a static, lexical taskwait. >>> >>> I suppose my point is that without either a drastic rewrite (e.g., use >>> pthreads instead of openmp) or quite a bit of contraints, I am unsure >>> how futures would work here. Perhaps you guys have some concrete >>> syntax and semantics proposals? >> >> It feels to me that OpenMP tasks took a different model of parallelism >> and tried to force them into the OpenMP model/constraints, and so it'd >> be even more difficult to fit them into a nice pythonic interface. >> Perhaps to make progress on this front we need to have a concrete >> example to look at. I'm also wondering if the standard threading >> module (perhaps with overlay support) used with nogil functions would >> be sufficient--locking is required for handling the queues, etc. so >> the fact that the GIL is involved is not a big deal. It is possible >> that this won't scale to as small of work units, but the overhead >> should be minimal once your work unit is a sufficient size (which is >> probably quite small) and it's already implemented and well >> documented/used. > > It's all definitely possible with normal threads, but the thing you > lose is convenience and conciseness. For big problems the programmer > might sum up the courage and effort to implement it, but typically you > will just stick to a serial version. This is really where OpenMP is > powerful, you can take a simple sequential piece of code and make it > parallel with minimal effort and without having to restructure, > rethink and rewrite your algorithms. That is a very good point. > Something like concurrent.futures is definitely nice, but most people > cannot afford to mandate python 3.2 for their users. > > The most classical examples I can think of for tasks are > > 1) independent code sections, i.e. two or more pieces of code that > don't depend on each other which you want to execute in parallel > 2) traversal of some kind of custom data structure, like a tree or a linked list > 3) some kind of other producer/consumer model > > e.g. using with task syntax: > > cdef postorder_traverse(tree *t): # bullet 1) and 2) > ? ?with task: > ? ? ? ?traverse(t.left) > ? ?with task: > ? ? ? ?traverse(t.right) > > ? ?taskwait() # wait until we traversed our subtrees > ? ?use(t.data) Is there an implicit parallel block here? Perhaps in the caller? > cdef list_traverse(linkedlist *L): # bullet 2) > ? ?with nogil, parallel(): > ? ? ? ?if threadid() == 0: > ? ? ? ? ? ?while L.next: > ? ? ? ? ? ? ? ?with task: > ? ? ? ? ? ? ? ? ? ?do_something(L.data) > > In the latter case we don't need a taskwait as we don't care about any > particular order. Only one thread generates the tasks where the others > just hit the barrier and see the tasks they can execute. I guess it's the fact that Python doesn't have a nice syntax for anonymous functions or blocks does make this syntax more appealing than an explicit closure. Perhaps if we came up with a more pythonic/natural name which would make the intent clear. Makes me want to do something like pool = ThreadPool(10) for item in L: with pool: process(item) but then you get into issues of passing the pool around. OpenMP has the implicit pool of the nesting parallel block, so "with one thread" or "with cython.parallel.pool" or something like that might be more readable. > The good thing is that the OpenMP runtime can decide at task > generation point (not only at taskwait or barrier points!) decide to > stop generating more tasks and start executing them. So you won't > exhaust memory if you might have lots of tasks. Often threadpools have queues that block when their buffer gets full to achieve the same goal. >> As for critical and barrier, the notion of a critical block as a with >> statement is very useful. Creating/naming locks (rather than being >> implicit on the file/line number) is more powerful, but is a larger >> burden on the user and more difficult to support with the OpenMP >> backend. > > Actually, as I mentioned before, critical sections do not at all > depend on their line or file number. All they depend on their implicit > or explicit name (the name is implicit when you simply omit it, so all > unnamed critical sections exclude each other). Ah, yes. In this case "with cython.parallel.lock([optional name])" could be obvious enough. > Indeed, supporting creation of locks dynamically and allowing them to > be passed around arbitrarily would be hard (and likely not worth the > effort). Naming them is trivial though, which might not be incredibly > pythonic but is very convenient, easy and readable. You can view this as a lookup by name, not a lock creation. Not allowing them to be used outside of a with clause is a reasonable restriction, and does not preclude a (possibly very distant) extension to being able to pass them around. >> barrier, if supported, should be a function call not a >> context. Not as critical as with the tasks case, but a good example to >> see how it flows would be useful here as well. > > I agree, it really doesn't have any associated code and trying to > associate code with it is likely more confusing than meaningful. It > was just an idea. > Often you can rely on implicit barriers from e.g. prange, but not > always. I can't think of any real-world example, but you usually need > it to ensure that everyone gets a sane view on some shared data, e.g. > > with nogil, parallel(): > ? ?array[threadid()] = func(threadid()) > ? ?barrier() > ? ?use array[threadid() + 1 % omp_num_threads()] # access data of > some neighbour > > This is a rather contrived example, but (see below) it would be > especially useful if you use single/master/once/first that sets some > shared data everyone will operate on (for instance in a prange). To > ensure the data is sane before you use it, you have to put the barrier > to 1) ensure the data has been written and 2) that the data has been > flushed. > > Basically, you'll always know when you need a barrier, but it's pretty > hard to come up with a real-world example for it when you have to :) Yes, I think barriers are explanatory enough. >> As for single, I see doing this manually does require boilerplate >> locking, so what about >> >> if cython.parallel.once(): ?# will return True once for a tread group. >> ? ?... >> >> we could implement this via our own locking/checking/flushing to allow >> it to occur in arbitrary expressions, e.g. >> >> special_worker = cython.parallel.once() >> if special_worker: >> ? ... >> [common code] >> if special_worker: ? # single wouldn't work here >> ? ... >> > > That looks OK. I've actually been thinking that if we have barriers we > don't really need is_master(), once() or single() or anything. We > already have threadid() and you usually don't care what thread gets > there first, you only care about doing it once. So one could just > write > > if parallel.threadid() == 0: > ? ?... > > parallel.barrier() # if required Perhaps you want the first free thread to take it up to minimize idle threads. I agree if parallel.threadid() == 0 is a synonym for is_master(), so probably not needed. However, what are the OpenMP semantics of cdef f(): with parallel(): g() g() cdef g(): with single(): ... # executed once, right? with task: ... # executed twice, right? > It might also be convenient to declare variables explicitly shared > here, e.g. this code will not work: > > cdef int *buf > > with nogil, parallel.parallel(): > ? ?if parallel.threadid() == 0: > ? ? ? ?buf = ... > > ? ?parallel.barrier() > > ? ?# will will likely segfault, as buf is private because we assigned > to it. It's only valid in thread 0 > ? ?use buf[...] > > So basically you'd have to do something like (&buf)[0][...], which > frankly looks pretty weird. However I do think such cases are rather > uncommon. True. Perhaps this could be declared via "with nogil, parallel.parallel(), parallel.shared(buf)" or something like that. - Robert From adriangeologo at yahoo.es Wed Oct 19 19:16:02 2011 From: adriangeologo at yahoo.es (=?ISO-8859-1?Q?Adrian_Mart=EDnez_Vargas?=) Date: Wed, 19 Oct 2011 10:16:02 -0700 Subject: [Cython] ImportError: DLL load failed: The specified module could not be found. Message-ID: <4E9F05D2.9050602@yahoo.es> Hi cython list, I am having problems to distribute a python module for windows (written with cython). It compile ok with mingw (installed with but when I import the module in python I get this error ImportError: DLL load failed: The specified module could not be found. I guess that some where there is an error linking the DLL. Bout how to solve it to create a nice distribution (.exe) for my package? Thanks Adrian From markflorisson88 at gmail.com Wed Oct 19 20:19:15 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 19 Oct 2011 19:19:15 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> Message-ID: On 19 October 2011 06:01, Robert Bradshaw wrote: > On Fri, Oct 14, 2011 at 1:07 PM, mark florisson > wrote: >> On 14 October 2011 19:31, Robert Bradshaw wrote: >>> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson >>> wrote: >>>>>> I ultimately feel things like that is more important than 100% coverage of >>>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit. >>>>> >>>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a >>>>> task/futures model based on closures would I think flesh this out to >>>>> the next level of generality (and complexity). >>>> >>>> Futures are definitely nice. I suppose I think really like "inline >>>> futures", i.e. openmp tasks. I realize that futures may look more >>>> pythonic. However, as mentioned previously, I also see issues with >>>> that. When you submit a task then you expect a future object, which >>>> you might want to pass around. But we don't have the GIL for that. I >>>> personally feel that futures is something that should be done by a >>>> library (such as concurrent.futures in python 3.2), and inline tasks >>>> by a language. It also means I have to write an entire function or >>>> closure for perhaps only a few lines of code. >>>> >>>> I might also want to submit other functions that are not closures, or >>>> I might want to reuse my closures that are used for tasks and for >>>> something else. So what if my tasks contain more parallel constructs? >>>> e.g. what if I have a task closure that I return from my function that >>>> generates more tasks itself? Would you just execute them sequentially >>>> outside of the parallel construct, or would you simply disallow that? >>>> Also, do you restrict future "objects" to only the parallel section? >>>> >>>> Another problem is that you can only wait on tasks of your direct >>>> children. So what if I get access to my parent's future object >>>> (assuming you allow tasks to generate tasks), and then want the result >>>> of my parent? >>>> Or what if I store these future objects in an array or list and access >>>> them arbitrarily? You will only know at runtime which task to wait on, >>>> and openmp only has a static, lexical taskwait. >>>> >>>> I suppose my point is that without either a drastic rewrite (e.g., use >>>> pthreads instead of openmp) or quite a bit of contraints, I am unsure >>>> how futures would work here. Perhaps you guys have some concrete >>>> syntax and semantics proposals? >>> >>> It feels to me that OpenMP tasks took a different model of parallelism >>> and tried to force them into the OpenMP model/constraints, and so it'd >>> be even more difficult to fit them into a nice pythonic interface. >>> Perhaps to make progress on this front we need to have a concrete >>> example to look at. I'm also wondering if the standard threading >>> module (perhaps with overlay support) used with nogil functions would >>> be sufficient--locking is required for handling the queues, etc. so >>> the fact that the GIL is involved is not a big deal. It is possible >>> that this won't scale to as small of work units, but the overhead >>> should be minimal once your work unit is a sufficient size (which is >>> probably quite small) and it's already implemented and well >>> documented/used. >> >> It's all definitely possible with normal threads, but the thing you >> lose is convenience and conciseness. For big problems the programmer >> might sum up the courage and effort to implement it, but typically you >> will just stick to a serial version. This is really where OpenMP is >> powerful, you can take a simple sequential piece of code and make it >> parallel with minimal effort and without having to restructure, >> rethink and rewrite your algorithms. > > That is a very good point. > >> Something like concurrent.futures is definitely nice, but most people >> cannot afford to mandate python 3.2 for their users. >> >> The most classical examples I can think of for tasks are >> >> 1) independent code sections, i.e. two or more pieces of code that >> don't depend on each other which you want to execute in parallel >> 2) traversal of some kind of custom data structure, like a tree or a linked list >> 3) some kind of other producer/consumer model >> >> e.g. using with task syntax: >> >> cdef postorder_traverse(tree *t): # bullet 1) and 2) >> ? ?with task: >> ? ? ? ?traverse(t.left) >> ? ?with task: >> ? ? ? ?traverse(t.right) >> >> ? ?taskwait() # wait until we traversed our subtrees >> ? ?use(t.data) > > Is there an implicit parallel block here? Perhaps in the caller? Yes, it was implicit in my example. If you'd use that code, you'd call it from a parallel section. Depending on what semantics you'd define (see below), you'd call it either from one thread in the team, or with all of them. >> cdef list_traverse(linkedlist *L): # bullet 2) >> ? ?with nogil, parallel(): >> ? ? ? ?if threadid() == 0: >> ? ? ? ? ? ?while L.next: >> ? ? ? ? ? ? ? ?with task: >> ? ? ? ? ? ? ? ? ? ?do_something(L.data) >> >> In the latter case we don't need a taskwait as we don't care about any >> particular order. Only one thread generates the tasks where the others >> just hit the barrier and see the tasks they can execute. > > I guess it's the fact that Python doesn't have a nice syntax for > anonymous functions or blocks does make this syntax more appealing > than an explicit closure. > > Perhaps if we came up with a more pythonic/natural name which would > make the intent clear. Makes me want to do something like > > pool = ThreadPool(10) > for item in L: > ? ?with pool: > ? ? ? ?process(item) > > but then you get into issues of passing the pool around. OpenMP has > the implicit pool of the nesting parallel block, so "with one thread" > or "with cython.parallel.pool" or something like that might be more > readable. I think with pool would be good, it must be clear that the task is submitted to a threadpool and hence may be executed asynchronously. >> The good thing is that the OpenMP runtime can decide at task >> generation point (not only at taskwait or barrier points!) decide to >> stop generating more tasks and start executing them. So you won't >> exhaust memory if you might have lots of tasks. > > Often threadpools have queues that block when their buffer gets full > to achieve the same goal. > >>> As for critical and barrier, the notion of a critical block as a with >>> statement is very useful. Creating/naming locks (rather than being >>> implicit on the file/line number) is more powerful, but is a larger >>> burden on the user and more difficult to support with the OpenMP >>> backend. >> >> Actually, as I mentioned before, critical sections do not at all >> depend on their line or file number. All they depend on their implicit >> or explicit name (the name is implicit when you simply omit it, so all >> unnamed critical sections exclude each other). > > Ah, yes. In this case "with cython.parallel.lock([optional name])" > could be obvious enough. > >> Indeed, supporting creation of locks dynamically and allowing them to >> be passed around arbitrarily would be hard (and likely not worth the >> effort). Naming them is trivial though, which might not be incredibly >> pythonic but is very convenient, easy and readable. > > You can view this as a lookup by name, not a lock creation. Not > allowing them to be used outside of a with clause is a reasonable > restriction, and does not preclude a (possibly very distant) extension > to being able to pass them around. > >>> barrier, if supported, should be a function call not a >>> context. Not as critical as with the tasks case, but a good example to >>> see how it flows would be useful here as well. >> >> I agree, it really doesn't have any associated code and trying to >> associate code with it is likely more confusing than meaningful. It >> was just an idea. >> Often you can rely on implicit barriers from e.g. prange, but not >> always. I can't think of any real-world example, but you usually need >> it to ensure that everyone gets a sane view on some shared data, e.g. >> >> with nogil, parallel(): >> ? ?array[threadid()] = func(threadid()) >> ? ?barrier() >> ? ?use array[threadid() + 1 % omp_num_threads()] # access data of >> some neighbour >> >> This is a rather contrived example, but (see below) it would be >> especially useful if you use single/master/once/first that sets some >> shared data everyone will operate on (for instance in a prange). To >> ensure the data is sane before you use it, you have to put the barrier >> to 1) ensure the data has been written and 2) that the data has been >> flushed. >> >> Basically, you'll always know when you need a barrier, but it's pretty >> hard to come up with a real-world example for it when you have to :) > > Yes, I think barriers are explanatory enough. > >>> As for single, I see doing this manually does require boilerplate >>> locking, so what about >>> >>> if cython.parallel.once(): ?# will return True once for a tread group. >>> ? ?... >>> >>> we could implement this via our own locking/checking/flushing to allow >>> it to occur in arbitrary expressions, e.g. >>> >>> special_worker = cython.parallel.once() >>> if special_worker: >>> ? ... >>> [common code] >>> if special_worker: ? # single wouldn't work here >>> ? ... >>> >> >> That looks OK. I've actually been thinking that if we have barriers we >> don't really need is_master(), once() or single() or anything. We >> already have threadid() and you usually don't care what thread gets >> there first, you only care about doing it once. So one could just >> write >> >> if parallel.threadid() == 0: >> ? ?... >> >> parallel.barrier() # if required > > Perhaps you want the first free thread to take it up to minimize idle > threads. I agree if parallel.threadid() == 0 is a synonym for > is_master(), so probably not needed. However, what are the OpenMP > semantics of > > cdef f(): > ? ?with parallel(): > ? ? ? ?g() > ? ? ? ?g() > > cdef g(): > ? ?with single(): > ? ? ? ?... # executed once, right? > ? ?with task: > ? ? ? ?... # executed twice, right? Hmm, not quite. The thing is that function g is called by every thread in the team, say N threads, and for each time the team encounters the single directive, it will execute it once, so in total it will execute the code in the single block twice, as the team encounters it twice. It will however create 2N tasks to execute, as every thread that encounters it creates a task. This is probably not what you want, so you usually want with parallel(): if threadid() == 0: g() and have the code in g (executed by one thread only) create the tasks. Note also how 'for _ in prange(1):' would not have the same semantics here, as it generates a 'parallel for' and not a worksharing for in the function (because we don't support orphaned pranges). I think this may all be confusing for users, I think usually you will want to create just a single task irrespective of whether you are in a parallel or a prange and not "however many threads are in the team for parallel and just one for prange because we're sharing work". This would also work for orphaned tasks, e.g. you expect 2 tasks in your snippet above, not 2N. Fortunately, that would be easy to support. We would however have to introduce the same restriction as with (implicit) barriers: either all or none of the threads must encounter the construct (or maybe loosen it to "if you actually want to create the task, make sure at least thread 0 encounters it", which may lead users to write more efficient code). >> It might also be convenient to declare variables explicitly shared >> here, e.g. this code will not work: >> >> cdef int *buf >> >> with nogil, parallel.parallel(): >> ? ?if parallel.threadid() == 0: >> ? ? ? ?buf = ... >> >> ? ?parallel.barrier() >> >> ? ?# will will likely segfault, as buf is private because we assigned >> to it. It's only valid in thread 0 >> ? ?use buf[...] >> >> So basically you'd have to do something like (&buf)[0][...], which >> frankly looks pretty weird. However I do think such cases are rather >> uncommon. > > True. Perhaps this could be declared via "with nogil, > parallel.parallel(), parallel.shared(buf)" or something like that. That looks elegant enough. > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Wed Oct 19 21:45:02 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 19 Oct 2011 20:45:02 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> Message-ID: On 19 October 2011 19:19, mark florisson wrote: > On 19 October 2011 06:01, Robert Bradshaw wrote: >> On Fri, Oct 14, 2011 at 1:07 PM, mark florisson >> wrote: >>> On 14 October 2011 19:31, Robert Bradshaw wrote: >>>> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson >>>> wrote: >>>>>>> I ultimately feel things like that is more important than 100% coverage of >>>>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit. >>>>>> >>>>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a >>>>>> task/futures model based on closures would I think flesh this out to >>>>>> the next level of generality (and complexity). >>>>> >>>>> Futures are definitely nice. I suppose I think really like "inline >>>>> futures", i.e. openmp tasks. I realize that futures may look more >>>>> pythonic. However, as mentioned previously, I also see issues with >>>>> that. When you submit a task then you expect a future object, which >>>>> you might want to pass around. But we don't have the GIL for that. I >>>>> personally feel that futures is something that should be done by a >>>>> library (such as concurrent.futures in python 3.2), and inline tasks >>>>> by a language. It also means I have to write an entire function or >>>>> closure for perhaps only a few lines of code. >>>>> >>>>> I might also want to submit other functions that are not closures, or >>>>> I might want to reuse my closures that are used for tasks and for >>>>> something else. So what if my tasks contain more parallel constructs? >>>>> e.g. what if I have a task closure that I return from my function that >>>>> generates more tasks itself? Would you just execute them sequentially >>>>> outside of the parallel construct, or would you simply disallow that? >>>>> Also, do you restrict future "objects" to only the parallel section? >>>>> >>>>> Another problem is that you can only wait on tasks of your direct >>>>> children. So what if I get access to my parent's future object >>>>> (assuming you allow tasks to generate tasks), and then want the result >>>>> of my parent? >>>>> Or what if I store these future objects in an array or list and access >>>>> them arbitrarily? You will only know at runtime which task to wait on, >>>>> and openmp only has a static, lexical taskwait. >>>>> >>>>> I suppose my point is that without either a drastic rewrite (e.g., use >>>>> pthreads instead of openmp) or quite a bit of contraints, I am unsure >>>>> how futures would work here. Perhaps you guys have some concrete >>>>> syntax and semantics proposals? >>>> >>>> It feels to me that OpenMP tasks took a different model of parallelism >>>> and tried to force them into the OpenMP model/constraints, and so it'd >>>> be even more difficult to fit them into a nice pythonic interface. >>>> Perhaps to make progress on this front we need to have a concrete >>>> example to look at. I'm also wondering if the standard threading >>>> module (perhaps with overlay support) used with nogil functions would >>>> be sufficient--locking is required for handling the queues, etc. so >>>> the fact that the GIL is involved is not a big deal. It is possible >>>> that this won't scale to as small of work units, but the overhead >>>> should be minimal once your work unit is a sufficient size (which is >>>> probably quite small) and it's already implemented and well >>>> documented/used. >>> >>> It's all definitely possible with normal threads, but the thing you >>> lose is convenience and conciseness. For big problems the programmer >>> might sum up the courage and effort to implement it, but typically you >>> will just stick to a serial version. This is really where OpenMP is >>> powerful, you can take a simple sequential piece of code and make it >>> parallel with minimal effort and without having to restructure, >>> rethink and rewrite your algorithms. >> >> That is a very good point. >> >>> Something like concurrent.futures is definitely nice, but most people >>> cannot afford to mandate python 3.2 for their users. >>> >>> The most classical examples I can think of for tasks are >>> >>> 1) independent code sections, i.e. two or more pieces of code that >>> don't depend on each other which you want to execute in parallel >>> 2) traversal of some kind of custom data structure, like a tree or a linked list >>> 3) some kind of other producer/consumer model >>> >>> e.g. using with task syntax: >>> >>> cdef postorder_traverse(tree *t): # bullet 1) and 2) >>> ? ?with task: >>> ? ? ? ?traverse(t.left) >>> ? ?with task: >>> ? ? ? ?traverse(t.right) >>> >>> ? ?taskwait() # wait until we traversed our subtrees >>> ? ?use(t.data) >> >> Is there an implicit parallel block here? Perhaps in the caller? > > Yes, it was implicit in my example. If you'd use that code, you'd call > it from a parallel section. Depending on what semantics you'd define > (see below), you'd call it either from one thread in the team, or with > all of them. > >>> cdef list_traverse(linkedlist *L): # bullet 2) >>> ? ?with nogil, parallel(): >>> ? ? ? ?if threadid() == 0: >>> ? ? ? ? ? ?while L.next: >>> ? ? ? ? ? ? ? ?with task: >>> ? ? ? ? ? ? ? ? ? ?do_something(L.data) >>> >>> In the latter case we don't need a taskwait as we don't care about any >>> particular order. Only one thread generates the tasks where the others >>> just hit the barrier and see the tasks they can execute. >> >> I guess it's the fact that Python doesn't have a nice syntax for >> anonymous functions or blocks does make this syntax more appealing >> than an explicit closure. >> >> Perhaps if we came up with a more pythonic/natural name which would >> make the intent clear. Makes me want to do something like >> >> pool = ThreadPool(10) >> for item in L: >> ? ?with pool: >> ? ? ? ?process(item) >> >> but then you get into issues of passing the pool around. OpenMP has >> the implicit pool of the nesting parallel block, so "with one thread" >> or "with cython.parallel.pool" or something like that might be more >> readable. > > I think with pool would be good, it must be clear that the task is > submitted to a threadpool and hence may be executed asynchronously. > >>> The good thing is that the OpenMP runtime can decide at task >>> generation point (not only at taskwait or barrier points!) decide to >>> stop generating more tasks and start executing them. So you won't >>> exhaust memory if you might have lots of tasks. >> >> Often threadpools have queues that block when their buffer gets full >> to achieve the same goal. >> >>>> As for critical and barrier, the notion of a critical block as a with >>>> statement is very useful. Creating/naming locks (rather than being >>>> implicit on the file/line number) is more powerful, but is a larger >>>> burden on the user and more difficult to support with the OpenMP >>>> backend. >>> >>> Actually, as I mentioned before, critical sections do not at all >>> depend on their line or file number. All they depend on their implicit >>> or explicit name (the name is implicit when you simply omit it, so all >>> unnamed critical sections exclude each other). >> >> Ah, yes. In this case "with cython.parallel.lock([optional name])" >> could be obvious enough. >> >>> Indeed, supporting creation of locks dynamically and allowing them to >>> be passed around arbitrarily would be hard (and likely not worth the >>> effort). Naming them is trivial though, which might not be incredibly >>> pythonic but is very convenient, easy and readable. >> >> You can view this as a lookup by name, not a lock creation. Not >> allowing them to be used outside of a with clause is a reasonable >> restriction, and does not preclude a (possibly very distant) extension >> to being able to pass them around. >> >>>> barrier, if supported, should be a function call not a >>>> context. Not as critical as with the tasks case, but a good example to >>>> see how it flows would be useful here as well. >>> >>> I agree, it really doesn't have any associated code and trying to >>> associate code with it is likely more confusing than meaningful. It >>> was just an idea. >>> Often you can rely on implicit barriers from e.g. prange, but not >>> always. I can't think of any real-world example, but you usually need >>> it to ensure that everyone gets a sane view on some shared data, e.g. >>> >>> with nogil, parallel(): >>> ? ?array[threadid()] = func(threadid()) >>> ? ?barrier() >>> ? ?use array[threadid() + 1 % omp_num_threads()] # access data of >>> some neighbour >>> >>> This is a rather contrived example, but (see below) it would be >>> especially useful if you use single/master/once/first that sets some >>> shared data everyone will operate on (for instance in a prange). To >>> ensure the data is sane before you use it, you have to put the barrier >>> to 1) ensure the data has been written and 2) that the data has been >>> flushed. >>> >>> Basically, you'll always know when you need a barrier, but it's pretty >>> hard to come up with a real-world example for it when you have to :) >> >> Yes, I think barriers are explanatory enough. >> >>>> As for single, I see doing this manually does require boilerplate >>>> locking, so what about >>>> >>>> if cython.parallel.once(): ?# will return True once for a tread group. >>>> ? ?... >>>> >>>> we could implement this via our own locking/checking/flushing to allow >>>> it to occur in arbitrary expressions, e.g. >>>> >>>> special_worker = cython.parallel.once() >>>> if special_worker: >>>> ? ... >>>> [common code] >>>> if special_worker: ? # single wouldn't work here >>>> ? ... >>>> >>> >>> That looks OK. I've actually been thinking that if we have barriers we >>> don't really need is_master(), once() or single() or anything. We >>> already have threadid() and you usually don't care what thread gets >>> there first, you only care about doing it once. So one could just >>> write >>> >>> if parallel.threadid() == 0: >>> ? ?... >>> >>> parallel.barrier() # if required >> >> Perhaps you want the first free thread to take it up to minimize idle >> threads. I agree if parallel.threadid() == 0 is a synonym for >> is_master(), so probably not needed. However, what are the OpenMP >> semantics of >> >> cdef f(): >> ? ?with parallel(): >> ? ? ? ?g() >> ? ? ? ?g() >> >> cdef g(): >> ? ?with single(): >> ? ? ? ?... # executed once, right? >> ? ?with task: >> ? ? ? ?... # executed twice, right? > > Hmm, not quite. The thing is that function g is called by every thread > in the team, say N threads, and for each time the team encounters the > single directive, it will execute it once, so in total it will execute > the code in the single block twice, as the team encounters it twice. > > It will however create 2N tasks to execute, as every thread that > encounters it creates a task. This is probably not what you want, so > you usually want > > with parallel(): > ? ?if threadid() == 0: > ? ? ? ?g() > > and have the code in g (executed by one thread only) create the tasks. > > Note also how 'for _ in prange(1):' would not have the same semantics > here, as it generates a 'parallel for' and not a worksharing for in > the function (because we don't support orphaned pranges). > > I think this may all be confusing for users, I think usually you will > want to create just a single task irrespective of whether you are in a > parallel or a prange and not "however many threads are in the team for > parallel and just one for prange because we're sharing work". This > would also work for orphaned tasks, e.g. you expect 2 tasks in your > snippet above, not 2N. Fortunately, that would be easy to support. > We would however have to introduce the same restriction as with > (implicit) barriers: either all or none of the threads must encounter > the construct (or maybe loosen it to "if you actually want to create > the task, make sure at least thread 0 encounters it", which may lead > users to write more efficient code). > >>> It might also be convenient to declare variables explicitly shared >>> here, e.g. this code will not work: >>> >>> cdef int *buf >>> >>> with nogil, parallel.parallel(): >>> ? ?if parallel.threadid() == 0: >>> ? ? ? ?buf = ... >>> >>> ? ?parallel.barrier() >>> >>> ? ?# will will likely segfault, as buf is private because we assigned >>> to it. It's only valid in thread 0 >>> ? ?use buf[...] >>> >>> So basically you'd have to do something like (&buf)[0][...], which >>> frankly looks pretty weird. However I do think such cases are rather >>> uncommon. >> >> True. Perhaps this could be declared via "with nogil, >> parallel.parallel(), parallel.shared(buf)" or something like that. > > That looks elegant enough. Likewise, I think something like parallel.private(buf) would also be really nice for arrays, especially if we also allow arrays with runtime sizes (behind the scenes we could malloc and free). I think those cases are much more common than parallel.shared(). >> - Robert >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > From aberghage at gmail.com Wed Oct 19 21:45:27 2011 From: aberghage at gmail.com (Alexander T. Berghage) Date: Wed, 19 Oct 2011 15:45:27 -0400 Subject: [Cython] ImportError: DLL load failed: The specified module could not be found. In-Reply-To: <4E9F05D2.9050602@yahoo.es> References: <4E9F05D2.9050602@yahoo.es> Message-ID: Adrian I'm a little unclear on the big picture here. Are you trying to distribute a module (a .pyd / .dll) that you or someone else can import from a .py script, or are you looking to compile a .exe that runs your cython code on execution? ---- Just interpreting the error you're describing (ImportError: DLL load failed: could not be found), the dynamic linker couldn't find a library it needed. Most likely this is either a symptom of missing dependencies or a path problem. Here's my suggestions for diagnosing and fixing the problem: Missing Dependencies: One very simple way to confirm that all the dependencies of your cython module are available is to point the dependency walker utility[1] at it, and look for missing DLLs. Directory Structure: Is the .pyd file you built from your cython module in the PYTHONPATH (or your current working directory? If it's not, there's your issue. [1] http://www.dependencywalker.com/ Hope that helps! Best, -Alex From markflorisson88 at gmail.com Wed Oct 19 21:53:43 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 19 Oct 2011 20:53:43 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> Message-ID: On 19 October 2011 06:01, Robert Bradshaw wrote: > On Fri, Oct 14, 2011 at 1:07 PM, mark florisson > wrote: >> On 14 October 2011 19:31, Robert Bradshaw wrote: >>> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson >>> wrote: >>>>>> I ultimately feel things like that is more important than 100% coverage of >>>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit. >>>>> >>>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a >>>>> task/futures model based on closures would I think flesh this out to >>>>> the next level of generality (and complexity). >>>> >>>> Futures are definitely nice. I suppose I think really like "inline >>>> futures", i.e. openmp tasks. I realize that futures may look more >>>> pythonic. However, as mentioned previously, I also see issues with >>>> that. When you submit a task then you expect a future object, which >>>> you might want to pass around. But we don't have the GIL for that. I >>>> personally feel that futures is something that should be done by a >>>> library (such as concurrent.futures in python 3.2), and inline tasks >>>> by a language. It also means I have to write an entire function or >>>> closure for perhaps only a few lines of code. >>>> >>>> I might also want to submit other functions that are not closures, or >>>> I might want to reuse my closures that are used for tasks and for >>>> something else. So what if my tasks contain more parallel constructs? >>>> e.g. what if I have a task closure that I return from my function that >>>> generates more tasks itself? Would you just execute them sequentially >>>> outside of the parallel construct, or would you simply disallow that? >>>> Also, do you restrict future "objects" to only the parallel section? >>>> >>>> Another problem is that you can only wait on tasks of your direct >>>> children. So what if I get access to my parent's future object >>>> (assuming you allow tasks to generate tasks), and then want the result >>>> of my parent? >>>> Or what if I store these future objects in an array or list and access >>>> them arbitrarily? You will only know at runtime which task to wait on, >>>> and openmp only has a static, lexical taskwait. >>>> >>>> I suppose my point is that without either a drastic rewrite (e.g., use >>>> pthreads instead of openmp) or quite a bit of contraints, I am unsure >>>> how futures would work here. Perhaps you guys have some concrete >>>> syntax and semantics proposals? >>> >>> It feels to me that OpenMP tasks took a different model of parallelism >>> and tried to force them into the OpenMP model/constraints, and so it'd >>> be even more difficult to fit them into a nice pythonic interface. >>> Perhaps to make progress on this front we need to have a concrete >>> example to look at. I'm also wondering if the standard threading >>> module (perhaps with overlay support) used with nogil functions would >>> be sufficient--locking is required for handling the queues, etc. so >>> the fact that the GIL is involved is not a big deal. It is possible >>> that this won't scale to as small of work units, but the overhead >>> should be minimal once your work unit is a sufficient size (which is >>> probably quite small) and it's already implemented and well >>> documented/used. >> >> It's all definitely possible with normal threads, but the thing you >> lose is convenience and conciseness. For big problems the programmer >> might sum up the courage and effort to implement it, but typically you >> will just stick to a serial version. This is really where OpenMP is >> powerful, you can take a simple sequential piece of code and make it >> parallel with minimal effort and without having to restructure, >> rethink and rewrite your algorithms. > > That is a very good point. > >> Something like concurrent.futures is definitely nice, but most people >> cannot afford to mandate python 3.2 for their users. >> >> The most classical examples I can think of for tasks are >> >> 1) independent code sections, i.e. two or more pieces of code that >> don't depend on each other which you want to execute in parallel >> 2) traversal of some kind of custom data structure, like a tree or a linked list >> 3) some kind of other producer/consumer model >> >> e.g. using with task syntax: >> >> cdef postorder_traverse(tree *t): # bullet 1) and 2) >> ? ?with task: >> ? ? ? ?traverse(t.left) >> ? ?with task: >> ? ? ? ?traverse(t.right) >> >> ? ?taskwait() # wait until we traversed our subtrees >> ? ?use(t.data) > > Is there an implicit parallel block here? Perhaps in the caller? > >> cdef list_traverse(linkedlist *L): # bullet 2) >> ? ?with nogil, parallel(): >> ? ? ? ?if threadid() == 0: >> ? ? ? ? ? ?while L.next: >> ? ? ? ? ? ? ? ?with task: >> ? ? ? ? ? ? ? ? ? ?do_something(L.data) >> >> In the latter case we don't need a taskwait as we don't care about any >> particular order. Only one thread generates the tasks where the others >> just hit the barrier and see the tasks they can execute. > > I guess it's the fact that Python doesn't have a nice syntax for > anonymous functions or blocks does make this syntax more appealing > than an explicit closure. > > Perhaps if we came up with a more pythonic/natural name which would > make the intent clear. Makes me want to do something like > > pool = ThreadPool(10) > for item in L: > ? ?with pool: > ? ? ? ?process(item) > > but then you get into issues of passing the pool around. OpenMP has > the implicit pool of the nesting parallel block, so "with one thread" > or "with cython.parallel.pool" or something like that might be more > readable. > >> The good thing is that the OpenMP runtime can decide at task >> generation point (not only at taskwait or barrier points!) decide to >> stop generating more tasks and start executing them. So you won't >> exhaust memory if you might have lots of tasks. > > Often threadpools have queues that block when their buffer gets full > to achieve the same goal. > >>> As for critical and barrier, the notion of a critical block as a with >>> statement is very useful. Creating/naming locks (rather than being >>> implicit on the file/line number) is more powerful, but is a larger >>> burden on the user and more difficult to support with the OpenMP >>> backend. >> >> Actually, as I mentioned before, critical sections do not at all >> depend on their line or file number. All they depend on their implicit >> or explicit name (the name is implicit when you simply omit it, so all >> unnamed critical sections exclude each other). > > Ah, yes. In this case "with cython.parallel.lock([optional name])" > could be obvious enough. We could also support atomic updates. We could either rewrite parallel.lock() blocks to atomics if all statements use inplace operators, but that might actually not be safe as the exclusion might be used for the rhs expressions. So I think you'd want a parallel.atomic() directive or some such. Alternatively, if you support parallel.shared(), you could specify that inplace operators on any such variables would actually be atomic updates, even if you use the operators on the elements of the shared variable. e.g. cdef int array1[N] cdef int array2[N] with parallel(), shared(array1): # atomic update array1[i] += ... # not an atomic update, as it is "implicitly shared" array2[i] += ... I'm not sure if that's more confusing than enlightening though. >> Indeed, supporting creation of locks dynamically and allowing them to >> be passed around arbitrarily would be hard (and likely not worth the >> effort). Naming them is trivial though, which might not be incredibly >> pythonic but is very convenient, easy and readable. > > You can view this as a lookup by name, not a lock creation. Not > allowing them to be used outside of a with clause is a reasonable > restriction, and does not preclude a (possibly very distant) extension > to being able to pass them around. > >>> barrier, if supported, should be a function call not a >>> context. Not as critical as with the tasks case, but a good example to >>> see how it flows would be useful here as well. >> >> I agree, it really doesn't have any associated code and trying to >> associate code with it is likely more confusing than meaningful. It >> was just an idea. >> Often you can rely on implicit barriers from e.g. prange, but not >> always. I can't think of any real-world example, but you usually need >> it to ensure that everyone gets a sane view on some shared data, e.g. >> >> with nogil, parallel(): >> ? ?array[threadid()] = func(threadid()) >> ? ?barrier() >> ? ?use array[threadid() + 1 % omp_num_threads()] # access data of >> some neighbour >> >> This is a rather contrived example, but (see below) it would be >> especially useful if you use single/master/once/first that sets some >> shared data everyone will operate on (for instance in a prange). To >> ensure the data is sane before you use it, you have to put the barrier >> to 1) ensure the data has been written and 2) that the data has been >> flushed. >> >> Basically, you'll always know when you need a barrier, but it's pretty >> hard to come up with a real-world example for it when you have to :) > > Yes, I think barriers are explanatory enough. > >>> As for single, I see doing this manually does require boilerplate >>> locking, so what about >>> >>> if cython.parallel.once(): ?# will return True once for a tread group. >>> ? ?... >>> >>> we could implement this via our own locking/checking/flushing to allow >>> it to occur in arbitrary expressions, e.g. >>> >>> special_worker = cython.parallel.once() >>> if special_worker: >>> ? ... >>> [common code] >>> if special_worker: ? # single wouldn't work here >>> ? ... >>> >> >> That looks OK. I've actually been thinking that if we have barriers we >> don't really need is_master(), once() or single() or anything. We >> already have threadid() and you usually don't care what thread gets >> there first, you only care about doing it once. So one could just >> write >> >> if parallel.threadid() == 0: >> ? ?... >> >> parallel.barrier() # if required > > Perhaps you want the first free thread to take it up to minimize idle > threads. I agree if parallel.threadid() == 0 is a synonym for > is_master(), so probably not needed. However, what are the OpenMP > semantics of > > cdef f(): > ? ?with parallel(): > ? ? ? ?g() > ? ? ? ?g() > > cdef g(): > ? ?with single(): > ? ? ? ?... # executed once, right? > ? ?with task: > ? ? ? ?... # executed twice, right? > >> It might also be convenient to declare variables explicitly shared >> here, e.g. this code will not work: >> >> cdef int *buf >> >> with nogil, parallel.parallel(): >> ? ?if parallel.threadid() == 0: >> ? ? ? ?buf = ... >> >> ? ?parallel.barrier() >> >> ? ?# will will likely segfault, as buf is private because we assigned >> to it. It's only valid in thread 0 >> ? ?use buf[...] >> >> So basically you'd have to do something like (&buf)[0][...], which >> frankly looks pretty weird. However I do think such cases are rather >> uncommon. > > True. Perhaps this could be declared via "with nogil, > parallel.parallel(), parallel.shared(buf)" or something like that. > > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From d.s.seljebotn at astro.uio.no Thu Oct 20 10:42:15 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 20 Oct 2011 10:42:15 +0200 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> Message-ID: <4E9FDEE7.2070301@astro.uio.no> Meta: I've been meaning to respond to this thread, but can't find the time. What's the time-frame for implementing this? If it's hypothetical at the moment and just is a question of getting things spec-ed, one could perhaps look at discussing it at the next Cython workshop, or perhaps a Skype call with the three of us as some point... Regarding the tasks: One of my biggest problems with Python is the lack of an elegant syntax for anonymous functions. But since Python has that problem, I feel it is not necesarrily something we should fix (by using the with statements to create tasks). Sometimes Pythonic-ness is more important than elegance (for Cython). In general I'm happy as long as there's a chance of getting things to work in pure Python mode as well (with serial execution). So if, e.g., with statements creating tasks have the same effect when running the same code (serially) in pure Python, I'm less opposed (didn't look at it in detail). Dag Sverre On 10/19/2011 09:53 PM, mark florisson wrote: > On 19 October 2011 06:01, Robert Bradshaw wrote: >> On Fri, Oct 14, 2011 at 1:07 PM, mark florisson >> wrote: >>> On 14 October 2011 19:31, Robert Bradshaw wrote: >>>> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson >>>> wrote: >>>>>>> I ultimately feel things like that is more important than 100% coverage of >>>>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit. >>>>>> >>>>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a >>>>>> task/futures model based on closures would I think flesh this out to >>>>>> the next level of generality (and complexity). >>>>> >>>>> Futures are definitely nice. I suppose I think really like "inline >>>>> futures", i.e. openmp tasks. I realize that futures may look more >>>>> pythonic. However, as mentioned previously, I also see issues with >>>>> that. When you submit a task then you expect a future object, which >>>>> you might want to pass around. But we don't have the GIL for that. I >>>>> personally feel that futures is something that should be done by a >>>>> library (such as concurrent.futures in python 3.2), and inline tasks >>>>> by a language. It also means I have to write an entire function or >>>>> closure for perhaps only a few lines of code. >>>>> >>>>> I might also want to submit other functions that are not closures, or >>>>> I might want to reuse my closures that are used for tasks and for >>>>> something else. So what if my tasks contain more parallel constructs? >>>>> e.g. what if I have a task closure that I return from my function that >>>>> generates more tasks itself? Would you just execute them sequentially >>>>> outside of the parallel construct, or would you simply disallow that? >>>>> Also, do you restrict future "objects" to only the parallel section? >>>>> >>>>> Another problem is that you can only wait on tasks of your direct >>>>> children. So what if I get access to my parent's future object >>>>> (assuming you allow tasks to generate tasks), and then want the result >>>>> of my parent? >>>>> Or what if I store these future objects in an array or list and access >>>>> them arbitrarily? You will only know at runtime which task to wait on, >>>>> and openmp only has a static, lexical taskwait. >>>>> >>>>> I suppose my point is that without either a drastic rewrite (e.g., use >>>>> pthreads instead of openmp) or quite a bit of contraints, I am unsure >>>>> how futures would work here. Perhaps you guys have some concrete >>>>> syntax and semantics proposals? >>>> >>>> It feels to me that OpenMP tasks took a different model of parallelism >>>> and tried to force them into the OpenMP model/constraints, and so it'd >>>> be even more difficult to fit them into a nice pythonic interface. >>>> Perhaps to make progress on this front we need to have a concrete >>>> example to look at. I'm also wondering if the standard threading >>>> module (perhaps with overlay support) used with nogil functions would >>>> be sufficient--locking is required for handling the queues, etc. so >>>> the fact that the GIL is involved is not a big deal. It is possible >>>> that this won't scale to as small of work units, but the overhead >>>> should be minimal once your work unit is a sufficient size (which is >>>> probably quite small) and it's already implemented and well >>>> documented/used. >>> >>> It's all definitely possible with normal threads, but the thing you >>> lose is convenience and conciseness. For big problems the programmer >>> might sum up the courage and effort to implement it, but typically you >>> will just stick to a serial version. This is really where OpenMP is >>> powerful, you can take a simple sequential piece of code and make it >>> parallel with minimal effort and without having to restructure, >>> rethink and rewrite your algorithms. >> >> That is a very good point. >> >>> Something like concurrent.futures is definitely nice, but most people >>> cannot afford to mandate python 3.2 for their users. >>> >>> The most classical examples I can think of for tasks are >>> >>> 1) independent code sections, i.e. two or more pieces of code that >>> don't depend on each other which you want to execute in parallel >>> 2) traversal of some kind of custom data structure, like a tree or a linked list >>> 3) some kind of other producer/consumer model >>> >>> e.g. using with task syntax: >>> >>> cdef postorder_traverse(tree *t): # bullet 1) and 2) >>> with task: >>> traverse(t.left) >>> with task: >>> traverse(t.right) >>> >>> taskwait() # wait until we traversed our subtrees >>> use(t.data) >> >> Is there an implicit parallel block here? Perhaps in the caller? >> >>> cdef list_traverse(linkedlist *L): # bullet 2) >>> with nogil, parallel(): >>> if threadid() == 0: >>> while L.next: >>> with task: >>> do_something(L.data) >>> >>> In the latter case we don't need a taskwait as we don't care about any >>> particular order. Only one thread generates the tasks where the others >>> just hit the barrier and see the tasks they can execute. >> >> I guess it's the fact that Python doesn't have a nice syntax for >> anonymous functions or blocks does make this syntax more appealing >> than an explicit closure. >> >> Perhaps if we came up with a more pythonic/natural name which would >> make the intent clear. Makes me want to do something like >> >> pool = ThreadPool(10) >> for item in L: >> with pool: >> process(item) >> >> but then you get into issues of passing the pool around. OpenMP has >> the implicit pool of the nesting parallel block, so "with one thread" >> or "with cython.parallel.pool" or something like that might be more >> readable. >> >>> The good thing is that the OpenMP runtime can decide at task >>> generation point (not only at taskwait or barrier points!) decide to >>> stop generating more tasks and start executing them. So you won't >>> exhaust memory if you might have lots of tasks. >> >> Often threadpools have queues that block when their buffer gets full >> to achieve the same goal. >> >>>> As for critical and barrier, the notion of a critical block as a with >>>> statement is very useful. Creating/naming locks (rather than being >>>> implicit on the file/line number) is more powerful, but is a larger >>>> burden on the user and more difficult to support with the OpenMP >>>> backend. >>> >>> Actually, as I mentioned before, critical sections do not at all >>> depend on their line or file number. All they depend on their implicit >>> or explicit name (the name is implicit when you simply omit it, so all >>> unnamed critical sections exclude each other). >> >> Ah, yes. In this case "with cython.parallel.lock([optional name])" >> could be obvious enough. > > We could also support atomic updates. We could either rewrite > parallel.lock() blocks to atomics if all statements use inplace > operators, but that might actually not be safe as the exclusion might > be used for the rhs expressions. So I think you'd want a > parallel.atomic() directive or some such. > Alternatively, if you support parallel.shared(), you could specify > that inplace operators on any such variables would actually be atomic > updates, even if you use the operators on the elements of the shared > variable. e.g. > > cdef int array1[N] > cdef int array2[N] > with parallel(), shared(array1): > # atomic update > array1[i] += ... > > # not an atomic update, as it is "implicitly shared" > array2[i] += ... > > I'm not sure if that's more confusing than enlightening though. > >>> Indeed, supporting creation of locks dynamically and allowing them to >>> be passed around arbitrarily would be hard (and likely not worth the >>> effort). Naming them is trivial though, which might not be incredibly >>> pythonic but is very convenient, easy and readable. >> >> You can view this as a lookup by name, not a lock creation. Not >> allowing them to be used outside of a with clause is a reasonable >> restriction, and does not preclude a (possibly very distant) extension >> to being able to pass them around. >> >>>> barrier, if supported, should be a function call not a >>>> context. Not as critical as with the tasks case, but a good example to >>>> see how it flows would be useful here as well. >>> >>> I agree, it really doesn't have any associated code and trying to >>> associate code with it is likely more confusing than meaningful. It >>> was just an idea. >>> Often you can rely on implicit barriers from e.g. prange, but not >>> always. I can't think of any real-world example, but you usually need >>> it to ensure that everyone gets a sane view on some shared data, e.g. >>> >>> with nogil, parallel(): >>> array[threadid()] = func(threadid()) >>> barrier() >>> use array[threadid() + 1 % omp_num_threads()] # access data of >>> some neighbour >>> >>> This is a rather contrived example, but (see below) it would be >>> especially useful if you use single/master/once/first that sets some >>> shared data everyone will operate on (for instance in a prange). To >>> ensure the data is sane before you use it, you have to put the barrier >>> to 1) ensure the data has been written and 2) that the data has been >>> flushed. >>> >>> Basically, you'll always know when you need a barrier, but it's pretty >>> hard to come up with a real-world example for it when you have to :) >> >> Yes, I think barriers are explanatory enough. >> >>>> As for single, I see doing this manually does require boilerplate >>>> locking, so what about >>>> >>>> if cython.parallel.once(): # will return True once for a tread group. >>>> ... >>>> >>>> we could implement this via our own locking/checking/flushing to allow >>>> it to occur in arbitrary expressions, e.g. >>>> >>>> special_worker = cython.parallel.once() >>>> if special_worker: >>>> ... >>>> [common code] >>>> if special_worker: # single wouldn't work here >>>> ... >>>> >>> >>> That looks OK. I've actually been thinking that if we have barriers we >>> don't really need is_master(), once() or single() or anything. We >>> already have threadid() and you usually don't care what thread gets >>> there first, you only care about doing it once. So one could just >>> write >>> >>> if parallel.threadid() == 0: >>> ... >>> >>> parallel.barrier() # if required >> >> Perhaps you want the first free thread to take it up to minimize idle >> threads. I agree if parallel.threadid() == 0 is a synonym for >> is_master(), so probably not needed. However, what are the OpenMP >> semantics of >> >> cdef f(): >> with parallel(): >> g() >> g() >> >> cdef g(): >> with single(): >> ... # executed once, right? >> with task: >> ... # executed twice, right? >> >>> It might also be convenient to declare variables explicitly shared >>> here, e.g. this code will not work: >>> >>> cdef int *buf >>> >>> with nogil, parallel.parallel(): >>> if parallel.threadid() == 0: >>> buf = ... >>> >>> parallel.barrier() >>> >>> # will will likely segfault, as buf is private because we assigned >>> to it. It's only valid in thread 0 >>> use buf[...] >>> >>> So basically you'd have to do something like (&buf)[0][...], which >>> frankly looks pretty weird. However I do think such cases are rather >>> uncommon. >> >> True. Perhaps this could be declared via "with nogil, >> parallel.parallel(), parallel.shared(buf)" or something like that. >> >> - Robert >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Thu Oct 20 11:13:49 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 20 Oct 2011 10:13:49 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: <4E9FDEE7.2070301@astro.uio.no> References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> <4E9FDEE7.2070301@astro.uio.no> Message-ID: On 20 October 2011 09:42, Dag Sverre Seljebotn wrote: > Meta: I've been meaning to respond to this thread, but can't find the time. > What's the time-frame for implementing this? If it's hypothetical at the > moment and just is a question of getting things spec-ed, one could perhaps > look at discussing it at the next Cython workshop, or perhaps a Skype call > with the three of us as some point... For me this is just about getting this spec-ed, so that when someone finds the time, we don't need to discuss it for weeks first. And the implementor won't necessarily have to support everything at once, e.g. just critical sections or barriers alone would be nice. Is there any plan for a new workshop then? Because if it's in two years I think we could be more time-efficient :) > Regarding the tasks: One of my biggest problems with Python is the lack of > an elegant syntax for anonymous functions. But since Python has that > problem, I feel it is not necesarrily something we should fix (by using the > with statements to create tasks). Sometimes Pythonic-ness is more important > than elegance (for Cython). I agree it's not something we should fix, I just think tasks are most useful in inline blocks and not in separate functions or closures. Although it could certainly work, I think it restricts more, leads to more verbose code and possibly questionable semantics, and on top of that it would be a pain to implement (although that should not be used as a persuasive argument). I'm not saying there is no elegant way other than with blocks, I'm just saying that I think closures are not the right thing for it. > In general I'm happy as long as there's a chance of getting things to work > in pure Python mode as well (with serial execution). So if, e.g., with > statements creating tasks have the same effect when running the same code > (serially) in pure Python, I'm less opposed (didn't look at it in detail). Yes, it would have the same effect. The thing with tasks (and OpenMP constructs in general) is that usually if your compiler ignores all your pragmas, your code just runs serially in the same way. The same would be true for the tasks in with blocks. > Dag Sverre > > On 10/19/2011 09:53 PM, mark florisson wrote: >> >> On 19 October 2011 06:01, Robert Bradshaw >> ?wrote: >>> >>> On Fri, Oct 14, 2011 at 1:07 PM, mark florisson >>> ?wrote: >>>> >>>> On 14 October 2011 19:31, Robert Bradshaw >>>> ?wrote: >>>>> >>>>> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson >>>>> ?wrote: >>>>>>>> >>>>>>>> I ultimately feel things like that is more important than 100% >>>>>>>> coverage of >>>>>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit. >>>>>>> >>>>>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a >>>>>>> task/futures model based on closures would I think flesh this out to >>>>>>> the next level of generality (and complexity). >>>>>> >>>>>> Futures are definitely nice. I suppose I think really like "inline >>>>>> futures", i.e. openmp tasks. I realize that futures may look more >>>>>> pythonic. However, as mentioned previously, I also see issues with >>>>>> that. When you submit a task then you expect a future object, which >>>>>> you might want to pass around. But we don't have the GIL for that. I >>>>>> personally feel that futures is something that should be done by a >>>>>> library (such as concurrent.futures in python 3.2), and inline tasks >>>>>> by a language. It also means I have to write an entire function or >>>>>> closure for perhaps only a few lines of code. >>>>>> >>>>>> I might also want to submit other functions that are not closures, or >>>>>> I might want to reuse my closures that are used for tasks and for >>>>>> something else. So what if my tasks contain more parallel constructs? >>>>>> e.g. what if I have a task closure that I return from my function that >>>>>> generates more tasks itself? Would you just execute them sequentially >>>>>> outside of the parallel construct, or would you simply disallow that? >>>>>> Also, do you restrict future "objects" to only the parallel section? >>>>>> >>>>>> Another problem is that you can only wait on tasks of your direct >>>>>> children. So what if I get access to my parent's future object >>>>>> (assuming you allow tasks to generate tasks), and then want the result >>>>>> of my parent? >>>>>> Or what if I store these future objects in an array or list and access >>>>>> them arbitrarily? You will only know at runtime which task to wait on, >>>>>> and openmp only has a static, lexical taskwait. >>>>>> >>>>>> I suppose my point is that without either a drastic rewrite (e.g., use >>>>>> pthreads instead of openmp) or quite a bit of contraints, I am unsure >>>>>> how futures would work here. Perhaps you guys have some concrete >>>>>> syntax and semantics proposals? >>>>> >>>>> It feels to me that OpenMP tasks took a different model of parallelism >>>>> and tried to force them into the OpenMP model/constraints, and so it'd >>>>> be even more difficult to fit them into a nice pythonic interface. >>>>> Perhaps to make progress on this front we need to have a concrete >>>>> example to look at. I'm also wondering if the standard threading >>>>> module (perhaps with overlay support) used with nogil functions would >>>>> be sufficient--locking is required for handling the queues, etc. so >>>>> the fact that the GIL is involved is not a big deal. It is possible >>>>> that this won't scale to as small of work units, but the overhead >>>>> should be minimal once your work unit is a sufficient size (which is >>>>> probably quite small) and it's already implemented and well >>>>> documented/used. >>>> >>>> It's all definitely possible with normal threads, but the thing you >>>> lose is convenience and conciseness. For big problems the programmer >>>> might sum up the courage and effort to implement it, but typically you >>>> will just stick to a serial version. This is really where OpenMP is >>>> powerful, you can take a simple sequential piece of code and make it >>>> parallel with minimal effort and without having to restructure, >>>> rethink and rewrite your algorithms. >>> >>> That is a very good point. >>> >>>> Something like concurrent.futures is definitely nice, but most people >>>> cannot afford to mandate python 3.2 for their users. >>>> >>>> The most classical examples I can think of for tasks are >>>> >>>> 1) independent code sections, i.e. two or more pieces of code that >>>> don't depend on each other which you want to execute in parallel >>>> 2) traversal of some kind of custom data structure, like a tree or a >>>> linked list >>>> 3) some kind of other producer/consumer model >>>> >>>> e.g. using with task syntax: >>>> >>>> cdef postorder_traverse(tree *t): # bullet 1) and 2) >>>> ? ?with task: >>>> ? ? ? ?traverse(t.left) >>>> ? ?with task: >>>> ? ? ? ?traverse(t.right) >>>> >>>> ? ?taskwait() # wait until we traversed our subtrees >>>> ? ?use(t.data) >>> >>> Is there an implicit parallel block here? Perhaps in the caller? >>> >>>> cdef list_traverse(linkedlist *L): # bullet 2) >>>> ? ?with nogil, parallel(): >>>> ? ? ? ?if threadid() == 0: >>>> ? ? ? ? ? ?while L.next: >>>> ? ? ? ? ? ? ? ?with task: >>>> ? ? ? ? ? ? ? ? ? ?do_something(L.data) >>>> >>>> In the latter case we don't need a taskwait as we don't care about any >>>> particular order. Only one thread generates the tasks where the others >>>> just hit the barrier and see the tasks they can execute. >>> >>> I guess it's the fact that Python doesn't have a nice syntax for >>> anonymous functions or blocks does make this syntax more appealing >>> than an explicit closure. >>> >>> Perhaps if we came up with a more pythonic/natural name which would >>> make the intent clear. Makes me want to do something like >>> >>> pool = ThreadPool(10) >>> for item in L: >>> ? ?with pool: >>> ? ? ? ?process(item) >>> >>> but then you get into issues of passing the pool around. OpenMP has >>> the implicit pool of the nesting parallel block, so "with one thread" >>> or "with cython.parallel.pool" or something like that might be more >>> readable. >>> >>>> The good thing is that the OpenMP runtime can decide at task >>>> generation point (not only at taskwait or barrier points!) decide to >>>> stop generating more tasks and start executing them. So you won't >>>> exhaust memory if you might have lots of tasks. >>> >>> Often threadpools have queues that block when their buffer gets full >>> to achieve the same goal. >>> >>>>> As for critical and barrier, the notion of a critical block as a with >>>>> statement is very useful. Creating/naming locks (rather than being >>>>> implicit on the file/line number) is more powerful, but is a larger >>>>> burden on the user and more difficult to support with the OpenMP >>>>> backend. >>>> >>>> Actually, as I mentioned before, critical sections do not at all >>>> depend on their line or file number. All they depend on their implicit >>>> or explicit name (the name is implicit when you simply omit it, so all >>>> unnamed critical sections exclude each other). >>> >>> Ah, yes. In this case "with cython.parallel.lock([optional name])" >>> could be obvious enough. >> >> We could also support atomic updates. We could either rewrite >> parallel.lock() blocks to atomics if all statements use inplace >> operators, but that might actually not be safe as the exclusion might >> be used for the rhs expressions. So I think you'd want a >> parallel.atomic() directive or some such. >> Alternatively, if you support parallel.shared(), you could specify >> that inplace operators on any such variables would actually be atomic >> updates, even if you use the operators on the elements of the shared >> variable. e.g. >> >> cdef int array1[N] >> cdef int array2[N] >> with parallel(), shared(array1): >> ? ? # atomic update >> ? ? array1[i] += ... >> >> ? ? # not an atomic update, as it is "implicitly shared" >> ? ? array2[i] += ... >> >> I'm not sure if that's more confusing than enlightening though. >> >>>> Indeed, supporting creation of locks dynamically and allowing them to >>>> be passed around arbitrarily would be hard (and likely not worth the >>>> effort). Naming them is trivial though, which might not be incredibly >>>> pythonic but is very convenient, easy and readable. >>> >>> You can view this as a lookup by name, not a lock creation. Not >>> allowing them to be used outside of a with clause is a reasonable >>> restriction, and does not preclude a (possibly very distant) extension >>> to being able to pass them around. >>> >>>>> barrier, if supported, should be a function call not a >>>>> context. Not as critical as with the tasks case, but a good example to >>>>> see how it flows would be useful here as well. >>>> >>>> I agree, it really doesn't have any associated code and trying to >>>> associate code with it is likely more confusing than meaningful. It >>>> was just an idea. >>>> Often you can rely on implicit barriers from e.g. prange, but not >>>> always. I can't think of any real-world example, but you usually need >>>> it to ensure that everyone gets a sane view on some shared data, e.g. >>>> >>>> with nogil, parallel(): >>>> ? ?array[threadid()] = func(threadid()) >>>> ? ?barrier() >>>> ? ?use array[threadid() + 1 % omp_num_threads()] # access data of >>>> some neighbour >>>> >>>> This is a rather contrived example, but (see below) it would be >>>> especially useful if you use single/master/once/first that sets some >>>> shared data everyone will operate on (for instance in a prange). To >>>> ensure the data is sane before you use it, you have to put the barrier >>>> to 1) ensure the data has been written and 2) that the data has been >>>> flushed. >>>> >>>> Basically, you'll always know when you need a barrier, but it's pretty >>>> hard to come up with a real-world example for it when you have to :) >>> >>> Yes, I think barriers are explanatory enough. >>> >>>>> As for single, I see doing this manually does require boilerplate >>>>> locking, so what about >>>>> >>>>> if cython.parallel.once(): ?# will return True once for a tread group. >>>>> ? ?... >>>>> >>>>> we could implement this via our own locking/checking/flushing to allow >>>>> it to occur in arbitrary expressions, e.g. >>>>> >>>>> special_worker = cython.parallel.once() >>>>> if special_worker: >>>>> ? ... >>>>> [common code] >>>>> if special_worker: ? # single wouldn't work here >>>>> ? ... >>>>> >>>> >>>> That looks OK. I've actually been thinking that if we have barriers we >>>> don't really need is_master(), once() or single() or anything. We >>>> already have threadid() and you usually don't care what thread gets >>>> there first, you only care about doing it once. So one could just >>>> write >>>> >>>> if parallel.threadid() == 0: >>>> ? ?... >>>> >>>> parallel.barrier() # if required >>> >>> Perhaps you want the first free thread to take it up to minimize idle >>> threads. I agree if parallel.threadid() == 0 is a synonym for >>> is_master(), so probably not needed. However, what are the OpenMP >>> semantics of >>> >>> cdef f(): >>> ? ?with parallel(): >>> ? ? ? ?g() >>> ? ? ? ?g() >>> >>> cdef g(): >>> ? ?with single(): >>> ? ? ? ?... # executed once, right? >>> ? ?with task: >>> ? ? ? ?... # executed twice, right? >>> >>>> It might also be convenient to declare variables explicitly shared >>>> here, e.g. this code will not work: >>>> >>>> cdef int *buf >>>> >>>> with nogil, parallel.parallel(): >>>> ? ?if parallel.threadid() == 0: >>>> ? ? ? ?buf = ... >>>> >>>> ? ?parallel.barrier() >>>> >>>> ? ?# will will likely segfault, as buf is private because we assigned >>>> to it. It's only valid in thread 0 >>>> ? ?use buf[...] >>>> >>>> So basically you'd have to do something like (&buf)[0][...], which >>>> frankly looks pretty weird. However I do think such cases are rather >>>> uncommon. >>> >>> True. Perhaps this could be declared via "with nogil, >>> parallel.parallel(), parallel.shared(buf)" or something like that. >>> >>> - Robert >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >>> >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From d.s.seljebotn at astro.uio.no Thu Oct 20 11:35:50 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 20 Oct 2011 11:35:50 +0200 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> <4E9FDEE7.2070301@astro.uio.no> Message-ID: <4E9FEB76.9040208@astro.uio.no> On 10/20/2011 11:13 AM, mark florisson wrote: > On 20 October 2011 09:42, Dag Sverre Seljebotn > wrote: >> Meta: I've been meaning to respond to this thread, but can't find the time. >> What's the time-frame for implementing this? If it's hypothetical at the >> moment and just is a question of getting things spec-ed, one could perhaps >> look at discussing it at the next Cython workshop, or perhaps a Skype call >> with the three of us as some point... > > For me this is just about getting this spec-ed, so that when someone > finds the time, we don't need to discuss it for weeks first. And the > implementor won't necessarily have to support everything at once, e.g. > just critical sections or barriers alone would be nice. > > Is there any plan for a new workshop then? Because if it's in two > years I think we could be more time-efficient :) At least in William's grant there's plans for 2-3 Cython workshops, so hopefully there's funding for one next year if we want to. We should ask him before planning anything though. >> Regarding the tasks: One of my biggest problems with Python is the lack of >> an elegant syntax for anonymous functions. But since Python has that >> problem, I feel it is not necesarrily something we should fix (by using the >> with statements to create tasks). Sometimes Pythonic-ness is more important >> than elegance (for Cython). > > I agree it's not something we should fix, I just think tasks are most > useful in inline blocks and not in separate functions or closures. > Although it could certainly work, I think it restricts more, leads to > more verbose code and possibly questionable semantics, and on top of > that it would be a pain to implement (although that should not be used > as a persuasive argument). I'm not saying there is no elegant way > other than with blocks, I'm just saying that I think closures are not > the right thing for it. > >> In general I'm happy as long as there's a chance of getting things to work >> in pure Python mode as well (with serial execution). So if, e.g., with >> statements creating tasks have the same effect when running the same code >> (serially) in pure Python, I'm less opposed (didn't look at it in detail). > > Yes, it would have the same effect. The thing with tasks (and OpenMP > constructs in general) is that usually if your compiler ignores all > your pragmas, your code just runs serially in the same way. The same > would be true for the tasks in with blocks. Short note: I like the vision of Konrad Hinsen: http://www.euroscipy.org/talk/2011 The core idea is that the "task-ness" of a block of code is orthogonal to the place you actually write it. That is, a block of code may often either be fit for execution as a task, or not, depending on how heavy it is (= values of arguments it takes in, not its contents). He introduces the "async" expression to drive this point through. I think "with task" is fine if used in this way, if you simply call a function (which itself doesn't know whether it is a task or not). But once you start to implement an entire function within the with-statement there's a code-smell. Anyway, it's growing on me. But I think his "async" expression is more Pythonic in the way that it forces you away from making your code smell. We could simply have async(func)(arg, arg2, somekwarg=4) (He also says "functional-style programming is better for parallization than threads+locks", which I can kind of agree with but nobody tried to make an efficient immutable array implementation suitable for numerical computation yet to my knowledge... that's an interesting MSc or PhD-topic, but I already have one :-) ) (Look at me, going along discussing when I really shouldn't -- see you later.) Dag Sverre From markflorisson88 at gmail.com Thu Oct 20 14:51:19 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 20 Oct 2011 13:51:19 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: <4E9FEB76.9040208@astro.uio.no> References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> <4E9FDEE7.2070301@astro.uio.no> <4E9FEB76.9040208@astro.uio.no> Message-ID: On 20 October 2011 10:35, Dag Sverre Seljebotn wrote: > On 10/20/2011 11:13 AM, mark florisson wrote: >> >> On 20 October 2011 09:42, Dag Sverre Seljebotn >> ?wrote: >>> >>> Meta: I've been meaning to respond to this thread, but can't find the >>> time. >>> What's the time-frame for implementing this? If it's hypothetical at the >>> moment and just is a question of getting things spec-ed, one could >>> perhaps >>> look at discussing it at the next Cython workshop, or perhaps a Skype >>> call >>> with the three of us as some point... >> >> For me this is just about getting this spec-ed, so that when someone >> finds the time, we don't need to discuss it for weeks first. And the >> implementor won't necessarily have to support everything at once, e.g. >> just critical sections or barriers alone would be nice. >> >> Is there any plan for a new workshop then? Because if it's in two >> years I think we could be more time-efficient :) > > At least in William's grant there's plans for 2-3 Cython workshops, so > hopefully there's funding for one next year if we want to. We should ask him > before planning anything though. > >>> Regarding the tasks: One of my biggest problems with Python is the lack >>> of >>> an elegant syntax for anonymous functions. But since Python has that >>> problem, I feel it is not necesarrily something we should fix (by using >>> the >>> with statements to create tasks). Sometimes Pythonic-ness is more >>> important >>> than elegance (for Cython). >> >> I agree it's not something we should fix, I just think tasks are most >> useful in inline blocks and not in separate functions or closures. >> Although it could certainly work, I think it restricts more, leads to >> more verbose code and possibly questionable semantics, and on top of >> that it would be a pain to implement (although that should not be used >> as a persuasive argument). I'm not saying there is no elegant way >> other than with blocks, I'm just saying that I think closures are not >> the right thing for it. >> >>> In general I'm happy as long as there's a chance of getting things to >>> work >>> in pure Python mode as well (with serial execution). So if, e.g., with >>> statements creating tasks have the same effect when running the same code >>> (serially) in pure Python, I'm less opposed (didn't look at it in >>> detail). >> >> Yes, it would have the same effect. The thing with tasks (and OpenMP >> constructs in general) is that usually if your compiler ignores all >> your pragmas, your code just runs serially in the same way. The same >> would be true for the tasks in with blocks. > > Short note: I like the vision of Konrad Hinsen: > > http://www.euroscipy.org/talk/2011 > > The core idea is that the "task-ness" of a block of code is orthogonal to > the place you actually write it. That is, a block of code may often either > be fit for execution as a task, or not, depending on how heavy it is (= > values of arguments it takes in, not its contents). > > He introduces the "async" expression to drive this point through. > > I think "with task" is fine if used in this way, if you simply call a > function (which itself doesn't know whether it is a task or not). But once > you start to implement an entire function within the with-statement there's > a code-smell. Definitely, do you'd just call the function from the task. > Anyway, it's growing on me. But I think his "async" expression is more > Pythonic in the way that it forces you away from making your code smell. > > We could simply have > > async(func)(arg, arg2, somekwarg=4) > That looks good. The question is, does this constitute an expression or a statement? If it's an expression, then you expect a meaningful return value, which means you're going to have to wait for the task to complete. That would be fine if you submit multiple tasks in one expression, from the slides: max(async expr1, async expr2) or even [async expr for ... in ...] I must say, it does look really elegant and it doesn't leave the user to question when the task is executed (and if you need a taskwait directive to wait for your variables to become defined). What I don't see is how to do the producer consumer trick, unless you regard using the result of async as a taskwait, and not using it as not having a taskwait, e.g. async func(...) # generate a task and don't wait for it result = async func(...) # generate a task and wait for it. The latter is not useful unless you have multiple expressions in one statement, so we should also allow result1, result2 = async func(data=a), async func(data=b). I think you would need special support for the expression form to work in multiple places, e.g. as a start you could allow it as function arguments, tuple expressions and possibly a nogil form of list comprehensions. The statement form is a lot more simple, and as a start synchronization must simply be done through barriers. If you want to change additional data through mechanisms other than immediate result collection you have to pass in pointers to the data. I like the syntax of async(func)(arg, arg2, somekwarg=4), as it would work in pure mode, and you can still have something that looks like a normal function call, but makes it clear it has to be a function call. Then at a later point you could decide to support 'with async():' :). > (He also says "functional-style programming is better for parallization than > threads+locks", which I can kind of agree with but nobody tried to make an > efficient immutable array implementation suitable for numerical computation > yet to my knowledge... that's an interesting MSc or PhD-topic, but I already > have one :-) ) > > (Look at me, going along discussing when I really shouldn't -- see you > later.) > > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From adriangeologo at yahoo.es Thu Oct 20 18:25:22 2011 From: adriangeologo at yahoo.es (=?ISO-8859-1?Q?Adrian_Mart=EDnez_Vargas?=) Date: Thu, 20 Oct 2011 09:25:22 -0700 Subject: [Cython] ImportError: DLL load failed: The specified module could not be found. In-Reply-To: References: <4E9F05D2.9050602@yahoo.es> Message-ID: <4EA04B72.2060907@yahoo.es> Dear Alexander and cython list, the same question, different approach a) I compiled and installed my python module with these two commands C:\Temporal\source\Cython_ext> python OK_setup_windows.py build_ext C:\Temporal\source\Cython_ext> python OK_setup_windows.py install b) Testing my module in IPython --------------------------------------------------------------------------------------------------------- In [5]: cd c:\ c:\ In [6]: import okriging_py as ok ------------------------------------------------------------ Traceback (most recent call last): File "", line 1, in ImportError: DLL load failed: The specified module could not be found. In [7]: cd C:\Python27\Lib\site-packages C:\Python27\Lib\site-packages In [8]: import okriging_py as ok In [9]: --------------------------------------------------------------------------------------------------------- as you can see the module works if we call it from the source directory. my question are: a) where is the problem b) how to distribute my module without this (possible system configuration) error my OS is windows 7 (probably with win xp compatibility) sorry about my ignorance (I am more Linux Debian user...) Regards Adrian On 19/10/2011 12:45 PM, Alexander T. Berghage wrote: > Adrian > > I'm a little unclear on the big picture here. Are you trying to > distribute a module (a .pyd / .dll) that you or someone else can > import from a .py script, or are you looking to compile a .exe that > runs your cython code on execution? > > ---- > > Just interpreting the error you're describing (ImportError: DLL load > failed: could not be found), the dynamic linker couldn't find a > library it needed. Most likely this is either a symptom of missing > dependencies or a path problem. Here's my suggestions for diagnosing > and fixing the problem: > > Missing Dependencies: > One very simple way to confirm that all the dependencies of your > cython module are > available is to point the dependency walker utility[1] at it, and > look for missing DLLs. > > Directory Structure: > Is the .pyd file you built from your cython module in the > PYTHONPATH (or your current > working directory? If it's not, there's your issue. > > [1] http://www.dependencywalker.com/ > > > Hope that helps! > > Best, > -Alex > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitja.makarov at gmail.com Fri Oct 21 11:44:35 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Fri, 21 Oct 2011 13:44:35 +0400 Subject: [Cython] What's wrong with py3k pyregr tests? Message-ID: I tried to run pyregr tests on my localhost and it doesn't sigsegv. Perhaps I should try compiled version of Cython. Btw, I've implemented noargs super and now I want to see how does it affect py3k-pyregr test results. -- vitja. From stefan_ml at behnel.de Fri Oct 21 12:01:45 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 21 Oct 2011 12:01:45 +0200 Subject: [Cython] What's wrong with py3k pyregr tests? In-Reply-To: References: Message-ID: <4EA14309.2070900@behnel.de> Vitja Makarov, 21.10.2011 11:44: > I tried to run pyregr tests on my localhost and it doesn't sigsegv. It's a crash bug in the debug builds of the latest py3k branch. > Perhaps I should try compiled version of Cython. Yes, but it's not required to reproduce the crash. > Btw, I've implemented noargs super and now I want to see how does it > affect py3k-pyregr test results. Cool. You can configure your branch jobs to use the optimised py3k builds (-opt) instead of the normal debug builds. Stefan From stefan_ml at behnel.de Fri Oct 21 12:53:57 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 21 Oct 2011 12:53:57 +0200 Subject: [Cython] What's wrong with py3k pyregr tests? In-Reply-To: <4EA14309.2070900@behnel.de> References: <4EA14309.2070900@behnel.de> Message-ID: <4EA14F45.4080707@behnel.de> Stefan Behnel, 21.10.2011 12:01: > Vitja Makarov, 21.10.2011 11:44: >> I tried to run pyregr tests on my localhost and it doesn't sigsegv. > > It's a crash bug in the debug builds of the latest py3k branch. > > >> Perhaps I should try compiled version of Cython. > > Yes, but it's not required to reproduce the crash. Hmm, I may have been mistaken. At least it seems to be a problem with getattr(), which breaks the lookup of builtin names. My guess is that unicode hashing is broken in some way for str subtypes (as we use for names). Stefan From d.s.seljebotn at astro.uio.no Fri Oct 21 19:43:35 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 21 Oct 2011 19:43:35 +0200 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> <4E9FDEE7.2070301@astro.uio.no> <4E9FEB76.9040208@astro.uio.no> Message-ID: <4EA1AF47.2090908@astro.uio.no> On 10/20/2011 02:51 PM, mark florisson wrote: > On 20 October 2011 10:35, Dag Sverre Seljebotn > wrote: >> On 10/20/2011 11:13 AM, mark florisson wrote: >>> >>> On 20 October 2011 09:42, Dag Sverre Seljebotn >>> wrote: >>>> >>>> Meta: I've been meaning to respond to this thread, but can't find the >>>> time. >>>> What's the time-frame for implementing this? If it's hypothetical at the >>>> moment and just is a question of getting things spec-ed, one could >>>> perhaps >>>> look at discussing it at the next Cython workshop, or perhaps a Skype >>>> call >>>> with the three of us as some point... >>> >>> For me this is just about getting this spec-ed, so that when someone >>> finds the time, we don't need to discuss it for weeks first. And the >>> implementor won't necessarily have to support everything at once, e.g. >>> just critical sections or barriers alone would be nice. >>> >>> Is there any plan for a new workshop then? Because if it's in two >>> years I think we could be more time-efficient :) >> >> At least in William's grant there's plans for 2-3 Cython workshops, so >> hopefully there's funding for one next year if we want to. We should ask him >> before planning anything though. >> >>>> Regarding the tasks: One of my biggest problems with Python is the lack >>>> of >>>> an elegant syntax for anonymous functions. But since Python has that >>>> problem, I feel it is not necesarrily something we should fix (by using >>>> the >>>> with statements to create tasks). Sometimes Pythonic-ness is more >>>> important >>>> than elegance (for Cython). >>> >>> I agree it's not something we should fix, I just think tasks are most >>> useful in inline blocks and not in separate functions or closures. >>> Although it could certainly work, I think it restricts more, leads to >>> more verbose code and possibly questionable semantics, and on top of >>> that it would be a pain to implement (although that should not be used >>> as a persuasive argument). I'm not saying there is no elegant way >>> other than with blocks, I'm just saying that I think closures are not >>> the right thing for it. >>> >>>> In general I'm happy as long as there's a chance of getting things to >>>> work >>>> in pure Python mode as well (with serial execution). So if, e.g., with >>>> statements creating tasks have the same effect when running the same code >>>> (serially) in pure Python, I'm less opposed (didn't look at it in >>>> detail). >>> >>> Yes, it would have the same effect. The thing with tasks (and OpenMP >>> constructs in general) is that usually if your compiler ignores all >>> your pragmas, your code just runs serially in the same way. The same >>> would be true for the tasks in with blocks. >> >> Short note: I like the vision of Konrad Hinsen: >> >> http://www.euroscipy.org/talk/2011 >> >> The core idea is that the "task-ness" of a block of code is orthogonal to >> the place you actually write it. That is, a block of code may often either >> be fit for execution as a task, or not, depending on how heavy it is (= >> values of arguments it takes in, not its contents). >> >> He introduces the "async" expression to drive this point through. >> >> I think "with task" is fine if used in this way, if you simply call a >> function (which itself doesn't know whether it is a task or not). But once >> you start to implement an entire function within the with-statement there's >> a code-smell. > > Definitely, do you'd just call the function from the task. > >> Anyway, it's growing on me. But I think his "async" expression is more >> Pythonic in the way that it forces you away from making your code smell. >> >> We could simply have >> >> async(func)(arg, arg2, somekwarg=4) >> > > That looks good. The question is, does this constitute an expression > or a statement? If it's an expression, then you expect a meaningful > return value, which means you're going to have to wait for the task to > complete. That would be fine if you submit multiple tasks in one > expression, from the slides: > > max(async expr1, async expr2) > > or even > > [async expr for ... in ...] > > I must say, it does look really elegant and it doesn't leave the user > to question when the task is executed (and if you need a taskwait > directive to wait for your variables to become defined). What I don't > see is how to do the producer consumer trick, unless you regard using > the result of async as a taskwait, and not using it as not having a > taskwait, e.g. > > async func(...) # generate a task and don't wait for it > result = async func(...) # generate a task and wait for it. > > The latter is not useful unless you have multiple expressions in one > statement, so we should also allow result1, result2 = async > func(data=a), async func(data=b). I think the idea is that you have a transparent, implicit future. You block when you use the result; you are allowed to pass the result back to the caller without blocking, and the caller does not need to know whether it is a future or not. Implemented in Python itself, the protocol would be something like INCREF/DECREF does not block, but all other operations do block. Of course, this is rather hard to implement in present-day Cython. Options: a) Have async(func)(x) return a future, must call result(). b) Make async part of the type spec, such as "cdef async int x". And coerce it to Python using a proxy. Seems messy, and going beyond what current Python semantics allow. But I do like it a bit better than explicit futures everywhere. Dag Sverre From markflorisson88 at gmail.com Fri Oct 21 21:31:58 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 21 Oct 2011 20:31:58 +0100 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: <4EA1AF47.2090908@astro.uio.no> References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> <4E9FDEE7.2070301@astro.uio.no> <4E9FEB76.9040208@astro.uio.no> <4EA1AF47.2090908@astro.uio.no> Message-ID: On 21 October 2011 18:43, Dag Sverre Seljebotn wrote: > On 10/20/2011 02:51 PM, mark florisson wrote: >> >> On 20 October 2011 10:35, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 10/20/2011 11:13 AM, mark florisson wrote: >>>> >>>> On 20 October 2011 09:42, Dag Sverre Seljebotn >>>> ? ?wrote: >>>>> >>>>> Meta: I've been meaning to respond to this thread, but can't find the >>>>> time. >>>>> What's the time-frame for implementing this? If it's hypothetical at >>>>> the >>>>> moment and just is a question of getting things spec-ed, one could >>>>> perhaps >>>>> look at discussing it at the next Cython workshop, or perhaps a Skype >>>>> call >>>>> with the three of us as some point... >>>> >>>> For me this is just about getting this spec-ed, so that when someone >>>> finds the time, we don't need to discuss it for weeks first. And the >>>> implementor won't necessarily have to support everything at once, e.g. >>>> just critical sections or barriers alone would be nice. >>>> >>>> Is there any plan for a new workshop then? Because if it's in two >>>> years I think we could be more time-efficient :) >>> >>> At least in William's grant there's plans for 2-3 Cython workshops, so >>> hopefully there's funding for one next year if we want to. We should ask >>> him >>> before planning anything though. >>> >>>>> Regarding the tasks: One of my biggest problems with Python is the lack >>>>> of >>>>> an elegant syntax for anonymous functions. But since Python has that >>>>> problem, I feel it is not necesarrily something we should fix (by using >>>>> the >>>>> with statements to create tasks). Sometimes Pythonic-ness is more >>>>> important >>>>> than elegance (for Cython). >>>> >>>> I agree it's not something we should fix, I just think tasks are most >>>> useful in inline blocks and not in separate functions or closures. >>>> Although it could certainly work, I think it restricts more, leads to >>>> more verbose code and possibly questionable semantics, and on top of >>>> that it would be a pain to implement (although that should not be used >>>> as a persuasive argument). I'm not saying there is no elegant way >>>> other than with blocks, I'm just saying that I think closures are not >>>> the right thing for it. >>>> >>>>> In general I'm happy as long as there's a chance of getting things to >>>>> work >>>>> in pure Python mode as well (with serial execution). So if, e.g., with >>>>> statements creating tasks have the same effect when running the same >>>>> code >>>>> (serially) in pure Python, I'm less opposed (didn't look at it in >>>>> detail). >>>> >>>> Yes, it would have the same effect. The thing with tasks (and OpenMP >>>> constructs in general) is that usually if your compiler ignores all >>>> your pragmas, your code just runs serially in the same way. The same >>>> would be true for the tasks in with blocks. >>> >>> Short note: I like the vision of Konrad Hinsen: >>> >>> http://www.euroscipy.org/talk/2011 >>> >>> The core idea is that the "task-ness" of a block of code is orthogonal to >>> the place you actually write it. That is, a block of code may often >>> either >>> be fit for execution as a task, or not, depending on how heavy it is (= >>> values of arguments it takes in, not its contents). >>> >>> He introduces the "async" expression to drive this point through. >>> >>> I think "with task" is fine if used in this way, if you simply call a >>> function (which itself doesn't know whether it is a task or not). But >>> once >>> you start to implement an entire function within the with-statement >>> there's >>> a code-smell. >> >> Definitely, do you'd just call the function from the task. >> >>> Anyway, it's growing on me. But I think his "async" expression is more >>> Pythonic in the way that it forces you away from making your code smell. >>> >>> We could simply have >>> >>> async(func)(arg, arg2, somekwarg=4) >>> >> >> That looks good. The question is, does this constitute an expression >> or a statement? If it's an expression, then you expect a meaningful >> return value, which means you're going to have to wait for the task to >> complete. That would be fine if you submit multiple tasks in one >> expression, from the slides: >> >> ? ? max(async expr1, async expr2) >> >> or even >> >> ? ? [async expr for ... in ...] >> >> I must say, it does look really elegant and it doesn't leave the user >> to question when the task is executed (and if you need a taskwait >> directive to wait for your variables to become defined). What I don't >> see is how to do the producer consumer trick, unless you regard using >> the result of async as a taskwait, and not using it as not having a >> taskwait, e.g. >> >> async func(...) # generate a task and don't wait for it >> result = async func(...) # generate a task and wait for it. >> >> The latter is not useful unless you have multiple expressions in one >> statement, so we should also allow result1, result2 = async >> func(data=a), async func(data=b). > > I think the idea is that you have a transparent, implicit future. You block > when you use the result; you are allowed to pass the result back to the > caller without blocking, and the caller does not need to know whether it is > a future or not. > > Implemented in Python itself, the protocol would be something like > INCREF/DECREF does not block, but all other operations do block. > > Of course, this is rather hard to implement in present-day Cython. Options: > > ?a) Have async(func)(x) return a future, must call result(). > > ?b) Make async part of the type spec, such as "cdef async int x". And coerce > it to Python using a proxy. Seems messy, and going beyond what current > Python semantics allow. But I do like it a bit better than explicit futures > everywhere. Interesting. However, what happens when I do cdef async int x x = async(func)(y) x = async(func)(z) print x ? You don't really know what x will be, as you don't know which task will complete first. This case could be solved by having multiple different future result storage locations, but what if I do this in a loop? You could just define that as a race condition though, but I would expect the value from the task last specified. What happens when you return an async value from the task? Do you get "cdef async async int x"? Or what if you pass in an async variable as async argument to a new task? Basically we have to restrict async value usage to "direct parents only". I think it also makes sense to restrict use to the parallel section/orphaned function only. What I don't really like about such a declaration is that it's really only async until you first use it, but you might not know until runtime. So for every use there will be (a slight) overhead. Also, I think it's common to just specify a bunch of tasks and then wait for all of them just once (often but not always at an implicit barrier). I'm afraid that if you want to implement this and use the result of just one task, you will have to wait on all of them, which is somewhat misleading. This is unfortunately all OpenMP provides, if we create another backend that will probably not be true. So there are two alternatives to this: 1) allow async without a result, pass in a pointer and once you want to know results are defined, you wait for all (children) tasks to finish 2) use mechanism 1) + allow multiple async expressions in a single statement, e.g. in tuples, list comprehensions and as function call parameters Although it would be incompatible with pure mode, I do find an async keyword more elegant than the function equivalent. > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From adriangeologo at yahoo.es Fri Oct 21 21:51:51 2011 From: adriangeologo at yahoo.es (=?ISO-8859-1?Q?Adrian_Mart=EDnez_Vargas?=) Date: Fri, 21 Oct 2011 12:51:51 -0700 Subject: [Cython] ImportError: DLL load failed: The specified module could not be found. In-Reply-To: References: <4E9F05D2.9050602@yahoo.es> Message-ID: <4EA1CD57.2090701@yahoo.es> I'm 90% sure that the problem is that the pyd file is not registered (it works if I put the module in my working directory). I'm trying to registered in windows 7 with regsvr32 but don't work. I need HELP guys! Regards Adrian On 19/10/2011 12:45 PM, Alexander T. Berghage wrote: > Adrian > > I'm a little unclear on the big picture here. Are you trying to > distribute a module (a .pyd / .dll) that you or someone else can > import from a .py script, or are you looking to compile a .exe that > runs your cython code on execution? > > ---- > > Just interpreting the error you're describing (ImportError: DLL load > failed: could not be found), the dynamic linker couldn't find a > library it needed. Most likely this is either a symptom of missing > dependencies or a path problem. Here's my suggestions for diagnosing > and fixing the problem: > > Missing Dependencies: > One very simple way to confirm that all the dependencies of your > cython module are > available is to point the dependency walker utility[1] at it, and > look for missing DLLs. > > Directory Structure: > Is the .pyd file you built from your cython module in the > PYTHONPATH (or your current > working directory? If it's not, there's your issue. > > [1] http://www.dependencywalker.com/ > > > Hope that helps! > > Best, > -Alex > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Fri Oct 21 22:03:07 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 21 Oct 2011 21:03:07 +0100 Subject: [Cython] ImportError: DLL load failed: The specified module could not be found. In-Reply-To: <4EA1CD57.2090701@yahoo.es> References: <4E9F05D2.9050602@yahoo.es> <4EA1CD57.2090701@yahoo.es> Message-ID: Sorry, most of us don't use Windows. In any case, this is something that belongs on the cython-users list, please continue the discussion there. On 21 October 2011 20:51, Adrian Mart?nez Vargas wrote: > I'm 90% sure that the problem is that the pyd file is not registered (it > works if I put the module in my working directory). I'm trying to registered > in windows 7 ?with regsvr32 but don't work. > > I need HELP guys! > > Regards > Adrian > > On 19/10/2011 12:45 PM, Alexander T. Berghage wrote: >> >> Adrian >> >> I'm a little unclear on the big picture here. Are you trying to >> distribute a module (a .pyd / .dll) that you or someone else can >> import from a .py script, or are you looking to compile a .exe that >> runs your cython code on execution? >> >> ---- >> >> Just interpreting the error you're describing (ImportError: DLL load >> failed: could not be found), ?the dynamic linker couldn't find a >> library it needed. Most likely this is either a symptom of missing >> dependencies or a path problem. Here's my suggestions for diagnosing >> and fixing the problem: >> >> Missing Dependencies: >> ? ? One very simple way to confirm that all the dependencies of your >> cython module are >> ? ? available is to point the dependency walker utility[1] at it, and >> look for missing DLLs. >> >> Directory Structure: >> ? ? Is the .pyd file you built from your cython module in the >> PYTHONPATH (or your current >> ? ? working directory? If it's not, there's your issue. >> >> [1] ?http://www.dependencywalker.com/ >> >> >> Hope that helps! >> >> Best, >> -Alex >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From vitja.makarov at gmail.com Fri Oct 21 22:38:07 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sat, 22 Oct 2011 00:38:07 +0400 Subject: [Cython] What's wrong with py3k pyregr tests? In-Reply-To: <4EA14F45.4080707@behnel.de> References: <4EA14309.2070900@behnel.de> <4EA14F45.4080707@behnel.de> Message-ID: 2011/10/21 Stefan Behnel : > Stefan Behnel, 21.10.2011 12:01: >> >> Vitja Makarov, 21.10.2011 11:44: >>> >>> I tried to run pyregr tests on my localhost and it doesn't sigsegv. >> >> It's a crash bug in the debug builds of the latest py3k branch. >> >> >>> Perhaps I should try compiled version of Cython. >> >> Yes, but it's not required to reproduce the crash. > > Hmm, I may have been mistaken. At least it seems to be a problem with > getattr(), which breaks the lookup of builtin names. My guess is that > unicode hashing is broken in some way for str subtypes (as we use for > names). > I switched to py3k-opt and it worked! Now we got ~13K/265: https://sage.math.washington.edu:8091/hudson/view/dev-vitek/job/cython-vitek-tests-pyregr-py3k-c/ -- vitja. From d.s.seljebotn at astro.uio.no Fri Oct 21 23:27:50 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 21 Oct 2011 23:27:50 +0200 Subject: [Cython] cython.parallel tasks, single, master, critical, barriers In-Reply-To: References: <4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no> <4E955180.1070601@astro.uio.no> <4E9FDEE7.2070301@astro.uio.no> <4E9FEB76.9040208@astro.uio.no> <4EA1AF47.2090908@astro.uio.no> Message-ID: <4EA1E3D6.9040103@astro.uio.no> On 10/21/2011 09:31 PM, mark florisson wrote: > On 21 October 2011 18:43, Dag Sverre Seljebotn > wrote: >> On 10/20/2011 02:51 PM, mark florisson wrote: >>> >>> On 20 October 2011 10:35, Dag Sverre Seljebotn >>> wrote: >>>> >>>> On 10/20/2011 11:13 AM, mark florisson wrote: >>>>> >>>>> On 20 October 2011 09:42, Dag Sverre Seljebotn >>>>> wrote: >>>>>> >>>>>> Meta: I've been meaning to respond to this thread, but can't find the >>>>>> time. >>>>>> What's the time-frame for implementing this? If it's hypothetical at >>>>>> the >>>>>> moment and just is a question of getting things spec-ed, one could >>>>>> perhaps >>>>>> look at discussing it at the next Cython workshop, or perhaps a Skype >>>>>> call >>>>>> with the three of us as some point... >>>>> >>>>> For me this is just about getting this spec-ed, so that when someone >>>>> finds the time, we don't need to discuss it for weeks first. And the >>>>> implementor won't necessarily have to support everything at once, e.g. >>>>> just critical sections or barriers alone would be nice. >>>>> >>>>> Is there any plan for a new workshop then? Because if it's in two >>>>> years I think we could be more time-efficient :) >>>> >>>> At least in William's grant there's plans for 2-3 Cython workshops, so >>>> hopefully there's funding for one next year if we want to. We should ask >>>> him >>>> before planning anything though. >>>> >>>>>> Regarding the tasks: One of my biggest problems with Python is the lack >>>>>> of >>>>>> an elegant syntax for anonymous functions. But since Python has that >>>>>> problem, I feel it is not necesarrily something we should fix (by using >>>>>> the >>>>>> with statements to create tasks). Sometimes Pythonic-ness is more >>>>>> important >>>>>> than elegance (for Cython). >>>>> >>>>> I agree it's not something we should fix, I just think tasks are most >>>>> useful in inline blocks and not in separate functions or closures. >>>>> Although it could certainly work, I think it restricts more, leads to >>>>> more verbose code and possibly questionable semantics, and on top of >>>>> that it would be a pain to implement (although that should not be used >>>>> as a persuasive argument). I'm not saying there is no elegant way >>>>> other than with blocks, I'm just saying that I think closures are not >>>>> the right thing for it. >>>>> >>>>>> In general I'm happy as long as there's a chance of getting things to >>>>>> work >>>>>> in pure Python mode as well (with serial execution). So if, e.g., with >>>>>> statements creating tasks have the same effect when running the same >>>>>> code >>>>>> (serially) in pure Python, I'm less opposed (didn't look at it in >>>>>> detail). >>>>> >>>>> Yes, it would have the same effect. The thing with tasks (and OpenMP >>>>> constructs in general) is that usually if your compiler ignores all >>>>> your pragmas, your code just runs serially in the same way. The same >>>>> would be true for the tasks in with blocks. >>>> >>>> Short note: I like the vision of Konrad Hinsen: >>>> >>>> http://www.euroscipy.org/talk/2011 >>>> >>>> The core idea is that the "task-ness" of a block of code is orthogonal to >>>> the place you actually write it. That is, a block of code may often >>>> either >>>> be fit for execution as a task, or not, depending on how heavy it is (= >>>> values of arguments it takes in, not its contents). >>>> >>>> He introduces the "async" expression to drive this point through. >>>> >>>> I think "with task" is fine if used in this way, if you simply call a >>>> function (which itself doesn't know whether it is a task or not). But >>>> once >>>> you start to implement an entire function within the with-statement >>>> there's >>>> a code-smell. >>> >>> Definitely, do you'd just call the function from the task. >>> >>>> Anyway, it's growing on me. But I think his "async" expression is more >>>> Pythonic in the way that it forces you away from making your code smell. >>>> >>>> We could simply have >>>> >>>> async(func)(arg, arg2, somekwarg=4) >>>> >>> >>> That looks good. The question is, does this constitute an expression >>> or a statement? If it's an expression, then you expect a meaningful >>> return value, which means you're going to have to wait for the task to >>> complete. That would be fine if you submit multiple tasks in one >>> expression, from the slides: >>> >>> max(async expr1, async expr2) >>> >>> or even >>> >>> [async expr for ... in ...] >>> >>> I must say, it does look really elegant and it doesn't leave the user >>> to question when the task is executed (and if you need a taskwait >>> directive to wait for your variables to become defined). What I don't >>> see is how to do the producer consumer trick, unless you regard using >>> the result of async as a taskwait, and not using it as not having a >>> taskwait, e.g. >>> >>> async func(...) # generate a task and don't wait for it >>> result = async func(...) # generate a task and wait for it. >>> >>> The latter is not useful unless you have multiple expressions in one >>> statement, so we should also allow result1, result2 = async >>> func(data=a), async func(data=b). >> >> I think the idea is that you have a transparent, implicit future. You block >> when you use the result; you are allowed to pass the result back to the >> caller without blocking, and the caller does not need to know whether it is >> a future or not. >> >> Implemented in Python itself, the protocol would be something like >> INCREF/DECREF does not block, but all other operations do block. >> >> Of course, this is rather hard to implement in present-day Cython. Options: >> >> a) Have async(func)(x) return a future, must call result(). >> >> b) Make async part of the type spec, such as "cdef async int x". And coerce >> it to Python using a proxy. Seems messy, and going beyond what current >> Python semantics allow. But I do like it a bit better than explicit futures >> everywhere. > > Interesting. However, what happens when I do > > cdef async int x > > x = async(func)(y) > x = async(func)(z) > > print x > > ? You don't really know what x will be, as you don't know which task > will complete first. This case could be solved by having multiple > different future result storage locations, but what if I do this in a > loop? > You could just define that as a race condition though, but I would > expect the value from the task last specified. The only intuitive thing to me is that the first x is discarded and you block for the second. Yes, that means heap-allocation and reference counting (the async function holds a reference, which would be the only reference in the case above, so that when the first call returns the target heap-allocated int gets deallocated). Really a better model for changing CPython than Cython.. > What happens when you return an async value from the task? Do you get > "cdef async async int x"? Or what if you pass in an async variable as > async argument to a new task? Basically we have to restrict async > value usage to "direct parents only". I think it also makes sense to > restrict use to the parallel section/orphaned function only. No, I imagined these to be heap-allocated things, so you just pass around these heap-allocated wrappers containing i) something you wait on (pthread semaphore?), ii) refcount, iii) value storage. (After all, the inspiration is Konrad's slides on "Python 4" (as he'd wish it)). Yes, there's some performance penalty for every read, but there's penalty with any task really. Also control flow analysis will likely take you as far as one wait per function. Though I'm still not convinced that channels in the way Go uses them aren't "better". Dag Sverre From stefan_ml at behnel.de Sat Oct 22 06:58:39 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 22 Oct 2011 06:58:39 +0200 Subject: [Cython] What's wrong with py3k pyregr tests? In-Reply-To: <4EA14F45.4080707@behnel.de> References: <4EA14309.2070900@behnel.de> <4EA14F45.4080707@behnel.de> Message-ID: <4EA24D7F.80400@behnel.de> Stefan Behnel, 21.10.2011 12:53: > Stefan Behnel, 21.10.2011 12:01: >> Vitja Makarov, 21.10.2011 11:44: >>> I tried to run pyregr tests on my localhost and it doesn't sigsegv. >> >> It's a crash bug in the debug builds of the latest py3k branch. >> >>> Perhaps I should try compiled version of Cython. >> >> Yes, but it's not required to reproduce the crash. > > Hmm, I may have been mistaken. At least it seems to be a problem with > getattr(), which breaks the lookup of builtin names. My guess is that > unicode hashing is broken in some way for str subtypes (as we use for names). I've sent a fix to python-dev, let's see when they get it in. http://thread.gmane.org/gmane.comp.python.devel/127321 Stefan From aberghage at gmail.com Sun Oct 23 03:11:04 2011 From: aberghage at gmail.com (Alex Berghage) Date: Sat, 22 Oct 2011 21:11:04 -0400 Subject: [Cython] ImportError: DLL load failed: The specified module could not be found. In-Reply-To: <4EA1CD57.2090701@yahoo.es> References: <4E9F05D2.9050602@yahoo.es> <4EA1CD57.2090701@yahoo.es> Message-ID: Adrian, If the import works when the module is in your working directory, the problem is probably your path. Have you tried adding the folder containing the module to your PYTHONPATH environment variable? Sent from my iPhone (please pardon brevity) On Oct 21, 2011, at 3:51 PM, Adrian Mart?nez Vargas wrote: > I'm 90% sure that the problem is that the pyd file is not registered (it works if I put the module in my working directory). I'm trying to registered in windows 7 with regsvr32 but don't work. > > I need HELP guys! > > Regards > Adrian > > On 19/10/2011 12:45 PM, Alexander T. Berghage wrote: >> Adrian >> >> I'm a little unclear on the big picture here. Are you trying to >> distribute a module (a .pyd / .dll) that you or someone else can >> import from a .py script, or are you looking to compile a .exe that >> runs your cython code on execution? >> >> ---- >> >> Just interpreting the error you're describing (ImportError: DLL load >> failed: could not be found), the dynamic linker couldn't find a >> library it needed. Most likely this is either a symptom of missing >> dependencies or a path problem. Here's my suggestions for diagnosing >> and fixing the problem: >> >> Missing Dependencies: >> One very simple way to confirm that all the dependencies of your >> cython module are >> available is to point the dependency walker utility[1] at it, and >> look for missing DLLs. >> >> Directory Structure: >> Is the .pyd file you built from your cython module in the >> PYTHONPATH (or your current >> working directory? If it's not, there's your issue. >> >> [1] http://www.dependencywalker.com/ >> >> >> Hope that helps! >> >> Best, >> -Alex >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From vitja.makarov at gmail.com Sun Oct 23 08:39:24 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sun, 23 Oct 2011 10:39:24 +0400 Subject: [Cython] Compiler crash at parsing stage Message-ID: Hi! This simple code crashes compiler: lambda i=1: i """ File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", line 122, in p_test return p_lambdef(s) File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", line 102, in p_lambdef s, terminator=':', annotated=False) File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", line 2741, in p_varargslist annotated = annotated) File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", line 2388, in p_c_arg_list annotated = annotated)) File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", line 2435, in p_c_arg_decl print s.level AttributeError: 'PyrexScanner' object has no attribute 'level' """ I'm not sure what's the best way to fix this. -- vitja. From stefan_ml at behnel.de Sun Oct 23 10:15:52 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 23 Oct 2011 10:15:52 +0200 Subject: [Cython] Compiler crash at parsing stage In-Reply-To: References: Message-ID: <4EA3CD38.90709@behnel.de> Vitja Makarov, 23.10.2011 08:39: > This simple code crashes compiler: > > lambda i=1: i > > """ > File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", > line 122, in p_test > return p_lambdef(s) > File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", > line 102, in p_lambdef > s, terminator=':', annotated=False) > File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", > line 2741, in p_varargslist > annotated = annotated) > File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", > line 2388, in p_c_arg_list > annotated = annotated)) > File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", > line 2435, in p_c_arg_decl > print s.level > AttributeError: 'PyrexScanner' object has no attribute 'level' > """ > > I'm not sure what's the best way to fix this. I don't see a "print" statement anywhere, but it seems that the "level" attribute is really missing from the compiled scanner. This should do the trick: diff -r 886697a10602 Cython/Compiler/Scanning.pxd --- a/Cython/Compiler/Scanning.pxd Sat Oct 22 19:43:45 2011 +0100 +++ b/Cython/Compiler/Scanning.pxd Sun Oct 23 10:11:10 2011 +0200 @@ -28,6 +28,7 @@ cdef public int bracket_nesting_level cdef public sy cdef public systring + cdef public level cdef long current_level(self) #cpdef commentline(self, text) I didn't commit it, just go ahead and do so if it works for you. Stefan From stefan_ml at behnel.de Sun Oct 23 10:19:00 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 23 Oct 2011 10:19:00 +0200 Subject: [Cython] What's wrong with py3k pyregr tests? In-Reply-To: References: <4EA14309.2070900@behnel.de> <4EA14F45.4080707@behnel.de> Message-ID: <4EA3CDF4.2000303@behnel.de> Vitja Makarov, 21.10.2011 22:38: > 2011/10/21 Stefan Behnel: >> Stefan Behnel, 21.10.2011 12:01: >>> >>> Vitja Makarov, 21.10.2011 11:44: >>>> >>>> I tried to run pyregr tests on my localhost and it doesn't sigsegv. >>> >>> It's a crash bug in the debug builds of the latest py3k branch. >>> >>> >>>> Perhaps I should try compiled version of Cython. >>> >>> Yes, but it's not required to reproduce the crash. >> >> Hmm, I may have been mistaken. At least it seems to be a problem with >> getattr(), which breaks the lookup of builtin names. My guess is that >> unicode hashing is broken in some way for str subtypes (as we use for >> names). > > I switched to py3k-opt and it worked! Now we got ~13K/265: > > https://sage.math.washington.edu:8091/hudson/view/dev-vitek/job/cython-vitek-tests-pyregr-py3k-c/ Very cool. Note that the bug in CPython has finally been fixed, so the debug builds should be back to normal again. I re-enabled the tests for the master branch. Stefan From vitja.makarov at gmail.com Sun Oct 23 11:05:09 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sun, 23 Oct 2011 13:05:09 +0400 Subject: [Cython] Compiler crash at parsing stage In-Reply-To: <4EA3CD38.90709@behnel.de> References: <4EA3CD38.90709@behnel.de> Message-ID: 2011/10/23 Stefan Behnel : > Vitja Makarov, 23.10.2011 08:39: >> >> This simple code crashes compiler: >> >> lambda i=1: i >> >> """ >> ? File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", >> line 122, in p_test >> ? ? return p_lambdef(s) >> ? File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", >> line 102, in p_lambdef >> ? ? s, terminator=':', annotated=False) >> ? File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", >> line 2741, in p_varargslist >> ? ? annotated = annotated) >> ? File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", >> line 2388, in p_c_arg_list >> ? ? annotated = annotated)) >> ? File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py", >> line 2435, in p_c_arg_decl >> ? ? print s.level >> AttributeError: 'PyrexScanner' object has no attribute 'level' >> """ >> >> I'm not sure what's the best way to fix this. > > I don't see a "print" statement anywhere, but it seems that the "level" > attribute is really missing from the compiled scanner. > Yes, I've added print for debug purpose actually there is: if 'pxd' in s.level: > This should do the trick: > > diff -r 886697a10602 Cython/Compiler/Scanning.pxd > --- a/Cython/Compiler/Scanning.pxd ? ? ?Sat Oct 22 19:43:45 2011 +0100 > +++ b/Cython/Compiler/Scanning.pxd ? ? ?Sun Oct 23 10:11:10 2011 +0200 > @@ -28,6 +28,7 @@ > ? ? cdef public int bracket_nesting_level > ? ? cdef public sy > ? ? cdef public systring > + ? ?cdef public level > > ? ? cdef long current_level(self) > ? ? #cpdef commentline(self, text) > > I didn't commit it, just go ahead and do so if it works for you. > Hmm, that will help for compiled cython. I'm running uncompiled. It seems that when lambda is spotted level is not set yet. Btw it works fine if def node precedes lamda: def foo(): pass lambda i=1: i -- vitja. From wesmckinn at gmail.com Mon Oct 24 21:26:03 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 24 Oct 2011 15:26:03 -0400 Subject: [Cython] Buffer interface to boolean arrays with cast=True on Python 2.5 failing Message-ID: I've been using ndarray[uint8_t, cast=True] bool_arr to work with dtype=bool arrays in Cython lately. When testing using Python 2.5 / NumPy 1.6.1 on Windows, I'm getting "unknown dtype code in numpy.pxd (0)". Everything works fine with Python 2.6/2.7 and NumPy 1.6.1. This is with Cython 0.15.1. Any advice or do I have to (very unhappily) work around this? thanks, Wes From d.s.seljebotn at astro.uio.no Mon Oct 24 21:37:22 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 24 Oct 2011 21:37:22 +0200 Subject: [Cython] Buffer interface to boolean arrays with cast=True on Python 2.5 failing In-Reply-To: References: Message-ID: <4EA5BE72.1090104@astro.uio.no> On 10/24/2011 09:26 PM, Wes McKinney wrote: > I've been using > > ndarray[uint8_t, cast=True] bool_arr > > to work with dtype=bool arrays in Cython lately. When testing using > Python 2.5 / NumPy 1.6.1 on Windows, I'm getting "unknown dtype code > in numpy.pxd (0)". Everything works fine with Python 2.6/2.7 and NumPy > 1.6.1. This is with Cython 0.15.1. > > Any advice or do I have to (very unhappily) work around this? Is this a recent bug in Cython? Try to bisect the the Cython release (and if it turns out to be Cython, possible commit). Dag Sverre From wesmckinn at gmail.com Mon Oct 24 21:40:55 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 24 Oct 2011 15:40:55 -0400 Subject: [Cython] Buffer interface to boolean arrays with cast=True on Python 2.5 failing In-Reply-To: <4EA5BE72.1090104@astro.uio.no> References: <4EA5BE72.1090104@astro.uio.no> Message-ID: On Mon, Oct 24, 2011 at 3:37 PM, Dag Sverre Seljebotn wrote: > On 10/24/2011 09:26 PM, Wes McKinney wrote: >> >> I've been using >> >> ndarray[uint8_t, cast=True] bool_arr >> >> to work with dtype=bool arrays in Cython lately. When testing using >> Python 2.5 / NumPy 1.6.1 on Windows, I'm getting "unknown dtype code >> in numpy.pxd (0)". Everything works fine with Python 2.6/2.7 and NumPy >> 1.6.1. This is with Cython 0.15.1. >> >> Any advice or do I have to (very unhappily) work around this? > > Is this a recent bug in Cython? Try to bisect the the Cython release (and if > it turns out to be Cython, possible commit). > > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > I'll check the HEAD revision and bisect if I can, don't have a lot of time-- it's just strange that it's Python 2.5 only. From markflorisson88 at gmail.com Mon Oct 24 21:50:05 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 24 Oct 2011 20:50:05 +0100 Subject: [Cython] Acquisition counted cdef classes Message-ID: Hey, This is in response to http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f and http://trac.cython.org/cython_trac/ticket/498 , and some of the previous discussion on cython.parallel. Basically I think we should have something more powerful than 'cdef borrowed CdefClass obj', something that also doesn't rely on new syntax. What if we support acquisition counting for every instance of a cdef class? In Python and Cython GIL mode you use reference counting, and in Cython nogil mode and for structs attributes, array dtypes etc you use acquisition counting. This allows you to pass around cdef objects without the GIL and use their nogil methods. If the acquisition count is greater than 1, the acquisition count owns a reference to the object. If it reaches 0 you discard your owned reference (you can simply acquire the GIL if you don't have it) and when you increment from zero you obtain it. Perhaps something like libatomic could be used to efficiently implement this. The advantages are: 1) allow users to pass around cdef typed objects in nogil mode 2) allow cdef typed objects in as struct attributes or array elements 3) make it easy to implement things like memoryviews (already done but would have been a lot easier), cython.parallel.async/future objects, cython.parallel.mutex objects and possibly other things in the future We should then allow a syntax like with mycdefobject: ... to lock the object in GIL or nogil mode (like java's 'synchronized'). For objects that already have __enter__ and __exit__ you could support something like 'with cython.synchronized(mycdefobject): ...' instead. Or perhaps you should always require cython.synchronized (or cython.parallel.synchronized). In addition to nogil methods a user may provide special cdef nogil methods, i.e. cdef int __len__(self) nogil: ... which would provide a Cython as well as a Python implementation for the function (with automatic cpdef behaviour), so you could use it in both contexts. There are two options for assignment semantics to a struct attribute or array element: - decref the old value (this implies always initializing the pointers to NULL first) - don't decref the old value (the user has to manually use 'del') I think 1) is more definitely consistent with how everything else works. All of this functionality should also get a sane C API (to be provided by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc. Every class using this functionality is a subclass of CythonObject (that contains a PyObject + an acquisition count + a lock). Perhaps if the user is subclassing something other than object we could allow the user to specify custom __cython_(un)lock__ and __cython_acquisition_count__ methods and fields. Now, building on top of this functionality, Cython could provide built-in nogil-compatible types, like lists, dicts and maybe tuples (as a start). These will by default not lock for operations to allow e.g. one thread to iterate over the list and another thread to index it without lock contention and other general overhead. If one thread is somehow changing the size of the list, or writing to indices that another thread is reading from/writing to, the results will of course be undefined unless the user synchronizes on the object. So it would be the user's responsibility. The acquisition counting itself will always be thread-safe (i.e., it will be atomic if possible, otherwise it will lock). It's probably best to not enable this functionality by default as it would be more expensive to instantiate objects, but it could be supported through a cdef class decorator and a general directive. Of course one may still use non-cdef borrowed objects, by simply casting to a PyObject *. Thoughts? Mark From wesmckinn at gmail.com Mon Oct 24 21:51:21 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 24 Oct 2011 14:51:21 -0500 Subject: [Cython] Buffer interface to boolean arrays with cast=True on Python 2.5 failing In-Reply-To: References: <4EA5BE72.1090104@astro.uio.no> Message-ID: On Mon, Oct 24, 2011 at 2:40 PM, Wes McKinney wrote: > On Mon, Oct 24, 2011 at 3:37 PM, Dag Sverre Seljebotn > wrote: >> On 10/24/2011 09:26 PM, Wes McKinney wrote: >>> >>> I've been using >>> >>> ndarray[uint8_t, cast=True] bool_arr >>> >>> to work with dtype=bool arrays in Cython lately. When testing using >>> Python 2.5 / NumPy 1.6.1 on Windows, I'm getting "unknown dtype code >>> in numpy.pxd (0)". Everything works fine with Python 2.6/2.7 and NumPy >>> 1.6.1. This is with Cython 0.15.1. >>> >>> Any advice or do I have to (very unhappily) work around this? >> >> Is this a recent bug in Cython? Try to bisect the the Cython release (and if >> it turns out to be Cython, possible commit). >> >> Dag Sverre >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > > I'll check the HEAD revision and bisect if I can, don't have a lot of > time-- it's just strange that it's Python 2.5 only. > For some reason I can't build Cython (0.15.1 or git HEAD) with mingw32: C:\cython>python setup.py install running install running build running build_py running build_ext building 'Cython.Compiler.Scanning' extension C:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -IC:\Python25\include -IC:\Pytho n25\PC -c Cython\Compiler\Scanning.c -o build\temp.win32-2.5\Release\cython\comp iler\scanning.o Cython\Compiler\Scanning.c:13340: error: initializer element is not constant Cython\Compiler\Scanning.c:13340: error: (near initialization for `__pyx_CyFunct ionType_type.tp_call') error: command 'gcc' failed with exit status 1 C:\cython> I've half a mind to drop Python 2.5 support in pandas over this... From d.s.seljebotn at astro.uio.no Mon Oct 24 22:09:47 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 24 Oct 2011 22:09:47 +0200 Subject: [Cython] Buffer interface to boolean arrays with cast=True on Python 2.5 failing In-Reply-To: References: <4EA5BE72.1090104@astro.uio.no> Message-ID: <4EA5C60B.2040704@astro.uio.no> On 10/24/2011 09:40 PM, Wes McKinney wrote: > On Mon, Oct 24, 2011 at 3:37 PM, Dag Sverre Seljebotn > wrote: >> On 10/24/2011 09:26 PM, Wes McKinney wrote: >>> >>> I've been using >>> >>> ndarray[uint8_t, cast=True] bool_arr >>> >>> to work with dtype=bool arrays in Cython lately. When testing using >>> Python 2.5 / NumPy 1.6.1 on Windows, I'm getting "unknown dtype code >>> in numpy.pxd (0)". Everything works fine with Python 2.6/2.7 and NumPy >>> 1.6.1. This is with Cython 0.15.1. >>> >>> Any advice or do I have to (very unhappily) work around this? >> >> Is this a recent bug in Cython? Try to bisect the the Cython release (and if >> it turns out to be Cython, possible commit). >> >> Dag Sverre >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > > I'll check the HEAD revision and bisect if I can, don't have a lot of > time-- it's just strange that it's Python 2.5 only. So the difference between Python 2.5 and 2.6 is that in 2.5 the __getbuffer__ in numpy.pxd will be called, whereas in Python 2.6, NumPy is able to do the job itself. (PEP 3118) Which means...that there's likely a bug in __getbuffer__ in numpy.pxd. You can debug If you do have time, that's the place to start inserting print statements etc. to debug this. It's difficult to say more without a copy&paste directly from your terminal. Dag Sverre From greg.ewing at canterbury.ac.nz Mon Oct 24 23:03:32 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 25 Oct 2011 10:03:32 +1300 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: References: Message-ID: <4EA5D2A4.3010303@canterbury.ac.nz> mark florisson wrote: > These will by default not lock for operations to allow > e.g. one thread to iterate over the list and another thread to index > it without lock contention and other general overhead. I don't think that's safe. You can't say "I'm not modifying this, so I don't need to lock it" because there may be another thread that *is* in the midst of modifying it. -- Greg From wesmckinn at gmail.com Mon Oct 24 23:03:56 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 24 Oct 2011 17:03:56 -0400 Subject: [Cython] Buffer interface to boolean arrays with cast=True on Python 2.5 failing In-Reply-To: <4EA5C60B.2040704@astro.uio.no> References: <4EA5BE72.1090104@astro.uio.no> <4EA5C60B.2040704@astro.uio.no> Message-ID: On Mon, Oct 24, 2011 at 4:09 PM, Dag Sverre Seljebotn wrote: > On 10/24/2011 09:40 PM, Wes McKinney wrote: >> >> On Mon, Oct 24, 2011 at 3:37 PM, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 10/24/2011 09:26 PM, Wes McKinney wrote: >>>> >>>> I've been using >>>> >>>> ndarray[uint8_t, cast=True] bool_arr >>>> >>>> to work with dtype=bool arrays in Cython lately. When testing using >>>> Python 2.5 / NumPy 1.6.1 on Windows, I'm getting "unknown dtype code >>>> in numpy.pxd (0)". Everything works fine with Python 2.6/2.7 and NumPy >>>> 1.6.1. This is with Cython 0.15.1. >>>> >>>> Any advice or do I have to (very unhappily) work around this? >>> >>> Is this a recent bug in Cython? Try to bisect the the Cython release (and >>> if >>> it turns out to be Cython, possible commit). >>> >>> Dag Sverre >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >>> >> >> I'll check the HEAD revision and bisect if I can, don't have a lot of >> time-- it's just strange that it's Python 2.5 only. > > So the difference between Python 2.5 and 2.6 is that in 2.5 the > __getbuffer__ in numpy.pxd will be called, whereas in Python 2.6, NumPy is > able to do the job itself. (PEP 3118) > > Which means...that there's likely a bug in __getbuffer__ in numpy.pxd. You > can debug > > If you do have time, that's the place to start inserting print statements > etc. to debug this. > > It's difficult to say more without a copy&paste directly from your terminal. > > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > I need pandas to build off of a released version of Cython so I am just going to have to work around this by doing taking views of boolean arrays as np.uint8. I wouldn't mind dropping Python 2.5 support altogether but some people might not like that. From markflorisson88 at gmail.com Mon Oct 24 23:51:22 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 24 Oct 2011 22:51:22 +0100 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA5D2A4.3010303@canterbury.ac.nz> References: <4EA5D2A4.3010303@canterbury.ac.nz> Message-ID: On 24 October 2011 22:03, Greg Ewing wrote: > mark florisson wrote: >> >> These will by default not lock for operations to allow >> e.g. one thread to iterate over the list and another thread to index >> it without lock contention and other general overhead. > > I don't think that's safe. You can't say "I'm not modifying > this, so I don't need to lock it" because there may be another > thread that *is* in the midst of modifying it. Oh yes you're definitely right, that was silly of me. I suppose every operation needs to lock. This can still be useful though, to allow more fine-grained parallelism. Then it would be more efficient to use arrays or memoryviews with acquisition counted objects, and the dicts/lists/tuples etc for cases where you just need more fine-grained locking and can deal with that overhead. > -- > Greg > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Mon Oct 24 23:52:35 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 24 Oct 2011 22:52:35 +0100 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA5D2A4.3010303@canterbury.ac.nz> References: <4EA5D2A4.3010303@canterbury.ac.nz> Message-ID: On 24 October 2011 22:03, Greg Ewing wrote: > mark florisson wrote: >> >> These will by default not lock for operations to allow >> e.g. one thread to iterate over the list and another thread to index >> it without lock contention and other general overhead. > > I don't think that's safe. You can't say "I'm not modifying > this, so I don't need to lock it" because there may be another > thread that *is* in the midst of modifying it. I was really thinking of the case where you instantiate it in Cython and then do some parallel work, in which case you're the only user. But you can't assume that in general. > -- > Greg > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From robertwb at math.washington.edu Tue Oct 25 06:47:05 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Mon, 24 Oct 2011 21:47:05 -0700 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: References: <4EA5D2A4.3010303@canterbury.ac.nz> Message-ID: On Mon, Oct 24, 2011 at 2:52 PM, mark florisson wrote: > On 24 October 2011 22:03, Greg Ewing wrote: >> mark florisson wrote: >>> >>> These will by default not lock for operations to allow >>> e.g. one thread to iterate over the list and another thread to index >>> it without lock contention and other general overhead. >> >> I don't think that's safe. You can't say "I'm not modifying >> this, so I don't need to lock it" because there may be another >> thread that *is* in the midst of modifying it. > > I was really thinking of the case where you instantiate it in Cython > and then do some parallel work, in which case you're the only user. > But you can't assume that in general. It could be useful to assert for a chunk of code that a given object is read-only and will not be mutated for the duration of the context (programmer error and strange crash/data corruption if it is). E.g. with nogil, assert_frozen(my_dict): a = (my_dict[key]).c_attribute [...] All references obtained could be borrowed. Perhaps we could even enforce this for cdef classes (but perhaps not consistently enough, and perhaps that would make things even more confusing). Just a thought. - Robert From stefan_ml at behnel.de Tue Oct 25 09:33:28 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 25 Oct 2011 09:33:28 +0200 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: References: Message-ID: <4EA66648.8030102@behnel.de> mark florisson, 24.10.2011 21:50: > This is in response to > http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f > and http://trac.cython.org/cython_trac/ticket/498 , and some of the > previous discussion on cython.parallel. > > Basically I think we should have something more powerful than 'cdef > borrowed CdefClass obj', something that also doesn't rely on new > syntax. We will still need borrowed reference support in the compiler eventually, whether we make it a language feature or not. > What if we support acquisition counting for every instance of a cdef > class? In Python and Cython GIL mode you use reference counting, and > in Cython nogil mode and for structs attributes, array dtypes etc you > use acquisition counting. This allows you to pass around cdef objects > without the GIL and use their nogil methods. If the acquisition count > is greater than 1, the acquisition count owns a reference to the > object. If it reaches 0 you discard your owned reference (you can > simply acquire the GIL if you don't have it) and when you increment > from zero you obtain it. Perhaps something like libatomic could be > used to efficiently implement this. Where would you store that count? In the object struct? That would increase the size of each instance. > The advantages are: > > 1) allow users to pass around cdef typed objects in nogil mode > 2) allow cdef typed objects in as struct attributes or array elements > 3) make it easy to implement things like memoryviews (already done but > would have been a lot easier), cython.parallel.async/future objects, > cython.parallel.mutex objects and possibly other things in the future Would it really be easier? You can already call cdef methods in nogil mode, AFAIR. > We should then allow a syntax like > > with mycdefobject: > ... > > to lock the object in GIL or nogil mode (like java's 'synchronized'). > For objects that already have __enter__ and __exit__ you could support > something like 'with cython.synchronized(mycdefobject): ...' instead. > Or perhaps you should always require cython.synchronized (or > cython.parallel.synchronized). The latter, I sure hope. > In addition to nogil methods a user may provide special cdef nogil methods, i.e. > > cdef int __len__(self) nogil: > ... > > which would provide a Cython as well as a Python implementation for > the function (with automatic cpdef behaviour), so you could use it in > both contexts. That can already be done for final types, simply by adding cpdef behaviour to all special methods. That would also fix ticket #3, for example. Note that the DefNode refactoring is still pending, it would help here. > There are two options for assignment semantics to a struct attribute > or array element: > - decref the old value (this implies always initializing the > pointers to NULL first) > - don't decref the old value (the user has to manually use 'del') > > I think 1) is more definitely consistent with how everything else works. Yes. > All of this functionality should also get a sane C API (to be provided > by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc. > Every class using this functionality is a subclass of CythonObject > (that contains a PyObject + an acquisition count + a lock). Perhaps if > the user is subclassing something other than object we could allow the > user to specify custom __cython_(un)lock__ and > __cython_acquisition_count__ methods and fields. > > Now, building on top of this functionality, Cython could provide > built-in nogil-compatible types, like lists, dicts and maybe tuples > (as a start). These will by default not lock for operations to allow > e.g. one thread to iterate over the list and another thread to index > it without lock contention and other general overhead. If one thread > is somehow changing the size of the list, or writing to indices that > another thread is reading from/writing to, the results will of course > be undefined unless the user synchronizes on the object. So it would > be the user's responsibility. The acquisition counting itself will > always be thread-safe (i.e., it will be atomic if possible, otherwise > it will lock). > > It's probably best to not enable this functionality by default as it > would be more expensive to instantiate objects, but it could be > supported through a cdef class decorator and a general directive. It's well known that this would be expensive. One of the approaches that tried to get rid of the GIL in CPython introduced fine grained locking, and it turned out to be substantially slower, AFAIR by a factor of two. You could potentially drop the locking for local variables, but you'd loose that ability as soon as the 'object' is passed into a function. Basically, what you are trying to do here is to duplicate the complete ref-counting infrastructure of CPython, but without using CPython. > Of course one may still use non-cdef borrowed objects, by simply > casting to a PyObject *. That's very ugly, though, because you loose all access to methods and attributes of the object. Basically, it becomes useless that way, except for storing away a pointer to it somewhere. You could just as well use a void*. Stefan From markflorisson88 at gmail.com Tue Oct 25 11:11:24 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 25 Oct 2011 10:11:24 +0100 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: References: <4EA5D2A4.3010303@canterbury.ac.nz> Message-ID: On 25 October 2011 05:47, Robert Bradshaw wrote: > On Mon, Oct 24, 2011 at 2:52 PM, mark florisson > wrote: >> On 24 October 2011 22:03, Greg Ewing wrote: >>> mark florisson wrote: >>>> >>>> These will by default not lock for operations to allow >>>> e.g. one thread to iterate over the list and another thread to index >>>> it without lock contention and other general overhead. >>> >>> I don't think that's safe. You can't say "I'm not modifying >>> this, so I don't need to lock it" because there may be another >>> thread that *is* in the midst of modifying it. >> >> I was really thinking of the case where you instantiate it in Cython >> and then do some parallel work, in which case you're the only user. >> But you can't assume that in general. > > It could be useful to assert for a chunk of code that a given object > is read-only and will not be mutated for the duration of the context > (programmer error and strange crash/data corruption if it is). E.g. > > with nogil, assert_frozen(my_dict): > ? ?a = (my_dict[key]).c_attribute > ? ?[...] > > All references obtained could be borrowed. Perhaps we could even > enforce this for cdef classes (but perhaps not consistently enough, > and perhaps that would make things even more confusing). Just a > thought. Hmm, I actually think that passing around references in general (without having to declare them as borrowed in parameters) would be a good feature. If my_dict would be e.g. a cython.types.dict, then it would only accept CythonObjects, so it could just do the acquisition counting. For cython.parallel we could provide types more suited for the cython.parallel kind of fine-grained parallelism, e.g. lock for writes, don't lock for reads, which allows either to happen simultaneously, but not any mixing of those two. Through explicit or implicit barriers one may be sure that operations are correct. > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Tue Oct 25 11:11:47 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 25 Oct 2011 10:11:47 +0100 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA66648.8030102@behnel.de> References: <4EA66648.8030102@behnel.de> Message-ID: On 25 October 2011 08:33, Stefan Behnel wrote: > mark florisson, 24.10.2011 21:50: >> >> This is in response to >> >> http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f >> and http://trac.cython.org/cython_trac/ticket/498 , and some of the >> previous discussion on cython.parallel. >> >> Basically I think we should have something more powerful than 'cdef >> borrowed CdefClass obj', something that also doesn't rely on new >> syntax. > > We will still need borrowed reference support in the compiler eventually, > whether we make it a language feature or not. > I'm not sure I understand why,?acquisition counting could solve these problems for cdef classes, and general objects may not be used without the GIL. Do you want this as an optimization? >> What if we support acquisition counting for every instance of a cdef >> class? In Python and Cython GIL mode you use reference counting, and >> in Cython nogil mode and for structs attributes, array dtypes etc you >> use acquisition counting. This allows you to pass around cdef objects >> without the GIL and use their nogil methods. If the acquisition count >> is greater than 1, the acquisition count owns a reference to the >> object. If it reaches 0 you discard your owned reference (you can >> simply acquire the GIL if you don't have it) and when you increment >> from zero you obtain it. Perhaps something like libatomic could be >> used to efficiently implement this. > > Where would you store that count? In the object struct? That would increase > the size of each instance. Yes, not just the count, also the lock. This feature would be optional and may be very useful for people (I think). > >> The advantages are: >> >> 1) allow users to pass around cdef typed objects in nogil mode >> 2) allow cdef typed objects in as struct attributes or array elements >> 3) make it easy to implement things like memoryviews (already done but >> would have been a lot easier), cython.parallel.async/future objects, >> cython.parallel.mutex objects and possibly other things in the future > > Would it really be easier? You can already call cdef methods in nogil mode, > AFAIR. > Sure, but you cannot store cdef objects as struct attributes, array elements (you could implement it with reference counting, but not for nogil mode), and you cannot pass them around without the GIL. This proposal is about making your life easier without the GIL, and currently it's kind of a pain. >> We should then allow a syntax like >> >> ? ? with mycdefobject: >> ? ? ? ? ... >> >> to lock the object in GIL or nogil mode (like java's 'synchronized'). >> For objects that already have __enter__ and __exit__ you could support >> something like 'with cython.synchronized(mycdefobject): ...' instead. >> Or perhaps you should always require cython.synchronized (or >> cython.parallel.synchronized). > > The latter, I sure hope. > > >> In addition to nogil methods a user may provide special cdef nogil >> methods, i.e. >> >> cdef int __len__(self) nogil: >> ? ? ... >> >> which would provide a Cython as well as a Python implementation for >> the function (with automatic cpdef behaviour), so you could use it in >> both contexts. > > That can already be done for final types, simply by adding cpdef behaviour > to all special methods. That would also fix ticket #3, for example. > > Note that the DefNode refactoring is still pending, it would help here. > Ah I assumed cpdef nogil was invalid, I see it isn't, cool. This breaks terribly for special methods though. >> There are two options for assignment semantics to a struct attribute >> or array element: >> ? ? - decref the old value (this implies always initializing the >> pointers to NULL first) >> ? ? - don't decref the old value (the user has to manually use 'del') >> >> I think 1) is more definitely consistent with how everything else works. > > Yes. > > >> All of this functionality should also get a sane C API (to be provided >> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc. >> Every class using this functionality is a subclass of CythonObject >> (that contains a PyObject + an acquisition count + a lock). Perhaps if >> the user is subclassing something other than object we could allow the >> user to specify custom __cython_(un)lock__ and >> __cython_acquisition_count__ methods and fields. >> >> Now, building on top of this functionality, Cython could provide >> built-in nogil-compatible types, like lists, dicts and maybe tuples >> (as a start). These will by default not lock for operations to allow >> e.g. one thread to iterate over the list and another thread to index >> it without lock contention and other general overhead. If one thread >> is somehow changing the size of the list, or writing to indices that >> another thread is reading from/writing to, the results will of course >> be undefined unless the user synchronizes on the object. So it would >> be the user's responsibility. The acquisition counting itself will >> always be thread-safe (i.e., it will be atomic if possible, otherwise >> it will lock). >> >> It's probably best to not enable this functionality by default as it >> would be more expensive to instantiate objects, but it could be >> supported through a cdef class decorator and a general directive. > > It's well known that this would be expensive. One of the approaches that > tried to get rid of the GIL in CPython introduced fine grained locking, and > it turned out to be substantially slower, AFAIR by a factor of two. Sure, I am aware of that. Often you can just keep the GIL, in which case you wouldn't use these types. But when you want to leave the shiny world of the GIL you still want these goodies. Acquiring the GIL is too expensive as there is pretty much always contention. > You could potentially drop the locking for local variables, but you'd loose > that ability as soon as the 'object' is passed into a function. Definitely, but you cannot use them with the GIL anyway :) > Basically, what you are trying to do here is to duplicate the complete > ref-counting infrastructure of CPython, but without using CPython. > > >> Of course one may still use non-cdef borrowed objects, by simply >> casting to a PyObject *. > > That's very ugly, though, because you loose all access to methods and > attributes of the object. Basically, it becomes useless that way, except for > storing away a pointer to it somewhere. You could just as well use a void*. Indeed, and that's really all you can do without the GIL. I think we're talking about different things, I'm talking about supporting nogil, and you're talking about borrowed references in general. I'm not sure why you'd not just take a reference instead in GIL mode, unless you were worried about incrementing a counter. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From stefan_ml at behnel.de Tue Oct 25 13:22:08 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 25 Oct 2011 13:22:08 +0200 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: References: <4EA66648.8030102@behnel.de> Message-ID: <4EA69BE0.3060307@behnel.de> mark florisson, 25.10.2011 11:11: > On 25 October 2011 08:33, Stefan Behnel wrote: >> mark florisson, 24.10.2011 21:50: >>> >>> This is in response to >>> >>> http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f >>> and http://trac.cython.org/cython_trac/ticket/498 , and some of the >>> previous discussion on cython.parallel. >>> >>> Basically I think we should have something more powerful than 'cdef >>> borrowed CdefClass obj', something that also doesn't rely on new >>> syntax. >> >> We will still need borrowed reference support in the compiler eventually, >> whether we make it a language feature or not. > > I'm not sure I understand why, acquisition counting could solve these > problems for cdef classes, and general objects may not be used without > the GIL. Do you want this as an optimization? Yes. Think of type(x), for example, or PyDict_GetItem(). They return borrowed references, and in many cases, Cython wouldn't have to INCREF and DECREF them when they are only being used as part of some specific kinds of expressions. The same applies to some utility functions in Cython that currently must INCREF their return value unconditionally, simply because they can't tell Cython that they could also return a borrowed reference instead. If there was a way to do that, we could optimise the reference counting away in a couple of more places, which would get us another bit closer to hand-tuned code. However, note that this doesn't necessarily have an impact on nogil code. If you took a borrowed reference in one nogil thread, and a gil-holding thread deletes the object at the same time or during the lifetime of the borrowed reference (e.g. by updating a dict or assigning to a cdef attribute), the nogil thread would end up with a dead pointer in its hands. That's why the usage of borrowed references needs to be explicit in the code ("I know what I'm doing"), and the optimisations require the GIL to be held. >>> What if we support acquisition counting for every instance of a cdef >>> class? In Python and Cython GIL mode you use reference counting, and >>> in Cython nogil mode and for structs attributes, array dtypes etc you >>> use acquisition counting. This allows you to pass around cdef objects >>> without the GIL and use their nogil methods. If the acquisition count >>> is greater than 1, the acquisition count owns a reference to the >>> object. If it reaches 0 you discard your owned reference (you can >>> simply acquire the GIL if you don't have it) and when you increment >>> from zero you obtain it. Perhaps something like libatomic could be >>> used to efficiently implement this. >> >> Where would you store that count? In the object struct? That would increase >> the size of each instance. > > Yes, not just the count, also the lock. This feature would be optional > and may be very useful for people (I think). Well, as long as it's an optional feature that requires a class decorator, the only obvious drawback is that it'll bloat the compiler even more than it is already. >>> The advantages are: >>> >>> 1) allow users to pass around cdef typed objects in nogil mode >>> 2) allow cdef typed objects in as struct attributes or array elements >>> 3) make it easy to implement things like memoryviews (already done but >>> would have been a lot easier), cython.parallel.async/future objects, >>> cython.parallel.mutex objects and possibly other things in the future >> >> Would it really be easier? You can already call cdef methods in nogil mode, >> AFAIR. > > Sure, but you cannot store cdef objects as struct attributes, array > elements (you could implement it with reference counting, but not for > nogil mode) You could do that with borrowed references, though, assuming that you keep another reference around (or do your own ref-counting). However, I do see that keeping a real reference around may be hard to do in some cases. > and you cannot pass them around without the GIL. Yes, you can, as long as you only go through cdef functions. Obviously, you can't pass them into a Python function call, but you can (and could, if it was implemented) do loads of useful things with existing references even in nogil sections. The GIL checker is quite fine grained already but could do even better. > This > proposal is about making your life easier without the GIL, and > currently it's kind of a pain. The nogil sections I use are usually quite short, so I can't tell. It's certainly a pain to work without the GIL, because it means you have to take a lot more care when writing your code. But that won't change just by dropping reference counting. And nogil code will definitely become another bit harder to get right when using borrowed references. > Ah I assumed cpdef nogil was invalid, I see it isn't, cool. It makes perfect sense. Just because a function *can* be called without the GIL doesn't mean it can't be called from Python. So the Python wrapper requires the GIL, but the underlying cdef function doesn't. > This breaks terribly for special methods though. Why? It's just a matter of properly separating out their Python wrapper. That's why I was referring to the DefNode refactoring. >>> All of this functionality should also get a sane C API (to be provided >>> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc. >>> Every class using this functionality is a subclass of CythonObject >>> (that contains a PyObject + an acquisition count + a lock). Perhaps if >>> the user is subclassing something other than object we could allow the >>> user to specify custom __cython_(un)lock__ and >>> __cython_acquisition_count__ methods and fields. >>> >>> Now, building on top of this functionality, Cython could provide >>> built-in nogil-compatible types, like lists, dicts and maybe tuples >>> (as a start). These will by default not lock for operations to allow >>> e.g. one thread to iterate over the list and another thread to index >>> it without lock contention and other general overhead. If one thread >>> is somehow changing the size of the list, or writing to indices that >>> another thread is reading from/writing to, the results will of course >>> be undefined unless the user synchronizes on the object. So it would >>> be the user's responsibility. The acquisition counting itself will >>> always be thread-safe (i.e., it will be atomic if possible, otherwise >>> it will lock). >>> >>> It's probably best to not enable this functionality by default as it >>> would be more expensive to instantiate objects, but it could be >>> supported through a cdef class decorator and a general directive. >> >> It's well known that this would be expensive. One of the approaches that >> tried to get rid of the GIL in CPython introduced fine grained locking, and >> it turned out to be substantially slower, AFAIR by a factor of two. > > Sure, I am aware of that. Often you can just keep the GIL, in which > case you wouldn't use these types. But when you want to leave the > shiny world of the GIL you still want these goodies. Acquiring the GIL > is too expensive as there is pretty much always contention. Acquiring a more fine grained lock is more likely to reduce the contention, but is not necessarily less expensive. The lock still needs to get acquired and released. GIL protected reference counting is a lot cheaper than that, as is manual locking in a more coarse grained fashion. >> You could potentially drop the locking for local variables, but you'd loose >> that ability as soon as the 'object' is passed into a function. > > Definitely, but you cannot use them with the GIL anyway :) Yes you can. For cdef functions, it's the responsibility of the caller to own the references of object arguments it passes. The called function doesn't have to do reference counting for them, as long as it doesn't try to reassign the variable. And even that could be fixed with borrowed references, and also partly by better control flow analysis. >> Basically, what you are trying to do here is to duplicate the complete >> ref-counting infrastructure of CPython, but without using CPython. >> >>> Of course one may still use non-cdef borrowed objects, by simply >>> casting to a PyObject *. >> >> That's very ugly, though, because you loose all access to methods and >> attributes of the object. Basically, it becomes useless that way, except for >> storing away a pointer to it somewhere. You could just as well use a void*. > > Indeed, and that's really all you can do without the GIL. I think you're underestimating what can (or could) be done without holding the GIL. There are still some open features that wait for being implemented, even without adding new syntax (and thus further increasing the complexity of the language). > I think > we're talking about different things, I'm talking about supporting > nogil, and you're talking about borrowed references in general. Both are related, though. It's certainly a lot easier and cleaner to support borrowed references in the compiler, than to implement a whole new scheme for handling extension type instances in addition to the normal object handling which we need anyway. > I'm > not sure why you'd not just take a reference instead in GIL mode, > unless you were worried about incrementing a counter. Decrementing it, not incrementing. :) The problem is not so much the INCREF (which is just an indirect add), it's the DECREF, which contains a conditional jump based on an unknown external value, that may trigger external code. That can kill several C compiler optimisations for the surrounding code. (And that would only get worse by using a dedicated locking mechanism.) Stefan From d.s.seljebotn at astro.uio.no Tue Oct 25 15:28:57 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 25 Oct 2011 15:28:57 +0200 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA66648.8030102@behnel.de> References: <4EA66648.8030102@behnel.de> Message-ID: <4EA6B999.7060807@astro.uio.no> On 10/25/2011 09:33 AM, Stefan Behnel wrote: > mark florisson, 24.10.2011 21:50: >> This is in response to >> http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f >> >> and http://trac.cython.org/cython_trac/ticket/498 , and some of the >> previous discussion on cython.parallel. >> >> Basically I think we should have something more powerful than 'cdef >> borrowed CdefClass obj', something that also doesn't rely on new >> syntax. > > We will still need borrowed reference support in the compiler > eventually, whether we make it a language feature or not. > > >> What if we support acquisition counting for every instance of a cdef >> class? In Python and Cython GIL mode you use reference counting, and >> in Cython nogil mode and for structs attributes, array dtypes etc you >> use acquisition counting. This allows you to pass around cdef objects >> without the GIL and use their nogil methods. If the acquisition count >> is greater than 1, the acquisition count owns a reference to the >> object. If it reaches 0 you discard your owned reference (you can >> simply acquire the GIL if you don't have it) and when you increment >> from zero you obtain it. Perhaps something like libatomic could be >> used to efficiently implement this. > > Where would you store that count? In the object struct? That would > increase the size of each instance. > > >> The advantages are: >> >> 1) allow users to pass around cdef typed objects in nogil mode >> 2) allow cdef typed objects in as struct attributes or array elements >> 3) make it easy to implement things like memoryviews (already done but >> would have been a lot easier), cython.parallel.async/future objects, >> cython.parallel.mutex objects and possibly other things in the future > > Would it really be easier? You can already call cdef methods in nogil > mode, AFAIR. > > >> We should then allow a syntax like >> >> with mycdefobject: >> ... >> >> to lock the object in GIL or nogil mode (like java's 'synchronized'). >> For objects that already have __enter__ and __exit__ you could support >> something like 'with cython.synchronized(mycdefobject): ...' instead. >> Or perhaps you should always require cython.synchronized (or >> cython.parallel.synchronized). > > The latter, I sure hope. > > >> In addition to nogil methods a user may provide special cdef nogil >> methods, i.e. >> >> cdef int __len__(self) nogil: >> ... >> >> which would provide a Cython as well as a Python implementation for >> the function (with automatic cpdef behaviour), so you could use it in >> both contexts. > > That can already be done for final types, simply by adding cpdef > behaviour to all special methods. That would also fix ticket #3, for > example. > > Note that the DefNode refactoring is still pending, it would help here. > > >> There are two options for assignment semantics to a struct attribute >> or array element: >> - decref the old value (this implies always initializing the >> pointers to NULL first) >> - don't decref the old value (the user has to manually use 'del') >> >> I think 1) is more definitely consistent with how everything else works. > > Yes. > > >> All of this functionality should also get a sane C API (to be provided >> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc. >> Every class using this functionality is a subclass of CythonObject >> (that contains a PyObject + an acquisition count + a lock). Perhaps if >> the user is subclassing something other than object we could allow the >> user to specify custom __cython_(un)lock__ and >> __cython_acquisition_count__ methods and fields. >> >> Now, building on top of this functionality, Cython could provide >> built-in nogil-compatible types, like lists, dicts and maybe tuples >> (as a start). These will by default not lock for operations to allow >> e.g. one thread to iterate over the list and another thread to index >> it without lock contention and other general overhead. If one thread >> is somehow changing the size of the list, or writing to indices that >> another thread is reading from/writing to, the results will of course >> be undefined unless the user synchronizes on the object. So it would >> be the user's responsibility. The acquisition counting itself will >> always be thread-safe (i.e., it will be atomic if possible, otherwise >> it will lock). >> >> It's probably best to not enable this functionality by default as it >> would be more expensive to instantiate objects, but it could be >> supported through a cdef class decorator and a general directive. > > It's well known that this would be expensive. One of the approaches that > tried to get rid of the GIL in CPython introduced fine grained locking, > and it turned out to be substantially slower, AFAIR by a factor of two. I'd gladly take a factor two (or even four) slowdown of CPython code any day to get rid of the GIL :-). The thing is, sometimes one has 48 cores and consider a 10x speedup better than nothing... Dag Sverre From stefan_ml at behnel.de Tue Oct 25 16:37:24 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 25 Oct 2011 16:37:24 +0200 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA6B999.7060807@astro.uio.no> References: <4EA66648.8030102@behnel.de> <4EA6B999.7060807@astro.uio.no> Message-ID: <4EA6C9A4.9030905@behnel.de> Dag Sverre Seljebotn, 25.10.2011 15:28: > On 10/25/2011 09:33 AM, Stefan Behnel wrote: >> mark florisson, 24.10.2011 21:50: >>> All of this functionality should also get a sane C API (to be provided >>> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc. >>> Every class using this functionality is a subclass of CythonObject >>> (that contains a PyObject + an acquisition count + a lock). Perhaps if >>> the user is subclassing something other than object we could allow the >>> user to specify custom __cython_(un)lock__ and >>> __cython_acquisition_count__ methods and fields. >>> >>> Now, building on top of this functionality, Cython could provide >>> built-in nogil-compatible types, like lists, dicts and maybe tuples >>> (as a start). These will by default not lock for operations to allow >>> e.g. one thread to iterate over the list and another thread to index >>> it without lock contention and other general overhead. If one thread >>> is somehow changing the size of the list, or writing to indices that >>> another thread is reading from/writing to, the results will of course >>> be undefined unless the user synchronizes on the object. So it would >>> be the user's responsibility. The acquisition counting itself will >>> always be thread-safe (i.e., it will be atomic if possible, otherwise >>> it will lock). >>> >>> It's probably best to not enable this functionality by default as it >>> would be more expensive to instantiate objects, but it could be >>> supported through a cdef class decorator and a general directive. >> >> It's well known that this would be expensive. One of the approaches that >> tried to get rid of the GIL in CPython introduced fine grained locking, >> and it turned out to be substantially slower, AFAIR by a factor of two. > > I'd gladly take a factor two (or even four) slowdown of CPython code any > day to get rid of the GIL :-). The thing is, sometimes one has 48 cores and > consider a 10x speedup better than nothing... Ah, sorry, that factor was for single-threaded code. How it would scale for multi-core code depends on too many factors to make any general statement. Stefan From markflorisson88 at gmail.com Tue Oct 25 18:58:39 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 25 Oct 2011 17:58:39 +0100 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA69BE0.3060307@behnel.de> References: <4EA66648.8030102@behnel.de> <4EA69BE0.3060307@behnel.de> Message-ID: On 25 October 2011 12:22, Stefan Behnel wrote: > mark florisson, 25.10.2011 11:11: >> >> On 25 October 2011 08:33, Stefan Behnel wrote: >>> >>> mark florisson, 24.10.2011 21:50: >>>> >>>> This is in response to >>>> >>>> >>>> http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f >>>> and http://trac.cython.org/cython_trac/ticket/498 , and some of the >>>> previous discussion on cython.parallel. >>>> >>>> Basically I think we should have something more powerful than 'cdef >>>> borrowed CdefClass obj', something that also doesn't rely on new >>>> syntax. >>> >>> We will still need borrowed reference support in the compiler eventually, >>> whether we make it a language feature or not. >> >> I'm not sure I understand why, acquisition counting could solve these >> problems for cdef classes, and general objects may not be used without >> the GIL. Do you want this as an optimization? > > Yes. Think of type(x), for example, or PyDict_GetItem(). They return > borrowed references, and in many cases, Cython wouldn't have to INCREF and > DECREF them when they are only being used as part of some specific kinds of > expressions. The same applies to some utility functions in Cython that > currently must INCREF their return value unconditionally, simply because > they can't tell Cython that they could also return a borrowed reference > instead. If there was a way to do that, we could optimise the reference > counting away in a couple of more places, which would get us another bit > closer to hand-tuned code. > > However, note that this doesn't necessarily have an impact on nogil code. If > you took a borrowed reference in one nogil thread, and a gil-holding thread > deletes the object at the same time or during the lifetime of the borrowed > reference (e.g. by updating a dict or assigning to a cdef attribute), the > nogil thread would end up with a dead pointer in its hands. That's why the > usage of borrowed references needs to be explicit in the code ("I know what > I'm doing"), and the optimisations require the GIL to be held. > I see, ok. Thanks, that really helped me see the motivation behind it (i.e., the INC/DECREF really is a performance issue for you). >>>> What if we support acquisition counting for every instance of a cdef >>>> class? In Python and Cython GIL mode you use reference counting, and >>>> in Cython nogil mode and for structs attributes, array dtypes etc you >>>> use acquisition counting. This allows you to pass around cdef objects >>>> without the GIL and use their nogil methods. If the acquisition count >>>> is greater than 1, the acquisition count owns a reference to the >>>> object. If it reaches 0 you discard your owned reference (you can >>>> simply acquire the GIL if you don't have it) and when you increment >>>> from zero you obtain it. Perhaps something like libatomic could be >>>> used to efficiently implement this. >>> >>> Where would you store that count? In the object struct? That would >>> increase >>> the size of each instance. >> >> Yes, not just the count, also the lock. This feature would be optional >> and may be very useful for people (I think). > > Well, as long as it's an optional feature that requires a class decorator, > the only obvious drawback is that it'll bloat the compiler even more than it > is already. > Actually, I think it will help the implementation of mutexes and async objects if we want those, and possibly other stuff in the future. The acquisition counting is basically already there (for memoryviews), so it's easy to track down where and when to apply this. However one major problem would be circular acquisition counts, so you'd also have to implement a garbage collector like CPython has (e.g. if you have a cdef class with a cython.parallel.dict). We should just have a real garbage collector instead of all the counting crap. Or we could make it a burden for the user... I agree that this is really not as feasible as I first thought. It actually shows me a problem where I can have a memoryview object in a memoryview with dtype 'object', although the problem here is that the memoryview object doesn't traverse the object in the Py_buffer, or when coerced from a memoryview slice to a memoryview object, the memoryview slice struct object... I suppose I need to fix that (but I'm not sure how, as you can't provide a manual traverse function in Cython). But I really believe that these are much-wanted features. If you're using threads in Python you can only get concurrency not parallelism, unless you release the GIL, even if there is some performance overhead it will still be a lot better than sequential execution. Perhaps when cython.parallel will be more mature, we may get functionality to specify data distribution schemes and message passing, in which case the GIL won't be a problem. But many things would be harder or much more expensive, e.g. transposing, sending objects etc. I think I'll just drop this discussion for now. I'm going to look at how garbage collection works, how pypy works and their GIL, and figure out what I want. >>>> The advantages are: >>>> >>>> 1) allow users to pass around cdef typed objects in nogil mode >>>> 2) allow cdef typed objects in as struct attributes or array elements >>>> 3) make it easy to implement things like memoryviews (already done but >>>> would have been a lot easier), cython.parallel.async/future objects, >>>> cython.parallel.mutex objects and possibly other things in the future >>> >>> Would it really be easier? You can already call cdef methods in nogil >>> mode, >>> AFAIR. >> >> Sure, but you cannot store cdef objects as struct attributes, array >> elements (you could implement it with reference counting, but not for >> nogil mode) > > You could do that with borrowed references, though, assuming that you keep > another reference around (or do your own ref-counting). However, I do see > that keeping a real reference around may be hard to do in some cases. > > >> and you cannot pass them around without the GIL. > > Yes, you can, as long as you only go through cdef functions. Obviously, you > can't pass them into a Python function call, but you can (and could, if it > was implemented) do loads of useful things with existing references even in > nogil sections. The GIL checker is quite fine grained already but could do > even better. > Ok, so cdef arguments are borrowed, which gets you somewhere but not very far. It's rather baffling that f(x) is fine in nogil mode, but y = x isn't. >> This >> proposal is about making your life easier without the GIL, and >> currently it's kind of a pain. > > The nogil sections I use are usually quite short, so I can't tell. It's > certainly a pain to work without the GIL, because it means you have to take > a lot more care when writing your code. But that won't change just by > dropping reference counting. And nogil code will definitely become another > bit harder to get right when using borrowed references. > > >> Ah I assumed cpdef nogil was invalid, I see it isn't, cool. > > It makes perfect sense. Just because a function *can* be called without the > GIL doesn't mean it can't be called from Python. So the Python wrapper > requires the GIL, but the underlying cdef function doesn't. > > >> This breaks terribly for special methods though. > > Why? It's just a matter of properly separating out their Python wrapper. > That's why I was referring to the DefNode refactoring. > I see, ok. All I meant was that it currently gives you compile errors. >>>> All of this functionality should also get a sane C API (to be provided >>>> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc. >>>> Every class using this functionality is a subclass of CythonObject >>>> (that contains a PyObject + an acquisition count + a lock). Perhaps if >>>> the user is subclassing something other than object we could allow the >>>> user to specify custom __cython_(un)lock__ and >>>> __cython_acquisition_count__ methods and fields. >>>> >>>> Now, building on top of this functionality, Cython could provide >>>> built-in nogil-compatible types, like lists, dicts and maybe tuples >>>> (as a start). These will by default not lock for operations to allow >>>> e.g. one thread to iterate over the list and another thread to index >>>> it without lock contention and other general overhead. If one thread >>>> is somehow changing the size of the list, or writing to indices that >>>> another thread is reading from/writing to, the results will of course >>>> be undefined unless the user synchronizes on the object. So it would >>>> be the user's responsibility. The acquisition counting itself will >>>> always be thread-safe (i.e., it will be atomic if possible, otherwise >>>> it will lock). >>>> >>>> It's probably best to not enable this functionality by default as it >>>> would be more expensive to instantiate objects, but it could be >>>> supported through a cdef class decorator and a general directive. >>> >>> It's well known that this would be expensive. One of the approaches that >>> tried to get rid of the GIL in CPython introduced fine grained locking, >>> and >>> it turned out to be substantially slower, AFAIR by a factor of two. >> >> Sure, I am aware of that. Often you can just keep the GIL, in which >> case you wouldn't use these types. But when you want to leave the >> shiny world of the GIL you still want these goodies. Acquiring the GIL >> is too expensive as there is pretty much always contention. > > Acquiring a more fine grained lock is more likely to reduce the contention, > but is not necessarily less expensive. The lock still needs to get acquired > and released. GIL protected reference counting is a lot cheaper than that, > as is manual locking in a more coarse grained fashion. > Well, many processors support atomic incrementing and decrementing counters + checking whether the counter has reached zero. So for most architectures you wouldn't need to lock for the counting (unless you reach a count of zero and you're going to decref your object). Any operation would lock though, which would indeed be expensive. >>> You could potentially drop the locking for local variables, but you'd >>> loose >>> that ability as soon as the 'object' is passed into a function. >> >> Definitely, but you cannot use them with the GIL anyway :) > > Yes you can. For cdef functions, it's the responsibility of the caller to > own the references of object arguments it passes. The called function > doesn't have to do reference counting for them, as long as it doesn't try to > reassign the variable. And even that could be fixed with borrowed > references, and also partly by better control flow analysis. > Sorry, with "use" I mean "actually do something", like call a method, lookup an attribute, coerce it, etc. >>> Basically, what you are trying to do here is to duplicate the complete >>> ref-counting infrastructure of CPython, but without using CPython. >>> >>>> Of course one may still use non-cdef borrowed objects, by simply >>>> casting to a PyObject *. >>> >>> That's very ugly, though, because you loose all access to methods and >>> attributes of the object. Basically, it becomes useless that way, except >>> for >>> storing away a pointer to it somewhere. You could just as well use a >>> void*. >> >> Indeed, and that's really all you can do without the GIL. > > I think you're underestimating what can (or could) be done without holding > the GIL. There are still some open features that wait for being implemented, > even without adding new syntax (and thus further increasing the complexity > of the language). > Yeah borrowed references definitely somewhere. It's just that for supporting the parallel types that wouldn't be good enough. >> I think >> we're talking about different things, I'm talking about supporting >> nogil, and you're talking about borrowed references in general. > > Both are related, though. It's certainly a lot easier and cleaner to support > borrowed references in the compiler, than to implement a whole new scheme > for handling extension type instances in addition to the normal object > handling which we need anyway. > > >> I'm >> not sure why you'd not just take a reference instead in GIL mode, >> unless you were worried about incrementing a counter. > > Decrementing it, not incrementing. :) > > The problem is not so much the INCREF (which is just an indirect add), it's > the DECREF, which contains a conditional jump based on an unknown external > value, that may trigger external code. That can kill several C compiler > optimisations for the surrounding code. (And that would only get worse by > using a dedicated locking mechanism.) > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > Anyway, sorry for the long mail. I agree this is likely not feasible to implement, although I would like the functionality to be there. Perhaps I'm trying to solve problems which don't really need to be solved. Maybe we should just use multiprocessing, or MPI and numpy with global arrays and pickling. Maybe memoryviews could help out with that as well. From stefan_ml at behnel.de Tue Oct 25 20:10:51 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 25 Oct 2011 20:10:51 +0200 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: References: <4EA66648.8030102@behnel.de> <4EA69BE0.3060307@behnel.de> Message-ID: <4EA6FBAB.5070301@behnel.de> mark florisson, 25.10.2011 18:58: > On 25 October 2011 12:22, Stefan Behnel wrote: >> mark florisson, 25.10.2011 11:11: >>> On 25 October 2011 08:33, Stefan Behnel wrote: >>>> mark florisson, 24.10.2011 21:50: >>>>> What if we support acquisition counting for every instance of a cdef >>>>> class? In Python and Cython GIL mode you use reference counting, and >>>>> in Cython nogil mode and for structs attributes, array dtypes etc you >>>>> use acquisition counting. This allows you to pass around cdef objects >>>>> without the GIL and use their nogil methods. If the acquisition count >>>>> is greater than 1, the acquisition count owns a reference to the >>>>> object. If it reaches 0 you discard your owned reference (you can >>>>> simply acquire the GIL if you don't have it) and when you increment >>>>> from zero you obtain it. Perhaps something like libatomic could be >>>>> used to efficiently implement this. >>>> >>>> Where would you store that count? In the object struct? That would >>>> increase the size of each instance. >>> >>> Yes, not just the count, also the lock. This feature would be optional >>> and may be very useful for people (I think). >> >> Well, as long as it's an optional feature that requires a class decorator, >> the only obvious drawback is that it'll bloat the compiler even more than it >> is already. > > Actually, I think it will help the implementation of mutexes and async > objects if we want those, and possibly other stuff in the future. If all you want is to support the regular with statement in nogil blocks, part of that is implemented already. I recently added support for implementing the context manager's __enter__() method as c(p)def method. However, __exit__() isn't there yet, as it's a bit more tricky - maybe taking off a C pointer to the cdef method and calling that, or calling the cdef method directly instead (not sure), but always making sure that there still is a reference to the context manager itself, and eventually freeing it. I'm sure it can be done, though, maybe with some restrictions in nogil mode. If we additionally fix it up to use the exception propagation and try-finally support that you wrote for the with-gil feature, we're basically there. > The > acquisition counting is basically already there (for memoryviews), so > it's easy to track down where and when to apply this. However one > major problem would be circular acquisition counts, so you'd also have > to implement a garbage collector like CPython has (e.g. if you have a > cdef class with a cython.parallel.dict). We should just have a real > garbage collector instead of all the counting crap. Or we could make > it a burden for the user... Right, these things can grow endlessly. It took CPython something like a dozen years to a) recognise the need for and b) implement a garbage collector. Let's hope that Cython will never get one. > I agree that this is really not as feasible as I first thought. It > actually shows me a problem where I can have a memoryview object in a > memoryview with dtype 'object', although the problem here is that the > memoryview object doesn't traverse the object in the Py_buffer, or > when coerced from a memoryview slice to a memoryview object, the > memoryview slice struct object... I suppose I need to fix that (but > I'm not sure how, as you can't provide a manual traverse function in > Cython). No, you may have to descend into C here. Or, you could disable a Python object dtype for the time being? > But I really believe that these are much-wanted features. If you're > using threads in Python you can only get concurrency not parallelism, > unless you release the GIL, even if there is some performance overhead > it will still be a lot better than sequential execution. Perhaps when > cython.parallel will be more mature, we may get functionality to > specify data distribution schemes and message passing, in which case > the GIL won't be a problem. But many things would be harder or much > more expensive, e.g. transposing, sending objects etc. See? That's what I mean with language complexity. These things quickly turn into an open can of worms. I don't think the language should handle any of these. Message passing is up to libraries, for example. If you want language support, use Erlang. >>>>> The advantages are: >>>>> >>>>> 1) allow users to pass around cdef typed objects in nogil mode >>>>> 2) allow cdef typed objects in as struct attributes or array elements >>>>> 3) make it easy to implement things like memoryviews (already done but >>>>> would have been a lot easier), cython.parallel.async/future objects, >>>>> cython.parallel.mutex objects and possibly other things in the future >>>> >>>> Would it really be easier? You can already call cdef methods in nogil >>>> mode, >>>> AFAIR. >>> >>> Sure, but you cannot store cdef objects as struct attributes, array >>> elements (you could implement it with reference counting, but not for >>> nogil mode) >> >> You could do that with borrowed references, though, assuming that you keep >> another reference around (or do your own ref-counting). However, I do see >> that keeping a real reference around may be hard to do in some cases. >> >> >>> and you cannot pass them around without the GIL. >> >> Yes, you can, as long as you only go through cdef functions. Obviously, you >> can't pass them into a Python function call, but you can (and could, if it >> was implemented) do loads of useful things with existing references even in >> nogil sections. The GIL checker is quite fine grained already but could do >> even better. >> > > Ok, so cdef arguments are borrowed, which gets you somewhere but not > very far. It's rather baffling that f(x) is fine in nogil mode, but y > = x isn't. "y = x" could work if it's using borrowed references, though. The "borrowed" flag could be inferred automatically in nogil mode. Then it would only be an error if the user explicitly declared it as owned. >>> This >>> proposal is about making your life easier without the GIL, and >>> currently it's kind of a pain. >> >> The nogil sections I use are usually quite short, so I can't tell. It's >> certainly a pain to work without the GIL, because it means you have to take >> a lot more care when writing your code. But that won't change just by >> dropping reference counting. And nogil code will definitely become another >> bit harder to get right when using borrowed references. >> >> >>> Ah I assumed cpdef nogil was invalid, I see it isn't, cool. >> >> It makes perfect sense. Just because a function *can* be called without the >> GIL doesn't mean it can't be called from Python. So the Python wrapper >> requires the GIL, but the underlying cdef function doesn't. >> >> >>> This breaks terribly for special methods though. >> >> Why? It's just a matter of properly separating out their Python wrapper. >> That's why I was referring to the DefNode refactoring. > > I see, ok. All I meant was that it currently gives you compile errors. I know. I've given ticket #3 enough (smaller) tries to know basically all problems by now. > Anyway, sorry for the long mail. I agree this is likely not feasible > to implement, although I would like the functionality to be there. > Perhaps I'm trying to solve problems which don't really need to be > solved. Maybe we should just use multiprocessing, or MPI and numpy > with global arrays and pickling. Maybe memoryviews could help out with > that as well. In any case, I think we should let the existing features settle for a while, and see what users come up with. Not every feature that *can* be done is worth making a language feature. Stefan From markflorisson88 at gmail.com Tue Oct 25 20:45:46 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 25 Oct 2011 19:45:46 +0100 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA6FBAB.5070301@behnel.de> References: <4EA66648.8030102@behnel.de> <4EA69BE0.3060307@behnel.de> <4EA6FBAB.5070301@behnel.de> Message-ID: On 25 October 2011 19:10, Stefan Behnel wrote: > mark florisson, 25.10.2011 18:58: >> >> On 25 October 2011 12:22, Stefan Behnel wrote: >>> >>> mark florisson, 25.10.2011 11:11: >>>> >>>> On 25 October 2011 08:33, Stefan Behnel wrote: >>>>> >>>>> mark florisson, 24.10.2011 21:50: >>>>>> >>>>>> What if we support acquisition counting for every instance of a cdef >>>>>> class? In Python and Cython GIL mode you use reference counting, and >>>>>> in Cython nogil mode and for structs attributes, array dtypes etc you >>>>>> use acquisition counting. This allows you to pass around cdef objects >>>>>> without the GIL and use their nogil methods. If the acquisition count >>>>>> is greater than 1, the acquisition count owns a reference to the >>>>>> object. If it reaches 0 you discard your owned reference (you can >>>>>> simply acquire the GIL if you don't have it) and when you increment >>>>>> from zero you obtain it. Perhaps something like libatomic could be >>>>>> used to efficiently implement this. >>>>> >>>>> Where would you store that count? In the object struct? That would >>>>> increase the size of each instance. >>>> >>>> Yes, not just the count, also the lock. This feature would be optional >>>> and may be very useful for people (I think). >>> >>> Well, as long as it's an optional feature that requires a class >>> decorator, >>> the only obvious drawback is that it'll bloat the compiler even more than >>> it >>> is already. >> >> Actually, I think it will help the implementation of mutexes and async >> objects if we want those, and possibly other stuff in the future. > > If all you want is to support the regular with statement in nogil blocks, > part of that is implemented already. I recently added support for > implementing the context manager's __enter__() method as c(p)def method. > However, __exit__() isn't there yet, as it's a bit more tricky - maybe > taking off a C pointer to the cdef method and calling that, or calling the > cdef method directly instead (not sure), but always making sure that there > still is a reference to the context manager itself, and eventually freeing > it. I'm sure it can be done, though, maybe with some restrictions in nogil > mode. If we additionally fix it up to use the exception propagation and > try-finally support that you wrote for the with-gil feature, we're basically > there. > Cool. I suppose if you combine that with borrowed references you may just get somewhere implementing the mutexes. On the other hand it won't really be more convenient than passing OpenMP or Python locks around, just slightly more pythonic. >> The >> acquisition counting is basically already there (for memoryviews), so >> it's easy to track down where and when to apply this. However one >> major problem would be circular acquisition counts, so you'd also have >> to implement a garbage collector like CPython has (e.g. if you have a >> cdef class with a cython.parallel.dict). We should just have a real >> garbage collector instead of all the counting crap. Or we could make >> it a burden for the user... > > Right, these things can grow endlessly. It took CPython something like a > dozen years to a) recognise the need for and b) implement a garbage > collector. Let's hope that Cython will never get one. > > >> I agree that this is really not as feasible as I first thought. It >> actually shows me a problem where I can have a memoryview object in a >> memoryview with dtype 'object', although the problem here is that the >> memoryview object doesn't traverse the object in the Py_buffer, or >> when coerced from a memoryview slice to a memoryview object, the >> memoryview slice struct object... I suppose I need to fix that (but >> I'm not sure how, as you can't provide a manual traverse function in >> Cython). > > No, you may have to descend into C here. Or, you could disable a Python > object dtype for the time being? > Yes disabling would be easy, but it should be fixed (at some point). Perhaps I can just override the tp_traverse of the type object in the module init function (and maybe save that pointer and call it from the new function + traverse the Py_buffer). I'm not entire sure how we support Py_buffer, but it is a built-in thing and it doesn't result in a traverse: cdef class X(object): cdef Py_buffer view <- this won't have a traverse function. Fixing that won't get me there though, I need to do the same thing for memoryview objects wrapping a memoryview struct. >> But I really believe that these are much-wanted features. If you're >> using threads in Python you can only get concurrency not parallelism, >> unless you release the GIL, even if there is some performance overhead >> it will still be a lot better than sequential execution. Perhaps when >> cython.parallel will be more mature, we may get functionality to >> specify data distribution schemes and message passing, in which case >> the GIL won't be a problem. But many things would be harder or much >> more expensive, e.g. transposing, sending objects etc. > > See? That's what I mean with language complexity. These things quickly turn > into an open can of worms. I don't think the language should handle any of > these. Message passing is up to libraries, for example. If you want language > support, use Erlang. > I haven't used Erlang (though I should give it a go), but I find that built-in support for these things just ends up to be much more elegant. MPI (and possibly zeromq) just look terrible and complicated if you compare them to Unified Parallel C, High Performance Fortran or Co-Array Fortran. I don't know about Go channels. This doesn't mean that we should support it, but we might consider it. >>>>>> The advantages are: >>>>>> >>>>>> 1) allow users to pass around cdef typed objects in nogil mode >>>>>> 2) allow cdef typed objects in as struct attributes or array elements >>>>>> 3) make it easy to implement things like memoryviews (already done but >>>>>> would have been a lot easier), cython.parallel.async/future objects, >>>>>> cython.parallel.mutex objects and possibly other things in the future >>>>> >>>>> Would it really be easier? You can already call cdef methods in nogil >>>>> mode, >>>>> AFAIR. >>>> >>>> Sure, but you cannot store cdef objects as struct attributes, array >>>> elements (you could implement it with reference counting, but not for >>>> nogil mode) >>> >>> You could do that with borrowed references, though, assuming that you >>> keep >>> another reference around (or do your own ref-counting). However, I do see >>> that keeping a real reference around may be hard to do in some cases. >>> >>> >>>> and you cannot pass them around without the GIL. >>> >>> Yes, you can, as long as you only go through cdef functions. Obviously, >>> you >>> can't pass them into a Python function call, but you can (and could, if >>> it >>> was implemented) do loads of useful things with existing references even >>> in >>> nogil sections. The GIL checker is quite fine grained already but could >>> do >>> even better. >>> >> >> Ok, so cdef arguments are borrowed, which gets you somewhere but not >> very far. It's rather baffling that f(x) is fine in nogil mode, but y >> = x isn't. > > "y = x" could work if it's using borrowed references, though. The "borrowed" > flag could be inferred automatically in nogil mode. Then it would only be an > error if the user explicitly declared it as owned. > I think inferring that would be hard, unless x is already borrowed. E.g. I could do 'with gil: x = None' and it might break. It would be cool if it could detect that though. >>>> This >>>> proposal is about making your life easier without the GIL, and >>>> currently it's kind of a pain. >>> >>> The nogil sections I use are usually quite short, so I can't tell. It's >>> certainly a pain to work without the GIL, because it means you have to >>> take >>> a lot more care when writing your code. But that won't change just by >>> dropping reference counting. And nogil code will definitely become >>> another >>> bit harder to get right when using borrowed references. >>> >>> >>>> Ah I assumed cpdef nogil was invalid, I see it isn't, cool. >>> >>> It makes perfect sense. Just because a function *can* be called without >>> the >>> GIL doesn't mean it can't be called from Python. So the Python wrapper >>> requires the GIL, but the underlying cdef function doesn't. >>> >>> >>>> This breaks terribly for special methods though. >>> >>> Why? It's just a matter of properly separating out their Python wrapper. >>> That's why I was referring to the DefNode refactoring. >> >> I see, ok. All I meant was that it currently gives you compile errors. > > I know. I've given ticket #3 enough (smaller) tries to know basically all > problems by now. > > >> Anyway, sorry for the long mail. I agree this is likely not feasible >> to implement, although I would like the functionality to be there. >> Perhaps I'm trying to solve problems which don't really need to be >> solved. Maybe we should just use multiprocessing, or MPI and numpy >> with global arrays and pickling. Maybe memoryviews could help out with >> that as well. > > In any case, I think we should let the existing features settle for a while, > and see what users come up with. Not every feature that *can* be done is > worth making a language feature. Definitely. It's just that you regularly find users who want to do things in nogil mode that they just can't. Or to have arrays or some such of objects etc. Lettings things settle is a good idea. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From d.s.seljebotn at astro.uio.no Tue Oct 25 21:01:00 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 25 Oct 2011 21:01:00 +0200 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: References: <4EA66648.8030102@behnel.de> <4EA69BE0.3060307@behnel.de> Message-ID: <4EA7076C.1090708@astro.uio.no> On 10/25/2011 06:58 PM, mark florisson wrote: > On 25 October 2011 12:22, Stefan Behnel wrote: >> The problem is not so much the INCREF (which is just an indirect add), it's >> the DECREF, which contains a conditional jump based on an unknown external >> value, that may trigger external code. That can kill several C compiler >> optimisations for the surrounding code. (And that would only get worse by >> using a dedicated locking mechanism.) What you could do is a form of psuedo-garbage-collection where, when the Cython refcount/acquisition count reaches 0, you enqueue a Python DECREF until you're holding the GIL anyway. If sticking it into the queue is unlikely(), and it is transparent to the compiler that it doesn't dispatch into unknown code. (And regarding Stefan's comment about Erlang: It's all about available libraries. A language for concurrent computing running on CPython and able to use all the libraries available for CPython would be awesome. It doesn't need to be named Cython -- show me an Erlang port to the CPython platform and I'd perhaps jump ship.) > Anyway, sorry for the long mail. I agree this is likely not feasible > to implement, although I would like the functionality to be there. > Perhaps I'm trying to solve problems which don't really need to be > solved. Maybe we should just use multiprocessing, or MPI and numpy > with global arrays and pickling. Maybe memoryviews could help out with > that as well. Nice conclusion. I think prange was a very nice 80%-there-solution (which is also the way we framed it when starting), but the GIL just creates to many barriers. Real garbage collection is needed, and CPython just isn't there. What I'd like to see personally is: - A convenient utility to allocate an array in shared memory, so that when you pickle a view of it and send it to another Python process with multiprocessing and it unpickles, it gets a slice into to the same shared memory. People already do this but it's just a lot of jumping through hoops. A good place would probably be in NumPy. - Decent message passing using ZeroMQ in Cython code without any Python overhead, for fine-grained communication in Cython code in Python processes spawned using multiprocessing. I think this requires some syntax candy in Cython to feel natural enough, but perhaps it can be put on a form so that it is not ZeroMQ-specific. Dag Sverre From d.s.seljebotn at astro.uio.no Tue Oct 25 21:02:19 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 25 Oct 2011 21:02:19 +0200 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA7076C.1090708@astro.uio.no> References: <4EA66648.8030102@behnel.de> <4EA69BE0.3060307@behnel.de> <4EA7076C.1090708@astro.uio.no> Message-ID: <4EA707BB.9060408@astro.uio.no> On 10/25/2011 09:01 PM, Dag Sverre Seljebotn wrote: > On 10/25/2011 06:58 PM, mark florisson wrote: >> On 25 October 2011 12:22, Stefan Behnel wrote: >>> The problem is not so much the INCREF (which is just an indirect >>> add), it's >>> the DECREF, which contains a conditional jump based on an unknown >>> external >>> value, that may trigger external code. That can kill several C compiler >>> optimisations for the surrounding code. (And that would only get >>> worse by >>> using a dedicated locking mechanism.) > > What you could do is a form of psuedo-garbage-collection where, when the > Cython refcount/acquisition count reaches 0, you enqueue a Python DECREF > until you're holding the GIL anyway. If sticking it into the queue is > unlikely(), and it is transparent to the compiler that it doesn't > dispatch into unknown code. ...then the C compiler optimizations should presumably not be killed. DS > > (And regarding Stefan's comment about Erlang: It's all about available > libraries. A language for concurrent computing running on CPython and > able to use all the libraries available for CPython would be awesome. It > doesn't need to be named Cython -- show me an Erlang port to the CPython > platform and I'd perhaps jump ship.) > > >> Anyway, sorry for the long mail. I agree this is likely not feasible >> to implement, although I would like the functionality to be there. >> Perhaps I'm trying to solve problems which don't really need to be >> solved. Maybe we should just use multiprocessing, or MPI and numpy >> with global arrays and pickling. Maybe memoryviews could help out with >> that as well. > > Nice conclusion. I think prange was a very nice 80%-there-solution > (which is also the way we framed it when starting), but the GIL just > creates to many barriers. Real garbage collection is needed, and CPython > just isn't there. > > What I'd like to see personally is: > > - A convenient utility to allocate an array in shared memory, so that > when you pickle a view of it and send it to another Python process with > multiprocessing and it unpickles, it gets a slice into to the same > shared memory. People already do this but it's just a lot of jumping > through hoops. A good place would probably be in NumPy. > > - Decent message passing using ZeroMQ in Cython code without any Python > overhead, for fine-grained communication in Cython code in Python > processes spawned using multiprocessing. I think this requires some > syntax candy in Cython to feel natural enough, but perhaps it can be put > on a form so that it is not ZeroMQ-specific. > > Dag Sverre From d.s.seljebotn at astro.uio.no Tue Oct 25 21:15:26 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 25 Oct 2011 21:15:26 +0200 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: References: <4EA66648.8030102@behnel.de> <4EA69BE0.3060307@behnel.de> <4EA6FBAB.5070301@behnel.de> Message-ID: <4EA70ACE.3000401@astro.uio.no> On 10/25/2011 08:45 PM, mark florisson wrote: > On 25 October 2011 19:10, Stefan Behnel wrote: >> See? That's what I mean with language complexity. These things quickly turn >> into an open can of worms. I don't think the language should handle any of >> these. Message passing is up to libraries, for example. If you want language >> support, use Erlang. >> > > I haven't used Erlang (though I should give it a go), but I find that > built-in support for these things just ends up to be much more > elegant. MPI (and possibly zeromq) just look terrible and complicated > if you compare them to Unified Parallel C, High Performance Fortran or Using libraries for message passing is sort of like doing complex string manipulation only using malloc, free, and string.h :-) > Co-Array Fortran. I don't know about Go channels. This doesn't mean > that we should support it, but we might consider it. I think you should definitely read up on Go channels, they're just like what I'd like to write in Cython. Dag Sverre From markflorisson88 at gmail.com Tue Oct 25 21:24:13 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 25 Oct 2011 20:24:13 +0100 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA7076C.1090708@astro.uio.no> References: <4EA66648.8030102@behnel.de> <4EA69BE0.3060307@behnel.de> <4EA7076C.1090708@astro.uio.no> Message-ID: On 25 October 2011 20:01, Dag Sverre Seljebotn wrote: > On 10/25/2011 06:58 PM, mark florisson wrote: >> >> On 25 October 2011 12:22, Stefan Behnel ?wrote: >>> >>> The problem is not so much the INCREF (which is just an indirect add), >>> it's >>> the DECREF, which contains a conditional jump based on an unknown >>> external >>> value, that may trigger external code. That can kill several C compiler >>> optimisations for the surrounding code. (And that would only get worse by >>> using a dedicated locking mechanism.) > > What you could do is a form of psuedo-garbage-collection where, when the > Cython refcount/acquisition count reaches 0, you enqueue a Python DECREF > until you're holding the GIL anyway. If sticking it into the queue is > unlikely(), and it is transparent to the compiler that it doesn't dispatch > into unknown code. I thought about that as wel, but the problem is that you can only defer the DECREF to a garbage collector if your acquisition count reaches zero and your reference count is one. However, you may reach an acquisition count of zero with a reference count > 1, which means you could have the following race: 1) acquisition count reaches zero, a DECREF is pending in the garbage collector thread 2) you obtain a nonzero acquisition count from the object (e.g. by assigning a non-typed to a typed variable) 3) you lose your acquisition count again, another DECREF should be pending 4) the garbage collector figures out it needs to DECREF (it should actually do this twice) Now, you could keep a counter for how many times that happens, but that will likely not be better than an immediate DECREF. In short, reference counting is terrible. I think unlikely() will help the compiler here as you said though, and your processor will have branch prediction, out of order execution and conditional instructions which may all help. > (And regarding Stefan's comment about Erlang: It's all about available > libraries. A language for concurrent computing running on CPython and able > to use all the libraries available for CPython would be awesome. It doesn't > need to be named Cython -- show me an Erlang port to the CPython platform > and I'd perhaps jump ship.) > > >> Anyway, sorry for the long mail. I agree this is likely not feasible >> to implement, although I would like the functionality to be there. >> Perhaps I'm trying to solve problems which don't really need to be >> solved. Maybe we should just use multiprocessing, or MPI and numpy >> with global arrays and pickling. Maybe memoryviews could help out with >> that as well. > > Nice conclusion. I think prange was a very nice 80%-there-solution (which is > also the way we framed it when starting), but the GIL just creates to many > barriers. Real garbage collection is needed, and CPython just isn't there. > > What I'd like to see personally is: > > ?- A convenient utility to allocate an array in shared memory, so that when > you pickle a view of it and send it to another Python process with > multiprocessing and it unpickles, it gets a slice into to the same shared > memory. People already do this but it's just a lot of jumping through hoops. > A good place would probably be in NumPy. I haven't used it myself, but can the global array support help in that regard? > ?- Decent message passing using ZeroMQ in Cython code without any Python > overhead, for fine-grained communication in Cython code in Python processes > spawned using multiprocessing. I think this requires some syntax candy in > Cython to feel natural enough, but perhaps it can be put on a form so that > it is not ZeroMQ-specific. > > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Tue Oct 25 21:24:49 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 25 Oct 2011 20:24:49 +0100 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA70ACE.3000401@astro.uio.no> References: <4EA66648.8030102@behnel.de> <4EA69BE0.3060307@behnel.de> <4EA6FBAB.5070301@behnel.de> <4EA70ACE.3000401@astro.uio.no> Message-ID: On 25 October 2011 20:15, Dag Sverre Seljebotn wrote: > On 10/25/2011 08:45 PM, mark florisson wrote: >> >> On 25 October 2011 19:10, Stefan Behnel ?wrote: >>> >>> See? That's what I mean with language complexity. These things quickly >>> turn >>> into an open can of worms. I don't think the language should handle any >>> of >>> these. Message passing is up to libraries, for example. If you want >>> language >>> support, use Erlang. >>> >> >> I haven't used Erlang (though I should give it a go), but I find that >> built-in support for these things just ends up to be much more >> elegant. MPI (and possibly zeromq) just look terrible and complicated >> if you compare them to Unified Parallel C, High Performance Fortran or > > Using libraries for message passing is sort of like doing complex string > manipulation only using malloc, free, and string.h :-) > >> Co-Array Fortran. I don't know about Go channels. This doesn't mean >> that we should support it, but we might consider it. > > I think you should definitely read up on Go channels, they're just like what > I'd like to write in Cython. That's a good motivator :) I'll do that. > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From greg.ewing at canterbury.ac.nz Wed Oct 26 00:27:19 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 26 Oct 2011 11:27:19 +1300 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA6B999.7060807@astro.uio.no> References: <4EA66648.8030102@behnel.de> <4EA6B999.7060807@astro.uio.no> Message-ID: <4EA737C7.5010500@canterbury.ac.nz> Dag Sverre Seljebotn wrote: > I'd gladly take a factor two (or even four) slowdown of CPython code any > day to get rid of the GIL :-). The thing is, sometimes one has 48 cores > and consider a 10x speedup better than nothing... Another thing to consider is that locking around refcount changes may not be as expensive in typical Cython code as it is in Python. The trouble with Python is that you can't so much as scratch your nose without touching a big pile of ref counts. But if the Cython code is only dealing with a few Python objects and doing most of its work at the C level, the relative overhead of locking around refcount changes may not be significant. So it may be worth trying the strategy of just acquiring the GIL whenever a refcount needs to be changed in a nogil section, and damn the consequences. -- Greg From stefan_ml at behnel.de Wed Oct 26 09:56:35 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 26 Oct 2011 09:56:35 +0200 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA737C7.5010500@canterbury.ac.nz> References: <4EA66648.8030102@behnel.de> <4EA6B999.7060807@astro.uio.no> <4EA737C7.5010500@canterbury.ac.nz> Message-ID: <4EA7BD33.60009@behnel.de> Greg Ewing, 26.10.2011 00:27: > Dag Sverre Seljebotn wrote: > >> I'd gladly take a factor two (or even four) slowdown of CPython code any >> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores >> and consider a 10x speedup better than nothing... > > Another thing to consider is that locking around refcount > changes may not be as expensive in typical Cython code as > it is in Python. > > The trouble with Python is that you can't so much as scratch > your nose without touching a big pile of ref counts. But > if the Cython code is only dealing with a few Python objects > and doing most of its work at the C level, the relative > overhead of locking around refcount changes may not be > significant. > > So it may be worth trying the strategy of just acquiring > the GIL whenever a refcount needs to be changed in a nogil > section, and damn the consequences. Hmm, interesting. That would give new semantics to "nogil" sections, basically: """ You can do Python interaction in nogil code, however, this will slow down your code. Cython will generate C code to acquire and release the GIL around any Python interaction that your code performs, thus serialising any calls into the CPython runtime. If you want to avoid this serialisation, use "cython -a" to find out where Python interaction happens and use static typing to let Cython generate C code instead. """ In other words: "with gil" sections hold the GIL by default and give it away on explicit request, whereas "nogil" sections have the GIL released by default and acquire it on implicit need. The advantage over object level locking is that this does not increase the in-memory size of the object structs, and that it works with *any* Python object, not just extension types with a compile time known type. I kind of like that. Stefan From markflorisson88 at gmail.com Wed Oct 26 11:45:06 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 26 Oct 2011 10:45:06 +0100 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA7BD33.60009@behnel.de> References: <4EA66648.8030102@behnel.de> <4EA6B999.7060807@astro.uio.no> <4EA737C7.5010500@canterbury.ac.nz> <4EA7BD33.60009@behnel.de> Message-ID: On 26 October 2011 08:56, Stefan Behnel wrote: > Greg Ewing, 26.10.2011 00:27: >> >> Dag Sverre Seljebotn wrote: >> >>> I'd gladly take a factor two (or even four) slowdown of CPython code any >>> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores >>> and consider a 10x speedup better than nothing... >> >> Another thing to consider is that locking around refcount >> changes may not be as expensive in typical Cython code as >> it is in Python. >> >> The trouble with Python is that you can't so much as scratch >> your nose without touching a big pile of ref counts. But >> if the Cython code is only dealing with a few Python objects >> and doing most of its work at the C level, the relative >> overhead of locking around refcount changes may not be >> significant. >> >> So it may be worth trying the strategy of just acquiring >> the GIL whenever a refcount needs to be changed in a nogil >> section, and damn the consequences. > > Hmm, interesting. That would give new semantics to "nogil" sections, > basically: > > """ > You can do Python interaction in nogil code, however, this will slow down > your code. Cython will generate C code to acquire and release the GIL around > any Python interaction that your code performs, thus serialising any calls > into the CPython runtime. If you want to avoid this serialisation, use > "cython -a" to find out where Python interaction happens and use static > typing to let Cython generate C code instead. > """ > > In other words: "with gil" sections hold the GIL by default and give it away > on explicit request, whereas "nogil" sections have the GIL released by > default and acquire it on implicit need. > > The advantage over object level locking is that this does not increase the > in-memory size of the object structs, and that it works with *any* Python > object, not just extension types with a compile time known type. > > I kind of like that. My problem with that is that if there if any other python thread, you're likely just going to sleep for thousands of CPU cycles as that thread will keep the GIL. Doing this implicitly for operations with such overhead would be unacceptable. I think writing 'with gil:' is fine, it's the performance that's the problem in the first place which prevents you from doing that, not the 9 characters you need to type. What I would like is having Cython infer whether the GIL is needed for a function, and mark it "implicitly nogil", so it can be called from nogil contexts without actually having to declare it nogil. This would only work for non-extern things, and you would still need to declare it nogil in your pxd if you want to export it. Apparently many users (even those that have used Cython quite a bit) are confused with what nogil on functions actually does (or they are not even aware it exists). > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From d.s.seljebotn at astro.uio.no Wed Oct 26 12:23:15 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 26 Oct 2011 12:23:15 +0200 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: References: <4EA66648.8030102@behnel.de> <4EA6B999.7060807@astro.uio.no> <4EA737C7.5010500@canterbury.ac.nz> <4EA7BD33.60009@behnel.de> Message-ID: <4EA7DF93.6090700@astro.uio.no> On 10/26/2011 11:45 AM, mark florisson wrote: > On 26 October 2011 08:56, Stefan Behnel wrote: >> Greg Ewing, 26.10.2011 00:27: >>> >>> Dag Sverre Seljebotn wrote: >>> >>>> I'd gladly take a factor two (or even four) slowdown of CPython code any >>>> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores >>>> and consider a 10x speedup better than nothing... >>> >>> Another thing to consider is that locking around refcount >>> changes may not be as expensive in typical Cython code as >>> it is in Python. >>> >>> The trouble with Python is that you can't so much as scratch >>> your nose without touching a big pile of ref counts. But >>> if the Cython code is only dealing with a few Python objects >>> and doing most of its work at the C level, the relative >>> overhead of locking around refcount changes may not be >>> significant. >>> >>> So it may be worth trying the strategy of just acquiring >>> the GIL whenever a refcount needs to be changed in a nogil >>> section, and damn the consequences. >> >> Hmm, interesting. That would give new semantics to "nogil" sections, >> basically: >> >> """ >> You can do Python interaction in nogil code, however, this will slow down >> your code. Cython will generate C code to acquire and release the GIL around >> any Python interaction that your code performs, thus serialising any calls >> into the CPython runtime. If you want to avoid this serialisation, use >> "cython -a" to find out where Python interaction happens and use static >> typing to let Cython generate C code instead. >> """ >> >> In other words: "with gil" sections hold the GIL by default and give it away >> on explicit request, whereas "nogil" sections have the GIL released by >> default and acquire it on implicit need. >> >> The advantage over object level locking is that this does not increase the >> in-memory size of the object structs, and that it works with *any* Python >> object, not just extension types with a compile time known type. >> >> I kind of like that. > > My problem with that is that if there if any other python thread, > you're likely just going to sleep for thousands of CPU cycles as that > thread will keep the GIL. Doing this implicitly for operations with > such overhead would be unacceptable. I think writing 'with gil:' is > fine, it's the performance that's the problem in the first place which > prevents you from doing that, not the 9 characters you need to type. You are sure about the complete impossibility of having a seperate thread doing all INCREFs and DECREFs posted to it asynchronously (in the order they are posted), without race conditions? > > What I would like is having Cython infer whether the GIL is needed for > a function, and mark it "implicitly nogil", so it can be called from > nogil contexts without actually having to declare it nogil. This would > only work for non-extern things, and you would still need to declare > it nogil in your pxd if you want to export it. Apparently many users > (even those that have used Cython quite a bit) are confused with what > nogil on functions actually does (or they are not even aware it > exists). There's a long thread by me and Robert (and some of Stefan) on this from a couple of months back, don't know if you read it. You could support exports across pxds as well. Basically for *every* cdef function, export two function pointers: 1) To a wrapper to be called if you hold the GIL (outside nogil sections) 2) To a wrapper to be called if you don't hold the GIL, or don't know whether you hold the GIL (the wrapper can acquire the GIL if needed) Taking the address of a function (for passing to C, e.g.) would give you the one that can be called without holding the GIL. The implications should hopefully be getting rid of "with gil" and "nogil" on function declarations entirely. Dag Sverre From d.s.seljebotn at astro.uio.no Wed Oct 26 12:29:18 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 26 Oct 2011 12:29:18 +0200 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: References: <4EA66648.8030102@behnel.de> <4EA6B999.7060807@astro.uio.no> <4EA737C7.5010500@canterbury.ac.nz> <4EA7BD33.60009@behnel.de> Message-ID: <4EA7E0FE.1020605@astro.uio.no> On 10/26/2011 11:45 AM, mark florisson wrote: > On 26 October 2011 08:56, Stefan Behnel wrote: >> Greg Ewing, 26.10.2011 00:27: >>> >>> Dag Sverre Seljebotn wrote: >>> >>>> I'd gladly take a factor two (or even four) slowdown of CPython code any >>>> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores >>>> and consider a 10x speedup better than nothing... >>> >>> Another thing to consider is that locking around refcount >>> changes may not be as expensive in typical Cython code as >>> it is in Python. >>> >>> The trouble with Python is that you can't so much as scratch >>> your nose without touching a big pile of ref counts. But >>> if the Cython code is only dealing with a few Python objects >>> and doing most of its work at the C level, the relative >>> overhead of locking around refcount changes may not be >>> significant. >>> >>> So it may be worth trying the strategy of just acquiring >>> the GIL whenever a refcount needs to be changed in a nogil >>> section, and damn the consequences. >> >> Hmm, interesting. That would give new semantics to "nogil" sections, >> basically: >> >> """ >> You can do Python interaction in nogil code, however, this will slow down >> your code. Cython will generate C code to acquire and release the GIL around >> any Python interaction that your code performs, thus serialising any calls >> into the CPython runtime. If you want to avoid this serialisation, use >> "cython -a" to find out where Python interaction happens and use static >> typing to let Cython generate C code instead. >> """ >> >> In other words: "with gil" sections hold the GIL by default and give it away >> on explicit request, whereas "nogil" sections have the GIL released by >> default and acquire it on implicit need. >> >> The advantage over object level locking is that this does not increase the >> in-memory size of the object structs, and that it works with *any* Python >> object, not just extension types with a compile time known type. >> >> I kind of like that. > > My problem with that is that if there if any other python thread, > you're likely just going to sleep for thousands of CPU cycles as that > thread will keep the GIL. Doing this implicitly for operations with > such overhead would be unacceptable. I think writing 'with gil:' is > fine, it's the performance that's the problem in the first place which > prevents you from doing that, not the 9 characters you need to type. I'm with Stefan here. We have more or less the exact same problem if you inadvertendly do arithmetic with Python floats rather than C doubles. The workflow then is to check the HTML for yellow lines. Same with the GIL (we could even introduce a new color in the HTML report for where you hold the GIL and not). The advice to get fast code is But, we should also introduce directives that emit warnings in both of these situations, that you can use while developing to quickly pinpoint source code lines ("Type of variable not inferred", "GIL automatically acquired"). DS > > What I would like is having Cython infer whether the GIL is needed for > a function, and mark it "implicitly nogil", so it can be called from > nogil contexts without actually having to declare it nogil. This would > only work for non-extern things, and you would still need to declare > it nogil in your pxd if you want to export it. Apparently many users > (even those that have used Cython quite a bit) are confused with what > nogil on functions actually does (or they are not even aware it > exists). > >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Wed Oct 26 12:30:11 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 26 Oct 2011 12:30:11 +0200 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA7E0FE.1020605@astro.uio.no> References: <4EA66648.8030102@behnel.de> <4EA6B999.7060807@astro.uio.no> <4EA737C7.5010500@canterbury.ac.nz> <4EA7BD33.60009@behnel.de> <4EA7E0FE.1020605@astro.uio.no> Message-ID: <4EA7E133.3000806@astro.uio.no> On 10/26/2011 12:29 PM, Dag Sverre Seljebotn wrote: > On 10/26/2011 11:45 AM, mark florisson wrote: >> On 26 October 2011 08:56, Stefan Behnel wrote: >>> Greg Ewing, 26.10.2011 00:27: >>>> >>>> Dag Sverre Seljebotn wrote: >>>> >>>>> I'd gladly take a factor two (or even four) slowdown of CPython >>>>> code any >>>>> day to get rid of the GIL :-). The thing is, sometimes one has 48 >>>>> cores >>>>> and consider a 10x speedup better than nothing... >>>> >>>> Another thing to consider is that locking around refcount >>>> changes may not be as expensive in typical Cython code as >>>> it is in Python. >>>> >>>> The trouble with Python is that you can't so much as scratch >>>> your nose without touching a big pile of ref counts. But >>>> if the Cython code is only dealing with a few Python objects >>>> and doing most of its work at the C level, the relative >>>> overhead of locking around refcount changes may not be >>>> significant. >>>> >>>> So it may be worth trying the strategy of just acquiring >>>> the GIL whenever a refcount needs to be changed in a nogil >>>> section, and damn the consequences. >>> >>> Hmm, interesting. That would give new semantics to "nogil" sections, >>> basically: >>> >>> """ >>> You can do Python interaction in nogil code, however, this will slow >>> down >>> your code. Cython will generate C code to acquire and release the GIL >>> around >>> any Python interaction that your code performs, thus serialising any >>> calls >>> into the CPython runtime. If you want to avoid this serialisation, use >>> "cython -a" to find out where Python interaction happens and use static >>> typing to let Cython generate C code instead. >>> """ >>> >>> In other words: "with gil" sections hold the GIL by default and give >>> it away >>> on explicit request, whereas "nogil" sections have the GIL released by >>> default and acquire it on implicit need. >>> >>> The advantage over object level locking is that this does not >>> increase the >>> in-memory size of the object structs, and that it works with *any* >>> Python >>> object, not just extension types with a compile time known type. >>> >>> I kind of like that. >> >> My problem with that is that if there if any other python thread, >> you're likely just going to sleep for thousands of CPU cycles as that >> thread will keep the GIL. Doing this implicitly for operations with >> such overhead would be unacceptable. I think writing 'with gil:' is >> fine, it's the performance that's the problem in the first place which >> prevents you from doing that, not the 9 characters you need to type. > > I'm with Stefan here. We have more or less the exact same problem if you > inadvertendly do arithmetic with Python floats rather than C doubles. > The workflow then is to check the HTML for yellow lines. Same with the > GIL (we could even introduce a new color in the HTML report for where > you hold the GIL and not). > > The advice to get fast code is Sorry, I keep hitting post to early... "The advice to get fast code is still to 'eliminate the yellow lines'". DS > > But, we should also introduce directives that emit warnings in both of > these situations, that you can use while developing to quickly pinpoint > source code lines ("Type of variable not inferred", "GIL automatically > acquired"). > > DS > >> >> What I would like is having Cython infer whether the GIL is needed for >> a function, and mark it "implicitly nogil", so it can be called from >> nogil contexts without actually having to declare it nogil. This would >> only work for non-extern things, and you would still need to declare >> it nogil in your pxd if you want to export it. Apparently many users >> (even those that have used Cython quite a bit) are confused with what >> nogil on functions actually does (or they are not even aware it >> exists). >> >>> Stefan >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >>> >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Wed Oct 26 19:23:48 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 26 Oct 2011 18:23:48 +0100 Subject: [Cython] Acquisition counted cdef classes In-Reply-To: <4EA7DF93.6090700@astro.uio.no> References: <4EA66648.8030102@behnel.de> <4EA6B999.7060807@astro.uio.no> <4EA737C7.5010500@canterbury.ac.nz> <4EA7BD33.60009@behnel.de> <4EA7DF93.6090700@astro.uio.no> Message-ID: On 26 October 2011 11:23, Dag Sverre Seljebotn wrote: > On 10/26/2011 11:45 AM, mark florisson wrote: >> >> On 26 October 2011 08:56, Stefan Behnel ?wrote: >>> >>> Greg Ewing, 26.10.2011 00:27: >>>> >>>> Dag Sverre Seljebotn wrote: >>>> >>>>> I'd gladly take a factor two (or even four) slowdown of CPython code >>>>> any >>>>> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores >>>>> and consider a 10x speedup better than nothing... >>>> >>>> Another thing to consider is that locking around refcount >>>> changes may not be as expensive in typical Cython code as >>>> it is in Python. >>>> >>>> The trouble with Python is that you can't so much as scratch >>>> your nose without touching a big pile of ref counts. But >>>> if the Cython code is only dealing with a few Python objects >>>> and doing most of its work at the C level, the relative >>>> overhead of locking around refcount changes may not be >>>> significant. >>>> >>>> So it may be worth trying the strategy of just acquiring >>>> the GIL whenever a refcount needs to be changed in a nogil >>>> section, and damn the consequences. >>> >>> Hmm, interesting. That would give new semantics to "nogil" sections, >>> basically: >>> >>> """ >>> You can do Python interaction in nogil code, however, this will slow down >>> your code. Cython will generate C code to acquire and release the GIL >>> around >>> any Python interaction that your code performs, thus serialising any >>> calls >>> into the CPython runtime. If you want to avoid this serialisation, use >>> "cython -a" to find out where Python interaction happens and use static >>> typing to let Cython generate C code instead. >>> """ >>> >>> In other words: "with gil" sections hold the GIL by default and give it >>> away >>> on explicit request, whereas "nogil" sections have the GIL released by >>> default and acquire it on implicit need. >>> >>> The advantage over object level locking is that this does not increase >>> the >>> in-memory size of the object structs, and that it works with *any* Python >>> object, not just extension types with a compile time known type. >>> >>> I kind of like that. >> >> My problem with that is that if there if any other python thread, >> you're likely just going to sleep for thousands of CPU cycles as that >> thread will keep the GIL. Doing this implicitly for operations with >> such overhead would be unacceptable. I think writing 'with gil:' is >> fine, it's the performance that's the problem in the first place which >> prevents you from doing that, not the 9 characters you need to type. > > You are sure about the complete impossibility of having a seperate thread > doing all INCREFs and DECREFs posted to it asynchronously (in the order they > are posted), without race conditions? No I think it is possible, but I don't believe it will solve the DECREF C compiler optimization prevention problem (unlikely() should help there though) as it will still have to submit an asynchronous DECREF without races which means it has to call some kind of (synchronized or atomically operating) function (which prevented the optimization). It would be nice to have as it would mean you can pass stuff around in nogil mode without acquisition counting, and it would mean you can implement these types that can be used in nogil mode and can synchronize using their own lock (if needed). I wonder if deferring INCREFs are safe though. What if you have one reference, you INCREF (deferred, because you don't have the GIL), you call some function that steals your reference (after you obtained the GIL), you somehow cause the program to lose the stolen reference which causes it to be collected, and then the reference counter thread decides to do the INCREF (too late). You also cannot atomically INCREF, and Python doesn't do that, so there could be a race there as well. So I think you really need the GIL to INCREF, and you need to do it synchronously (I'm not completely sure, please feel free to poke holes in my logic any time :). I think it would be nicer to just fix this in CPython in any case, though. Reference counting is terrible to work with in general (regardless of whether you do them immediately or defer them), and it's part of the reason why we have a GIL (although really not the only one). As long as CPython does reference counting, removing the GIL is an absolute no-go (although I wonder how many architectures don't support atomic reference counting). Refcounting has upsides too, though. One is more deterministic collection of objects and destructor calling. Of course this argument becomes moot if you have a reference cycle somewhere. Has anyone ever attempted to implement a garbage collector for CPython? Or did everyone who wanted this feature move to PyPy? >> >> What I would like is having Cython infer whether the GIL is needed for >> a function, and mark it "implicitly nogil", so it can be called from >> nogil contexts without actually having to declare it nogil. This would >> only work for non-extern things, and you would still need to declare >> it nogil in your pxd if you want to export it. Apparently many users >> (even those that have used Cython quite a bit) are confused with what >> nogil on functions actually does (or they are not even aware it >> exists). > > There's a long thread by me and Robert (and some of Stefan) on this from a > couple of months back, don't know if you read it. You could support exports > across pxds as well. Basically for *every* cdef function, export two > function pointers: > > ?1) To a wrapper to be called if you hold the GIL (outside nogil sections) > > ?2) To a wrapper to be called if you don't hold the GIL, or don't know > whether you hold the GIL (the wrapper can acquire the GIL if needed) > > Taking the address of a function (for passing to C, e.g.) would give you the > one that can be called without holding the GIL. > > The implications should hopefully be getting rid of "with gil" and "nogil" > on function declarations entirely. Oh, this was about functions. I agree that for functions that would be neat. For inlined code in functions I don't like it very much, although (unconditional) warnings help a lot in that regard. However, this would mean that e.g. adding a print statement to your function makes it acquire the GIL for nogil contexts, and since it doesn't automatically release it again it may just call another function that was really supposed to operate without the GIL (because it's going to/may take a long time). Overall making all this transparent to the user would be great, people care about their code, not about how CPython is implemented. > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From robertwb at math.washington.edu Fri Oct 28 22:55:19 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Fri, 28 Oct 2011 13:55:19 -0700 Subject: [Cython] Cython 0.16 Message-ID: With Mark's fused types and memory views going in, I think it's about time for a new release. Thoughts? Anyone want to volunteer to take up the process? - Robert From markflorisson88 at gmail.com Fri Oct 28 22:59:43 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 28 Oct 2011 21:59:43 +0100 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: On 28 October 2011 21:55, Robert Bradshaw wrote: > With Mark's fused types and memory views going in, I think it's about > time for a new release. Thoughts? Anyone want to volunteer to take up > the process? > > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > That'd be cool. However there are a few outstanding issues: a) the compiler is somewhat slower (possible solution: lazy utility codes) b) there's a potential memory leak problem for memoryviews with object dtype that contain themselves, this still needs investigation. As for a), Stefan mentioned code spending a lot of time in sub. Stefan, could you post the code for this that made Cython compile very slowly? From robertwb at math.washington.edu Sat Oct 29 00:37:14 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Fri, 28 Oct 2011 15:37:14 -0700 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: On Fri, Oct 28, 2011 at 1:59 PM, mark florisson wrote: > On 28 October 2011 21:55, Robert Bradshaw wrote: >> With Mark's fused types and memory views going in, I think it's about >> time for a new release. Thoughts? Anyone want to volunteer to take up >> the process? >> >> - Robert >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd be cool. However there are a few outstanding issues: > ? ?a) the compiler is somewhat slower (possible solution: lazy utility codes) Yeah, I forgot about that. This should get resolved. Lazy utility codes (perhaps breaking them up) would probably got us most of the way there. Long term, I really like the "declaration caching" idea which could be used for users .pxd files as well as internally. > ? ?b) there's a potential memory leak problem for memoryviews with > object dtype that contain themselves, this still needs investigation. I think this could be mentioned as a caviat rather than being a blocker. > As for a), Stefan mentioned code spending a lot of time in sub. > Stefan, could you post the code for this that made Cython compile very > slowly? > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From d.s.seljebotn at astro.uio.no Sat Oct 29 11:30:43 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 29 Oct 2011 11:30:43 +0200 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: Re b), it would be better to disable object dtypes (or emit a warning about the possible bug when using them) than to delay the release. Object memoryviews are rare in the first place, and those who contain themselves should be very rare. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Robert Bradshaw wrote: On Fri, Oct 28, 2011 at 1:59 PM, mark florisson wrote: > On 28 October 2011 21:55, Robert Bradshaw wrote: >> With Mark's fused types and memory views going in, I think it's about >> time for a new release. Thoughts? Anyone want to volunteer to take up >> the process? >> >> - Robert >>_____________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd be cool. However there are a few outstanding issues: > a) the compiler is somewhat slower (possible solution: lazy utility codes) Yeah, I forgot about that. This should get resolved. Lazy utility codes (perhaps breaking them up) would probably got us most of the way there. Long term, I really like the "declaration caching" idea which could be used for users .pxd files as well as internally. > b) there's a potential memory leak problem for memoryviews with > object dtype that contain themselves, this still needs investigation. I think this could be mentioned as a caviat rather than being a blocker. > As for a), Stefan mentioned code spending a lot of time in sub. > Stefan, could you post the code for this that made Cython compile very > slowly? >_____________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel >_____________________________________________ cython-devel mailing list cython-devel at python.org http://mail.python.org/mailman/listinfo/cython-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From markflorisson88 at gmail.com Sat Oct 29 13:41:34 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 29 Oct 2011 12:41:34 +0100 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: Hm ok I'll disable them then. Pointers and some other dtypes are also not supported yet. As for the documentation, have you guys reviewed the documentation for fused types and memoryviews? For instance this is the introduction for memoryviews: " Typed memoryviews can be used for efficient access to buffers. It is similar to the current buffer support, but has more features and cleaner syntax. A memoryview can be used in any context (function parameters, module-level, cdef class attribute, etc) and can be obtained from any object that exposes the PEP 3118 buffer interface. " but I'm not sure this new functionality won't confuse users of the old buffer support. For fused types, cython.numeric only includes long, double and double complex. I think that should be changed to short, int, long, float, double, float complex and double complex. I was deliberately avoiding long long and long double as they (if not used as a base type) would be preferred over the others and may be a lot slower. But then, such usage wouldn't be very useful. Should I include them then? On 29 October 2011 10:30, Dag Sverre Seljebotn wrote: > Re b), it would be better to disable object dtypes (or emit a warning about > the possible bug when using them) than to delay the release. Object > memoryviews are rare in the first place, and those who contain themselves > should be very rare. > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. > > Robert Bradshaw wrote: >> >> On Fri, Oct 28, 2011 at 1:59 PM, mark florisson >> wrote: > On 28 October 2011 21:55, Robert >> Bradshaw wrote: >> With Mark's fused types >> and memory views going in, I think it's about >> time for a new release. >> Thoughts? Anyone want to volunteer to take up >> the process? >> >> - Robert >> >> >> ________________________________ >> >> cython-devel mailing list >> cython-devel at python.org >> >> >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd be cool. >> >> However there are a few outstanding issues: > ? ?a) the compiler is somewhat >> >> slower (possible solution: lazy utility codes) Yeah, I forgot about that. >> >> This should get resolved. Lazy utility codes (perhaps breaking them up) >> >> would probably got us most of the way there. Long term, I really like the >> >> "declaration caching" idea which could be used for users .pxd files as well >> >> as internally. > ? ?b) there's a potential memory leak problem for >> >> memoryviews with > object dtype that contain themselves, this still needs >> >> investigation. I think this could be mentioned as a caviat rather than being >> >> a blocker. > As for a), Stefan mentioned code spending a lot of time in sub. >> >> > Stefan, could you post the code for this that made Cython compile very > >> >> slowly? > >> ________________________________ >> > cython-devel mailing list > cython-devel at python.org > >> > http://mail.python.org/mailman/listinfo/cython-devel > >> ________________________________ >> cython-devel mailing list cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > > From markflorisson88 at gmail.com Sat Oct 29 15:14:12 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 29 Oct 2011 14:14:12 +0100 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: Before we do a release, would anyone be opposed to a 'chunksize' keyword argument to prange()? That may have significant performance impacts. On 29 October 2011 12:41, mark florisson wrote: > Hm ok I'll disable them then. Pointers and some other dtypes are also > not supported yet. As for the documentation, have you guys reviewed > the documentation for fused types and memoryviews? For instance this > is the introduction for memoryviews: > > " > Typed memoryviews can be used for efficient access to buffers. It is > similar to the current buffer support, but has more features and > cleaner syntax. A memoryview can be used in any context (function > parameters, module-level, cdef class attribute, etc) and can be > obtained from any object that exposes the PEP 3118 buffer interface. > " > > but I'm not sure this new functionality won't confuse users of the old > buffer support. > > For fused types, cython.numeric only includes long, double and double > complex. I think that should be changed to short, int, long, float, > double, float complex and double complex. I was deliberately avoiding > long long and long double as they (if not used as a base type) would > be preferred over the others and may be a lot slower. But then, such > usage wouldn't be very useful. Should I include them then? > > On 29 October 2011 10:30, Dag Sverre Seljebotn > wrote: >> Re b), it would be better to disable object dtypes (or emit a warning about >> the possible bug when using them) than to delay the release. Object >> memoryviews are rare in the first place, and those who contain themselves >> should be very rare. >> -- >> Sent from my Android phone with K-9 Mail. Please excuse my brevity. >> >> Robert Bradshaw wrote: >>> >>> On Fri, Oct 28, 2011 at 1:59 PM, mark florisson >>> wrote: > On 28 October 2011 21:55, Robert >>> Bradshaw wrote: >> With Mark's fused types >>> and memory views going in, I think it's about >> time for a new release. >>> Thoughts? Anyone want to volunteer to take up >> the process? >> >> - Robert >>> >> >>> ________________________________ >>> >> cython-devel mailing list >> cython-devel at python.org >> >>> >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd be cool. >>> >> However there are a few outstanding issues: > ? ?a) the compiler is somewhat >>> >> slower (possible solution: lazy utility codes) Yeah, I forgot about that. >>> >> This should get resolved. Lazy utility codes (perhaps breaking them up) >>> >> would probably got us most of the way there. Long term, I really like the >>> >> "declaration caching" idea which could be used for users .pxd files as well >>> >> as internally. > ? ?b) there's a potential memory leak problem for >>> >> memoryviews with > object dtype that contain themselves, this still needs >>> >> investigation. I think this could be mentioned as a caviat rather than being >>> >> a blocker. > As for a), Stefan mentioned code spending a lot of time in sub. >>> >> > Stefan, could you post the code for this that made Cython compile very > >>> >> slowly? > >>> ________________________________ >>> > cython-devel mailing list > cython-devel at python.org > >>> > http://mail.python.org/mailman/listinfo/cython-devel > >>> ________________________________ >>> cython-devel mailing list cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> >> > From faltet at gmail.com Sat Oct 29 16:23:53 2011 From: faltet at gmail.com (Francesc Alted) Date: Sat, 29 Oct 2011 16:23:53 +0200 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: On the contrary, this is an excellent idea! El 29/10/2011 15:14, "mark florisson" va escriure: > Before we do a release, would anyone be opposed to a 'chunksize' > keyword argument to prange()? That may have significant performance > impacts. > > On 29 October 2011 12:41, mark florisson > wrote: > > Hm ok I'll disable them then. Pointers and some other dtypes are also > > not supported yet. As for the documentation, have you guys reviewed > > the documentation for fused types and memoryviews? For instance this > > is the introduction for memoryviews: > > > > " > > Typed memoryviews can be used for efficient access to buffers. It is > > similar to the current buffer support, but has more features and > > cleaner syntax. A memoryview can be used in any context (function > > parameters, module-level, cdef class attribute, etc) and can be > > obtained from any object that exposes the PEP 3118 buffer interface. > > " > > > > but I'm not sure this new functionality won't confuse users of the old > > buffer support. > > > > For fused types, cython.numeric only includes long, double and double > > complex. I think that should be changed to short, int, long, float, > > double, float complex and double complex. I was deliberately avoiding > > long long and long double as they (if not used as a base type) would > > be preferred over the others and may be a lot slower. But then, such > > usage wouldn't be very useful. Should I include them then? > > > > On 29 October 2011 10:30, Dag Sverre Seljebotn > > wrote: > >> Re b), it would be better to disable object dtypes (or emit a warning > about > >> the possible bug when using them) than to delay the release. Object > >> memoryviews are rare in the first place, and those who contain > themselves > >> should be very rare. > >> -- > >> Sent from my Android phone with K-9 Mail. Please excuse my brevity. > >> > >> Robert Bradshaw wrote: > >>> > >>> On Fri, Oct 28, 2011 at 1:59 PM, mark florisson > >>> wrote: > On 28 October 2011 21:55, Robert > >>> Bradshaw wrote: >> With Mark's fused > types > >>> and memory views going in, I think it's about >> time for a new > release. > >>> Thoughts? Anyone want to volunteer to take up >> the process? >> >> - > Robert > >>> >> > >>> ________________________________ > >>> >> cython-devel mailing list >> cython-devel at python.org >> > >>> >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd > be cool. > >>> >> However there are a few outstanding issues: > a) the compiler is > somewhat > >>> >> slower (possible solution: lazy utility codes) Yeah, I forgot about > that. > >>> >> This should get resolved. Lazy utility codes (perhaps breaking them > up) > >>> >> would probably got us most of the way there. Long term, I really > like the > >>> >> "declaration caching" idea which could be used for users .pxd files > as well > >>> >> as internally. > b) there's a potential memory leak problem for > >>> >> memoryviews with > object dtype that contain themselves, this still > needs > >>> >> investigation. I think this could be mentioned as a caviat rather > than being > >>> >> a blocker. > As for a), Stefan mentioned code spending a lot of > time in sub. > >>> >> > Stefan, could you post the code for this that made Cython compile > very > > >>> >> slowly? > > >>> ________________________________ > >>> > cython-devel mailing list > cython-devel at python.org > > >>> > http://mail.python.org/mailman/listinfo/cython-devel > > >>> ________________________________ > >>> cython-devel mailing list cython-devel at python.org > >>> http://mail.python.org/mailman/listinfo/cython-devel > >> > >> _______________________________________________ > >> cython-devel mailing list > >> cython-devel at python.org > >> http://mail.python.org/mailman/listinfo/cython-devel > >> > >> > > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From markflorisson88 at gmail.com Sat Oct 29 16:37:17 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 29 Oct 2011 15:37:17 +0100 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: Heh, that's a +1 :) This makes me wonder, should we organize soms polls to have users vote on what functionality they would like to see in Cython? Some users may read the cython-dev mailing list, but many might also not. E.g. provide a poll where we list some things that we would like to see, and an option with a form that allows them to fill in something else entirely. Maybe we could do that on cython.org to allow anonymous votes, not everyone may be interested in discussion, just in voting. On 29 October 2011 15:23, Francesc Alted wrote: > On the contrary, this is an excellent idea! > > El 29/10/2011 15:14, "mark florisson" va > escriure: >> >> Before we do a release, would anyone be opposed to a 'chunksize' >> keyword argument to prange()? That may have significant performance >> impacts. >> >> On 29 October 2011 12:41, mark florisson >> wrote: >> > Hm ok I'll disable them then. Pointers and some other dtypes are also >> > not supported yet. As for the documentation, have you guys reviewed >> > the documentation for fused types and memoryviews? For instance this >> > is the introduction for memoryviews: >> > >> > " >> > Typed memoryviews can be used for efficient access to buffers. It is >> > similar to the current buffer support, but has more features and >> > cleaner syntax. A memoryview can be used in any context (function >> > parameters, module-level, cdef class attribute, etc) and can be >> > obtained from any object that exposes the PEP 3118 buffer interface. >> > " >> > >> > but I'm not sure this new functionality won't confuse users of the old >> > buffer support. >> > >> > For fused types, cython.numeric only includes long, double and double >> > complex. I think that should be changed to short, int, long, float, >> > double, float complex and double complex. I was deliberately avoiding >> > long long and long double as they (if not used as a base type) would >> > be preferred over the others and may be a lot slower. But then, such >> > usage wouldn't be very useful. Should I include them then? >> > >> > On 29 October 2011 10:30, Dag Sverre Seljebotn >> > wrote: >> >> Re b), it would be better to disable object dtypes (or emit a warning >> >> about >> >> the possible bug when using them) than to delay the release. Object >> >> memoryviews are rare in the first place, and those who contain >> >> themselves >> >> should be very rare. >> >> -- >> >> Sent from my Android phone with K-9 Mail. Please excuse my brevity. >> >> >> >> Robert Bradshaw wrote: >> >>> >> >>> On Fri, Oct 28, 2011 at 1:59 PM, mark florisson >> >>> wrote: > On 28 October 2011 21:55, Robert >> >>> Bradshaw wrote: >> With Mark's fused >> >>> types >> >>> and memory views going in, I think it's about >> time for a new >> >>> release. >> >>> Thoughts? Anyone want to volunteer to take up >> the process? >> >> - >> >>> Robert >> >>> >> >> >>> ________________________________ >> >>> >> cython-devel mailing list >> cython-devel at python.org >> >> >>> >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd >> >>> >> be cool. >> >>> >> However there are a few outstanding issues: > ? ?a) the compiler is >> >>> >> somewhat >> >>> >> slower (possible solution: lazy utility codes) Yeah, I forgot about >> >>> >> that. >> >>> >> This should get resolved. Lazy utility codes (perhaps breaking them >> >>> >> up) >> >>> >> would probably got us most of the way there. Long term, I really >> >>> >> like the >> >>> >> "declaration caching" idea which could be used for users .pxd files >> >>> >> as well >> >>> >> as internally. > ? ?b) there's a potential memory leak problem for >> >>> >> memoryviews with > object dtype that contain themselves, this still >> >>> >> needs >> >>> >> investigation. I think this could be mentioned as a caviat rather >> >>> >> than being >> >>> >> a blocker. > As for a), Stefan mentioned code spending a lot of >> >>> >> time in sub. >> >>> >> > Stefan, could you post the code for this that made Cython compile >> >>> >> > very > >> >>> >> slowly? > >> >>> ________________________________ >> >>> > cython-devel mailing list > cython-devel at python.org > >> >>> > http://mail.python.org/mailman/listinfo/cython-devel > >> >>> ________________________________ >> >>> cython-devel mailing list cython-devel at python.org >> >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> >> >> _______________________________________________ >> >> cython-devel mailing list >> >> cython-devel at python.org >> >> http://mail.python.org/mailman/listinfo/cython-devel >> >> >> >> >> > >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > > From njs at pobox.com Sat Oct 29 16:50:59 2011 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 29 Oct 2011 07:50:59 -0700 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: On Oct 29, 2011 4:41 AM, "mark florisson" wrote: > " > Typed memoryviews can be used for efficient access to buffers. It is > similar to the current buffer support, but has more features and > cleaner syntax. A memoryview can be used in any context (function > parameters, module-level, cdef class attribute, etc) and can be > obtained from any object that exposes the PEP 3118 buffer interface. > " FWIW, I do find this paragraph somewhat confusing, because the main description of what a typed memoryview is assumes that I already know the current buffer support. I think that's actually true (the ndarray[int32] syntax, right?), but I'm not sure, and people coming to this for the first time probably won't even know that buffers are what they're looking for. I'd say something like: "Typed memoryviews can be used for efficient access to buffers. For example, you can use them to read and modify numpy arrays or without incurring any python overhead." And put a compare/contrast with the old syntax later, like the second paragraph or so. My 2?, - Nathaniel -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Sat Oct 29 16:50:50 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 29 Oct 2011 16:50:50 +0200 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: <4EAC12CA.5040200@behnel.de> mark florisson, 28.10.2011 22:59: > On 28 October 2011 21:55, Robert Bradshaw wrote: >> With Mark's fused types and memory views going in, I think it's about >> time for a new release. Agreed. >> Thoughts? I still haven't investigated the decorator issue that appeared in the Sage tests. I think it's related to decorators on module level def functions, which would suggest that it's best to eventually fix it as part of the function implementation changes that Vitja has started. But there may still be a simpler work-around somewhere that I'm not seeing yet. I basically broke the Sage tests by resolving a bug (593 IIRC), and both don't currently work together. So, a variant would be to revert my changes for 0.16 and just leave the bug in, if that keeps us from breaking existing code for now. But even leaving that out, the Sage tests look seriously broken currently: https://sage.math.washington.edu:8091/hudson/view/All/job/sage-tests/lastCompletedBuild/consoleFull > That'd be cool. However there are a few outstanding issues: > a) the compiler is somewhat slower (possible solution: lazy utility codes) > b) there's a potential memory leak problem for memoryviews with > object dtype that contain themselves, this still needs investigation. > > As for a), Stefan mentioned code spending a lot of time in sub. > Stefan, could you post the code for this that made Cython compile very > slowly? At the time, I just ran cProfile on runtests.py with something like "withstat with_stat" or so as tests - basically all with-statement related ones. It took about 20 seconds or so to build the utility code, just to throw it away unused afterwards. The compile/test run itself then took about 3 seconds. Stefan From markflorisson88 at gmail.com Sat Oct 29 17:03:11 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 29 Oct 2011 16:03:11 +0100 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: On 29 October 2011 15:50, Nathaniel Smith wrote: > On Oct 29, 2011 4:41 AM, "mark florisson" wrote: >> " >> Typed memoryviews can be used for efficient access to buffers. It is >> similar to the current buffer support, but has more features and >> cleaner syntax. A memoryview can be used in any context (function >> parameters, module-level, cdef class attribute, etc) and can be >> obtained from any object that exposes the PEP 3118 buffer interface. >> " > > FWIW, I do find this paragraph somewhat confusing, because the main > description of what a typed memoryview is assumes that I already know the > current buffer support. I think that's actually true (the ndarray[int32] > syntax, right?), but I'm not sure, and people coming to this for the first > time probably won't even know that buffers are what they're looking for. > > I'd say something like: "Typed memoryviews can be used for efficient access > to buffers. For example, you can use them to read and modify numpy arrays or > without incurring any python overhead." And put > a compare/contrast with the old syntax later, like the second paragraph or > so. > > My 2?, > - Nathaniel Good idea, thanks! I'll update the documentation again. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > > From markflorisson88 at gmail.com Sat Oct 29 17:03:24 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 29 Oct 2011 16:03:24 +0100 Subject: [Cython] Cython 0.16 In-Reply-To: <4EAC12CA.5040200@behnel.de> References: <4EAC12CA.5040200@behnel.de> Message-ID: On 29 October 2011 15:50, Stefan Behnel wrote: > mark florisson, 28.10.2011 22:59: >> >> On 28 October 2011 21:55, Robert Bradshaw wrote: >>> >>> With Mark's fused types and memory views going in, I think it's about >>> time for a new release. > > Agreed. > > >>> Thoughts? > > I still haven't investigated the decorator issue that appeared in the Sage > tests. I think it's related to decorators on module level def functions, > which would suggest that it's best to eventually fix it as part of the > function implementation changes that Vitja has started. But there may still > be a simpler work-around somewhere that I'm not seeing yet. > > I basically broke the Sage tests by resolving a bug (593 IIRC), and both > don't currently work together. So, a variant would be to revert my changes > for 0.16 and just leave the bug in, if that keeps us from breaking existing > code for now. If it's a bug I think it's worth fixing, even if it breaks other code. Unfortunately I lost my trac password, so I don't know which bug that is. > But even leaving that out, the Sage tests look seriously broken currently: > > https://sage.math.washington.edu:8091/hudson/view/All/job/sage-tests/lastCompletedBuild/consoleFull > > >> That'd be cool. However there are a few outstanding issues: >> ? ? a) the compiler is somewhat slower (possible solution: lazy utility >> codes) >> ? ? b) there's a potential memory leak problem for memoryviews with >> object dtype that contain themselves, this still needs investigation. >> >> As for a), Stefan mentioned code spending a lot of time in sub. >> Stefan, could you post the code for this that made Cython compile very >> slowly? > > At the time, I just ran cProfile on runtests.py with something like > "withstat with_stat" or so as tests - basically all with-statement related > ones. It took about 20 seconds or so to build the utility code, just to > throw it away unused afterwards. The compile/test run itself then took about > 3 seconds. Was that before or after the deferred cython scope loading commit? > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Sat Oct 29 17:11:56 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 29 Oct 2011 16:11:56 +0100 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: It seems that, most ironically, OpenMP "isn't defined" to be called from multithreaded contexts. It seems that even if I use prange only in another thread that isn't the main thread the program segfaults if compiled with gcc. That's kind of worrying, I suppose we should mention that in the documentation. This may be a problem especially for people who write libraries. Are NumPy, Scipy or Sage linked with any libraries that use OpenMP? On 29 October 2011 14:14, mark florisson wrote: > Before we do a release, would anyone be opposed to a 'chunksize' > keyword argument to prange()? That may have significant performance > impacts. > > On 29 October 2011 12:41, mark florisson wrote: >> Hm ok I'll disable them then. Pointers and some other dtypes are also >> not supported yet. As for the documentation, have you guys reviewed >> the documentation for fused types and memoryviews? For instance this >> is the introduction for memoryviews: >> >> " >> Typed memoryviews can be used for efficient access to buffers. It is >> similar to the current buffer support, but has more features and >> cleaner syntax. A memoryview can be used in any context (function >> parameters, module-level, cdef class attribute, etc) and can be >> obtained from any object that exposes the PEP 3118 buffer interface. >> " >> >> but I'm not sure this new functionality won't confuse users of the old >> buffer support. >> >> For fused types, cython.numeric only includes long, double and double >> complex. I think that should be changed to short, int, long, float, >> double, float complex and double complex. I was deliberately avoiding >> long long and long double as they (if not used as a base type) would >> be preferred over the others and may be a lot slower. But then, such >> usage wouldn't be very useful. Should I include them then? >> >> On 29 October 2011 10:30, Dag Sverre Seljebotn >> wrote: >>> Re b), it would be better to disable object dtypes (or emit a warning about >>> the possible bug when using them) than to delay the release. Object >>> memoryviews are rare in the first place, and those who contain themselves >>> should be very rare. >>> -- >>> Sent from my Android phone with K-9 Mail. Please excuse my brevity. >>> >>> Robert Bradshaw wrote: >>>> >>>> On Fri, Oct 28, 2011 at 1:59 PM, mark florisson >>>> wrote: > On 28 October 2011 21:55, Robert >>>> Bradshaw wrote: >> With Mark's fused types >>>> and memory views going in, I think it's about >> time for a new release. >>>> Thoughts? Anyone want to volunteer to take up >> the process? >> >> - Robert >>>> >> >>>> ________________________________ >>>> >> cython-devel mailing list >> cython-devel at python.org >> >>>> >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd be cool. >>>> >> However there are a few outstanding issues: > ? ?a) the compiler is somewhat >>>> >> slower (possible solution: lazy utility codes) Yeah, I forgot about that. >>>> >> This should get resolved. Lazy utility codes (perhaps breaking them up) >>>> >> would probably got us most of the way there. Long term, I really like the >>>> >> "declaration caching" idea which could be used for users .pxd files as well >>>> >> as internally. > ? ?b) there's a potential memory leak problem for >>>> >> memoryviews with > object dtype that contain themselves, this still needs >>>> >> investigation. I think this could be mentioned as a caviat rather than being >>>> >> a blocker. > As for a), Stefan mentioned code spending a lot of time in sub. >>>> >> > Stefan, could you post the code for this that made Cython compile very > >>>> >> slowly? > >>>> ________________________________ >>>> > cython-devel mailing list > cython-devel at python.org > >>>> > http://mail.python.org/mailman/listinfo/cython-devel > >>>> ________________________________ >>>> cython-devel mailing list cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>> >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >>> >>> >> > From stefan_ml at behnel.de Sat Oct 29 17:42:06 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 29 Oct 2011 17:42:06 +0200 Subject: [Cython] Cython 0.16 In-Reply-To: References: <4EAC12CA.5040200@behnel.de> Message-ID: <4EAC1ECE.7060404@behnel.de> mark florisson, 29.10.2011 17:03: > On 29 October 2011 15:50, Stefan Behnel wrote: >> mark florisson, 28.10.2011 22:59: >>> >>> On 28 October 2011 21:55, Robert Bradshaw wrote: >>>> >>>> With Mark's fused types and memory views going in, I think it's about >>>> time for a new release. >> >> I still haven't investigated the decorator issue that appeared in the Sage >> tests. I think it's related to decorators on module level def functions, >> which would suggest that it's best to eventually fix it as part of the >> function implementation changes that Vitja has started. But there may still >> be a simpler work-around somewhere that I'm not seeing yet. >> >> I basically broke the Sage tests by resolving a bug (593 IIRC), and both >> don't currently work together. So, a variant would be to revert my changes >> for 0.16 and just leave the bug in, if that keeps us from breaking existing >> code for now. > > If it's a bug I think it's worth fixing, even if it breaks other code. > Unfortunately I lost my trac password, so I don't know which bug that > is. You should be able to set up a new password, that should get you back in. >> But even leaving that out, the Sage tests look seriously broken currently: >> >> https://sage.math.washington.edu:8091/hudson/view/All/job/sage-tests/lastCompletedBuild/consoleFull >> >> >>> That'd be cool. However there are a few outstanding issues: >>> a) the compiler is somewhat slower (possible solution: lazy utility >>> codes) >>> b) there's a potential memory leak problem for memoryviews with >>> object dtype that contain themselves, this still needs investigation. >>> >>> As for a), Stefan mentioned code spending a lot of time in sub. >>> Stefan, could you post the code for this that made Cython compile very >>> slowly? >> >> At the time, I just ran cProfile on runtests.py with something like >> "withstat with_stat" or so as tests - basically all with-statement related >> ones. It took about 20 seconds or so to build the utility code, just to >> throw it away unused afterwards. The compile/test run itself then took about >> 3 seconds. > > Was that before or after the deferred cython scope loading commit? Likely before. It looks *much* better now. Stefan From vitja.makarov at gmail.com Sat Oct 29 18:40:05 2011 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sat, 29 Oct 2011 20:40:05 +0400 Subject: [Cython] Cython 0.16 In-Reply-To: <4EAC12CA.5040200@behnel.de> References: <4EAC12CA.5040200@behnel.de> Message-ID: 2011/10/29 Stefan Behnel : > mark florisson, 28.10.2011 22:59: >> >> On 28 October 2011 21:55, Robert Bradshaw wrote: >>> >>> With Mark's fused types and memory views going in, I think it's about >>> time for a new release. > > Agreed. > > >>> Thoughts? > > I still haven't investigated the decorator issue that appeared in the Sage > tests. I think it's related to decorators on module level def functions, > which would suggest that it's best to eventually fix it as part of the > function implementation changes that Vitja has started. But there may still > be a simpler work-around somewhere that I'm not seeing yet. > Recently I've implemented py3k-super and dynamic default args if we have time. I would see this in release also. Can you please point me to sage decorators related failure? > I basically broke the Sage tests by resolving a bug (593 IIRC), and both > don't currently work together. So, a variant would be to revert my changes > for 0.16 and just leave the bug in, if that keeps us from breaking existing > code for now. > > But even leaving that out, the Sage tests look seriously broken currently: > > https://sage.math.washington.edu:8091/hudson/view/All/job/sage-tests/lastCompletedBuild/consoleFull > > >> That'd be cool. However there are a few outstanding issues: >> ? ? a) the compiler is somewhat slower (possible solution: lazy utility >> codes) >> ? ? b) there's a potential memory leak problem for memoryviews with >> object dtype that contain themselves, this still needs investigation. >> >> As for a), Stefan mentioned code spending a lot of time in sub. >> Stefan, could you post the code for this that made Cython compile very >> slowly? > > At the time, I just ran cProfile on runtests.py with something like > "withstat with_stat" or so as tests - basically all with-statement related > ones. It took about 20 seconds or so to build the utility code, just to > throw it away unused afterwards. The compile/test run itself then took about > 3 seconds. > -- vitja. From robertwb at math.washington.edu Sat Oct 29 18:58:48 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Sat, 29 Oct 2011 09:58:48 -0700 Subject: [Cython] Cython 0.16 In-Reply-To: <4EAC12CA.5040200@behnel.de> References: <4EAC12CA.5040200@behnel.de> Message-ID: On Sat, Oct 29, 2011 at 7:50 AM, Stefan Behnel wrote: > mark florisson, 28.10.2011 22:59: >> >> On 28 October 2011 21:55, Robert Bradshaw wrote: >>> >>> With Mark's fused types and memory views going in, I think it's about >>> time for a new release. > > Agreed. > > >>> Thoughts? > > I still haven't investigated the decorator issue that appeared in the Sage > tests. I think it's related to decorators on module level def functions, > which would suggest that it's best to eventually fix it as part of the > function implementation changes that Vitja has started. But there may still > be a simpler work-around somewhere that I'm not seeing yet. > > I basically broke the Sage tests by resolving a bug (593 IIRC), and both > don't currently work together. So, a variant would be to revert my changes > for 0.16 and just leave the bug in, if that keeps us from breaking existing > code for now. > > But even leaving that out, the Sage tests look seriously broken currently: > > https://sage.math.washington.edu:8091/hudson/view/All/job/sage-tests/lastCompletedBuild/consoleFull I recently fixed the Sage build (the errors on public api for non public types broke it). As for those tests, they seem to be related to name mangling for double-underscore names. Did something change here recently? Or is it indirectly due to decorators (I haven't looked too deeply yet.) >> That'd be cool. However there are a few outstanding issues: >> ? ? a) the compiler is somewhat slower (possible solution: lazy utility >> codes) >> ? ? b) there's a potential memory leak problem for memoryviews with >> object dtype that contain themselves, this still needs investigation. >> >> As for a), Stefan mentioned code spending a lot of time in sub. >> Stefan, could you post the code for this that made Cython compile very >> slowly? > > At the time, I just ran cProfile on runtests.py with something like > "withstat with_stat" or so as tests - basically all with-statement related > ones. It took about 20 seconds or so to build the utility code, just to > throw it away unused afterwards. The compile/test run itself then took about > 3 seconds. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From robertwb at math.washington.edu Sat Oct 29 19:05:00 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Sat, 29 Oct 2011 10:05:00 -0700 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: On Sat, Oct 29, 2011 at 4:41 AM, mark florisson wrote: > Hm ok I'll disable them then. Pointers and some other dtypes are also > not supported yet. As for the documentation, have you guys reviewed > the documentation for fused types and memoryviews? I looked at the fused types docs. > For instance this > is the introduction for memoryviews: > > " > Typed memoryviews can be used for efficient access to buffers. It is > similar to the current buffer support, but has more features and > cleaner syntax. A memoryview can be used in any context (function > parameters, module-level, cdef class attribute, etc) and can be > obtained from any object that exposes the PEP 3118 buffer interface. > " > > but I'm not sure this new functionality won't confuse users of the old > buffer support. > > For fused types, cython.numeric only includes long, double and double > complex. I think that should be changed to short, int, long, float, > double, float complex and double complex. Yes. What about size_t, ssize_t, and Py_ssize_t? > I was deliberately avoiding > long long and long double as they (if not used as a base type) would > be preferred over the others and may be a lot slower. But then, such > usage wouldn't be very useful. Should I include them then? That's a good question. Perhaps these two could be used if explicitly requested, or for dispatching from a Python long (in Py2) or non-word-sized int (in Py3). > On 29 October 2011 10:30, Dag Sverre Seljebotn > wrote: >> Re b), it would be better to disable object dtypes (or emit a warning about >> the possible bug when using them) than to delay the release. Object >> memoryviews are rare in the first place, and those who contain themselves >> should be very rare. +1 to a warning, especially if the problem is only related to circular references. - Robert From markflorisson88 at gmail.com Sat Oct 29 19:44:07 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 29 Oct 2011 18:44:07 +0100 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: On 29 October 2011 18:05, Robert Bradshaw wrote: > On Sat, Oct 29, 2011 at 4:41 AM, mark florisson > wrote: >> Hm ok I'll disable them then. Pointers and some other dtypes are also >> not supported yet. As for the documentation, have you guys reviewed >> the documentation for fused types and memoryviews? > > I looked at the fused types docs. > >> For instance this >> is the introduction for memoryviews: >> >> " >> Typed memoryviews can be used for efficient access to buffers. It is >> similar to the current buffer support, but has more features and >> cleaner syntax. A memoryview can be used in any context (function >> parameters, module-level, cdef class attribute, etc) and can be >> obtained from any object that exposes the PEP 3118 buffer interface. >> " >> >> but I'm not sure this new functionality won't confuse users of the old >> buffer support. >> >> For fused types, cython.numeric only includes long, double and double >> complex. I think that should be changed to short, int, long, float, >> double, float complex and double complex. > > Yes. What about size_t, ssize_t, and Py_ssize_t? Hmm, these things don't contain unsigned types as they may be chosen when calling directly (as they're longer), but they will cause problems for negative values. I think unsigned types should be explicit. I think size_t is also more for representing the size of objects, I'm not sure you'd want the same code operating on size_t and say, ints. Py_ssize_t is typically used as the type for indices, but not much else I think, so it might be weird to include it. >> I was deliberately avoiding >> long long and long double as they (if not used as a base type) would >> be preferred over the others and may be a lot slower. But then, such >> usage wouldn't be very useful. Should I include them then? > > That's a good question. Perhaps these two could be used if explicitly > requested, or for dispatching from a Python long (in Py2) or > non-word-sized int (in Py3). I'm not sure I understand, how would you request them explicitly? The user could always just created a fused type manually if he/she wants long long, long double, or long double complex. >> On 29 October 2011 10:30, Dag Sverre Seljebotn >> wrote: >>> Re b), it would be better to disable object dtypes (or emit a warning about >>> the possible bug when using them) than to delay the release. Object >>> memoryviews are rare in the first place, and those who contain themselves >>> should be very rare. > > +1 to a warning, especially if the problem is only related to circular > references. Hmm, a warning, ok. Do we desperately want to get a release out, or do we want it for somewhere e.g. at the end of the week? Because fixing this issue wouldn't be too hard I think, and it might give us some more time to review and merge Vitja's code. super() is pretty neat. > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From markflorisson88 at gmail.com Sat Oct 29 19:47:33 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 29 Oct 2011 18:47:33 +0100 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: On 29 October 2011 18:44, mark florisson wrote: > On 29 October 2011 18:05, Robert Bradshaw wrote: >> On Sat, Oct 29, 2011 at 4:41 AM, mark florisson >> wrote: >>> Hm ok I'll disable them then. Pointers and some other dtypes are also >>> not supported yet. As for the documentation, have you guys reviewed >>> the documentation for fused types and memoryviews? >> >> I looked at the fused types docs. >> >>> For instance this >>> is the introduction for memoryviews: >>> >>> " >>> Typed memoryviews can be used for efficient access to buffers. It is >>> similar to the current buffer support, but has more features and >>> cleaner syntax. A memoryview can be used in any context (function >>> parameters, module-level, cdef class attribute, etc) and can be >>> obtained from any object that exposes the PEP 3118 buffer interface. >>> " >>> >>> but I'm not sure this new functionality won't confuse users of the old >>> buffer support. >>> >>> For fused types, cython.numeric only includes long, double and double >>> complex. I think that should be changed to short, int, long, float, >>> double, float complex and double complex. >> >> Yes. What about size_t, ssize_t, and Py_ssize_t? > > Hmm, these things don't contain unsigned types as they may be chosen > when calling directly (as they're longer), but they will cause > problems for negative values. I think unsigned types should be > explicit. I think size_t is also more for representing the size of > objects, I'm not sure you'd want the same code operating on size_t and > say, ints. Py_ssize_t is typically used as the type for indices, but > not much else I think, so it might be weird to include it. Yes, I think the long long and long double ones should just be excluded. If people want them they can fuse their own types. >>> I was deliberately avoiding >>> long long and long double as they (if not used as a base type) would >>> be preferred over the others and may be a lot slower. But then, such >>> usage wouldn't be very useful. Should I include them then? >> >> That's a good question. Perhaps these two could be used if explicitly >> requested, or for dispatching from a Python long (in Py2) or >> non-word-sized int (in Py3). > > I'm not sure I understand, how would you request them explicitly? The > user could always just created a fused type manually if he/she wants > long long, long double, or long double complex. > >>> On 29 October 2011 10:30, Dag Sverre Seljebotn >>> wrote: >>>> Re b), it would be better to disable object dtypes (or emit a warning about >>>> the possible bug when using them) than to delay the release. Object >>>> memoryviews are rare in the first place, and those who contain themselves >>>> should be very rare. >> >> +1 to a warning, especially if the problem is only related to circular >> references. > > Hmm, a warning, ok. > > Do we desperately want to get a release out, or do we want it for > somewhere e.g. at the end of the week? Because fixing this issue > wouldn't be too hard I think, and it might give us some more time to > review and merge Vitja's code. super() is pretty neat. > >> - Robert >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > From markflorisson88 at gmail.com Sat Oct 29 19:50:57 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 29 Oct 2011 18:50:57 +0100 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: On 29 October 2011 18:47, mark florisson wrote: > On 29 October 2011 18:44, mark florisson wrote: >> On 29 October 2011 18:05, Robert Bradshaw wrote: >>> On Sat, Oct 29, 2011 at 4:41 AM, mark florisson >>> wrote: >>>> Hm ok I'll disable them then. Pointers and some other dtypes are also >>>> not supported yet. As for the documentation, have you guys reviewed >>>> the documentation for fused types and memoryviews? >>> >>> I looked at the fused types docs. >>> >>>> For instance this >>>> is the introduction for memoryviews: >>>> >>>> " >>>> Typed memoryviews can be used for efficient access to buffers. It is >>>> similar to the current buffer support, but has more features and >>>> cleaner syntax. A memoryview can be used in any context (function >>>> parameters, module-level, cdef class attribute, etc) and can be >>>> obtained from any object that exposes the PEP 3118 buffer interface. >>>> " >>>> >>>> but I'm not sure this new functionality won't confuse users of the old >>>> buffer support. >>>> >>>> For fused types, cython.numeric only includes long, double and double >>>> complex. I think that should be changed to short, int, long, float, >>>> double, float complex and double complex. >>> >>> Yes. What about size_t, ssize_t, and Py_ssize_t? >> >> Hmm, these things don't contain unsigned types as they may be chosen >> when calling directly (as they're longer), but they will cause >> problems for negative values. I think unsigned types should be >> explicit. I think size_t is also more for representing the size of >> objects, I'm not sure you'd want the same code operating on size_t and >> say, ints. Py_ssize_t is typically used as the type for indices, but >> not much else I think, so it might be weird to include it. > > Yes, I think the long long and long double ones should just be > excluded. If people want them they can fuse their own types. > >>>> I was deliberately avoiding >>>> long long and long double as they (if not used as a base type) would >>>> be preferred over the others and may be a lot slower. But then, such >>>> usage wouldn't be very useful. Should I include them then? >>> >>> That's a good question. Perhaps these two could be used if explicitly >>> requested, or for dispatching from a Python long (in Py2) or >>> non-word-sized int (in Py3). >> >> I'm not sure I understand, how would you request them explicitly? The >> user could always just created a fused type manually if he/she wants >> long long, long double, or long double complex. >> >>>> On 29 October 2011 10:30, Dag Sverre Seljebotn >>>> wrote: >>>>> Re b), it would be better to disable object dtypes (or emit a warning about >>>>> the possible bug when using them) than to delay the release. Object >>>>> memoryviews are rare in the first place, and those who contain themselves >>>>> should be very rare. >>> >>> +1 to a warning, especially if the problem is only related to circular >>> references. >> >> Hmm, a warning, ok. >> >> Do we desperately want to get a release out, or do we want it for >> somewhere e.g. at the end of the week? Because fixing this issue >> wouldn't be too hard I think, and it might give us some more time to >> review and merge Vitja's code. super() is pretty neat. >> >>> - Robert >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >>> >> > Maybe numpy.pxd could provide a numpy version of integral, floating and numeric, that will contain all relevant numpy types. From robertwb at math.washington.edu Sat Oct 29 19:59:18 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Sat, 29 Oct 2011 10:59:18 -0700 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: On Sat, Oct 29, 2011 at 10:44 AM, mark florisson wrote: > On 29 October 2011 18:05, Robert Bradshaw wrote: >> On Sat, Oct 29, 2011 at 4:41 AM, mark florisson >> wrote: >>> Hm ok I'll disable them then. Pointers and some other dtypes are also >>> not supported yet. As for the documentation, have you guys reviewed >>> the documentation for fused types and memoryviews? >> >> I looked at the fused types docs. >> >>> For instance this >>> is the introduction for memoryviews: >>> >>> " >>> Typed memoryviews can be used for efficient access to buffers. It is >>> similar to the current buffer support, but has more features and >>> cleaner syntax. A memoryview can be used in any context (function >>> parameters, module-level, cdef class attribute, etc) and can be >>> obtained from any object that exposes the PEP 3118 buffer interface. >>> " >>> >>> but I'm not sure this new functionality won't confuse users of the old >>> buffer support. >>> >>> For fused types, cython.numeric only includes long, double and double >>> complex. I think that should be changed to short, int, long, float, >>> double, float complex and double complex. >> >> Yes. What about size_t, ssize_t, and Py_ssize_t? > > Hmm, these things don't contain unsigned types as they may be chosen > when calling directly (as they're longer), but they will cause > problems for negative values. I think unsigned types should be > explicit. You're right about unsigned. > I think size_t is also more for representing the size of > objects, I'm not sure you'd want the same code operating on size_t and > say, ints. Py_ssize_t is typically used as the type for indices, but > not much else I think, so it might be weird to include it. I was thinking if one had cdef foo(integral x): ... then foo[ssize_t] should be available, but perhaps not used implicitly. I suppose this would be an exceptional case for dispatching, and cdef fused my_type: integral ssize_t long long is easy enough for the user to do. >>> I was deliberately avoiding >>> long long and long double as they (if not used as a base type) would >>> be preferred over the others and may be a lot slower. But then, such >>> usage wouldn't be very useful. Should I include them then? >> >> That's a good question. Perhaps these two could be used if explicitly >> requested, or for dispatching from a Python long (in Py2) or >> non-word-sized int (in Py3). > > I'm not sure I understand, how would you request them explicitly? The > user could always just created a fused type manually if he/she wants > long long, long double, or long double complex. > >>> On 29 October 2011 10:30, Dag Sverre Seljebotn >>> wrote: >>>> Re b), it would be better to disable object dtypes (or emit a warning about >>>> the possible bug when using them) than to delay the release. Object >>>> memoryviews are rare in the first place, and those who contain themselves >>>> should be very rare. >> >> +1 to a warning, especially if the problem is only related to circular >> references. > > Hmm, a warning, ok. > > Do we desperately want to get a release out, or do we want it for > somewhere e.g. at the end of the week? Because fixing this issue > wouldn't be too hard I think, and it might give us some more time to > review and merge Vitja's code. super() is pretty neat. No hurry, but I was thinking it'd be good to get the ball rolling and get these features released. - Robert From markflorisson88 at gmail.com Sat Oct 29 20:03:16 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 29 Oct 2011 19:03:16 +0100 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: On 29 October 2011 18:59, Robert Bradshaw wrote: > On Sat, Oct 29, 2011 at 10:44 AM, mark florisson > wrote: >> On 29 October 2011 18:05, Robert Bradshaw wrote: >>> On Sat, Oct 29, 2011 at 4:41 AM, mark florisson >>> wrote: >>>> Hm ok I'll disable them then. Pointers and some other dtypes are also >>>> not supported yet. As for the documentation, have you guys reviewed >>>> the documentation for fused types and memoryviews? >>> >>> I looked at the fused types docs. >>> >>>> For instance this >>>> is the introduction for memoryviews: >>>> >>>> " >>>> Typed memoryviews can be used for efficient access to buffers. It is >>>> similar to the current buffer support, but has more features and >>>> cleaner syntax. A memoryview can be used in any context (function >>>> parameters, module-level, cdef class attribute, etc) and can be >>>> obtained from any object that exposes the PEP 3118 buffer interface. >>>> " >>>> >>>> but I'm not sure this new functionality won't confuse users of the old >>>> buffer support. >>>> >>>> For fused types, cython.numeric only includes long, double and double >>>> complex. I think that should be changed to short, int, long, float, >>>> double, float complex and double complex. >>> >>> Yes. What about size_t, ssize_t, and Py_ssize_t? >> >> Hmm, these things don't contain unsigned types as they may be chosen >> when calling directly (as they're longer), but they will cause >> problems for negative values. I think unsigned types should be >> explicit. > > You're right about unsigned. > >> I think size_t is also more for representing the size of >> objects, I'm not sure you'd want the same code operating on size_t and >> say, ints. Py_ssize_t is typically used as the type for indices, but >> not much else I think, so it might be weird to include it. > > I was thinking if one had > > cdef foo(integral x): > ? ... > > then foo[ssize_t] > > should be available, but perhaps not used implicitly. I suppose this > would be an exceptional case for dispatching, and > > cdef fused my_type: > ? ?integral > ? ?ssize_t > ? ?long long > > is easy enough for the user to do. Ah, I see. Yeah, that's not implemented :P >>>> I was deliberately avoiding >>>> long long and long double as they (if not used as a base type) would >>>> be preferred over the others and may be a lot slower. But then, such >>>> usage wouldn't be very useful. Should I include them then? >>> >>> That's a good question. Perhaps these two could be used if explicitly >>> requested, or for dispatching from a Python long (in Py2) or >>> non-word-sized int (in Py3). >> >> I'm not sure I understand, how would you request them explicitly? The >> user could always just created a fused type manually if he/she wants >> long long, long double, or long double complex. >> >>>> On 29 October 2011 10:30, Dag Sverre Seljebotn >>>> wrote: >>>>> Re b), it would be better to disable object dtypes (or emit a warning about >>>>> the possible bug when using them) than to delay the release. Object >>>>> memoryviews are rare in the first place, and those who contain themselves >>>>> should be very rare. >>> >>> +1 to a warning, especially if the problem is only related to circular >>> references. >> >> Hmm, a warning, ok. >> >> Do we desperately want to get a release out, or do we want it for >> somewhere e.g. at the end of the week? Because fixing this issue >> wouldn't be too hard I think, and it might give us some more time to >> review and merge Vitja's code. super() is pretty neat. > > No hurry, but I was thinking it'd be good to get the ball rolling and > get these features released. > > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From stefan_ml at behnel.de Sun Oct 30 09:49:11 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 30 Oct 2011 09:49:11 +0100 Subject: [Cython] Cython 0.16 In-Reply-To: References: <4EAC12CA.5040200@behnel.de> Message-ID: <4EAD0F87.5000208@behnel.de> Robert Bradshaw, 29.10.2011 18:58: > On Sat, Oct 29, 2011 at 7:50 AM, Stefan Behnel wrote: >> I still haven't investigated the decorator issue that appeared in the Sage >> tests. I think it's related to decorators on module level def functions, >> which would suggest that it's best to eventually fix it as part of the >> function implementation changes that Vitja has started. But there may still >> be a simpler work-around somewhere that I'm not seeing yet. >> >> I basically broke the Sage tests by resolving a bug (593 IIRC), and both >> don't currently work together. So, a variant would be to revert my changes >> for 0.16 and just leave the bug in, if that keeps us from breaking existing >> code for now. >> >> But even leaving that out, the Sage tests look seriously broken currently: >> >> https://sage.math.washington.edu:8091/hudson/view/All/job/sage-tests/lastCompletedBuild/consoleFull > > I recently fixed the Sage build (the errors on public api for non > public types broke it). As for those tests, they seem to be related to > name mangling for double-underscore names. Did something change here > recently? Not exactly recently. I implemented private name mangling for cdef classes, but I could swear that that was long before the Sage build started showing these failures. At least, I'm surprised to see them now. Stefan From markflorisson88 at gmail.com Sun Oct 30 16:39:24 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 30 Oct 2011 15:39:24 +0000 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: On 28 October 2011 21:59, mark florisson wrote: > On 28 October 2011 21:55, Robert Bradshaw wrote: >> With Mark's fused types and memory views going in, I think it's about >> time for a new release. Thoughts? Anyone want to volunteer to take up >> the process? >> >> - Robert >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd be cool. However there are a few outstanding issues: > ? ?a) the compiler is somewhat slower (possible solution: lazy utility codes) > ? ?b) there's a potential memory leak problem for memoryviews with > object dtype that contain themselves, this still needs investigation. > > As for a), Stefan mentioned code spending a lot of time in sub. > Stefan, could you post the code for this that made Cython compile very > slowly? > It seems that NumPy does not support cyclic references, it has '(traverseproc)0, /* tp_traverse */' in its source (PyArray_Type is the ndarray right?). Indeed, this code prints "deallocated!" only if there is no reference cycle: import numpy cdef class DeallocateMe(object): def __dealloc__(self): print "deallocated!" a = numpy.arange(20, dtype=numpy.object) a[10] = DeallocateMe() a[1] = a # <- try commenting out this line del a import gc gc.collect() Anyway, I got it to traverse and clear the buffer object and the memoryview slice struct, so it should work if the buffer exporter supports cycles. From markflorisson88 at gmail.com Mon Oct 31 23:13:06 2011 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 31 Oct 2011 22:13:06 +0000 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: We can now pass a chunksize argument into prange: https://github.com/cython/cython/commit/5c3e77d3c70686fedd5619d7267728fc819b4c60 On 29 October 2011 14:14, mark florisson wrote: > Before we do a release, would anyone be opposed to a 'chunksize' > keyword argument to prange()? That may have significant performance > impacts. > > On 29 October 2011 12:41, mark florisson wrote: >> Hm ok I'll disable them then. Pointers and some other dtypes are also >> not supported yet. As for the documentation, have you guys reviewed >> the documentation for fused types and memoryviews? For instance this >> is the introduction for memoryviews: >> >> " >> Typed memoryviews can be used for efficient access to buffers. It is >> similar to the current buffer support, but has more features and >> cleaner syntax. A memoryview can be used in any context (function >> parameters, module-level, cdef class attribute, etc) and can be >> obtained from any object that exposes the PEP 3118 buffer interface. >> " >> >> but I'm not sure this new functionality won't confuse users of the old >> buffer support. >> >> For fused types, cython.numeric only includes long, double and double >> complex. I think that should be changed to short, int, long, float, >> double, float complex and double complex. I was deliberately avoiding >> long long and long double as they (if not used as a base type) would >> be preferred over the others and may be a lot slower. But then, such >> usage wouldn't be very useful. Should I include them then? >> >> On 29 October 2011 10:30, Dag Sverre Seljebotn >> wrote: >>> Re b), it would be better to disable object dtypes (or emit a warning about >>> the possible bug when using them) than to delay the release. Object >>> memoryviews are rare in the first place, and those who contain themselves >>> should be very rare. >>> -- >>> Sent from my Android phone with K-9 Mail. Please excuse my brevity. >>> >>> Robert Bradshaw wrote: >>>> >>>> On Fri, Oct 28, 2011 at 1:59 PM, mark florisson >>>> wrote: > On 28 October 2011 21:55, Robert >>>> Bradshaw wrote: >> With Mark's fused types >>>> and memory views going in, I think it's about >> time for a new release. >>>> Thoughts? Anyone want to volunteer to take up >> the process? >> >> - Robert >>>> >> >>>> ________________________________ >>>> >> cython-devel mailing list >> cython-devel at python.org >> >>>> >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd be cool. >>>> >> However there are a few outstanding issues: > ? ?a) the compiler is somewhat >>>> >> slower (possible solution: lazy utility codes) Yeah, I forgot about that. >>>> >> This should get resolved. Lazy utility codes (perhaps breaking them up) >>>> >> would probably got us most of the way there. Long term, I really like the >>>> >> "declaration caching" idea which could be used for users .pxd files as well >>>> >> as internally. > ? ?b) there's a potential memory leak problem for >>>> >> memoryviews with > object dtype that contain themselves, this still needs >>>> >> investigation. I think this could be mentioned as a caviat rather than being >>>> >> a blocker. > As for a), Stefan mentioned code spending a lot of time in sub. >>>> >> > Stefan, could you post the code for this that made Cython compile very > >>>> >> slowly? > >>>> ________________________________ >>>> > cython-devel mailing list > cython-devel at python.org > >>>> > http://mail.python.org/mailman/listinfo/cython-devel > >>>> ________________________________ >>>> cython-devel mailing list cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>> >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >>> >>> >> >