From markflorisson88 at gmail.com  Sun Oct  2 12:38:23 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 2 Oct 2011 11:38:23 +0100
Subject: [Cython] buffer bug
Message-ID: <CANg26EVRpcgUACHsgnMfzACPz9MR5wxxgNankhmJ0j--8iYzyQ@mail.gmail.com>

Hey,

I'm unable to login in trac, but I found a bug in the buffer support:

cimport cython
cimport numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
cdef void func(np.ndarray[np.float32_t, ndim=2] a) nogil:
    pass

This calls __Pyx_GetBufferAndValidate, which needs the GIL.

When I get the last failing tests fixed (introduced after rebasing on
the latest master) for memoryviews, should be transform the current
buffer support to memoryviews before doing a release? The only
incompatibility I see is that readonly buffers are not supported.
On the other hand it might be a good idea to wait with that, in case
there are any bugs. We don't want to break everyone's existing code.
Opinions?

Mark

From d.s.seljebotn at astro.uio.no  Sun Oct  2 13:04:26 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sun, 02 Oct 2011 13:04:26 +0200
Subject: [Cython] buffer bug
In-Reply-To: <CANg26EVRpcgUACHsgnMfzACPz9MR5wxxgNankhmJ0j--8iYzyQ@mail.gmail.com>
References: <CANg26EVRpcgUACHsgnMfzACPz9MR5wxxgNankhmJ0j--8iYzyQ@mail.gmail.com>
Message-ID: <4E88453A.6090702@astro.uio.no>

On 10/02/2011 12:38 PM, mark florisson wrote:
> Hey,
>
> I'm unable to login in trac, but I found a bug in the buffer support:
>
> cimport cython
> cimport numpy as np
>
> @cython.boundscheck(False)
> @cython.wraparound(False)
> cdef void func(np.ndarray[np.float32_t, ndim=2] a) nogil:
>      pass
>
> This calls __Pyx_GetBufferAndValidate, which needs the GIL.

Hmm. I thought buffers were disallowed as arguments to cdef functions?

> When I get the last failing tests fixed (introduced after rebasing on
> the latest master) for memoryviews, should be transform the current
> buffer support to memoryviews before doing a release? The only
> incompatibility I see is that readonly buffers are not supported.

Do you mean readonly memoryviews?

I'm not sure how much of an issue it is. NumPy arrays support being 
readonly, but it is not straightforward to make a NumPy array so.

Eventually I guess "const int[:]" should be supported; one could do so 
even without allowing const anywhere else.

> On the other hand it might be a good idea to wait with that, in case
> there are any bugs. We don't want to break everyone's existing code.
> Opinions?

I think this is mostly a question of how much time you have to work on 
it. Transforming buffer support into memoryviews would be a new feature 
branch, and whether that branch is merged into next release depends on 
the timing for the next release I'd say. I don't think a new release has 
to happen in the meantime, if you want to make it before, all the better!

Dag Sverre

From markflorisson88 at gmail.com  Sun Oct  2 13:13:05 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 2 Oct 2011 12:13:05 +0100
Subject: [Cython] buffer bug
In-Reply-To: <4E88453A.6090702@astro.uio.no>
References: <CANg26EVRpcgUACHsgnMfzACPz9MR5wxxgNankhmJ0j--8iYzyQ@mail.gmail.com>
	<4E88453A.6090702@astro.uio.no>
Message-ID: <CANg26EUJ4jN+NwM3Z8yD4gLFrJCF5k6NKNtKSpzYGuSm737uLg@mail.gmail.com>

On 2 October 2011 12:04, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/02/2011 12:38 PM, mark florisson wrote:
>>
>> Hey,
>>
>> I'm unable to login in trac, but I found a bug in the buffer support:
>>
>> cimport cython
>> cimport numpy as np
>>
>> @cython.boundscheck(False)
>> @cython.wraparound(False)
>> cdef void func(np.ndarray[np.float32_t, ndim=2] a) nogil:
>> ? ? pass
>>
>> This calls __Pyx_GetBufferAndValidate, which needs the GIL.
>
> Hmm. I thought buffers were disallowed as arguments to cdef functions?

Ah, perhaps they are, but I didn't get any error message.

>> When I get the last failing tests fixed (introduced after rebasing on
>> the latest master) for memoryviews, should be transform the current
>> buffer support to memoryviews before doing a release? The only
>> incompatibility I see is that readonly buffers are not supported.
>
> Do you mean readonly memoryviews?
>
> I'm not sure how much of an issue it is. NumPy arrays support being
> readonly, but it is not straightforward to make a NumPy array so.
>
> Eventually I guess "const int[:]" should be supported; one could do so even
> without allowing const anywhere else.

Right, readonly memoryviews, I think the current buffer support
supports not asking for a buffer with PyBUF_WRITABLE (when there is no
item assignment in that function). Memoryviews cannot make the same
assumption because they can be passed anywhere without requesting a
new buffer, so they always insert PyBUF_WRITABLE in the flags when
requesting the buffer.

>> On the other hand it might be a good idea to wait with that, in case
>> there are any bugs. We don't want to break everyone's existing code.
>> Opinions?
>
> I think this is mostly a question of how much time you have to work on it.
> Transforming buffer support into memoryviews would be a new feature branch,
> and whether that branch is merged into next release depends on the timing
> for the next release I'd say. I don't think a new release has to happen in
> the meantime, if you want to make it before, all the better!
>
> Dag Sverre
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

Ok, sounds good. Let's see what happens, I'm probably going to be
quite busy, but the weather forecast also mentioned rain... :)

From vitja.makarov at gmail.com  Sun Oct  2 19:52:23 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Sun, 2 Oct 2011 21:52:23 +0400
Subject: [Cython] CyFunction refactoring plan
In-Reply-To: <CANg26EUnsO7GvXG+vBmLjWEotDVNOPqbqf=4rghpx9YWUmKg-w@mail.gmail.com>
References: <CAKGHGPRbKR576nTorj24G4YLNBMC_MS2CL6vjSbsC+2DRyKJkA@mail.gmail.com>
	<CAKGHGPTm12o+vdaG8-JPdL_o2tMN4BehFoun9b8Ewe=vBUmB_Q@mail.gmail.com>
	<4E8556E7.7050007@behnel.de>
	<CADiQ+QATMMGFjgdD9GgxiNjRWyYymdKFkcrtfKZz3-mwDRCvsw@mail.gmail.com>
	<CAKGHGPQPZqtE_CHRorVoC0nVS8qss-zzsnpULsZrCeF7mmVMXg@mail.gmail.com>
	<CAKGHGPQxR1mK6oKiYGNj+DAaruyZ9naATA=JYJw7Ej69BWYX1A@mail.gmail.com>
	<CANg26EUnsO7GvXG+vBmLjWEotDVNOPqbqf=4rghpx9YWUmKg-w@mail.gmail.com>
Message-ID: <CAKGHGPSWWmOdbiryQezNp7a-EGFMWSrADDSsULC0k5hYqdiJVw@mail.gmail.com>

2011/9/30 mark florisson <markflorisson88 at gmail.com>:
> On 30 September 2011 07:47, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>> 2011/9/30 Vitja Makarov <vitja.makarov at gmail.com>:
>>> 2011/9/30 Robert Bradshaw <robertwb at math.washington.edu>:
>>>> On Thu, Sep 29, 2011 at 10:43 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>>>> Vitja Makarov, 30.09.2011 06:41:
>>>>>>
>>>>>> 2011/9/28 Vitja Makarov:
>>>>>>>
>>>>>>> I tried to build simple plan for ongoing cython function refactoring
>>>>>>>
>>>>>>> * Replace assignment synthesis with SingleAssignmentNode, where LHS is
>>>>>>> NameNode and RHS is PyCFunctionNode
>>>>>>> * Split function body into python wrapper and C function
>>>>>>> http://wiki.cython.org/enhancements/generators#Pythonfunctionrefactoring
>>>>>>>
>>>>>>> Then we can implement some features and optimizations:
>>>>>>>
>>>>>>> * Reduce difference between cdef and def functions
>>>>>>> * Store runtime evaluated default values inside CyFunction, ticket #674
>>>>>>> * Implement no-args super(), ticket #696
>>>>>>> * Function call inlining
>>>>>>
>>>>>> If nobody don't mind I would start with first one.
>>>>
>>>> I would love to see this happen.
>>>>
>>>>> Please go ahead. :)
>>>>>
>>>>> Note that you will encounter some problems when enabling name assignments
>>>>> for all named functions. I tried that at least once and it "didn't work",
>>>>> but I didn't take the time yet to investigate them further.
>>>>>
>>>>> I assume you are going to work on this in your own repo?
>>>>
>>>> Please also coordinate with Mark's work on function dispatching for
>>>> fused types.
>>>>
>>>
>>> I assume that that fused type functions are cdef ones so I think that
>>> should be easy to merge.
>>> On the other hand it's better to have Mark's branch merged into master.
>>>
>>> Mark, what is the state of your fused types branch?
>>> Is it possible to break it into smaller parts to ease reviewing and merging?
>>>
>>
>> It seems I meant memview branch not fusedtypes.
>
> There are 2 pending branches, _memview_rebase, which has support for
> memoryviews, and fusedtypes. The former is ready for merge, it's
> waiting to be reviewed. The fused types branch needs to subclass
> CyFunction (it basically modified the old binding function). There was
> also some duplicate functionality there, so I thought it'd be easier
> and more convenient to use the utility code loading there.
>
> Since it's not a strict dependency and it will be blocking progress, I
> will try to find some time to get it merge-ready for master.
>
> But no, it does cdef, cpdef and def methods, and it has some changes
> to all function nodes (FuncdefNode, CFuncdefNode and DefNode). These
> changes shouldn't be major though, but the logic in FusedFuncdefNode
> does differentiate between all the different functions in order to
> support them. Feel free to ask me about specifics any time.
>

I've moved def node assignment synthesis into
DefNodeAssignmentSynthesis transformation.

https://github.com/vitek/cython/commit/efacfed3c9cd8216b6c2100073a9df809b76675a

Instead of moving defnode into PyCFunctionNode I've inserted
assignment statement right after defnode.
This is much more easy and seems ok to me.

-- 
vitja.

From markflorisson88 at gmail.com  Sun Oct  2 20:21:40 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 2 Oct 2011 19:21:40 +0100
Subject: [Cython] CyFunction refactoring plan
In-Reply-To: <CAKGHGPSWWmOdbiryQezNp7a-EGFMWSrADDSsULC0k5hYqdiJVw@mail.gmail.com>
References: <CAKGHGPRbKR576nTorj24G4YLNBMC_MS2CL6vjSbsC+2DRyKJkA@mail.gmail.com>
	<CAKGHGPTm12o+vdaG8-JPdL_o2tMN4BehFoun9b8Ewe=vBUmB_Q@mail.gmail.com>
	<4E8556E7.7050007@behnel.de>
	<CADiQ+QATMMGFjgdD9GgxiNjRWyYymdKFkcrtfKZz3-mwDRCvsw@mail.gmail.com>
	<CAKGHGPQPZqtE_CHRorVoC0nVS8qss-zzsnpULsZrCeF7mmVMXg@mail.gmail.com>
	<CAKGHGPQxR1mK6oKiYGNj+DAaruyZ9naATA=JYJw7Ej69BWYX1A@mail.gmail.com>
	<CANg26EUnsO7GvXG+vBmLjWEotDVNOPqbqf=4rghpx9YWUmKg-w@mail.gmail.com>
	<CAKGHGPSWWmOdbiryQezNp7a-EGFMWSrADDSsULC0k5hYqdiJVw@mail.gmail.com>
Message-ID: <CANg26EVtRPNrjDzJroa74SMXakxWqk_ajFsKOipf1HiMfMpECA@mail.gmail.com>

On 2 October 2011 18:52, Vitja Makarov <vitja.makarov at gmail.com> wrote:
> 2011/9/30 mark florisson <markflorisson88 at gmail.com>:
>> On 30 September 2011 07:47, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>>> 2011/9/30 Vitja Makarov <vitja.makarov at gmail.com>:
>>>> 2011/9/30 Robert Bradshaw <robertwb at math.washington.edu>:
>>>>> On Thu, Sep 29, 2011 at 10:43 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>>>>> Vitja Makarov, 30.09.2011 06:41:
>>>>>>>
>>>>>>> 2011/9/28 Vitja Makarov:
>>>>>>>>
>>>>>>>> I tried to build simple plan for ongoing cython function refactoring
>>>>>>>>
>>>>>>>> * Replace assignment synthesis with SingleAssignmentNode, where LHS is
>>>>>>>> NameNode and RHS is PyCFunctionNode
>>>>>>>> * Split function body into python wrapper and C function
>>>>>>>> http://wiki.cython.org/enhancements/generators#Pythonfunctionrefactoring
>>>>>>>>
>>>>>>>> Then we can implement some features and optimizations:
>>>>>>>>
>>>>>>>> * Reduce difference between cdef and def functions
>>>>>>>> * Store runtime evaluated default values inside CyFunction, ticket #674
>>>>>>>> * Implement no-args super(), ticket #696
>>>>>>>> * Function call inlining
>>>>>>>
>>>>>>> If nobody don't mind I would start with first one.
>>>>>
>>>>> I would love to see this happen.
>>>>>
>>>>>> Please go ahead. :)
>>>>>>
>>>>>> Note that you will encounter some problems when enabling name assignments
>>>>>> for all named functions. I tried that at least once and it "didn't work",
>>>>>> but I didn't take the time yet to investigate them further.
>>>>>>
>>>>>> I assume you are going to work on this in your own repo?
>>>>>
>>>>> Please also coordinate with Mark's work on function dispatching for
>>>>> fused types.
>>>>>
>>>>
>>>> I assume that that fused type functions are cdef ones so I think that
>>>> should be easy to merge.
>>>> On the other hand it's better to have Mark's branch merged into master.
>>>>
>>>> Mark, what is the state of your fused types branch?
>>>> Is it possible to break it into smaller parts to ease reviewing and merging?
>>>>
>>>
>>> It seems I meant memview branch not fusedtypes.
>>
>> There are 2 pending branches, _memview_rebase, which has support for
>> memoryviews, and fusedtypes. The former is ready for merge, it's
>> waiting to be reviewed. The fused types branch needs to subclass
>> CyFunction (it basically modified the old binding function). There was
>> also some duplicate functionality there, so I thought it'd be easier
>> and more convenient to use the utility code loading there.
>>
>> Since it's not a strict dependency and it will be blocking progress, I
>> will try to find some time to get it merge-ready for master.
>>
>> But no, it does cdef, cpdef and def methods, and it has some changes
>> to all function nodes (FuncdefNode, CFuncdefNode and DefNode). These
>> changes shouldn't be major though, but the logic in FusedFuncdefNode
>> does differentiate between all the different functions in order to
>> support them. Feel free to ask me about specifics any time.
>>
>
> I've moved def node assignment synthesis into
> DefNodeAssignmentSynthesis transformation.
>
> https://github.com/vitek/cython/commit/efacfed3c9cd8216b6c2100073a9df809b76675a
>
> Instead of moving defnode into PyCFunctionNode I've inserted
> assignment statement right after defnode.
> This is much more easy and seems ok to me.
>
> --
> vitja.
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

Ah, I thought you were going to wait until fused types were merged. In
any case, this doesn't look like it will give too many conflicts, but
there will be a few. I'm currently moving CyFunction to a utility code
file and making a FusedFunction subclass.

From vitja.makarov at gmail.com  Sun Oct  2 20:44:34 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Sun, 2 Oct 2011 22:44:34 +0400
Subject: [Cython] CyFunction refactoring plan
In-Reply-To: <CANg26EVtRPNrjDzJroa74SMXakxWqk_ajFsKOipf1HiMfMpECA@mail.gmail.com>
References: <CAKGHGPRbKR576nTorj24G4YLNBMC_MS2CL6vjSbsC+2DRyKJkA@mail.gmail.com>
	<CAKGHGPTm12o+vdaG8-JPdL_o2tMN4BehFoun9b8Ewe=vBUmB_Q@mail.gmail.com>
	<4E8556E7.7050007@behnel.de>
	<CADiQ+QATMMGFjgdD9GgxiNjRWyYymdKFkcrtfKZz3-mwDRCvsw@mail.gmail.com>
	<CAKGHGPQPZqtE_CHRorVoC0nVS8qss-zzsnpULsZrCeF7mmVMXg@mail.gmail.com>
	<CAKGHGPQxR1mK6oKiYGNj+DAaruyZ9naATA=JYJw7Ej69BWYX1A@mail.gmail.com>
	<CANg26EUnsO7GvXG+vBmLjWEotDVNOPqbqf=4rghpx9YWUmKg-w@mail.gmail.com>
	<CAKGHGPSWWmOdbiryQezNp7a-EGFMWSrADDSsULC0k5hYqdiJVw@mail.gmail.com>
	<CANg26EVtRPNrjDzJroa74SMXakxWqk_ajFsKOipf1HiMfMpECA@mail.gmail.com>
Message-ID: <CAKGHGPS55SeNkUE8LsHYezn34R-k0YYsxZCxYQUStsii8=T=Dg@mail.gmail.com>

2011/10/2 mark florisson <markflorisson88 at gmail.com>:
> On 2 October 2011 18:52, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>> 2011/9/30 mark florisson <markflorisson88 at gmail.com>:
>>> On 30 September 2011 07:47, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>>>> 2011/9/30 Vitja Makarov <vitja.makarov at gmail.com>:
>>>>> 2011/9/30 Robert Bradshaw <robertwb at math.washington.edu>:
>>>>>> On Thu, Sep 29, 2011 at 10:43 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>>>>>> Vitja Makarov, 30.09.2011 06:41:
>>>>>>>>
>>>>>>>> 2011/9/28 Vitja Makarov:
>>>>>>>>>
>>>>>>>>> I tried to build simple plan for ongoing cython function refactoring
>>>>>>>>>
>>>>>>>>> * Replace assignment synthesis with SingleAssignmentNode, where LHS is
>>>>>>>>> NameNode and RHS is PyCFunctionNode
>>>>>>>>> * Split function body into python wrapper and C function
>>>>>>>>> http://wiki.cython.org/enhancements/generators#Pythonfunctionrefactoring
>>>>>>>>>
>>>>>>>>> Then we can implement some features and optimizations:
>>>>>>>>>
>>>>>>>>> * Reduce difference between cdef and def functions
>>>>>>>>> * Store runtime evaluated default values inside CyFunction, ticket #674
>>>>>>>>> * Implement no-args super(), ticket #696
>>>>>>>>> * Function call inlining
>>>>>>>>
>>>>>>>> If nobody don't mind I would start with first one.
>>>>>>
>>>>>> I would love to see this happen.
>>>>>>
>>>>>>> Please go ahead. :)
>>>>>>>
>>>>>>> Note that you will encounter some problems when enabling name assignments
>>>>>>> for all named functions. I tried that at least once and it "didn't work",
>>>>>>> but I didn't take the time yet to investigate them further.
>>>>>>>
>>>>>>> I assume you are going to work on this in your own repo?
>>>>>>
>>>>>> Please also coordinate with Mark's work on function dispatching for
>>>>>> fused types.
>>>>>>
>>>>>
>>>>> I assume that that fused type functions are cdef ones so I think that
>>>>> should be easy to merge.
>>>>> On the other hand it's better to have Mark's branch merged into master.
>>>>>
>>>>> Mark, what is the state of your fused types branch?
>>>>> Is it possible to break it into smaller parts to ease reviewing and merging?
>>>>>
>>>>
>>>> It seems I meant memview branch not fusedtypes.
>>>
>>> There are 2 pending branches, _memview_rebase, which has support for
>>> memoryviews, and fusedtypes. The former is ready for merge, it's
>>> waiting to be reviewed. The fused types branch needs to subclass
>>> CyFunction (it basically modified the old binding function). There was
>>> also some duplicate functionality there, so I thought it'd be easier
>>> and more convenient to use the utility code loading there.
>>>
>>> Since it's not a strict dependency and it will be blocking progress, I
>>> will try to find some time to get it merge-ready for master.
>>>
>>> But no, it does cdef, cpdef and def methods, and it has some changes
>>> to all function nodes (FuncdefNode, CFuncdefNode and DefNode). These
>>> changes shouldn't be major though, but the logic in FusedFuncdefNode
>>> does differentiate between all the different functions in order to
>>> support them. Feel free to ask me about specifics any time.
>>>
>>
>> I've moved def node assignment synthesis into
>> DefNodeAssignmentSynthesis transformation.
>>
>> https://github.com/vitek/cython/commit/efacfed3c9cd8216b6c2100073a9df809b76675a
>>
>> Instead of moving defnode into PyCFunctionNode I've inserted
>> assignment statement right after defnode.
>> This is much more easy and seems ok to me.
>>
>> --
>> vitja.
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>
> Ah, I thought you were going to wait until fused types were merged. In
> any case, this doesn't look like it will give too many conflicts, but
> there will be a few.

Yeah, I just had a free time and decided to try. I think fused types
should be merged first.

> I'm currently moving CyFunction to a utility code
> file and making a FusedFunction subclass.
>

That's cool! Btw, have you seen utility code related bug in hudson it
happens only with py2.4?

-- 
vitja.

From markflorisson88 at gmail.com  Sun Oct  2 20:57:08 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 2 Oct 2011 19:57:08 +0100
Subject: [Cython] CyFunction refactoring plan
In-Reply-To: <CAKGHGPS55SeNkUE8LsHYezn34R-k0YYsxZCxYQUStsii8=T=Dg@mail.gmail.com>
References: <CAKGHGPRbKR576nTorj24G4YLNBMC_MS2CL6vjSbsC+2DRyKJkA@mail.gmail.com>
	<CAKGHGPTm12o+vdaG8-JPdL_o2tMN4BehFoun9b8Ewe=vBUmB_Q@mail.gmail.com>
	<4E8556E7.7050007@behnel.de>
	<CADiQ+QATMMGFjgdD9GgxiNjRWyYymdKFkcrtfKZz3-mwDRCvsw@mail.gmail.com>
	<CAKGHGPQPZqtE_CHRorVoC0nVS8qss-zzsnpULsZrCeF7mmVMXg@mail.gmail.com>
	<CAKGHGPQxR1mK6oKiYGNj+DAaruyZ9naATA=JYJw7Ej69BWYX1A@mail.gmail.com>
	<CANg26EUnsO7GvXG+vBmLjWEotDVNOPqbqf=4rghpx9YWUmKg-w@mail.gmail.com>
	<CAKGHGPSWWmOdbiryQezNp7a-EGFMWSrADDSsULC0k5hYqdiJVw@mail.gmail.com>
	<CANg26EVtRPNrjDzJroa74SMXakxWqk_ajFsKOipf1HiMfMpECA@mail.gmail.com>
	<CAKGHGPS55SeNkUE8LsHYezn34R-k0YYsxZCxYQUStsii8=T=Dg@mail.gmail.com>
Message-ID: <CANg26EVL6fwV3m51dgKSf5n=NYw7rB1NtmGXQ+XHxN338p5uoA@mail.gmail.com>

On 2 October 2011 19:44, Vitja Makarov <vitja.makarov at gmail.com> wrote:
> 2011/10/2 mark florisson <markflorisson88 at gmail.com>:
>> On 2 October 2011 18:52, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>>> 2011/9/30 mark florisson <markflorisson88 at gmail.com>:
>>>> On 30 September 2011 07:47, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>>>>> 2011/9/30 Vitja Makarov <vitja.makarov at gmail.com>:
>>>>>> 2011/9/30 Robert Bradshaw <robertwb at math.washington.edu>:
>>>>>>> On Thu, Sep 29, 2011 at 10:43 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>>>>>>> Vitja Makarov, 30.09.2011 06:41:
>>>>>>>>>
>>>>>>>>> 2011/9/28 Vitja Makarov:
>>>>>>>>>>
>>>>>>>>>> I tried to build simple plan for ongoing cython function refactoring
>>>>>>>>>>
>>>>>>>>>> * Replace assignment synthesis with SingleAssignmentNode, where LHS is
>>>>>>>>>> NameNode and RHS is PyCFunctionNode
>>>>>>>>>> * Split function body into python wrapper and C function
>>>>>>>>>> http://wiki.cython.org/enhancements/generators#Pythonfunctionrefactoring
>>>>>>>>>>
>>>>>>>>>> Then we can implement some features and optimizations:
>>>>>>>>>>
>>>>>>>>>> * Reduce difference between cdef and def functions
>>>>>>>>>> * Store runtime evaluated default values inside CyFunction, ticket #674
>>>>>>>>>> * Implement no-args super(), ticket #696
>>>>>>>>>> * Function call inlining
>>>>>>>>>
>>>>>>>>> If nobody don't mind I would start with first one.
>>>>>>>
>>>>>>> I would love to see this happen.
>>>>>>>
>>>>>>>> Please go ahead. :)
>>>>>>>>
>>>>>>>> Note that you will encounter some problems when enabling name assignments
>>>>>>>> for all named functions. I tried that at least once and it "didn't work",
>>>>>>>> but I didn't take the time yet to investigate them further.
>>>>>>>>
>>>>>>>> I assume you are going to work on this in your own repo?
>>>>>>>
>>>>>>> Please also coordinate with Mark's work on function dispatching for
>>>>>>> fused types.
>>>>>>>
>>>>>>
>>>>>> I assume that that fused type functions are cdef ones so I think that
>>>>>> should be easy to merge.
>>>>>> On the other hand it's better to have Mark's branch merged into master.
>>>>>>
>>>>>> Mark, what is the state of your fused types branch?
>>>>>> Is it possible to break it into smaller parts to ease reviewing and merging?
>>>>>>
>>>>>
>>>>> It seems I meant memview branch not fusedtypes.
>>>>
>>>> There are 2 pending branches, _memview_rebase, which has support for
>>>> memoryviews, and fusedtypes. The former is ready for merge, it's
>>>> waiting to be reviewed. The fused types branch needs to subclass
>>>> CyFunction (it basically modified the old binding function). There was
>>>> also some duplicate functionality there, so I thought it'd be easier
>>>> and more convenient to use the utility code loading there.
>>>>
>>>> Since it's not a strict dependency and it will be blocking progress, I
>>>> will try to find some time to get it merge-ready for master.
>>>>
>>>> But no, it does cdef, cpdef and def methods, and it has some changes
>>>> to all function nodes (FuncdefNode, CFuncdefNode and DefNode). These
>>>> changes shouldn't be major though, but the logic in FusedFuncdefNode
>>>> does differentiate between all the different functions in order to
>>>> support them. Feel free to ask me about specifics any time.
>>>>
>>>
>>> I've moved def node assignment synthesis into
>>> DefNodeAssignmentSynthesis transformation.
>>>
>>> https://github.com/vitek/cython/commit/efacfed3c9cd8216b6c2100073a9df809b76675a
>>>
>>> Instead of moving defnode into PyCFunctionNode I've inserted
>>> assignment statement right after defnode.
>>> This is much more easy and seems ok to me.
>>>
>>> --
>>> vitja.
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>>
>> Ah, I thought you were going to wait until fused types were merged. In
>> any case, this doesn't look like it will give too many conflicts, but
>> there will be a few.
>
> Yeah, I just had a free time and decided to try. I think fused types
> should be merged first.
>
>> I'm currently moving CyFunction to a utility code
>> file and making a FusedFunction subclass.
>>
>
> That's cool! Btw, have you seen utility code related bug in hudson it
> happens only with py2.4?

Yeah I'll fix that, thanks for pointing it out, I don't have a 2.4
build myself. I think it's not eating unicode keys for keyword
arguments.

> --
> vitja.
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Sun Oct  2 23:39:24 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 2 Oct 2011 22:39:24 +0100
Subject: [Cython] CyFunction refactoring plan
In-Reply-To: <CAKGHGPS55SeNkUE8LsHYezn34R-k0YYsxZCxYQUStsii8=T=Dg@mail.gmail.com>
References: <CAKGHGPRbKR576nTorj24G4YLNBMC_MS2CL6vjSbsC+2DRyKJkA@mail.gmail.com>
	<CAKGHGPTm12o+vdaG8-JPdL_o2tMN4BehFoun9b8Ewe=vBUmB_Q@mail.gmail.com>
	<4E8556E7.7050007@behnel.de>
	<CADiQ+QATMMGFjgdD9GgxiNjRWyYymdKFkcrtfKZz3-mwDRCvsw@mail.gmail.com>
	<CAKGHGPQPZqtE_CHRorVoC0nVS8qss-zzsnpULsZrCeF7mmVMXg@mail.gmail.com>
	<CAKGHGPQxR1mK6oKiYGNj+DAaruyZ9naATA=JYJw7Ej69BWYX1A@mail.gmail.com>
	<CANg26EUnsO7GvXG+vBmLjWEotDVNOPqbqf=4rghpx9YWUmKg-w@mail.gmail.com>
	<CAKGHGPSWWmOdbiryQezNp7a-EGFMWSrADDSsULC0k5hYqdiJVw@mail.gmail.com>
	<CANg26EVtRPNrjDzJroa74SMXakxWqk_ajFsKOipf1HiMfMpECA@mail.gmail.com>
	<CAKGHGPS55SeNkUE8LsHYezn34R-k0YYsxZCxYQUStsii8=T=Dg@mail.gmail.com>
Message-ID: <CANg26EVtitWVC6L=NGvLQT6cZtzmQbpcM-QC5Rwk0rbCeXHmfA@mail.gmail.com>

On 2 October 2011 19:44, Vitja Makarov <vitja.makarov at gmail.com> wrote:
> 2011/10/2 mark florisson <markflorisson88 at gmail.com>:
>> On 2 October 2011 18:52, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>>> 2011/9/30 mark florisson <markflorisson88 at gmail.com>:
>>>> On 30 September 2011 07:47, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>>>>> 2011/9/30 Vitja Makarov <vitja.makarov at gmail.com>:
>>>>>> 2011/9/30 Robert Bradshaw <robertwb at math.washington.edu>:
>>>>>>> On Thu, Sep 29, 2011 at 10:43 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>>>>>>> Vitja Makarov, 30.09.2011 06:41:
>>>>>>>>>
>>>>>>>>> 2011/9/28 Vitja Makarov:
>>>>>>>>>>
>>>>>>>>>> I tried to build simple plan for ongoing cython function refactoring
>>>>>>>>>>
>>>>>>>>>> * Replace assignment synthesis with SingleAssignmentNode, where LHS is
>>>>>>>>>> NameNode and RHS is PyCFunctionNode
>>>>>>>>>> * Split function body into python wrapper and C function
>>>>>>>>>> http://wiki.cython.org/enhancements/generators#Pythonfunctionrefactoring
>>>>>>>>>>
>>>>>>>>>> Then we can implement some features and optimizations:
>>>>>>>>>>
>>>>>>>>>> * Reduce difference between cdef and def functions
>>>>>>>>>> * Store runtime evaluated default values inside CyFunction, ticket #674
>>>>>>>>>> * Implement no-args super(), ticket #696
>>>>>>>>>> * Function call inlining
>>>>>>>>>
>>>>>>>>> If nobody don't mind I would start with first one.
>>>>>>>
>>>>>>> I would love to see this happen.
>>>>>>>
>>>>>>>> Please go ahead. :)
>>>>>>>>
>>>>>>>> Note that you will encounter some problems when enabling name assignments
>>>>>>>> for all named functions. I tried that at least once and it "didn't work",
>>>>>>>> but I didn't take the time yet to investigate them further.
>>>>>>>>
>>>>>>>> I assume you are going to work on this in your own repo?
>>>>>>>
>>>>>>> Please also coordinate with Mark's work on function dispatching for
>>>>>>> fused types.
>>>>>>>
>>>>>>
>>>>>> I assume that that fused type functions are cdef ones so I think that
>>>>>> should be easy to merge.
>>>>>> On the other hand it's better to have Mark's branch merged into master.
>>>>>>
>>>>>> Mark, what is the state of your fused types branch?
>>>>>> Is it possible to break it into smaller parts to ease reviewing and merging?
>>>>>>
>>>>>
>>>>> It seems I meant memview branch not fusedtypes.
>>>>
>>>> There are 2 pending branches, _memview_rebase, which has support for
>>>> memoryviews, and fusedtypes. The former is ready for merge, it's
>>>> waiting to be reviewed. The fused types branch needs to subclass
>>>> CyFunction (it basically modified the old binding function). There was
>>>> also some duplicate functionality there, so I thought it'd be easier
>>>> and more convenient to use the utility code loading there.
>>>>
>>>> Since it's not a strict dependency and it will be blocking progress, I
>>>> will try to find some time to get it merge-ready for master.
>>>>
>>>> But no, it does cdef, cpdef and def methods, and it has some changes
>>>> to all function nodes (FuncdefNode, CFuncdefNode and DefNode). These
>>>> changes shouldn't be major though, but the logic in FusedFuncdefNode
>>>> does differentiate between all the different functions in order to
>>>> support them. Feel free to ask me about specifics any time.
>>>>
>>>
>>> I've moved def node assignment synthesis into
>>> DefNodeAssignmentSynthesis transformation.
>>>
>>> https://github.com/vitek/cython/commit/efacfed3c9cd8216b6c2100073a9df809b76675a
>>>
>>> Instead of moving defnode into PyCFunctionNode I've inserted
>>> assignment statement right after defnode.
>>> This is much more easy and seems ok to me.
>>>
>>> --
>>> vitja.
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>>
>> Ah, I thought you were going to wait until fused types were merged. In
>> any case, this doesn't look like it will give too many conflicts, but
>> there will be a few.
>
> Yeah, I just had a free time and decided to try. I think fused types
> should be merged first.

If you want you can rebase your branch on
https://github.com/markflorisson88/cython/tree/fusedmerge, I'm not
going to rebase that branch. It needs a few more fixes though.

>> I'm currently moving CyFunction to a utility code
>> file and making a FusedFunction subclass.
>>
>
> That's cool! Btw, have you seen utility code related bug in hudson it
> happens only with py2.4?
>
> --
> vitja.
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From vitja.makarov at gmail.com  Sun Oct  2 23:52:54 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Mon, 3 Oct 2011 01:52:54 +0400
Subject: [Cython] CyFunction refactoring plan
In-Reply-To: <CANg26EVtitWVC6L=NGvLQT6cZtzmQbpcM-QC5Rwk0rbCeXHmfA@mail.gmail.com>
References: <CAKGHGPRbKR576nTorj24G4YLNBMC_MS2CL6vjSbsC+2DRyKJkA@mail.gmail.com>
	<CAKGHGPTm12o+vdaG8-JPdL_o2tMN4BehFoun9b8Ewe=vBUmB_Q@mail.gmail.com>
	<4E8556E7.7050007@behnel.de>
	<CADiQ+QATMMGFjgdD9GgxiNjRWyYymdKFkcrtfKZz3-mwDRCvsw@mail.gmail.com>
	<CAKGHGPQPZqtE_CHRorVoC0nVS8qss-zzsnpULsZrCeF7mmVMXg@mail.gmail.com>
	<CAKGHGPQxR1mK6oKiYGNj+DAaruyZ9naATA=JYJw7Ej69BWYX1A@mail.gmail.com>
	<CANg26EUnsO7GvXG+vBmLjWEotDVNOPqbqf=4rghpx9YWUmKg-w@mail.gmail.com>
	<CAKGHGPSWWmOdbiryQezNp7a-EGFMWSrADDSsULC0k5hYqdiJVw@mail.gmail.com>
	<CANg26EVtRPNrjDzJroa74SMXakxWqk_ajFsKOipf1HiMfMpECA@mail.gmail.com>
	<CAKGHGPS55SeNkUE8LsHYezn34R-k0YYsxZCxYQUStsii8=T=Dg@mail.gmail.com>
	<CANg26EVtitWVC6L=NGvLQT6cZtzmQbpcM-QC5Rwk0rbCeXHmfA@mail.gmail.com>
Message-ID: <CAKGHGPRxMcLiKi1ag33S7jWsHu0ubzA1PqiaoXyr9bhSOPoGCQ@mail.gmail.com>

2011/10/3 mark florisson <markflorisson88 at gmail.com>:
> On 2 October 2011 19:44, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>> 2011/10/2 mark florisson <markflorisson88 at gmail.com>:
>>> On 2 October 2011 18:52, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>>>> 2011/9/30 mark florisson <markflorisson88 at gmail.com>:
>>>>> On 30 September 2011 07:47, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>>>>>> 2011/9/30 Vitja Makarov <vitja.makarov at gmail.com>:
>>>>>>> 2011/9/30 Robert Bradshaw <robertwb at math.washington.edu>:
>>>>>>>> On Thu, Sep 29, 2011 at 10:43 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>>>>>>>> Vitja Makarov, 30.09.2011 06:41:
>>>>>>>>>>
>>>>>>>>>> 2011/9/28 Vitja Makarov:
>>>>>>>>>>>
>>>>>>>>>>> I tried to build simple plan for ongoing cython function refactoring
>>>>>>>>>>>
>>>>>>>>>>> * Replace assignment synthesis with SingleAssignmentNode, where LHS is
>>>>>>>>>>> NameNode and RHS is PyCFunctionNode
>>>>>>>>>>> * Split function body into python wrapper and C function
>>>>>>>>>>> http://wiki.cython.org/enhancements/generators#Pythonfunctionrefactoring
>>>>>>>>>>>
>>>>>>>>>>> Then we can implement some features and optimizations:
>>>>>>>>>>>
>>>>>>>>>>> * Reduce difference between cdef and def functions
>>>>>>>>>>> * Store runtime evaluated default values inside CyFunction, ticket #674
>>>>>>>>>>> * Implement no-args super(), ticket #696
>>>>>>>>>>> * Function call inlining
>>>>>>>>>>
>>>>>>>>>> If nobody don't mind I would start with first one.
>>>>>>>>
>>>>>>>> I would love to see this happen.
>>>>>>>>
>>>>>>>>> Please go ahead. :)
>>>>>>>>>
>>>>>>>>> Note that you will encounter some problems when enabling name assignments
>>>>>>>>> for all named functions. I tried that at least once and it "didn't work",
>>>>>>>>> but I didn't take the time yet to investigate them further.
>>>>>>>>>
>>>>>>>>> I assume you are going to work on this in your own repo?
>>>>>>>>
>>>>>>>> Please also coordinate with Mark's work on function dispatching for
>>>>>>>> fused types.
>>>>>>>>
>>>>>>>
>>>>>>> I assume that that fused type functions are cdef ones so I think that
>>>>>>> should be easy to merge.
>>>>>>> On the other hand it's better to have Mark's branch merged into master.
>>>>>>>
>>>>>>> Mark, what is the state of your fused types branch?
>>>>>>> Is it possible to break it into smaller parts to ease reviewing and merging?
>>>>>>>
>>>>>>
>>>>>> It seems I meant memview branch not fusedtypes.
>>>>>
>>>>> There are 2 pending branches, _memview_rebase, which has support for
>>>>> memoryviews, and fusedtypes. The former is ready for merge, it's
>>>>> waiting to be reviewed. The fused types branch needs to subclass
>>>>> CyFunction (it basically modified the old binding function). There was
>>>>> also some duplicate functionality there, so I thought it'd be easier
>>>>> and more convenient to use the utility code loading there.
>>>>>
>>>>> Since it's not a strict dependency and it will be blocking progress, I
>>>>> will try to find some time to get it merge-ready for master.
>>>>>
>>>>> But no, it does cdef, cpdef and def methods, and it has some changes
>>>>> to all function nodes (FuncdefNode, CFuncdefNode and DefNode). These
>>>>> changes shouldn't be major though, but the logic in FusedFuncdefNode
>>>>> does differentiate between all the different functions in order to
>>>>> support them. Feel free to ask me about specifics any time.
>>>>>
>>>>
>>>> I've moved def node assignment synthesis into
>>>> DefNodeAssignmentSynthesis transformation.
>>>>
>>>> https://github.com/vitek/cython/commit/efacfed3c9cd8216b6c2100073a9df809b76675a
>>>>
>>>> Instead of moving defnode into PyCFunctionNode I've inserted
>>>> assignment statement right after defnode.
>>>> This is much more easy and seems ok to me.
>>>>
>>>> --
>>>> vitja.
>>>> _______________________________________________
>>>> cython-devel mailing list
>>>> cython-devel at python.org
>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>>
>>>
>>> Ah, I thought you were going to wait until fused types were merged. In
>>> any case, this doesn't look like it will give too many conflicts, but
>>> there will be a few.
>>
>> Yeah, I just had a free time and decided to try. I think fused types
>> should be merged first.
>
> If you want you can rebase your branch on
> https://github.com/markflorisson88/cython/tree/fusedmerge, I'm not
> going to rebase that branch. It needs a few more fixes though.
>

Ok. I'll try tomorrow.

-- 
vitja.

From markflorisson88 at gmail.com  Tue Oct  4 23:19:08 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 4 Oct 2011 22:19:08 +0100
Subject: [Cython] Utilities, cython.h, libcython
Message-ID: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>

Hey,

I briefly mentioned something about this in a pull request, but maybe
it deserves some actual discussion on the ML.

So I propose that after fused types gets merged we try to move as many
utility codes as possible to their utility code files (unless they are
used in pending pull requests or other branches). Preferably this will
be done in one or a few commits. How should we split up the work, any
volunteers? Perhaps people who wrote certain utilities also want to
move them? In that case, we should start a new branch and then merge
that into master when it's done.
We could actually move things before fused types get merged, as long
as we don't touch binding_cfunc_utility_code.

Before we go there, Stefan, do we still want to implement the header
.ini style which can list dependencies and such? I personally don't
care very much about it, but memoryviews and the utility loaders are
merged so if someone wants to take up that job, it'd be good to do
before moving the utilities.

Another issue is that Cython compile time is increasing with the
addition of control flow and cython utilities. If you use fused types
you're also going to combinatorially add more compile time. I'm sure
this came up earlier, but I really think we should have a libcython
and a cython.h. libcython (a shared library) should contain any common
Cython-specific code not meant to be inlined, and cython.h any types,
macros and inline functions etc. This will decrease Cython and C
compile time, and will also make executables smaller. This could be
enabled using a command line option to Cython, as well as with
distutils, eventually we may decide to make it the default (lets
figure that out later). Preferably libcython.so would be installed
alongside libpython.so and cython.h inside the Python include
directory. Assuming multiple versions of Cython and multiple Python
installations, we'd need to come up with a versioning scheme for
either.
We could also provide a static library there, for users who want to
link and ship a compiled and statically linked version of their code.
For a local Cython that isn't built, we can ignore the header and
shared library option and issue a warning or some such.

Lastly, I think we also should figure out a way to serialize Entry
objects from CythonUtilities, which could easily and swiftly be loaded
when creating the cython scope. It's quite a pain to declare all
entries for utilities you write manually, so what I mostly did was
parse the utility up to and including AnalyseDeclarationsTransform,
and then retrieve the entries from there.

Thoughts?

Mark

From robertwb at math.washington.edu  Wed Oct  5 02:46:18 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Tue, 4 Oct 2011 17:46:18 -0700
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
Message-ID: <CADiQ+QBaF22aWveZ75EcDcgekQPzihvkDPijEy15_-1cB7hKqA@mail.gmail.com>

On Tue, Oct 4, 2011 at 2:19 PM, mark florisson
<markflorisson88 at gmail.com> wrote:
> Hey,
>
> I briefly mentioned something about this in a pull request, but maybe
> it deserves some actual discussion on the ML.
>
> So I propose that after fused types gets merged we try to move as many
> utility codes as possible to their utility code files (unless they are
> used in pending pull requests or other branches). Preferably this will
> be done in one or a few commits. How should we split up the work, any
> volunteers? Perhaps people who wrote certain utilities also want to
> move them? In that case, we should start a new branch and then merge
> that into master when it's done.
> We could actually move things before fused types get merged, as long
> as we don't touch binding_cfunc_utility_code.

+1 to moving towards this, but I don't see the urgency or need to do
it all at once (though if there's going to be a big push, lets
coordinate on a wiki or trac).

> Before we go there, Stefan, do we still want to implement the header
> .ini style which can list dependencies and such? I personally don't
> care very much about it, but memoryviews and the utility loaders are
> merged so if someone wants to take up that job, it'd be good to do
> before moving the utilities.
>
> Another issue is that Cython compile time is increasing with the
> addition of control flow and cython utilities. If you use fused types
> you're also going to combinatorially add more compile time.

Yeah, this was especially obvious with, e.g. cython.compile(...). (In
particular, some utility code was being parsed before it could even
figure out whether it needed to do a full re-compile...)

> I'm sure
> this came up earlier, but I really think we should have a libcython
> and a cython.h. libcython (a shared library) should contain any common
> Cython-specific code not meant to be inlined, and cython.h any types,
> macros and inline functions etc. This will decrease Cython and C
> compile time, and will also make executables smaller.

+1. Yes, we talked about this earlier, but nothing concrete was
planned. It's probably worth a CEP, if anything to have a concrete
plan recorded somewhere other than a series of mailing list threads
(though discussion tends to work best here).

> This could be
> enabled using a command line option to Cython, as well as with
> distutils, eventually we may decide to make it the default (lets
> figure that out later). Preferably libcython.so would be installed
> alongside libpython.so and cython.h inside the Python include
> directory. Assuming multiple versions of Cython and multiple Python
> installations, we'd need to come up with a versioning scheme for
> either.

I would propose a cython.h file that sits in Cython/Compiler/Include
(or similar), as a first step. The .pyx -> .c pass could be configured
to copy this to a specific location (for shipping just the generated
.c files).

One option is to build the shared library as a companion
_cython_x_y_z.so module which, while not as efficient as linking at
the C level, would probably be much simpler to implement in a
cross-platform way. (This perhaps merits some benchmarks, but the main
contents is likely to be things like shared classes and objects.)
Actually linking .so files from modules that cimport each other would
be a nice feature down the road anyways. Again, the associated .c file
could be (optionally) generated/copied during the .pyx -> .c step.
Installation would determine if the required module exists, and if not
build and install it.

> We could also provide a static library there, for users who want to
> link and ship a compiled and statically linked version of their code.
> For a local Cython that isn't built, we can ignore the header and
> shared library option and issue a warning or some such.
>
> Lastly, I think we also should figure out a way to serialize Entry
> objects from CythonUtilities, which could easily and swiftly be loaded
> when creating the cython scope. It's quite a pain to declare all
> entries for utilities you write manually, so what I mostly did was
> parse the utility up to and including AnalyseDeclarationsTransform,
> and then retrieve the entries from there.

This would be really nice too. Way back in the day I did some work
with trying to pickle full module scopes, but that soon became too
painful as there are so many far-reaching references. Pickling
individual Entries and re-building modules will probably be a more
tractable goal. Eventually, I'd like to see a way to cache the full
pxd pipeline.

- Robert

From stefan_ml at behnel.de  Wed Oct  5 09:16:24 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 05 Oct 2011 09:16:24 +0200
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
Message-ID: <4E8C0448.6010204@behnel.de>

mark florisson, 04.10.2011 23:19:
> So I propose that after fused types gets merged we try to move as many
> utility codes as possible to their utility code files (unless they are
> used in pending pull requests or other branches). Preferably this will
> be done in one or a few commits. How should we split up the work

I would propose that new utility code gets moved out into utility files 
right away (if doable, given the current state of the infrastructure), and 
that existing utility code gets moves when it gets modified or when someone 
feels like it. Until we really get to the point of wanting to create a 
separate shared library etc., there's no need to hurry with the move.


> We could actually move things before fused types get merged, as long
> as we don't touch binding_cfunc_utility_code.

Another reason not to hurry, right?


> Before we go there, Stefan, do we still want to implement the header
> .ini style which can list dependencies and such?

I think we'll eventually need that, but that also depends a bit on the 
question whether we want to (or can) build a shared library or not. See below.


> Another issue is that Cython compile time is increasing with the
> addition of control flow and cython utilities. If you use fused types
> you're also going to combinatorially add more compile time.

I don't see that locally - a compiled Cython is hugely fast for me. In 
comparison, the C compiler literally takes ages to compile the result. An 
external shared library may or may not help with both - in particular, it 
is not clear to me what makes the C compiler slow. If the compile time is 
dominated by the number of inlined functions (which is not unlikely), a 
shared library + header file will not make a difference.


> I'm sure
> this came up earlier, but I really think we should have a libcython
> and a cython.h. libcython (a shared library) should contain any common
> Cython-specific code not meant to be inlined, and cython.h any types,
> macros and inline functions etc.

This has a couple of implications though. In order to support this on the 
user side, we have to build one shared library per installed package in 
order to avoid any Cython versioning issues. Just installing a versioned 
"libcython_x.y.z.so" globally isn't enough, especially during development, 
but also at deployment time. Different packages may use different CFLAGS or 
Cython options, which may have an impact on the result. Encoding all 
possible factors in the file name will be cumbersome and may mean that we 
still end up with a number of installed Cython libraries that correlates 
with the number of installed Cython based packages.

Next, we may not know at build time which set of Cython modules is in the 
package. This may be less of an issue if we rely on "cythonize()" in 
setup.py to compile all modules before hand (assuming that the user doesn't 
call it twice, once for *.pyx, once for *.py, for example), but even if we 
know all modules, we'd still have to figure out the complete set of utility 
code used by all modules in order to build an adapted library with only the 
necessary code used in the package. So we'd always end up with a complete 
library with all utility code, which is only really interesting for larger 
packages with several Cython modules.

I agree with Robert that a CEP would be needed for this, both for clearing 
up the implications and actual use cases (I know that Sage is a reasonable 
use case, but it's also a rather special case).


> This will decrease Cython and C
> compile time, and will also make executables smaller.

I don't see how this actually impacts executables. However, a 
self-contained executable is a value in itself.


> This could be
> enabled using a command line option to Cython, as well as with
> distutils, eventually we may decide to make it the default (lets
> figure that out later). Preferably libcython.so would be installed
> alongside libpython.so and cython.h inside the Python include
> directory.

I don't see this happening. It's easy for Python (there is only one Python 
running at a time, with one libpython loaded), but it's a lot less safe for 
different versions of a Cython library that are used by different modules 
inside of the running Python. For example, we'd have to version all visible 
symbols in operating systems with flat namespaces, in order to support 
loading multiple versions of the library.


> Lastly, I think we also should figure out a way to serialize Entry
> objects from CythonUtilities, which could easily and swiftly be loaded
> when creating the cython scope. It's quite a pain to declare all
> entries for utilities you write manually

Why would you declare them manually? I thought everything would be moved 
out into the utility code files?


> so what I mostly did was
> parse the utility up to and including AnalyseDeclarationsTransform,
> and then retrieve the entries from there.

Sounds like a drawback regarding the processing time, but may still be a 
reasonable way to do it. I would expect that it won't be hard to pickle the 
resulting dict of entries into a cache file and rebuild it only when one of 
the utility files changes.

Stefan

From robertwb at math.washington.edu  Wed Oct  5 09:38:26 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Wed, 5 Oct 2011 00:38:26 -0700
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <4E8C0448.6010204@behnel.de>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
Message-ID: <CADiQ+QDpd2JXrV980oJEJmfNiP_CGtoYVOJ5a3iRSd+u1Nj9Fw@mail.gmail.com>

On Wed, Oct 5, 2011 at 12:16 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 04.10.2011 23:19:
>>
>> So I propose that after fused types gets merged we try to move as many
>> utility codes as possible to their utility code files (unless they are
>> used in pending pull requests or other branches). Preferably this will
>> be done in one or a few commits. How should we split up the work
>
> I would propose that new utility code gets moved out into utility files
> right away (if doable, given the current state of the infrastructure), and
> that existing utility code gets moves when it gets modified or when someone
> feels like it. Until we really get to the point of wanting to create a
> separate shared library etc., there's no need to hurry with the move.
>
>
>> We could actually move things before fused types get merged, as long
>> as we don't touch binding_cfunc_utility_code.
>
> Another reason not to hurry, right?
>
>
>> Before we go there, Stefan, do we still want to implement the header
>> .ini style which can list dependencies and such?
>
> I think we'll eventually need that, but that also depends a bit on the
> question whether we want to (or can) build a shared library or not. See
> below.
>
>
>> Another issue is that Cython compile time is increasing with the
>> addition of control flow and cython utilities. If you use fused types
>> you're also going to combinatorially add more compile time.
>
> I don't see that locally - a compiled Cython is hugely fast for me. In
> comparison, the C compiler literally takes ages to compile the result. An
> external shared library may or may not help with both - in particular, it is
> not clear to me what makes the C compiler slow. If the compile time is
> dominated by the number of inlined functions (which is not unlikely), a
> shared library + header file will not make a difference.
>
>
>> I'm sure
>> this came up earlier, but I really think we should have a libcython
>> and a cython.h. libcython (a shared library) should contain any common
>> Cython-specific code not meant to be inlined, and cython.h any types,
>> macros and inline functions etc.
>
> This has a couple of implications though. In order to support this on the
> user side, we have to build one shared library per installed package in
> order to avoid any Cython versioning issues. Just installing a versioned
> "libcython_x.y.z.so" globally isn't enough, especially during development,
> but also at deployment time. Different packages may use different CFLAGS or
> Cython options, which may have an impact on the result. Encoding all
> possible factors in the file name will be cumbersome and may mean that we
> still end up with a number of installed Cython libraries that correlates
> with the number of installed Cython based packages.

That's a good point. Perhaps an easier first target is to have one
"libcython" per package (with a randomized or project-specific name).
Longer-term, I think the goal of one libcython per version is a
reasonable one, for deployment at least. Exceptional packages (e.g.
that require a special set of CFLAGS rather than the ones Python was
built with) can either bundle their own or forgo any sharing of code
as it is done now, and features that can't be easily normalized across
(cython and c) compilation options would remain in project-specific
generated .c files.

> Next, we may not know at build time which set of Cython modules is in the
> package. This may be less of an issue if we rely on "cythonize()" in
> setup.py to compile all modules before hand (assuming that the user doesn't
> call it twice, once for *.pyx, once for *.py, for example), but even if we
> know all modules, we'd still have to figure out the complete set of utility
> code used by all modules in order to build an adapted library with only the
> necessary code used in the package. So we'd always end up with a complete
> library with all utility code, which is only really interesting for larger
> packages with several Cython modules.

Yes, I'm thinking we would create relatively complete libraries,
though if we did things on a per package level perhaps we could do
some pruning. We could still conditionally put some of the utility
code (especially the rarely used or shared stuff) into each module.

> I agree with Robert that a CEP would be needed for this, both for clearing
> up the implications and actual use cases (I know that Sage is a reasonable
> use case, but it's also a rather special case).
>
>
>> This will decrease Cython and C
>> compile time, and will also make executables smaller.
>
> I don't see how this actually impacts executables. However, a self-contained
> executable is a value in itself.

As an example, we're starting to have full utility types, e.g. for
generators and or CyFunction. Lots of the utility code (e.g. loading
modules, raising exceptions, etc.) could be shared as well. For
something like Sage that could be a significant savings, and it could
be a big boon for cython.inline as well.

>> This could be
>> enabled using a command line option to Cython, as well as with
>> distutils, eventually we may decide to make it the default (lets
>> figure that out later). Preferably libcython.so would be installed
>> alongside libpython.so and cython.h inside the Python include
>> directory.
>
> I don't see this happening. It's easy for Python (there is only one Python
> running at a time, with one libpython loaded), but it's a lot less safe for
> different versions of a Cython library that are used by different modules
> inside of the running Python. For example, we'd have to version all visible
> symbols in operating systems with flat namespaces, in order to support
> loading multiple versions of the library.

Which is another advantage to "linking" via the cimport mechanisms.

>> Lastly, I think we also should figure out a way to serialize Entry
>> objects from CythonUtilities, which could easily and swiftly be loaded
>> when creating the cython scope. It's quite a pain to declare all
>> entries for utilities you write manually
>
> Why would you declare them manually? I thought everything would be moved out
> into the utility code files?
>
>
>> so what I mostly did was
>> parse the utility up to and including AnalyseDeclarationsTransform,
>> and then retrieve the entries from there.
>
> Sounds like a drawback regarding the processing time, but may still be a
> reasonable way to do it. I would expect that it won't be hard to pickle the
> resulting dict of entries into a cache file and rebuild it only when one of
> the utility files changes.

+1

It'd be great to be able to do this for the many .pxd files in Sage as
well. Parsing .pxd files is a huge portion of the compilation of the
Sage library.

- Robert

From robertwb at math.washington.edu  Wed Oct  5 09:45:25 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Wed, 5 Oct 2011 00:45:25 -0700
Subject: [Cython] [cython-users] Re: callback function pointer problem
In-Reply-To: <588dc249-8f0b-49f2-bf42-23978ea95ddf@email.android.com>
References: <4E835336.1060800@gmail.com> <4E8398B5.6050905@gmail.com>
	<CAKGHGPTUn=g1VxpfMXaETgFXEu=sjbz=2gZRz7W-o4ggQ+-2bw@mail.gmail.com>
	<4E8421D0.5010007@gmail.com> <4E844BD0.4040207@gmail.com>
	<4E845B24.6060102@astro.uio.no> <4E845BAE.307@astro.uio.no>
	<CANg26EWDmkbunP+4ezuN9ggk1MPe-6b6oPu7tgRJ1G0y1wTuXQ@mail.gmail.com>
	<4E8460E8.5050701@gmail.com>
	<CANg26EWL1uZfPAe1HcCaGDqTx9VkLwYhLhGGicx-ytO6K00oqQ@mail.gmail.com>
	<CADiQ+QAT0ubtbdLL+mnOWAkGziL_dNusu2dZQXddHkUis4-v7g@mail.gmail.com>
	<CANg26EUquDZQqiEf0ppFoiOJHj2z7asnp3T8eCa=ynkAu6RDPA@mail.gmail.com>
	<588dc249-8f0b-49f2-bf42-23978ea95ddf@email.android.com>
Message-ID: <CADiQ+QAKqCpFZF8sf+6JHaDi_YE+V+tYGR4GGHL8dBVb7=XS4w@mail.gmail.com>

On Fri, Sep 30, 2011 at 2:14 PM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> Are you saying that when coercing a struct to an object, one would copy
> scalar fields by value but reference array fields? -1, that would be
> confusing. Either the whole struct through a view, or copy it all.

+1

> It bothers me that structs are passed by value in Cython, but it seems
> impossible to change that now. (i.e, once upon a time one could have
> required the use of a copy method to do a struct assignment and give a
> syntax error otherwise, which would have worked nicer with Python
> semantics).

Of course, to do otherwise would have resulted in "pure C" code
behaving very differently from C and messy issues like "cdef int
f(struct_type a)" either meaning different things in an extern block
or not mapping to the "obvious" C signature.

On this note, eventually I would like coerce structs (and unions,
enums) to auto-generated wrapper classes, visible in the Python module
namespace if one declares them as "cpdef struct ..." (even if they're
extern).

- Robert

From markflorisson88 at gmail.com  Wed Oct  5 15:52:19 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 5 Oct 2011 14:52:19 +0100
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CADiQ+QBaF22aWveZ75EcDcgekQPzihvkDPijEy15_-1cB7hKqA@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<CADiQ+QBaF22aWveZ75EcDcgekQPzihvkDPijEy15_-1cB7hKqA@mail.gmail.com>
Message-ID: <CANg26EU1gWG+Guj8hMRr=mGS62A6FdiPz2EVYjNPAhKruvd6-A@mail.gmail.com>

On 5 October 2011 01:46, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> On Tue, Oct 4, 2011 at 2:19 PM, mark florisson
> <markflorisson88 at gmail.com> wrote:
>> Hey,
>>
>> I briefly mentioned something about this in a pull request, but maybe
>> it deserves some actual discussion on the ML.
>>
>> So I propose that after fused types gets merged we try to move as many
>> utility codes as possible to their utility code files (unless they are
>> used in pending pull requests or other branches). Preferably this will
>> be done in one or a few commits. How should we split up the work, any
>> volunteers? Perhaps people who wrote certain utilities also want to
>> move them? In that case, we should start a new branch and then merge
>> that into master when it's done.
>> We could actually move things before fused types get merged, as long
>> as we don't touch binding_cfunc_utility_code.
>
> +1 to moving towards this, but I don't see the urgency or need to do
> it all at once (though if there's going to be a big push, lets
> coordinate on a wiki or trac).

Hm, perhaps there is no strict need to hurry, as long as we take care
not to modify utilities after they have been moved. The wiki could be
great for that, but I personally don't keep track of everyone's
branches, so I don't know which utility is modified by whom (if at
all), so strictly speaking (to avoid painful merges) I'd have to ask
everyone each time I wanted to move something, or dig through
everyone's branches.

>> Before we go there, Stefan, do we still want to implement the header
>> .ini style which can list dependencies and such? I personally don't
>> care very much about it, but memoryviews and the utility loaders are
>> merged so if someone wants to take up that job, it'd be good to do
>> before moving the utilities.
>>
>> Another issue is that Cython compile time is increasing with the
>> addition of control flow and cython utilities. If you use fused types
>> you're also going to combinatorially add more compile time.
>
> Yeah, this was especially obvious with, e.g. cython.compile(...). (In
> particular, some utility code was being parsed before it could even
> figure out whether it needed to do a full re-compile...)
>
>> I'm sure
>> this came up earlier, but I really think we should have a libcython
>> and a cython.h. libcython (a shared library) should contain any common
>> Cython-specific code not meant to be inlined, and cython.h any types,
>> macros and inline functions etc. This will decrease Cython and C
>> compile time, and will also make executables smaller.
>
> +1. Yes, we talked about this earlier, but nothing concrete was
> planned. It's probably worth a CEP, if anything to have a concrete
> plan recorded somewhere other than a series of mailing list threads
> (though discussion tends to work best here).
>
>> This could be
>> enabled using a command line option to Cython, as well as with
>> distutils, eventually we may decide to make it the default (lets
>> figure that out later). Preferably libcython.so would be installed
>> alongside libpython.so and cython.h inside the Python include
>> directory. Assuming multiple versions of Cython and multiple Python
>> installations, we'd need to come up with a versioning scheme for
>> either.
>
> I would propose a cython.h file that sits in Cython/Compiler/Include
> (or similar), as a first step. The .pyx -> .c pass could be configured
> to copy this to a specific location (for shipping just the generated
> .c files).

That would be fine as well. It might be convenient for users in that
case if we could provide a cython.get_include() in addition to the
distutils hooks, and a cython-config script.

> One option is to build the shared library as a companion
> _cython_x_y_z.so module which, while not as efficient as linking at
> the C level, would probably be much simpler to implement in a
> cross-platform way. (This perhaps merits some benchmarks, but the main
> contents is likely to be things like shared classes and objects.)
> Actually linking .so files from modules that cimport each other would
> be a nice feature down the road anyways. Again, the associated .c file
> could be (optionally) generated/copied during the .pyx -> .c step.
> Installation would determine if the required module exists, and if not
> build and install it.

Hm, that's a really good idea. I think the only overhead would be the
capsule unpacking and pointer duplication, but that shouldn't suddenly
be an issue. That means we don't have to do any versioning of the
libraries and the symbols to avoid clashes in a flat namespaces as
Stefan mentioned.

>> We could also provide a static library there, for users who want to
>> link and ship a compiled and statically linked version of their code.
>> For a local Cython that isn't built, we can ignore the header and
>> shared library option and issue a warning or some such.
>>
>> Lastly, I think we also should figure out a way to serialize Entry
>> objects from CythonUtilities, which could easily and swiftly be loaded
>> when creating the cython scope. It's quite a pain to declare all
>> entries for utilities you write manually, so what I mostly did was
>> parse the utility up to and including AnalyseDeclarationsTransform,
>> and then retrieve the entries from there.
>
> This would be really nice too. Way back in the day I did some work
> with trying to pickle full module scopes, but that soon became too
> painful as there are so many far-reaching references. Pickling
> individual Entries and re-building modules will probably be a more
> tractable goal. Eventually, I'd like to see a way to cache the full
> pxd pipeline.
>
> - Robert
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Wed Oct  5 15:53:24 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 5 Oct 2011 14:53:24 +0100
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <4E8C0448.6010204@behnel.de>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
Message-ID: <CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>

On 5 October 2011 08:16, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 04.10.2011 23:19:
>>
>> So I propose that after fused types gets merged we try to move as many
>> utility codes as possible to their utility code files (unless they are
>> used in pending pull requests or other branches). Preferably this will
>> be done in one or a few commits. How should we split up the work
>
> I would propose that new utility code gets moved out into utility files
> right away (if doable, given the current state of the infrastructure), and
> that existing utility code gets moves when it gets modified or when someone
> feels like it. Until we really get to the point of wanting to create a
> separate shared library etc., there's no need to hurry with the move.
>
>
>> We could actually move things before fused types get merged, as long
>> as we don't touch binding_cfunc_utility_code.
>
> Another reason not to hurry, right?
>
>
>> Before we go there, Stefan, do we still want to implement the header
>> .ini style which can list dependencies and such?
>
> I think we'll eventually need that, but that also depends a bit on the
> question whether we want to (or can) build a shared library or not. See
> below.
>
>
>> Another issue is that Cython compile time is increasing with the
>> addition of control flow and cython utilities. If you use fused types
>> you're also going to combinatorially add more compile time.
>
> I don't see that locally - a compiled Cython is hugely fast for me. In
> comparison, the C compiler literally takes ages to compile the result. An
> external shared library may or may not help with both - in particular, it is
> not clear to me what makes the C compiler slow. If the compile time is
> dominated by the number of inlined functions (which is not unlikely), a
> shared library + header file will not make a difference.
>

Have you tried with the memoryviews merged? e.g. if I have this code:

from libc.stdlib cimport malloc
cdef int[:] slice = <int[:10]> <int *> malloc(sizeof(int) * 10)

[0] [14:45] ~  ? time cython test.pyx
cython test.pyx  2.61s user 0.08s system 99% cpu 2.695 total
[0] [14:45] ~  ? time zsh compile
zsh compile  1.88s user 0.06s system 99% cpu 1.946 total

where 'compile' is the script that invoked the same gcc command
distutils uses. As you can see it took more than 2.5 seconds to
compile this code (simply because the memoryview utilities get
included). The C compiler does it quite a lot faster here. This
obviously depends largely on your code, you get probably have it the
other way around as well.

>> I'm sure
>> this came up earlier, but I really think we should have a libcython
>> and a cython.h. libcython (a shared library) should contain any common
>> Cython-specific code not meant to be inlined, and cython.h any types,
>> macros and inline functions etc.
>
> This has a couple of implications though. In order to support this on the
> user side, we have to build one shared library per installed package in
> order to avoid any Cython versioning issues. Just installing a versioned
> "libcython_x.y.z.so" globally isn't enough, especially during development,
> but also at deployment time. Different packages may use different CFLAGS or
> Cython options, which may have an impact on the result. Encoding all
> possible factors in the file name will be cumbersome and may mean that we
> still end up with a number of installed Cython libraries that correlates
> with the number of installed Cython based packages.

Hm, I think the CFLAGS are important so long as they are compatible
with Python. When the user compiles a Cython extension module with
extra CFLAGS, this doesn't affect libpython. Similarly, the Cython
utilities are really not the user's responsibility, so libcython
doesn't need to be compiled with the same flags as the extension
module. If still wanted, the user could either recompile python with
different CFLAGS (which means libcython will get those as well), or
not use libcython at all. CFLAGS should really only pertain to user
code, not to the Cython library, which the user shouldn't be concerned
about.

> Next, we may not know at build time which set of Cython modules is in the
> package. This may be less of an issue if we rely on "cythonize()" in
> setup.py to compile all modules before hand (assuming that the user doesn't
> call it twice, once for *.pyx, once for *.py, for example), but even if we
> know all modules, we'd still have to figure out the complete set of utility
> code used by all modules in order to build an adapted library with only the
> necessary code used in the package. So we'd always end up with a complete
> library with all utility code, which is only really interesting for larger
> packages with several Cython modules.
> I agree with Robert that a CEP would be needed for this, both for clearing
> up the implications and actual use cases (I know that Sage is a reasonable
> use case, but it's also a rather special case).
>
>
>> This will decrease Cython and C
>> compile time, and will also make executables smaller.
>
> I don't see how this actually impacts executables. However, a self-contained
> executable is a value in itself.
>
>
>> This could be
>> enabled using a command line option to Cython, as well as with
>> distutils, eventually we may decide to make it the default (lets
>> figure that out later). Preferably libcython.so would be installed
>> alongside libpython.so and cython.h inside the Python include
>> directory.
>
> I don't see this happening. It's easy for Python (there is only one Python
> running at a time, with one libpython loaded), but it's a lot less safe for
> different versions of a Cython library that are used by different modules
> inside of the running Python. For example, we'd have to version all visible
> symbols in operating systems with flat namespaces, in order to support
> loading multiple versions of the library.
>
>
>> Lastly, I think we also should figure out a way to serialize Entry
>> objects from CythonUtilities, which could easily and swiftly be loaded
>> when creating the cython scope. It's quite a pain to declare all
>> entries for utilities you write manually
>
> Why would you declare them manually? I thought everything would be moved out
> into the utility code files?
>

Right, the code is in the utility files. However, the cython scope
needs to have the entries of the classes and functions of the
utilities. e.g. the user may write

cimport cython

cdef cython.array myobject

For this to work, we need an 'array' entry, which we don't have yet,
as the utility code will be parsed at code generation time if an entry
of that utility code (which doesn't exist yet!) is used.

>> so what I mostly did was
>> parse the utility up to and including AnalyseDeclarationsTransform,
>> and then retrieve the entries from there.
>
> Sounds like a drawback regarding the processing time, but may still be a
> reasonable way to do it. I would expect that it won't be hard to pickle the
> resulting dict of entries into a cache file and rebuild it only when one of
> the utility files changes.

Exactly. I'm not sure about pickle though, but the details don't
matter. Pickle is certainly easy as long as you don't change your
interface (which we most certainly will, though).

> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Wed Oct  5 15:54:02 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 5 Oct 2011 14:54:02 +0100
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CADiQ+QDpd2JXrV980oJEJmfNiP_CGtoYVOJ5a3iRSd+u1Nj9Fw@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CADiQ+QDpd2JXrV980oJEJmfNiP_CGtoYVOJ5a3iRSd+u1Nj9Fw@mail.gmail.com>
Message-ID: <CANg26EUeyjv5tKeVBxwYi3seCfpunnCQPimdV_TgP5PHY_RWFA@mail.gmail.com>

On 5 October 2011 08:38, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> On Wed, Oct 5, 2011 at 12:16 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> mark florisson, 04.10.2011 23:19:
>>>
>>> So I propose that after fused types gets merged we try to move as many
>>> utility codes as possible to their utility code files (unless they are
>>> used in pending pull requests or other branches). Preferably this will
>>> be done in one or a few commits. How should we split up the work
>>
>> I would propose that new utility code gets moved out into utility files
>> right away (if doable, given the current state of the infrastructure), and
>> that existing utility code gets moves when it gets modified or when someone
>> feels like it. Until we really get to the point of wanting to create a
>> separate shared library etc., there's no need to hurry with the move.
>>
>>
>>> We could actually move things before fused types get merged, as long
>>> as we don't touch binding_cfunc_utility_code.
>>
>> Another reason not to hurry, right?
>>
>>
>>> Before we go there, Stefan, do we still want to implement the header
>>> .ini style which can list dependencies and such?
>>
>> I think we'll eventually need that, but that also depends a bit on the
>> question whether we want to (or can) build a shared library or not. See
>> below.
>>
>>
>>> Another issue is that Cython compile time is increasing with the
>>> addition of control flow and cython utilities. If you use fused types
>>> you're also going to combinatorially add more compile time.
>>
>> I don't see that locally - a compiled Cython is hugely fast for me. In
>> comparison, the C compiler literally takes ages to compile the result. An
>> external shared library may or may not help with both - in particular, it is
>> not clear to me what makes the C compiler slow. If the compile time is
>> dominated by the number of inlined functions (which is not unlikely), a
>> shared library + header file will not make a difference.
>>
>>
>>> I'm sure
>>> this came up earlier, but I really think we should have a libcython
>>> and a cython.h. libcython (a shared library) should contain any common
>>> Cython-specific code not meant to be inlined, and cython.h any types,
>>> macros and inline functions etc.
>>
>> This has a couple of implications though. In order to support this on the
>> user side, we have to build one shared library per installed package in
>> order to avoid any Cython versioning issues. Just installing a versioned
>> "libcython_x.y.z.so" globally isn't enough, especially during development,
>> but also at deployment time. Different packages may use different CFLAGS or
>> Cython options, which may have an impact on the result. Encoding all
>> possible factors in the file name will be cumbersome and may mean that we
>> still end up with a number of installed Cython libraries that correlates
>> with the number of installed Cython based packages.
>
> That's a good point. Perhaps an easier first target is to have one
> "libcython" per package (with a randomized or project-specific name).
> Longer-term, I think the goal of one libcython per version is a
> reasonable one, for deployment at least. Exceptional packages (e.g.
> that require a special set of CFLAGS rather than the ones Python was
> built with) can either bundle their own or forgo any sharing of code
> as it is done now, and features that can't be easily normalized across
> (cython and c) compilation options would remain in project-specific
> generated .c files.
>
>> Next, we may not know at build time which set of Cython modules is in the
>> package. This may be less of an issue if we rely on "cythonize()" in
>> setup.py to compile all modules before hand (assuming that the user doesn't
>> call it twice, once for *.pyx, once for *.py, for example), but even if we
>> know all modules, we'd still have to figure out the complete set of utility
>> code used by all modules in order to build an adapted library with only the
>> necessary code used in the package. So we'd always end up with a complete
>> library with all utility code, which is only really interesting for larger
>> packages with several Cython modules.
>
> Yes, I'm thinking we would create relatively complete libraries,
> though if we did things on a per package level perhaps we could do
> some pruning. We could still conditionally put some of the utility
> code (especially the rarely used or shared stuff) into each module.

Yeah that would be nice. I actually think we shouldn't do anything on
a per-package level, only a bunch of modules with related stuff
(conversion utilities/exception raising etc in one module,
buffer/memoryview utilities in another etc). We've been living with
huge files since now, I don't think we suddenly need to actively start
pruning for a little bit of memory.

I think the module approach would also be easy to implement, as the
infrastructure for external cdef functions/classes importing/exporting
is already there.

>> I agree with Robert that a CEP would be needed for this, both for clearing
>> up the implications and actual use cases (I know that Sage is a reasonable
>> use case, but it's also a rather special case).
>>
>>
>>> This will decrease Cython and C
>>> compile time, and will also make executables smaller.
>>
>> I don't see how this actually impacts executables. However, a self-contained
>> executable is a value in itself.
>
> As an example, we're starting to have full utility types, e.g. for
> generators and or CyFunction. Lots of the utility code (e.g. loading
> modules, raising exceptions, etc.) could be shared as well. For
> something like Sage that could be a significant savings, and it could
> be a big boon for cython.inline as well.
>
>>> This could be
>>> enabled using a command line option to Cython, as well as with
>>> distutils, eventually we may decide to make it the default (lets
>>> figure that out later). Preferably libcython.so would be installed
>>> alongside libpython.so and cython.h inside the Python include
>>> directory.
>>
>> I don't see this happening. It's easy for Python (there is only one Python
>> running at a time, with one libpython loaded), but it's a lot less safe for
>> different versions of a Cython library that are used by different modules
>> inside of the running Python. For example, we'd have to version all visible
>> symbols in operating systems with flat namespaces, in order to support
>> loading multiple versions of the library.
>
> Which is another advantage to "linking" via the cimport mechanisms.
>
>>> Lastly, I think we also should figure out a way to serialize Entry
>>> objects from CythonUtilities, which could easily and swiftly be loaded
>>> when creating the cython scope. It's quite a pain to declare all
>>> entries for utilities you write manually
>>
>> Why would you declare them manually? I thought everything would be moved out
>> into the utility code files?
>>
>>
>>> so what I mostly did was
>>> parse the utility up to and including AnalyseDeclarationsTransform,
>>> and then retrieve the entries from there.
>>
>> Sounds like a drawback regarding the processing time, but may still be a
>> reasonable way to do it. I would expect that it won't be hard to pickle the
>> resulting dict of entries into a cache file and rebuild it only when one of
>> the utility files changes.
>
> +1
>
> It'd be great to be able to do this for the many .pxd files in Sage as
> well. Parsing .pxd files is a huge portion of the compilation of the
> Sage library.
>
> - Robert
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Wed Oct  5 16:18:11 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 5 Oct 2011 15:18:11 +0100
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CANg26EUeyjv5tKeVBxwYi3seCfpunnCQPimdV_TgP5PHY_RWFA@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CADiQ+QDpd2JXrV980oJEJmfNiP_CGtoYVOJ5a3iRSd+u1Nj9Fw@mail.gmail.com>
	<CANg26EUeyjv5tKeVBxwYi3seCfpunnCQPimdV_TgP5PHY_RWFA@mail.gmail.com>
Message-ID: <CANg26EWmx9zm4Px1-cx6mpp+-qbd_5BPnhBGfrXXc+fJ6v1x4A@mail.gmail.com>

On 5 October 2011 14:54, mark florisson <markflorisson88 at gmail.com> wrote:
> On 5 October 2011 08:38, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>> On Wed, Oct 5, 2011 at 12:16 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>> mark florisson, 04.10.2011 23:19:
>>>>
>>>> So I propose that after fused types gets merged we try to move as many
>>>> utility codes as possible to their utility code files (unless they are
>>>> used in pending pull requests or other branches). Preferably this will
>>>> be done in one or a few commits. How should we split up the work
>>>
>>> I would propose that new utility code gets moved out into utility files
>>> right away (if doable, given the current state of the infrastructure), and
>>> that existing utility code gets moves when it gets modified or when someone
>>> feels like it. Until we really get to the point of wanting to create a
>>> separate shared library etc., there's no need to hurry with the move.
>>>
>>>
>>>> We could actually move things before fused types get merged, as long
>>>> as we don't touch binding_cfunc_utility_code.
>>>
>>> Another reason not to hurry, right?
>>>
>>>
>>>> Before we go there, Stefan, do we still want to implement the header
>>>> .ini style which can list dependencies and such?
>>>
>>> I think we'll eventually need that, but that also depends a bit on the
>>> question whether we want to (or can) build a shared library or not. See
>>> below.
>>>
>>>
>>>> Another issue is that Cython compile time is increasing with the
>>>> addition of control flow and cython utilities. If you use fused types
>>>> you're also going to combinatorially add more compile time.
>>>
>>> I don't see that locally - a compiled Cython is hugely fast for me. In
>>> comparison, the C compiler literally takes ages to compile the result. An
>>> external shared library may or may not help with both - in particular, it is
>>> not clear to me what makes the C compiler slow. If the compile time is
>>> dominated by the number of inlined functions (which is not unlikely), a
>>> shared library + header file will not make a difference.
>>>
>>>
>>>> I'm sure
>>>> this came up earlier, but I really think we should have a libcython
>>>> and a cython.h. libcython (a shared library) should contain any common
>>>> Cython-specific code not meant to be inlined, and cython.h any types,
>>>> macros and inline functions etc.
>>>
>>> This has a couple of implications though. In order to support this on the
>>> user side, we have to build one shared library per installed package in
>>> order to avoid any Cython versioning issues. Just installing a versioned
>>> "libcython_x.y.z.so" globally isn't enough, especially during development,
>>> but also at deployment time. Different packages may use different CFLAGS or
>>> Cython options, which may have an impact on the result. Encoding all
>>> possible factors in the file name will be cumbersome and may mean that we
>>> still end up with a number of installed Cython libraries that correlates
>>> with the number of installed Cython based packages.
>>
>> That's a good point. Perhaps an easier first target is to have one
>> "libcython" per package (with a randomized or project-specific name).
>> Longer-term, I think the goal of one libcython per version is a
>> reasonable one, for deployment at least. Exceptional packages (e.g.
>> that require a special set of CFLAGS rather than the ones Python was
>> built with) can either bundle their own or forgo any sharing of code
>> as it is done now, and features that can't be easily normalized across
>> (cython and c) compilation options would remain in project-specific
>> generated .c files.
>>
>>> Next, we may not know at build time which set of Cython modules is in the
>>> package. This may be less of an issue if we rely on "cythonize()" in
>>> setup.py to compile all modules before hand (assuming that the user doesn't
>>> call it twice, once for *.pyx, once for *.py, for example), but even if we
>>> know all modules, we'd still have to figure out the complete set of utility
>>> code used by all modules in order to build an adapted library with only the
>>> necessary code used in the package. So we'd always end up with a complete
>>> library with all utility code, which is only really interesting for larger
>>> packages with several Cython modules.
>>
>> Yes, I'm thinking we would create relatively complete libraries,
>> though if we did things on a per package level perhaps we could do
>> some pruning. We could still conditionally put some of the utility
>> code (especially the rarely used or shared stuff) into each module.
>
> Yeah that would be nice. I actually think we shouldn't do anything on
> a per-package level, only a bunch of modules with related stuff
> (conversion utilities/exception raising etc in one module,
> buffer/memoryview utilities in another etc). We've been living with
> huge files since now, I don't think we suddenly need to actively start
> pruning for a little bit of memory.
>
> I think the module approach would also be easy to implement, as the
> infrastructure for external cdef functions/classes importing/exporting
> is already there.
>
>>> I agree with Robert that a CEP would be needed for this, both for clearing
>>> up the implications and actual use cases (I know that Sage is a reasonable
>>> use case, but it's also a rather special case).
>>>
>>>
>>>> This will decrease Cython and C
>>>> compile time, and will also make executables smaller.
>>>
>>> I don't see how this actually impacts executables. However, a self-contained
>>> executable is a value in itself.
>>
>> As an example, we're starting to have full utility types, e.g. for
>> generators and or CyFunction. Lots of the utility code (e.g. loading
>> modules, raising exceptions, etc.) could be shared as well. For
>> something like Sage that could be a significant savings, and it could
>> be a big boon for cython.inline as well.
>>
>>>> This could be
>>>> enabled using a command line option to Cython, as well as with
>>>> distutils, eventually we may decide to make it the default (lets
>>>> figure that out later). Preferably libcython.so would be installed
>>>> alongside libpython.so and cython.h inside the Python include
>>>> directory.
>>>
>>> I don't see this happening. It's easy for Python (there is only one Python
>>> running at a time, with one libpython loaded), but it's a lot less safe for
>>> different versions of a Cython library that are used by different modules
>>> inside of the running Python. For example, we'd have to version all visible
>>> symbols in operating systems with flat namespaces, in order to support
>>> loading multiple versions of the library.
>>
>> Which is another advantage to "linking" via the cimport mechanisms.
>>
>>>> Lastly, I think we also should figure out a way to serialize Entry
>>>> objects from CythonUtilities, which could easily and swiftly be loaded
>>>> when creating the cython scope. It's quite a pain to declare all
>>>> entries for utilities you write manually
>>>
>>> Why would you declare them manually? I thought everything would be moved out
>>> into the utility code files?
>>>
>>>
>>>> so what I mostly did was
>>>> parse the utility up to and including AnalyseDeclarationsTransform,
>>>> and then retrieve the entries from there.
>>>
>>> Sounds like a drawback regarding the processing time, but may still be a
>>> reasonable way to do it. I would expect that it won't be hard to pickle the
>>> resulting dict of entries into a cache file and rebuild it only when one of
>>> the utility files changes.
>>
>> +1
>>
>> It'd be great to be able to do this for the many .pxd files in Sage as
>> well. Parsing .pxd files is a huge portion of the compilation of the
>> Sage library.
>>
>> - Robert
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>

I expect it will also speed up the test runner quite a lot, which
takes forever, as there are lots of small doctests. <offtopic>On an
unrelated note, it'd be great if we could run individual doctests in
parallel, I know py.test can do that, maybe nosetests as well. It'd be
great if there was a plugin that supported cython (as well as C
extension modules) that could run them, and an additional plugin that
could make it work with our various test modes and
directives.</offtopic>

From ndbecker2 at gmail.com  Wed Oct  5 17:09:55 2011
From: ndbecker2 at gmail.com (Neal Becker)
Date: Wed, 05 Oct 2011 11:09:55 -0400
Subject: [Cython] scons support
Message-ID: <j6hs04$rv9$1@dough.gmane.org>

I have no idea why this doesn't work for me.

Looking at
http://www.mail-archive.com/cython-dev at codespeak.net/msg09540.html

 scons --version
SCons by Steven Knight et al.:
	script: v2.1.0.r5357[MODIFIED], 2011/09/09 21:31:03, by bdeegan on 
ubuntu
	engine: v2.1.0.r5357[MODIFIED], 2011/09/09 21:31:03, by bdeegan on 
ubuntu
	engine path: ['/usr/lib/scons/SCons']


------------------------------------------------
cyenv = Environment(PYEXT_USE_DISTUTILS=True)
cyenv.Tool("pyext")
cyenv.Tool("cython")
import numpy

cyenv.Append(PYEXTINCPATH=[numpy.get_include()])
cyenv.Replace(CYTHONFLAGS=['--cplus'])
#cyenv.Replace(CXXFILESUFFIX='.cpp')
#cyenv.Replace(CYTHONCFILESUFFIX='.cpp')

cyenv.PythonExtension ('trellis_enc', ['trellis_enc.py'])
-----------------------------------------------------

gives:
cython --cplus -o trellis_enc.c trellis_enc.pyx
gcc -pthread -o trellis_enc.os -c -fPIC -fno-strict-aliasing -O2 -g -pipe -Wall 
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 
-m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -
Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 
-m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -I/usr/include/python2.7 -
I/usr/lib64/python2.7/site-packages/numpy/core/include trellis_enc.c
gcc -pthread -shared -o trellis_enc.so trellis_enc.os

Which is OK, except it used '.c' instead of '.cpp'

but if I try:
------------------------------------------------
cyenv = Environment(PYEXT_USE_DISTUTILS=True)
cyenv.Tool("pyext")
cyenv.Tool("cython")
import numpy

cyenv.Append(PYEXTINCPATH=[numpy.get_include()])
cyenv.Replace(CYTHONFLAGS=['--cplus'])
cyenv.Replace(CXXFILESUFFIX='.cpp')
cyenv.Replace(CYTHONCFILESUFFIX='.cpp')

cyenv.PythonExtension ('trellis_enc', ['trellis_enc.py'])
-----------------------------------------------------
cython --cplus -o trellis_enc.cpp trellis_enc.pyx
o trellis_enc.os -c -I/usr/include/python2.7 -I/usr/lib64/python2.7/site-
packages/numpy/core/include trellis_enc.cpp
sh: o: command not found

The 'gcc' command got completely mangled.

???


From greg.ewing at canterbury.ac.nz  Thu Oct  6 00:41:09 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 06 Oct 2011 11:41:09 +1300
Subject: [Cython] [cython-users] Re: callback function pointer problem
In-Reply-To: <CADiQ+QAKqCpFZF8sf+6JHaDi_YE+V+tYGR4GGHL8dBVb7=XS4w@mail.gmail.com>
References: <4E835336.1060800@gmail.com> <4E8398B5.6050905@gmail.com>
	<CAKGHGPTUn=g1VxpfMXaETgFXEu=sjbz=2gZRz7W-o4ggQ+-2bw@mail.gmail.com>
	<4E8421D0.5010007@gmail.com> <4E844BD0.4040207@gmail.com>
	<4E845B24.6060102@astro.uio.no> <4E845BAE.307@astro.uio.no>
	<CANg26EWDmkbunP+4ezuN9ggk1MPe-6b6oPu7tgRJ1G0y1wTuXQ@mail.gmail.com>
	<4E8460E8.5050701@gmail.com>
	<CANg26EWL1uZfPAe1HcCaGDqTx9VkLwYhLhGGicx-ytO6K00oqQ@mail.gmail.com>
	<CADiQ+QAT0ubtbdLL+mnOWAkGziL_dNusu2dZQXddHkUis4-v7g@mail.gmail.com>
	<CANg26EUquDZQqiEf0ppFoiOJHj2z7asnp3T8eCa=ynkAu6RDPA@mail.gmail.com>
	<588dc249-8f0b-49f2-bf42-23978ea95ddf@email.android.com>
	<CADiQ+QAKqCpFZF8sf+6JHaDi_YE+V+tYGR4GGHL8dBVb7=XS4w@mail.gmail.com>
Message-ID: <4E8CDD05.2080102@canterbury.ac.nz>

Robert Bradshaw wrote:

> On this note, eventually I would like coerce structs (and unions,
> enums) to auto-generated wrapper classes, visible in the Python module
> namespace if one declares them as "cpdef struct ..."

Would these wrapper classes contain a copy of the struct,
or would they reference the struct? If they reference it,
there would be issues with the lifetime of the referenced
data.

-- 
Greg

From robertwb at math.washington.edu  Thu Oct  6 02:05:21 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Wed, 5 Oct 2011 17:05:21 -0700
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CANg26EU1gWG+Guj8hMRr=mGS62A6FdiPz2EVYjNPAhKruvd6-A@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<CADiQ+QBaF22aWveZ75EcDcgekQPzihvkDPijEy15_-1cB7hKqA@mail.gmail.com>
	<CANg26EU1gWG+Guj8hMRr=mGS62A6FdiPz2EVYjNPAhKruvd6-A@mail.gmail.com>
Message-ID: <CADiQ+QBrEgz00+TWLADQo--xhufTG5o2MPHNmSWTiT0-R1KJoA@mail.gmail.com>

On Wednesday, October 5, 2011, mark florisson wrote:

> On 5 October 2011 01:46, Robert Bradshaw <robertwb at math.washington.edu<javascript:;>>
> wrote:
> > On Tue, Oct 4, 2011 at 2:19 PM, mark florisson
> > <markflorisson88 at gmail.com <javascript:;>> wrote:
> >> Hey,
> >>
> >> I briefly mentioned something about this in a pull request, but maybe
> >> it deserves some actual discussion on the ML.
> >>
> >> So I propose that after fused types gets merged we try to move as many
> >> utility codes as possible to their utility code files (unless they are
> >> used in pending pull requests or other branches). Preferably this will
> >> be done in one or a few commits. How should we split up the work, any
> >> volunteers? Perhaps people who wrote certain utilities also want to
> >> move them? In that case, we should start a new branch and then merge
> >> that into master when it's done.
> >> We could actually move things before fused types get merged, as long
> >> as we don't touch binding_cfunc_utility_code.
> >
> > +1 to moving towards this, but I don't see the urgency or need to do
> > it all at once (though if there's going to be a big push, lets
> > coordinate on a wiki or trac).
>
> Hm, perhaps there is no strict need to hurry, as long as we take care
> not to modify utilities after they have been moved. The wiki could be
> great for that, but I personally don't keep track of everyone's
> branches, so I don't know which utility is modified by whom (if at
> all), so strictly speaking (to avoid painful merges) I'd have to ask
> everyone each time I wanted to move something, or dig through
> everyone's branches.
>

I was proposing that everyone lists the utility code sections that are
likely to cause merge conflicts on a wiki page, and the rest are fair game.


> >> Before we go there, Stefan, do we still want to implement the header
> >> .ini style which can list dependencies and such? I personally don't
> >> care very much about it, but memoryviews and the utility loaders are
> >> merged so if someone wants to take up that job, it'd be good to do
> >> before moving the utilities.
> >>
> >> Another issue is that Cython compile time is increasing with the
> >> addition of control flow and cython utilities. If you use fused types
> >> you're also going to combinatorially add more compile time.
> >
> > Yeah, this was especially obvious with, e.g. cython.compile(...). (In
> > particular, some utility code was being parsed before it could even
> > figure out whether it needed to do a full re-compile...)
> >
> >> I'm sure
> >> this came up earlier, but I really think we should have a libcython
> >> and a cython.h. libcython (a shared library) should contain any common
> >> Cython-specific code not meant to be inlined, and cython.h any types,
> >> macros and inline functions etc. This will decrease Cython and C
> >> compile time, and will also make executables smaller.
> >
> > +1. Yes, we talked about this earlier, but nothing concrete was
> > planned. It's probably worth a CEP, if anything to have a concrete
> > plan recorded somewhere other than a series of mailing list threads
> > (though discussion tends to work best here).
> >
> >> This could be
> >> enabled using a command line option to Cython, as well as with
> >> distutils, eventually we may decide to make it the default (lets
> >> figure that out later). Preferably libcython.so would be installed
> >> alongside libpython.so and cython.h inside the Python include
> >> directory. Assuming multiple versions of Cython and multiple Python
> >> installations, we'd need to come up with a versioning scheme for
> >> either.
> >
> > I would propose a cython.h file that sits in Cython/Compiler/Include
> > (or similar), as a first step. The .pyx -> .c pass could be configured
> > to copy this to a specific location (for shipping just the generated
> > .c files).
>
> That would be fine as well. It might be convenient for users in that
> case if we could provide a cython.get_include() in addition to the
> distutils hooks, and a cython-config script.
>

For sure. We could also have a cython.get_shared_library() (common_code?
cython_module?) which would return an Extension object to build.


> > One option is to build the shared library as a companion
> > _cython_x_y_z.so module which, while not as efficient as linking at
> > the C level, would probably be much simpler to implement in a
> > cross-platform way. (This perhaps merits some benchmarks, but the main
> > contents is likely to be things like shared classes and objects.)
> > Actually linking .so files from modules that cimport each other would
> > be a nice feature down the road anyways. Again, the associated .c file
> > could be (optionally) generated/copied during the .pyx -> .c step.
> > Installation would determine if the required module exists, and if not
> > build and install it.
>
> Hm, that's a really good idea. I think the only overhead would be the
> capsule unpacking and pointer duplication, but that shouldn't suddenly
> be an issue. That means we don't have to do any versioning of the
> libraries and the symbols to avoid clashes in a flat namespaces as
> Stefan mentioned.
>

I'm not sure what the overhead is, if any, in calling function pointers vs.
actually linking things together at the C level (which is essentially the
same idea, but perhaps addresses are resolved at library load time rather
than requiring a dereference on each call?)


> >> We could also provide a static library there, for users who want to
> >> link and ship a compiled and statically linked version of their code.
> >> For a local Cython that isn't built, we can ignore the header and
> >> shared library option and issue a warning or some such.
> >>
> >> Lastly, I think we also should figure out a way to serialize Entry
> >> objects from CythonUtilities, which could easily and swiftly be loaded
> >> when creating the cython scope. It's quite a pain to declare all
> >> entries for utilities you write manually, so what I mostly did was
> >> parse the utility up to and including AnalyseDeclarationsTransform,
> >> and then retrieve the entries from there.
> >
> > This would be really nice too. Way back in the day I did some work
> > with trying to pickle full module scopes, but that soon became too
> > painful as there are so many far-reaching references. Pickling
> > individual Entries and re-building modules will probably be a more
> > tractable goal. Eventually, I'd like to see a way to cache the full
> > pxd pipeline.
> >
> > - Robert
> > _______________________________________________
> > cython-devel mailing list
> > cython-devel at python.org <javascript:;>
> > http://mail.python.org/mailman/listinfo/cython-devel
> >
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org <javascript:;>
> http://mail.python.org/mailman/listinfo/cython-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/cython-devel/attachments/20111005/fef12bbb/attachment-0001.html>

From robertwb at math.washington.edu  Thu Oct  6 02:05:21 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Wed, 5 Oct 2011 17:05:21 -0700
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
Message-ID: <CADiQ+QBZQq3-eJBRYBJwsEPTu6nqCmMVbHoXZ49VVD+QXkOwwg@mail.gmail.com>

On Wednesday, October 5, 2011, mark florisson wrote:

> On 5 October 2011 08:16, Stefan Behnel <stefan_ml at behnel.de <javascript:;>>
> wrote:
> > mark florisson, 04.10.2011 23:19:
> >>
> >> So I propose that after fused types gets merged we try to move as many
> >> utility codes as possible to their utility code files (unless they are
> >> used in pending pull requests or other branches). Preferably this will
> >> be done in one or a few commits. How should we split up the work
> >
> > I would propose that new utility code gets moved out into utility files
> > right away (if doable, given the current state of the infrastructure),
> and
> > that existing utility code gets moves when it gets modified or when
> someone
> > feels like it. Until we really get to the point of wanting to create a
> > separate shared library etc., there's no need to hurry with the move.
> >
> >
> >> We could actually move things before fused types get merged, as long
> >> as we don't touch binding_cfunc_utility_code.
> >
> > Another reason not to hurry, right?
> >
> >
> >> Before we go there, Stefan, do we still want to implement the header
> >> .ini style which can list dependencies and such?
> >
> > I think we'll eventually need that, but that also depends a bit on the
> > question whether we want to (or can) build a shared library or not. See
> > below.
> >
> >
> >> Another issue is that Cython compile time is increasing with the
> >> addition of control flow and cython utilities. If you use fused types
> >> you're also going to combinatorially add more compile time.
> >
> > I don't see that locally - a compiled Cython is hugely fast for me. In
> > comparison, the C compiler literally takes ages to compile the result. An
> > external shared library may or may not help with both - in particular, it
> is
> > not clear to me what makes the C compiler slow. If the compile time is
> > dominated by the number of inlined functions (which is not unlikely), a
> > shared library + header file will not make a difference.
> >
>
> Have you tried with the memoryviews merged? e.g. if I have this code:
>
> from libc.stdlib cimport malloc
> cdef int[:] slice = <int[:10]> <int *> malloc(sizeof(int) * 10)
>
> [0] [14:45] ~  ? time cython test.pyx
> cython test.pyx  2.61s user 0.08s system 99% cpu 2.695 total
> [0] [14:45] ~  ? time zsh compile
> zsh compile  1.88s user 0.06s system 99% cpu 1.946 total
>
> where 'compile' is the script that invoked the same gcc command
> distutils uses. As you can see it took more than 2.5 seconds to
> compile this code (simply because the memoryview utilities get
> included). The C compiler does it quite a lot faster here. This
> obviously depends largely on your code, you get probably have it the
> other way around as well.
>

Anything we can do to cache/dedupe things here would be great.


> >> I'm sure
> >> this came up earlier, but I really think we should have a libcython
> >> and a cython.h. libcython (a shared library) should contain any common
> >> Cython-specific code not meant to be inlined, and cython.h any types,
> >> macros and inline functions etc.
> >
> > This has a couple of implications though. In order to support this on the
> > user side, we have to build one shared library per installed package in
> > order to avoid any Cython versioning issues. Just installing a versioned
> > "libcython_x.y.z.so" globally isn't enough, especially during
> development,
> > but also at deployment time. Different packages may use different CFLAGS
> or
> > Cython options, which may have an impact on the result. Encoding all
> > possible factors in the file name will be cumbersome and may mean that we
> > still end up with a number of installed Cython libraries that correlates
> > with the number of installed Cython based packages.
>
> Hm, I think the CFLAGS are important so long as they are compatible
> with Python. When the user compiles a Cython extension module with
> extra CFLAGS, this doesn't affect libpython. Similarly, the Cython
> utilities are really not the user's responsibility, so libcython
> doesn't need to be compiled with the same flags as the extension
> module. If still wanted, the user could either recompile python with
> different CFLAGS (which means libcython will get those as well), or
> not use libcython at all. CFLAGS should really only pertain to user
> code, not to the Cython library, which the user shouldn't be concerned
> about.
>
> > Next, we may not know at build time which set of Cython modules is in the
> > package. This may be less of an issue if we rely on "cythonize()" in
> > setup.py to compile all modules before hand (assuming that the user
> doesn't
> > call it twice, once for *.pyx, once for *.py, for example), but even if
> we
> > know all modules, we'd still have to figure out the complete set of
> utility
> > code used by all modules in order to build an adapted library with only
> the
> > necessary code used in the package. So we'd always end up with a complete
> > library with all utility code, which is only really interesting for
> larger
> > packages with several Cython modules.
> > I agree with Robert that a CEP would be needed for this, both for
> clearing
> > up the implications and actual use cases (I know that Sage is a
> reasonable
> > use case, but it's also a rather special case).
> >
> >
> >> This will decrease Cython and C
> >> compile time, and will also make executables smaller.
> >
> > I don't see how this actually impacts executables. However, a
> self-contained
> > executable is a value in itself.
> >
> >
> >> This could be
> >> enabled using a command line option to Cython, as well as with
> >> distutils, eventually we may decide to make it the default (lets
> >> figure that out later). Preferably libcython.so would be installed
> >> alongside libpython.so and cython.h inside the Python include
> >> directory.
> >
> > I don't see this happening. It's easy for Python (there is only one
> Python
> > running at a time, with one libpython loaded), but it's a lot less safe
> for
> > different versions of a Cython library that are used by different modules
> > inside of the running Python. For example, we'd have to version all
> visible
> > symbols in operating systems with flat namespaces, in order to support
> > loading multiple versions of the library.
> >
> >
> >> Lastly, I think we also should figure out a way to serialize Entry
> >> objects from CythonUtilities, which could easily and swiftly be loaded
> >> when creating the cython scope. It's quite a pain to declare all
> >> entries for utilities you write manually
> >
> > Why would you declare them manually? I thought everything would be moved
> out
> > into the utility code files?
> >
>
> Right, the code is in the utility files. However, the cython scope
> needs to have the entries of the classes and functions of the
> utilities. e.g. the user may write
>
> cimport cython
>
> cdef cython.array myobject
>
> For this to work, we need an 'array' entry, which we don't have yet,
> as the utility code will be parsed at code generation time if an entry
> of that utility code (which doesn't exist yet!) is used.
>
> >> so what I mostly did was
> >> parse the utility up to and including AnalyseDeclarationsTransform,
> >> and then retrieve the entries from there.
> >
> > Sounds like a drawback regarding the processing time, but may still be a
> > reasonable way to do it. I would expect that it won't be hard to pickle
> the
> > resulting dict of entries into a cache file and rebuild it only when one
> of
> > the utility files changes.
>
> Exactly. I'm not sure about pickle though, but the details don't
> matter. Pickle is certainly easy as long as you don't change your
> interface (which we most certainly will, though).
>
> We can version the cache to handle this.

- Robert
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/cython-devel/attachments/20111005/a7d500f9/attachment.html>

From stefan_ml at behnel.de  Thu Oct  6 08:46:51 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 06 Oct 2011 08:46:51 +0200
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
Message-ID: <4E8D4EDB.2090009@behnel.de>

mark florisson, 05.10.2011 15:53:
> On 5 October 2011 08:16, Stefan Behnel wrote:
>> mark florisson, 04.10.2011 23:19:
>>> Another issue is that Cython compile time is increasing with the
>>> addition of control flow and cython utilities. If you use fused types
>>> you're also going to combinatorially add more compile time.
>>
>> I don't see that locally - a compiled Cython is hugely fast for me. In
>> comparison, the C compiler literally takes ages to compile the result. An
>> external shared library may or may not help with both - in particular, it is
>> not clear to me what makes the C compiler slow. If the compile time is
>> dominated by the number of inlined functions (which is not unlikely), a
>> shared library + header file will not make a difference.
>
> Have you tried with the memoryviews merged?

No. I didn't expect the difference to be quite that large.


> e.g. if I have this code:
>
> from libc.stdlib cimport malloc
> cdef int[:] slice =<int[:10]>  <int *>  malloc(sizeof(int) * 10)
>
> [0] [14:45] ~  ? time cython test.pyx
> cython test.pyx  2.61s user 0.08s system 99% cpu 2.695 total
> [0] [14:45] ~  ? time zsh compile
> zsh compile  1.88s user 0.06s system 99% cpu 1.946 total
>
> where 'compile' is the script that invoked the same gcc command
> distutils uses.  As you can see it took more than 2.5 seconds to
> compile this code (simply because the memoryview utilities get
> included).

Ok, that hints at serious performance problems. Could you profile it to see 
where the issues are? Is it more that the code is loaded from an external 
file? Or the fact that more utility code is parsed than necessary?

It's certainly not obvious why the inclusion of static code, even from an 
external file, should make any difference.

That being said, it's not we were lacking the infrastructure for making 
Python code run faster ...


>>> I'm sure
>>> this came up earlier, but I really think we should have a libcython
>>> and a cython.h. libcython (a shared library) should contain any common
>>> Cython-specific code not meant to be inlined, and cython.h any types,
>>> macros and inline functions etc.
>>
>> This has a couple of implications though. In order to support this on the
>> user side, we have to build one shared library per installed package in
>> order to avoid any Cython versioning issues. Just installing a versioned
>> "libcython_x.y.z.so" globally isn't enough, especially during development,
>> but also at deployment time. Different packages may use different CFLAGS or
>> Cython options, which may have an impact on the result. Encoding all
>> possible factors in the file name will be cumbersome and may mean that we
>> still end up with a number of installed Cython libraries that correlates
>> with the number of installed Cython based packages.
>
> Hm, I think the CFLAGS are important so long as they are compatible
> with Python. When the user compiles a Cython extension module with
> extra CFLAGS, this doesn't affect libpython. Similarly, the Cython
> utilities are really not the user's responsibility, so libcython
> doesn't need to be compiled with the same flags as the extension
> module. If still wanted, the user could either recompile python with
> different CFLAGS (which means libcython will get those as well), or
> not use libcython at all. CFLAGS should really only pertain to user
> code, not to the Cython library, which the user shouldn't be concerned
> about.

Well, it's either the user or the OS distribution that installs (and 
potentially builds) the libraries. That already makes it two responsible 
entities for many systems that have to agree on what gets installed in what 
way. I'm just saying, don't underestimate the details in world wide 
deployments.

Stefan

From robertwb at math.washington.edu  Thu Oct  6 09:50:17 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Thu, 6 Oct 2011 00:50:17 -0700
Subject: [Cython] [cython-users] Re: callback function pointer problem
In-Reply-To: <4E8CDD05.2080102@canterbury.ac.nz>
References: <4E835336.1060800@gmail.com> <4E8398B5.6050905@gmail.com>
	<CAKGHGPTUn=g1VxpfMXaETgFXEu=sjbz=2gZRz7W-o4ggQ+-2bw@mail.gmail.com>
	<4E8421D0.5010007@gmail.com> <4E844BD0.4040207@gmail.com>
	<4E845B24.6060102@astro.uio.no> <4E845BAE.307@astro.uio.no>
	<CANg26EWDmkbunP+4ezuN9ggk1MPe-6b6oPu7tgRJ1G0y1wTuXQ@mail.gmail.com>
	<4E8460E8.5050701@gmail.com>
	<CANg26EWL1uZfPAe1HcCaGDqTx9VkLwYhLhGGicx-ytO6K00oqQ@mail.gmail.com>
	<CADiQ+QAT0ubtbdLL+mnOWAkGziL_dNusu2dZQXddHkUis4-v7g@mail.gmail.com>
	<CANg26EUquDZQqiEf0ppFoiOJHj2z7asnp3T8eCa=ynkAu6RDPA@mail.gmail.com>
	<588dc249-8f0b-49f2-bf42-23978ea95ddf@email.android.com>
	<CADiQ+QAKqCpFZF8sf+6JHaDi_YE+V+tYGR4GGHL8dBVb7=XS4w@mail.gmail.com>
	<4E8CDD05.2080102@canterbury.ac.nz>
Message-ID: <CADiQ+QChqT0RvZuT_za_B+nEkTNTpxD2qtPveHUmd2BtJD_zmQ@mail.gmail.com>

On Wed, Oct 5, 2011 at 3:41 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Robert Bradshaw wrote:
>
>> On this note, eventually I would like coerce structs (and unions,
>> enums) to auto-generated wrapper classes, visible in the Python module
>> namespace if one declares them as "cpdef struct ..."
>
> Would these wrapper classes contain a copy of the struct,
> or would they reference the struct? If they reference it,
> there would be issues with the lifetime of the referenced
> data.

They'd contain a copy, which I also think would match expectations
better as well.

- Robert

From markflorisson88 at gmail.com  Thu Oct  6 11:45:55 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 6 Oct 2011 10:45:55 +0100
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CADiQ+QBrEgz00+TWLADQo--xhufTG5o2MPHNmSWTiT0-R1KJoA@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<CADiQ+QBaF22aWveZ75EcDcgekQPzihvkDPijEy15_-1cB7hKqA@mail.gmail.com>
	<CANg26EU1gWG+Guj8hMRr=mGS62A6FdiPz2EVYjNPAhKruvd6-A@mail.gmail.com>
	<CADiQ+QBrEgz00+TWLADQo--xhufTG5o2MPHNmSWTiT0-R1KJoA@mail.gmail.com>
Message-ID: <CANg26EWJm7ksg9WW2i5mhoW98fr-zNEmkhxCyfMuK4S=fwtssQ@mail.gmail.com>

On 6 October 2011 01:05, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> On Wednesday, October 5, 2011, mark florisson wrote:
>>
>> On 5 October 2011 01:46, Robert Bradshaw <robertwb at math.washington.edu>
>> wrote:
>> > On Tue, Oct 4, 2011 at 2:19 PM, mark florisson
>> > <markflorisson88 at gmail.com> wrote:
>> >> Hey,
>> >>
>> >> I briefly mentioned something about this in a pull request, but maybe
>> >> it deserves some actual discussion on the ML.
>> >>
>> >> So I propose that after fused types gets merged we try to move as many
>> >> utility codes as possible to their utility code files (unless they are
>> >> used in pending pull requests or other branches). Preferably this will
>> >> be done in one or a few commits. How should we split up the work, any
>> >> volunteers? Perhaps people who wrote certain utilities also want to
>> >> move them? In that case, we should start a new branch and then merge
>> >> that into master when it's done.
>> >> We could actually move things before fused types get merged, as long
>> >> as we don't touch binding_cfunc_utility_code.
>> >
>> > +1 to moving towards this, but I don't see the urgency or need to do
>> > it all at once (though if there's going to be a big push, lets
>> > coordinate on a wiki or trac).
>>
>> Hm, perhaps there is no strict need to hurry, as long as we take care
>> not to modify utilities after they have been moved. The wiki could be
>> great for that, but I personally don't keep track of everyone's
>> branches, so I don't know which utility is modified by whom (if at
>> all), so strictly speaking (to avoid painful merges) I'd have to ask
>> everyone each time I wanted to move something, or dig through
>> everyone's branches.
>
> I was proposing that everyone lists the utility code sections that are
> likely to cause merge conflicts on a wiki page, and the rest are fair game.

Ah ok, that sounds good.

>>
>> >> Before we go there, Stefan, do we still want to implement the header
>> >> .ini style which can list dependencies and such? I personally don't
>> >> care very much about it, but memoryviews and the utility loaders are
>> >> merged so if someone wants to take up that job, it'd be good to do
>> >> before moving the utilities.
>> >>
>> >> Another issue is that Cython compile time is increasing with the
>> >> addition of control flow and cython utilities. If you use fused types
>> >> you're also going to combinatorially add more compile time.
>> >
>> > Yeah, this was especially obvious with, e.g. cython.compile(...). (In
>> > particular, some utility code was being parsed before it could even
>> > figure out whether it needed to do a full re-compile...)
>> >
>> >> I'm sure
>> >> this came up earlier, but I really think we should have a libcython
>> >> and a cython.h. libcython (a shared library) should contain any common
>> >> Cython-specific code not meant to be inlined, and cython.h any types,
>> >> macros and inline functions etc. This will decrease Cython and C
>> >> compile time, and will also make executables smaller.
>> >
>> > +1. Yes, we talked about this earlier, but nothing concrete was
>> > planned. It's probably worth a CEP, if anything to have a concrete
>> > plan recorded somewhere other than a series of mailing list threads
>> > (though discussion tends to work best here).
>> >
>> >> This could be
>> >> enabled using a command line option to Cython, as well as with
>> >> distutils, eventually we may decide to make it the default (lets
>> >> figure that out later). Preferably libcython.so would be installed
>> >> alongside libpython.so and cython.h inside the Python include
>> >> directory. Assuming multiple versions of Cython and multiple Python
>> >> installations, we'd need to come up with a versioning scheme for
>> >> either.
>> >
>> > I would propose a cython.h file that sits in Cython/Compiler/Include
>> > (or similar), as a first step. The .pyx -> .c pass could be configured
>> > to copy this to a specific location (for shipping just the generated
>> > .c files).
>>
>> That would be fine as well. It might be convenient for users in that
>> case if we could provide a cython.get_include() in addition to the
>> distutils hooks, and a cython-config script.
>
> For sure. We could also have a cython.get_shared_library() (common_code?
> cython_module?) which would return an Extension object to build.
>
>>
>> > One option is to build the shared library as a companion
>> > _cython_x_y_z.so module which, while not as efficient as linking at
>> > the C level, would probably be much simpler to implement in a
>> > cross-platform way. (This perhaps merits some benchmarks, but the main
>> > contents is likely to be things like shared classes and objects.)
>> > Actually linking .so files from modules that cimport each other would
>> > be a nice feature down the road anyways. Again, the associated .c file
>> > could be (optionally) generated/copied during the .pyx -> .c step.
>> > Installation would determine if the required module exists, and if not
>> > build and install it.
>>
>> Hm, that's a really good idea. I think the only overhead would be the
>> capsule unpacking and pointer duplication, but that shouldn't suddenly
>> be an issue. That means we don't have to do any versioning of the
>> libraries and the symbols to avoid clashes in a flat namespaces as
>> Stefan mentioned.
>
> I'm not sure what the overhead is, if any, in calling function pointers vs.
> actually linking things together at the C level (which is essentially the
> same idea, but perhaps addresses are resolved at library load time rather
> than requiring a dereference on each call?)

I think there isn't any difference with dynamic linking and having a
pointer. My understanding (of ELF shared libraries) is that the
procedure lookup table will contain the actual address of the symbol
(likely after the first reference to it has been made, it may have a
stub that resolves the symbol and replaces it's own address with the
actual address), which to me sounds like the same thing as a pointer.
I think only static linking can prevent this, i.e. directly encode the
static address into the call opcode, but I'm not an expert.

>>
>> >> We could also provide a static library there, for users who want to
>> >> link and ship a compiled and statically linked version of their code.
>> >> For a local Cython that isn't built, we can ignore the header and
>> >> shared library option and issue a warning or some such.
>> >>
>> >> Lastly, I think we also should figure out a way to serialize Entry
>> >> objects from CythonUtilities, which could easily and swiftly be loaded
>> >> when creating the cython scope. It's quite a pain to declare all
>> >> entries for utilities you write manually, so what I mostly did was
>> >> parse the utility up to and including AnalyseDeclarationsTransform,
>> >> and then retrieve the entries from there.
>> >
>> > This would be really nice too. Way back in the day I did some work
>> > with trying to pickle full module scopes, but that soon became too
>> > painful as there are so many far-reaching references. Pickling
>> > individual Entries and re-building modules will probably be a more
>> > tractable goal. Eventually, I'd like to see a way to cache the full
>> > pxd pipeline.
>> >
>> > - Robert
>> > _______________________________________________
>> > cython-devel mailing list
>> > cython-devel at python.org
>> > http://mail.python.org/mailman/listinfo/cython-devel
>> >
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>
>

From markflorisson88 at gmail.com  Thu Oct  6 11:46:20 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 6 Oct 2011 10:46:20 +0100
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <4E8D4EDB.2090009@behnel.de>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
	<4E8D4EDB.2090009@behnel.de>
Message-ID: <CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>

On 6 October 2011 07:46, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 05.10.2011 15:53:
>>
>> On 5 October 2011 08:16, Stefan Behnel wrote:
>>>
>>> mark florisson, 04.10.2011 23:19:
>>>>
>>>> Another issue is that Cython compile time is increasing with the
>>>> addition of control flow and cython utilities. If you use fused types
>>>> you're also going to combinatorially add more compile time.
>>>
>>> I don't see that locally - a compiled Cython is hugely fast for me. In
>>> comparison, the C compiler literally takes ages to compile the result. An
>>> external shared library may or may not help with both - in particular, it
>>> is
>>> not clear to me what makes the C compiler slow. If the compile time is
>>> dominated by the number of inlined functions (which is not unlikely), a
>>> shared library + header file will not make a difference.
>>
>> Have you tried with the memoryviews merged?
>
> No. I didn't expect the difference to be quite that large.
>
>
>> e.g. if I have this code:
>>
>> from libc.stdlib cimport malloc
>> cdef int[:] slice =<int[:10]> ?<int *> ?malloc(sizeof(int) * 10)
>>
>> [0] [14:45] ~ ?? time cython test.pyx
>> cython test.pyx ?2.61s user 0.08s system 99% cpu 2.695 total
>> [0] [14:45] ~ ?? time zsh compile
>> zsh compile ?1.88s user 0.06s system 99% cpu 1.946 total
>>
>> where 'compile' is the script that invoked the same gcc command
>> distutils uses. ?As you can see it took more than 2.5 seconds to
>> compile this code (simply because the memoryview utilities get
>> included).
>
> Ok, that hints at serious performance problems. Could you profile it to see
> where the issues are? Is it more that the code is loaded from an external
> file? Or the fact that more utility code is parsed than necessary?

I haven't profiled it yet (I'll do that), but I'm fairly sure it's the
parsing of Cython utility files (not the loading). Maybe Tempita also
adds to the overhead, I'll find out.

> It's certainly not obvious why the inclusion of static code, even from an
> external file, should make any difference.
>
> That being said, it's not we were lacking the infrastructure for making
> Python code run faster ...
>

Heh, indeed. In this case I think caching will solve all our problems.

>>>> I'm sure
>>>> this came up earlier, but I really think we should have a libcython
>>>> and a cython.h. libcython (a shared library) should contain any common
>>>> Cython-specific code not meant to be inlined, and cython.h any types,
>>>> macros and inline functions etc.
>>>
>>> This has a couple of implications though. In order to support this on the
>>> user side, we have to build one shared library per installed package in
>>> order to avoid any Cython versioning issues. Just installing a versioned
>>> "libcython_x.y.z.so" globally isn't enough, especially during
>>> development,
>>> but also at deployment time. Different packages may use different CFLAGS
>>> or
>>> Cython options, which may have an impact on the result. Encoding all
>>> possible factors in the file name will be cumbersome and may mean that we
>>> still end up with a number of installed Cython libraries that correlates
>>> with the number of installed Cython based packages.
>>
>> Hm, I think the CFLAGS are important so long as they are compatible
>> with Python. When the user compiles a Cython extension module with
>> extra CFLAGS, this doesn't affect libpython. Similarly, the Cython
>> utilities are really not the user's responsibility, so libcython
>> doesn't need to be compiled with the same flags as the extension
>> module. If still wanted, the user could either recompile python with
>> different CFLAGS (which means libcython will get those as well), or
>> not use libcython at all. CFLAGS should really only pertain to user
>> code, not to the Cython library, which the user shouldn't be concerned
>> about.
>
> Well, it's either the user or the OS distribution that installs (and
> potentially builds) the libraries. That already makes it two responsible
> entities for many systems that have to agree on what gets installed in what
> way. I'm just saying, don't underestimate the details in world wide
> deployments.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From vitja.makarov at gmail.com  Thu Oct  6 22:56:43 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Fri, 7 Oct 2011 00:56:43 +0400
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
	<4E8D4EDB.2090009@behnel.de>
	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>
Message-ID: <CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>

2011/10/6 mark florisson <markflorisson88 at gmail.com>:
> On 6 October 2011 07:46, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> mark florisson, 05.10.2011 15:53:
>>>
>>> On 5 October 2011 08:16, Stefan Behnel wrote:
>>>>
>>>> mark florisson, 04.10.2011 23:19:
>>>>>
>>>>> Another issue is that Cython compile time is increasing with the
>>>>> addition of control flow and cython utilities. If you use fused types
>>>>> you're also going to combinatorially add more compile time.
>>>>
>>>> I don't see that locally - a compiled Cython is hugely fast for me. In
>>>> comparison, the C compiler literally takes ages to compile the result. An
>>>> external shared library may or may not help with both - in particular, it
>>>> is
>>>> not clear to me what makes the C compiler slow. If the compile time is
>>>> dominated by the number of inlined functions (which is not unlikely), a
>>>> shared library + header file will not make a difference.
>>>
>>> Have you tried with the memoryviews merged?
>>
>> No. I didn't expect the difference to be quite that large.
>>
>>
>>> e.g. if I have this code:
>>>
>>> from libc.stdlib cimport malloc
>>> cdef int[:] slice =<int[:10]> ?<int *> ?malloc(sizeof(int) * 10)
>>>
>>> [0] [14:45] ~ ?? time cython test.pyx
>>> cython test.pyx ?2.61s user 0.08s system 99% cpu 2.695 total
>>> [0] [14:45] ~ ?? time zsh compile
>>> zsh compile ?1.88s user 0.06s system 99% cpu 1.946 total
>>>
>>> where 'compile' is the script that invoked the same gcc command
>>> distutils uses. ?As you can see it took more than 2.5 seconds to
>>> compile this code (simply because the memoryview utilities get
>>> included).
>>
>> Ok, that hints at serious performance problems. Could you profile it to see
>> where the issues are? Is it more that the code is loaded from an external
>> file? Or the fact that more utility code is parsed than necessary?
>
> I haven't profiled it yet (I'll do that), but I'm fairly sure it's the
> parsing of Cython utility files (not the loading). Maybe Tempita also
> adds to the overhead, I'll find out.
>

Compiling this regex gives 5ms instead of 10ms on my machine

https://github.com/cython/cython/blob/master/Cython/Compiler/Code.py#L85

And on your example gives 3% speedup


>> It's certainly not obvious why the inclusion of static code, even from an
>> external file, should make any difference.
>>
>> That being said, it's not we were lacking the infrastructure for making
>> Python code run faster ...
>>
>
> Heh, indeed. In this case I think caching will solve all our problems.
>
>>>>> I'm sure
>>>>> this came up earlier, but I really think we should have a libcython
>>>>> and a cython.h. libcython (a shared library) should contain any common
>>>>> Cython-specific code not meant to be inlined, and cython.h any types,
>>>>> macros and inline functions etc.
>>>>
>>>> This has a couple of implications though. In order to support this on the
>>>> user side, we have to build one shared library per installed package in
>>>> order to avoid any Cython versioning issues. Just installing a versioned
>>>> "libcython_x.y.z.so" globally isn't enough, especially during
>>>> development,
>>>> but also at deployment time. Different packages may use different CFLAGS
>>>> or
>>>> Cython options, which may have an impact on the result. Encoding all
>>>> possible factors in the file name will be cumbersome and may mean that we
>>>> still end up with a number of installed Cython libraries that correlates
>>>> with the number of installed Cython based packages.
>>>
>>> Hm, I think the CFLAGS are important so long as they are compatible
>>> with Python. When the user compiles a Cython extension module with
>>> extra CFLAGS, this doesn't affect libpython. Similarly, the Cython
>>> utilities are really not the user's responsibility, so libcython
>>> doesn't need to be compiled with the same flags as the extension
>>> module. If still wanted, the user could either recompile python with
>>> different CFLAGS (which means libcython will get those as well), or
>>> not use libcython at all. CFLAGS should really only pertain to user
>>> code, not to the Cython library, which the user shouldn't be concerned
>>> about.
>>
>> Well, it's either the user or the OS distribution that installs (and
>> potentially builds) the libraries. That already makes it two responsible
>> entities for many systems that have to agree on what gets installed in what
>> way. I'm just saying, don't underestimate the details in world wide
>> deployments.
>>


-- 
vitja.

From markflorisson88 at gmail.com  Thu Oct  6 23:02:24 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 6 Oct 2011 22:02:24 +0100
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
	<4E8D4EDB.2090009@behnel.de>
	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>
	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>
Message-ID: <CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>

On 6 October 2011 21:56, Vitja Makarov <vitja.makarov at gmail.com> wrote:
> 2011/10/6 mark florisson <markflorisson88 at gmail.com>:
>> On 6 October 2011 07:46, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>> mark florisson, 05.10.2011 15:53:
>>>>
>>>> On 5 October 2011 08:16, Stefan Behnel wrote:
>>>>>
>>>>> mark florisson, 04.10.2011 23:19:
>>>>>>
>>>>>> Another issue is that Cython compile time is increasing with the
>>>>>> addition of control flow and cython utilities. If you use fused types
>>>>>> you're also going to combinatorially add more compile time.
>>>>>
>>>>> I don't see that locally - a compiled Cython is hugely fast for me. In
>>>>> comparison, the C compiler literally takes ages to compile the result. An
>>>>> external shared library may or may not help with both - in particular, it
>>>>> is
>>>>> not clear to me what makes the C compiler slow. If the compile time is
>>>>> dominated by the number of inlined functions (which is not unlikely), a
>>>>> shared library + header file will not make a difference.
>>>>
>>>> Have you tried with the memoryviews merged?
>>>
>>> No. I didn't expect the difference to be quite that large.
>>>
>>>
>>>> e.g. if I have this code:
>>>>
>>>> from libc.stdlib cimport malloc
>>>> cdef int[:] slice =<int[:10]> ?<int *> ?malloc(sizeof(int) * 10)
>>>>
>>>> [0] [14:45] ~ ?? time cython test.pyx
>>>> cython test.pyx ?2.61s user 0.08s system 99% cpu 2.695 total
>>>> [0] [14:45] ~ ?? time zsh compile
>>>> zsh compile ?1.88s user 0.06s system 99% cpu 1.946 total
>>>>
>>>> where 'compile' is the script that invoked the same gcc command
>>>> distutils uses. ?As you can see it took more than 2.5 seconds to
>>>> compile this code (simply because the memoryview utilities get
>>>> included).
>>>
>>> Ok, that hints at serious performance problems. Could you profile it to see
>>> where the issues are? Is it more that the code is loaded from an external
>>> file? Or the fact that more utility code is parsed than necessary?
>>
>> I haven't profiled it yet (I'll do that), but I'm fairly sure it's the
>> parsing of Cython utility files (not the loading). Maybe Tempita also
>> adds to the overhead, I'll find out.
>>
>
> Compiling this regex gives 5ms instead of 10ms on my machine
>
> https://github.com/cython/cython/blob/master/Cython/Compiler/Code.py#L85
>
> And on your example gives 3% speedup
>

Sorry, which code gets you 10ms? Also, is this about loading + regex
matching, or just about compiling the pattern?

In any case, libcython would solve these issues. Profiling will still
be useful though.

>>> It's certainly not obvious why the inclusion of static code, even from an
>>> external file, should make any difference.
>>>
>>> That being said, it's not we were lacking the infrastructure for making
>>> Python code run faster ...
>>>
>>
>> Heh, indeed. In this case I think caching will solve all our problems.
>>
>>>>>> I'm sure
>>>>>> this came up earlier, but I really think we should have a libcython
>>>>>> and a cython.h. libcython (a shared library) should contain any common
>>>>>> Cython-specific code not meant to be inlined, and cython.h any types,
>>>>>> macros and inline functions etc.
>>>>>
>>>>> This has a couple of implications though. In order to support this on the
>>>>> user side, we have to build one shared library per installed package in
>>>>> order to avoid any Cython versioning issues. Just installing a versioned
>>>>> "libcython_x.y.z.so" globally isn't enough, especially during
>>>>> development,
>>>>> but also at deployment time. Different packages may use different CFLAGS
>>>>> or
>>>>> Cython options, which may have an impact on the result. Encoding all
>>>>> possible factors in the file name will be cumbersome and may mean that we
>>>>> still end up with a number of installed Cython libraries that correlates
>>>>> with the number of installed Cython based packages.
>>>>
>>>> Hm, I think the CFLAGS are important so long as they are compatible
>>>> with Python. When the user compiles a Cython extension module with
>>>> extra CFLAGS, this doesn't affect libpython. Similarly, the Cython
>>>> utilities are really not the user's responsibility, so libcython
>>>> doesn't need to be compiled with the same flags as the extension
>>>> module. If still wanted, the user could either recompile python with
>>>> different CFLAGS (which means libcython will get those as well), or
>>>> not use libcython at all. CFLAGS should really only pertain to user
>>>> code, not to the Cython library, which the user shouldn't be concerned
>>>> about.
>>>
>>> Well, it's either the user or the OS distribution that installs (and
>>> potentially builds) the libraries. That already makes it two responsible
>>> entities for many systems that have to agree on what gets installed in what
>>> way. I'm just saying, don't underestimate the details in world wide
>>> deployments.
>>>
>
>
>
> --
> vitja.
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From vitja.makarov at gmail.com  Thu Oct  6 23:07:22 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Fri, 7 Oct 2011 01:07:22 +0400
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
	<4E8D4EDB.2090009@behnel.de>
	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>
	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>
	<CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>
Message-ID: <CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>

2011/10/7 mark florisson <markflorisson88 at gmail.com>:
> On 6 October 2011 21:56, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>> 2011/10/6 mark florisson <markflorisson88 at gmail.com>:
>>> On 6 October 2011 07:46, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>>> mark florisson, 05.10.2011 15:53:
>>>>>
>>>>> On 5 October 2011 08:16, Stefan Behnel wrote:
>>>>>>
>>>>>> mark florisson, 04.10.2011 23:19:
>>>>>>>
>>>>>>> Another issue is that Cython compile time is increasing with the
>>>>>>> addition of control flow and cython utilities. If you use fused types
>>>>>>> you're also going to combinatorially add more compile time.
>>>>>>
>>>>>> I don't see that locally - a compiled Cython is hugely fast for me. In
>>>>>> comparison, the C compiler literally takes ages to compile the result. An
>>>>>> external shared library may or may not help with both - in particular, it
>>>>>> is
>>>>>> not clear to me what makes the C compiler slow. If the compile time is
>>>>>> dominated by the number of inlined functions (which is not unlikely), a
>>>>>> shared library + header file will not make a difference.
>>>>>
>>>>> Have you tried with the memoryviews merged?
>>>>
>>>> No. I didn't expect the difference to be quite that large.
>>>>
>>>>
>>>>> e.g. if I have this code:
>>>>>
>>>>> from libc.stdlib cimport malloc
>>>>> cdef int[:] slice =<int[:10]> ?<int *> ?malloc(sizeof(int) * 10)
>>>>>
>>>>> [0] [14:45] ~ ?? time cython test.pyx
>>>>> cython test.pyx ?2.61s user 0.08s system 99% cpu 2.695 total
>>>>> [0] [14:45] ~ ?? time zsh compile
>>>>> zsh compile ?1.88s user 0.06s system 99% cpu 1.946 total
>>>>>
>>>>> where 'compile' is the script that invoked the same gcc command
>>>>> distutils uses. ?As you can see it took more than 2.5 seconds to
>>>>> compile this code (simply because the memoryview utilities get
>>>>> included).
>>>>
>>>> Ok, that hints at serious performance problems. Could you profile it to see
>>>> where the issues are? Is it more that the code is loaded from an external
>>>> file? Or the fact that more utility code is parsed than necessary?
>>>
>>> I haven't profiled it yet (I'll do that), but I'm fairly sure it's the
>>> parsing of Cython utility files (not the loading). Maybe Tempita also
>>> adds to the overhead, I'll find out.
>>>
>>
>> Compiling this regex gives 5ms instead of 10ms on my machine
>>
>> https://github.com/cython/cython/blob/master/Cython/Compiler/Code.py#L85
>>
>> And on your example gives 3% speedup
>>
>
> Sorry, which code gets you 10ms? Also, is this about loading + regex
> matching, or just about compiling the pattern?
>

I've added decorator for load_utilities_from_file that prints time for
current call and total sum for this function and total gives 10ms.

Btw that's not that much.


> In any case, libcython would solve these issues. Profiling will still
> be useful though.
>
>>>> It's certainly not obvious why the inclusion of static code, even from an
>>>> external file, should make any difference.
>>>>
>>>> That being said, it's not we were lacking the infrastructure for making
>>>> Python code run faster ...
>>>>
>>>
>>> Heh, indeed. In this case I think caching will solve all our problems.
>>>
>>>>>>> I'm sure
>>>>>>> this came up earlier, but I really think we should have a libcython
>>>>>>> and a cython.h. libcython (a shared library) should contain any common
>>>>>>> Cython-specific code not meant to be inlined, and cython.h any types,
>>>>>>> macros and inline functions etc.
>>>>>>
>>>>>> This has a couple of implications though. In order to support this on the
>>>>>> user side, we have to build one shared library per installed package in
>>>>>> order to avoid any Cython versioning issues. Just installing a versioned
>>>>>> "libcython_x.y.z.so" globally isn't enough, especially during
>>>>>> development,
>>>>>> but also at deployment time. Different packages may use different CFLAGS
>>>>>> or
>>>>>> Cython options, which may have an impact on the result. Encoding all
>>>>>> possible factors in the file name will be cumbersome and may mean that we
>>>>>> still end up with a number of installed Cython libraries that correlates
>>>>>> with the number of installed Cython based packages.
>>>>>
>>>>> Hm, I think the CFLAGS are important so long as they are compatible
>>>>> with Python. When the user compiles a Cython extension module with
>>>>> extra CFLAGS, this doesn't affect libpython. Similarly, the Cython
>>>>> utilities are really not the user's responsibility, so libcython
>>>>> doesn't need to be compiled with the same flags as the extension
>>>>> module. If still wanted, the user could either recompile python with
>>>>> different CFLAGS (which means libcython will get those as well), or
>>>>> not use libcython at all. CFLAGS should really only pertain to user
>>>>> code, not to the Cython library, which the user shouldn't be concerned
>>>>> about.
>>>>
>>>> Well, it's either the user or the OS distribution that installs (and
>>>> potentially builds) the libraries. That already makes it two responsible
>>>> entities for many systems that have to agree on what gets installed in what
>>>> way. I'm just saying, don't underestimate the details in world wide
>>>> deployments.
>>>>
>>
>>
>>
>> --
>> vitja.
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>


-- 
vitja.

From vitja.makarov at gmail.com  Thu Oct  6 23:12:10 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Fri, 7 Oct 2011 01:12:10 +0400
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
	<4E8D4EDB.2090009@behnel.de>
	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>
	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>
	<CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>
	<CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>
Message-ID: <CAKGHGPSFO3Az_KBMgvfTRQKX+_ddGPH=JPqxiEPUjocmw55FQg@mail.gmail.com>

2011/10/7 Vitja Makarov <vitja.makarov at gmail.com>:
> 2011/10/7 mark florisson <markflorisson88 at gmail.com>:
>> On 6 October 2011 21:56, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>>> 2011/10/6 mark florisson <markflorisson88 at gmail.com>:
>>>> On 6 October 2011 07:46, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>>>> mark florisson, 05.10.2011 15:53:
>>>>>>
>>>>>> On 5 October 2011 08:16, Stefan Behnel wrote:
>>>>>>>
>>>>>>> mark florisson, 04.10.2011 23:19:
>>>>>>>>
>>>>>>>> Another issue is that Cython compile time is increasing with the
>>>>>>>> addition of control flow and cython utilities. If you use fused types
>>>>>>>> you're also going to combinatorially add more compile time.
>>>>>>>
>>>>>>> I don't see that locally - a compiled Cython is hugely fast for me. In
>>>>>>> comparison, the C compiler literally takes ages to compile the result. An
>>>>>>> external shared library may or may not help with both - in particular, it
>>>>>>> is
>>>>>>> not clear to me what makes the C compiler slow. If the compile time is
>>>>>>> dominated by the number of inlined functions (which is not unlikely), a
>>>>>>> shared library + header file will not make a difference.
>>>>>>
>>>>>> Have you tried with the memoryviews merged?
>>>>>
>>>>> No. I didn't expect the difference to be quite that large.
>>>>>
>>>>>
>>>>>> e.g. if I have this code:
>>>>>>
>>>>>> from libc.stdlib cimport malloc
>>>>>> cdef int[:] slice =<int[:10]> ?<int *> ?malloc(sizeof(int) * 10)
>>>>>>
>>>>>> [0] [14:45] ~ ?? time cython test.pyx
>>>>>> cython test.pyx ?2.61s user 0.08s system 99% cpu 2.695 total
>>>>>> [0] [14:45] ~ ?? time zsh compile
>>>>>> zsh compile ?1.88s user 0.06s system 99% cpu 1.946 total
>>>>>>
>>>>>> where 'compile' is the script that invoked the same gcc command
>>>>>> distutils uses. ?As you can see it took more than 2.5 seconds to
>>>>>> compile this code (simply because the memoryview utilities get
>>>>>> included).
>>>>>
>>>>> Ok, that hints at serious performance problems. Could you profile it to see
>>>>> where the issues are? Is it more that the code is loaded from an external
>>>>> file? Or the fact that more utility code is parsed than necessary?
>>>>
>>>> I haven't profiled it yet (I'll do that), but I'm fairly sure it's the
>>>> parsing of Cython utility files (not the loading). Maybe Tempita also
>>>> adds to the overhead, I'll find out.
>>>>
>>>
>>> Compiling this regex gives 5ms instead of 10ms on my machine
>>>
>>> https://github.com/cython/cython/blob/master/Cython/Compiler/Code.py#L85
>>>
>>> And on your example gives 3% speedup
>>>
>>
>> Sorry, which code gets you 10ms? Also, is this about loading + regex
>> matching, or just about compiling the pattern?
>>
>
> I've added decorator for load_utilities_from_file that prints time for
> current call and total sum for this function and total gives 10ms.
>
> Btw that's not that much.
>
>

Here is small comparison on compiling urllib.py with cython:

((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
../cython.py urllib.py

real	0m1.699s
user	0m1.650s
sys	0m0.040s
(master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
../cython.py urllib.py

real	0m2.830s
user	0m2.790s
sys	0m0.030s


It's about 1.5 times slower.

-- 
vitja.

From stefan_ml at behnel.de  Fri Oct  7 09:41:34 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 07 Oct 2011 09:41:34 +0200
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CAKGHGPSFO3Az_KBMgvfTRQKX+_ddGPH=JPqxiEPUjocmw55FQg@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>	<4E8C0448.6010204@behnel.de>	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>	<4E8D4EDB.2090009@behnel.de>	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>	<CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>	<CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>
	<CAKGHGPSFO3Az_KBMgvfTRQKX+_ddGPH=JPqxiEPUjocmw55FQg@mail.gmail.com>
Message-ID: <4E8EAD2E.8040701@behnel.de>

Vitja Makarov, 06.10.2011 23:12:
> Here is small comparison on compiling urllib.py with cython:
>
> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
> ../cython.py urllib.py
>
> real	0m1.699s
> user	0m1.650s
> sys	0m0.040s
> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
> ../cython.py urllib.py
>
> real	0m2.830s
> user	0m2.790s
> sys	0m0.030s
>
>
> It's about 1.5 times slower.

I assume this uses a compiled Cython? That's a pretty serious regression 
for plain Python code then. Again, this needs proper profiling.

We may also want to disable certain steps in the pipeline based on the 
syntax features used. If a feature is not used that has its own (set of) 
visitors, we can disable them completely. Detection already happens based 
on the .pyx/.py distinction, but could additionally use a detector (e.g. in 
the post-parse phase) that sets up skip flags. One example is the closure 
building step, which could be skipped if there are no closures.

Stefan

From vitja.makarov at gmail.com  Fri Oct  7 10:11:42 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Fri, 7 Oct 2011 12:11:42 +0400
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <4E8EAD2E.8040701@behnel.de>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
	<4E8D4EDB.2090009@behnel.de>
	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>
	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>
	<CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>
	<CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>
	<CAKGHGPSFO3Az_KBMgvfTRQKX+_ddGPH=JPqxiEPUjocmw55FQg@mail.gmail.com>
	<4E8EAD2E.8040701@behnel.de>
Message-ID: <CAKGHGPRWtHC5FO6uO_=EH8oQHb77VgUY8uUpzLASZhzP-G3faA@mail.gmail.com>

2011/10/7 Stefan Behnel <stefan_ml at behnel.de>:
> Vitja Makarov, 06.10.2011 23:12:
>>
>> Here is small comparison on compiling urllib.py with cython:
>>
>> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>> ../cython.py urllib.py
>>
>> real ? ?0m1.699s
>> user ? ?0m1.650s
>> sys ? ? 0m0.040s
>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>> ../cython.py urllib.py
>>
>> real ? ?0m2.830s
>> user ? ?0m2.790s
>> sys ? ? 0m0.030s
>>
>>
>> It's about 1.5 times slower.
>
> I assume this uses a compiled Cython? That's a pretty serious regression for
> plain Python code then. Again, this needs proper profiling.
>

No, that was pure python cython.

> We may also want to disable certain steps in the pipeline based on the
> syntax features used. If a feature is not used that has its own (set of)
> visitors, we can disable them completely. Detection already happens based on
> the .pyx/.py distinction, but could additionally use a detector (e.g. in the
> post-parse phase) that sets up skip flags. One example is the closure
> building step, which could be skipped if there are no closures.
>


One more think I've found is that many unused utilities are loaded.

-- 
vitja.

From vitja.makarov at gmail.com  Fri Oct  7 18:01:02 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Fri, 7 Oct 2011 20:01:02 +0400
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CAKGHGPRWtHC5FO6uO_=EH8oQHb77VgUY8uUpzLASZhzP-G3faA@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
	<4E8D4EDB.2090009@behnel.de>
	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>
	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>
	<CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>
	<CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>
	<CAKGHGPSFO3Az_KBMgvfTRQKX+_ddGPH=JPqxiEPUjocmw55FQg@mail.gmail.com>
	<4E8EAD2E.8040701@behnel.de>
	<CAKGHGPRWtHC5FO6uO_=EH8oQHb77VgUY8uUpzLASZhzP-G3faA@mail.gmail.com>
Message-ID: <CAKGHGPQV1djXKwYhcjN-y_KVp-Befp2t-Q78PSOFYXYq7hP3cw@mail.gmail.com>

2011/10/7 Vitja Makarov <vitja.makarov at gmail.com>:
> 2011/10/7 Stefan Behnel <stefan_ml at behnel.de>:
>> Vitja Makarov, 06.10.2011 23:12:
>>>
>>> Here is small comparison on compiling urllib.py with cython:
>>>
>>> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>>> ../cython.py urllib.py
>>>
>>> real ? ?0m1.699s
>>> user ? ?0m1.650s
>>> sys ? ? 0m0.040s
>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>>> ../cython.py urllib.py
>>>
>>> real ? ?0m2.830s
>>> user ? ?0m2.790s
>>> sys ? ? 0m0.030s
>>>
>>>
>>> It's about 1.5 times slower.
>>
>> I assume this uses a compiled Cython? That's a pretty serious regression for
>> plain Python code then. Again, this needs proper profiling.
>>
>
> No, that was pure python cython.
>

I've added return statement on top of CythonScope.test_cythonscope,
now I have these timings:

(master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
../cython.py urllib.py

real	0m1.764s
user	0m1.700s
sys	0m0.060s


>> We may also want to disable certain steps in the pipeline based on the
>> syntax features used. If a feature is not used that has its own (set of)
>> visitors, we can disable them completely. Detection already happens based on
>> the .pyx/.py distinction, but could additionally use a detector (e.g. in the
>> post-parse phase) that sets up skip flags. One example is the closure
>> building step, which could be skipped if there are no closures.
>>
>
>
> One more think I've found is that many unused utilities are loaded.
>


-- 
vitja.

From stefan_ml at behnel.de  Sat Oct  8 09:03:50 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 08 Oct 2011 09:03:50 +0200
Subject: [Cython] compiler performance issue for extended utility code
In-Reply-To: <CAKGHGPQV1djXKwYhcjN-y_KVp-Befp2t-Q78PSOFYXYq7hP3cw@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>	<4E8C0448.6010204@behnel.de>	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>	<4E8D4EDB.2090009@behnel.de>	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>	<CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>	<CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>	<CAKGHGPSFO3Az_KBMgvfTRQKX+_ddGPH=JPqxiEPUjocmw55FQg@mail.gmail.com>	<4E8EAD2E.8040701@behnel.de>	<CAKGHGPRWtHC5FO6uO_=EH8oQHb77VgUY8uUpzLASZhzP-G3faA@mail.gmail.com>
	<CAKGHGPQV1djXKwYhcjN-y_KVp-Befp2t-Q78PSOFYXYq7hP3cw@mail.gmail.com>
Message-ID: <4E8FF5D6.4070104@behnel.de>

Vitja Makarov, 07.10.2011 18:01:
>> 2011/10/7 Stefan Behnel:
>>> Vitja Makarov, 06.10.2011 23:12:
>>>>
>>>> Here is small comparison on compiling urllib.py with cython:
>>>>
>>>> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>>>> ../cython.py urllib.py
>>>>
>>>> real    0m1.699s
>>>> user    0m1.650s
>>>> sys     0m0.040s
>>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>>>> ../cython.py urllib.py
>>>>
>>>> real    0m2.830s
>>>> user    0m2.790s
>>>> sys     0m0.030s
>>>>
>>>>
>>>> It's about 1.5 times slower.
>>>
>>> That's a pretty serious regression for
>>> plain Python code then. Again, this needs proper profiling.
>
> I've added return statement on top of CythonScope.test_cythonscope,
> now I have these timings:
>
> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
> ../cython.py urllib.py
>
> real	0m1.764s
> user	0m1.700s
> sys	0m0.060s

Ok, then it's only a bug. "create_testscope" is on by default in Main.py, 
Context.__init__(). I don't know what it does exactly, but my guess is that 
the option should a) be off by default and b) should rather be passed in by 
the test runner as part of the compile options rather than being a 
parameter of the Context class. AFAICT, it's currently only used in 
TreeFragment.py, where it is being switched off explicitly for parsing code 
snippets.

Stefan

From markflorisson88 at gmail.com  Sat Oct  8 11:22:12 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 8 Oct 2011 10:22:12 +0100
Subject: [Cython] compiler performance issue for extended utility code
In-Reply-To: <4E8FF5D6.4070104@behnel.de>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
	<4E8D4EDB.2090009@behnel.de>
	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>
	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>
	<CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>
	<CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>
	<CAKGHGPSFO3Az_KBMgvfTRQKX+_ddGPH=JPqxiEPUjocmw55FQg@mail.gmail.com>
	<4E8EAD2E.8040701@behnel.de>
	<CAKGHGPRWtHC5FO6uO_=EH8oQHb77VgUY8uUpzLASZhzP-G3faA@mail.gmail.com>
	<CAKGHGPQV1djXKwYhcjN-y_KVp-Befp2t-Q78PSOFYXYq7hP3cw@mail.gmail.com>
	<4E8FF5D6.4070104@behnel.de>
Message-ID: <CANg26EUGsEdfNL5+W8jhOADn=DatSyUdeBZDW8tyZQ+4Y+G3Mg@mail.gmail.com>

On 8 October 2011 08:03, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Vitja Makarov, 07.10.2011 18:01:
>>>
>>> 2011/10/7 Stefan Behnel:
>>>>
>>>> Vitja Makarov, 06.10.2011 23:12:
>>>>>
>>>>> Here is small comparison on compiling urllib.py with cython:
>>>>>
>>>>> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>>>>> ../cython.py urllib.py
>>>>>
>>>>> real ? ?0m1.699s
>>>>> user ? ?0m1.650s
>>>>> sys ? ? 0m0.040s
>>>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>>>>> ../cython.py urllib.py
>>>>>
>>>>> real ? ?0m2.830s
>>>>> user ? ?0m2.790s
>>>>> sys ? ? 0m0.030s
>>>>>
>>>>>
>>>>> It's about 1.5 times slower.
>>>>
>>>> That's a pretty serious regression for
>>>> plain Python code then. Again, this needs proper profiling.
>>
>> I've added return statement on top of CythonScope.test_cythonscope,
>> now I have these timings:
>>
>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>> ../cython.py urllib.py
>>
>> real ? ?0m1.764s
>> user ? ?0m1.700s
>> sys ? ? 0m0.060s
>
> Ok, then it's only a bug. "create_testscope" is on by default in Main.py,
> Context.__init__(). I don't know what it does exactly, but my guess is that
> the option should a) be off by default and b) should rather be passed in by
> the test runner as part of the compile options rather than being a parameter
> of the Context class. AFAICT, it's currently only used in TreeFragment.py,
> where it is being switched off explicitly for parsing code snippets.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

It turns it off to avoid infinite recursion. This basically means that
you cannot use stuf from the Cython scope in your Cython utilities. So
in your Cython utilities, you have to declare the C version of it
(which you declared with the @cname decorator).

This is not really something that can just be avoided loading like
this. Perhaps one solution could be to load the test scope when you do
a lookup in the cython scope for which no entry is found. But really,
libcython and serializing entries will solve all this, so I suppose
the real question is, do we want to do a release before we support
such functionality?
Anyway, the cython scope lookup would be a simple hack worth a try.

From vitja.makarov at gmail.com  Sat Oct  8 14:10:53 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Sat, 8 Oct 2011 16:10:53 +0400
Subject: [Cython] compiler performance issue for extended utility code
In-Reply-To: <CANg26EUGsEdfNL5+W8jhOADn=DatSyUdeBZDW8tyZQ+4Y+G3Mg@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
	<4E8D4EDB.2090009@behnel.de>
	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>
	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>
	<CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>
	<CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>
	<CAKGHGPSFO3Az_KBMgvfTRQKX+_ddGPH=JPqxiEPUjocmw55FQg@mail.gmail.com>
	<4E8EAD2E.8040701@behnel.de>
	<CAKGHGPRWtHC5FO6uO_=EH8oQHb77VgUY8uUpzLASZhzP-G3faA@mail.gmail.com>
	<CAKGHGPQV1djXKwYhcjN-y_KVp-Befp2t-Q78PSOFYXYq7hP3cw@mail.gmail.com>
	<4E8FF5D6.4070104@behnel.de>
	<CANg26EUGsEdfNL5+W8jhOADn=DatSyUdeBZDW8tyZQ+4Y+G3Mg@mail.gmail.com>
Message-ID: <CAKGHGPQCT6QfujQEj0H9x_uhNrnw9tWuQJK5th2WVFBMpiAz0g@mail.gmail.com>

2011/10/8 mark florisson <markflorisson88 at gmail.com>:
> On 8 October 2011 08:03, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> Vitja Makarov, 07.10.2011 18:01:
>>>>
>>>> 2011/10/7 Stefan Behnel:
>>>>>
>>>>> Vitja Makarov, 06.10.2011 23:12:
>>>>>>
>>>>>> Here is small comparison on compiling urllib.py with cython:
>>>>>>
>>>>>> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>>>>>> ../cython.py urllib.py
>>>>>>
>>>>>> real ? ?0m1.699s
>>>>>> user ? ?0m1.650s
>>>>>> sys ? ? 0m0.040s
>>>>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>>>>>> ../cython.py urllib.py
>>>>>>
>>>>>> real ? ?0m2.830s
>>>>>> user ? ?0m2.790s
>>>>>> sys ? ? 0m0.030s
>>>>>>
>>>>>>
>>>>>> It's about 1.5 times slower.
>>>>>
>>>>> That's a pretty serious regression for
>>>>> plain Python code then. Again, this needs proper profiling.
>>>
>>> I've added return statement on top of CythonScope.test_cythonscope,
>>> now I have these timings:
>>>
>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>>> ../cython.py urllib.py
>>>
>>> real ? ?0m1.764s
>>> user ? ?0m1.700s
>>> sys ? ? 0m0.060s
>>
>> Ok, then it's only a bug. "create_testscope" is on by default in Main.py,
>> Context.__init__(). I don't know what it does exactly, but my guess is that
>> the option should a) be off by default and b) should rather be passed in by
>> the test runner as part of the compile options rather than being a parameter
>> of the Context class. AFAICT, it's currently only used in TreeFragment.py,
>> where it is being switched off explicitly for parsing code snippets.
>>
>> Stefan
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>
> It turns it off to avoid infinite recursion. This basically means that
> you cannot use stuf from the Cython scope in your Cython utilities. So
> in your Cython utilities, you have to declare the C version of it
> (which you declared with the @cname decorator).
>
> This is not really something that can just be avoided loading like
> this. Perhaps one solution could be to load the test scope when you do
> a lookup in the cython scope for which no entry is found. But really,
> libcython and serializing entries will solve all this, so I suppose
> the real question is, do we want to do a release before we support
> such functionality?
> Anyway, the cython scope lookup would be a simple hack worth a try.
>

Does utility code supports something like dependencies? And could that
help here?

I've also noticed that some utilities are loaded unconditionally
perhaps it's better to introduce lazy loading.

-- 
vitja.

From stefan_ml at behnel.de  Sat Oct  8 14:25:25 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 08 Oct 2011 14:25:25 +0200
Subject: [Cython] Any news from the IronPython port?
In-Reply-To: <CADiQ+QCa+0kf3d4zFtt0SBF96cO5JUVXM4zsN-jetZmeBRuSxA@mail.gmail.com>
References: <4E23D558.5000104@behnel.de>	<CACBMVRgWLRXo+SWqK1nw8uWnjUif4L1VXN7hVSv4hgUs7-DOaw@mail.gmail.com>
	<CADiQ+QCa+0kf3d4zFtt0SBF96cO5JUVXM4zsN-jetZmeBRuSxA@mail.gmail.com>
Message-ID: <4E904135.6090708@behnel.de>

Robert Bradshaw, 19.07.2011 05:57:
> On Mon, Jul 18, 2011 at 7:45 AM, Jason McCampbell wrote:
>> Definitely not buried for good, though we haven't made a lot of changes
>> recently. :)  We used it for porting SciPy to .NET and re-wrote a large
>> number of the SciPy C module implementations in Cython.  It is generally
>> stable and produces good code within the set of features that were needed
>> (by no means has feature parity with the CPython version).
>> In general, I have been quite happy with the results given that it is
>> possible to generate interfaces for two Python implementations from a single
>> source.  Of course, it is not free.  One can, in general, not take a
>> NumPy-heavy Cython file and just generate source code for IronPython.
>>   Because IronPython and NumPy for .NET do not share any common C APIs we had
>> to wrap some of the APIs and in other cases switch to using Python notation
>> and/or call the new Python-independent NumPy core API (present only in the
>> refactored version).
>> Overall, I think it's a good start and holds some promise for generating
>> re-targetable native wrappings, but there is still plenty of work to do to
>> make it more accessible.
>> Regards,
>> Jason
>
> Thanks for the status update--is the code available somewhere (e.g. as
> a forked git repo)? Is it something that would be worth merging, or at
> this point is it mostly hacked up to just do what you need it to for
> SciPy?

The code is here:

https://bitbucket.org/cwitty/cython-for-ironpython/overview

No idea what the status is, but it hasn't been updated for a while.

Stefan

From markflorisson88 at gmail.com  Sat Oct  8 15:18:27 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 8 Oct 2011 14:18:27 +0100
Subject: [Cython] compiler performance issue for extended utility code
In-Reply-To: <CAKGHGPQCT6QfujQEj0H9x_uhNrnw9tWuQJK5th2WVFBMpiAz0g@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
	<4E8D4EDB.2090009@behnel.de>
	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>
	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>
	<CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>
	<CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>
	<CAKGHGPSFO3Az_KBMgvfTRQKX+_ddGPH=JPqxiEPUjocmw55FQg@mail.gmail.com>
	<4E8EAD2E.8040701@behnel.de>
	<CAKGHGPRWtHC5FO6uO_=EH8oQHb77VgUY8uUpzLASZhzP-G3faA@mail.gmail.com>
	<CAKGHGPQV1djXKwYhcjN-y_KVp-Befp2t-Q78PSOFYXYq7hP3cw@mail.gmail.com>
	<4E8FF5D6.4070104@behnel.de>
	<CANg26EUGsEdfNL5+W8jhOADn=DatSyUdeBZDW8tyZQ+4Y+G3Mg@mail.gmail.com>
	<CAKGHGPQCT6QfujQEj0H9x_uhNrnw9tWuQJK5th2WVFBMpiAz0g@mail.gmail.com>
Message-ID: <CANg26EXFjJQRoWdLK2CzB3znv7de_eEfEzw6Uvn3868g5Px0Zg@mail.gmail.com>

On 8 October 2011 13:10, Vitja Makarov <vitja.makarov at gmail.com> wrote:
> 2011/10/8 mark florisson <markflorisson88 at gmail.com>:
>> On 8 October 2011 08:03, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>> Vitja Makarov, 07.10.2011 18:01:
>>>>>
>>>>> 2011/10/7 Stefan Behnel:
>>>>>>
>>>>>> Vitja Makarov, 06.10.2011 23:12:
>>>>>>>
>>>>>>> Here is small comparison on compiling urllib.py with cython:
>>>>>>>
>>>>>>> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>>>>>>> ../cython.py urllib.py
>>>>>>>
>>>>>>> real ? ?0m1.699s
>>>>>>> user ? ?0m1.650s
>>>>>>> sys ? ? 0m0.040s
>>>>>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>>>>>>> ../cython.py urllib.py
>>>>>>>
>>>>>>> real ? ?0m2.830s
>>>>>>> user ? ?0m2.790s
>>>>>>> sys ? ? 0m0.030s
>>>>>>>
>>>>>>>
>>>>>>> It's about 1.5 times slower.
>>>>>>
>>>>>> That's a pretty serious regression for
>>>>>> plain Python code then. Again, this needs proper profiling.
>>>>
>>>> I've added return statement on top of CythonScope.test_cythonscope,
>>>> now I have these timings:
>>>>
>>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
>>>> ../cython.py urllib.py
>>>>
>>>> real ? ?0m1.764s
>>>> user ? ?0m1.700s
>>>> sys ? ? 0m0.060s
>>>
>>> Ok, then it's only a bug. "create_testscope" is on by default in Main.py,
>>> Context.__init__(). I don't know what it does exactly, but my guess is that
>>> the option should a) be off by default and b) should rather be passed in by
>>> the test runner as part of the compile options rather than being a parameter
>>> of the Context class. AFAICT, it's currently only used in TreeFragment.py,
>>> where it is being switched off explicitly for parsing code snippets.
>>>
>>> Stefan
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>>
>> It turns it off to avoid infinite recursion. This basically means that
>> you cannot use stuf from the Cython scope in your Cython utilities. So
>> in your Cython utilities, you have to declare the C version of it
>> (which you declared with the @cname decorator).
>>
>> This is not really something that can just be avoided loading like
>> this. Perhaps one solution could be to load the test scope when you do
>> a lookup in the cython scope for which no entry is found. But really,
>> libcython and serializing entries will solve all this, so I suppose
>> the real question is, do we want to do a release before we support
>> such functionality?
>> Anyway, the cython scope lookup would be a simple hack worth a try.
>>
>
> Does utility code supports something like dependencies? And could that
> help here?

Yeah they can have dependencies like normal UtilitieCodes.

> I've also noticed that some utilities are loaded unconditionally
> perhaps it's better to introduce lazy loading.

Well, they shouldn't be. If they are it's generally a bug. I noticed
that it happens in the test runner though, although it should create a
fresh context with freshly initialized entries.

> --
> vitja.
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From vitja.makarov at gmail.com  Sun Oct  9 07:12:33 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Sun, 9 Oct 2011 09:12:33 +0400
Subject: [Cython] Can't login into my trac account
In-Reply-To: <CADiQ+QDJr7bn2r4kZo+o+ozwgrPWgM_7oy6htXp2cDPRiSrAyw@mail.gmail.com>
References: <CAKGHGPQm6vp=NAwZXJ=qgWt1Ccf7KWRJzF8YmdDjBTrgDH-HxQ@mail.gmail.com>
	<CADiQ+QDJr7bn2r4kZo+o+ozwgrPWgM_7oy6htXp2cDPRiSrAyw@mail.gmail.com>
Message-ID: <CAKGHGPRYvhYdieE4XFrdSXDywkr-NxyK_4NRBLp+W4e4vuN-rA@mail.gmail.com>

Hi! Any news here?

2011/9/28 Robert Bradshaw <robertwb at math.washington.edu>:
> I can't log in either, though I haven't had a chance to investigate.
>
> On Tue, Sep 27, 2011 at 9:41 AM, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>> Hi!
>>
>> Today I found that I can't login into my trac account. Is that common
>> problem or only mine?
>>
>> --
>> vitja.
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>


-- 
vitja.

From markflorisson88 at gmail.com  Sun Oct  9 12:19:34 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 9 Oct 2011 11:19:34 +0100
Subject: [Cython] compiler performance issue for extended utility code
In-Reply-To: <CANg26EUGsEdfNL5+W8jhOADn=DatSyUdeBZDW8tyZQ+4Y+G3Mg@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
	<4E8D4EDB.2090009@behnel.de>
	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>
	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>
	<CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>
	<CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>
	<CAKGHGPSFO3Az_KBMgvfTRQKX+_ddGPH=JPqxiEPUjocmw55FQg@mail.gmail.com>
	<4E8EAD2E.8040701@behnel.de>
	<CAKGHGPRWtHC5FO6uO_=EH8oQHb77VgUY8uUpzLASZhzP-G3faA@mail.gmail.com>
	<CAKGHGPQV1djXKwYhcjN-y_KVp-Befp2t-Q78PSOFYXYq7hP3cw@mail.gmail.com>
	<4E8FF5D6.4070104@behnel.de>
	<CANg26EUGsEdfNL5+W8jhOADn=DatSyUdeBZDW8tyZQ+4Y+G3Mg@mail.gmail.com>
Message-ID: <CANg26EVjdNV-C-xyszGgoN_XbxEPYUrMeWT98+pAj1Dz2_zbdg@mail.gmail.com>

On 8 October 2011 10:22, mark florisson <markflorisson88 at gmail.com> wrote:

> On 8 October 2011 08:03, Stefan Behnel <stefan_ml at behnel.de> wrote:
> > Vitja Makarov, 07.10.2011 18:01:
> >>>
> >>> 2011/10/7 Stefan Behnel:
> >>>>
> >>>> Vitja Makarov, 06.10.2011 23:12:
> >>>>>
> >>>>> Here is small comparison on compiling urllib.py with cython:
> >>>>>
> >>>>> ((e8527c5...)) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
> >>>>> ../cython.py urllib.py
> >>>>>
> >>>>> real    0m1.699s
> >>>>> user    0m1.650s
> >>>>> sys     0m0.040s
> >>>>> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
> >>>>> ../cython.py urllib.py
> >>>>>
> >>>>> real    0m2.830s
> >>>>> user    0m2.790s
> >>>>> sys     0m0.030s
> >>>>>
> >>>>>
> >>>>> It's about 1.5 times slower.
> >>>>
> >>>> That's a pretty serious regression for
> >>>> plain Python code then. Again, this needs proper profiling.
> >>
> >> I've added return statement on top of CythonScope.test_cythonscope,
> >> now I have these timings:
> >>
> >> (master) vitja at mchome:~/work/cython-vitek-git/zzz$ time python
> >> ../cython.py urllib.py
> >>
> >> real    0m1.764s
> >> user    0m1.700s
> >> sys     0m0.060s
> >
> > Ok, then it's only a bug. "create_testscope" is on by default in Main.py,
> > Context.__init__(). I don't know what it does exactly, but my guess is
> that
> > the option should a) be off by default and b) should rather be passed in
> by
> > the test runner as part of the compile options rather than being a
> parameter
> > of the Context class. AFAICT, it's currently only used in
> TreeFragment.py,
> > where it is being switched off explicitly for parsing code snippets.
> >
> > Stefan
> > _______________________________________________
> > cython-devel mailing list
> > cython-devel at python.org
> > http://mail.python.org/mailman/listinfo/cython-devel
> >
>
> It turns it off to avoid infinite recursion. This basically means that
> you cannot use stuf from the Cython scope in your Cython utilities. So
> in your Cython utilities, you have to declare the C version of it
> (which you declared with the @cname decorator).
>
> This is not really something that can just be avoided loading like
> this. Perhaps one solution could be to load the test scope when you do
> a lookup in the cython scope for which no entry is found. But really,
> libcython and serializing entries will solve all this, so I suppose
> the real question is, do we want to do a release before we support
> such functionality?
> Anyway, the cython scope lookup would be a simple hack worth a try.
>

I applied the hack, i.e. defer loading the scope until the first entry in
the cython scope can't be found:
https://github.com/markflorisson88/cython/commit/ad4cf6303d1bf8a81e3afccc9572559a34827a3b

[0] [11:16] ~  ? time cython urllib.py # conditionally load scope
cython urllib.py  2.75s user 0.14s system 99% cpu 2.893 total
[0] [11:17] ~  ? time cython urllib.py # always load scope
cython urllib.py  4.08s user 0.16s system 99% cpu 4.239 total
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/cython-devel/attachments/20111009/0436bf6f/attachment.html>

From markflorisson88 at gmail.com  Sun Oct  9 14:11:56 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 9 Oct 2011 13:11:56 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical, barriers
Message-ID: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>

Hey,

So far people have been enthusiastic about the cython.parallel features, I
think we should introduce some new features. I propose the following, assume
parallel has been imported from cython:

with parallel.master():
    this is executed in the master thread in a parallel (non-prange) section

with parallel.single():
   same as master, except any thread may do the execution

An optional keyword argument 'nowait' specifies whether there will be a
barrier at the end. The default is to wait.

with parallel.task():
    create a task to be executed by some thread in the team
    once a thread takes up the task it shall only be executed by that thread
and no other thread (so the task will be tied to the thread)

    C variables will be firstprivate
    Python objects will be shared

parallel.taskwait() # wait on any direct descendent tasks to finish

with parallel.critical():
    this section of code is mutually exclusive with other critical sections

    optional keyword argument 'name' specifies a name for the critical
section,
    which means all sections with that name will exclude each other, but not
    critical sections with different names

    Note: all threads that encounter the section will execute it, just not
at the same time

with parallel.barrier():
    all threads wait until everyone has reached the barrier
    either no one or everyone should encounter the barrier
    shared variables are flushed

Unfortunately, gcc again manages to horribly break master and single
constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll first
file a bug report. Other (better) compilers like Portland (and I'm sure
Intel) work fine. I suppose a warning in the documentation will suffice
there.

If we at some point implement vector/SIMD operations we could also try out
the Fortran openmp workshare construct.

What do you guys think?

Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/cython-devel/attachments/20111009/5c988f53/attachment.html>

From d.s.seljebotn at astro.uio.no  Sun Oct  9 14:18:08 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sun, 09 Oct 2011 14:18:08 +0200
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
Message-ID: <4E919100.8020801@astro.uio.no>

On 10/09/2011 02:11 PM, mark florisson wrote:
> Hey,
>
> So far people have been enthusiastic about the cython.parallel features,
> I think we should introduce some new features. I propose the following,

Great!!

I only have time for a very short feedback now, perhaps more will follow.

> assume parallel has been imported from cython:
>
> with parallel.master():
>      this is executed in the master thread in a parallel (non-prange)
> section
>
> with parallel.single():
>     same as master, except any thread may do the execution
>
> An optional keyword argument 'nowait' specifies whether there will be a
> barrier at the end. The default is to wait.
>
> with parallel.task():
>      create a task to be executed by some thread in the team
>      once a thread takes up the task it shall only be executed by that
> thread and no other thread (so the task will be tied to the thread)
>
>      C variables will be firstprivate
>      Python objects will be shared
>
> parallel.taskwait() # wait on any direct descendent tasks to finish

Regarding tasks, I think this is mapping OpenMP too close to Python. 
Closures are excellent for the notion of a task, so I think something 
based on the futures API would work better. I realize that makes the 
mapping to OpenMP and implementation a bit more difficult, but I think 
it is worth it in the long run.

>
> with parallel.critical():
>      this section of code is mutually exclusive with other critical sections
>      optional keyword argument 'name' specifies a name for the critical
> section,
>      which means all sections with that name will exclude each other,
> but not
>      critical sections with different names
>
>      Note: all threads that encounter the section will execute it, just
> not at the same time
>
> with parallel.barrier():
>      all threads wait until everyone has reached the barrier
>      either no one or everyone should encounter the barrier
>      shared variables are flushed
>
> Unfortunately, gcc again manages to horribly break master and single
> constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll
> first file a bug report. Other (better) compilers like Portland (and I'm
> sure Intel) work fine. I suppose a warning in the documentation will
> suffice there.
>
> If we at some point implement vector/SIMD operations we could also try
> out the Fortran openmp workshare construct.

I'm starting to learn myself OpenCL as part of a course. It's very neat 
for some kinds of parallelism. What I'm saying is that at least of the 
case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking 
too early, but also look forward to coming architectures (e.g., AMD's 
GPU-and-CPU on same die design).

Dag Sverre

From d.s.seljebotn at astro.uio.no  Sun Oct  9 14:57:36 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sun, 09 Oct 2011 14:57:36 +0200
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <4E919100.8020801@astro.uio.no>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no>
Message-ID: <4E919A40.2090001@astro.uio.no>

On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
> On 10/09/2011 02:11 PM, mark florisson wrote:
>> Hey,
>>
>> So far people have been enthusiastic about the cython.parallel features,
>> I think we should introduce some new features. I propose the following,
>
> Great!!
>
> I only have time for a very short feedback now, perhaps more will follow.
>
>> assume parallel has been imported from cython:
>>
>> with parallel.master():
>> this is executed in the master thread in a parallel (non-prange)
>> section
>>
>> with parallel.single():
>> same as master, except any thread may do the execution
>>
>> An optional keyword argument 'nowait' specifies whether there will be a
>> barrier at the end. The default is to wait.

I like

if parallel.is_master():
     ...
explicit_barrier_somehow() # see below

better as a Pythonization. One could easily support is_master to be used 
in other contexts as well, simply by assigning a status flag in the 
master block.

Using an if-test flows much better with Python I feel, but that 
naturally lead to making the barrier explicit. But I like the barrier 
always being explicit, rather than having it as a predicate on all the 
different constructs like in OpenMP....

I'm less sure about single, since making it a function indicates one 
could use it in other contexts and the whole thing becomes too magic 
(since it's tied to the position of invocation). I'm tempted to suggest

for _ in prange(1):
     ...

as our syntax for single.

>>
>> with parallel.task():
>> create a task to be executed by some thread in the team
>> once a thread takes up the task it shall only be executed by that
>> thread and no other thread (so the task will be tied to the thread)
>>
>> C variables will be firstprivate
>> Python objects will be shared
>>
>> parallel.taskwait() # wait on any direct descendent tasks to finish
>
> Regarding tasks, I think this is mapping OpenMP too close to Python.
> Closures are excellent for the notion of a task, so I think something
> based on the futures API would work better. I realize that makes the
> mapping to OpenMP and implementation a bit more difficult, but I think
> it is worth it in the long run.
>
>>
>> with parallel.critical():
>> this section of code is mutually exclusive with other critical sections
>> optional keyword argument 'name' specifies a name for the critical
>> section,
>> which means all sections with that name will exclude each other,
>> but not
>> critical sections with different names
>>
>> Note: all threads that encounter the section will execute it, just
>> not at the same time

Yes, this works well as a with-statement...

..except that it is slightly magic in that it binds to call position 
(unlike anything in Python). I.e. this would be more "correct", or at 
least Pythonic:

with parallel.critical(__file__, __line__):
     ...


>>
>> with parallel.barrier():
>> all threads wait until everyone has reached the barrier
>> either no one or everyone should encounter the barrier
>> shared variables are flushed

I have problems with requiring a noop with block...

I'd much rather write

parallel.barrier()

However, that ties a function call to the place of invocation, and 
suggests that one could do

if rand() > .5:
     barrier()
else:
     i += 3
     barrier()

and have the same barrier in each case. Again,

barrier(__file__, __line__)

gets us purity at the cost of practicality. Another way is the pthreads 
approach (although one may have to use pthread rather then OpenMP to get 
it, unless there are named barriers?):

barrier_a = parallel.barrier()
barrier_b = parallel.barrier()
with parallel:
     barrier_a.wait()
     if rand() > .5:
         barrier_b.wait()
     else:
         i += 3
         barrier_b.wait()


I'm really not sure here.

>>
>> Unfortunately, gcc again manages to horribly break master and single
>> constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll
>> first file a bug report. Other (better) compilers like Portland (and I'm
>> sure Intel) work fine. I suppose a warning in the documentation will
>> suffice there.
>>
>> If we at some point implement vector/SIMD operations we could also try
>> out the Fortran openmp workshare construct.
>
> I'm starting to learn myself OpenCL as part of a course. It's very neat
> for some kinds of parallelism. What I'm saying is that at least of the
> case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking
> too early, but also look forward to coming architectures (e.g., AMD's
> GPU-and-CPU on same die design).
>
> Dag Sverre
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel


From markflorisson88 at gmail.com  Sun Oct  9 15:28:24 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 9 Oct 2011 14:28:24 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <4E919100.8020801@astro.uio.no>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no>
Message-ID: <CANg26EV1gtkWOugKEJ+AuUkuib1-LFkML=TfHmAhjA+GeC1JdA@mail.gmail.com>

On 9 October 2011 13:18, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
>
> On 10/09/2011 02:11 PM, mark florisson wrote:
>>
>> Hey,
>>
>> So far people have been enthusiastic about the cython.parallel features,
>> I think we should introduce some new features. I propose the following,
>
> Great!!
>
> I only have time for a very short feedback now, perhaps more will follow.
>
>> assume parallel has been imported from cython:
>>
>> with parallel.master():
>> ? ? this is executed in the master thread in a parallel (non-prange)
>> section
>>
>> with parallel.single():
>> ? ?same as master, except any thread may do the execution
>>
>> An optional keyword argument 'nowait' specifies whether there will be a
>> barrier at the end. The default is to wait.
>>
>> with parallel.task():
>> ? ? create a task to be executed by some thread in the team
>> ? ? once a thread takes up the task it shall only be executed by that
>> thread and no other thread (so the task will be tied to the thread)
>>
>> ? ? C variables will be firstprivate
>> ? ? Python objects will be shared
>>
>> parallel.taskwait() # wait on any direct descendent tasks to finish
>
> Regarding tasks, I think this is mapping OpenMP too close to Python. Closures are excellent for the notion of a task, so I think something based on the futures API would work better. I realize that makes the mapping to OpenMP and implementation a bit more difficult, but I think it is worth it in the long run.

Hmm, that would be cool as well. Something like parallel.submit_task(myclosure)?

The problem I see with that is that parallel stuff can't have the GIL,
and you can only have 'def' closures at the moment. I realize that you
won't actually have to use closure support here though, and could just
transform the inner function to OpenMP task code. This would maybe
look inconsistent with other closures though, and you'd also have to
restrict the use of such a closure to parallel.submit_task().

Anyway, perhaps you have a concrete proposal that addresses these problems.

>>
>> with parallel.critical():
>> ? ? this section of code is mutually exclusive with other critical sections
>> ? ? optional keyword argument 'name' specifies a name for the critical
>> section,
>> ? ? which means all sections with that name will exclude each other,
>> but not
>> ? ? critical sections with different names
>>
>> ? ? Note: all threads that encounter the section will execute it, just
>> not at the same time
>>
>> with parallel.barrier():
>> ? ? all threads wait until everyone has reached the barrier
>> ? ? either no one or everyone should encounter the barrier
>> ? ? shared variables are flushed
>>
>> Unfortunately, gcc again manages to horribly break master and single
>> constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll
>> first file a bug report. Other (better) compilers like Portland (and I'm
>> sure Intel) work fine. I suppose a warning in the documentation will
>> suffice there.
>>
>> If we at some point implement vector/SIMD operations we could also try
>> out the Fortran openmp workshare construct.
>
> I'm starting to learn myself OpenCL as part of a course. It's very neat for some kinds of parallelism. What I'm saying is that at least of the case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking too early, but also look forward to coming architectures (e.g., AMD's GPU-and-CPU on same die design).

Oh, definitely. The good thing is that code generation backends
needn't be that hard. If you figure all semantics out in the Python
code you could based on a backend load a different utility template as
a string. It's probably not that easy, but the point is that as long
as your code semantics don't prevent other backends, you keep your
options open.

In the end I want to be able to write a parallel program almost
serially and have Cython compile it to OpenMP, MPI, GPU's or whatever
else I need. At the same time I need to stay in touch with reality, so
it's one step at a time :)

> Dag Sverre
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From markflorisson88 at gmail.com  Sun Oct  9 15:30:39 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 9 Oct 2011 14:30:39 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <4E919A40.2090001@astro.uio.no>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
Message-ID: <CANg26EVTcB4D5RmaRsQo0=SQ46h5MZinORhcNs1FQmf5R-o1=A@mail.gmail.com>

On 9 October 2011 13:57, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>
>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>
>>> Hey,
>>>
>>> So far people have been enthusiastic about the cython.parallel features,
>>> I think we should introduce some new features. I propose the following,
>>
>> Great!!
>>
>> I only have time for a very short feedback now, perhaps more will follow.
>>
>>> assume parallel has been imported from cython:
>>>
>>> with parallel.master():
>>> this is executed in the master thread in a parallel (non-prange)
>>> section
>>>
>>> with parallel.single():
>>> same as master, except any thread may do the execution
>>>
>>> An optional keyword argument 'nowait' specifies whether there will be a
>>> barrier at the end. The default is to wait.
>
> I like
>
> if parallel.is_master():
> ? ?...
> explicit_barrier_somehow() # see below
>
> better as a Pythonization. One could easily support is_master to be used in
> other contexts as well, simply by assigning a status flag in the master
> block.
>
> Using an if-test flows much better with Python I feel, but that naturally
> lead to making the barrier explicit. But I like the barrier always being
> explicit, rather than having it as a predicate on all the different
> constructs like in OpenMP....

Hmm, that might mean you also want the barrier for a prange in a
parallel to be explicit. I like the 'if' test though, although it
wouldn't make sense for 'single'.

> I'm less sure about single, since making it a function indicates one could
> use it in other contexts and the whole thing becomes too magic (since it's
> tied to the position of invocation). I'm tempted to suggest
>
> for _ in prange(1):
> ? ?...
>
> as our syntax for single.

I think that syntax is absolutely terrible :) Perhaps single is not so
important and one can just use master instead (or, if really needed,
master + a task with the actual work).

>>>
>>> with parallel.task():
>>> create a task to be executed by some thread in the team
>>> once a thread takes up the task it shall only be executed by that
>>> thread and no other thread (so the task will be tied to the thread)
>>>
>>> C variables will be firstprivate
>>> Python objects will be shared
>>>
>>> parallel.taskwait() # wait on any direct descendent tasks to finish
>>
>> Regarding tasks, I think this is mapping OpenMP too close to Python.
>> Closures are excellent for the notion of a task, so I think something
>> based on the futures API would work better. I realize that makes the
>> mapping to OpenMP and implementation a bit more difficult, but I think
>> it is worth it in the long run.
>>
>>>
>>> with parallel.critical():
>>> this section of code is mutually exclusive with other critical sections
>>> optional keyword argument 'name' specifies a name for the critical
>>> section,
>>> which means all sections with that name will exclude each other,
>>> but not
>>> critical sections with different names
>>>
>>> Note: all threads that encounter the section will execute it, just
>>> not at the same time
>
> Yes, this works well as a with-statement...
>
> ..except that it is slightly magic in that it binds to call position (unlike
> anything in Python). I.e. this would be more "correct", or at least
> Pythonic:
>
> with parallel.critical(__file__, __line__):
> ? ?...
>

I'm not entirely sure what you mean here. Critical is really about the
block contained within, not about a position in a file. Not all
threads have to encounter the critical region, and not specifying a
name means you exclude with *all other* unnamed critical sections (not
just this one).

>>>
>>> with parallel.barrier():
>>> all threads wait until everyone has reached the barrier
>>> either no one or everyone should encounter the barrier
>>> shared variables are flushed
>
> I have problems with requiring a noop with block...
>
> I'd much rather write
>
> parallel.barrier()

Although in OpenMP it doesn't have any associated code, but we could
give it those semantics: apply the barrier at the end of the block of
code. The con is that the barrier is at the top while it only affects
leaving the block, you would write:

with parallel.barrier():
    if rand() > .5:
        ...
    else:
        ...
# the barrier is here

> However, that ties a function call to the place of invocation, and suggests
> that one could do
>
> if rand() > .5:
> ? ?barrier()
> else:
> ? ?i += 3
> ? ?barrier()
>
> and have the same barrier in each case. Again,
>
> barrier(__file__, __line__)
>
> gets us purity at the cost of practicality.

In this case (unlike the critical construct), yes. I think a warning
in the docs stating that either all or none of the threads must
encounter the barrier should suffice.

> Another way is the pthreads
> approach (although one may have to use pthread rather then OpenMP to get it,
> unless there are named barriers?):
>
> barrier_a = parallel.barrier()
> barrier_b = parallel.barrier()
> with parallel:
> ? ?barrier_a.wait()
> ? ?if rand() > .5:
> ? ? ? ?barrier_b.wait()
> ? ?else:
> ? ? ? ?i += 3
> ? ? ? ?barrier_b.wait()
>
>
> I'm really not sure here.

I think we should really just say to the user: "dont do this". There
are no named barriers, implementing this wouldn't be easy at all (in
fact, I'm not sure you can specify sane semantics for this if you have
more branches and some do not contain the same barrier). The block
structure for barriers would help here, as blocks are inconvenient to
write:

if C:
    with barrier(): ...
else:
    with barrier(): ...

is just not nice to write, you would instead write

with barrier():
    if C:
        ...
    else:
        ...

>>>
>>> Unfortunately, gcc again manages to horribly break master and single
>>> constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll
>>> first file a bug report. Other (better) compilers like Portland (and I'm
>>> sure Intel) work fine. I suppose a warning in the documentation will
>>> suffice there.
>>>
>>> If we at some point implement vector/SIMD operations we could also try
>>> out the Fortran openmp workshare construct.
>>
>> I'm starting to learn myself OpenCL as part of a course. It's very neat
>> for some kinds of parallelism. What I'm saying is that at least of the
>> case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking
>> too early, but also look forward to coming architectures (e.g., AMD's
>> GPU-and-CPU on same die design).
>>
>> Dag Sverre
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Sun Oct  9 15:39:45 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 9 Oct 2011 14:39:45 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EVTcB4D5RmaRsQo0=SQ46h5MZinORhcNs1FQmf5R-o1=A@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CANg26EVTcB4D5RmaRsQo0=SQ46h5MZinORhcNs1FQmf5R-o1=A@mail.gmail.com>
Message-ID: <CANg26EVKqYKtxtAnHGvguv0guYfd93A=YYspQUck_1CM6daLcw@mail.gmail.com>

On 9 October 2011 14:30, mark florisson <markflorisson88 at gmail.com> wrote:
> On 9 October 2011 13:57, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>>
>>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>>
>>>> Hey,
>>>>
>>>> So far people have been enthusiastic about the cython.parallel features,
>>>> I think we should introduce some new features. I propose the following,
>>>
>>> Great!!
>>>
>>> I only have time for a very short feedback now, perhaps more will follow.
>>>
>>>> assume parallel has been imported from cython:
>>>>
>>>> with parallel.master():
>>>> this is executed in the master thread in a parallel (non-prange)
>>>> section
>>>>
>>>> with parallel.single():
>>>> same as master, except any thread may do the execution
>>>>
>>>> An optional keyword argument 'nowait' specifies whether there will be a
>>>> barrier at the end. The default is to wait.
>>
>> I like
>>
>> if parallel.is_master():
>> ? ?...
>> explicit_barrier_somehow() # see below
>>
>> better as a Pythonization. One could easily support is_master to be used in
>> other contexts as well, simply by assigning a status flag in the master
>> block.
>>
>> Using an if-test flows much better with Python I feel, but that naturally
>> lead to making the barrier explicit. But I like the barrier always being
>> explicit, rather than having it as a predicate on all the different
>> constructs like in OpenMP....
>
> Hmm, that might mean you also want the barrier for a prange in a
> parallel to be explicit. I like the 'if' test though, although it
> wouldn't make sense for 'single'.
>
>> I'm less sure about single, since making it a function indicates one could
>> use it in other contexts and the whole thing becomes too magic (since it's
>> tied to the position of invocation). I'm tempted to suggest
>>
>> for _ in prange(1):
>> ? ?...
>>
>> as our syntax for single.
>
> I think that syntax is absolutely terrible :) Perhaps single is not so
> important and one can just use master instead (or, if really needed,
> master + a task with the actual work).
>
>>>>
>>>> with parallel.task():
>>>> create a task to be executed by some thread in the team
>>>> once a thread takes up the task it shall only be executed by that
>>>> thread and no other thread (so the task will be tied to the thread)
>>>>
>>>> C variables will be firstprivate
>>>> Python objects will be shared
>>>>
>>>> parallel.taskwait() # wait on any direct descendent tasks to finish
>>>
>>> Regarding tasks, I think this is mapping OpenMP too close to Python.
>>> Closures are excellent for the notion of a task, so I think something
>>> based on the futures API would work better. I realize that makes the
>>> mapping to OpenMP and implementation a bit more difficult, but I think
>>> it is worth it in the long run.
>>>
>>>>
>>>> with parallel.critical():
>>>> this section of code is mutually exclusive with other critical sections
>>>> optional keyword argument 'name' specifies a name for the critical
>>>> section,
>>>> which means all sections with that name will exclude each other,
>>>> but not
>>>> critical sections with different names
>>>>
>>>> Note: all threads that encounter the section will execute it, just
>>>> not at the same time
>>
>> Yes, this works well as a with-statement...
>>
>> ..except that it is slightly magic in that it binds to call position (unlike
>> anything in Python). I.e. this would be more "correct", or at least
>> Pythonic:
>>
>> with parallel.critical(__file__, __line__):
>> ? ?...
>>
>
> I'm not entirely sure what you mean here. Critical is really about the
> block contained within, not about a position in a file. Not all
> threads have to encounter the critical region, and not specifying a
> name means you exclude with *all other* unnamed critical sections (not
> just this one).
>
>>>>
>>>> with parallel.barrier():
>>>> all threads wait until everyone has reached the barrier
>>>> either no one or everyone should encounter the barrier
>>>> shared variables are flushed
>>
>> I have problems with requiring a noop with block...
>>
>> I'd much rather write
>>
>> parallel.barrier()
>
> Although in OpenMP it doesn't have any associated code, but we could
> give it those semantics: apply the barrier at the end of the block of
> code. The con is that the barrier is at the top while it only affects
> leaving the block, you would write:
>
> with parallel.barrier():
> ? ?if rand() > .5:
> ? ? ? ?...
> ? ?else:
> ? ? ? ?...
> # the barrier is here
>
>> However, that ties a function call to the place of invocation, and suggests
>> that one could do
>>
>> if rand() > .5:
>> ? ?barrier()
>> else:
>> ? ?i += 3
>> ? ?barrier()
>>
>> and have the same barrier in each case. Again,
>>
>> barrier(__file__, __line__)
>>
>> gets us purity at the cost of practicality.
>
> In this case (unlike the critical construct), yes. I think a warning
> in the docs stating that either all or none of the threads must
> encounter the barrier should suffice.
>
>> Another way is the pthreads
>> approach (although one may have to use pthread rather then OpenMP to get it,
>> unless there are named barriers?):
>>
>> barrier_a = parallel.barrier()
>> barrier_b = parallel.barrier()
>> with parallel:
>> ? ?barrier_a.wait()
>> ? ?if rand() > .5:
>> ? ? ? ?barrier_b.wait()
>> ? ?else:
>> ? ? ? ?i += 3
>> ? ? ? ?barrier_b.wait()
>>
>>
>> I'm really not sure here.
>
> I think we should really just say to the user: "dont do this". There
> are no named barriers, implementing this wouldn't be easy at all (in
> fact, I'm not sure you can specify sane semantics for this if you have
> more branches and some do not contain the same barrier). The block
> structure for barriers would help here, as blocks are inconvenient to
> write:
>
> if C:
> ? ?with barrier(): ...
> else:
> ? ?with barrier(): ...
>
> is just not nice to write, you would instead write
>
> with barrier():
> ? ?if C:
> ? ? ? ?...
> ? ?else:
> ? ? ? ?...

This would also allow one to write

with barrier(), master():
    ...

Basically it's up to the user to use it sensibly. Usually you want a
barrier to ensure that you have a well-defined state set by some code.
One could (correctly) only put the last line of such code in the with
block, but it would make more sense to put all associated code in
there.

If there isn't really any associated code, you could just put 'pass'
in the block.

Does that make sense? I haven't even convinced myself of it yet.

>>>>
>>>> Unfortunately, gcc again manages to horribly break master and single
>>>> constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll
>>>> first file a bug report. Other (better) compilers like Portland (and I'm
>>>> sure Intel) work fine. I suppose a warning in the documentation will
>>>> suffice there.
>>>>
>>>> If we at some point implement vector/SIMD operations we could also try
>>>> out the Fortran openmp workshare construct.
>>>
>>> I'm starting to learn myself OpenCL as part of a course. It's very neat
>>> for some kinds of parallelism. What I'm saying is that at least of the
>>> case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking
>>> too early, but also look forward to coming architectures (e.g., AMD's
>>> GPU-and-CPU on same die design).
>>>
>>> Dag Sverre
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>

Of course, a 'with barrier():' means you can apply it anywhere:

with parallel():
    lots of code

    with barrier():
        single line of code

But the trick for readable programs would be to find the section of code that is

From markflorisson88 at gmail.com  Sun Oct  9 15:44:05 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 9 Oct 2011 14:44:05 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EVKqYKtxtAnHGvguv0guYfd93A=YYspQUck_1CM6daLcw@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CANg26EVTcB4D5RmaRsQo0=SQ46h5MZinORhcNs1FQmf5R-o1=A@mail.gmail.com>
	<CANg26EVKqYKtxtAnHGvguv0guYfd93A=YYspQUck_1CM6daLcw@mail.gmail.com>
Message-ID: <CANg26EV63mfdNHFU0EUABWHWJmruUq_sNxV3d6-tizKdwZSCww@mail.gmail.com>

On 9 October 2011 14:39, mark florisson <markflorisson88 at gmail.com> wrote:
> On 9 October 2011 14:30, mark florisson <markflorisson88 at gmail.com> wrote:
>> On 9 October 2011 13:57, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> wrote:
>>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>>>
>>>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>>>
>>>>> Hey,
>>>>>
>>>>> So far people have been enthusiastic about the cython.parallel features,
>>>>> I think we should introduce some new features. I propose the following,
>>>>
>>>> Great!!
>>>>
>>>> I only have time for a very short feedback now, perhaps more will follow.
>>>>
>>>>> assume parallel has been imported from cython:
>>>>>
>>>>> with parallel.master():
>>>>> this is executed in the master thread in a parallel (non-prange)
>>>>> section
>>>>>
>>>>> with parallel.single():
>>>>> same as master, except any thread may do the execution
>>>>>
>>>>> An optional keyword argument 'nowait' specifies whether there will be a
>>>>> barrier at the end. The default is to wait.
>>>
>>> I like
>>>
>>> if parallel.is_master():
>>> ? ?...
>>> explicit_barrier_somehow() # see below
>>>
>>> better as a Pythonization. One could easily support is_master to be used in
>>> other contexts as well, simply by assigning a status flag in the master
>>> block.
>>>
>>> Using an if-test flows much better with Python I feel, but that naturally
>>> lead to making the barrier explicit. But I like the barrier always being
>>> explicit, rather than having it as a predicate on all the different
>>> constructs like in OpenMP....
>>
>> Hmm, that might mean you also want the barrier for a prange in a
>> parallel to be explicit. I like the 'if' test though, although it
>> wouldn't make sense for 'single'.
>>
>>> I'm less sure about single, since making it a function indicates one could
>>> use it in other contexts and the whole thing becomes too magic (since it's
>>> tied to the position of invocation). I'm tempted to suggest
>>>
>>> for _ in prange(1):
>>> ? ?...
>>>
>>> as our syntax for single.
>>
>> I think that syntax is absolutely terrible :) Perhaps single is not so
>> important and one can just use master instead (or, if really needed,
>> master + a task with the actual work).
>>
>>>>>
>>>>> with parallel.task():
>>>>> create a task to be executed by some thread in the team
>>>>> once a thread takes up the task it shall only be executed by that
>>>>> thread and no other thread (so the task will be tied to the thread)
>>>>>
>>>>> C variables will be firstprivate
>>>>> Python objects will be shared
>>>>>
>>>>> parallel.taskwait() # wait on any direct descendent tasks to finish
>>>>
>>>> Regarding tasks, I think this is mapping OpenMP too close to Python.
>>>> Closures are excellent for the notion of a task, so I think something
>>>> based on the futures API would work better. I realize that makes the
>>>> mapping to OpenMP and implementation a bit more difficult, but I think
>>>> it is worth it in the long run.
>>>>
>>>>>
>>>>> with parallel.critical():
>>>>> this section of code is mutually exclusive with other critical sections
>>>>> optional keyword argument 'name' specifies a name for the critical
>>>>> section,
>>>>> which means all sections with that name will exclude each other,
>>>>> but not
>>>>> critical sections with different names
>>>>>
>>>>> Note: all threads that encounter the section will execute it, just
>>>>> not at the same time
>>>
>>> Yes, this works well as a with-statement...
>>>
>>> ..except that it is slightly magic in that it binds to call position (unlike
>>> anything in Python). I.e. this would be more "correct", or at least
>>> Pythonic:
>>>
>>> with parallel.critical(__file__, __line__):
>>> ? ?...
>>>
>>
>> I'm not entirely sure what you mean here. Critical is really about the
>> block contained within, not about a position in a file. Not all
>> threads have to encounter the critical region, and not specifying a
>> name means you exclude with *all other* unnamed critical sections (not
>> just this one).
>>
>>>>>
>>>>> with parallel.barrier():
>>>>> all threads wait until everyone has reached the barrier
>>>>> either no one or everyone should encounter the barrier
>>>>> shared variables are flushed
>>>
>>> I have problems with requiring a noop with block...
>>>
>>> I'd much rather write
>>>
>>> parallel.barrier()
>>
>> Although in OpenMP it doesn't have any associated code, but we could
>> give it those semantics: apply the barrier at the end of the block of
>> code. The con is that the barrier is at the top while it only affects
>> leaving the block, you would write:
>>
>> with parallel.barrier():
>> ? ?if rand() > .5:
>> ? ? ? ?...
>> ? ?else:
>> ? ? ? ?...
>> # the barrier is here
>>
>>> However, that ties a function call to the place of invocation, and suggests
>>> that one could do
>>>
>>> if rand() > .5:
>>> ? ?barrier()
>>> else:
>>> ? ?i += 3
>>> ? ?barrier()
>>>
>>> and have the same barrier in each case. Again,
>>>
>>> barrier(__file__, __line__)
>>>
>>> gets us purity at the cost of practicality.
>>
>> In this case (unlike the critical construct), yes. I think a warning
>> in the docs stating that either all or none of the threads must
>> encounter the barrier should suffice.
>>
>>> Another way is the pthreads
>>> approach (although one may have to use pthread rather then OpenMP to get it,
>>> unless there are named barriers?):
>>>
>>> barrier_a = parallel.barrier()
>>> barrier_b = parallel.barrier()
>>> with parallel:
>>> ? ?barrier_a.wait()
>>> ? ?if rand() > .5:
>>> ? ? ? ?barrier_b.wait()
>>> ? ?else:
>>> ? ? ? ?i += 3
>>> ? ? ? ?barrier_b.wait()
>>>
>>>
>>> I'm really not sure here.
>>
>> I think we should really just say to the user: "dont do this". There
>> are no named barriers, implementing this wouldn't be easy at all (in
>> fact, I'm not sure you can specify sane semantics for this if you have
>> more branches and some do not contain the same barrier). The block
>> structure for barriers would help here, as blocks are inconvenient to
>> write:
>>
>> if C:
>> ? ?with barrier(): ...
>> else:
>> ? ?with barrier(): ...
>>
>> is just not nice to write, you would instead write
>>
>> with barrier():
>> ? ?if C:
>> ? ? ? ?...
>> ? ?else:
>> ? ? ? ?...
>
> This would also allow one to write
>
> with barrier(), master():
> ? ?...
>
> Basically it's up to the user to use it sensibly. Usually you want a
> barrier to ensure that you have a well-defined state set by some code.
> One could (correctly) only put the last line of such code in the with
> block, but it would make more sense to put all associated code in
> there.
>
> If there isn't really any associated code, you could just put 'pass'
> in the block.
>
> Does that make sense? I haven't even convinced myself of it yet.
>
>>>>>
>>>>> Unfortunately, gcc again manages to horribly break master and single
>>>>> constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll
>>>>> first file a bug report. Other (better) compilers like Portland (and I'm
>>>>> sure Intel) work fine. I suppose a warning in the documentation will
>>>>> suffice there.
>>>>>
>>>>> If we at some point implement vector/SIMD operations we could also try
>>>>> out the Fortran openmp workshare construct.
>>>>
>>>> I'm starting to learn myself OpenCL as part of a course. It's very neat
>>>> for some kinds of parallelism. What I'm saying is that at least of the
>>>> case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking
>>>> too early, but also look forward to coming architectures (e.g., AMD's
>>>> GPU-and-CPU on same die design).
>>>>
>>>> Dag Sverre
>>>> _______________________________________________
>>>> cython-devel mailing list
>>>> cython-devel at python.org
>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>>
>
> Of course, a 'with barrier():' means you can apply it anywhere:
>
> with parallel():
> ? ?lots of code
>
> ? ?with barrier():
> ? ? ? ?single line of code
>
> But the trick for readable programs would be to find the section of code that is
>

It seems I didn't finish my last mail. I wanted to say that readable
programs would try to find a logical block of code which you're
synchronizing on with the barrier.

From stefan_ml at behnel.de  Sun Oct  9 19:35:32 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 09 Oct 2011 19:35:32 +0200
Subject: [Cython] PyCon-DE wrap-up by Kay Hayen
Message-ID: <4E91DB64.9050201@behnel.de>

Hi,

Kay Hayen wrote a blog post about his view of the first PyCon-DE, including 
a bit on the discussions I had with him about Nuitka.

http://www.nuitka.net/blog/2011/10/pycon-de-2011-my-report/

It was interesting to see that Nuitka actually comes from the other side, 
meaning that it tries to be a pure Python compiler, but should at some 
point start to support (Python) type hints for the compiler. Cython made 
static types a language feature from the very beginning and is now fixing 
up the Python compatibility. So both systems will eventually become rather 
similar in what they achieve, with Cython being essentially a superset of 
the feature set of Nuitka due to its additional focus on talking to 
external libraries efficiently and supporting things like parallel loops or 
the PEP-3118 buffer interface.

One of the impressions I took out of the technical discussions with Kay is 
that there isn't really a good reason why Cython should refuse to duplicate 
some of the inner mechanics of CPython for optimisation purposes. Nuitka 
appears to be somewhat more aggressive here, partly because Kay doesn't 
currently care all that much about portability (e.g. to Python 3).

I was previously very opposed to that (you may remember my opposition to 
the list.pop() optimisation), but now I think that we have to fix up the 
generated code for each new major CPython release anyway, so it won't make 
a difference if we have to rework some more of the code because a bit of 
those inner workings changed. They sure won't change for released CPython 
versions anymore, and many implementation details are unlikely enough to 
change for years to come. It's good to continue to be considerate about 
such changes, but some of them may well bring another serious bit of 
performance without introducing real portability risks. Changes like the 
Unicode string restructuring in PEP-393 show that even relying on official 
and long standing parts of the C-API isn't enough to guarantee that code 
still works as expected in new releases, so we may just as well start 
digging deeper.

Stefan

From markflorisson88 at gmail.com  Sun Oct  9 19:57:16 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 9 Oct 2011 18:57:16 +0100
Subject: [Cython] PyCon-DE wrap-up by Kay Hayen
In-Reply-To: <4E91DB64.9050201@behnel.de>
References: <4E91DB64.9050201@behnel.de>
Message-ID: <CANg26EX1+dMDvZfusjR2gXb3y=RyQYJud2Og7HWRO3S4QfiAyA@mail.gmail.com>

On 9 October 2011 18:35, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Hi,
>
> Kay Hayen wrote a blog post about his view of the first PyCon-DE, including
> a bit on the discussions I had with him about Nuitka.
>
> http://www.nuitka.net/blog/2011/10/pycon-de-2011-my-report/
>
> It was interesting to see that Nuitka actually comes from the other side,
> meaning that it tries to be a pure Python compiler, but should at some point
> start to support (Python) type hints for the compiler. Cython made static
> types a language feature from the very beginning and is now fixing up the
> Python compatibility. So both systems will eventually become rather similar
> in what they achieve, with Cython being essentially a superset of the
> feature set of Nuitka due to its additional focus on talking to external
> libraries efficiently and supporting things like parallel loops or the
> PEP-3118 buffer interface.
>
> One of the impressions I took out of the technical discussions with Kay is
> that there isn't really a good reason why Cython should refuse to duplicate
> some of the inner mechanics of CPython for optimisation purposes. Nuitka
> appears to be somewhat more aggressive here, partly because Kay doesn't
> currently care all that much about portability (e.g. to Python 3).

Interesting. What kind of (significant) optimizations could be made by
duplicating code? Do you want to duplicate entire functions or do you
want to inline parts of those?

I actually think we should not get too tied to CPython, e.g. what if
PyPy gets a CPython compatible API, or possibly a subset like PEP 384?

> I was previously very opposed to that (you may remember my opposition to the
> list.pop() optimisation), but now I think that we have to fix up the
> generated code for each new major CPython release anyway, so it won't make a
> difference if we have to rework some more of the code because a bit of those
> inner workings changed. They sure won't change for released CPython versions
> anymore, and many implementation details are unlikely enough to change for
> years to come. It's good to continue to be considerate about such changes,
> but some of them may well bring another serious bit of performance without
> introducing real portability risks. Changes like the Unicode string
> restructuring in PEP-393 show that even relying on official and long
> standing parts of the C-API isn't enough to guarantee that code still works
> as expected in new releases, so we may just as well start digging deeper.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From jonovik at gmail.com  Sun Oct  9 20:54:01 2011
From: jonovik at gmail.com (Jon Olav Vik)
Date: Sun, 9 Oct 2011 20:54:01 +0200
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <4E919A40.2090001@astro.uio.no>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
Message-ID: <CADgCd12Pm1TumU=Aq4p2jOx1Y=Ww2=1sV+0NpJfAbfthuRcz8w@mail.gmail.com>

On Sun, Oct 9, 2011 at 2:57 PM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
>>> with parallel.single():
>>> same as master, except any thread may do the execution
>>>
>>> An optional keyword argument 'nowait' specifies whether there will be a
>>> barrier at the end. The default is to wait.
>
> I like
>
> if parallel.is_master():
> ? ?...
> explicit_barrier_somehow() # see below
>
> better as a Pythonization. One could easily support is_master to be used in
> other contexts as well, simply by assigning a status flag in the master
> block.
>
> Using an if-test flows much better with Python I feel, but that naturally
> lead to making the barrier explicit. But I like the barrier always being
> explicit, rather than having it as a predicate on all the different
> constructs like in OpenMP....

Personally, I think I'd prefer find context managers as a very
readable way to deal with parallelism, similar to the "threading"
module:

http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement

From markflorisson88 at gmail.com  Sun Oct  9 21:01:00 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 9 Oct 2011 20:01:00 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CADgCd12Pm1TumU=Aq4p2jOx1Y=Ww2=1sV+0NpJfAbfthuRcz8w@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADgCd12Pm1TumU=Aq4p2jOx1Y=Ww2=1sV+0NpJfAbfthuRcz8w@mail.gmail.com>
Message-ID: <CANg26EV-THPeXKDgzjoMcagRYjPbW1RnSSm4q1ONfhpCYjLGLg@mail.gmail.com>

On 9 October 2011 19:54, Jon Olav Vik <jonovik at gmail.com> wrote:
> On Sun, Oct 9, 2011 at 2:57 PM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>>>> with parallel.single():
>>>> same as master, except any thread may do the execution
>>>>
>>>> An optional keyword argument 'nowait' specifies whether there will be a
>>>> barrier at the end. The default is to wait.
>>
>> I like
>>
>> if parallel.is_master():
>> ? ?...
>> explicit_barrier_somehow() # see below
>>
>> better as a Pythonization. One could easily support is_master to be used in
>> other contexts as well, simply by assigning a status flag in the master
>> block.
>>
>> Using an if-test flows much better with Python I feel, but that naturally
>> lead to making the barrier explicit. But I like the barrier always being
>> explicit, rather than having it as a predicate on all the different
>> constructs like in OpenMP....
>
> Personally, I think I'd prefer find context managers as a very
> readable way to deal with parallelism, similar to the "threading"
> module:
>
> http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

Yeah it makes a lot of sense for mutual exclusion, but 'master' really
means "only the master thread executes this peace of code, even though
other threads encounter the same code", which is more akin to 'if'
than 'with'.

From jonovik at gmail.com  Sun Oct  9 22:48:49 2011
From: jonovik at gmail.com (Jon Olav Vik)
Date: Sun, 9 Oct 2011 22:48:49 +0200
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EV-THPeXKDgzjoMcagRYjPbW1RnSSm4q1ONfhpCYjLGLg@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADgCd12Pm1TumU=Aq4p2jOx1Y=Ww2=1sV+0NpJfAbfthuRcz8w@mail.gmail.com>
	<CANg26EV-THPeXKDgzjoMcagRYjPbW1RnSSm4q1ONfhpCYjLGLg@mail.gmail.com>
Message-ID: <CADgCd121mfhEcTUJ6icfZUNKk2a30dRMK3TDYk-uwy2Uki5PFg@mail.gmail.com>

On Sun, Oct 9, 2011 at 9:01 PM, mark florisson
<markflorisson88 at gmail.com> wrote:
> On 9 October 2011 19:54, Jon Olav Vik <jonovik at gmail.com> wrote:
>> Personally, I think I'd prefer context managers as a very
>> readable way to deal with parallelism
>
> Yeah it makes a lot of sense for mutual exclusion, but 'master' really
> means "only the master thread executes this peace of code, even though
> other threads encounter the same code", which is more akin to 'if'
> than 'with'.

I see your point. However, another similarity with "with" statements
as an encapsulated "try..finally" is when there's a barrier at the end
of the block. I can live with some magic if it saves me from having a
boilerplate line of "barrier" everywhere 8-)

From markflorisson88 at gmail.com  Sun Oct  9 23:27:37 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 9 Oct 2011 22:27:37 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CADgCd121mfhEcTUJ6icfZUNKk2a30dRMK3TDYk-uwy2Uki5PFg@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADgCd12Pm1TumU=Aq4p2jOx1Y=Ww2=1sV+0NpJfAbfthuRcz8w@mail.gmail.com>
	<CANg26EV-THPeXKDgzjoMcagRYjPbW1RnSSm4q1ONfhpCYjLGLg@mail.gmail.com>
	<CADgCd121mfhEcTUJ6icfZUNKk2a30dRMK3TDYk-uwy2Uki5PFg@mail.gmail.com>
Message-ID: <CANg26EVhpJbcn63_9T-FQCdVWAi3Div5cnmDQkEzzxCv8iUFog@mail.gmail.com>

On 9 October 2011 21:48, Jon Olav Vik <jonovik at gmail.com> wrote:
> On Sun, Oct 9, 2011 at 9:01 PM, mark florisson
> <markflorisson88 at gmail.com> wrote:
>> On 9 October 2011 19:54, Jon Olav Vik <jonovik at gmail.com> wrote:
>>> Personally, I think I'd prefer context managers as a very
>>> readable way to deal with parallelism
>>
>> Yeah it makes a lot of sense for mutual exclusion, but 'master' really
>> means "only the master thread executes this peace of code, even though
>> other threads encounter the same code", which is more akin to 'if'
>> than 'with'.
>
> I see your point. However, another similarity with "with" statements
> as an encapsulated "try..finally" is when there's a barrier at the end
> of the block. I can live with some magic if it saves me from having a
> boilerplate line of "barrier" everywhere 8-)
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

Hm, indeed. I just noticed that unlike single constructs, master
constructs don't have barriers. Both are also not allowed to be
closely nested in worksharing constructs. I think the single directive
is more useful with respect to tasks, e.g. have a single thread
generate tasks and have other threads waiting at the barrier execute
them. In that sense I suppose 'if parallel.is_master():' makes sense
(no barrier, master thread) and 'with single():' (with barrier, any
thread).

We could still support single in prange though, if we simply have the
master thread execute it ('if (omp_get_thread_num() == 0)') and put a
barrier after the block. This makes me wonder what the point of master
was supposed to be...

From markflorisson88 at gmail.com  Mon Oct 10 10:12:52 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 10 Oct 2011 09:12:52 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EVhpJbcn63_9T-FQCdVWAi3Div5cnmDQkEzzxCv8iUFog@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADgCd12Pm1TumU=Aq4p2jOx1Y=Ww2=1sV+0NpJfAbfthuRcz8w@mail.gmail.com>
	<CANg26EV-THPeXKDgzjoMcagRYjPbW1RnSSm4q1ONfhpCYjLGLg@mail.gmail.com>
	<CADgCd121mfhEcTUJ6icfZUNKk2a30dRMK3TDYk-uwy2Uki5PFg@mail.gmail.com>
	<CANg26EVhpJbcn63_9T-FQCdVWAi3Div5cnmDQkEzzxCv8iUFog@mail.gmail.com>
Message-ID: <CANg26EV9tR=az097ZEWdGHa78z_ZsZwb37t4ZB=mWf7m1g+EgQ@mail.gmail.com>

On 9 October 2011 22:27, mark florisson <markflorisson88 at gmail.com> wrote:
>
> On 9 October 2011 21:48, Jon Olav Vik <jonovik at gmail.com> wrote:
> > On Sun, Oct 9, 2011 at 9:01 PM, mark florisson
> > <markflorisson88 at gmail.com> wrote:
> >> On 9 October 2011 19:54, Jon Olav Vik <jonovik at gmail.com> wrote:
> >>> Personally, I think I'd prefer context managers as a very
> >>> readable way to deal with parallelism
> >>
> >> Yeah it makes a lot of sense for mutual exclusion, but 'master' really
> >> means "only the master thread executes this peace of code, even though
> >> other threads encounter the same code", which is more akin to 'if'
> >> than 'with'.
> >
> > I see your point. However, another similarity with "with" statements
> > as an encapsulated "try..finally" is when there's a barrier at the end
> > of the block. I can live with some magic if it saves me from having a
> > boilerplate line of "barrier" everywhere 8-)
> > _______________________________________________
> > cython-devel mailing list
> > cython-devel at python.org
> > http://mail.python.org/mailman/listinfo/cython-devel
> >
>
> Hm, indeed. I just noticed that unlike single constructs, master
> constructs don't have barriers. Both are also not allowed to be
> closely nested in worksharing constructs. I think the single directive
> is more useful with respect to tasks, e.g. have a single thread
> generate tasks and have other threads waiting at the barrier execute
> them. In that sense I suppose 'if parallel.is_master():' makes sense
> (no barrier, master thread) and 'with single():' (with barrier, any
> thread).
>
> We could still support single in prange though, if we simply have the
> master thread execute it ('if (omp_get_thread_num() == 0)') and put a
> barrier after the block. This makes me wonder what the point of master
> was supposed to be...

Scratch that last part about master/single in parallel sections, it
doesn't make sense. It only makes sense if you think of those sections
as tasks you submit that would be immediately taken up by a (certain)
thread. But that's not quite what it means. I do like 'if is_master()'
and 'with single', though.

Another thing we could support is arbitrary reductions. In OpenMP 3.1
you get reduction operators 'and', 'max' and 'min', but it wouldn't be
hard to support arbitrary user functions. e.g.

@cython.reduction
cdef int func(int a, int b):
    ...

for i in prange(...):
    a = func(a, b)

I'm not sure how common this is though. You probably have your
reduction data in an array so you're already using numpy so you'll
likely already have your functionality.

From stefan_ml at behnel.de  Mon Oct 10 10:38:35 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 10 Oct 2011 10:38:35 +0200
Subject: [Cython] PyCon-DE wrap-up by Kay Hayen
In-Reply-To: <CANg26EX1+dMDvZfusjR2gXb3y=RyQYJud2Og7HWRO3S4QfiAyA@mail.gmail.com>
References: <4E91DB64.9050201@behnel.de>
	<CANg26EX1+dMDvZfusjR2gXb3y=RyQYJud2Og7HWRO3S4QfiAyA@mail.gmail.com>
Message-ID: <4E92AF0B.6070905@behnel.de>

mark florisson, 09.10.2011 19:57:
> On 9 October 2011 18:35, Stefan Behnel wrote:
>> One of the impressions I took out of the technical discussions with Kay is
>> that there isn't really a good reason why Cython should refuse to duplicate
>> some of the inner mechanics of CPython for optimisation purposes. Nuitka
>> appears to be somewhat more aggressive here, partly because Kay doesn't
>> currently care all that much about portability (e.g. to Python 3).
>
> Interesting. What kind of (significant) optimizations could be made by
> duplicating code? Do you want to duplicate entire functions or do you
> want to inline parts of those?

I was mainly referring to things like direct access to type/object struct 
fields and little things like that. They can make a difference especially 
in loops, compared to calling into a generic C-API function. For example, 
we could have our own interned implementation of PyDict_Next(). I'm not 
very impressed by the performance of that C-API function - repeated calls 
to GetItem can be faster than looping over a dict with PyDict_Next()!

That being said, I wasn't referring to any specific changes. It was more of 
a general remark about the invisible line that we currently draw in Cython.


> I actually think we should not get too tied to CPython, e.g. what if
> PyPy gets a CPython compatible API

It already implements a part of the C-API:

http://morepypy.blogspot.com/2010/04/using-cpython-extension-modules-with.html

However, if we really want to support it at that level, there's likely more 
to do than just removing low-level optimisations. And that would take the 
normal route that we always use: macros and conditionally compiled inline 
functions. The mere fact that we try to support different targets doesn't 
mean that we should stop optimising for specific targets. The same is true 
for different versions of CPython, where we often use better optimisations 
in newer releases, without sacrificing backwards compatibility.

Personally, I think that supporting PyPy at the Python level is a lot more 
interesting, although it may be easier to get it working at the cpyext level.


> or possibly a subset like PEP 384?

That's currently not very interesting since there are basically no C 
extensions around (generated or hand written) that restrict themselves to 
that API. Supporting it in Cython would mean that we have to rewrite huge 
parts of the generated C code. It's not even clear to me yet that we *can* 
implement all of Cython's features based on PEP 384. For example, fast 
indexing into lists and tuples is basically a no-no in the restricted 
C-API. There are tons of rather unexpected restrictions like this.

Stefan

From markflorisson88 at gmail.com  Mon Oct 10 21:59:11 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 10 Oct 2011 20:59:11 +0100
Subject: [Cython] PyCon-DE wrap-up by Kay Hayen
In-Reply-To: <4E92AF0B.6070905@behnel.de>
References: <4E91DB64.9050201@behnel.de>
	<CANg26EX1+dMDvZfusjR2gXb3y=RyQYJud2Og7HWRO3S4QfiAyA@mail.gmail.com>
	<4E92AF0B.6070905@behnel.de>
Message-ID: <CANg26EUBUjPxi1vca=DFoNNgO8JLBmMZpcTr8xwrX=n=6V9b0g@mail.gmail.com>

On 10 October 2011 09:38, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 09.10.2011 19:57:
>>
>> On 9 October 2011 18:35, Stefan Behnel wrote:
>>>
>>> One of the impressions I took out of the technical discussions with Kay
>>> is
>>> that there isn't really a good reason why Cython should refuse to
>>> duplicate
>>> some of the inner mechanics of CPython for optimisation purposes. Nuitka
>>> appears to be somewhat more aggressive here, partly because Kay doesn't
>>> currently care all that much about portability (e.g. to Python 3).
>>
>> Interesting. What kind of (significant) optimizations could be made by
>> duplicating code? Do you want to duplicate entire functions or do you
>> want to inline parts of those?
>
> I was mainly referring to things like direct access to type/object struct
> fields and little things like that.

Ah, I see. I suppose that if you do everything through Cython-specific
macros it will be easy to change it at any time and it will make it
easy to experiment with performance as well.

> They can make a difference especially in
> loops, compared to calling into a generic C-API function. For example, we
> could have our own interned implementation of PyDict_Next(). I'm not very
> impressed by the performance of that C-API function - repeated calls to
> GetItem can be faster than looping over a dict with PyDict_Next()!
>
> That being said, I wasn't referring to any specific changes. It was more of
> a general remark about the invisible line that we currently draw in Cython.
>
>
>> I actually think we should not get too tied to CPython, e.g. what if
>> PyPy gets a CPython compatible API
>
> It already implements a part of the C-API:
>
> http://morepypy.blogspot.com/2010/04/using-cpython-extension-modules-with.html
>
> However, if we really want to support it at that level, there's likely more
> to do than just removing low-level optimisations. And that would take the
> normal route that we always use: macros and conditionally compiled inline
> functions. The mere fact that we try to support different targets doesn't
> mean that we should stop optimising for specific targets. The same is true
> for different versions of CPython, where we often use better optimisations
> in newer releases, without sacrificing backwards compatibility.
>
> Personally, I think that supporting PyPy at the Python level is a lot more
> interesting, although it may be easier to get it working at the cpyext
> level.
>

Yeah it's certainly interesting. It might be hard to support things
like cython.parallel and efficient buffer access though. I think
releasing the GIL might not be very easy either, although perhaps that
could be circumvented by factoring the entire nogil block out into a C
function which you call with ctypes.

>> or possibly a subset like PEP 384?
>
> That's currently not very interesting since there are basically no C
> extensions around (generated or hand written) that restrict themselves to
> that API. Supporting it in Cython would mean that we have to rewrite huge
> parts of the generated C code. It's not even clear to me yet that we *can*
> implement all of Cython's features based on PEP 384. For example, fast
> indexing into lists and tuples is basically a no-no in the restricted C-API.
> There are tons of rather unexpected restrictions like this.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From robertwb at math.washington.edu  Tue Oct 11 08:11:06 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Mon, 10 Oct 2011 23:11:06 -0700
Subject: [Cython] PyCon-DE wrap-up by Kay Hayen
In-Reply-To: <4E92AF0B.6070905@behnel.de>
References: <4E91DB64.9050201@behnel.de>
	<CANg26EX1+dMDvZfusjR2gXb3y=RyQYJud2Og7HWRO3S4QfiAyA@mail.gmail.com>
	<4E92AF0B.6070905@behnel.de>
Message-ID: <CADiQ+QA9PAaQ+D0OagD4tAjMtnbgodoh_Lk3o4JLNQJ6kEd_8w@mail.gmail.com>

Thanks for the update and link. Sounds like PyCon-DE went well.

On Mon, Oct 10, 2011 at 1:38 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 09.10.2011 19:57:
>>
>> On 9 October 2011 18:35, Stefan Behnel wrote:
>>>
>>> One of the impressions I took out of the technical discussions with Kay
>>> is
>>> that there isn't really a good reason why Cython should refuse to
>>> duplicate
>>> some of the inner mechanics of CPython for optimisation purposes. Nuitka
>>> appears to be somewhat more aggressive here, partly because Kay doesn't
>>> currently care all that much about portability (e.g. to Python 3).
>>
>> Interesting. What kind of (significant) optimizations could be made by
>> duplicating code? Do you want to duplicate entire functions or do you
>> want to inline parts of those?
>
> I was mainly referring to things like direct access to type/object struct
> fields and little things like that. They can make a difference especially in
> loops, compared to calling into a generic C-API function. For example, we
> could have our own interned implementation of PyDict_Next(). I'm not very
> impressed by the performance of that C-API function - repeated calls to
> GetItem can be faster than looping over a dict with PyDict_Next()!
>
> That being said, I wasn't referring to any specific changes. It was more of
> a general remark about the invisible line that we currently draw in Cython.

CPython, especially the internals, is a slow enough moving target that
I'm not too concerned about reaching into the internals if there is a
clear benefit. If we're flexible enough to support 2.x and 3.x, I
think we can handle 3.(x+1) when it comes.

>> I actually think we should not get too tied to CPython, e.g. what if
>> PyPy gets a CPython compatible API
>
> It already implements a part of the C-API:
>
> http://morepypy.blogspot.com/2010/04/using-cpython-extension-modules-with.html
>
> However, if we really want to support it at that level, there's likely more
> to do than just removing low-level optimisations. And that would take the
> normal route that we always use: macros and conditionally compiled inline
> functions. The mere fact that we try to support different targets doesn't
> mean that we should stop optimising for specific targets.

+1

> The same is true
> for different versions of CPython, where we often use better optimisations
> in newer releases, without sacrificing backwards compatibility.
>
> Personally, I think that supporting PyPy at the Python level is a lot more
> interesting, although it may be easier to get it working at the cpyext
> level.
>
>
>> or possibly a subset like PEP 384?
>
> That's currently not very interesting since there are basically no C
> extensions around (generated or hand written) that restrict themselves to
> that API. Supporting it in Cython would mean that we have to rewrite huge
> parts of the generated C code. It's not even clear to me yet that we *can*
> implement all of Cython's features based on PEP 384. For example, fast
> indexing into lists and tuples is basically a no-no in the restricted C-API.
> There are tons of rather unexpected restrictions like this.

I agree, PEP 384 is a nice idea, but it seems to be a rather lot of
work for an unclear/small benefit (compared to other stuff we could be
doing.)

- Robert

From robertwb at math.washington.edu  Wed Oct 12 09:55:55 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Wed, 12 Oct 2011 00:55:55 -0700
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <4E919A40.2090001@astro.uio.no>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
Message-ID: <CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>

On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>
>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>
>>> Hey,
>>>
>>> So far people have been enthusiastic about the cython.parallel features,
>>> I think we should introduce some new features.

Excellent. I think this is going to become a killer feature like
buffer support.

>>> I propose the following,
>>
>> Great!!
>>
>> I only have time for a very short feedback now, perhaps more will follow.
>>
>>> assume parallel has been imported from cython:
>>>
>>> with parallel.master():
>>> this is executed in the master thread in a parallel (non-prange)
>>> section
>>>
>>> with parallel.single():
>>> same as master, except any thread may do the execution
>>>
>>> An optional keyword argument 'nowait' specifies whether there will be a
>>> barrier at the end. The default is to wait.
>
> I like
>
> if parallel.is_master():
> ? ?...
> explicit_barrier_somehow() # see below
>
> better as a Pythonization. One could easily support is_master to be used in
> other contexts as well, simply by assigning a status flag in the master
> block.

+1, the if statement feels a lot more natural.

> Using an if-test flows much better with Python I feel, but that naturally
> lead to making the barrier explicit. But I like the barrier always being
> explicit, rather than having it as a predicate on all the different
> constructs like in OpenMP....
>
> I'm less sure about single, since making it a function indicates one could
> use it in other contexts and the whole thing becomes too magic (since it's
> tied to the position of invocation). I'm tempted to suggest
>
> for _ in prange(1):
> ? ?...
>
> as our syntax for single.

The idea here is that you want a block of code executed once,
presumably by the first thread that gets here? I think this could also
be handled by a if statement, perhaps "if parallel.first()" or
something like that. Is there anything special about this construct
that couldn't simply be done by flushing/checking a variable?

>>> with parallel.task():
>>> create a task to be executed by some thread in the team
>>> once a thread takes up the task it shall only be executed by that
>>> thread and no other thread (so the task will be tied to the thread)
>>>
>>> C variables will be firstprivate
>>> Python objects will be shared
>>>
>>> parallel.taskwait() # wait on any direct descendent tasks to finish
>>
>> Regarding tasks, I think this is mapping OpenMP too close to Python.
>> Closures are excellent for the notion of a task, so I think something
>> based on the futures API would work better. I realize that makes the
>> mapping to OpenMP and implementation a bit more difficult, but I think
>> it is worth it in the long run.

It's almost as if you're reading my thoughts. There are much more
natural task APIs, e.g. futures or the way the Python
threading/multiprocessing does things.

>>> with parallel.critical():
>>> this section of code is mutually exclusive with other critical sections
>>> optional keyword argument 'name' specifies a name for the critical
>>> section,
>>> which means all sections with that name will exclude each other,
>>> but not
>>> critical sections with different names
>>>
>>> Note: all threads that encounter the section will execute it, just
>>> not at the same time
>
> Yes, this works well as a with-statement...
>
> ..except that it is slightly magic in that it binds to call position (unlike
> anything in Python). I.e. this would be more "correct", or at least
> Pythonic:
>
> with parallel.critical(__file__, __line__):
> ? ?...

This feels a lot like a lock, which of course fits well with the with
statement.

>>> with parallel.barrier():
>>> all threads wait until everyone has reached the barrier
>>> either no one or everyone should encounter the barrier
>>> shared variables are flushed
>
> I have problems with requiring a noop with block...
>
> I'd much rather write
>
> parallel.barrier()
>
> However, that ties a function call to the place of invocation, and suggests
> that one could do
>
> if rand() > .5:
> ? ?barrier()
> else:
> ? ?i += 3
> ? ?barrier()
>
> and have the same barrier in each case. Again,
>
> barrier(__file__, __line__)
>
> gets us purity at the cost of practicality. Another way is the pthreads
> approach (although one may have to use pthread rather then OpenMP to get it,
> unless there are named barriers?):
>
> barrier_a = parallel.barrier()
> barrier_b = parallel.barrier()
> with parallel:
> ? ?barrier_a.wait()
> ? ?if rand() > .5:
> ? ? ? ?barrier_b.wait()
> ? ?else:
> ? ? ? ?i += 3
> ? ? ? ?barrier_b.wait()
>
>
> I'm really not sure here.

I agree, the barrier doesn't seem like it belongs in a context. For
example, it's ambiguous whether the block is supposed to proceed or
succeed the barrier. I like the named barrier idea, but if that's not
feasible we could perhaps use control flow to disallow conditionally
calling barriers (or that every path calls the barrier (an equal
number of times?)).

>>> Unfortunately, gcc again manages to horribly break master and single
>>> constructs in loops (versions 4.2 throughout 4.6), so I suppose I'll
>>> first file a bug report. Other (better) compilers like Portland (and I'm
>>> sure Intel) work fine. I suppose a warning in the documentation will
>>> suffice there.

One can emit conditional #error pragmas in this case, though of course
it's better to produce code that works correctly on all compilers.

>>> If we at some point implement vector/SIMD operations we could also try
>>> out the Fortran openmp workshare construct.
>>
>> I'm starting to learn myself OpenCL as part of a course. It's very neat
>> for some kinds of parallelism. What I'm saying is that at least of the
>> case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking
>> too early, but also look forward to coming architectures (e.g., AMD's
>> GPU-and-CPU on same die design).

+1. I like the idea of providing more parallelism constructs, but
rather than risk fixating on OpenMP's model, perhaps we should look at
the problem we're trying to solve (e.g., what can't one do well now)
and create (or more likely borrow) the right Pythonic API to do it.

- Robert

From robertwb at math.washington.edu  Wed Oct 12 10:00:00 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Wed, 12 Oct 2011 01:00:00 -0700
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EV9tR=az097ZEWdGHa78z_ZsZwb37t4ZB=mWf7m1g+EgQ@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADgCd12Pm1TumU=Aq4p2jOx1Y=Ww2=1sV+0NpJfAbfthuRcz8w@mail.gmail.com>
	<CANg26EV-THPeXKDgzjoMcagRYjPbW1RnSSm4q1ONfhpCYjLGLg@mail.gmail.com>
	<CADgCd121mfhEcTUJ6icfZUNKk2a30dRMK3TDYk-uwy2Uki5PFg@mail.gmail.com>
	<CANg26EVhpJbcn63_9T-FQCdVWAi3Div5cnmDQkEzzxCv8iUFog@mail.gmail.com>
	<CANg26EV9tR=az097ZEWdGHa78z_ZsZwb37t4ZB=mWf7m1g+EgQ@mail.gmail.com>
Message-ID: <CADiQ+QBU8Xxkc=RNd9OChOjWeaUhL-fAi3SjMJNgAuMRxvWNog@mail.gmail.com>

On Mon, Oct 10, 2011 at 1:12 AM, mark florisson
<markflorisson88 at gmail.com> wrote:
> On 9 October 2011 22:27, mark florisson <markflorisson88 at gmail.com> wrote:
>>
>> On 9 October 2011 21:48, Jon Olav Vik <jonovik at gmail.com> wrote:
>> > On Sun, Oct 9, 2011 at 9:01 PM, mark florisson
>> > <markflorisson88 at gmail.com> wrote:
>> >> On 9 October 2011 19:54, Jon Olav Vik <jonovik at gmail.com> wrote:
>> >>> Personally, I think I'd prefer context managers as a very
>> >>> readable way to deal with parallelism
>> >>
>> >> Yeah it makes a lot of sense for mutual exclusion, but 'master' really
>> >> means "only the master thread executes this peace of code, even though
>> >> other threads encounter the same code", which is more akin to 'if'
>> >> than 'with'.
>> >
>> > I see your point. However, another similarity with "with" statements
>> > as an encapsulated "try..finally" is when there's a barrier at the end
>> > of the block. I can live with some magic if it saves me from having a
>> > boilerplate line of "barrier" everywhere 8-)
>> > _______________________________________________
>> > cython-devel mailing list
>> > cython-devel at python.org
>> > http://mail.python.org/mailman/listinfo/cython-devel
>> >
>>
>> Hm, indeed. I just noticed that unlike single constructs, master
>> constructs don't have barriers. Both are also not allowed to be
>> closely nested in worksharing constructs. I think the single directive
>> is more useful with respect to tasks, e.g. have a single thread
>> generate tasks and have other threads waiting at the barrier execute
>> them. In that sense I suppose 'if parallel.is_master():' makes sense
>> (no barrier, master thread) and 'with single():' (with barrier, any
>> thread).
>>
>> We could still support single in prange though, if we simply have the
>> master thread execute it ('if (omp_get_thread_num() == 0)') and put a
>> barrier after the block. This makes me wonder what the point of master
>> was supposed to be...
>
> Scratch that last part about master/single in parallel sections, it
> doesn't make sense. It only makes sense if you think of those sections
> as tasks you submit that would be immediately taken up by a (certain)
> thread. But that's not quite what it means. I do like 'if is_master()'
> and 'with single', though.
>
> Another thing we could support is arbitrary reductions. In OpenMP 3.1
> you get reduction operators 'and', 'max' and 'min', but it wouldn't be
> hard to support arbitrary user functions. e.g.
>
> @cython.reduction
> cdef int func(int a, int b):
> ? ?...
>
> for i in prange(...):
> ? ?a = func(a, b)

Interesting idea. An alternative syntax could be

a = cython.parallel.reduce(func, a, b)

> I'm not sure how common this is though. You probably have your
> reduction data in an array so you're already using numpy so you'll
> likely already have your functionality.
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From d.s.seljebotn at astro.uio.no  Wed Oct 12 10:36:16 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Wed, 12 Oct 2011 10:36:16 +0200
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>	<4E919100.8020801@astro.uio.no>
	<4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
Message-ID: <4E955180.1070601@astro.uio.no>

On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
> On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no>  wrote:
>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>>
>>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>>
>>>> Hey,
>>>>
>>>> So far people have been enthusiastic about the cython.parallel features,
>>>> I think we should introduce some new features.
>
> Excellent. I think this is going to become a killer feature like
> buffer support.
>
>>>> I propose the following,
>>>
>>> Great!!
>>>
>>> I only have time for a very short feedback now, perhaps more will follow.
>>>
>>>> assume parallel has been imported from cython:
>>>>
>>>> with parallel.master():
>>>> this is executed in the master thread in a parallel (non-prange)
>>>> section
>>>>
>>>> with parallel.single():
>>>> same as master, except any thread may do the execution
>>>>
>>>> An optional keyword argument 'nowait' specifies whether there will be a
>>>> barrier at the end. The default is to wait.
>>
>> I like
>>
>> if parallel.is_master():
>>     ...
>> explicit_barrier_somehow() # see below
>>
>> better as a Pythonization. One could easily support is_master to be used in
>> other contexts as well, simply by assigning a status flag in the master
>> block.
>
> +1, the if statement feels a lot more natural.
>
>> Using an if-test flows much better with Python I feel, but that naturally
>> lead to making the barrier explicit. But I like the barrier always being
>> explicit, rather than having it as a predicate on all the different
>> constructs like in OpenMP....
>>
>> I'm less sure about single, since making it a function indicates one could
>> use it in other contexts and the whole thing becomes too magic (since it's
>> tied to the position of invocation). I'm tempted to suggest
>>
>> for _ in prange(1):
>>     ...
>>
>> as our syntax for single.

Just to be clear: My point was that the above implements single 
behaviour even now, without any extra effort.

>
> The idea here is that you want a block of code executed once,
> presumably by the first thread that gets here? I think this could also
> be handled by a if statement, perhaps "if parallel.first()" or
> something like that. Is there anything special about this construct
> that couldn't simply be done by flushing/checking a variable?

Good point. I think there's a problem with OpenMP that it has too many 
primitives for similar things.

I'm -1 on single -- either using a for loop or flag+flush is more to 
type, but more readable to people who don't know cython.parallel (look: 
Python even makes "self." explicit -- the bias in language design is 
clearly on readability rather than writability).

I thought of "if is_first()" as well, but my problem is again that it 
binds to the location of the call.

if foo:
     if parallel.is_first():
         ...
else:
     if parallel.is_first():
         ...

can not be refactored to:

if parallel.is_first():
     if foo:
         ...
     else:
         ...

which I think is highly confusing for people who didn't write the code 
and don't know the details of cython.parallel. (Unlike is_master(), 
which works the same either way).

I think we should aim for something that's as easy to read as possible 
for Python users with no cython.parallel knowledge.

>
>>>> with parallel.task():
>>>> create a task to be executed by some thread in the team
>>>> once a thread takes up the task it shall only be executed by that
>>>> thread and no other thread (so the task will be tied to the thread)
>>>>
>>>> C variables will be firstprivate
>>>> Python objects will be shared
>>>>
>>>> parallel.taskwait() # wait on any direct descendent tasks to finish
>>>
>>> Regarding tasks, I think this is mapping OpenMP too close to Python.
>>> Closures are excellent for the notion of a task, so I think something
>>> based on the futures API would work better. I realize that makes the
>>> mapping to OpenMP and implementation a bit more difficult, but I think
>>> it is worth it in the long run.
>
> It's almost as if you're reading my thoughts. There are much more
> natural task APIs, e.g. futures or the way the Python
> threading/multiprocessing does things.
>
>>>> with parallel.critical():
>>>> this section of code is mutually exclusive with other critical sections
>>>> optional keyword argument 'name' specifies a name for the critical
>>>> section,
>>>> which means all sections with that name will exclude each other,
>>>> but not
>>>> critical sections with different names
>>>>
>>>> Note: all threads that encounter the section will execute it, just
>>>> not at the same time
>>
>> Yes, this works well as a with-statement...
>>
>> ..except that it is slightly magic in that it binds to call position (unlike
>> anything in Python). I.e. this would be more "correct", or at least
>> Pythonic:
>>
>> with parallel.critical(__file__, __line__):
>>     ...

Mark: I stand corrected on this point. +1 on your critical proposal.

> This feels a lot like a lock, which of course fits well with the with
> statement.
>
>>>> with parallel.barrier():
>>>> all threads wait until everyone has reached the barrier
>>>> either no one or everyone should encounter the barrier
>>>> shared variables are flushed
>>
>> I have problems with requiring a noop with block...
>>
>> I'd much rather write
>>
>> parallel.barrier()
>>
>> However, that ties a function call to the place of invocation, and suggests
>> that one could do
>>
>> if rand()>  .5:
>>     barrier()
>> else:
>>     i += 3
>>     barrier()
>>
>> and have the same barrier in each case. Again,
>>
>> barrier(__file__, __line__)
>>
>> gets us purity at the cost of practicality. Another way is the pthreads
>> approach (although one may have to use pthread rather then OpenMP to get it,
>> unless there are named barriers?):
>>
>> barrier_a = parallel.barrier()
>> barrier_b = parallel.barrier()
>> with parallel:
>>     barrier_a.wait()
>>     if rand()>  .5:
>>         barrier_b.wait()
>>     else:
>>         i += 3
>>         barrier_b.wait()
>>
>>
>> I'm really not sure here.
>
> I agree, the barrier doesn't seem like it belongs in a context. For
> example, it's ambiguous whether the block is supposed to proceed or
> succeed the barrier. I like the named barrier idea, but if that's not
> feasible we could perhaps use control flow to disallow conditionally
> calling barriers (or that every path calls the barrier (an equal
> number of times?)).

It is always an option to go beyond OpenMP. Pthread barriers are a lot 
more powerful in this way, and with pthread and Windows covered I think 
we should be good...

IIUC, you can't have different path calling the barrier the same number 
of times, it's merely

#pragma omp barrier

and a seperate barrier statement gets another counter. Which is why I 
think it is not powerful enough and we should use pthreads.

> +1. I like the idea of providing more parallelism constructs, but
> rather than risk fixating on OpenMP's model, perhaps we should look at
> the problem we're trying to solve (e.g., what can't one do well now)
> and create (or more likely borrow) the right Pythonic API to do it.

Also, quick and flexible message-passing between threads/processes 
through channels is becoming an increasingly popular concept. Go even 
has a seperate syntax for channel communication, and zeromq is becoming 
popular for distributed work.

The is a problem Cython may need to solve here, since one currently has 
to use very low-level C to do it quickly (either zeromq or pthreads in 
most cases -- I guess, an OpenMP critical section would help in 
implementing a queue though).

I wouldn't resist a builtin "channel" type in Cython (since we don't 
have full templating/generics, it would be the only way of sending typed 
data conveniently?).

I ultimately feel things like that is more important than 100% coverage 
of the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.

Dag Sverre

From d.s.seljebotn at astro.uio.no  Wed Oct 12 10:49:09 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Wed, 12 Oct 2011 10:49:09 +0200
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <4E955180.1070601@astro.uio.no>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>	<4E919100.8020801@astro.uio.no>	<4E919A40.2090001@astro.uio.no>	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
Message-ID: <4E955485.7060809@astro.uio.no>

On 10/12/2011 10:36 AM, Dag Sverre Seljebotn wrote:
> On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
>> On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> wrote:
>>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>>>
>>>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>>> with parallel.critical():
>>>>> this section of code is mutually exclusive with other critical
>>>>> sections
>>>>> optional keyword argument 'name' specifies a name for the critical
>>>>> section,
>>>>> which means all sections with that name will exclude each other,
>>>>> but not
>>>>> critical sections with different names
>>>>>
>>>>> Note: all threads that encounter the section will execute it, just
>>>>> not at the same time
>>>

On critical sections, I do feel string naming is rather un-Pythonic. I'd 
rather have

lock_a = parallel.Mutex()
lock_b = parallel.Mutex()
with cython.parallel:
     with lock_a:
         ...
     with lock_b:
         ...

This maps well to pthread mutexes, though much harder to map it to OpenMP...

So my proposal is:

  a) parallel.Mutex() can take a string argument and then returns the 
same mutex each time for the same string, meaning you can do

with parallel.Mutex("somename"):

which maps directly to OpenMP.

  b) However, this does not make sense:

with parallel.Mutex():

because each thread would instantiate a *seperate* mutex. So raise 
compiler error ("Redundant code, thread will never block on fresh mutex")

  c) However, one can use a default global Mutex instance:

with parallel.global_mutex

(mapping to an un-named critical in OpenMP)

This seems to be simple enough to implement, and allows generalizing to 
the advanced case above later (probably using pthreads/Windows directly).

Dag Sverre

From robertwb at math.washington.edu  Wed Oct 12 11:08:42 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Wed, 12 Oct 2011 02:08:42 -0700
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <4E955180.1070601@astro.uio.no>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
Message-ID: <CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>

On Wed, Oct 12, 2011 at 1:36 AM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
>>> I'm less sure about single, since making it a function indicates one
>>> could
>>> use it in other contexts and the whole thing becomes too magic (since
>>> it's
>>> tied to the position of invocation). I'm tempted to suggest
>>>
>>> for _ in prange(1):
>>> ? ?...
>>>
>>> as our syntax for single.
>
> Just to be clear: My point was that the above implements single behaviour
> even now, without any extra effort.
>
>>
>> The idea here is that you want a block of code executed once,
>> presumably by the first thread that gets here? I think this could also
>> be handled by a if statement, perhaps "if parallel.first()" or
>> something like that. Is there anything special about this construct
>> that couldn't simply be done by flushing/checking a variable?
>
> Good point. I think there's a problem with OpenMP that it has too many
> primitives for similar things.
>
> I'm -1 on single -- either using a for loop or flag+flush is more to type,
> but more readable to people who don't know cython.parallel (look: Python
> even makes "self." explicit -- the bias in language design is clearly on
> readability rather than writability).
>
> I thought of "if is_first()" as well, but my problem is again that it binds
> to the location of the call.
>
> if foo:
> ? ?if parallel.is_first():
> ? ? ? ?...
> else:
> ? ?if parallel.is_first():
> ? ? ? ?...
>
> can not be refactored to:
>
> if parallel.is_first():
> ? ?if foo:
> ? ? ? ?...
> ? ?else:
> ? ? ? ?...
>
> which I think is highly confusing for people who didn't write the code and
> don't know the details of cython.parallel. (Unlike is_master(), which works
> the same either way).
>
> I think we should aim for something that's as easy to read as possible for
> Python users with no cython.parallel knowledge.

Exactly. This is what's so beautiful about prange.

>>>>> with parallel.barrier():
>>>>> all threads wait until everyone has reached the barrier
>>>>> either no one or everyone should encounter the barrier
>>>>> shared variables are flushed
>>>
>>> I have problems with requiring a noop with block...
>>>
>>> I'd much rather write
>>>
>>> parallel.barrier()
>>>
>>> However, that ties a function call to the place of invocation, and
>>> suggests
>>> that one could do
>>>
>>> if rand()> ?.5:
>>> ? ?barrier()
>>> else:
>>> ? ?i += 3
>>> ? ?barrier()
>>>
>>> and have the same barrier in each case. Again,
>>>
>>> barrier(__file__, __line__)
>>>
>>> gets us purity at the cost of practicality. Another way is the pthreads
>>> approach (although one may have to use pthread rather then OpenMP to get
>>> it,
>>> unless there are named barriers?):
>>>
>>> barrier_a = parallel.barrier()
>>> barrier_b = parallel.barrier()
>>> with parallel:
>>> ? ?barrier_a.wait()
>>> ? ?if rand()> ?.5:
>>> ? ? ? ?barrier_b.wait()
>>> ? ?else:
>>> ? ? ? ?i += 3
>>> ? ? ? ?barrier_b.wait()
>>>
>>>
>>> I'm really not sure here.
>>
>> I agree, the barrier doesn't seem like it belongs in a context. For
>> example, it's ambiguous whether the block is supposed to proceed or
>> succeed the barrier. I like the named barrier idea, but if that's not
>> feasible we could perhaps use control flow to disallow conditionally
>> calling barriers (or that every path calls the barrier (an equal
>> number of times?)).
>
> It is always an option to go beyond OpenMP. Pthread barriers are a lot more
> powerful in this way, and with pthread and Windows covered I think we should
> be good...
>
> IIUC, you can't have different path calling the barrier the same number of
> times, it's merely
>
> #pragma omp barrier
>
> and a seperate barrier statement gets another counter.

Makes sense, but this greatly restricts where we could use the OpenMP version.

> Which is why I think
> it is not powerful enough and we should use pthreads.
>
>> +1. I like the idea of providing more parallelism constructs, but
>> rather than risk fixating on OpenMP's model, perhaps we should look at
>> the problem we're trying to solve (e.g., what can't one do well now)
>> and create (or more likely borrow) the right Pythonic API to do it.
>
> Also, quick and flexible message-passing between threads/processes through
> channels is becoming an increasingly popular concept. Go even has a seperate
> syntax for channel communication, and zeromq is becoming popular for
> distributed work.
>
> The is a problem Cython may need to solve here, since one currently has to
> use very low-level C to do it quickly (either zeromq or pthreads in most
> cases -- I guess, an OpenMP critical section would help in implementing a
> queue though).
>
> I wouldn't resist a builtin "channel" type in Cython (since we don't have
> full templating/generics, it would be the only way of sending typed data
> conveniently?).

zeromq seems to be a nice level of abstraction--we could probably get
far with a zeromq "overlay" module that didn't require the GIL. Or is
the C API easy enough to use if we could provide convenient mechanisms
to initialize the tasks/threads. I think perhaps the communication
model could be solved by a library more easily than the treading
model.

> I ultimately feel things like that is more important than 100% coverage of
> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.

+1 Prange handles the (corse-grained) SIMD case nicely, and a
task/futures model based on closures would I think flesh this out to
the next level of generality (and complexity).

- Robert

From robertwb at math.washington.edu  Wed Oct 12 11:20:11 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Wed, 12 Oct 2011 02:20:11 -0700
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <4E955485.7060809@astro.uio.no>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no> <4E955485.7060809@astro.uio.no>
Message-ID: <CADiQ+QCCt2jTCPurmXyRAtV3egDoC+j7F_ajoKAk0Fh6X4MspA@mail.gmail.com>

On Wed, Oct 12, 2011 at 1:49 AM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/12/2011 10:36 AM, Dag Sverre Seljebotn wrote:
>>
>> On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
>>>
>>> On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
>>> <d.s.seljebotn at astro.uio.no> wrote:
>>>>
>>>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>>>>
>>>>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>>>>
>>>>>> with parallel.critical():
>>>>>> this section of code is mutually exclusive with other critical
>>>>>> sections
>>>>>> optional keyword argument 'name' specifies a name for the critical
>>>>>> section,
>>>>>> which means all sections with that name will exclude each other,
>>>>>> but not
>>>>>> critical sections with different names
>>>>>>
>>>>>> Note: all threads that encounter the section will execute it, just
>>>>>> not at the same time
>>>>
>
> On critical sections, I do feel string naming is rather un-Pythonic. I'd
> rather have
>
> lock_a = parallel.Mutex()
> lock_b = parallel.Mutex()
> with cython.parallel:
> ? ?with lock_a:
> ? ? ? ?...
> ? ?with lock_b:
> ? ? ? ?...
>
> This maps well to pthread mutexes, though much harder to map it to OpenMP...

For this low level, perhaps people should just be using the pthreads
library directly? Here I'm showing my ignorance: can that work with
OpenMP spawned threads? (Maybe a compatibility layer is required for
transparent Windows support.) Suppose one could write a context object
that did not require the GIL, then one could do

with MyContext():
   ...

in a nogil block, MyContext could be implemented by whoever on
whatever thread library, no special language support required.

> So my proposal is:
>
> ?a) parallel.Mutex() can take a string argument and then returns the same
> mutex each time for the same string, meaning you can do
>
> with parallel.Mutex("somename"):
>
> which maps directly to OpenMP.
>
> ?b) However, this does not make sense:
>
> with parallel.Mutex():
>
> because each thread would instantiate a *seperate* mutex. So raise compiler
> error ("Redundant code, thread will never block on fresh mutex")
>
> ?c) However, one can use a default global Mutex instance:
>
> with parallel.global_mutex
>
> (mapping to an un-named critical in OpenMP)
>
> This seems to be simple enough to implement, and allows generalizing to the
> advanced case above later (probably using pthreads/Windows directly).

Alternatively, let parallel.Mutex() be the global mutex, with some
other way of getting a new, unique mutex to pass around and use in
multiple places.

- Robert

From d.s.seljebotn at astro.uio.no  Wed Oct 12 11:24:45 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Wed, 12 Oct 2011 11:24:45 +0200
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>	<4E919100.8020801@astro.uio.no>
	<4E919A40.2090001@astro.uio.no>	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>	<4E955180.1070601@astro.uio.no>
	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
Message-ID: <4E955CDD.8060203@astro.uio.no>

On 10/12/2011 11:08 AM, Robert Bradshaw wrote:
> On Wed, Oct 12, 2011 at 1:36 AM, Dag Sverre Seljebotn
>> I wouldn't resist a builtin "channel" type in Cython (since we don't have
>> full templating/generics, it would be the only way of sending typed data
>> conveniently?).
>
> zeromq seems to be a nice level of abstraction--we could probably get
> far with a zeromq "overlay" module that didn't require the GIL. Or is
> the C API easy enough to use if we could provide convenient mechanisms
> to initialize the tasks/threads. I think perhaps the communication
> model could be solved by a library more easily than the treading
> model.

Ah, zeromq even has an in-process transport, so should work nicely for 
multithreading as well.

The main problem is that I'd like something like

ctypedef struct Msg:
     int what
     double when

cdef Msg msg
cdef channel[Msg] mychan = channel[msg](blocking=True, in_process=True)
with cython.parallel:
     ...
     if is_master():
         mychan.send(what=1, when=2.3)
     else:
         msg = mychan.recv()


Which one can't really do without either builtin support or templating 
support. One *could* implement it in C++...

C-level API just sends char* around, e.g.,

int zmq_msg_init_data (zmq_msg_t *msg, void *data, size_t size, 
zmq_free_fn *ffn, void *hint);

Dag Sverre

From stefan_ml at behnel.de  Wed Oct 12 14:03:14 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 12 Oct 2011 14:03:14 +0200
Subject: [Cython] Utilities, cython.h, libcython
In-Reply-To: <CANg26EWJm7ksg9WW2i5mhoW98fr-zNEmkhxCyfMuK4S=fwtssQ@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>	<CADiQ+QBaF22aWveZ75EcDcgekQPzihvkDPijEy15_-1cB7hKqA@mail.gmail.com>	<CANg26EU1gWG+Guj8hMRr=mGS62A6FdiPz2EVYjNPAhKruvd6-A@mail.gmail.com>	<CADiQ+QBrEgz00+TWLADQo--xhufTG5o2MPHNmSWTiT0-R1KJoA@mail.gmail.com>
	<CANg26EWJm7ksg9WW2i5mhoW98fr-zNEmkhxCyfMuK4S=fwtssQ@mail.gmail.com>
Message-ID: <4E958202.9000806@behnel.de>

mark florisson, 06.10.2011 11:45:
> On 6 October 2011 01:05, Robert Bradshaw wrote:
>> I'm not sure what the overhead is, if any, in calling function pointers vs.
>> actually linking things together at the C level (which is essentially the
>> same idea, but perhaps addresses are resolved at library load time rather
>> than requiring a dereference on each call?)
>
> I think there isn't any difference with dynamic linking and having a
> pointer. My understanding (of ELF shared libraries) is that the
> procedure lookup table will contain the actual address of the symbol
> (likely after the first reference to it has been made, it may have a
> stub that resolves the symbol and replaces it's own address with the
> actual address), which to me sounds like the same thing as a pointer.
> I think only static linking can prevent this, i.e. directly encode the
> static address into the call opcode, but I'm not an expert.

Even if it makes a slight difference that the CPU's branch prediction 
cannot cope with, it's still up to us to decide which code must be inside 
the module for performance reasons and which we can afford to move outside. 
Generally speaking, any code section that is large enough to be worth being 
moved into a separate library shouldn't notice any performance difference 
through an indirect call.

Stefan

From markflorisson88 at gmail.com  Wed Oct 12 16:00:13 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 12 Oct 2011 15:00:13 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CADiQ+QCCt2jTCPurmXyRAtV3egDoC+j7F_ajoKAk0Fh6X4MspA@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no> <4E955485.7060809@astro.uio.no>
	<CADiQ+QCCt2jTCPurmXyRAtV3egDoC+j7F_ajoKAk0Fh6X4MspA@mail.gmail.com>
Message-ID: <CANg26EUo6iyKz4R4N94qSOMZXMTm-4mgnf5cqYmweyzXPgsy8w@mail.gmail.com>

On 12 October 2011 10:20, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> On Wed, Oct 12, 2011 at 1:49 AM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>> On 10/12/2011 10:36 AM, Dag Sverre Seljebotn wrote:
>>>
>>> On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
>>>>
>>>> On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
>>>> <d.s.seljebotn at astro.uio.no> wrote:
>>>>>
>>>>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>>>>>
>>>>>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>>>>>
>>>>>>> with parallel.critical():
>>>>>>> this section of code is mutually exclusive with other critical
>>>>>>> sections
>>>>>>> optional keyword argument 'name' specifies a name for the critical
>>>>>>> section,
>>>>>>> which means all sections with that name will exclude each other,
>>>>>>> but not
>>>>>>> critical sections with different names
>>>>>>>
>>>>>>> Note: all threads that encounter the section will execute it, just
>>>>>>> not at the same time
>>>>>
>>
>> On critical sections, I do feel string naming is rather un-Pythonic. I'd
>> rather have
>>
>> lock_a = parallel.Mutex()
>> lock_b = parallel.Mutex()
>> with cython.parallel:
>> ? ?with lock_a:
>> ? ? ? ?...
>> ? ?with lock_b:
>> ? ? ? ?...
>>
>> This maps well to pthread mutexes, though much harder to map it to OpenMP...
>
> For this low level, perhaps people should just be using the pthreads
> library directly? Here I'm showing my ignorance: can that work with
> OpenMP spawned threads? (Maybe a compatibility layer is required for
> transparent Windows support.) Suppose one could write a context object
> that did not require the GIL, then one could do
>
> with MyContext():
> ? ...
>
> in a nogil block, MyContext could be implemented by whoever on
> whatever thread library, no special language support required.

Exactly, that's always possible. I myself very much like how critical
works, but if you want a more Pythonic-looking mutex, it might be
better to make that the user's burden. Otherwise we'd also have to
give it a type, make it compatible with code that doesn't have the
GIL, acquisition count it when passing it around, etc.

If your program doesn't even have other Python threads running, you
could even use 'with gil:' as a global synchronization.

The only good thing about named and unnamed critical sections is
really the convenience of writing it, and the resulting conciseness
(which imho, if you know how critical works, only adds to the code
readability).

However, not providing parallel.Mutex would mean people probably want
to resort to the goodies from the threading module, which would
ironically not be impossible because you'd need to GIL to use them :)
But we could recommend the PyThread_*_lock stuff in the documentation.

>> So my proposal is:
>>
>> ?a) parallel.Mutex() can take a string argument and then returns the same
>> mutex each time for the same string, meaning you can do
>>
>> with parallel.Mutex("somename"):
>>
>> which maps directly to OpenMP.
>>
>> ?b) However, this does not make sense:
>>
>> with parallel.Mutex():
>>
>> because each thread would instantiate a *seperate* mutex. So raise compiler
>> error ("Redundant code, thread will never block on fresh mutex")
>>
>> ?c) However, one can use a default global Mutex instance:
>>
>> with parallel.global_mutex
>>
>> (mapping to an un-named critical in OpenMP)
>>
>> This seems to be simple enough to implement, and allows generalizing to the
>> advanced case above later (probably using pthreads/Windows directly).
>
> Alternatively, let parallel.Mutex() be the global mutex, with some
> other way of getting a new, unique mutex to pass around and use in
> multiple places.
>
> - Robert
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Wed Oct 12 16:07:21 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 12 Oct 2011 15:07:21 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <4E955180.1070601@astro.uio.no>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
Message-ID: <CANg26EVXR-rSKUkWuL0JhdYNntJ5JX1GpHP8gtQrWa-U9zfdAg@mail.gmail.com>

On 12 October 2011 09:36, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
>>
>> On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> ?wrote:
>>>
>>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>>>
>>>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>>>
>>>>> Hey,
>>>>>
>>>>> So far people have been enthusiastic about the cython.parallel
>>>>> features,
>>>>> I think we should introduce some new features.
>>
>> Excellent. I think this is going to become a killer feature like
>> buffer support.
>>
>>>>> I propose the following,
>>>>
>>>> Great!!
>>>>
>>>> I only have time for a very short feedback now, perhaps more will
>>>> follow.
>>>>
>>>>> assume parallel has been imported from cython:
>>>>>
>>>>> with parallel.master():
>>>>> this is executed in the master thread in a parallel (non-prange)
>>>>> section
>>>>>
>>>>> with parallel.single():
>>>>> same as master, except any thread may do the execution
>>>>>
>>>>> An optional keyword argument 'nowait' specifies whether there will be a
>>>>> barrier at the end. The default is to wait.
>>>
>>> I like
>>>
>>> if parallel.is_master():
>>> ? ?...
>>> explicit_barrier_somehow() # see below
>>>
>>> better as a Pythonization. One could easily support is_master to be used
>>> in
>>> other contexts as well, simply by assigning a status flag in the master
>>> block.
>>
>> +1, the if statement feels a lot more natural.
>>
>>> Using an if-test flows much better with Python I feel, but that naturally
>>> lead to making the barrier explicit. But I like the barrier always being
>>> explicit, rather than having it as a predicate on all the different
>>> constructs like in OpenMP....
>>>
>>> I'm less sure about single, since making it a function indicates one
>>> could
>>> use it in other contexts and the whole thing becomes too magic (since
>>> it's
>>> tied to the position of invocation). I'm tempted to suggest
>>>
>>> for _ in prange(1):
>>> ? ?...
>>>
>>> as our syntax for single.
>
> Just to be clear: My point was that the above implements single behaviour
> even now, without any extra effort.

Right I got that. In the same way you could use

for _ in prange(0): pass

to get a barrier. I'm just saying that it looks pretty weird.

>>
>> The idea here is that you want a block of code executed once,
>> presumably by the first thread that gets here? I think this could also
>> be handled by a if statement, perhaps "if parallel.first()" or
>> something like that. Is there anything special about this construct
>> that couldn't simply be done by flushing/checking a variable?
>
> Good point. I think there's a problem with OpenMP that it has too many
> primitives for similar things.

Definitely.

> I'm -1 on single -- either using a for loop or flag+flush is more to type,
> but more readable to people who don't know cython.parallel (look: Python
> even makes "self." explicit -- the bias in language design is clearly on
> readability rather than writability).
>
> I thought of "if is_first()" as well, but my problem is again that it binds
> to the location of the call.
>
> if foo:
> ? ?if parallel.is_first():
> ? ? ? ?...
> else:
> ? ?if parallel.is_first():
> ? ? ? ?...
>
> can not be refactored to:
>
> if parallel.is_first():
> ? ?if foo:
> ? ? ? ?...
> ? ?else:
> ? ? ? ?...
>
> which I think is highly confusing for people who didn't write the code and
> don't know the details of cython.parallel. (Unlike is_master(), which works
> the same either way).
>
> I think we should aim for something that's as easy to read as possible for
> Python users with no cython.parallel knowledge.

That's a good point. I suppose single and master is not really needed,
so just master ("is_master") could be sufficient there.

>>
>>>>> with parallel.task():
>>>>> create a task to be executed by some thread in the team
>>>>> once a thread takes up the task it shall only be executed by that
>>>>> thread and no other thread (so the task will be tied to the thread)
>>>>>
>>>>> C variables will be firstprivate
>>>>> Python objects will be shared
>>>>>
>>>>> parallel.taskwait() # wait on any direct descendent tasks to finish
>>>>
>>>> Regarding tasks, I think this is mapping OpenMP too close to Python.
>>>> Closures are excellent for the notion of a task, so I think something
>>>> based on the futures API would work better. I realize that makes the
>>>> mapping to OpenMP and implementation a bit more difficult, but I think
>>>> it is worth it in the long run.
>>
>> It's almost as if you're reading my thoughts. There are much more
>> natural task APIs, e.g. futures or the way the Python
>> threading/multiprocessing does things.
>>
>>>>> with parallel.critical():
>>>>> this section of code is mutually exclusive with other critical sections
>>>>> optional keyword argument 'name' specifies a name for the critical
>>>>> section,
>>>>> which means all sections with that name will exclude each other,
>>>>> but not
>>>>> critical sections with different names
>>>>>
>>>>> Note: all threads that encounter the section will execute it, just
>>>>> not at the same time
>>>
>>> Yes, this works well as a with-statement...
>>>
>>> ..except that it is slightly magic in that it binds to call position
>>> (unlike
>>> anything in Python). I.e. this would be more "correct", or at least
>>> Pythonic:
>>>
>>> with parallel.critical(__file__, __line__):
>>> ? ?...
>
> Mark: I stand corrected on this point. +1 on your critical proposal.
>
>> This feels a lot like a lock, which of course fits well with the with
>> statement.
>>
>>>>> with parallel.barrier():
>>>>> all threads wait until everyone has reached the barrier
>>>>> either no one or everyone should encounter the barrier
>>>>> shared variables are flushed
>>>
>>> I have problems with requiring a noop with block...
>>>
>>> I'd much rather write
>>>
>>> parallel.barrier()
>>>
>>> However, that ties a function call to the place of invocation, and
>>> suggests
>>> that one could do
>>>
>>> if rand()> ?.5:
>>> ? ?barrier()
>>> else:
>>> ? ?i += 3
>>> ? ?barrier()
>>>
>>> and have the same barrier in each case. Again,
>>>
>>> barrier(__file__, __line__)
>>>
>>> gets us purity at the cost of practicality. Another way is the pthreads
>>> approach (although one may have to use pthread rather then OpenMP to get
>>> it,
>>> unless there are named barriers?):
>>>
>>> barrier_a = parallel.barrier()
>>> barrier_b = parallel.barrier()
>>> with parallel:
>>> ? ?barrier_a.wait()
>>> ? ?if rand()> ?.5:
>>> ? ? ? ?barrier_b.wait()
>>> ? ?else:
>>> ? ? ? ?i += 3
>>> ? ? ? ?barrier_b.wait()
>>>
>>>
>>> I'm really not sure here.
>>
>> I agree, the barrier doesn't seem like it belongs in a context. For
>> example, it's ambiguous whether the block is supposed to proceed or
>> succeed the barrier. I like the named barrier idea, but if that's not
>> feasible we could perhaps use control flow to disallow conditionally
>> calling barriers (or that every path calls the barrier (an equal
>> number of times?)).
>
> It is always an option to go beyond OpenMP. Pthread barriers are a lot more
> powerful in this way, and with pthread and Windows covered I think we should
> be good...
>
> IIUC, you can't have different path calling the barrier the same number of
> times, it's merely
>
> #pragma omp barrier
>
> and a seperate barrier statement gets another counter. Which is why I think
> it is not powerful enough and we should use pthreads.

I don't think we should quite jump to that conclusion. Indeed openmp
barriers may not do what we want, but I think you could implement
barriers yourself (I haven't looked at an implementation, but I think
a condition lock + OpenMP flush can do what you need).

Implementing all this in pthreads wouldn't be trivial and it would
also be hard to do portably for non-Posix systems, considering that
most Cython developers don't know much about/care a lot about windows
for instance.

>> +1. I like the idea of providing more parallelism constructs, but
>> rather than risk fixating on OpenMP's model, perhaps we should look at
>> the problem we're trying to solve (e.g., what can't one do well now)
>> and create (or more likely borrow) the right Pythonic API to do it.
>
> Also, quick and flexible message-passing between threads/processes through
> channels is becoming an increasingly popular concept. Go even has a seperate
> syntax for channel communication, and zeromq is becoming popular for
> distributed work.
>
> The is a problem Cython may need to solve here, since one currently has to
> use very low-level C to do it quickly (either zeromq or pthreads in most
> cases -- I guess, an OpenMP critical section would help in implementing a
> queue though).
>
> I wouldn't resist a builtin "channel" type in Cython (since we don't have
> full templating/generics, it would be the only way of sending typed data
> conveniently?).

I'm not sure if we should introduce more syntax, but what about
reusing arrays or memoryview slices? If you assign to elements or
subslices you send messages, if you read them but don't have the data
you get the messages (so the program which has the data will send it,
etc).

But really, I think this is a different beast all together. If you
want to do this then you must be sure to cover all aspects, otherwise
people will just use the respective libraries. I think if you really
want this kind of thing on a cluster, you'd be using fortran anyway
(maybe with co-arrays), and if you need to do distributed computing
you'd be using zeromq directly.

> I ultimately feel things like that is more important than 100% coverage of
> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.

Yeah I never wanted full OpenMP coverage, it's just the first
(easiest) thing that comes to mind, it's easy to implement and if
you're familiar with OpenMP, it makes sense.

It would also be easier to support orphaned worksharing in the future,
if we wanted. But I think that might just be even more confusing for
people.

> Dag Sverre
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Wed Oct 12 16:55:44 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 12 Oct 2011 15:55:44 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
Message-ID: <CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>

On 12 October 2011 10:08, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> On Wed, Oct 12, 2011 at 1:36 AM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>> On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
>>>> I'm less sure about single, since making it a function indicates one
>>>> could
>>>> use it in other contexts and the whole thing becomes too magic (since
>>>> it's
>>>> tied to the position of invocation). I'm tempted to suggest
>>>>
>>>> for _ in prange(1):
>>>> ? ?...
>>>>
>>>> as our syntax for single.
>>
>> Just to be clear: My point was that the above implements single behaviour
>> even now, without any extra effort.
>>
>>>
>>> The idea here is that you want a block of code executed once,
>>> presumably by the first thread that gets here? I think this could also
>>> be handled by a if statement, perhaps "if parallel.first()" or
>>> something like that. Is there anything special about this construct
>>> that couldn't simply be done by flushing/checking a variable?
>>
>> Good point. I think there's a problem with OpenMP that it has too many
>> primitives for similar things.
>>
>> I'm -1 on single -- either using a for loop or flag+flush is more to type,
>> but more readable to people who don't know cython.parallel (look: Python
>> even makes "self." explicit -- the bias in language design is clearly on
>> readability rather than writability).
>>
>> I thought of "if is_first()" as well, but my problem is again that it binds
>> to the location of the call.
>>
>> if foo:
>> ? ?if parallel.is_first():
>> ? ? ? ?...
>> else:
>> ? ?if parallel.is_first():
>> ? ? ? ?...
>>
>> can not be refactored to:
>>
>> if parallel.is_first():
>> ? ?if foo:
>> ? ? ? ?...
>> ? ?else:
>> ? ? ? ?...
>>
>> which I think is highly confusing for people who didn't write the code and
>> don't know the details of cython.parallel. (Unlike is_master(), which works
>> the same either way).
>>
>> I think we should aim for something that's as easy to read as possible for
>> Python users with no cython.parallel knowledge.
>
> Exactly. This is what's so beautiful about prange.
>
>>>>>> with parallel.barrier():
>>>>>> all threads wait until everyone has reached the barrier
>>>>>> either no one or everyone should encounter the barrier
>>>>>> shared variables are flushed
>>>>
>>>> I have problems with requiring a noop with block...
>>>>
>>>> I'd much rather write
>>>>
>>>> parallel.barrier()
>>>>
>>>> However, that ties a function call to the place of invocation, and
>>>> suggests
>>>> that one could do
>>>>
>>>> if rand()> ?.5:
>>>> ? ?barrier()
>>>> else:
>>>> ? ?i += 3
>>>> ? ?barrier()
>>>>
>>>> and have the same barrier in each case. Again,
>>>>
>>>> barrier(__file__, __line__)
>>>>
>>>> gets us purity at the cost of practicality. Another way is the pthreads
>>>> approach (although one may have to use pthread rather then OpenMP to get
>>>> it,
>>>> unless there are named barriers?):
>>>>
>>>> barrier_a = parallel.barrier()
>>>> barrier_b = parallel.barrier()
>>>> with parallel:
>>>> ? ?barrier_a.wait()
>>>> ? ?if rand()> ?.5:
>>>> ? ? ? ?barrier_b.wait()
>>>> ? ?else:
>>>> ? ? ? ?i += 3
>>>> ? ? ? ?barrier_b.wait()
>>>>
>>>>
>>>> I'm really not sure here.
>>>
>>> I agree, the barrier doesn't seem like it belongs in a context. For
>>> example, it's ambiguous whether the block is supposed to proceed or
>>> succeed the barrier. I like the named barrier idea, but if that's not
>>> feasible we could perhaps use control flow to disallow conditionally
>>> calling barriers (or that every path calls the barrier (an equal
>>> number of times?)).
>>
>> It is always an option to go beyond OpenMP. Pthread barriers are a lot more
>> powerful in this way, and with pthread and Windows covered I think we should
>> be good...
>>
>> IIUC, you can't have different path calling the barrier the same number of
>> times, it's merely
>>
>> #pragma omp barrier
>>
>> and a seperate barrier statement gets another counter.
>
> Makes sense, but this greatly restricts where we could use the OpenMP version.
>
>> Which is why I think
>> it is not powerful enough and we should use pthreads.
>>
>>> +1. I like the idea of providing more parallelism constructs, but
>>> rather than risk fixating on OpenMP's model, perhaps we should look at
>>> the problem we're trying to solve (e.g., what can't one do well now)
>>> and create (or more likely borrow) the right Pythonic API to do it.
>>
>> Also, quick and flexible message-passing between threads/processes through
>> channels is becoming an increasingly popular concept. Go even has a seperate
>> syntax for channel communication, and zeromq is becoming popular for
>> distributed work.
>>
>> The is a problem Cython may need to solve here, since one currently has to
>> use very low-level C to do it quickly (either zeromq or pthreads in most
>> cases -- I guess, an OpenMP critical section would help in implementing a
>> queue though).
>>
>> I wouldn't resist a builtin "channel" type in Cython (since we don't have
>> full templating/generics, it would be the only way of sending typed data
>> conveniently?).
>
> zeromq seems to be a nice level of abstraction--we could probably get
> far with a zeromq "overlay" module that didn't require the GIL. Or is
> the C API easy enough to use if we could provide convenient mechanisms
> to initialize the tasks/threads. I think perhaps the communication
> model could be solved by a library more easily than the treading
> model.
>
>> I ultimately feel things like that is more important than 100% coverage of
>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.
>
> +1 Prange handles the (corse-grained) SIMD case nicely, and a
> task/futures model based on closures would I think flesh this out to
> the next level of generality (and complexity).

Futures are definitely nice. I suppose I think really like "inline
futures", i.e. openmp tasks. I realize that futures may look more
pythonic. However, as mentioned previously, I also see issues with
that. When you submit a task then you expect a future object, which
you might want to pass around. But we don't have the GIL for that. I
personally feel that futures is something that should be done by a
library (such as concurrent.futures in python 3.2), and inline tasks
by a language. It also means I have to write an entire function or
closure for perhaps only a few lines of code.

I might also want to submit other functions that are not closures, or
I might want to reuse my closures that are used for tasks and for
something else. So what if my tasks contain more parallel constructs?
e.g. what if I have a task closure that I return from my function that
generates more tasks itself? Would you just execute them sequentially
outside of the parallel construct, or would you simply disallow that?
Also, do you restrict future "objects" to only the parallel section?

Another problem is that you can only wait on tasks of your direct
children. So what if I get access to my parent's future object
(assuming you allow tasks to generate tasks), and then want the result
of my parent?
Or what if I store these future objects in an array or list and access
them arbitrarily? You will only know at runtime which task to wait on,
and openmp only has a static, lexical taskwait.

I suppose my point is that without either a drastic rewrite (e.g., use
pthreads instead of openmp) or quite a bit of contraints, I am unsure
how futures would work here. Perhaps you guys have some concrete
syntax and semantics proposals?

> - Robert
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From stefan_ml at behnel.de  Wed Oct 12 21:52:07 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 12 Oct 2011 21:52:07 +0200
Subject: [Cython] PyCon-DE wrap-up by Kay Hayen
In-Reply-To: <CADiQ+QA9PAaQ+D0OagD4tAjMtnbgodoh_Lk3o4JLNQJ6kEd_8w@mail.gmail.com>
References: <4E91DB64.9050201@behnel.de>	<CANg26EX1+dMDvZfusjR2gXb3y=RyQYJud2Og7HWRO3S4QfiAyA@mail.gmail.com>	<4E92AF0B.6070905@behnel.de>
	<CADiQ+QA9PAaQ+D0OagD4tAjMtnbgodoh_Lk3o4JLNQJ6kEd_8w@mail.gmail.com>
Message-ID: <4E95EFE7.1020500@behnel.de>

Robert Bradshaw, 11.10.2011 08:11:
> Thanks for the update and link. Sounds like PyCon-DE went well.

More than that - here's my take on it:

http://blog.behnel.de/index.php?p=188

Stefan

From stefan_ml at behnel.de  Thu Oct 13 07:10:06 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 13 Oct 2011 07:10:06 +0200
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
References: <4E930C72.8080303@behnel.de>	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
Message-ID: <4E9672AE.6080905@behnel.de>

mark florisson, 12.10.2011 23:46:
>>>> On 10 October 2011 16:17, Stefan Behnel wrote:
>>>>> Jenkins currently reports several failures, and this one seems to be
>>>>> due to your tempita changes:
>>>>>
>>>> https://sage.math.washington.edu:8091/hudson/view/cython-devel/job/cython-devel-lxml-trunk/PYVERSION=py24/31/console
>>>>
>>>> Thanks! I'll try to fix that somewhere this week.

We should really get to the habit of not pushing changes to the master 
branch that turn out to be broken in the personal branches, or, if they 
appear to be ok and only turn out to break the master branch *after* 
pushing them (which is ok, we have Jenkins to tell us), revert them if a 
fix cannot be applied shortly, i.e. within a day or two at most.

It's very annoying when the master branch is broken for weeks in a row, 
especially since that means that it will keep attracting new failures due 
to the cover of already broken tests, which makes it much harder to 
pinpoint the commits that triggered them.


>> Is it me or are other builds broken as well?
>>
>> I pushed a fix for the tempita thing, but it seems the entire py3k build is
>> broken:
>>
>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console

It's not only the py3k tests, the build is broken in general. The problem 
here is that it only *shows* in the py3k tests because the Py2 builds do 
not bail out when one of the Cython modules fails to build. That needs 
fixing as well.


> I just cannot reproduce that error on my system, let me investigate it
> further.

My guess was that it's due to the innocent looking change that Robert did 
to enable type inference for the GeneralCallNode. It seems that there was a 
bit more to do here.

Stefan

From stefan_ml at behnel.de  Thu Oct 13 07:37:13 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 13 Oct 2011 07:37:13 +0200
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <4E9672AE.6080905@behnel.de>
References: <4E930C72.8080303@behnel.de>	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
Message-ID: <4E967909.2040407@behnel.de>

Stefan Behnel, 13.10.2011 07:10:
> mark florisson, 12.10.2011 23:46:
>>> Is it me or are other builds broken as well?
>>>
>>> I pushed a fix for the tempita thing, but it seems the entire py3k build is
>>> broken:
>>>
>>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console
>>>
>
> It's not only the py3k tests, the build is broken in general. The problem
> here is that it only *shows* in the py3k tests because the Py2 builds do
> not bail out when one of the Cython modules fails to build. That needs
> fixing as well.
>
>
>> I just cannot reproduce that error on my system, let me investigate it
>> further.
>
> My guess was that it's due to the innocent looking change that Robert did
> to enable type inference for the GeneralCallNode. It seems that there was a
> bit more to do here.

Now that I think about it - remember that the Jenkins builds use a source 
distribution to build, not a plain checkout. Maybe there's something wrong 
with the sdist? At least, I see several warnings about file patterns in 
MANIFEST.in that are not matched by any files:

"""
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.pyx' under directory 
'Cython/Debugger/Tests'
warning: no files found matching '*.pxd' under directory 
'Cython/Debugger/Tests'
warning: no files found matching '*.h' under directory 'Cython/Debugger/Tests'
warning: no files found matching '*.pxd' under directory 'Cython/Utility'
warning: no files found matching '*.h' under directory 'Cython/Utility'
warning: no files found matching '.cpp' under directory 'Cython/Utility'
"""

https://sage.math.washington.edu:8091/hudson/job/cython-devel-sdist/678/console

Also note that the build appears to choke on test utility code:

"""
Error compiling Cython file:
------------------------------------------------------------
...
cdef extern from *:
     cdef object __pyx_test_dep(object)

@cname('__pyx_TestClass')
cdef class TestClass(object):
     cdef public int value
                    ^
------------------------------------------------------------

TestClass:9:20: Compiler crash in AnalyseDeclarationsTransform
"""

https://sage.math.washington.edu:8091/hudson/job/cython-devel-build/56/PYVERSION=py3k/console

Mark, didn't you disable the loading of any test code during 'normal' 
builds? Maybe there's something broken on that front?

Stefan

From vitja.makarov at gmail.com  Thu Oct 13 08:03:55 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Thu, 13 Oct 2011 10:03:55 +0400
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <4E9672AE.6080905@behnel.de>
References: <4E930C72.8080303@behnel.de>
	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>
	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>
	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
Message-ID: <CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>

2011/10/13 Stefan Behnel <stefan_ml at behnel.de>:
> mark florisson, 12.10.2011 23:46:
>>>>>
>>>>> On 10 October 2011 16:17, Stefan Behnel wrote:
>>>>>>
>>>>>> Jenkins currently reports several failures, and this one seems to be
>>>>>> due to your tempita changes:
>>>>>>
>>>>>
>>>>> https://sage.math.washington.edu:8091/hudson/view/cython-devel/job/cython-devel-lxml-trunk/PYVERSION=py24/31/console
>>>>>
>>>>> Thanks! I'll try to fix that somewhere this week.
>
> We should really get to the habit of not pushing changes to the master
> branch that turn out to be broken in the personal branches, or, if they
> appear to be ok and only turn out to break the master branch *after* pushing
> them (which is ok, we have Jenkins to tell us), revert them if a fix cannot
> be applied shortly, i.e. within a day or two at most.
>
> It's very annoying when the master branch is broken for weeks in a row,
> especially since that means that it will keep attracting new failures due to
> the cover of already broken tests, which makes it much harder to pinpoint
> the commits that triggered them.
>

+1

>
>>> Is it me or are other builds broken as well?
>>>
>>> I pushed a fix for the tempita thing, but it seems the entire py3k build
>>> is
>>> broken:
>>>
>>>
>>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console
>
> It's not only the py3k tests, the build is broken in general. The problem
> here is that it only *shows* in the py3k tests because the Py2 builds do not
> bail out when one of the Cython modules fails to build. That needs fixing as
> well.
>
>
>> I just cannot reproduce that error on my system, let me investigate it
>> further.
>
> My guess was that it's due to the innocent looking change that Robert did to
> enable type inference for the GeneralCallNode. It seems that there was a bit
> more to do here.
>

I found that tempita bug goes away if you change language_level to 2.


-- 
vitja.

From robertwb at math.washington.edu  Thu Oct 13 09:26:37 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Thu, 13 Oct 2011 00:26:37 -0700
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <4E9672AE.6080905@behnel.de>
References: <4E930C72.8080303@behnel.de>
	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>
	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>
	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
Message-ID: <CADiQ+QARCbQZLv_mXrzjVTNQmxjr8xcyH4PV39rEn4J=frbyqg@mail.gmail.com>

On Wed, Oct 12, 2011 at 10:10 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 12.10.2011 23:46:
>>>>>
>>>>> On 10 October 2011 16:17, Stefan Behnel wrote:
>>>>>>
>>>>>> Jenkins currently reports several failures, and this one seems to be
>>>>>> due to your tempita changes:
>>>>>>
>>>>>
>>>>> https://sage.math.washington.edu:8091/hudson/view/cython-devel/job/cython-devel-lxml-trunk/PYVERSION=py24/31/console
>>>>>
>>>>> Thanks! I'll try to fix that somewhere this week.
>
> We should really get to the habit of not pushing changes to the master
> branch that turn out to be broken in the personal branches, or, if they
> appear to be ok and only turn out to break the master branch *after* pushing
> them (which is ok, we have Jenkins to tell us), revert them if a fix cannot
> be applied shortly, i.e. within a day or two at most.
>
> It's very annoying when the master branch is broken for weeks in a row,
> especially since that means that it will keep attracting new failures due to
> the cover of already broken tests, which makes it much harder to pinpoint
> the commits that triggered them.
>
>
>>> Is it me or are other builds broken as well?
>>>
>>> I pushed a fix for the tempita thing, but it seems the entire py3k build
>>> is
>>> broken:
>>>
>>>
>>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console
>
> It's not only the py3k tests, the build is broken in general. The problem
> here is that it only *shows* in the py3k tests because the Py2 builds do not
> bail out when one of the Cython modules fails to build. That needs fixing as
> well.
>
>
>> I just cannot reproduce that error on my system, let me investigate it
>> further.
>
> My guess was that it's due to the innocent looking change that Robert did to
> enable type inference for the GeneralCallNode. It seems that there was a bit
> more to do here.

This has been rolled back, but that didn't fix things...

In other news, I finally set up a set of jenkins jobs for my github
branch, because I agree it's super annoying to have a broken build for
a long time. Still puzzled by this one though...

- Robert

From stefan_ml at behnel.de  Thu Oct 13 10:05:09 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 13 Oct 2011 10:05:09 +0200
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <CADiQ+QARCbQZLv_mXrzjVTNQmxjr8xcyH4PV39rEn4J=frbyqg@mail.gmail.com>
References: <4E930C72.8080303@behnel.de>	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>	<4E9672AE.6080905@behnel.de>
	<CADiQ+QARCbQZLv_mXrzjVTNQmxjr8xcyH4PV39rEn4J=frbyqg@mail.gmail.com>
Message-ID: <4E969BB5.6060400@behnel.de>

Robert Bradshaw, 13.10.2011 09:26:
> On Wed, Oct 12, 2011 at 10:10 PM, Stefan Behnel wrote:
>> mark florisson, 12.10.2011 23:46:
>>>> Is it me or are other builds broken as well?
>>>>
>>>> I pushed a fix for the tempita thing, but it seems the entire py3k build
>>>> is broken:
>>>>
>>>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console
>>
>> It's not only the py3k tests, the build is broken in general. The problem
>> here is that it only *shows* in the py3k tests because the Py2 builds do not
>> bail out when one of the Cython modules fails to build. That needs fixing as
>> well.
>>
>> My guess was that it's due to the innocent looking change that Robert did to
>> enable type inference for the GeneralCallNode. It seems that there was a bit
>> more to do here.
>
> This has been rolled back, but that didn't fix things...

Hmm, ok, sorry then. That's the kind of thing I meant when I said that it 
becomes hard to pinpoint bugs when things are broken already. That change 
was the only functional change before the build broke in Jenkins...

Stefan

From stefan_ml at behnel.de  Thu Oct 13 10:53:48 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 13 Oct 2011 10:53:48 +0200
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <4E9672AE.6080905@behnel.de>
References: <4E930C72.8080303@behnel.de>	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
Message-ID: <4E96A71C.1030504@behnel.de>

Stefan Behnel, 13.10.2011 07:10:
> mark florisson, 12.10.2011 23:46:
>>> Is it me or are other builds broken as well?
>>>
>>> I pushed a fix for the tempita thing, but it seems the entire py3k build is
>>> broken:
>>>
>>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console
>
> It's not only the py3k tests, the build is broken in general.

I take that back. I thought I had seen failures in other versions, too, but 
that might have been in older builds. Currently, it is only broken in the 
py3k branch, which opens up the possibility that it has something to do 
with the large rewrites that recently went into CPython, specifically (but 
not necessarily limited to) the unicode changes for PEP393.

I disabled the py3k builds for now and that at least gets the other builds 
through. I still see the tempita bug in Py2.4, though:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py24/47/console

Stefan

From stefan_ml at behnel.de  Thu Oct 13 11:01:34 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 13 Oct 2011 11:01:34 +0200
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>
References: <4E930C72.8080303@behnel.de>	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>	<4E9672AE.6080905@behnel.de>
	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>
Message-ID: <4E96A8EE.4070701@behnel.de>

Vitja Makarov, 13.10.2011 08:03:
> I found that tempita bug goes away if you change language_level to 2.

There's no language level configured in Py2.4, which fails.

https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console

Stefan

From markflorisson88 at gmail.com  Thu Oct 13 11:06:25 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 13 Oct 2011 10:06:25 +0100
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <4E9672AE.6080905@behnel.de>
References: <4E930C72.8080303@behnel.de>
	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>
	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>
	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
Message-ID: <CANg26EW+o96YB+JxkJomVxZc4X6o73SfXwCK01cdk7EPTf=Npw@mail.gmail.com>

On 13 October 2011 06:10, Stefan Behnel <stefan_ml at behnel.de> wrote:
>
> mark florisson, 12.10.2011 23:46:
>>>>>
>>>>> On 10 October 2011 16:17, Stefan Behnel wrote:
>>>>>>
>>>>>> Jenkins currently reports several failures, and this one seems to be
>>>>>> due to your tempita changes:
>>>>>>
>>>>> https://sage.math.washington.edu:8091/hudson/view/cython-devel/job/cython-devel-lxml-trunk/PYVERSION=py24/31/console
>>>>>
>>>>> Thanks! I'll try to fix that somewhere this week.
>
> We should really get to the habit of not pushing changes to the master branch that turn out to be broken in the personal branches, or, if they appear to be ok and only turn out to break the master branch *after* pushing them (which is ok, we have Jenkins to tell us), revert them if a fix cannot be applied shortly, i.e. within a day or two at most.
>
> It's very annoying when the master branch is broken for weeks in a row, especially since that means that it will keep attracting new failures due to the cover of already broken tests, which makes it much harder to pinpoint the commits that triggered them.
>

Yes I totally agree. The thing is that memoryviews on hudson were
rebased on the latest master and my Jenkins was entirely blue. So I
merged them, I don't recall checking the cython-devel-tests results,
but I think it might have only been 2.4 failing with the tempita
stuff. Unfortunately I only have a 2.3 build that is perpetually
broken on Jenkins.

At some point my fused types py3k build also got broken after merging
stuff in from  master. None of it is reproducible on my machine
though.

>>> Is it me or are other builds broken as well?
>>>
>>> I pushed a fix for the tempita thing, but it seems the entire py3k build is
>>> broken:
>>>
>>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console
>
> It's not only the py3k tests, the build is broken in general. The problem here is that it only *shows* in the py3k tests because the Py2 builds do not bail out when one of the Cython modules fails to build. That needs fixing as well.
>
>
>> I just cannot reproduce that error on my system, let me investigate it
>> further.
>
> My guess was that it's due to the innocent looking change that Robert did to enable type inference for the GeneralCallNode. It seems that there was a bit more to do here.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From markflorisson88 at gmail.com  Thu Oct 13 11:10:42 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 13 Oct 2011 10:10:42 +0100
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <4E96A71C.1030504@behnel.de>
References: <4E930C72.8080303@behnel.de>
	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>
	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>
	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de> <4E96A71C.1030504@behnel.de>
Message-ID: <CANg26EVFpBCo_4jOwgN47tJyVqc8gT8wbnf=XTSC-r=pRTqYJA@mail.gmail.com>

On 13 October 2011 09:53, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Stefan Behnel, 13.10.2011 07:10:
>>
>> mark florisson, 12.10.2011 23:46:
>>>>
>>>> Is it me or are other builds broken as well?
>>>>
>>>> I pushed a fix for the tempita thing, but it seems the entire py3k build
>>>> is
>>>> broken:
>>>>
>>>>
>>>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console
>>
>> It's not only the py3k tests, the build is broken in general.
>
> I take that back. I thought I had seen failures in other versions, too, but
> that might have been in older builds. Currently, it is only broken in the
> py3k branch, which opens up the possibility that it has something to do with
> the large rewrites that recently went into CPython, specifically (but not
> necessarily limited to) the unicode changes for PEP393.
>
> I disabled the py3k builds for now and that at least gets the other builds
> through. I still see the tempita bug in Py2.4, though:
>
> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py24/47/console
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

Should we just use stable CPython only? It's confusing to see failing
test suites just because CPython might break (or even if it doesn't,
you might be thinking it does).
Tempita also works fine on my system, I pushed a fix for that. It
seems there's a problem with the memoryview tests in 2.4 though,
because the PyBUF_* flags aren't available there. I'll try to add a
2.4 build to my Jenkins.

From stefan_ml at behnel.de  Thu Oct 13 11:23:08 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 13 Oct 2011 11:23:08 +0200
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <CANg26EVFpBCo_4jOwgN47tJyVqc8gT8wbnf=XTSC-r=pRTqYJA@mail.gmail.com>
References: <4E930C72.8080303@behnel.de>	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>	<4E9672AE.6080905@behnel.de>
	<4E96A71C.1030504@behnel.de>
	<CANg26EVFpBCo_4jOwgN47tJyVqc8gT8wbnf=XTSC-r=pRTqYJA@mail.gmail.com>
Message-ID: <4E96ADFC.3070706@behnel.de>

mark florisson, 13.10.2011 11:10:
> On 13 October 2011 09:53, Stefan Behnel wrote:
>> Stefan Behnel, 13.10.2011 07:10:
>>> mark florisson, 12.10.2011 23:46:
>>>>>
>>>>> Is it me or are other builds broken as well?
>>>>>
>>>>> I pushed a fix for the tempita thing, but it seems the entire py3k build
>>>>> is broken:
>>>>>
>>>>> https://sage.math.washington.edu:8091/hudson/view/All/job/cython-devel-build/54/PYVERSION=py3k/console
>>>
>>> It's not only the py3k tests, the build is broken in general.
>>
>> I take that back. I thought I had seen failures in other versions, too, but
>> that might have been in older builds. Currently, it is only broken in the
>> py3k branch, which opens up the possibility that it has something to do with
>> the large rewrites that recently went into CPython, specifically (but not
>> necessarily limited to) the unicode changes for PEP393.
>>
>> I disabled the py3k builds for now and that at least gets the other builds
>> through. I still see the tempita bug in Py2.4, though:
>>
>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py24/47/console
>
> Should we just use stable CPython only? It's confusing to see failing
> test suites just because CPython might break (or even if it doesn't,
> you might be thinking it does).

Well, it's rare that CPython is *that* broken, and it's good for us to see 
quickly when it breaks because of our own code. It's also good if we can 
report bugs to python-dev before they consider everything fine because they 
lack a test.


> Tempita also works fine on my system, I pushed a fix for that. It
> seems there's a problem with the memoryview tests in 2.4 though,
> because the PyBUF_* flags aren't available there. I'll try to add a
> 2.4 build to my Jenkins.

You should just copy the cython-devel jobs. They are much friendlier to set 
up and change.

Stefan

From vitja.makarov at gmail.com  Thu Oct 13 11:53:01 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Thu, 13 Oct 2011 13:53:01 +0400
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <4E96A8EE.4070701@behnel.de>
References: <4E930C72.8080303@behnel.de>
	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>
	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>
	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>
	<4E96A8EE.4070701@behnel.de>
Message-ID: <CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>

2011/10/13 Stefan Behnel <stefan_ml at behnel.de>:
> Vitja Makarov, 13.10.2011 08:03:
>>
>> I found that tempita bug goes away if you change language_level to 2.
>
> There's no language level configured in Py2.4, which fails.
>
> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console
>

No, I mean language level 3 is set at top of the Code.py, when it's
set to 2 Py2.4 build is okay.


-- 
vitja.

From markflorisson88 at gmail.com  Thu Oct 13 11:56:42 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 13 Oct 2011 10:56:42 +0100
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>
References: <4E930C72.8080303@behnel.de>
	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>
	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>
	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>
	<4E96A8EE.4070701@behnel.de>
	<CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>
Message-ID: <CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>

On 13 October 2011 10:53, Vitja Makarov <vitja.makarov at gmail.com> wrote:
> 2011/10/13 Stefan Behnel <stefan_ml at behnel.de>:
>> Vitja Makarov, 13.10.2011 08:03:
>>>
>>> I found that tempita bug goes away if you change language_level to 2.
>>
>> There's no language level configured in Py2.4, which fails.
>>
>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console
>>
>
> No, I mean language level 3 is set at top of the Code.py, when it's
> set to 2 Py2.4 build is okay.
>
>
> --
> vitja.
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

Ah, it doesn't take unicode keyword arguments. That should be fixed.

From markflorisson88 at gmail.com  Thu Oct 13 12:18:37 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 13 Oct 2011 11:18:37 +0100
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>
References: <4E930C72.8080303@behnel.de>
	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>
	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>
	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>
	<4E96A8EE.4070701@behnel.de>
	<CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>
	<CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>
Message-ID: <CANg26EVqTPfqg7AVs5AQEGEsysqAcjvF94zWd-pD6CKiwVzz+g@mail.gmail.com>

On 13 October 2011 10:56, mark florisson <markflorisson88 at gmail.com> wrote:
> On 13 October 2011 10:53, Vitja Makarov <vitja.makarov at gmail.com> wrote:
>> 2011/10/13 Stefan Behnel <stefan_ml at behnel.de>:
>>> Vitja Makarov, 13.10.2011 08:03:
>>>>
>>>> I found that tempita bug goes away if you change language_level to 2.
>>>
>>> There's no language level configured in Py2.4, which fails.
>>>
>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console
>>>
>>
>> No, I mean language level 3 is set at top of the Code.py, when it's
>> set to 2 Py2.4 build is okay.
>>
>>
>> --
>> vitja.
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>
> Ah, it doesn't take unicode keyword arguments. That should be fixed.
>

Frankly, language level 3 is rather uncomfortable to deal with in
python 2(.4). Any reason it's set to 3? I'll try reverting to 2 and
pushing.

From stefan_ml at behnel.de  Thu Oct 13 13:44:28 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 13 Oct 2011 13:44:28 +0200
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <CANg26EVqTPfqg7AVs5AQEGEsysqAcjvF94zWd-pD6CKiwVzz+g@mail.gmail.com>
References: <4E930C72.8080303@behnel.de>	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>	<4E9672AE.6080905@behnel.de>	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>	<4E96A8EE.4070701@behnel.de>	<CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>	<CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>
	<CANg26EVqTPfqg7AVs5AQEGEsysqAcjvF94zWd-pD6CKiwVzz+g@mail.gmail.com>
Message-ID: <4E96CF1C.1000400@behnel.de>

mark florisson, 13.10.2011 12:18:
> On 13 October 2011 10:56, mark florisson wrote:
>> On 13 October 2011 10:53, Vitja Makarov wrote:
>>> 2011/10/13 Stefan Behnel:
>>>> Vitja Makarov, 13.10.2011 08:03:
>>>>>
>>>>> I found that tempita bug goes away if you change language_level to 2.
>>>>
>>>> There's no language level configured in Py2.4, which fails.
>>>>
>>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console
>>>
>>> No, I mean language level 3 is set at top of the Code.py, when it's
>>> set to 2 Py2.4 build is okay.
>>>
>> Ah, it doesn't take unicode keyword arguments. That should be fixed.
>
> Frankly, language level 3 is rather uncomfortable to deal with in
> python 2(.4).

Well, without the parentheses, I presume ...


> Any reason it's set to 3?

Mainly for performance reasons, especially in Python 2. Py3 code tends to 
run faster in Cython due to more explicit semantics. In particular, we get 
unicode content in and write unicode content out, so using unicode literals 
in the source code right away saves a decoding step for each write or 
interpolation of a literal string in Python 2. It won't make a difference 
when running Cython in Python 3, but it saves a lot of unnecessary 
processing cycles in Py2, even though the difference may not be substantial 
over a complete run. It's just so convenient to switch the language level 
and let that shave off a bunch of processing overhead that I didn't see a 
reason not to do it.

I doubt that it'll make a functional difference, though, so if it works 
better without that option, we may have to go back to Py2 compilation.

Stefan

From markflorisson88 at gmail.com  Thu Oct 13 13:52:07 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 13 Oct 2011 12:52:07 +0100
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <4E96CF1C.1000400@behnel.de>
References: <4E930C72.8080303@behnel.de>
	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>
	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>
	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>
	<4E96A8EE.4070701@behnel.de>
	<CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>
	<CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>
	<CANg26EVqTPfqg7AVs5AQEGEsysqAcjvF94zWd-pD6CKiwVzz+g@mail.gmail.com>
	<4E96CF1C.1000400@behnel.de>
Message-ID: <CANg26EVaApuuy98uhoaZhyJ0bdUcm2ijNY7FobT3c=5kmX=8gQ@mail.gmail.com>

On 13 October 2011 12:44, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 13.10.2011 12:18:
>>
>> On 13 October 2011 10:56, mark florisson wrote:
>>>
>>> On 13 October 2011 10:53, Vitja Makarov wrote:
>>>>
>>>> 2011/10/13 Stefan Behnel:
>>>>>
>>>>> Vitja Makarov, 13.10.2011 08:03:
>>>>>>
>>>>>> I found that tempita bug goes away if you change language_level to 2.
>>>>>
>>>>> There's no language level configured in Py2.4, which fails.
>>>>>
>>>>>
>>>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console
>>>>
>>>> No, I mean language level 3 is set at top of the Code.py, when it's
>>>> set to 2 Py2.4 build is okay.
>>>>
>>> Ah, it doesn't take unicode keyword arguments. That should be fixed.
>>
>> Frankly, language level 3 is rather uncomfortable to deal with in
>> python 2(.4).
>
> Well, without the parentheses, I presume ...

Ah, it appears only 2.7 eats unicode keyword arguments. I wonder why
the 2.5 and 2.6 builds didn't fail then.

>
>> Any reason it's set to 3?
>
> Mainly for performance reasons, especially in Python 2. Py3 code tends to
> run faster in Cython due to more explicit semantics. In particular, we get
> unicode content in and write unicode content out, so using unicode literals
> in the source code right away saves a decoding step for each write or
> interpolation of a literal string in Python 2. It won't make a difference
> when running Cython in Python 3, but it saves a lot of unnecessary
> processing cycles in Py2, even though the difference may not be substantial
> over a complete run. It's just so convenient to switch the language level
> and let that shave off a bunch of processing overhead that I didn't see a
> reason not to do it.
>
> I doubt that it'll make a functional difference, though, so if it works
> better without that option, we may have to go back to Py2 compilation.

I see. Yeah it's sort of hard to fix, as I really need bytes in python
2 and really need unicode (str) in python 3, so I can neither write
'foo' nor b'foo' nor u'foo' with language level 3.

BTW this is always a real problem in doctests too, as your bytestrings
will suddenly be printed as b'foo' in python 3, which will fail your
doctest. So to make it work you need to do explicit encoding/decoding
to make it work everywhere.

> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Thu Oct 13 13:54:12 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 13 Oct 2011 12:54:12 +0100
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <CANg26EVaApuuy98uhoaZhyJ0bdUcm2ijNY7FobT3c=5kmX=8gQ@mail.gmail.com>
References: <4E930C72.8080303@behnel.de>
	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>
	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>
	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>
	<4E96A8EE.4070701@behnel.de>
	<CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>
	<CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>
	<CANg26EVqTPfqg7AVs5AQEGEsysqAcjvF94zWd-pD6CKiwVzz+g@mail.gmail.com>
	<4E96CF1C.1000400@behnel.de>
	<CANg26EVaApuuy98uhoaZhyJ0bdUcm2ijNY7FobT3c=5kmX=8gQ@mail.gmail.com>
Message-ID: <CANg26EXykLVnGNezt=qFCm6Q7ciTOew0i0tGFHhcmqQZYi_VmA@mail.gmail.com>

On 13 October 2011 12:52, mark florisson <markflorisson88 at gmail.com> wrote:
> On 13 October 2011 12:44, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> mark florisson, 13.10.2011 12:18:
>>>
>>> On 13 October 2011 10:56, mark florisson wrote:
>>>>
>>>> On 13 October 2011 10:53, Vitja Makarov wrote:
>>>>>
>>>>> 2011/10/13 Stefan Behnel:
>>>>>>
>>>>>> Vitja Makarov, 13.10.2011 08:03:
>>>>>>>
>>>>>>> I found that tempita bug goes away if you change language_level to 2.
>>>>>>
>>>>>> There's no language level configured in Py2.4, which fails.
>>>>>>
>>>>>>
>>>>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console
>>>>>
>>>>> No, I mean language level 3 is set at top of the Code.py, when it's
>>>>> set to 2 Py2.4 build is okay.
>>>>>
>>>> Ah, it doesn't take unicode keyword arguments. That should be fixed.
>>>
>>> Frankly, language level 3 is rather uncomfortable to deal with in
>>> python 2(.4).
>>
>> Well, without the parentheses, I presume ...
>
> Ah, it appears only 2.7 eats unicode keyword arguments. I wonder why
> the 2.5 and 2.6 builds didn't fail then.
>
>>
>>> Any reason it's set to 3?
>>
>> Mainly for performance reasons, especially in Python 2. Py3 code tends to
>> run faster in Cython due to more explicit semantics. In particular, we get
>> unicode content in and write unicode content out, so using unicode literals
>> in the source code right away saves a decoding step for each write or
>> interpolation of a literal string in Python 2. It won't make a difference
>> when running Cython in Python 3, but it saves a lot of unnecessary
>> processing cycles in Py2, even though the difference may not be substantial
>> over a complete run. It's just so convenient to switch the language level
>> and let that shave off a bunch of processing overhead that I didn't see a
>> reason not to do it.
>>
>> I doubt that it'll make a functional difference, though, so if it works
>> better without that option, we may have to go back to Py2 compilation.
>
> I see. Yeah it's sort of hard to fix, as I really need bytes in python
> 2 and really need unicode (str) in python 3, so I can neither write
> 'foo' nor b'foo' nor u'foo' with language level 3.
>
> BTW this is always a real problem in doctests too, as your bytestrings
> will suddenly be printed as b'foo' in python 3, which will fail your
> doctest. So to make it work you need to do explicit encoding/decoding
> to make it work everywhere.
>
>> Stefan
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>

Anyway, I fixed the 2.4 build and cherry-picked the cython scope
loading fix over from fused types, I'll push that to master.

From stefan_ml at behnel.de  Thu Oct 13 14:07:23 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 13 Oct 2011 14:07:23 +0200
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <CANg26EVaApuuy98uhoaZhyJ0bdUcm2ijNY7FobT3c=5kmX=8gQ@mail.gmail.com>
References: <4E930C72.8080303@behnel.de>	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>	<4E9672AE.6080905@behnel.de>	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>	<4E96A8EE.4070701@behnel.de>	<CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>	<CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>	<CANg26EVqTPfqg7AVs5AQEGEsysqAcjvF94zWd-pD6CKiwVzz+g@mail.gmail.com>	<4E96CF1C.1000400@behnel.de>
	<CANg26EVaApuuy98uhoaZhyJ0bdUcm2ijNY7FobT3c=5kmX=8gQ@mail.gmail.com>
Message-ID: <4E96D47B.808@behnel.de>

mark florisson, 13.10.2011 13:52:
> On 13 October 2011 12:44, Stefan Behnel wrote:
>> mark florisson, 13.10.2011 12:18:
>>> On 13 October 2011 10:56, mark florisson wrote:
>>>> On 13 October 2011 10:53, Vitja Makarov wrote:
>>>>> 2011/10/13 Stefan Behnel:
>>>>>> Vitja Makarov, 13.10.2011 08:03:
>>>>>>> I found that tempita bug goes away if you change language_level to 2.
>>>>>>
>>>>>> There's no language level configured in Py2.4, which fails.
>>>>>>
>>>>>>
>>>>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console
>>>>>
>>>>> No, I mean language level 3 is set at top of the Code.py, when it's
>>>>> set to 2 Py2.4 build is okay.
>>>>>
>>>> Ah, it doesn't take unicode keyword arguments. That should be fixed.
>>>
>>> Frankly, language level 3 is rather uncomfortable to deal with in
>>> python 2(.4).
>>
>> Well, without the parentheses, I presume ...
>
> Ah, it appears only 2.7 eats unicode keyword arguments. I wonder why
> the 2.5 and 2.6 builds didn't fail then.

Ah, right, I remember having to work around this in the C code at some 
point. That's where the "identifier" kind of string in Cython originated from.


>>> Any reason it's set to 3?
>>
>> Mainly for performance reasons, especially in Python 2. Py3 code tends to
>> run faster in Cython due to more explicit semantics. In particular, we get
>> unicode content in and write unicode content out, so using unicode literals
>> in the source code right away saves a decoding step for each write or
>> interpolation of a literal string in Python 2. It won't make a difference
>> when running Cython in Python 3, but it saves a lot of unnecessary
>> processing cycles in Py2, even though the difference may not be substantial
>> over a complete run. It's just so convenient to switch the language level
>> and let that shave off a bunch of processing overhead that I didn't see a
>> reason not to do it.
>>
>> I doubt that it'll make a functional difference, though, so if it works
>> better without that option, we may have to go back to Py2 compilation.
>
> I see. Yeah it's sort of hard to fix, as I really need bytes in python
> 2 and really need unicode (str) in python 3, so I can neither write
> 'foo' nor b'foo' nor u'foo' with language level 3.

You can either pass the keyword arguments explicitly in the code or use 
something like dict(foo=1) to get a dict of keyword arguments (also works 
in Py2.4). Cython will turn the names into identifier strings, i.e. bytes 
in Py2 and unicode in Py3, as required for keyword arguments.


> BTW this is always a real problem in doctests too, as your bytestrings
> will suddenly be printed as b'foo' in python 3, which will fail your
> doctest. So to make it work you need to do explicit encoding/decoding
> to make it work everywhere.

I usually either wrap them in a helper function or, as you say, put a 
.decode() at the end. However, that fails to test explicitly for a byte 
string in Py2, as .decode() also tends to work for ASCII-only unicode 
strings there...

Let's hope that Py2 won't take a decade to die.

Stefan

From vitja.makarov at gmail.com  Thu Oct 13 20:33:34 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Thu, 13 Oct 2011 22:33:34 +0400
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <CANg26EXykLVnGNezt=qFCm6Q7ciTOew0i0tGFHhcmqQZYi_VmA@mail.gmail.com>
References: <4E930C72.8080303@behnel.de>
	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>
	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>
	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>
	<4E96A8EE.4070701@behnel.de>
	<CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>
	<CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>
	<CANg26EVqTPfqg7AVs5AQEGEsysqAcjvF94zWd-pD6CKiwVzz+g@mail.gmail.com>
	<4E96CF1C.1000400@behnel.de>
	<CANg26EVaApuuy98uhoaZhyJ0bdUcm2ijNY7FobT3c=5kmX=8gQ@mail.gmail.com>
	<CANg26EXykLVnGNezt=qFCm6Q7ciTOew0i0tGFHhcmqQZYi_VmA@mail.gmail.com>
Message-ID: <CAKGHGPQoOjRXjQEEorqTx9QOB=t5Ehh4Bao=cC7gD1WYf9GWAg@mail.gmail.com>

2011/10/13 mark florisson <markflorisson88 at gmail.com>:
> On 13 October 2011 12:52, mark florisson <markflorisson88 at gmail.com> wrote:
>> On 13 October 2011 12:44, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>> mark florisson, 13.10.2011 12:18:
>>>>
>>>> On 13 October 2011 10:56, mark florisson wrote:
>>>>>
>>>>> On 13 October 2011 10:53, Vitja Makarov wrote:
>>>>>>
>>>>>> 2011/10/13 Stefan Behnel:
>>>>>>>
>>>>>>> Vitja Makarov, 13.10.2011 08:03:
>>>>>>>>
>>>>>>>> I found that tempita bug goes away if you change language_level to 2.
>>>>>>>
>>>>>>> There's no language level configured in Py2.4, which fails.
>>>>>>>
>>>>>>>
>>>>>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/48/BACKEND=c,PYVERSION=py24/console
>>>>>>
>>>>>> No, I mean language level 3 is set at top of the Code.py, when it's
>>>>>> set to 2 Py2.4 build is okay.
>>>>>>
>>>>> Ah, it doesn't take unicode keyword arguments. That should be fixed.
>>>>
>>>> Frankly, language level 3 is rather uncomfortable to deal with in
>>>> python 2(.4).
>>>
>>> Well, without the parentheses, I presume ...
>>
>> Ah, it appears only 2.7 eats unicode keyword arguments. I wonder why
>> the 2.5 and 2.6 builds didn't fail then.
>>
>>>
>>>> Any reason it's set to 3?
>>>
>>> Mainly for performance reasons, especially in Python 2. Py3 code tends to
>>> run faster in Cython due to more explicit semantics. In particular, we get
>>> unicode content in and write unicode content out, so using unicode literals
>>> in the source code right away saves a decoding step for each write or
>>> interpolation of a literal string in Python 2. It won't make a difference
>>> when running Cython in Python 3, but it saves a lot of unnecessary
>>> processing cycles in Py2, even though the difference may not be substantial
>>> over a complete run. It's just so convenient to switch the language level
>>> and let that shave off a bunch of processing overhead that I didn't see a
>>> reason not to do it.
>>>
>>> I doubt that it'll make a functional difference, though, so if it works
>>> better without that option, we may have to go back to Py2 compilation.
>>
>> I see. Yeah it's sort of hard to fix, as I really need bytes in python
>> 2 and really need unicode (str) in python 3, so I can neither write
>> 'foo' nor b'foo' nor u'foo' with language level 3.
>>
>> BTW this is always a real problem in doctests too, as your bytestrings
>> will suddenly be printed as b'foo' in python 3, which will fail your
>> doctest. So to make it work you need to do explicit encoding/decoding
>> to make it work everywhere.
>>
>>> Stefan
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>>
>
> Anyway, I fixed the 2.4 build and cherry-picked the cython scope
> loading fix over from fused types, I'll push that to master.


Cool!

But py3k pyregr is no red due to SIGSEGV, is that python problem:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py3k/26/console

-- 
vitja.

From stefan_ml at behnel.de  Thu Oct 13 21:22:33 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 13 Oct 2011 21:22:33 +0200
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <CAKGHGPQoOjRXjQEEorqTx9QOB=t5Ehh4Bao=cC7gD1WYf9GWAg@mail.gmail.com>
References: <4E930C72.8080303@behnel.de>	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>	<4E9672AE.6080905@behnel.de>	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>	<4E96A8EE.4070701@behnel.de>	<CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>	<CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>	<CANg26EVqTPfqg7AVs5AQEGEsysqAcjvF94zWd-pD6CKiwVzz+g@mail.gmail.com>	<4E96CF1C.1000400@behnel.de>	<CANg26EVaApuuy98uhoaZhyJ0bdUcm2ijNY7FobT3c=5kmX=8gQ@mail.gmail.com>	<CANg26EXykLVnGNezt=qFCm6Q7ciTOew0i0tGFHhcmqQZYi_VmA@mail.gmail.com>
	<CAKGHGPQoOjRXjQEEorqTx9QOB=t5Ehh4Bao=cC7gD1WYf9GWAg@mail.gmail.com>
Message-ID: <4E973A79.4030504@behnel.de>

Vitja Makarov, 13.10.2011 20:33:
> But py3k pyregr is no red due to SIGSEGV, is that python problem:
>
> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py3k/26/console

Not sure, but rather likely. The PEP393 implementation is still being 
worked on. That also makes it still a moving target to which the current 
code in Cython may not fit exactly. Worth investigating. I'll see if I find 
a bit of time for updating my local py3k installation this weekend to run 
some tests with it.

Stefan

From markflorisson88 at gmail.com  Thu Oct 13 22:37:30 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 13 Oct 2011 21:37:30 +0100
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <4E973A79.4030504@behnel.de>
References: <4E930C72.8080303@behnel.de>
	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>
	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>
	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>
	<4E96A8EE.4070701@behnel.de>
	<CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>
	<CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>
	<CANg26EVqTPfqg7AVs5AQEGEsysqAcjvF94zWd-pD6CKiwVzz+g@mail.gmail.com>
	<4E96CF1C.1000400@behnel.de>
	<CANg26EVaApuuy98uhoaZhyJ0bdUcm2ijNY7FobT3c=5kmX=8gQ@mail.gmail.com>
	<CANg26EXykLVnGNezt=qFCm6Q7ciTOew0i0tGFHhcmqQZYi_VmA@mail.gmail.com>
	<CAKGHGPQoOjRXjQEEorqTx9QOB=t5Ehh4Bao=cC7gD1WYf9GWAg@mail.gmail.com>
	<4E973A79.4030504@behnel.de>
Message-ID: <CANg26EVADQv0FWrBNXLQvuHXmPya29QZtVV7e+KZH=MxxCmGGA@mail.gmail.com>

On 13 October 2011 20:22, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Vitja Makarov, 13.10.2011 20:33:
>>
>> But py3k pyregr is no red due to SIGSEGV, is that python problem:
>>
>>
>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py3k/26/console
>
> Not sure, but rather likely. The PEP393 implementation is still being worked
> on. That also makes it still a moving target to which the current code in
> Cython may not fit exactly. Worth investigating. I'll see if I find a bit of
> time for updating my local py3k installation this weekend to run some tests
> with it.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

BTW, the recent test changes on Jenkins really made it a lot better
than what we had before, thanks Stefan! It's now much easier to clone
and configure jobs and see when things go awry.

From stefan_ml at behnel.de  Fri Oct 14 15:02:38 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 14 Oct 2011 15:02:38 +0200
Subject: [Cython] unexpected side-effect in cython utility code
Message-ID: <4E9832EE.7010002@behnel.de>

Hi,

I started working on better malloc() support and wrote this code as a test 
to get going:

"""
cimport cython

def test_malloc(int n):
     with cython.malloc(n*sizeof(int)) as m:
         for i in range(n):
             m[i] = i
         l = [ m[i] for i in range(n) ]
     return l
"""

Now, when I compile this normally, I get a compiler error about "malloc" 
not being a cython attribute. However, when I do the same in the test 
runner, it compiles without errors and crashes when trying to run the test. 
The code it generates for the 'with' statement above starts like this:

"""
     __pyx_t_1 = PyObject_GetAttr(((PyObject *)malloc((__pyx_v_n * 
(sizeof(int))))), __pyx_n_s____exit__); /*...*/
"""

It appears that something has declared malloc(). I'm pretty sure it's this 
code in UtilityCode.py:

"""
     def declare_in_scope(self, dest_scope, used=False, cython_scope=None):
         """
         Declare all entries from the utility code in dest_scope. Code will
         only be included for used entries. If module_name is given,
         declare the type entries with that name.
         """
         tree = self.get_tree(entries_only=True, cython_scope=cython_scope)

         entries = tree.scope.entries
         entries.pop('__name__')
         entries.pop('__file__')
         entries.pop('__builtins__')
         entries.pop('__doc__')

         for name, entry in entries.iteritems():
             entry.utility_code_definition = self
             entry.used = used
"""

Basically, it declares everything it finds except for an explicit 
blacklist. Bad design. As I argued before, it should use a whitelist in the 
utility code file instead, which specifically lists the names that should 
be public. Everything else should just be considered implementation details.

Stefan

From markflorisson88 at gmail.com  Fri Oct 14 17:18:19 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Fri, 14 Oct 2011 16:18:19 +0100
Subject: [Cython] unexpected side-effect in cython utility code
In-Reply-To: <4E9832EE.7010002@behnel.de>
References: <4E9832EE.7010002@behnel.de>
Message-ID: <CANg26EWsjsCtyo4G9CpTc_SBhPGwgvjT3JsWTMo6ffk5f69Gxg@mail.gmail.com>

On 14 October 2011 14:02, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Hi,
>
> I started working on better malloc() support and wrote this code as a test
> to get going:
>
> """
> cimport cython
>
> def test_malloc(int n):
> ? ?with cython.malloc(n*sizeof(int)) as m:
> ? ? ? ?for i in range(n):
> ? ? ? ? ? ?m[i] = i
> ? ? ? ?l = [ m[i] for i in range(n) ]
> ? ?return l
> """
>
> Now, when I compile this normally, I get a compiler error about "malloc" not
> being a cython attribute. However, when I do the same in the test runner, it
> compiles without errors and crashes when trying to run the test. The code it
> generates for the 'with' statement above starts like this:
>
> """
> ? ?__pyx_t_1 = PyObject_GetAttr(((PyObject *)malloc((__pyx_v_n *
> (sizeof(int))))), __pyx_n_s____exit__); /*...*/
> """
>
> It appears that something has declared malloc(). I'm pretty sure it's this
> code in UtilityCode.py:
>
> """
> ? ?def declare_in_scope(self, dest_scope, used=False, cython_scope=None):
> ? ? ? ?"""
> ? ? ? ?Declare all entries from the utility code in dest_scope. Code will
> ? ? ? ?only be included for used entries. If module_name is given,
> ? ? ? ?declare the type entries with that name.
> ? ? ? ?"""
> ? ? ? ?tree = self.get_tree(entries_only=True, cython_scope=cython_scope)
>
> ? ? ? ?entries = tree.scope.entries
> ? ? ? ?entries.pop('__name__')
> ? ? ? ?entries.pop('__file__')
> ? ? ? ?entries.pop('__builtins__')
> ? ? ? ?entries.pop('__doc__')
>
> ? ? ? ?for name, entry in entries.iteritems():
> ? ? ? ? ? ?entry.utility_code_definition = self
> ? ? ? ? ? ?entry.used = used
> """
>
> Basically, it declares everything it finds except for an explicit blacklist.
> Bad design. As I argued before, it should use a whitelist in the utility
> code file instead, which specifically lists the names that should be public.
> Everything else should just be considered implementation details.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

That would indeed be better, I just never got around to it and I must
admit that it was never my priority.

About cython.malloc, wouldn't it be nicer if we had automatic (stack
or heap allocated) arrays? e.g.

def func(int n):
    cdef int array[n]

I think you usually have homogenous data anyway. When you return, it
simply goes out of scope like normal automatic variables, I see no
clear advantage to 'with' here.

From markflorisson88 at gmail.com  Fri Oct 14 18:18:40 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Fri, 14 Oct 2011 17:18:40 +0100
Subject: [Cython] unexpected side-effect in cython utility code
In-Reply-To: <CANg26EWsjsCtyo4G9CpTc_SBhPGwgvjT3JsWTMo6ffk5f69Gxg@mail.gmail.com>
References: <4E9832EE.7010002@behnel.de>
	<CANg26EWsjsCtyo4G9CpTc_SBhPGwgvjT3JsWTMo6ffk5f69Gxg@mail.gmail.com>
Message-ID: <CANg26EWptCSnCBrDQPLgxgu67GnuUP16LKzJ2cOJjz8J+ra5uw@mail.gmail.com>

On 14 October 2011 16:18, mark florisson <markflorisson88 at gmail.com> wrote:
> On 14 October 2011 14:02, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> Hi,
>>
>> I started working on better malloc() support and wrote this code as a test
>> to get going:
>>
>> """
>> cimport cython
>>
>> def test_malloc(int n):
>> ? ?with cython.malloc(n*sizeof(int)) as m:
>> ? ? ? ?for i in range(n):
>> ? ? ? ? ? ?m[i] = i
>> ? ? ? ?l = [ m[i] for i in range(n) ]
>> ? ?return l
>> """
>>
>> Now, when I compile this normally, I get a compiler error about "malloc" not
>> being a cython attribute. However, when I do the same in the test runner, it
>> compiles without errors and crashes when trying to run the test. The code it
>> generates for the 'with' statement above starts like this:
>>
>> """
>> ? ?__pyx_t_1 = PyObject_GetAttr(((PyObject *)malloc((__pyx_v_n *
>> (sizeof(int))))), __pyx_n_s____exit__); /*...*/
>> """
>>
>> It appears that something has declared malloc(). I'm pretty sure it's this
>> code in UtilityCode.py:
>>
>> """
>> ? ?def declare_in_scope(self, dest_scope, used=False, cython_scope=None):
>> ? ? ? ?"""
>> ? ? ? ?Declare all entries from the utility code in dest_scope. Code will
>> ? ? ? ?only be included for used entries. If module_name is given,
>> ? ? ? ?declare the type entries with that name.
>> ? ? ? ?"""
>> ? ? ? ?tree = self.get_tree(entries_only=True, cython_scope=cython_scope)
>>
>> ? ? ? ?entries = tree.scope.entries
>> ? ? ? ?entries.pop('__name__')
>> ? ? ? ?entries.pop('__file__')
>> ? ? ? ?entries.pop('__builtins__')
>> ? ? ? ?entries.pop('__doc__')
>>
>> ? ? ? ?for name, entry in entries.iteritems():
>> ? ? ? ? ? ?entry.utility_code_definition = self
>> ? ? ? ? ? ?entry.used = used
>> """
>>
>> Basically, it declares everything it finds except for an explicit blacklist.
>> Bad design. As I argued before, it should use a whitelist in the utility
>> code file instead, which specifically lists the names that should be public.
>> Everything else should just be considered implementation details.
>>
>> Stefan
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>
> That would indeed be better, I just never got around to it and I must
> admit that it was never my priority.
>
> About cython.malloc, wouldn't it be nicer if we had automatic (stack
> or heap allocated) arrays? e.g.
>
> def func(int n):
> ? ?cdef int array[n]
>
> I think you usually have homogenous data anyway. When you return, it
> simply goes out of scope like normal automatic variables, I see no
> clear advantage to 'with' here.
>

Actually these whitelists are really uncomfortable to work with (which
is one of the reasons I didn't use them). I think a decorator like
'@public' or some such would be nicer here.

From robertwb at math.washington.edu  Fri Oct 14 20:31:16 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 14 Oct 2011 11:31:16 -0700
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>
Message-ID: <CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>

On Wed, Oct 12, 2011 at 7:55 AM, mark florisson
<markflorisson88 at gmail.com> wrote:
>>> I ultimately feel things like that is more important than 100% coverage of
>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.
>>
>> +1 Prange handles the (corse-grained) SIMD case nicely, and a
>> task/futures model based on closures would I think flesh this out to
>> the next level of generality (and complexity).
>
> Futures are definitely nice. I suppose I think really like "inline
> futures", i.e. openmp tasks. I realize that futures may look more
> pythonic. However, as mentioned previously, I also see issues with
> that. When you submit a task then you expect a future object, which
> you might want to pass around. But we don't have the GIL for that. I
> personally feel that futures is something that should be done by a
> library (such as concurrent.futures in python 3.2), and inline tasks
> by a language. It also means I have to write an entire function or
> closure for perhaps only a few lines of code.
>
> I might also want to submit other functions that are not closures, or
> I might want to reuse my closures that are used for tasks and for
> something else. So what if my tasks contain more parallel constructs?
> e.g. what if I have a task closure that I return from my function that
> generates more tasks itself? Would you just execute them sequentially
> outside of the parallel construct, or would you simply disallow that?
> Also, do you restrict future "objects" to only the parallel section?
>
> Another problem is that you can only wait on tasks of your direct
> children. So what if I get access to my parent's future object
> (assuming you allow tasks to generate tasks), and then want the result
> of my parent?
> Or what if I store these future objects in an array or list and access
> them arbitrarily? You will only know at runtime which task to wait on,
> and openmp only has a static, lexical taskwait.
>
> I suppose my point is that without either a drastic rewrite (e.g., use
> pthreads instead of openmp) or quite a bit of contraints, I am unsure
> how futures would work here. Perhaps you guys have some concrete
> syntax and semantics proposals?

It feels to me that OpenMP tasks took a different model of parallelism
and tried to force them into the OpenMP model/constraints, and so it'd
be even more difficult to fit them into a nice pythonic interface.
Perhaps to make progress on this front we need to have a concrete
example to look at. I'm also wondering if the standard threading
module (perhaps with overlay support) used with nogil functions would
be sufficient--locking is required for handling the queues, etc. so
the fact that the GIL is involved is not a big deal. It is possible
that this won't scale to as small of work units, but the overhead
should be minimal once your work unit is a sufficient size (which is
probably quite small) and it's already implemented and well
documented/used.

As for critical and barrier, the notion of a critical block as a with
statement is very useful. Creating/naming locks (rather than being
implicit on the file/line number) is more powerful, but is a larger
burden on the user and more difficult to support with the OpenMP
backend. barrier, if supported, should be a function call not a
context. Not as critical as with the tasks case, but a good example to
see how it flows would be useful here as well.

As for single, I see doing this manually does require boilerplate
locking, so what about

if cython.parallel.once():  # will return True once for a tread group.
    ...

we could implement this via our own locking/checking/flushing to allow
it to occur in arbitrary expressions, e.g.

special_worker = cython.parallel.once()
if special_worker:
   ...
[common code]
if special_worker:   # single wouldn't work here
   ...


- Robert

From markflorisson88 at gmail.com  Fri Oct 14 22:07:07 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Fri, 14 Oct 2011 21:07:07 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>
	<CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>
Message-ID: <CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>

On 14 October 2011 19:31, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson
> <markflorisson88 at gmail.com> wrote:
>>>> I ultimately feel things like that is more important than 100% coverage of
>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.
>>>
>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a
>>> task/futures model based on closures would I think flesh this out to
>>> the next level of generality (and complexity).
>>
>> Futures are definitely nice. I suppose I think really like "inline
>> futures", i.e. openmp tasks. I realize that futures may look more
>> pythonic. However, as mentioned previously, I also see issues with
>> that. When you submit a task then you expect a future object, which
>> you might want to pass around. But we don't have the GIL for that. I
>> personally feel that futures is something that should be done by a
>> library (such as concurrent.futures in python 3.2), and inline tasks
>> by a language. It also means I have to write an entire function or
>> closure for perhaps only a few lines of code.
>>
>> I might also want to submit other functions that are not closures, or
>> I might want to reuse my closures that are used for tasks and for
>> something else. So what if my tasks contain more parallel constructs?
>> e.g. what if I have a task closure that I return from my function that
>> generates more tasks itself? Would you just execute them sequentially
>> outside of the parallel construct, or would you simply disallow that?
>> Also, do you restrict future "objects" to only the parallel section?
>>
>> Another problem is that you can only wait on tasks of your direct
>> children. So what if I get access to my parent's future object
>> (assuming you allow tasks to generate tasks), and then want the result
>> of my parent?
>> Or what if I store these future objects in an array or list and access
>> them arbitrarily? You will only know at runtime which task to wait on,
>> and openmp only has a static, lexical taskwait.
>>
>> I suppose my point is that without either a drastic rewrite (e.g., use
>> pthreads instead of openmp) or quite a bit of contraints, I am unsure
>> how futures would work here. Perhaps you guys have some concrete
>> syntax and semantics proposals?
>
> It feels to me that OpenMP tasks took a different model of parallelism
> and tried to force them into the OpenMP model/constraints, and so it'd
> be even more difficult to fit them into a nice pythonic interface.
> Perhaps to make progress on this front we need to have a concrete
> example to look at. I'm also wondering if the standard threading
> module (perhaps with overlay support) used with nogil functions would
> be sufficient--locking is required for handling the queues, etc. so
> the fact that the GIL is involved is not a big deal. It is possible
> that this won't scale to as small of work units, but the overhead
> should be minimal once your work unit is a sufficient size (which is
> probably quite small) and it's already implemented and well
> documented/used.

It's all definitely possible with normal threads, but the thing you
lose is convenience and conciseness. For big problems the programmer
might sum up the courage and effort to implement it, but typically you
will just stick to a serial version. This is really where OpenMP is
powerful, you can take a simple sequential piece of code and make it
parallel with minimal effort and without having to restructure,
rethink and rewrite your algorithms.

Something like concurrent.futures is definitely nice, but most people
cannot afford to mandate python 3.2 for their users.

The most classical examples I can think of for tasks are

1) independent code sections, i.e. two or more pieces of code that
don't depend on each other which you want to execute in parallel
2) traversal of some kind of custom data structure, like a tree or a linked list
3) some kind of other producer/consumer model

e.g. using with task syntax:

cdef postorder_traverse(tree *t): # bullet 1) and 2)
    with task:
        traverse(t.left)
    with task:
        traverse(t.right)

    taskwait() # wait until we traversed our subtrees
    use(t.data)

cdef list_traverse(linkedlist *L): # bullet 2)
    with nogil, parallel():
        if threadid() == 0:
            while L.next:
                with task:
                    do_something(L.data)

In the latter case we don't need a taskwait as we don't care about any
particular order. Only one thread generates the tasks where the others
just hit the barrier and see the tasks they can execute.

The good thing is that the OpenMP runtime can decide at task
generation point (not only at taskwait or barrier points!) decide to
stop generating more tasks and start executing them. So you won't
exhaust memory if you might have lots of tasks.

> As for critical and barrier, the notion of a critical block as a with
> statement is very useful. Creating/naming locks (rather than being
> implicit on the file/line number) is more powerful, but is a larger
> burden on the user and more difficult to support with the OpenMP
> backend.

Actually, as I mentioned before, critical sections do not at all
depend on their line or file number. All they depend on their implicit
or explicit name (the name is implicit when you simply omit it, so all
unnamed critical sections exclude each other).
Indeed, supporting creation of locks dynamically and allowing them to
be passed around arbitrarily would be hard (and likely not worth the
effort). Naming them is trivial though, which might not be incredibly
pythonic but is very convenient, easy and readable.

> barrier, if supported, should be a function call not a
> context. Not as critical as with the tasks case, but a good example to
> see how it flows would be useful here as well.

I agree, it really doesn't have any associated code and trying to
associate code with it is likely more confusing than meaningful. It
was just an idea.
Often you can rely on implicit barriers from e.g. prange, but not
always. I can't think of any real-world example, but you usually need
it to ensure that everyone gets a sane view on some shared data, e.g.

with nogil, parallel():
    array[threadid()] = func(threadid())
    barrier()
    use array[threadid() + 1 % omp_num_threads()] # access data of
some neighbour

This is a rather contrived example, but (see below) it would be
especially useful if you use single/master/once/first that sets some
shared data everyone will operate on (for instance in a prange). To
ensure the data is sane before you use it, you have to put the barrier
to 1) ensure the data has been written and 2) that the data has been
flushed.

Basically, you'll always know when you need a barrier, but it's pretty
hard to come up with a real-world example for it when you have to :)

> As for single, I see doing this manually does require boilerplate
> locking, so what about
>
> if cython.parallel.once(): ?# will return True once for a tread group.
> ? ?...
>
> we could implement this via our own locking/checking/flushing to allow
> it to occur in arbitrary expressions, e.g.
>
> special_worker = cython.parallel.once()
> if special_worker:
> ? ...
> [common code]
> if special_worker: ? # single wouldn't work here
> ? ...
>

That looks OK. I've actually been thinking that if we have barriers we
don't really need is_master(), once() or single() or anything. We
already have threadid() and you usually don't care what thread gets
there first, you only care about doing it once. So one could just
write

if parallel.threadid() == 0:
    ...

parallel.barrier() # if required

It might also be convenient to declare variables explicitly shared
here, e.g. this code will not work:

cdef int *buf

with nogil, parallel.parallel():
    if parallel.threadid() == 0:
        buf = ...

    parallel.barrier()

    # will will likely segfault, as buf is private because we assigned
to it. It's only valid in thread 0
    use buf[...]

So basically you'd have to do something like (&buf)[0][...], which
frankly looks pretty weird. However I do think such cases are rather
uncommon.

> - Robert
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Fri Oct 14 22:18:14 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Fri, 14 Oct 2011 21:18:14 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>
	<CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>
	<CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>
Message-ID: <CANg26EXsZb034G0qtVB5FgK+j2+6dnO=GuQhrK9SLT_C2xAaWQ@mail.gmail.com>

On 14 October 2011 21:07, mark florisson <markflorisson88 at gmail.com> wrote:
> On 14 October 2011 19:31, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson
>> <markflorisson88 at gmail.com> wrote:
>>>>> I ultimately feel things like that is more important than 100% coverage of
>>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.
>>>>
>>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a
>>>> task/futures model based on closures would I think flesh this out to
>>>> the next level of generality (and complexity).
>>>
>>> Futures are definitely nice. I suppose I think really like "inline
>>> futures", i.e. openmp tasks. I realize that futures may look more
>>> pythonic. However, as mentioned previously, I also see issues with
>>> that. When you submit a task then you expect a future object, which
>>> you might want to pass around. But we don't have the GIL for that. I
>>> personally feel that futures is something that should be done by a
>>> library (such as concurrent.futures in python 3.2), and inline tasks
>>> by a language. It also means I have to write an entire function or
>>> closure for perhaps only a few lines of code.
>>>
>>> I might also want to submit other functions that are not closures, or
>>> I might want to reuse my closures that are used for tasks and for
>>> something else. So what if my tasks contain more parallel constructs?
>>> e.g. what if I have a task closure that I return from my function that
>>> generates more tasks itself? Would you just execute them sequentially
>>> outside of the parallel construct, or would you simply disallow that?
>>> Also, do you restrict future "objects" to only the parallel section?
>>>
>>> Another problem is that you can only wait on tasks of your direct
>>> children. So what if I get access to my parent's future object
>>> (assuming you allow tasks to generate tasks), and then want the result
>>> of my parent?
>>> Or what if I store these future objects in an array or list and access
>>> them arbitrarily? You will only know at runtime which task to wait on,
>>> and openmp only has a static, lexical taskwait.
>>>
>>> I suppose my point is that without either a drastic rewrite (e.g., use
>>> pthreads instead of openmp) or quite a bit of contraints, I am unsure
>>> how futures would work here. Perhaps you guys have some concrete
>>> syntax and semantics proposals?
>>
>> It feels to me that OpenMP tasks took a different model of parallelism
>> and tried to force them into the OpenMP model/constraints, and so it'd
>> be even more difficult to fit them into a nice pythonic interface.
>> Perhaps to make progress on this front we need to have a concrete
>> example to look at. I'm also wondering if the standard threading
>> module (perhaps with overlay support) used with nogil functions would
>> be sufficient--locking is required for handling the queues, etc. so
>> the fact that the GIL is involved is not a big deal. It is possible
>> that this won't scale to as small of work units, but the overhead
>> should be minimal once your work unit is a sufficient size (which is
>> probably quite small) and it's already implemented and well
>> documented/used.
>
> It's all definitely possible with normal threads, but the thing you
> lose is convenience and conciseness. For big problems the programmer
> might sum up the courage and effort to implement it, but typically you
> will just stick to a serial version. This is really where OpenMP is
> powerful, you can take a simple sequential piece of code and make it
> parallel with minimal effort and without having to restructure,
> rethink and rewrite your algorithms.
>
> Something like concurrent.futures is definitely nice, but most people
> cannot afford to mandate python 3.2 for their users.
>
> The most classical examples I can think of for tasks are
>
> 1) independent code sections, i.e. two or more pieces of code that
> don't depend on each other which you want to execute in parallel
> 2) traversal of some kind of custom data structure, like a tree or a linked list
> 3) some kind of other producer/consumer model
>
> e.g. using with task syntax:
>
> cdef postorder_traverse(tree *t): # bullet 1) and 2)
> ? ?with task:
> ? ? ? ?traverse(t.left)
> ? ?with task:
> ? ? ? ?traverse(t.right)
>
> ? ?taskwait() # wait until we traversed our subtrees
> ? ?use(t.data)
>
> cdef list_traverse(linkedlist *L): # bullet 2)
> ? ?with nogil, parallel():
> ? ? ? ?if threadid() == 0:
> ? ? ? ? ? ?while L.next:
> ? ? ? ? ? ? ? ?with task:
> ? ? ? ? ? ? ? ? ? ?do_something(L.data)
>
> In the latter case we don't need a taskwait as we don't care about any
> particular order. Only one thread generates the tasks where the others
> just hit the barrier and see the tasks they can execute.
>
> The good thing is that the OpenMP runtime can decide at task
> generation point (not only at taskwait or barrier points!) decide to
> stop generating more tasks and start executing them. So you won't
> exhaust memory if you might have lots of tasks.
>
>> As for critical and barrier, the notion of a critical block as a with
>> statement is very useful. Creating/naming locks (rather than being
>> implicit on the file/line number) is more powerful, but is a larger
>> burden on the user and more difficult to support with the OpenMP
>> backend.
>
> Actually, as I mentioned before, critical sections do not at all
> depend on their line or file number. All they depend on their implicit
> or explicit name (the name is implicit when you simply omit it, so all
> unnamed critical sections exclude each other).
> Indeed, supporting creation of locks dynamically and allowing them to
> be passed around arbitrarily would be hard (and likely not worth the
> effort). Naming them is trivial though, which might not be incredibly
> pythonic but is very convenient, easy and readable.
>
>> barrier, if supported, should be a function call not a
>> context. Not as critical as with the tasks case, but a good example to
>> see how it flows would be useful here as well.
>
> I agree, it really doesn't have any associated code and trying to
> associate code with it is likely more confusing than meaningful. It
> was just an idea.
> Often you can rely on implicit barriers from e.g. prange, but not
> always. I can't think of any real-world example, but you usually need
> it to ensure that everyone gets a sane view on some shared data, e.g.
>
> with nogil, parallel():
> ? ?array[threadid()] = func(threadid())
> ? ?barrier()
> ? ?use array[threadid() + 1 % omp_num_threads()] # access data of
> some neighbour
>
> This is a rather contrived example, but (see below) it would be
> especially useful if you use single/master/once/first that sets some
> shared data everyone will operate on (for instance in a prange). To
> ensure the data is sane before you use it, you have to put the barrier
> to 1) ensure the data has been written and 2) that the data has been
> flushed.
>
> Basically, you'll always know when you need a barrier, but it's pretty
> hard to come up with a real-world example for it when you have to :)
>
>> As for single, I see doing this manually does require boilerplate
>> locking, so what about
>>
>> if cython.parallel.once(): ?# will return True once for a tread group.
>> ? ?...
>>
>> we could implement this via our own locking/checking/flushing to allow
>> it to occur in arbitrary expressions, e.g.
>>
>> special_worker = cython.parallel.once()
>> if special_worker:
>> ? ...
>> [common code]
>> if special_worker: ? # single wouldn't work here
>> ? ...
>>
>
> That looks OK. I've actually been thinking that if we have barriers we
> don't really need is_master(), once() or single() or anything. We
> already have threadid() and you usually don't care what thread gets
> there first, you only care about doing it once. So one could just
> write
>
> if parallel.threadid() == 0:
> ? ?...
>
> parallel.barrier() # if required
>
> It might also be convenient to declare variables explicitly shared
> here, e.g. this code will not work:
>
> cdef int *buf
>
> with nogil, parallel.parallel():
> ? ?if parallel.threadid() == 0:
> ? ? ? ?buf = ...
>
> ? ?parallel.barrier()
>
> ? ?# will will likely segfault, as buf is private because we assigned
> to it. It's only valid in thread 0
> ? ?use buf[...]
>
> So basically you'd have to do something like (&buf)[0][...], which
> frankly looks pretty weird. However I do think such cases are rather
> uncommon.
>
>> - Robert
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>

BTW, I think orphaned constructs might also really be worth our while.
Suppose you have a piece of code:

execute so many times:
    do some work we want to do in parallel
    compute something that needs to happen sequentially

And also suppose that the two things we do in the loop might be
factored out into separate functions. Now we could have a
prange()/parallel() for the parallel work, but that means we have to
start up a new parallel section every time.
If we're unlucky, the user might also innocently release the GIL as
well. There is a significant performance penalty to this, i.e. it
would be vastly more efficient to do the following:

with nogil, parallel():
    do so many times:
        my_parallel_function()

        if threadid() == 0:
            compute something that needs to happen sequentially

cdef void my_parallel_function(...):
    for i in prange(..., orphan=True): # workshare this loop with the
other threads in the team
        ...

This is not currently possible. Currently, every thread would call
my_parallel_function, and every function call would do the same
computations and not share any work. You can only currently avoid the
overhead  by writing all your code in the one function.

Another possibility for a keyword argument is 'worksharing', but that
would suggest normal prange()s don't share work.

What do you guys think, is this too confusing for people? I think this
is really reasonably common-ish situation.

From stefan_ml at behnel.de  Sat Oct 15 10:30:24 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 15 Oct 2011 10:30:24 +0200
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <4E973A79.4030504@behnel.de>
References: <4E930C72.8080303@behnel.de>	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>	<4E9672AE.6080905@behnel.de>	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>	<4E96A8EE.4070701@behnel.de>	<CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>	<CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>	<CANg26EVqTPfqg7AVs5AQEGEsysqAcjvF94zWd-pD6CKiwVzz+g@mail.gmail.com>	<4E96CF1C.1000400@behnel.de>	<CANg26EVaApuuy98uhoaZhyJ0bdUcm2ijNY7FobT3c=5kmX=8gQ@mail.gmail.com>	<CANg26EXykLVnGNezt=qFCm6Q7ciTOew0i0tGFHhcmqQZYi_VmA@mail.gmail.com>	<CAKGHGPQoOjRXjQEEorqTx9QOB=t5Ehh4Bao=cC7gD1WYf9GWAg@mail.gmail.com>
	<4E973A79.4030504@behnel.de>
Message-ID: <4E9944A0.8090409@behnel.de>

Stefan Behnel, 13.10.2011 21:22:
> Vitja Makarov, 13.10.2011 20:33:
>> But py3k pyregr is no red due to SIGSEGV, is that python problem:
>>
>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py3k/26/console
>
> Not sure, but rather likely. The PEP393 implementation is still being
> worked on. That also makes it still a moving target to which the current
> code in Cython may not fit exactly. Worth investigating. I'll see if I find
> a bit of time for updating my local py3k installation this weekend to run
> some tests with it.

Given that the benchmark runs with the optimised CPython build still work, 
my guess is that this is a problem only with the debug builds of CPython, 
which we use in all test runs (and for good reason...).

Stefan

From vitja.makarov at gmail.com  Sat Oct 15 11:07:35 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Sat, 15 Oct 2011 13:07:35 +0400
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <4E9944A0.8090409@behnel.de>
References: <4E930C72.8080303@behnel.de>
	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>
	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>
	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>
	<4E96A8EE.4070701@behnel.de>
	<CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>
	<CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>
	<CANg26EVqTPfqg7AVs5AQEGEsysqAcjvF94zWd-pD6CKiwVzz+g@mail.gmail.com>
	<4E96CF1C.1000400@behnel.de>
	<CANg26EVaApuuy98uhoaZhyJ0bdUcm2ijNY7FobT3c=5kmX=8gQ@mail.gmail.com>
	<CANg26EXykLVnGNezt=qFCm6Q7ciTOew0i0tGFHhcmqQZYi_VmA@mail.gmail.com>
	<CAKGHGPQoOjRXjQEEorqTx9QOB=t5Ehh4Bao=cC7gD1WYf9GWAg@mail.gmail.com>
	<4E973A79.4030504@behnel.de> <4E9944A0.8090409@behnel.de>
Message-ID: <CAKGHGPTXGd+xf8v_PgrZ2eOnBDqXSHzuQ6o2kuqKBOLUZOg9gQ@mail.gmail.com>

2011/10/15 Stefan Behnel <stefan_ml at behnel.de>:
> Stefan Behnel, 13.10.2011 21:22:
>>
>> Vitja Makarov, 13.10.2011 20:33:
>>>
>>> But py3k pyregr is no red due to SIGSEGV, is that python problem:
>>>
>>>
>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py3k/26/console
>>
>> Not sure, but rather likely. The PEP393 implementation is still being
>> worked on. That also makes it still a moving target to which the current
>> code in Cython may not fit exactly. Worth investigating. I'll see if I
>> find
>> a bit of time for updating my local py3k installation this weekend to run
>> some tests with it.
>
> Given that the benchmark runs with the optimised CPython build still work,
> my guess is that this is a problem only with the debug builds of CPython,
> which we use in all test runs (and for good reason...).
>

It's something wrong with py2.7 pyregr build "'exec' currently
requires a target mapping (globals/locals)"
is still there after you merged my exec/eval branch. I can't reproduce
it on localhost, actually it works just fine, I tried test_binop

-- 
vitja.

From vitja.makarov at gmail.com  Sat Oct 15 11:17:12 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Sat, 15 Oct 2011 13:17:12 +0400
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <CAKGHGPTXGd+xf8v_PgrZ2eOnBDqXSHzuQ6o2kuqKBOLUZOg9gQ@mail.gmail.com>
References: <4E930C72.8080303@behnel.de>
	<CANg26EW2Xa-+Y5PCNryC6zj+9i=t872kdroaZ-ujfVBCj-RRMQ@mail.gmail.com>
	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>
	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>
	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>
	<4E9672AE.6080905@behnel.de>
	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>
	<4E96A8EE.4070701@behnel.de>
	<CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>
	<CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>
	<CANg26EVqTPfqg7AVs5AQEGEsysqAcjvF94zWd-pD6CKiwVzz+g@mail.gmail.com>
	<4E96CF1C.1000400@behnel.de>
	<CANg26EVaApuuy98uhoaZhyJ0bdUcm2ijNY7FobT3c=5kmX=8gQ@mail.gmail.com>
	<CANg26EXykLVnGNezt=qFCm6Q7ciTOew0i0tGFHhcmqQZYi_VmA@mail.gmail.com>
	<CAKGHGPQoOjRXjQEEorqTx9QOB=t5Ehh4Bao=cC7gD1WYf9GWAg@mail.gmail.com>
	<4E973A79.4030504@behnel.de> <4E9944A0.8090409@behnel.de>
	<CAKGHGPTXGd+xf8v_PgrZ2eOnBDqXSHzuQ6o2kuqKBOLUZOg9gQ@mail.gmail.com>
Message-ID: <CAKGHGPQr4gLCpxiFKUad-0TnZo8RaGOoQzVsnzjgW82KnatQrA@mail.gmail.com>

2011/10/15 Vitja Makarov <vitja.makarov at gmail.com>:
> 2011/10/15 Stefan Behnel <stefan_ml at behnel.de>:
>> Stefan Behnel, 13.10.2011 21:22:
>>>
>>> Vitja Makarov, 13.10.2011 20:33:
>>>>
>>>> But py3k pyregr is no red due to SIGSEGV, is that python problem:
>>>>
>>>>
>>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py3k/26/console
>>>
>>> Not sure, but rather likely. The PEP393 implementation is still being
>>> worked on. That also makes it still a moving target to which the current
>>> code in Cython may not fit exactly. Worth investigating. I'll see if I
>>> find
>>> a bit of time for updating my local py3k installation this weekend to run
>>> some tests with it.
>>
>> Given that the benchmark runs with the optimised CPython build still work,
>> my guess is that this is a problem only with the debug builds of CPython,
>> which we use in all test runs (and for good reason...).
>>
>
> It's something wrong with py2.7 pyregr build "'exec' currently
> requires a target mapping (globals/locals)"
> is still there after you merged my exec/eval branch. I can't reproduce
> it on localhost, actually it works just fine, I tried test_binop
>

Right. That build was triggered by previous commits:
https://sage.math.washington.edu:8091/hudson/job/cython-devel-sdist/changes?from=686&to=687

Pyregr shows regression from 11876 -> 11596 for last 3 builds
https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py27/

-- 
vitja.

From stefan_ml at behnel.de  Sat Oct 15 11:17:07 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 15 Oct 2011 11:17:07 +0200
Subject: [Cython] test failure for cython-devel in Py2.4
In-Reply-To: <CAKGHGPTXGd+xf8v_PgrZ2eOnBDqXSHzuQ6o2kuqKBOLUZOg9gQ@mail.gmail.com>
References: <4E930C72.8080303@behnel.de>	<CANg26EVQ2_R8v5a2HQSxtMhKR7VtRPU3nSLrn_Pb2G0yOTGZOg@mail.gmail.com>	<CANg26EXkcHFWmr5WySyQZuoZiCX9_-eegPCe3z4req3WBY0Lsg@mail.gmail.com>	<CANg26EU1SXT9KfXpaA2YudSktF8xat4OsdyTY5C2-Gk9MJvw=A@mail.gmail.com>	<4E9672AE.6080905@behnel.de>	<CAKGHGPTejE7=f=O-he1Zhm3zL=X8KP0V0oKsJskqB=R2qgQ0AA@mail.gmail.com>	<4E96A8EE.4070701@behnel.de>	<CAKGHGPTW9urWTXaJsiicUHppjON27qPAt+vjUEVEYUsYgJNpZQ@mail.gmail.com>	<CANg26EV6tqnpLGUyhtHrtL1cfEHBGfx50nxss1PN2j_ZV5Bihg@mail.gmail.com>	<CANg26EVqTPfqg7AVs5AQEGEsysqAcjvF94zWd-pD6CKiwVzz+g@mail.gmail.com>	<4E96CF1C.1000400@behnel.de>	<CANg26EVaApuuy98uhoaZhyJ0bdUcm2ijNY7FobT3c=5kmX=8gQ@mail.gmail.com>	<CANg26EXykLVnGNezt=qFCm6Q7ciTOew0i0tGFHhcmqQZYi_VmA@mail.gmail.com>	<CAKGHGPQoOjRXjQEEorqTx9QOB=t5Ehh4Bao=cC7gD1WYf9GWAg@mail.gmail.com>	<4E973A79.4030504@behnel.de>
	<4E9944A0.8090409@behnel.de>
	<CAKGHGPTXGd+xf8v_PgrZ2eOnBDqXSHzuQ6o2kuqKBOLUZOg9gQ@mail.gmail.com>
Message-ID: <4E994F93.9040401@behnel.de>

Vitja Makarov, 15.10.2011 11:07:
> It's something wrong with py2.7 pyregr build "'exec' currently
> requires a target mapping (globals/locals)"
> is still there after you merged my exec/eval branch. I can't reproduce
> it on localhost, actually it works just fine, I tried test_binop

The tests simply haven't run yet. You can see that from the dependencies on 
the build page, e.g.

https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py27/33/

Oh, and it would be good if you started a new ML thread to discuss a new topic.

Stefan

From vitja.makarov at gmail.com  Sat Oct 15 11:26:26 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Sat, 15 Oct 2011 13:26:26 +0400
Subject: [Cython] Pyregr regressions
Message-ID: <CAKGHGPTokM_vLapG7Go1vZsZpjZ5YnoH=+h6HGQcfpJg73XtAg@mail.gmail.com>

Hi!

Recent commits to the master introduced pyregr regressions. You can
see it here, just sort by age:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py27/33/testReport/

Here is one example:
======================================================================
ERROR: runTest (__main__.CythonPyregrTestCase)
compiling (c) and running test_pipes
----------------------------------------------------------------------
Traceback (most recent call last):
  File "runtests.py", line 679, in run
    self.runCompileTest()
  File "runtests.py", line 491, in runCompileTest
    self.test_directory, self.expect_errors, self.annotate)
  File "runtests.py", line 656, in compile
    self.assertEquals(None, unexpected_error)
AssertionError: None != u"39:14: Object of type '<unspecified>' has no
attribute 'open'"


May be it's a good idea to check for pyregr regressions as well as for
regular tests failures before merging into master?

-- 
vitja.

From stefan_ml at behnel.de  Sat Oct 15 12:05:13 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 15 Oct 2011 12:05:13 +0200
Subject: [Cython] Pyregr regressions
In-Reply-To: <CAKGHGPTokM_vLapG7Go1vZsZpjZ5YnoH=+h6HGQcfpJg73XtAg@mail.gmail.com>
References: <CAKGHGPTokM_vLapG7Go1vZsZpjZ5YnoH=+h6HGQcfpJg73XtAg@mail.gmail.com>
Message-ID: <4E995AD9.6040204@behnel.de>

Vitja Makarov, 15.10.2011 11:26:
> Recent commits to the master introduced pyregr regressions. You can
> see it here, just sort by age:
>
> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py27/33/testReport/

I fixed the ones I had introduced, thanks for noting.


> Here is one example:
> ======================================================================
> ERROR: runTest (__main__.CythonPyregrTestCase)
> compiling (c) and running test_pipes
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>    File "runtests.py", line 679, in run
>      self.runCompileTest()
>    File "runtests.py", line 491, in runCompileTest
>      self.test_directory, self.expect_errors, self.annotate)
>    File "runtests.py", line 656, in compile
>      self.assertEquals(None, unexpected_error)
> AssertionError: None != u"39:14: Object of type '<unspecified>' has no
> attribute 'open'"

Not sure where that comes from, looks like a type inference bug.


> May be it's a good idea to check for pyregr regressions as well as for
> regular tests failures before merging into master?

Well, sure. The problem is that it's much easier to see when a test turns 
from blue to yellow or red, than it is to see that a test turns from yellow 
to, well, yellow.

I agree that it's generally worth looking through the results after a push 
and especially after a merge. Jenkins quite prominently complains about 
additional test failures in the build history:

https://sage.math.washington.edu:8091/hudson/view/cython-devel/builds

Throwing an eye on that page after the build/test jobs have run should help 
in spotting most regressions.

Stefan

From vitja.makarov at gmail.com  Sat Oct 15 12:24:02 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Sat, 15 Oct 2011 14:24:02 +0400
Subject: [Cython] Pyregr regressions
In-Reply-To: <4E995AD9.6040204@behnel.de>
References: <CAKGHGPTokM_vLapG7Go1vZsZpjZ5YnoH=+h6HGQcfpJg73XtAg@mail.gmail.com>
	<4E995AD9.6040204@behnel.de>
Message-ID: <CAKGHGPSxYFQYyhyqjQugpfx=u5bNzDDYxQhdpAg5MTjQNTm2ug@mail.gmail.com>

2011/10/15 Stefan Behnel <stefan_ml at behnel.de>:
> Vitja Makarov, 15.10.2011 11:26:
>>
>> Recent commits to the master introduced pyregr regressions. You can
>> see it here, just sort by age:
>>
>>
>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr/BACKEND=c,PYVERSION=py27/33/testReport/
>
> I fixed the ones I had introduced, thanks for noting.
>
>
>> Here is one example:
>> ======================================================================
>> ERROR: runTest (__main__.CythonPyregrTestCase)
>> compiling (c) and running test_pipes
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>> ? File "runtests.py", line 679, in run
>> ? ? self.runCompileTest()
>> ? File "runtests.py", line 491, in runCompileTest
>> ? ? self.test_directory, self.expect_errors, self.annotate)
>> ? File "runtests.py", line 656, in compile
>> ? ? self.assertEquals(None, unexpected_error)
>> AssertionError: None != u"39:14: Object of type '<unspecified>' has no
>> attribute 'open'"
>
> Not sure where that comes from, looks like a type inference bug.
>

GIT bisect could help here:

Error compiling Cython file:
------------------------------------------------------------
...

    def testSimplePipe3(self):
        file(TESTFN, 'w').write('hello world #2')
        t = pipes.Template()
        t.append(s_command + ' < $IN', pipes.FILEIN_STDOUT)
        with t.open(TESTFN, 'r') as f:
             ^
------------------------------------------------------------

/home/vitja/python/2.7/lib/python2.7/test/test_pipes.py:39:14: Object
of type '<unspecified>' has no attribute 'open'
7445f6fcdf760215f0e472d79570a48e74382818 is the first bad commit
commit 7445f6fcdf760215f0e472d79570a48e74382818
Author: Stefan Behnel <scoder at users.berlios.de>
Date:   Fri Oct 14 21:25:31 2011 +0200

    support for inlining the __enter__() method call in with statements

:040000 040000 970f19cc0f9e377ccfcf6f8d154cdb21f4d86556
1e298400e802a9edaba5fee32020132e3d08056f M	Cython
:040000 040000 795b066ed29ee8cfd0680b7f504cf281ac8d8dbd
a742911fe33b085fc484431714910e3fc263eece M	tests
bisect run success


>
>> May be it's a good idea to check for pyregr regressions as well as for
>> regular tests failures before merging into master?
>
> Well, sure. The problem is that it's much easier to see when a test turns
> from blue to yellow or red, than it is to see that a test turns from yellow
> to, well, yellow.
>
> I agree that it's generally worth looking through the results after a push
> and especially after a merge. Jenkins quite prominently complains about
> additional test failures in the build history:
>
> https://sage.math.washington.edu:8091/hudson/view/cython-devel/builds
>
> Throwing an eye on that page after the build/test jobs have run should help
> in spotting most regressions.
>

Pyregr tests are very helpful to find out that something is going
wrong with your changes.

-- 
vitja.

From stefan_ml at behnel.de  Sun Oct 16 20:46:03 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 16 Oct 2011 20:46:03 +0200
Subject: [Cython] compiler performance issue for extended utility code
In-Reply-To: <CANg26EXFjJQRoWdLK2CzB3znv7de_eEfEzw6Uvn3868g5Px0Zg@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>	<4E8C0448.6010204@behnel.de>	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>	<4E8D4EDB.2090009@behnel.de>	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>	<CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>	<CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>	<CAKGHGPSFO3Az_KBMgvfTRQKX+_ddGPH=JPqxiEPUjocmw55FQg@mail.gmail.com>	<4E8EAD2E.8040701@behnel.de>	<CAKGHGPRWtHC5FO6uO_=EH8oQHb77VgUY8uUpzLASZhzP-G3faA@mail.gmail.com>	<CAKGHGPQV1djXKwYhcjN-y_KVp-Befp2t-Q78PSOFYXYq7hP3cw@mail.gmail.com>	<4E8FF5D6.4070104@behnel.de>	<CANg26EUGsEdfNL5+W8jhOADn=DatSyUdeBZDW8tyZQ+4Y+G3Mg@mail.gmail.com>	<CAKGHGPQCT6QfujQEj0H9x_uhNrnw9tWuQJK5th2WVFBMpiAz0g@mail.gmail.com>
	<CANg26EXFjJQRoWdLK2CzB3znv7de_eEfEzw6Uvn3868g5Px0Zg@mail.gmail.com>
Message-ID: <4E9B266B.7020008@behnel.de>

mark florisson, 08.10.2011 15:18:
> On 8 October 2011 13:10, Vitja Makarov wrote:
>> I've also noticed that some utilities are loaded unconditionally
>> perhaps it's better to introduce lazy loading.
>
> Well, they shouldn't be. If they are it's generally a bug. I noticed
> that it happens in the test runner though, although it should create a
> fresh context with freshly initialized entries.

I recently ran only the couple of with-statement related tests through 
cProfile and it told me that it had spent something like 20 seconds in 
"builtin method sub()", i.e. doing completely useless string processing, 
followed by some 3 seconds or so for the rest of the compilation and test 
execution. That doesn't sound right.

Stefan

From markflorisson88 at gmail.com  Sun Oct 16 20:51:26 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 16 Oct 2011 19:51:26 +0100
Subject: [Cython] compiler performance issue for extended utility code
In-Reply-To: <4E9B266B.7020008@behnel.de>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
	<4E8D4EDB.2090009@behnel.de>
	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>
	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>
	<CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>
	<CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>
	<CAKGHGPSFO3Az_KBMgvfTRQKX+_ddGPH=JPqxiEPUjocmw55FQg@mail.gmail.com>
	<4E8EAD2E.8040701@behnel.de>
	<CAKGHGPRWtHC5FO6uO_=EH8oQHb77VgUY8uUpzLASZhzP-G3faA@mail.gmail.com>
	<CAKGHGPQV1djXKwYhcjN-y_KVp-Befp2t-Q78PSOFYXYq7hP3cw@mail.gmail.com>
	<4E8FF5D6.4070104@behnel.de>
	<CANg26EUGsEdfNL5+W8jhOADn=DatSyUdeBZDW8tyZQ+4Y+G3Mg@mail.gmail.com>
	<CAKGHGPQCT6QfujQEj0H9x_uhNrnw9tWuQJK5th2WVFBMpiAz0g@mail.gmail.com>
	<CANg26EXFjJQRoWdLK2CzB3znv7de_eEfEzw6Uvn3868g5Px0Zg@mail.gmail.com>
	<4E9B266B.7020008@behnel.de>
Message-ID: <CANg26EXTCWF_R8dL--Ukq9NvuqgrYAyjvaUqfJoD8mepxo8Lnw@mail.gmail.com>

Tempita uses re.sub to do the parsing. Most utilities are loaded at
module-level, so perhaps we should use lazy loading like Vitja
suggested. Are the cythonscope utilities loaded?

On 16 October 2011 19:46, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 08.10.2011 15:18:
>>
>> On 8 October 2011 13:10, Vitja Makarov wrote:
>>>
>>> I've also noticed that some utilities are loaded unconditionally
>>> perhaps it's better to introduce lazy loading.
>>
>> Well, they shouldn't be. If they are it's generally a bug. I noticed
>> that it happens in the test runner though, although it should create a
>> fresh context with freshly initialized entries.
>
> I recently ran only the couple of with-statement related tests through
> cProfile and it told me that it had spent something like 20 seconds in
> "builtin method sub()", i.e. doing completely useless string processing,
> followed by some 3 seconds or so for the rest of the compilation and test
> execution. That doesn't sound right.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Sun Oct 16 20:58:40 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 16 Oct 2011 19:58:40 +0100
Subject: [Cython] compiler performance issue for extended utility code
In-Reply-To: <CANg26EXTCWF_R8dL--Ukq9NvuqgrYAyjvaUqfJoD8mepxo8Lnw@mail.gmail.com>
References: <CANg26EWb2KmJ7q3ph0A3F78Wa9yD7eNmPL0ic9YnvpPgGHYMsw@mail.gmail.com>
	<4E8C0448.6010204@behnel.de>
	<CANg26EWEjJh-uuFMUaxQAOQ6bKSdmG70Y6CxrzBjOQb_XPhHxA@mail.gmail.com>
	<4E8D4EDB.2090009@behnel.de>
	<CANg26EVyByqdWuV-Lwksz-qbFDdDd=SZ9bS=FNhnON2ZaHMX1g@mail.gmail.com>
	<CAKGHGPRt_qmyOv3yNdDweg9-4FcLwxe1JEdJ813dwekP-yinYw@mail.gmail.com>
	<CANg26EXMMO059ox_+uXW4riBk2NumLGd6Uk5-LruNLoWbFNDsQ@mail.gmail.com>
	<CAKGHGPS-HTbE5Mw8zCbOsbtCVH=uStu5NttJM=JGTACtpngHeQ@mail.gmail.com>
	<CAKGHGPSFO3Az_KBMgvfTRQKX+_ddGPH=JPqxiEPUjocmw55FQg@mail.gmail.com>
	<4E8EAD2E.8040701@behnel.de>
	<CAKGHGPRWtHC5FO6uO_=EH8oQHb77VgUY8uUpzLASZhzP-G3faA@mail.gmail.com>
	<CAKGHGPQV1djXKwYhcjN-y_KVp-Befp2t-Q78PSOFYXYq7hP3cw@mail.gmail.com>
	<4E8FF5D6.4070104@behnel.de>
	<CANg26EUGsEdfNL5+W8jhOADn=DatSyUdeBZDW8tyZQ+4Y+G3Mg@mail.gmail.com>
	<CAKGHGPQCT6QfujQEj0H9x_uhNrnw9tWuQJK5th2WVFBMpiAz0g@mail.gmail.com>
	<CANg26EXFjJQRoWdLK2CzB3znv7de_eEfEzw6Uvn3868g5Px0Zg@mail.gmail.com>
	<4E9B266B.7020008@behnel.de>
	<CANg26EXTCWF_R8dL--Ukq9NvuqgrYAyjvaUqfJoD8mepxo8Lnw@mail.gmail.com>
Message-ID: <CANg26EXNb=rcZXia_rDYx6-xnP55Cp33Qgrc_NaqrAcvZh484g@mail.gmail.com>

On 16 October 2011 19:51, mark florisson <markflorisson88 at gmail.com> wrote:
> Tempita uses re.sub to do the parsing. Most utilities are loaded at
> module-level, so perhaps we should use lazy loading like Vitja
> suggested. Are the cythonscope utilities loaded?
>
> On 16 October 2011 19:46, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> mark florisson, 08.10.2011 15:18:
>>>
>>> On 8 October 2011 13:10, Vitja Makarov wrote:
>>>>
>>>> I've also noticed that some utilities are loaded unconditionally
>>>> perhaps it's better to introduce lazy loading.
>>>
>>> Well, they shouldn't be. If they are it's generally a bug. I noticed
>>> that it happens in the test runner though, although it should create a
>>> fresh context with freshly initialized entries.
>>
>> I recently ran only the couple of with-statement related tests through
>> cProfile and it told me that it had spent something like 20 seconds in
>> "builtin method sub()", i.e. doing completely useless string processing,
>> followed by some 3 seconds or so for the rest of the compilation and test
>> execution. That doesn't sound right.
>>
>> Stefan
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>

Sorry for the previous accidental top-post.

Most of these problems will go away if we get a libcython module and a
cython.h header. In the meantime we could do the lazy stuff, it
shouldn't be hard to implement. Maybe load it when any of the
attributes get accessed and just wrap it.

From stefan_ml at behnel.de  Tue Oct 18 10:06:14 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 18 Oct 2011 10:06:14 +0200
Subject: [Cython] Cython-ctypes branch
Message-ID: <4E9D3376.105@behnel.de>

Hi Romain,

I know your branch isn't "ready" in the sense that it's useful for the real 
world, but I'd like to find a way to get it merged, and to find a time 
frame for that. Otherwise, it will just bit-rot, which is certainly not 
what anyone wants. How would you judge your availability for this in the 
near future? I hope you're interested. :)

For those who didn't follow the project, the branch lives here:

https://github.com/hardshooter/CythonCTypesBackend

The first thing that (IMHO, let's see if the others agree) needs to happen 
is that you should try to rebase it on the latest master branch. There were 
changes in the meantime that will not make this go clean. For example, the 
pipeline code was factored out of Main.py into a separate module 
Pipeline.py, so you will have to migrate your pipeline changes manually. 
That shouldn't be too hard, though, and it's the only major conflict that I 
currently anticipate.

There's a test runner change in the master branch that will allow you to 
select the tested backends with a positive list, i.e. as in

     runtests.py --backends=c,cpp

You'd want to add the ctypes backend here. The "--no-cpp" etc. set of 
switches become very unwieldy as new backends are added.

You will also notice that Cython gained a couple of new features and syntax 
since you started, specifically fused types, an extended array syntax for 
memoryviews and parallel OpenMP loops. I'm not sure how (or even if) they 
will translate to the Python backend. I think all of them will need a 
dedicated implementation in some way, which is very unfortunate. But I 
don't think that has to bother us for the moment.

I recreated the Jenkins build and test jobs for your branch:

https://sage.math.washington.edu:8091/hudson/view/dev-romain/

There's currently a unit test failure in the build job that keeps me from 
trying the subsequent test runs. It looks trivial, though, so if you could 
push a fix, I can make sure the build and test jobs work as expected. That 
will give us an idea about the current status of your code.

I also noticed that the ctypes_configure script is not Py3 clean, so we 
can't currently test your code on that platform. 2to3 may be able to do the 
job, but the package needs fixing upstream.

Stefan

From markflorisson88 at gmail.com  Tue Oct 18 18:50:09 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 18 Oct 2011 17:50:09 +0100
Subject: [Cython] Cython-ctypes branch
In-Reply-To: <4E9D3376.105@behnel.de>
References: <4E9D3376.105@behnel.de>
Message-ID: <CANg26EWuxGzASNgev+jg-kW-68_d-kK+gbScOY_aZqmVGPvmKg@mail.gmail.com>

On 18 October 2011 09:06, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Hi Romain,
>
> I know your branch isn't "ready" in the sense that it's useful for the real
> world, but I'd like to find a way to get it merged, and to find a time frame
> for that. Otherwise, it will just bit-rot, which is certainly not what
> anyone wants.

I think you're more concerned about Cython playing a role in numpypy
than in bit-rot :) I certainly agree though, it would be great to have
some decent functionality in, if said functionality actually covers a
large subset of the Cython language, otherwise users might be tempted
to restrict themselves to certain functionality only.

> How would you judge your availability for this in the near
> future? I hope you're interested. :)
>
> For those who didn't follow the project, the branch lives here:
>
> https://github.com/hardshooter/CythonCTypesBackend
>
> The first thing that (IMHO, let's see if the others agree) needs to happen
> is that you should try to rebase it on the latest master branch. There were
> changes in the meantime that will not make this go clean. For example, the
> pipeline code was factored out of Main.py into a separate module
> Pipeline.py, so you will have to migrate your pipeline changes manually.
> That shouldn't be too hard, though, and it's the only major conflict that I
> currently anticipate.
>
> There's a test runner change in the master branch that will allow you to
> select the tested backends with a positive list, i.e. as in
>
> ? ?runtests.py --backends=c,cpp
>
> You'd want to add the ctypes backend here. The "--no-cpp" etc. set of
> switches become very unwieldy as new backends are added.
>
> You will also notice that Cython gained a couple of new features and syntax
> since you started, specifically fused types, an extended array syntax for
> memoryviews and parallel OpenMP loops. I'm not sure how (or even if) they
> will translate to the Python backend. I think all of them will need a
> dedicated implementation in some way, which is very unfortunate. But I don't
> think that has to bother us for the moment.

For OpenMP you might not actually need to do anything at all, it
should already be supported in pure mode. Fused types and memoryviews
are harder (as is the older buffer support). I'm not even sure if/how
pypy's buffer support works.

There is also support for pure-mode fused types, but to a very limited
extend, i.e. you can do cython.fused_type(my-type-list) to create a
fused type, but you don't actually generate any actual specializations
unless you compile it with Cython.

As for actual fused types support, I think you can replace-and-wrap
fused functions at runtime with an instance of a generated
FusedFunction class that is indexable and callable and does the
necessary instance checks (in case of 'def' or 'cpdef'). You will also
know at compile-time whether a certain cdef or cpdef call is valid,
and you can basically do the same trick as we do in C: generate
multiple specializations and choose which one to call. But seeing that
compile time checks will ensure that you can only do an intersection
of all possible operations, I think you might only need to do this for
the case where you may either get a ctypes object or a normal python
object (if you make weird combinations of types to fuse).

In any case, I agree that leaving buffers and fused types for now in
the ctypes support is probably the best idea.

I'm not really familiar with RPython, but would it in any way be
feasible to have Cython generate RPython code? That may just make
things easier and more efficient to implement.

> I recreated the Jenkins build and test jobs for your branch:
>
> https://sage.math.washington.edu:8091/hudson/view/dev-romain/
>
> There's currently a unit test failure in the build job that keeps me from
> trying the subsequent test runs. It looks trivial, though, so if you could
> push a fix, I can make sure the build and test jobs work as expected. That
> will give us an idea about the current status of your code.
>
> I also noticed that the ctypes_configure script is not Py3 clean, so we
> can't currently test your code on that platform. 2to3 may be able to do the
> job, but the package needs fixing upstream.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Tue Oct 18 18:56:23 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 18 Oct 2011 17:56:23 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <4E955CDD.8060203@astro.uio.no>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
	<4E955CDD.8060203@astro.uio.no>
Message-ID: <CANg26EUhoY_r_3-LHfU7AHx-i4NT224R42SysJTG67E94kn6Fg@mail.gmail.com>

On 12 October 2011 10:24, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/12/2011 11:08 AM, Robert Bradshaw wrote:
>>
>> On Wed, Oct 12, 2011 at 1:36 AM, Dag Sverre Seljebotn
>>>
>>> I wouldn't resist a builtin "channel" type in Cython (since we don't have
>>> full templating/generics, it would be the only way of sending typed data
>>> conveniently?).
>>
>> zeromq seems to be a nice level of abstraction--we could probably get
>> far with a zeromq "overlay" module that didn't require the GIL. Or is
>> the C API easy enough to use if we could provide convenient mechanisms
>> to initialize the tasks/threads. I think perhaps the communication
>> model could be solved by a library more easily than the treading
>> model.
>
> Ah, zeromq even has an in-process transport, so should work nicely for
> multithreading as well.
>
> The main problem is that I'd like something like
>
> ctypedef struct Msg:
> ? ?int what
> ? ?double when
>
> cdef Msg msg
> cdef channel[Msg] mychan = channel[msg](blocking=True, in_process=True)
> with cython.parallel:
> ? ?...
> ? ?if is_master():
> ? ? ? ?mychan.send(what=1, when=2.3)
> ? ?else:
> ? ? ? ?msg = mychan.recv()
>
>
> Which one can't really do without either builtin support or templating
> support. One *could* implement it in C++...
>
> C-level API just sends char* around, e.g.,
>
> int zmq_msg_init_data (zmq_msg_t *msg, void *data, size_t size, zmq_free_fn
> *ffn, void *hint);

Actually I think fused types may be able to help here as well. E.g.
you could specify 'send' and 'recv' as cdef methods that based on the
type they get pack their data in a certain way (if you don't want
to/cannot go for the char * + sizeof(MyType) way). This means you have
to do have a branch in send and recv for every type you're going to
use, though, but it might still be more convenient than writing
different functions for every different type to pack your data.

> Dag Sverre
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From stefan_ml at behnel.de  Tue Oct 18 20:18:46 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 18 Oct 2011 20:18:46 +0200
Subject: [Cython] Cython-ctypes branch
In-Reply-To: <CANg26EWuxGzASNgev+jg-kW-68_d-kK+gbScOY_aZqmVGPvmKg@mail.gmail.com>
References: <4E9D3376.105@behnel.de>
	<CANg26EWuxGzASNgev+jg-kW-68_d-kK+gbScOY_aZqmVGPvmKg@mail.gmail.com>
Message-ID: <4E9DC306.2030202@behnel.de>

mark florisson, 18.10.2011 18:50:
> On 18 October 2011 09:06, Stefan Behnel wrote:
>> I know your branch isn't "ready" in the sense that it's useful for the real
>> world, but I'd like to find a way to get it merged, and to find a time frame
>> for that. Otherwise, it will just bit-rot, which is certainly not what
>> anyone wants.
>
> I think you're more concerned about Cython playing a role in numpypy
> than in bit-rot :)

Both, in a way. But I think I'm really more concerned about the code dying 
from branch divergence. It's a really nice feature that's worth keeping and 
growing.

Obviously, it also means it'll be more work for us to add language 
features, as we have an additional backend to support. But it's also a good 
proof that we're prepared to deal with that, and may eventually lead to 
more backends being added.


> I certainly agree though, it would be great to have
> some decent functionality in, if said functionality actually covers a
> large subset of the Cython language, otherwise users might be tempted
> to restrict themselves to certain functionality only.

Well, it's a pretty experimental feature. Users should expect it to be 
limited in functionality and have bugs. If they want to use it, they'll 
have to accept some drawbacks for the time being. I think that's fine.


>> You will also notice that Cython gained a couple of new features and syntax
>> since you started, specifically fused types, an extended array syntax for
>> memoryviews and parallel OpenMP loops. I'm not sure how (or even if) they
>> will translate to the Python backend. I think all of them will need a
>> dedicated implementation in some way, which is very unfortunate. But I don't
>> think that has to bother us for the moment.
>
> For OpenMP you might not actually need to do anything at all, it
> should already be supported in pure mode. Fused types and memoryviews
> are harder (as is the older buffer support). I'm not even sure if/how
> pypy's buffer support works.

I have no idea. It doesn't even have to have such a feature. It's not 
required for language compliance, for one.


> There is also support for pure-mode fused types, but to a very limited
> extend, i.e. you can do cython.fused_type(my-type-list) to create a
> fused type, but you don't actually generate any actual specializations
> unless you compile it with Cython.
>
> As for actual fused types support, I think you can replace-and-wrap
> fused functions at runtime with an instance of a generated
> FusedFunction class that is indexable and callable and does the
> necessary instance checks (in case of 'def' or 'cpdef'). You will also
> know at compile-time whether a certain cdef or cpdef call is valid,
> and you can basically do the same trick as we do in C: generate
> multiple specializations and choose which one to call. But seeing that
> compile time checks will ensure that you can only do an intersection
> of all possible operations, I think you might only need to do this for
> the case where you may either get a ctypes object or a normal python
> object (if you make weird combinations of types to fuse).

Yes, I was expecting problems when using C types. However, just because you 
pass different ctypes wrapped C values into generic code doesn't mean you 
have to split the function. It depends on what the function actually does, 
i.e. if the different types lead to different *Python* code. However, I 
guess that will quickly be the case as soon as you use typed variables in 
side of the function that need a ctypes initialisation in the generated code.


> In any case, I agree that leaving buffers and fused types for now in
> the ctypes support is probably the best idea.

Certainly helps in getting this out of the door.


> I'm not really familiar with RPython, but would it in any way be
> feasible to have Cython generate RPython code? That may just make
> things easier and more efficient to implement.

I never used RPython either, but my guess is that this would be quite 
involved. You'd loose the more or less 1:1 mapping from Cython code to 
Python code and would have to replace some constructs or code patterns by 
different Python code.

Anyway, I think PyPy can optimise Python code just fine.

Stefan

From romain.py at gmail.com  Tue Oct 18 20:43:29 2011
From: romain.py at gmail.com (Romain Guillebert)
Date: Tue, 18 Oct 2011 20:43:29 +0200
Subject: [Cython] Cython-ctypes branch
In-Reply-To: <4E9DC306.2030202@behnel.de>
References: <4E9D3376.105@behnel.de>
	<CANg26EWuxGzASNgev+jg-kW-68_d-kK+gbScOY_aZqmVGPvmKg@mail.gmail.com>
	<4E9DC306.2030202@behnel.de>
Message-ID: <20111018184329.GA16314@hardshooter>

Hi

I'll try to do that this week, I agree that it's better to get this
branch merged.

Rpython isn't suitable at all for this kind of use case because you have
to recompile the entire PyPy executable each time you change a library
(long compile time and big memory consumption), loading modules is not
trivial, the entire program must be type-inferable (which probably isn't
the case of most Cython programs), global variables are considered
constants, and I think (don't quote me) that the JIT doesn't work on
rpython code.

I have no idea on the speedup/slowdown though.

Cheers
Romain

From stefan_ml at behnel.de  Tue Oct 18 21:10:18 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 18 Oct 2011 21:10:18 +0200
Subject: [Cython] Cython-ctypes branch
In-Reply-To: <20111018184329.GA16314@hardshooter>
References: <4E9D3376.105@behnel.de>
	<CANg26EWuxGzASNgev+jg-kW-68_d-kK+gbScOY_aZqmVGPvmKg@mail.gmail.com>
	<4E9DC306.2030202@behnel.de> <20111018184329.GA16314@hardshooter>
Message-ID: <4E9DCF1A.1060303@behnel.de>

Romain Guillebert, 18.10.2011 20:43:
> I'll try to do that this week, I agree that it's better to get this
> branch merged.

Cool.


> Rpython isn't suitable at all for this kind of use case because you have
> to recompile the entire PyPy executable each time you change a library
> (long compile time and big memory consumption), loading modules is not
> trivial, the entire program must be type-inferable (which probably isn't
> the case of most Cython programs), global variables are considered
> constants,

Yes, that's about the kind of hassle that I expected. I heard a couple of 
PyPy developers report that it's not really fun to write code in RPython.


> and I think (don't quote me) that the JIT doesn't work on
> rpython code.

Don't quote me either, but I think RPython basically *is* the JIT.


> I have no idea on the speedup/slowdown though.

That's secondary at best. The most important thing is to get it working, so 
that users can start to test their code against it. If it works out, it'll 
be a huge feature to be able to actually write code that is fast in both 
CPython and PyPy *and* that connects to external C code. Making it fast in 
PyPy is then up to them.

Stefan

From markflorisson88 at gmail.com  Tue Oct 18 23:34:54 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 18 Oct 2011 22:34:54 +0100
Subject: [Cython] SIMD
Message-ID: <CANg26EVgs=K8uuxiG6vc233ERgUB0XU8aXY8w2q4nYKzS4FjoA@mail.gmail.com>

I'm copy/pasting this message to the ML with regard to previous
discussion on cython-users and auto-vectorization (apparently my
forwarded mail got rejected).

Perhaps an approach as listed below would be easier than to generate
Fortran (and deal with the pain of linking with it, distutils
compatibility, forcing the user to install a fortran compiler etc).

------------ Forwarded Message Below ------------
Hello,

With regards to the discussion on the Cython mail listing regarding
SSE and vectorizing I have a unfinished project which might be of
interest. The project wraps the Orc compiler (
http://code.entropywave.com/projects/orc/ )
which is a simplified assembly language to create cross platform
thight loop code utilizing SMID architectures.

With some simple test code for sin function approximation i get a
speedup of 10x the
corresponding numpy functions (Single threaded).  By utilizing openmp it is
possible to extend this to multiple threads and gain further speedups.

The code is currently just a proof of concept and feel free to adopt and
extend this code if wanted.

Best regards
Runar Tenfjord

From robertwb at math.washington.edu  Wed Oct 19 06:26:33 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Tue, 18 Oct 2011 21:26:33 -0700
Subject: [Cython] Cython-ctypes branch
In-Reply-To: <4E9DCF1A.1060303@behnel.de>
References: <4E9D3376.105@behnel.de>
	<CANg26EWuxGzASNgev+jg-kW-68_d-kK+gbScOY_aZqmVGPvmKg@mail.gmail.com>
	<4E9DC306.2030202@behnel.de> <20111018184329.GA16314@hardshooter>
	<4E9DCF1A.1060303@behnel.de>
Message-ID: <CADiQ+QAfU9gVNCi70JLnDtFc6-qk8abzfb-_5kvV-U34Xp_1KQ@mail.gmail.com>

On Tue, Oct 18, 2011 at 12:10 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Romain Guillebert, 18.10.2011 20:43:
>>
>> I'll try to do that this week, I agree that it's better to get this
>> branch merged.
>
> Cool.

Thanks!

>> Rpython isn't suitable at all for this kind of use case because you have
>> to recompile the entire PyPy executable each time you change a library
>> (long compile time and big memory consumption), loading modules is not
>> trivial, the entire program must be type-inferable (which probably isn't
>> the case of most Cython programs), global variables are considered
>> constants,
>
> Yes, that's about the kind of hassle that I expected. I heard a couple of
> PyPy developers report that it's not really fun to write code in RPython.
>
>
>> and I think (don't quote me) that the JIT doesn't work on
>> rpython code.
>
> Don't quote me either, but I think RPython basically *is* the JIT.
>
>
>> I have no idea on the speedup/slowdown though.
>
> That's secondary at best. The most important thing is to get it working, so
> that users can start to test their code against it. If it works out, it'll
> be a huge feature to be able to actually write code that is fast in both
> CPython and PyPy *and* that connects to external C code. Making it fast in
> PyPy is then up to them.

+1. The primary benefit I see is being able to write modules that
interact with external C code that works in both CPython and PyPy
which would be a boon to both our communities. The fact that we're
both interested in speed is not as big of a deal on this front, and I
have no concerns that it will work itself out.

- Robert

From robertwb at math.washington.edu  Wed Oct 19 07:01:43 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Tue, 18 Oct 2011 22:01:43 -0700
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>
	<CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>
	<CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>
Message-ID: <CADiQ+QC+8Rz2LyjM-BT8jp3kDba5JX0u+rV3xwbMmcOneCa4rA@mail.gmail.com>

On Fri, Oct 14, 2011 at 1:07 PM, mark florisson
<markflorisson88 at gmail.com> wrote:
> On 14 October 2011 19:31, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson
>> <markflorisson88 at gmail.com> wrote:
>>>>> I ultimately feel things like that is more important than 100% coverage of
>>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.
>>>>
>>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a
>>>> task/futures model based on closures would I think flesh this out to
>>>> the next level of generality (and complexity).
>>>
>>> Futures are definitely nice. I suppose I think really like "inline
>>> futures", i.e. openmp tasks. I realize that futures may look more
>>> pythonic. However, as mentioned previously, I also see issues with
>>> that. When you submit a task then you expect a future object, which
>>> you might want to pass around. But we don't have the GIL for that. I
>>> personally feel that futures is something that should be done by a
>>> library (such as concurrent.futures in python 3.2), and inline tasks
>>> by a language. It also means I have to write an entire function or
>>> closure for perhaps only a few lines of code.
>>>
>>> I might also want to submit other functions that are not closures, or
>>> I might want to reuse my closures that are used for tasks and for
>>> something else. So what if my tasks contain more parallel constructs?
>>> e.g. what if I have a task closure that I return from my function that
>>> generates more tasks itself? Would you just execute them sequentially
>>> outside of the parallel construct, or would you simply disallow that?
>>> Also, do you restrict future "objects" to only the parallel section?
>>>
>>> Another problem is that you can only wait on tasks of your direct
>>> children. So what if I get access to my parent's future object
>>> (assuming you allow tasks to generate tasks), and then want the result
>>> of my parent?
>>> Or what if I store these future objects in an array or list and access
>>> them arbitrarily? You will only know at runtime which task to wait on,
>>> and openmp only has a static, lexical taskwait.
>>>
>>> I suppose my point is that without either a drastic rewrite (e.g., use
>>> pthreads instead of openmp) or quite a bit of contraints, I am unsure
>>> how futures would work here. Perhaps you guys have some concrete
>>> syntax and semantics proposals?
>>
>> It feels to me that OpenMP tasks took a different model of parallelism
>> and tried to force them into the OpenMP model/constraints, and so it'd
>> be even more difficult to fit them into a nice pythonic interface.
>> Perhaps to make progress on this front we need to have a concrete
>> example to look at. I'm also wondering if the standard threading
>> module (perhaps with overlay support) used with nogil functions would
>> be sufficient--locking is required for handling the queues, etc. so
>> the fact that the GIL is involved is not a big deal. It is possible
>> that this won't scale to as small of work units, but the overhead
>> should be minimal once your work unit is a sufficient size (which is
>> probably quite small) and it's already implemented and well
>> documented/used.
>
> It's all definitely possible with normal threads, but the thing you
> lose is convenience and conciseness. For big problems the programmer
> might sum up the courage and effort to implement it, but typically you
> will just stick to a serial version. This is really where OpenMP is
> powerful, you can take a simple sequential piece of code and make it
> parallel with minimal effort and without having to restructure,
> rethink and rewrite your algorithms.

That is a very good point.

> Something like concurrent.futures is definitely nice, but most people
> cannot afford to mandate python 3.2 for their users.
>
> The most classical examples I can think of for tasks are
>
> 1) independent code sections, i.e. two or more pieces of code that
> don't depend on each other which you want to execute in parallel
> 2) traversal of some kind of custom data structure, like a tree or a linked list
> 3) some kind of other producer/consumer model
>
> e.g. using with task syntax:
>
> cdef postorder_traverse(tree *t): # bullet 1) and 2)
> ? ?with task:
> ? ? ? ?traverse(t.left)
> ? ?with task:
> ? ? ? ?traverse(t.right)
>
> ? ?taskwait() # wait until we traversed our subtrees
> ? ?use(t.data)

Is there an implicit parallel block here? Perhaps in the caller?

> cdef list_traverse(linkedlist *L): # bullet 2)
> ? ?with nogil, parallel():
> ? ? ? ?if threadid() == 0:
> ? ? ? ? ? ?while L.next:
> ? ? ? ? ? ? ? ?with task:
> ? ? ? ? ? ? ? ? ? ?do_something(L.data)
>
> In the latter case we don't need a taskwait as we don't care about any
> particular order. Only one thread generates the tasks where the others
> just hit the barrier and see the tasks they can execute.

I guess it's the fact that Python doesn't have a nice syntax for
anonymous functions or blocks does make this syntax more appealing
than an explicit closure.

Perhaps if we came up with a more pythonic/natural name which would
make the intent clear. Makes me want to do something like

pool = ThreadPool(10)
for item in L:
    with pool:
        process(item)

but then you get into issues of passing the pool around. OpenMP has
the implicit pool of the nesting parallel block, so "with one thread"
or "with cython.parallel.pool" or something like that might be more
readable.

> The good thing is that the OpenMP runtime can decide at task
> generation point (not only at taskwait or barrier points!) decide to
> stop generating more tasks and start executing them. So you won't
> exhaust memory if you might have lots of tasks.

Often threadpools have queues that block when their buffer gets full
to achieve the same goal.

>> As for critical and barrier, the notion of a critical block as a with
>> statement is very useful. Creating/naming locks (rather than being
>> implicit on the file/line number) is more powerful, but is a larger
>> burden on the user and more difficult to support with the OpenMP
>> backend.
>
> Actually, as I mentioned before, critical sections do not at all
> depend on their line or file number. All they depend on their implicit
> or explicit name (the name is implicit when you simply omit it, so all
> unnamed critical sections exclude each other).

Ah, yes. In this case "with cython.parallel.lock([optional name])"
could be obvious enough.

> Indeed, supporting creation of locks dynamically and allowing them to
> be passed around arbitrarily would be hard (and likely not worth the
> effort). Naming them is trivial though, which might not be incredibly
> pythonic but is very convenient, easy and readable.

You can view this as a lookup by name, not a lock creation. Not
allowing them to be used outside of a with clause is a reasonable
restriction, and does not preclude a (possibly very distant) extension
to being able to pass them around.

>> barrier, if supported, should be a function call not a
>> context. Not as critical as with the tasks case, but a good example to
>> see how it flows would be useful here as well.
>
> I agree, it really doesn't have any associated code and trying to
> associate code with it is likely more confusing than meaningful. It
> was just an idea.
> Often you can rely on implicit barriers from e.g. prange, but not
> always. I can't think of any real-world example, but you usually need
> it to ensure that everyone gets a sane view on some shared data, e.g.
>
> with nogil, parallel():
> ? ?array[threadid()] = func(threadid())
> ? ?barrier()
> ? ?use array[threadid() + 1 % omp_num_threads()] # access data of
> some neighbour
>
> This is a rather contrived example, but (see below) it would be
> especially useful if you use single/master/once/first that sets some
> shared data everyone will operate on (for instance in a prange). To
> ensure the data is sane before you use it, you have to put the barrier
> to 1) ensure the data has been written and 2) that the data has been
> flushed.
>
> Basically, you'll always know when you need a barrier, but it's pretty
> hard to come up with a real-world example for it when you have to :)

Yes, I think barriers are explanatory enough.

>> As for single, I see doing this manually does require boilerplate
>> locking, so what about
>>
>> if cython.parallel.once(): ?# will return True once for a tread group.
>> ? ?...
>>
>> we could implement this via our own locking/checking/flushing to allow
>> it to occur in arbitrary expressions, e.g.
>>
>> special_worker = cython.parallel.once()
>> if special_worker:
>> ? ...
>> [common code]
>> if special_worker: ? # single wouldn't work here
>> ? ...
>>
>
> That looks OK. I've actually been thinking that if we have barriers we
> don't really need is_master(), once() or single() or anything. We
> already have threadid() and you usually don't care what thread gets
> there first, you only care about doing it once. So one could just
> write
>
> if parallel.threadid() == 0:
> ? ?...
>
> parallel.barrier() # if required

Perhaps you want the first free thread to take it up to minimize idle
threads. I agree if parallel.threadid() == 0 is a synonym for
is_master(), so probably not needed. However, what are the OpenMP
semantics of

cdef f():
    with parallel():
        g()
        g()

cdef g():
    with single():
        ... # executed once, right?
    with task:
        ... # executed twice, right?

> It might also be convenient to declare variables explicitly shared
> here, e.g. this code will not work:
>
> cdef int *buf
>
> with nogil, parallel.parallel():
> ? ?if parallel.threadid() == 0:
> ? ? ? ?buf = ...
>
> ? ?parallel.barrier()
>
> ? ?# will will likely segfault, as buf is private because we assigned
> to it. It's only valid in thread 0
> ? ?use buf[...]
>
> So basically you'd have to do something like (&buf)[0][...], which
> frankly looks pretty weird. However I do think such cases are rather
> uncommon.

True. Perhaps this could be declared via "with nogil,
parallel.parallel(), parallel.shared(buf)" or something like that.

- Robert

From adriangeologo at yahoo.es  Wed Oct 19 19:16:02 2011
From: adriangeologo at yahoo.es (=?ISO-8859-1?Q?Adrian_Mart=EDnez_Vargas?=)
Date: Wed, 19 Oct 2011 10:16:02 -0700
Subject: [Cython] ImportError: DLL load failed: The specified module could
	not be found.
Message-ID: <4E9F05D2.9050602@yahoo.es>

Hi cython list,

I am having problems to distribute a python module for windows (written 
with cython). It compile ok with mingw (installed with but when I import 
the module in python I get this error

ImportError: DLL load failed: The specified module could not be found.

I guess that some where there is an error linking the DLL. Bout how to 
solve it to create a nice distribution (.exe) for my package?


Thanks
Adrian

From markflorisson88 at gmail.com  Wed Oct 19 20:19:15 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 19 Oct 2011 19:19:15 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CADiQ+QC+8Rz2LyjM-BT8jp3kDba5JX0u+rV3xwbMmcOneCa4rA@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>
	<CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>
	<CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>
	<CADiQ+QC+8Rz2LyjM-BT8jp3kDba5JX0u+rV3xwbMmcOneCa4rA@mail.gmail.com>
Message-ID: <CANg26EXvv_LCyzRLjVTrsHC=KHoNv0rfXKZm=kq2S1M=uDJ3gQ@mail.gmail.com>

On 19 October 2011 06:01, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> On Fri, Oct 14, 2011 at 1:07 PM, mark florisson
> <markflorisson88 at gmail.com> wrote:
>> On 14 October 2011 19:31, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>>> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson
>>> <markflorisson88 at gmail.com> wrote:
>>>>>> I ultimately feel things like that is more important than 100% coverage of
>>>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.
>>>>>
>>>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a
>>>>> task/futures model based on closures would I think flesh this out to
>>>>> the next level of generality (and complexity).
>>>>
>>>> Futures are definitely nice. I suppose I think really like "inline
>>>> futures", i.e. openmp tasks. I realize that futures may look more
>>>> pythonic. However, as mentioned previously, I also see issues with
>>>> that. When you submit a task then you expect a future object, which
>>>> you might want to pass around. But we don't have the GIL for that. I
>>>> personally feel that futures is something that should be done by a
>>>> library (such as concurrent.futures in python 3.2), and inline tasks
>>>> by a language. It also means I have to write an entire function or
>>>> closure for perhaps only a few lines of code.
>>>>
>>>> I might also want to submit other functions that are not closures, or
>>>> I might want to reuse my closures that are used for tasks and for
>>>> something else. So what if my tasks contain more parallel constructs?
>>>> e.g. what if I have a task closure that I return from my function that
>>>> generates more tasks itself? Would you just execute them sequentially
>>>> outside of the parallel construct, or would you simply disallow that?
>>>> Also, do you restrict future "objects" to only the parallel section?
>>>>
>>>> Another problem is that you can only wait on tasks of your direct
>>>> children. So what if I get access to my parent's future object
>>>> (assuming you allow tasks to generate tasks), and then want the result
>>>> of my parent?
>>>> Or what if I store these future objects in an array or list and access
>>>> them arbitrarily? You will only know at runtime which task to wait on,
>>>> and openmp only has a static, lexical taskwait.
>>>>
>>>> I suppose my point is that without either a drastic rewrite (e.g., use
>>>> pthreads instead of openmp) or quite a bit of contraints, I am unsure
>>>> how futures would work here. Perhaps you guys have some concrete
>>>> syntax and semantics proposals?
>>>
>>> It feels to me that OpenMP tasks took a different model of parallelism
>>> and tried to force them into the OpenMP model/constraints, and so it'd
>>> be even more difficult to fit them into a nice pythonic interface.
>>> Perhaps to make progress on this front we need to have a concrete
>>> example to look at. I'm also wondering if the standard threading
>>> module (perhaps with overlay support) used with nogil functions would
>>> be sufficient--locking is required for handling the queues, etc. so
>>> the fact that the GIL is involved is not a big deal. It is possible
>>> that this won't scale to as small of work units, but the overhead
>>> should be minimal once your work unit is a sufficient size (which is
>>> probably quite small) and it's already implemented and well
>>> documented/used.
>>
>> It's all definitely possible with normal threads, but the thing you
>> lose is convenience and conciseness. For big problems the programmer
>> might sum up the courage and effort to implement it, but typically you
>> will just stick to a serial version. This is really where OpenMP is
>> powerful, you can take a simple sequential piece of code and make it
>> parallel with minimal effort and without having to restructure,
>> rethink and rewrite your algorithms.
>
> That is a very good point.
>
>> Something like concurrent.futures is definitely nice, but most people
>> cannot afford to mandate python 3.2 for their users.
>>
>> The most classical examples I can think of for tasks are
>>
>> 1) independent code sections, i.e. two or more pieces of code that
>> don't depend on each other which you want to execute in parallel
>> 2) traversal of some kind of custom data structure, like a tree or a linked list
>> 3) some kind of other producer/consumer model
>>
>> e.g. using with task syntax:
>>
>> cdef postorder_traverse(tree *t): # bullet 1) and 2)
>> ? ?with task:
>> ? ? ? ?traverse(t.left)
>> ? ?with task:
>> ? ? ? ?traverse(t.right)
>>
>> ? ?taskwait() # wait until we traversed our subtrees
>> ? ?use(t.data)
>
> Is there an implicit parallel block here? Perhaps in the caller?

Yes, it was implicit in my example. If you'd use that code, you'd call
it from a parallel section. Depending on what semantics you'd define
(see below), you'd call it either from one thread in the team, or with
all of them.

>> cdef list_traverse(linkedlist *L): # bullet 2)
>> ? ?with nogil, parallel():
>> ? ? ? ?if threadid() == 0:
>> ? ? ? ? ? ?while L.next:
>> ? ? ? ? ? ? ? ?with task:
>> ? ? ? ? ? ? ? ? ? ?do_something(L.data)
>>
>> In the latter case we don't need a taskwait as we don't care about any
>> particular order. Only one thread generates the tasks where the others
>> just hit the barrier and see the tasks they can execute.
>
> I guess it's the fact that Python doesn't have a nice syntax for
> anonymous functions or blocks does make this syntax more appealing
> than an explicit closure.
>
> Perhaps if we came up with a more pythonic/natural name which would
> make the intent clear. Makes me want to do something like
>
> pool = ThreadPool(10)
> for item in L:
> ? ?with pool:
> ? ? ? ?process(item)
>
> but then you get into issues of passing the pool around. OpenMP has
> the implicit pool of the nesting parallel block, so "with one thread"
> or "with cython.parallel.pool" or something like that might be more
> readable.

I think with pool would be good, it must be clear that the task is
submitted to a threadpool and hence may be executed asynchronously.

>> The good thing is that the OpenMP runtime can decide at task
>> generation point (not only at taskwait or barrier points!) decide to
>> stop generating more tasks and start executing them. So you won't
>> exhaust memory if you might have lots of tasks.
>
> Often threadpools have queues that block when their buffer gets full
> to achieve the same goal.
>
>>> As for critical and barrier, the notion of a critical block as a with
>>> statement is very useful. Creating/naming locks (rather than being
>>> implicit on the file/line number) is more powerful, but is a larger
>>> burden on the user and more difficult to support with the OpenMP
>>> backend.
>>
>> Actually, as I mentioned before, critical sections do not at all
>> depend on their line or file number. All they depend on their implicit
>> or explicit name (the name is implicit when you simply omit it, so all
>> unnamed critical sections exclude each other).
>
> Ah, yes. In this case "with cython.parallel.lock([optional name])"
> could be obvious enough.
>
>> Indeed, supporting creation of locks dynamically and allowing them to
>> be passed around arbitrarily would be hard (and likely not worth the
>> effort). Naming them is trivial though, which might not be incredibly
>> pythonic but is very convenient, easy and readable.
>
> You can view this as a lookup by name, not a lock creation. Not
> allowing them to be used outside of a with clause is a reasonable
> restriction, and does not preclude a (possibly very distant) extension
> to being able to pass them around.
>
>>> barrier, if supported, should be a function call not a
>>> context. Not as critical as with the tasks case, but a good example to
>>> see how it flows would be useful here as well.
>>
>> I agree, it really doesn't have any associated code and trying to
>> associate code with it is likely more confusing than meaningful. It
>> was just an idea.
>> Often you can rely on implicit barriers from e.g. prange, but not
>> always. I can't think of any real-world example, but you usually need
>> it to ensure that everyone gets a sane view on some shared data, e.g.
>>
>> with nogil, parallel():
>> ? ?array[threadid()] = func(threadid())
>> ? ?barrier()
>> ? ?use array[threadid() + 1 % omp_num_threads()] # access data of
>> some neighbour
>>
>> This is a rather contrived example, but (see below) it would be
>> especially useful if you use single/master/once/first that sets some
>> shared data everyone will operate on (for instance in a prange). To
>> ensure the data is sane before you use it, you have to put the barrier
>> to 1) ensure the data has been written and 2) that the data has been
>> flushed.
>>
>> Basically, you'll always know when you need a barrier, but it's pretty
>> hard to come up with a real-world example for it when you have to :)
>
> Yes, I think barriers are explanatory enough.
>
>>> As for single, I see doing this manually does require boilerplate
>>> locking, so what about
>>>
>>> if cython.parallel.once(): ?# will return True once for a tread group.
>>> ? ?...
>>>
>>> we could implement this via our own locking/checking/flushing to allow
>>> it to occur in arbitrary expressions, e.g.
>>>
>>> special_worker = cython.parallel.once()
>>> if special_worker:
>>> ? ...
>>> [common code]
>>> if special_worker: ? # single wouldn't work here
>>> ? ...
>>>
>>
>> That looks OK. I've actually been thinking that if we have barriers we
>> don't really need is_master(), once() or single() or anything. We
>> already have threadid() and you usually don't care what thread gets
>> there first, you only care about doing it once. So one could just
>> write
>>
>> if parallel.threadid() == 0:
>> ? ?...
>>
>> parallel.barrier() # if required
>
> Perhaps you want the first free thread to take it up to minimize idle
> threads. I agree if parallel.threadid() == 0 is a synonym for
> is_master(), so probably not needed. However, what are the OpenMP
> semantics of
>
> cdef f():
> ? ?with parallel():
> ? ? ? ?g()
> ? ? ? ?g()
>
> cdef g():
> ? ?with single():
> ? ? ? ?... # executed once, right?
> ? ?with task:
> ? ? ? ?... # executed twice, right?

Hmm, not quite. The thing is that function g is called by every thread
in the team, say N threads, and for each time the team encounters the
single directive, it will execute it once, so in total it will execute
the code in the single block twice, as the team encounters it twice.

It will however create 2N tasks to execute, as every thread that
encounters it creates a task. This is probably not what you want, so
you usually want

with parallel():
    if threadid() == 0:
        g()

and have the code in g (executed by one thread only) create the tasks.

Note also how 'for _ in prange(1):' would not have the same semantics
here, as it generates a 'parallel for' and not a worksharing for in
the function (because we don't support orphaned pranges).

I think this may all be confusing for users, I think usually you will
want to create just a single task irrespective of whether you are in a
parallel or a prange and not "however many threads are in the team for
parallel and just one for prange because we're sharing work". This
would also work for orphaned tasks, e.g. you expect 2 tasks in your
snippet above, not 2N. Fortunately, that would be easy to support.
We would however have to introduce the same restriction as with
(implicit) barriers: either all or none of the threads must encounter
the construct (or maybe loosen it to "if you actually want to create
the task, make sure at least thread 0 encounters it", which may lead
users to write more efficient code).

>> It might also be convenient to declare variables explicitly shared
>> here, e.g. this code will not work:
>>
>> cdef int *buf
>>
>> with nogil, parallel.parallel():
>> ? ?if parallel.threadid() == 0:
>> ? ? ? ?buf = ...
>>
>> ? ?parallel.barrier()
>>
>> ? ?# will will likely segfault, as buf is private because we assigned
>> to it. It's only valid in thread 0
>> ? ?use buf[...]
>>
>> So basically you'd have to do something like (&buf)[0][...], which
>> frankly looks pretty weird. However I do think such cases are rather
>> uncommon.
>
> True. Perhaps this could be declared via "with nogil,
> parallel.parallel(), parallel.shared(buf)" or something like that.

That looks elegant enough.

> - Robert
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Wed Oct 19 21:45:02 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 19 Oct 2011 20:45:02 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EXvv_LCyzRLjVTrsHC=KHoNv0rfXKZm=kq2S1M=uDJ3gQ@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>
	<CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>
	<CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>
	<CADiQ+QC+8Rz2LyjM-BT8jp3kDba5JX0u+rV3xwbMmcOneCa4rA@mail.gmail.com>
	<CANg26EXvv_LCyzRLjVTrsHC=KHoNv0rfXKZm=kq2S1M=uDJ3gQ@mail.gmail.com>
Message-ID: <CANg26EV88qHwFcAR_VCuE1DDZM4mBK9i4xGWRNuG+Ev8r061vg@mail.gmail.com>

On 19 October 2011 19:19, mark florisson <markflorisson88 at gmail.com> wrote:
> On 19 October 2011 06:01, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>> On Fri, Oct 14, 2011 at 1:07 PM, mark florisson
>> <markflorisson88 at gmail.com> wrote:
>>> On 14 October 2011 19:31, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>>>> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson
>>>> <markflorisson88 at gmail.com> wrote:
>>>>>>> I ultimately feel things like that is more important than 100% coverage of
>>>>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.
>>>>>>
>>>>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a
>>>>>> task/futures model based on closures would I think flesh this out to
>>>>>> the next level of generality (and complexity).
>>>>>
>>>>> Futures are definitely nice. I suppose I think really like "inline
>>>>> futures", i.e. openmp tasks. I realize that futures may look more
>>>>> pythonic. However, as mentioned previously, I also see issues with
>>>>> that. When you submit a task then you expect a future object, which
>>>>> you might want to pass around. But we don't have the GIL for that. I
>>>>> personally feel that futures is something that should be done by a
>>>>> library (such as concurrent.futures in python 3.2), and inline tasks
>>>>> by a language. It also means I have to write an entire function or
>>>>> closure for perhaps only a few lines of code.
>>>>>
>>>>> I might also want to submit other functions that are not closures, or
>>>>> I might want to reuse my closures that are used for tasks and for
>>>>> something else. So what if my tasks contain more parallel constructs?
>>>>> e.g. what if I have a task closure that I return from my function that
>>>>> generates more tasks itself? Would you just execute them sequentially
>>>>> outside of the parallel construct, or would you simply disallow that?
>>>>> Also, do you restrict future "objects" to only the parallel section?
>>>>>
>>>>> Another problem is that you can only wait on tasks of your direct
>>>>> children. So what if I get access to my parent's future object
>>>>> (assuming you allow tasks to generate tasks), and then want the result
>>>>> of my parent?
>>>>> Or what if I store these future objects in an array or list and access
>>>>> them arbitrarily? You will only know at runtime which task to wait on,
>>>>> and openmp only has a static, lexical taskwait.
>>>>>
>>>>> I suppose my point is that without either a drastic rewrite (e.g., use
>>>>> pthreads instead of openmp) or quite a bit of contraints, I am unsure
>>>>> how futures would work here. Perhaps you guys have some concrete
>>>>> syntax and semantics proposals?
>>>>
>>>> It feels to me that OpenMP tasks took a different model of parallelism
>>>> and tried to force them into the OpenMP model/constraints, and so it'd
>>>> be even more difficult to fit them into a nice pythonic interface.
>>>> Perhaps to make progress on this front we need to have a concrete
>>>> example to look at. I'm also wondering if the standard threading
>>>> module (perhaps with overlay support) used with nogil functions would
>>>> be sufficient--locking is required for handling the queues, etc. so
>>>> the fact that the GIL is involved is not a big deal. It is possible
>>>> that this won't scale to as small of work units, but the overhead
>>>> should be minimal once your work unit is a sufficient size (which is
>>>> probably quite small) and it's already implemented and well
>>>> documented/used.
>>>
>>> It's all definitely possible with normal threads, but the thing you
>>> lose is convenience and conciseness. For big problems the programmer
>>> might sum up the courage and effort to implement it, but typically you
>>> will just stick to a serial version. This is really where OpenMP is
>>> powerful, you can take a simple sequential piece of code and make it
>>> parallel with minimal effort and without having to restructure,
>>> rethink and rewrite your algorithms.
>>
>> That is a very good point.
>>
>>> Something like concurrent.futures is definitely nice, but most people
>>> cannot afford to mandate python 3.2 for their users.
>>>
>>> The most classical examples I can think of for tasks are
>>>
>>> 1) independent code sections, i.e. two or more pieces of code that
>>> don't depend on each other which you want to execute in parallel
>>> 2) traversal of some kind of custom data structure, like a tree or a linked list
>>> 3) some kind of other producer/consumer model
>>>
>>> e.g. using with task syntax:
>>>
>>> cdef postorder_traverse(tree *t): # bullet 1) and 2)
>>> ? ?with task:
>>> ? ? ? ?traverse(t.left)
>>> ? ?with task:
>>> ? ? ? ?traverse(t.right)
>>>
>>> ? ?taskwait() # wait until we traversed our subtrees
>>> ? ?use(t.data)
>>
>> Is there an implicit parallel block here? Perhaps in the caller?
>
> Yes, it was implicit in my example. If you'd use that code, you'd call
> it from a parallel section. Depending on what semantics you'd define
> (see below), you'd call it either from one thread in the team, or with
> all of them.
>
>>> cdef list_traverse(linkedlist *L): # bullet 2)
>>> ? ?with nogil, parallel():
>>> ? ? ? ?if threadid() == 0:
>>> ? ? ? ? ? ?while L.next:
>>> ? ? ? ? ? ? ? ?with task:
>>> ? ? ? ? ? ? ? ? ? ?do_something(L.data)
>>>
>>> In the latter case we don't need a taskwait as we don't care about any
>>> particular order. Only one thread generates the tasks where the others
>>> just hit the barrier and see the tasks they can execute.
>>
>> I guess it's the fact that Python doesn't have a nice syntax for
>> anonymous functions or blocks does make this syntax more appealing
>> than an explicit closure.
>>
>> Perhaps if we came up with a more pythonic/natural name which would
>> make the intent clear. Makes me want to do something like
>>
>> pool = ThreadPool(10)
>> for item in L:
>> ? ?with pool:
>> ? ? ? ?process(item)
>>
>> but then you get into issues of passing the pool around. OpenMP has
>> the implicit pool of the nesting parallel block, so "with one thread"
>> or "with cython.parallel.pool" or something like that might be more
>> readable.
>
> I think with pool would be good, it must be clear that the task is
> submitted to a threadpool and hence may be executed asynchronously.
>
>>> The good thing is that the OpenMP runtime can decide at task
>>> generation point (not only at taskwait or barrier points!) decide to
>>> stop generating more tasks and start executing them. So you won't
>>> exhaust memory if you might have lots of tasks.
>>
>> Often threadpools have queues that block when their buffer gets full
>> to achieve the same goal.
>>
>>>> As for critical and barrier, the notion of a critical block as a with
>>>> statement is very useful. Creating/naming locks (rather than being
>>>> implicit on the file/line number) is more powerful, but is a larger
>>>> burden on the user and more difficult to support with the OpenMP
>>>> backend.
>>>
>>> Actually, as I mentioned before, critical sections do not at all
>>> depend on their line or file number. All they depend on their implicit
>>> or explicit name (the name is implicit when you simply omit it, so all
>>> unnamed critical sections exclude each other).
>>
>> Ah, yes. In this case "with cython.parallel.lock([optional name])"
>> could be obvious enough.
>>
>>> Indeed, supporting creation of locks dynamically and allowing them to
>>> be passed around arbitrarily would be hard (and likely not worth the
>>> effort). Naming them is trivial though, which might not be incredibly
>>> pythonic but is very convenient, easy and readable.
>>
>> You can view this as a lookup by name, not a lock creation. Not
>> allowing them to be used outside of a with clause is a reasonable
>> restriction, and does not preclude a (possibly very distant) extension
>> to being able to pass them around.
>>
>>>> barrier, if supported, should be a function call not a
>>>> context. Not as critical as with the tasks case, but a good example to
>>>> see how it flows would be useful here as well.
>>>
>>> I agree, it really doesn't have any associated code and trying to
>>> associate code with it is likely more confusing than meaningful. It
>>> was just an idea.
>>> Often you can rely on implicit barriers from e.g. prange, but not
>>> always. I can't think of any real-world example, but you usually need
>>> it to ensure that everyone gets a sane view on some shared data, e.g.
>>>
>>> with nogil, parallel():
>>> ? ?array[threadid()] = func(threadid())
>>> ? ?barrier()
>>> ? ?use array[threadid() + 1 % omp_num_threads()] # access data of
>>> some neighbour
>>>
>>> This is a rather contrived example, but (see below) it would be
>>> especially useful if you use single/master/once/first that sets some
>>> shared data everyone will operate on (for instance in a prange). To
>>> ensure the data is sane before you use it, you have to put the barrier
>>> to 1) ensure the data has been written and 2) that the data has been
>>> flushed.
>>>
>>> Basically, you'll always know when you need a barrier, but it's pretty
>>> hard to come up with a real-world example for it when you have to :)
>>
>> Yes, I think barriers are explanatory enough.
>>
>>>> As for single, I see doing this manually does require boilerplate
>>>> locking, so what about
>>>>
>>>> if cython.parallel.once(): ?# will return True once for a tread group.
>>>> ? ?...
>>>>
>>>> we could implement this via our own locking/checking/flushing to allow
>>>> it to occur in arbitrary expressions, e.g.
>>>>
>>>> special_worker = cython.parallel.once()
>>>> if special_worker:
>>>> ? ...
>>>> [common code]
>>>> if special_worker: ? # single wouldn't work here
>>>> ? ...
>>>>
>>>
>>> That looks OK. I've actually been thinking that if we have barriers we
>>> don't really need is_master(), once() or single() or anything. We
>>> already have threadid() and you usually don't care what thread gets
>>> there first, you only care about doing it once. So one could just
>>> write
>>>
>>> if parallel.threadid() == 0:
>>> ? ?...
>>>
>>> parallel.barrier() # if required
>>
>> Perhaps you want the first free thread to take it up to minimize idle
>> threads. I agree if parallel.threadid() == 0 is a synonym for
>> is_master(), so probably not needed. However, what are the OpenMP
>> semantics of
>>
>> cdef f():
>> ? ?with parallel():
>> ? ? ? ?g()
>> ? ? ? ?g()
>>
>> cdef g():
>> ? ?with single():
>> ? ? ? ?... # executed once, right?
>> ? ?with task:
>> ? ? ? ?... # executed twice, right?
>
> Hmm, not quite. The thing is that function g is called by every thread
> in the team, say N threads, and for each time the team encounters the
> single directive, it will execute it once, so in total it will execute
> the code in the single block twice, as the team encounters it twice.
>
> It will however create 2N tasks to execute, as every thread that
> encounters it creates a task. This is probably not what you want, so
> you usually want
>
> with parallel():
> ? ?if threadid() == 0:
> ? ? ? ?g()
>
> and have the code in g (executed by one thread only) create the tasks.
>
> Note also how 'for _ in prange(1):' would not have the same semantics
> here, as it generates a 'parallel for' and not a worksharing for in
> the function (because we don't support orphaned pranges).
>
> I think this may all be confusing for users, I think usually you will
> want to create just a single task irrespective of whether you are in a
> parallel or a prange and not "however many threads are in the team for
> parallel and just one for prange because we're sharing work". This
> would also work for orphaned tasks, e.g. you expect 2 tasks in your
> snippet above, not 2N. Fortunately, that would be easy to support.
> We would however have to introduce the same restriction as with
> (implicit) barriers: either all or none of the threads must encounter
> the construct (or maybe loosen it to "if you actually want to create
> the task, make sure at least thread 0 encounters it", which may lead
> users to write more efficient code).
>
>>> It might also be convenient to declare variables explicitly shared
>>> here, e.g. this code will not work:
>>>
>>> cdef int *buf
>>>
>>> with nogil, parallel.parallel():
>>> ? ?if parallel.threadid() == 0:
>>> ? ? ? ?buf = ...
>>>
>>> ? ?parallel.barrier()
>>>
>>> ? ?# will will likely segfault, as buf is private because we assigned
>>> to it. It's only valid in thread 0
>>> ? ?use buf[...]
>>>
>>> So basically you'd have to do something like (&buf)[0][...], which
>>> frankly looks pretty weird. However I do think such cases are rather
>>> uncommon.
>>
>> True. Perhaps this could be declared via "with nogil,
>> parallel.parallel(), parallel.shared(buf)" or something like that.
>
> That looks elegant enough.

Likewise, I think something like parallel.private(buf) would also be
really nice for arrays, especially if we also allow arrays with
runtime sizes (behind the scenes we could malloc and free). I think
those cases are much more common than parallel.shared().

>> - Robert
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>

From aberghage at gmail.com  Wed Oct 19 21:45:27 2011
From: aberghage at gmail.com (Alexander T. Berghage)
Date: Wed, 19 Oct 2011 15:45:27 -0400
Subject: [Cython] ImportError: DLL load failed: The specified module
 could not be found.
In-Reply-To: <4E9F05D2.9050602@yahoo.es>
References: <4E9F05D2.9050602@yahoo.es>
Message-ID: <CAGRfJ0ZRwHUY44XWkn6g6ytwzpm4de+jYBZu1EWUisFdDWzyLg@mail.gmail.com>

Adrian

I'm a little unclear on the big picture here. Are you trying to
distribute a module (a .pyd / .dll) that you or someone else can
import from a .py script, or are you looking to compile a .exe that
runs your cython code on execution?

----

Just interpreting the error you're describing (ImportError: DLL load
failed: could not be found),  the dynamic linker couldn't find a
library it needed. Most likely this is either a symptom of missing
dependencies or a path problem. Here's my suggestions for diagnosing
and fixing the problem:

Missing Dependencies:
    One very simple way to confirm that all the dependencies of your
cython module are
    available is to point the dependency walker utility[1] at it, and
look for missing DLLs.

Directory Structure:
    Is the .pyd file you built from your cython module in the
PYTHONPATH (or your current
    working directory? If it's not, there's your issue.

[1]  http://www.dependencywalker.com/


Hope that helps!

Best,
-Alex

From markflorisson88 at gmail.com  Wed Oct 19 21:53:43 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 19 Oct 2011 20:53:43 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CADiQ+QC+8Rz2LyjM-BT8jp3kDba5JX0u+rV3xwbMmcOneCa4rA@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>
	<CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>
	<CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>
	<CADiQ+QC+8Rz2LyjM-BT8jp3kDba5JX0u+rV3xwbMmcOneCa4rA@mail.gmail.com>
Message-ID: <CANg26EXQz6aw4dCDkefikiTHwSUdXV41ngY15nrMJ6g6Kk3Kqg@mail.gmail.com>

On 19 October 2011 06:01, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> On Fri, Oct 14, 2011 at 1:07 PM, mark florisson
> <markflorisson88 at gmail.com> wrote:
>> On 14 October 2011 19:31, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>>> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson
>>> <markflorisson88 at gmail.com> wrote:
>>>>>> I ultimately feel things like that is more important than 100% coverage of
>>>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.
>>>>>
>>>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a
>>>>> task/futures model based on closures would I think flesh this out to
>>>>> the next level of generality (and complexity).
>>>>
>>>> Futures are definitely nice. I suppose I think really like "inline
>>>> futures", i.e. openmp tasks. I realize that futures may look more
>>>> pythonic. However, as mentioned previously, I also see issues with
>>>> that. When you submit a task then you expect a future object, which
>>>> you might want to pass around. But we don't have the GIL for that. I
>>>> personally feel that futures is something that should be done by a
>>>> library (such as concurrent.futures in python 3.2), and inline tasks
>>>> by a language. It also means I have to write an entire function or
>>>> closure for perhaps only a few lines of code.
>>>>
>>>> I might also want to submit other functions that are not closures, or
>>>> I might want to reuse my closures that are used for tasks and for
>>>> something else. So what if my tasks contain more parallel constructs?
>>>> e.g. what if I have a task closure that I return from my function that
>>>> generates more tasks itself? Would you just execute them sequentially
>>>> outside of the parallel construct, or would you simply disallow that?
>>>> Also, do you restrict future "objects" to only the parallel section?
>>>>
>>>> Another problem is that you can only wait on tasks of your direct
>>>> children. So what if I get access to my parent's future object
>>>> (assuming you allow tasks to generate tasks), and then want the result
>>>> of my parent?
>>>> Or what if I store these future objects in an array or list and access
>>>> them arbitrarily? You will only know at runtime which task to wait on,
>>>> and openmp only has a static, lexical taskwait.
>>>>
>>>> I suppose my point is that without either a drastic rewrite (e.g., use
>>>> pthreads instead of openmp) or quite a bit of contraints, I am unsure
>>>> how futures would work here. Perhaps you guys have some concrete
>>>> syntax and semantics proposals?
>>>
>>> It feels to me that OpenMP tasks took a different model of parallelism
>>> and tried to force them into the OpenMP model/constraints, and so it'd
>>> be even more difficult to fit them into a nice pythonic interface.
>>> Perhaps to make progress on this front we need to have a concrete
>>> example to look at. I'm also wondering if the standard threading
>>> module (perhaps with overlay support) used with nogil functions would
>>> be sufficient--locking is required for handling the queues, etc. so
>>> the fact that the GIL is involved is not a big deal. It is possible
>>> that this won't scale to as small of work units, but the overhead
>>> should be minimal once your work unit is a sufficient size (which is
>>> probably quite small) and it's already implemented and well
>>> documented/used.
>>
>> It's all definitely possible with normal threads, but the thing you
>> lose is convenience and conciseness. For big problems the programmer
>> might sum up the courage and effort to implement it, but typically you
>> will just stick to a serial version. This is really where OpenMP is
>> powerful, you can take a simple sequential piece of code and make it
>> parallel with minimal effort and without having to restructure,
>> rethink and rewrite your algorithms.
>
> That is a very good point.
>
>> Something like concurrent.futures is definitely nice, but most people
>> cannot afford to mandate python 3.2 for their users.
>>
>> The most classical examples I can think of for tasks are
>>
>> 1) independent code sections, i.e. two or more pieces of code that
>> don't depend on each other which you want to execute in parallel
>> 2) traversal of some kind of custom data structure, like a tree or a linked list
>> 3) some kind of other producer/consumer model
>>
>> e.g. using with task syntax:
>>
>> cdef postorder_traverse(tree *t): # bullet 1) and 2)
>> ? ?with task:
>> ? ? ? ?traverse(t.left)
>> ? ?with task:
>> ? ? ? ?traverse(t.right)
>>
>> ? ?taskwait() # wait until we traversed our subtrees
>> ? ?use(t.data)
>
> Is there an implicit parallel block here? Perhaps in the caller?
>
>> cdef list_traverse(linkedlist *L): # bullet 2)
>> ? ?with nogil, parallel():
>> ? ? ? ?if threadid() == 0:
>> ? ? ? ? ? ?while L.next:
>> ? ? ? ? ? ? ? ?with task:
>> ? ? ? ? ? ? ? ? ? ?do_something(L.data)
>>
>> In the latter case we don't need a taskwait as we don't care about any
>> particular order. Only one thread generates the tasks where the others
>> just hit the barrier and see the tasks they can execute.
>
> I guess it's the fact that Python doesn't have a nice syntax for
> anonymous functions or blocks does make this syntax more appealing
> than an explicit closure.
>
> Perhaps if we came up with a more pythonic/natural name which would
> make the intent clear. Makes me want to do something like
>
> pool = ThreadPool(10)
> for item in L:
> ? ?with pool:
> ? ? ? ?process(item)
>
> but then you get into issues of passing the pool around. OpenMP has
> the implicit pool of the nesting parallel block, so "with one thread"
> or "with cython.parallel.pool" or something like that might be more
> readable.
>
>> The good thing is that the OpenMP runtime can decide at task
>> generation point (not only at taskwait or barrier points!) decide to
>> stop generating more tasks and start executing them. So you won't
>> exhaust memory if you might have lots of tasks.
>
> Often threadpools have queues that block when their buffer gets full
> to achieve the same goal.
>
>>> As for critical and barrier, the notion of a critical block as a with
>>> statement is very useful. Creating/naming locks (rather than being
>>> implicit on the file/line number) is more powerful, but is a larger
>>> burden on the user and more difficult to support with the OpenMP
>>> backend.
>>
>> Actually, as I mentioned before, critical sections do not at all
>> depend on their line or file number. All they depend on their implicit
>> or explicit name (the name is implicit when you simply omit it, so all
>> unnamed critical sections exclude each other).
>
> Ah, yes. In this case "with cython.parallel.lock([optional name])"
> could be obvious enough.

We could also support atomic updates. We could either rewrite
parallel.lock() blocks to atomics if all statements use inplace
operators, but that might actually not be safe as the exclusion might
be used for the rhs expressions. So I think you'd want a
parallel.atomic() directive or some such.
Alternatively, if you support parallel.shared(), you could specify
that inplace operators on any such variables would actually be atomic
updates, even if you use the operators on the elements of the shared
variable. e.g.

cdef int array1[N]
cdef int array2[N]
with parallel(), shared(array1):
    # atomic update
    array1[i] += ...

    # not an atomic update, as it is "implicitly shared"
    array2[i] += ...

I'm not sure if that's more confusing than enlightening though.

>> Indeed, supporting creation of locks dynamically and allowing them to
>> be passed around arbitrarily would be hard (and likely not worth the
>> effort). Naming them is trivial though, which might not be incredibly
>> pythonic but is very convenient, easy and readable.
>
> You can view this as a lookup by name, not a lock creation. Not
> allowing them to be used outside of a with clause is a reasonable
> restriction, and does not preclude a (possibly very distant) extension
> to being able to pass them around.
>
>>> barrier, if supported, should be a function call not a
>>> context. Not as critical as with the tasks case, but a good example to
>>> see how it flows would be useful here as well.
>>
>> I agree, it really doesn't have any associated code and trying to
>> associate code with it is likely more confusing than meaningful. It
>> was just an idea.
>> Often you can rely on implicit barriers from e.g. prange, but not
>> always. I can't think of any real-world example, but you usually need
>> it to ensure that everyone gets a sane view on some shared data, e.g.
>>
>> with nogil, parallel():
>> ? ?array[threadid()] = func(threadid())
>> ? ?barrier()
>> ? ?use array[threadid() + 1 % omp_num_threads()] # access data of
>> some neighbour
>>
>> This is a rather contrived example, but (see below) it would be
>> especially useful if you use single/master/once/first that sets some
>> shared data everyone will operate on (for instance in a prange). To
>> ensure the data is sane before you use it, you have to put the barrier
>> to 1) ensure the data has been written and 2) that the data has been
>> flushed.
>>
>> Basically, you'll always know when you need a barrier, but it's pretty
>> hard to come up with a real-world example for it when you have to :)
>
> Yes, I think barriers are explanatory enough.
>
>>> As for single, I see doing this manually does require boilerplate
>>> locking, so what about
>>>
>>> if cython.parallel.once(): ?# will return True once for a tread group.
>>> ? ?...
>>>
>>> we could implement this via our own locking/checking/flushing to allow
>>> it to occur in arbitrary expressions, e.g.
>>>
>>> special_worker = cython.parallel.once()
>>> if special_worker:
>>> ? ...
>>> [common code]
>>> if special_worker: ? # single wouldn't work here
>>> ? ...
>>>
>>
>> That looks OK. I've actually been thinking that if we have barriers we
>> don't really need is_master(), once() or single() or anything. We
>> already have threadid() and you usually don't care what thread gets
>> there first, you only care about doing it once. So one could just
>> write
>>
>> if parallel.threadid() == 0:
>> ? ?...
>>
>> parallel.barrier() # if required
>
> Perhaps you want the first free thread to take it up to minimize idle
> threads. I agree if parallel.threadid() == 0 is a synonym for
> is_master(), so probably not needed. However, what are the OpenMP
> semantics of
>
> cdef f():
> ? ?with parallel():
> ? ? ? ?g()
> ? ? ? ?g()
>
> cdef g():
> ? ?with single():
> ? ? ? ?... # executed once, right?
> ? ?with task:
> ? ? ? ?... # executed twice, right?
>
>> It might also be convenient to declare variables explicitly shared
>> here, e.g. this code will not work:
>>
>> cdef int *buf
>>
>> with nogil, parallel.parallel():
>> ? ?if parallel.threadid() == 0:
>> ? ? ? ?buf = ...
>>
>> ? ?parallel.barrier()
>>
>> ? ?# will will likely segfault, as buf is private because we assigned
>> to it. It's only valid in thread 0
>> ? ?use buf[...]
>>
>> So basically you'd have to do something like (&buf)[0][...], which
>> frankly looks pretty weird. However I do think such cases are rather
>> uncommon.
>
> True. Perhaps this could be declared via "with nogil,
> parallel.parallel(), parallel.shared(buf)" or something like that.
>
> - Robert
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From d.s.seljebotn at astro.uio.no  Thu Oct 20 10:42:15 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Thu, 20 Oct 2011 10:42:15 +0200
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EXQz6aw4dCDkefikiTHwSUdXV41ngY15nrMJ6g6Kk3Kqg@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>	<4E919100.8020801@astro.uio.no>
	<4E919A40.2090001@astro.uio.no>	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>	<4E955180.1070601@astro.uio.no>	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>	<CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>	<CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>	<CADiQ+QC+8Rz2LyjM-BT8jp3kDba5JX0u+rV3xwbMmcOneCa4rA@mail.gmail.com>
	<CANg26EXQz6aw4dCDkefikiTHwSUdXV41ngY15nrMJ6g6Kk3Kqg@mail.gmail.com>
Message-ID: <4E9FDEE7.2070301@astro.uio.no>

Meta: I've been meaning to respond to this thread, but can't find the 
time. What's the time-frame for implementing this? If it's hypothetical 
at the moment and just is a question of getting things spec-ed, one 
could perhaps look at discussing it at the next Cython workshop, or 
perhaps a Skype call with the three of us as some point...

Regarding the tasks: One of my biggest problems with Python is the lack 
of an elegant syntax for anonymous functions. But since Python has that 
problem, I feel it is not necesarrily something we should fix (by using 
the with statements to create tasks). Sometimes Pythonic-ness is more 
important than elegance (for Cython).

In general I'm happy as long as there's a chance of getting things to 
work in pure Python mode as well (with serial execution). So if, e.g., 
with statements creating tasks have the same effect when running the 
same code (serially) in pure Python, I'm less opposed (didn't look at it 
in detail).

Dag Sverre

On 10/19/2011 09:53 PM, mark florisson wrote:
> On 19 October 2011 06:01, Robert Bradshaw<robertwb at math.washington.edu>  wrote:
>> On Fri, Oct 14, 2011 at 1:07 PM, mark florisson
>> <markflorisson88 at gmail.com>  wrote:
>>> On 14 October 2011 19:31, Robert Bradshaw<robertwb at math.washington.edu>  wrote:
>>>> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson
>>>> <markflorisson88 at gmail.com>  wrote:
>>>>>>> I ultimately feel things like that is more important than 100% coverage of
>>>>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.
>>>>>>
>>>>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a
>>>>>> task/futures model based on closures would I think flesh this out to
>>>>>> the next level of generality (and complexity).
>>>>>
>>>>> Futures are definitely nice. I suppose I think really like "inline
>>>>> futures", i.e. openmp tasks. I realize that futures may look more
>>>>> pythonic. However, as mentioned previously, I also see issues with
>>>>> that. When you submit a task then you expect a future object, which
>>>>> you might want to pass around. But we don't have the GIL for that. I
>>>>> personally feel that futures is something that should be done by a
>>>>> library (such as concurrent.futures in python 3.2), and inline tasks
>>>>> by a language. It also means I have to write an entire function or
>>>>> closure for perhaps only a few lines of code.
>>>>>
>>>>> I might also want to submit other functions that are not closures, or
>>>>> I might want to reuse my closures that are used for tasks and for
>>>>> something else. So what if my tasks contain more parallel constructs?
>>>>> e.g. what if I have a task closure that I return from my function that
>>>>> generates more tasks itself? Would you just execute them sequentially
>>>>> outside of the parallel construct, or would you simply disallow that?
>>>>> Also, do you restrict future "objects" to only the parallel section?
>>>>>
>>>>> Another problem is that you can only wait on tasks of your direct
>>>>> children. So what if I get access to my parent's future object
>>>>> (assuming you allow tasks to generate tasks), and then want the result
>>>>> of my parent?
>>>>> Or what if I store these future objects in an array or list and access
>>>>> them arbitrarily? You will only know at runtime which task to wait on,
>>>>> and openmp only has a static, lexical taskwait.
>>>>>
>>>>> I suppose my point is that without either a drastic rewrite (e.g., use
>>>>> pthreads instead of openmp) or quite a bit of contraints, I am unsure
>>>>> how futures would work here. Perhaps you guys have some concrete
>>>>> syntax and semantics proposals?
>>>>
>>>> It feels to me that OpenMP tasks took a different model of parallelism
>>>> and tried to force them into the OpenMP model/constraints, and so it'd
>>>> be even more difficult to fit them into a nice pythonic interface.
>>>> Perhaps to make progress on this front we need to have a concrete
>>>> example to look at. I'm also wondering if the standard threading
>>>> module (perhaps with overlay support) used with nogil functions would
>>>> be sufficient--locking is required for handling the queues, etc. so
>>>> the fact that the GIL is involved is not a big deal. It is possible
>>>> that this won't scale to as small of work units, but the overhead
>>>> should be minimal once your work unit is a sufficient size (which is
>>>> probably quite small) and it's already implemented and well
>>>> documented/used.
>>>
>>> It's all definitely possible with normal threads, but the thing you
>>> lose is convenience and conciseness. For big problems the programmer
>>> might sum up the courage and effort to implement it, but typically you
>>> will just stick to a serial version. This is really where OpenMP is
>>> powerful, you can take a simple sequential piece of code and make it
>>> parallel with minimal effort and without having to restructure,
>>> rethink and rewrite your algorithms.
>>
>> That is a very good point.
>>
>>> Something like concurrent.futures is definitely nice, but most people
>>> cannot afford to mandate python 3.2 for their users.
>>>
>>> The most classical examples I can think of for tasks are
>>>
>>> 1) independent code sections, i.e. two or more pieces of code that
>>> don't depend on each other which you want to execute in parallel
>>> 2) traversal of some kind of custom data structure, like a tree or a linked list
>>> 3) some kind of other producer/consumer model
>>>
>>> e.g. using with task syntax:
>>>
>>> cdef postorder_traverse(tree *t): # bullet 1) and 2)
>>>     with task:
>>>         traverse(t.left)
>>>     with task:
>>>         traverse(t.right)
>>>
>>>     taskwait() # wait until we traversed our subtrees
>>>     use(t.data)
>>
>> Is there an implicit parallel block here? Perhaps in the caller?
>>
>>> cdef list_traverse(linkedlist *L): # bullet 2)
>>>     with nogil, parallel():
>>>         if threadid() == 0:
>>>             while L.next:
>>>                 with task:
>>>                     do_something(L.data)
>>>
>>> In the latter case we don't need a taskwait as we don't care about any
>>> particular order. Only one thread generates the tasks where the others
>>> just hit the barrier and see the tasks they can execute.
>>
>> I guess it's the fact that Python doesn't have a nice syntax for
>> anonymous functions or blocks does make this syntax more appealing
>> than an explicit closure.
>>
>> Perhaps if we came up with a more pythonic/natural name which would
>> make the intent clear. Makes me want to do something like
>>
>> pool = ThreadPool(10)
>> for item in L:
>>     with pool:
>>         process(item)
>>
>> but then you get into issues of passing the pool around. OpenMP has
>> the implicit pool of the nesting parallel block, so "with one thread"
>> or "with cython.parallel.pool" or something like that might be more
>> readable.
>>
>>> The good thing is that the OpenMP runtime can decide at task
>>> generation point (not only at taskwait or barrier points!) decide to
>>> stop generating more tasks and start executing them. So you won't
>>> exhaust memory if you might have lots of tasks.
>>
>> Often threadpools have queues that block when their buffer gets full
>> to achieve the same goal.
>>
>>>> As for critical and barrier, the notion of a critical block as a with
>>>> statement is very useful. Creating/naming locks (rather than being
>>>> implicit on the file/line number) is more powerful, but is a larger
>>>> burden on the user and more difficult to support with the OpenMP
>>>> backend.
>>>
>>> Actually, as I mentioned before, critical sections do not at all
>>> depend on their line or file number. All they depend on their implicit
>>> or explicit name (the name is implicit when you simply omit it, so all
>>> unnamed critical sections exclude each other).
>>
>> Ah, yes. In this case "with cython.parallel.lock([optional name])"
>> could be obvious enough.
>
> We could also support atomic updates. We could either rewrite
> parallel.lock() blocks to atomics if all statements use inplace
> operators, but that might actually not be safe as the exclusion might
> be used for the rhs expressions. So I think you'd want a
> parallel.atomic() directive or some such.
> Alternatively, if you support parallel.shared(), you could specify
> that inplace operators on any such variables would actually be atomic
> updates, even if you use the operators on the elements of the shared
> variable. e.g.
>
> cdef int array1[N]
> cdef int array2[N]
> with parallel(), shared(array1):
>      # atomic update
>      array1[i] += ...
>
>      # not an atomic update, as it is "implicitly shared"
>      array2[i] += ...
>
> I'm not sure if that's more confusing than enlightening though.
>
>>> Indeed, supporting creation of locks dynamically and allowing them to
>>> be passed around arbitrarily would be hard (and likely not worth the
>>> effort). Naming them is trivial though, which might not be incredibly
>>> pythonic but is very convenient, easy and readable.
>>
>> You can view this as a lookup by name, not a lock creation. Not
>> allowing them to be used outside of a with clause is a reasonable
>> restriction, and does not preclude a (possibly very distant) extension
>> to being able to pass them around.
>>
>>>> barrier, if supported, should be a function call not a
>>>> context. Not as critical as with the tasks case, but a good example to
>>>> see how it flows would be useful here as well.
>>>
>>> I agree, it really doesn't have any associated code and trying to
>>> associate code with it is likely more confusing than meaningful. It
>>> was just an idea.
>>> Often you can rely on implicit barriers from e.g. prange, but not
>>> always. I can't think of any real-world example, but you usually need
>>> it to ensure that everyone gets a sane view on some shared data, e.g.
>>>
>>> with nogil, parallel():
>>>     array[threadid()] = func(threadid())
>>>     barrier()
>>>     use array[threadid() + 1 % omp_num_threads()] # access data of
>>> some neighbour
>>>
>>> This is a rather contrived example, but (see below) it would be
>>> especially useful if you use single/master/once/first that sets some
>>> shared data everyone will operate on (for instance in a prange). To
>>> ensure the data is sane before you use it, you have to put the barrier
>>> to 1) ensure the data has been written and 2) that the data has been
>>> flushed.
>>>
>>> Basically, you'll always know when you need a barrier, but it's pretty
>>> hard to come up with a real-world example for it when you have to :)
>>
>> Yes, I think barriers are explanatory enough.
>>
>>>> As for single, I see doing this manually does require boilerplate
>>>> locking, so what about
>>>>
>>>> if cython.parallel.once():  # will return True once for a tread group.
>>>>     ...
>>>>
>>>> we could implement this via our own locking/checking/flushing to allow
>>>> it to occur in arbitrary expressions, e.g.
>>>>
>>>> special_worker = cython.parallel.once()
>>>> if special_worker:
>>>>    ...
>>>> [common code]
>>>> if special_worker:   # single wouldn't work here
>>>>    ...
>>>>
>>>
>>> That looks OK. I've actually been thinking that if we have barriers we
>>> don't really need is_master(), once() or single() or anything. We
>>> already have threadid() and you usually don't care what thread gets
>>> there first, you only care about doing it once. So one could just
>>> write
>>>
>>> if parallel.threadid() == 0:
>>>     ...
>>>
>>> parallel.barrier() # if required
>>
>> Perhaps you want the first free thread to take it up to minimize idle
>> threads. I agree if parallel.threadid() == 0 is a synonym for
>> is_master(), so probably not needed. However, what are the OpenMP
>> semantics of
>>
>> cdef f():
>>     with parallel():
>>         g()
>>         g()
>>
>> cdef g():
>>     with single():
>>         ... # executed once, right?
>>     with task:
>>         ... # executed twice, right?
>>
>>> It might also be convenient to declare variables explicitly shared
>>> here, e.g. this code will not work:
>>>
>>> cdef int *buf
>>>
>>> with nogil, parallel.parallel():
>>>     if parallel.threadid() == 0:
>>>         buf = ...
>>>
>>>     parallel.barrier()
>>>
>>>     # will will likely segfault, as buf is private because we assigned
>>> to it. It's only valid in thread 0
>>>     use buf[...]
>>>
>>> So basically you'd have to do something like (&buf)[0][...], which
>>> frankly looks pretty weird. However I do think such cases are rather
>>> uncommon.
>>
>> True. Perhaps this could be declared via "with nogil,
>> parallel.parallel(), parallel.shared(buf)" or something like that.
>>
>> - Robert
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel


From markflorisson88 at gmail.com  Thu Oct 20 11:13:49 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 20 Oct 2011 10:13:49 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <4E9FDEE7.2070301@astro.uio.no>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>
	<CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>
	<CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>
	<CADiQ+QC+8Rz2LyjM-BT8jp3kDba5JX0u+rV3xwbMmcOneCa4rA@mail.gmail.com>
	<CANg26EXQz6aw4dCDkefikiTHwSUdXV41ngY15nrMJ6g6Kk3Kqg@mail.gmail.com>
	<4E9FDEE7.2070301@astro.uio.no>
Message-ID: <CANg26EWDzYh2c04D96mj7bcsf+VEKk61kYg6T=q0a36utGFVZA@mail.gmail.com>

On 20 October 2011 09:42, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> Meta: I've been meaning to respond to this thread, but can't find the time.
> What's the time-frame for implementing this? If it's hypothetical at the
> moment and just is a question of getting things spec-ed, one could perhaps
> look at discussing it at the next Cython workshop, or perhaps a Skype call
> with the three of us as some point...

For me this is just about getting this spec-ed, so that when someone
finds the time, we don't need to discuss it for weeks first. And the
implementor won't necessarily have to support everything at once, e.g.
just critical sections or barriers alone would be nice.

Is there any plan for a new workshop then? Because if it's in two
years I think we could be more time-efficient :)

> Regarding the tasks: One of my biggest problems with Python is the lack of
> an elegant syntax for anonymous functions. But since Python has that
> problem, I feel it is not necesarrily something we should fix (by using the
> with statements to create tasks). Sometimes Pythonic-ness is more important
> than elegance (for Cython).

I agree it's not something we should fix, I just think tasks are most
useful in inline blocks and not in separate functions or closures.
Although it could certainly work, I think it restricts more, leads to
more verbose code and possibly questionable semantics, and on top of
that it would be a pain to implement (although that should not be used
as a persuasive argument). I'm not saying there is no elegant way
other than with blocks, I'm just saying that I think closures are not
the right thing for it.

> In general I'm happy as long as there's a chance of getting things to work
> in pure Python mode as well (with serial execution). So if, e.g., with
> statements creating tasks have the same effect when running the same code
> (serially) in pure Python, I'm less opposed (didn't look at it in detail).

Yes, it would have the same effect. The thing with tasks (and OpenMP
constructs in general) is that usually if your compiler ignores all
your pragmas, your code just runs serially in the same way. The same
would be true for the tasks in with blocks.

> Dag Sverre
>
> On 10/19/2011 09:53 PM, mark florisson wrote:
>>
>> On 19 October 2011 06:01, Robert Bradshaw<robertwb at math.washington.edu>
>> ?wrote:
>>>
>>> On Fri, Oct 14, 2011 at 1:07 PM, mark florisson
>>> <markflorisson88 at gmail.com> ?wrote:
>>>>
>>>> On 14 October 2011 19:31, Robert Bradshaw<robertwb at math.washington.edu>
>>>> ?wrote:
>>>>>
>>>>> On Wed, Oct 12, 2011 at 7:55 AM, mark florisson
>>>>> <markflorisson88 at gmail.com> ?wrote:
>>>>>>>>
>>>>>>>> I ultimately feel things like that is more important than 100%
>>>>>>>> coverage of
>>>>>>>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.
>>>>>>>
>>>>>>> +1 Prange handles the (corse-grained) SIMD case nicely, and a
>>>>>>> task/futures model based on closures would I think flesh this out to
>>>>>>> the next level of generality (and complexity).
>>>>>>
>>>>>> Futures are definitely nice. I suppose I think really like "inline
>>>>>> futures", i.e. openmp tasks. I realize that futures may look more
>>>>>> pythonic. However, as mentioned previously, I also see issues with
>>>>>> that. When you submit a task then you expect a future object, which
>>>>>> you might want to pass around. But we don't have the GIL for that. I
>>>>>> personally feel that futures is something that should be done by a
>>>>>> library (such as concurrent.futures in python 3.2), and inline tasks
>>>>>> by a language. It also means I have to write an entire function or
>>>>>> closure for perhaps only a few lines of code.
>>>>>>
>>>>>> I might also want to submit other functions that are not closures, or
>>>>>> I might want to reuse my closures that are used for tasks and for
>>>>>> something else. So what if my tasks contain more parallel constructs?
>>>>>> e.g. what if I have a task closure that I return from my function that
>>>>>> generates more tasks itself? Would you just execute them sequentially
>>>>>> outside of the parallel construct, or would you simply disallow that?
>>>>>> Also, do you restrict future "objects" to only the parallel section?
>>>>>>
>>>>>> Another problem is that you can only wait on tasks of your direct
>>>>>> children. So what if I get access to my parent's future object
>>>>>> (assuming you allow tasks to generate tasks), and then want the result
>>>>>> of my parent?
>>>>>> Or what if I store these future objects in an array or list and access
>>>>>> them arbitrarily? You will only know at runtime which task to wait on,
>>>>>> and openmp only has a static, lexical taskwait.
>>>>>>
>>>>>> I suppose my point is that without either a drastic rewrite (e.g., use
>>>>>> pthreads instead of openmp) or quite a bit of contraints, I am unsure
>>>>>> how futures would work here. Perhaps you guys have some concrete
>>>>>> syntax and semantics proposals?
>>>>>
>>>>> It feels to me that OpenMP tasks took a different model of parallelism
>>>>> and tried to force them into the OpenMP model/constraints, and so it'd
>>>>> be even more difficult to fit them into a nice pythonic interface.
>>>>> Perhaps to make progress on this front we need to have a concrete
>>>>> example to look at. I'm also wondering if the standard threading
>>>>> module (perhaps with overlay support) used with nogil functions would
>>>>> be sufficient--locking is required for handling the queues, etc. so
>>>>> the fact that the GIL is involved is not a big deal. It is possible
>>>>> that this won't scale to as small of work units, but the overhead
>>>>> should be minimal once your work unit is a sufficient size (which is
>>>>> probably quite small) and it's already implemented and well
>>>>> documented/used.
>>>>
>>>> It's all definitely possible with normal threads, but the thing you
>>>> lose is convenience and conciseness. For big problems the programmer
>>>> might sum up the courage and effort to implement it, but typically you
>>>> will just stick to a serial version. This is really where OpenMP is
>>>> powerful, you can take a simple sequential piece of code and make it
>>>> parallel with minimal effort and without having to restructure,
>>>> rethink and rewrite your algorithms.
>>>
>>> That is a very good point.
>>>
>>>> Something like concurrent.futures is definitely nice, but most people
>>>> cannot afford to mandate python 3.2 for their users.
>>>>
>>>> The most classical examples I can think of for tasks are
>>>>
>>>> 1) independent code sections, i.e. two or more pieces of code that
>>>> don't depend on each other which you want to execute in parallel
>>>> 2) traversal of some kind of custom data structure, like a tree or a
>>>> linked list
>>>> 3) some kind of other producer/consumer model
>>>>
>>>> e.g. using with task syntax:
>>>>
>>>> cdef postorder_traverse(tree *t): # bullet 1) and 2)
>>>> ? ?with task:
>>>> ? ? ? ?traverse(t.left)
>>>> ? ?with task:
>>>> ? ? ? ?traverse(t.right)
>>>>
>>>> ? ?taskwait() # wait until we traversed our subtrees
>>>> ? ?use(t.data)
>>>
>>> Is there an implicit parallel block here? Perhaps in the caller?
>>>
>>>> cdef list_traverse(linkedlist *L): # bullet 2)
>>>> ? ?with nogil, parallel():
>>>> ? ? ? ?if threadid() == 0:
>>>> ? ? ? ? ? ?while L.next:
>>>> ? ? ? ? ? ? ? ?with task:
>>>> ? ? ? ? ? ? ? ? ? ?do_something(L.data)
>>>>
>>>> In the latter case we don't need a taskwait as we don't care about any
>>>> particular order. Only one thread generates the tasks where the others
>>>> just hit the barrier and see the tasks they can execute.
>>>
>>> I guess it's the fact that Python doesn't have a nice syntax for
>>> anonymous functions or blocks does make this syntax more appealing
>>> than an explicit closure.
>>>
>>> Perhaps if we came up with a more pythonic/natural name which would
>>> make the intent clear. Makes me want to do something like
>>>
>>> pool = ThreadPool(10)
>>> for item in L:
>>> ? ?with pool:
>>> ? ? ? ?process(item)
>>>
>>> but then you get into issues of passing the pool around. OpenMP has
>>> the implicit pool of the nesting parallel block, so "with one thread"
>>> or "with cython.parallel.pool" or something like that might be more
>>> readable.
>>>
>>>> The good thing is that the OpenMP runtime can decide at task
>>>> generation point (not only at taskwait or barrier points!) decide to
>>>> stop generating more tasks and start executing them. So you won't
>>>> exhaust memory if you might have lots of tasks.
>>>
>>> Often threadpools have queues that block when their buffer gets full
>>> to achieve the same goal.
>>>
>>>>> As for critical and barrier, the notion of a critical block as a with
>>>>> statement is very useful. Creating/naming locks (rather than being
>>>>> implicit on the file/line number) is more powerful, but is a larger
>>>>> burden on the user and more difficult to support with the OpenMP
>>>>> backend.
>>>>
>>>> Actually, as I mentioned before, critical sections do not at all
>>>> depend on their line or file number. All they depend on their implicit
>>>> or explicit name (the name is implicit when you simply omit it, so all
>>>> unnamed critical sections exclude each other).
>>>
>>> Ah, yes. In this case "with cython.parallel.lock([optional name])"
>>> could be obvious enough.
>>
>> We could also support atomic updates. We could either rewrite
>> parallel.lock() blocks to atomics if all statements use inplace
>> operators, but that might actually not be safe as the exclusion might
>> be used for the rhs expressions. So I think you'd want a
>> parallel.atomic() directive or some such.
>> Alternatively, if you support parallel.shared(), you could specify
>> that inplace operators on any such variables would actually be atomic
>> updates, even if you use the operators on the elements of the shared
>> variable. e.g.
>>
>> cdef int array1[N]
>> cdef int array2[N]
>> with parallel(), shared(array1):
>> ? ? # atomic update
>> ? ? array1[i] += ...
>>
>> ? ? # not an atomic update, as it is "implicitly shared"
>> ? ? array2[i] += ...
>>
>> I'm not sure if that's more confusing than enlightening though.
>>
>>>> Indeed, supporting creation of locks dynamically and allowing them to
>>>> be passed around arbitrarily would be hard (and likely not worth the
>>>> effort). Naming them is trivial though, which might not be incredibly
>>>> pythonic but is very convenient, easy and readable.
>>>
>>> You can view this as a lookup by name, not a lock creation. Not
>>> allowing them to be used outside of a with clause is a reasonable
>>> restriction, and does not preclude a (possibly very distant) extension
>>> to being able to pass them around.
>>>
>>>>> barrier, if supported, should be a function call not a
>>>>> context. Not as critical as with the tasks case, but a good example to
>>>>> see how it flows would be useful here as well.
>>>>
>>>> I agree, it really doesn't have any associated code and trying to
>>>> associate code with it is likely more confusing than meaningful. It
>>>> was just an idea.
>>>> Often you can rely on implicit barriers from e.g. prange, but not
>>>> always. I can't think of any real-world example, but you usually need
>>>> it to ensure that everyone gets a sane view on some shared data, e.g.
>>>>
>>>> with nogil, parallel():
>>>> ? ?array[threadid()] = func(threadid())
>>>> ? ?barrier()
>>>> ? ?use array[threadid() + 1 % omp_num_threads()] # access data of
>>>> some neighbour
>>>>
>>>> This is a rather contrived example, but (see below) it would be
>>>> especially useful if you use single/master/once/first that sets some
>>>> shared data everyone will operate on (for instance in a prange). To
>>>> ensure the data is sane before you use it, you have to put the barrier
>>>> to 1) ensure the data has been written and 2) that the data has been
>>>> flushed.
>>>>
>>>> Basically, you'll always know when you need a barrier, but it's pretty
>>>> hard to come up with a real-world example for it when you have to :)
>>>
>>> Yes, I think barriers are explanatory enough.
>>>
>>>>> As for single, I see doing this manually does require boilerplate
>>>>> locking, so what about
>>>>>
>>>>> if cython.parallel.once(): ?# will return True once for a tread group.
>>>>> ? ?...
>>>>>
>>>>> we could implement this via our own locking/checking/flushing to allow
>>>>> it to occur in arbitrary expressions, e.g.
>>>>>
>>>>> special_worker = cython.parallel.once()
>>>>> if special_worker:
>>>>> ? ...
>>>>> [common code]
>>>>> if special_worker: ? # single wouldn't work here
>>>>> ? ...
>>>>>
>>>>
>>>> That looks OK. I've actually been thinking that if we have barriers we
>>>> don't really need is_master(), once() or single() or anything. We
>>>> already have threadid() and you usually don't care what thread gets
>>>> there first, you only care about doing it once. So one could just
>>>> write
>>>>
>>>> if parallel.threadid() == 0:
>>>> ? ?...
>>>>
>>>> parallel.barrier() # if required
>>>
>>> Perhaps you want the first free thread to take it up to minimize idle
>>> threads. I agree if parallel.threadid() == 0 is a synonym for
>>> is_master(), so probably not needed. However, what are the OpenMP
>>> semantics of
>>>
>>> cdef f():
>>> ? ?with parallel():
>>> ? ? ? ?g()
>>> ? ? ? ?g()
>>>
>>> cdef g():
>>> ? ?with single():
>>> ? ? ? ?... # executed once, right?
>>> ? ?with task:
>>> ? ? ? ?... # executed twice, right?
>>>
>>>> It might also be convenient to declare variables explicitly shared
>>>> here, e.g. this code will not work:
>>>>
>>>> cdef int *buf
>>>>
>>>> with nogil, parallel.parallel():
>>>> ? ?if parallel.threadid() == 0:
>>>> ? ? ? ?buf = ...
>>>>
>>>> ? ?parallel.barrier()
>>>>
>>>> ? ?# will will likely segfault, as buf is private because we assigned
>>>> to it. It's only valid in thread 0
>>>> ? ?use buf[...]
>>>>
>>>> So basically you'd have to do something like (&buf)[0][...], which
>>>> frankly looks pretty weird. However I do think such cases are rather
>>>> uncommon.
>>>
>>> True. Perhaps this could be declared via "with nogil,
>>> parallel.parallel(), parallel.shared(buf)" or something like that.
>>>
>>> - Robert
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From d.s.seljebotn at astro.uio.no  Thu Oct 20 11:35:50 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Thu, 20 Oct 2011 11:35:50 +0200
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EWDzYh2c04D96mj7bcsf+VEKk61kYg6T=q0a36utGFVZA@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>	<4E919100.8020801@astro.uio.no>
	<4E919A40.2090001@astro.uio.no>	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>	<4E955180.1070601@astro.uio.no>	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>	<CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>	<CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>	<CADiQ+QC+8Rz2LyjM-BT8jp3kDba5JX0u+rV3xwbMmcOneCa4rA@mail.gmail.com>	<CANg26EXQz6aw4dCDkefikiTHwSUdXV41ngY15nrMJ6g6Kk3Kqg@mail.gmail.com>	<4E9FDEE7.2070301@astro.uio.no>
	<CANg26EWDzYh2c04D96mj7bcsf+VEKk61kYg6T=q0a36utGFVZA@mail.gmail.com>
Message-ID: <4E9FEB76.9040208@astro.uio.no>

On 10/20/2011 11:13 AM, mark florisson wrote:
> On 20 October 2011 09:42, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no>  wrote:
>> Meta: I've been meaning to respond to this thread, but can't find the time.
>> What's the time-frame for implementing this? If it's hypothetical at the
>> moment and just is a question of getting things spec-ed, one could perhaps
>> look at discussing it at the next Cython workshop, or perhaps a Skype call
>> with the three of us as some point...
>
> For me this is just about getting this spec-ed, so that when someone
> finds the time, we don't need to discuss it for weeks first. And the
> implementor won't necessarily have to support everything at once, e.g.
> just critical sections or barriers alone would be nice.
>
> Is there any plan for a new workshop then? Because if it's in two
> years I think we could be more time-efficient :)

At least in William's grant there's plans for 2-3 Cython workshops, so 
hopefully there's funding for one next year if we want to. We should ask 
him before planning anything though.

>> Regarding the tasks: One of my biggest problems with Python is the lack of
>> an elegant syntax for anonymous functions. But since Python has that
>> problem, I feel it is not necesarrily something we should fix (by using the
>> with statements to create tasks). Sometimes Pythonic-ness is more important
>> than elegance (for Cython).
>
> I agree it's not something we should fix, I just think tasks are most
> useful in inline blocks and not in separate functions or closures.
> Although it could certainly work, I think it restricts more, leads to
> more verbose code and possibly questionable semantics, and on top of
> that it would be a pain to implement (although that should not be used
> as a persuasive argument). I'm not saying there is no elegant way
> other than with blocks, I'm just saying that I think closures are not
> the right thing for it.
>
>> In general I'm happy as long as there's a chance of getting things to work
>> in pure Python mode as well (with serial execution). So if, e.g., with
>> statements creating tasks have the same effect when running the same code
>> (serially) in pure Python, I'm less opposed (didn't look at it in detail).
>
> Yes, it would have the same effect. The thing with tasks (and OpenMP
> constructs in general) is that usually if your compiler ignores all
> your pragmas, your code just runs serially in the same way. The same
> would be true for the tasks in with blocks.

Short note: I like the vision of Konrad Hinsen:

http://www.euroscipy.org/talk/2011

The core idea is that the "task-ness" of a block of code is orthogonal 
to the place you actually write it. That is, a block of code may often 
either be fit for execution as a task, or not, depending on how heavy it 
is (= values of arguments it takes in, not its contents).

He introduces the "async" expression to drive this point through.

I think "with task" is fine if used in this way, if you simply call a 
function (which itself doesn't know whether it is a task or not). But 
once you start to implement an entire function within the with-statement 
there's a code-smell.

Anyway, it's growing on me. But I think his "async" expression is more 
Pythonic in the way that it forces you away from making your code smell.

We could simply have

async(func)(arg, arg2, somekwarg=4)


(He also says "functional-style programming is better for parallization 
than threads+locks", which I can kind of agree with but nobody tried to 
make an efficient immutable array implementation suitable for numerical 
computation yet to my knowledge... that's an interesting MSc or 
PhD-topic, but I already have one :-) )

(Look at me, going along discussing when I really shouldn't -- see you 
later.)

Dag Sverre

From markflorisson88 at gmail.com  Thu Oct 20 14:51:19 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 20 Oct 2011 13:51:19 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <4E9FEB76.9040208@astro.uio.no>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>
	<CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>
	<CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>
	<CADiQ+QC+8Rz2LyjM-BT8jp3kDba5JX0u+rV3xwbMmcOneCa4rA@mail.gmail.com>
	<CANg26EXQz6aw4dCDkefikiTHwSUdXV41ngY15nrMJ6g6Kk3Kqg@mail.gmail.com>
	<4E9FDEE7.2070301@astro.uio.no>
	<CANg26EWDzYh2c04D96mj7bcsf+VEKk61kYg6T=q0a36utGFVZA@mail.gmail.com>
	<4E9FEB76.9040208@astro.uio.no>
Message-ID: <CANg26EWtroB2gcoQu0E=8nUv3p_QpvR82zBP80HWMGsJ70kEWw@mail.gmail.com>

On 20 October 2011 10:35, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/20/2011 11:13 AM, mark florisson wrote:
>>
>> On 20 October 2011 09:42, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> ?wrote:
>>>
>>> Meta: I've been meaning to respond to this thread, but can't find the
>>> time.
>>> What's the time-frame for implementing this? If it's hypothetical at the
>>> moment and just is a question of getting things spec-ed, one could
>>> perhaps
>>> look at discussing it at the next Cython workshop, or perhaps a Skype
>>> call
>>> with the three of us as some point...
>>
>> For me this is just about getting this spec-ed, so that when someone
>> finds the time, we don't need to discuss it for weeks first. And the
>> implementor won't necessarily have to support everything at once, e.g.
>> just critical sections or barriers alone would be nice.
>>
>> Is there any plan for a new workshop then? Because if it's in two
>> years I think we could be more time-efficient :)
>
> At least in William's grant there's plans for 2-3 Cython workshops, so
> hopefully there's funding for one next year if we want to. We should ask him
> before planning anything though.
>
>>> Regarding the tasks: One of my biggest problems with Python is the lack
>>> of
>>> an elegant syntax for anonymous functions. But since Python has that
>>> problem, I feel it is not necesarrily something we should fix (by using
>>> the
>>> with statements to create tasks). Sometimes Pythonic-ness is more
>>> important
>>> than elegance (for Cython).
>>
>> I agree it's not something we should fix, I just think tasks are most
>> useful in inline blocks and not in separate functions or closures.
>> Although it could certainly work, I think it restricts more, leads to
>> more verbose code and possibly questionable semantics, and on top of
>> that it would be a pain to implement (although that should not be used
>> as a persuasive argument). I'm not saying there is no elegant way
>> other than with blocks, I'm just saying that I think closures are not
>> the right thing for it.
>>
>>> In general I'm happy as long as there's a chance of getting things to
>>> work
>>> in pure Python mode as well (with serial execution). So if, e.g., with
>>> statements creating tasks have the same effect when running the same code
>>> (serially) in pure Python, I'm less opposed (didn't look at it in
>>> detail).
>>
>> Yes, it would have the same effect. The thing with tasks (and OpenMP
>> constructs in general) is that usually if your compiler ignores all
>> your pragmas, your code just runs serially in the same way. The same
>> would be true for the tasks in with blocks.
>
> Short note: I like the vision of Konrad Hinsen:
>
> http://www.euroscipy.org/talk/2011
>
> The core idea is that the "task-ness" of a block of code is orthogonal to
> the place you actually write it. That is, a block of code may often either
> be fit for execution as a task, or not, depending on how heavy it is (=
> values of arguments it takes in, not its contents).
>
> He introduces the "async" expression to drive this point through.
>
> I think "with task" is fine if used in this way, if you simply call a
> function (which itself doesn't know whether it is a task or not). But once
> you start to implement an entire function within the with-statement there's
> a code-smell.

Definitely, do you'd just call the function from the task.

> Anyway, it's growing on me. But I think his "async" expression is more
> Pythonic in the way that it forces you away from making your code smell.
>
> We could simply have
>
> async(func)(arg, arg2, somekwarg=4)
>

That looks good. The question is, does this constitute an expression
or a statement? If it's an expression, then you expect a meaningful
return value, which means you're going to have to wait for the task to
complete. That would be fine if you submit multiple tasks in one
expression, from the slides:

    max(async expr1, async expr2)

or even

    [async expr for ... in ...]

I must say, it does look really elegant and it doesn't leave the user
to question when the task is executed (and if you need a taskwait
directive to wait for your variables to become defined). What I don't
see is how to do the producer consumer trick, unless you regard using
the result of async as a taskwait, and not using it as not having a
taskwait, e.g.

async func(...) # generate a task and don't wait for it
result = async func(...) # generate a task and wait for it.

The latter is not useful unless you have multiple expressions in one
statement, so we should also allow result1, result2 = async
func(data=a), async func(data=b).

I think you would need special support for the expression form to work
in multiple places, e.g. as a start you could allow it as function
arguments, tuple expressions and possibly a nogil form of list
comprehensions. The statement form is a lot more simple, and as a
start synchronization must simply be done through barriers. If you
want to change additional data through mechanisms other than immediate
result collection you have to pass in pointers to the data.

I like the syntax of async(func)(arg, arg2, somekwarg=4), as it would
work in pure mode, and you can still have something that looks like a
normal function call, but makes it clear it has to be a function call.
Then at a later point you could decide to support 'with async():' :).

> (He also says "functional-style programming is better for parallization than
> threads+locks", which I can kind of agree with but nobody tried to make an
> efficient immutable array implementation suitable for numerical computation
> yet to my knowledge... that's an interesting MSc or PhD-topic, but I already
> have one :-) )
>
> (Look at me, going along discussing when I really shouldn't -- see you
> later.)
>
> Dag Sverre
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From adriangeologo at yahoo.es  Thu Oct 20 18:25:22 2011
From: adriangeologo at yahoo.es (=?ISO-8859-1?Q?Adrian_Mart=EDnez_Vargas?=)
Date: Thu, 20 Oct 2011 09:25:22 -0700
Subject: [Cython] ImportError: DLL load failed: The specified module
 could not be found.
In-Reply-To: <CAGRfJ0ZRwHUY44XWkn6g6ytwzpm4de+jYBZu1EWUisFdDWzyLg@mail.gmail.com>
References: <4E9F05D2.9050602@yahoo.es>
	<CAGRfJ0ZRwHUY44XWkn6g6ytwzpm4de+jYBZu1EWUisFdDWzyLg@mail.gmail.com>
Message-ID: <4EA04B72.2060907@yahoo.es>

Dear Alexander and cython list,

the same question, different approach

a) I compiled and installed my python module with these two commands
C:\Temporal\source\Cython_ext> python OK_setup_windows.py build_ext
C:\Temporal\source\Cython_ext> python OK_setup_windows.py install

b) Testing my module in IPython
---------------------------------------------------------------------------------------------------------
In [5]: cd c:\
c:\

In [6]: import okriging_py as ok
------------------------------------------------------------
Traceback (most recent call last):
   File "<ipython console>", line 1, in <module>
ImportError: DLL load failed: The specified module could not be found.


In [7]: cd C:\Python27\Lib\site-packages
C:\Python27\Lib\site-packages

In [8]: import okriging_py as ok

In [9]:
---------------------------------------------------------------------------------------------------------

as you can see the module works if we call it from the source directory.

my question are:
a) where is the problem
b) how to distribute my module without this (possible system 
configuration) error

my OS is  windows 7 (probably with win xp compatibility)

sorry about my ignorance (I am more Linux Debian user...)

Regards
Adrian


On 19/10/2011 12:45 PM, Alexander T. Berghage wrote:
> Adrian
>
> I'm a little unclear on the big picture here. Are you trying to
> distribute a module (a .pyd / .dll) that you or someone else can
> import from a .py script, or are you looking to compile a .exe that
> runs your cython code on execution?
>
> ----
>
> Just interpreting the error you're describing (ImportError: DLL load
> failed: could not be found),  the dynamic linker couldn't find a
> library it needed. Most likely this is either a symptom of missing
> dependencies or a path problem. Here's my suggestions for diagnosing
> and fixing the problem:
>
> Missing Dependencies:
>      One very simple way to confirm that all the dependencies of your
> cython module are
>      available is to point the dependency walker utility[1] at it, and
> look for missing DLLs.
>
> Directory Structure:
>      Is the .pyd file you built from your cython module in the
> PYTHONPATH (or your current
>      working directory? If it's not, there's your issue.
>
> [1]  http://www.dependencywalker.com/
>
>
> Hope that helps!
>
> Best,
> -Alex
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/cython-devel/attachments/20111020/23304027/attachment.html>

From vitja.makarov at gmail.com  Fri Oct 21 11:44:35 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Fri, 21 Oct 2011 13:44:35 +0400
Subject: [Cython] What's wrong with py3k pyregr tests?
Message-ID: <CAKGHGPTLNGE7pcWCxpxoEcfyB=H4=p+KE8eFsAYG9KOTLgBM6A@mail.gmail.com>

I tried to run pyregr tests on my localhost and it doesn't sigsegv.
Perhaps I should try compiled version of Cython.

Btw, I've implemented noargs super and now I want to see how does it
affect py3k-pyregr test results.

-- 
vitja.

From stefan_ml at behnel.de  Fri Oct 21 12:01:45 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 21 Oct 2011 12:01:45 +0200
Subject: [Cython] What's wrong with py3k pyregr tests?
In-Reply-To: <CAKGHGPTLNGE7pcWCxpxoEcfyB=H4=p+KE8eFsAYG9KOTLgBM6A@mail.gmail.com>
References: <CAKGHGPTLNGE7pcWCxpxoEcfyB=H4=p+KE8eFsAYG9KOTLgBM6A@mail.gmail.com>
Message-ID: <4EA14309.2070900@behnel.de>

Vitja Makarov, 21.10.2011 11:44:
> I tried to run pyregr tests on my localhost and it doesn't sigsegv.

It's a crash bug in the debug builds of the latest py3k branch.


> Perhaps I should try compiled version of Cython.

Yes, but it's not required to reproduce the crash.


> Btw, I've implemented noargs super and now I want to see how does it
> affect py3k-pyregr test results.

Cool. You can configure your branch jobs to use the optimised py3k builds 
(-opt) instead of the normal debug builds.

Stefan

From stefan_ml at behnel.de  Fri Oct 21 12:53:57 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 21 Oct 2011 12:53:57 +0200
Subject: [Cython] What's wrong with py3k pyregr tests?
In-Reply-To: <4EA14309.2070900@behnel.de>
References: <CAKGHGPTLNGE7pcWCxpxoEcfyB=H4=p+KE8eFsAYG9KOTLgBM6A@mail.gmail.com>
	<4EA14309.2070900@behnel.de>
Message-ID: <4EA14F45.4080707@behnel.de>

Stefan Behnel, 21.10.2011 12:01:
> Vitja Makarov, 21.10.2011 11:44:
>> I tried to run pyregr tests on my localhost and it doesn't sigsegv.
>
> It's a crash bug in the debug builds of the latest py3k branch.
>
>
>> Perhaps I should try compiled version of Cython.
>
> Yes, but it's not required to reproduce the crash.

Hmm, I may have been mistaken. At least it seems to be a problem with 
getattr(), which breaks the lookup of builtin names. My guess is that 
unicode hashing is broken in some way for str subtypes (as we use for names).

Stefan

From d.s.seljebotn at astro.uio.no  Fri Oct 21 19:43:35 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Fri, 21 Oct 2011 19:43:35 +0200
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EWtroB2gcoQu0E=8nUv3p_QpvR82zBP80HWMGsJ70kEWw@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>	<4E919100.8020801@astro.uio.no>
	<4E919A40.2090001@astro.uio.no>	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>	<4E955180.1070601@astro.uio.no>	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>	<CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>	<CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>	<CADiQ+QC+8Rz2LyjM-BT8jp3kDba5JX0u+rV3xwbMmcOneCa4rA@mail.gmail.com>	<CANg26EXQz6aw4dCDkefikiTHwSUdXV41ngY15nrMJ6g6Kk3Kqg@mail.gmail.com>	<4E9FDEE7.2070301@astro.uio.no>	<CANg26EWDzYh2c04D96mj7bcsf+VEKk61kYg6T=q0a36utGFVZA@mail.gmail.com>	<4E9FEB76.9040208@astro.uio.no>
	<CANg26EWtroB2gcoQu0E=8nUv3p_QpvR82zBP80HWMGsJ70kEWw@mail.gmail.com>
Message-ID: <4EA1AF47.2090908@astro.uio.no>

On 10/20/2011 02:51 PM, mark florisson wrote:
> On 20 October 2011 10:35, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no>  wrote:
>> On 10/20/2011 11:13 AM, mark florisson wrote:
>>>
>>> On 20 October 2011 09:42, Dag Sverre Seljebotn
>>> <d.s.seljebotn at astro.uio.no>    wrote:
>>>>
>>>> Meta: I've been meaning to respond to this thread, but can't find the
>>>> time.
>>>> What's the time-frame for implementing this? If it's hypothetical at the
>>>> moment and just is a question of getting things spec-ed, one could
>>>> perhaps
>>>> look at discussing it at the next Cython workshop, or perhaps a Skype
>>>> call
>>>> with the three of us as some point...
>>>
>>> For me this is just about getting this spec-ed, so that when someone
>>> finds the time, we don't need to discuss it for weeks first. And the
>>> implementor won't necessarily have to support everything at once, e.g.
>>> just critical sections or barriers alone would be nice.
>>>
>>> Is there any plan for a new workshop then? Because if it's in two
>>> years I think we could be more time-efficient :)
>>
>> At least in William's grant there's plans for 2-3 Cython workshops, so
>> hopefully there's funding for one next year if we want to. We should ask him
>> before planning anything though.
>>
>>>> Regarding the tasks: One of my biggest problems with Python is the lack
>>>> of
>>>> an elegant syntax for anonymous functions. But since Python has that
>>>> problem, I feel it is not necesarrily something we should fix (by using
>>>> the
>>>> with statements to create tasks). Sometimes Pythonic-ness is more
>>>> important
>>>> than elegance (for Cython).
>>>
>>> I agree it's not something we should fix, I just think tasks are most
>>> useful in inline blocks and not in separate functions or closures.
>>> Although it could certainly work, I think it restricts more, leads to
>>> more verbose code and possibly questionable semantics, and on top of
>>> that it would be a pain to implement (although that should not be used
>>> as a persuasive argument). I'm not saying there is no elegant way
>>> other than with blocks, I'm just saying that I think closures are not
>>> the right thing for it.
>>>
>>>> In general I'm happy as long as there's a chance of getting things to
>>>> work
>>>> in pure Python mode as well (with serial execution). So if, e.g., with
>>>> statements creating tasks have the same effect when running the same code
>>>> (serially) in pure Python, I'm less opposed (didn't look at it in
>>>> detail).
>>>
>>> Yes, it would have the same effect. The thing with tasks (and OpenMP
>>> constructs in general) is that usually if your compiler ignores all
>>> your pragmas, your code just runs serially in the same way. The same
>>> would be true for the tasks in with blocks.
>>
>> Short note: I like the vision of Konrad Hinsen:
>>
>> http://www.euroscipy.org/talk/2011
>>
>> The core idea is that the "task-ness" of a block of code is orthogonal to
>> the place you actually write it. That is, a block of code may often either
>> be fit for execution as a task, or not, depending on how heavy it is (=
>> values of arguments it takes in, not its contents).
>>
>> He introduces the "async" expression to drive this point through.
>>
>> I think "with task" is fine if used in this way, if you simply call a
>> function (which itself doesn't know whether it is a task or not). But once
>> you start to implement an entire function within the with-statement there's
>> a code-smell.
>
> Definitely, do you'd just call the function from the task.
>
>> Anyway, it's growing on me. But I think his "async" expression is more
>> Pythonic in the way that it forces you away from making your code smell.
>>
>> We could simply have
>>
>> async(func)(arg, arg2, somekwarg=4)
>>
>
> That looks good. The question is, does this constitute an expression
> or a statement? If it's an expression, then you expect a meaningful
> return value, which means you're going to have to wait for the task to
> complete. That would be fine if you submit multiple tasks in one
> expression, from the slides:
>
>      max(async expr1, async expr2)
>
> or even
>
>      [async expr for ... in ...]
>
> I must say, it does look really elegant and it doesn't leave the user
> to question when the task is executed (and if you need a taskwait
> directive to wait for your variables to become defined). What I don't
> see is how to do the producer consumer trick, unless you regard using
> the result of async as a taskwait, and not using it as not having a
> taskwait, e.g.
>
> async func(...) # generate a task and don't wait for it
> result = async func(...) # generate a task and wait for it.
>
> The latter is not useful unless you have multiple expressions in one
> statement, so we should also allow result1, result2 = async
> func(data=a), async func(data=b).

I think the idea is that you have a transparent, implicit future. You 
block when you use the result; you are allowed to pass the result back 
to the caller without blocking, and the caller does not need to know 
whether it is a future or not.

Implemented in Python itself, the protocol would be something like 
INCREF/DECREF does not block, but all other operations do block.

Of course, this is rather hard to implement in present-day Cython. Options:

  a) Have async(func)(x) return a future, must call result().

  b) Make async part of the type spec, such as "cdef async int x". And 
coerce it to Python using a proxy. Seems messy, and going beyond what 
current Python semantics allow. But I do like it a bit better than 
explicit futures everywhere.

Dag Sverre

From markflorisson88 at gmail.com  Fri Oct 21 21:31:58 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Fri, 21 Oct 2011 20:31:58 +0100
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <4EA1AF47.2090908@astro.uio.no>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>
	<4E919100.8020801@astro.uio.no> <4E919A40.2090001@astro.uio.no>
	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>
	<4E955180.1070601@astro.uio.no>
	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>
	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>
	<CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>
	<CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>
	<CADiQ+QC+8Rz2LyjM-BT8jp3kDba5JX0u+rV3xwbMmcOneCa4rA@mail.gmail.com>
	<CANg26EXQz6aw4dCDkefikiTHwSUdXV41ngY15nrMJ6g6Kk3Kqg@mail.gmail.com>
	<4E9FDEE7.2070301@astro.uio.no>
	<CANg26EWDzYh2c04D96mj7bcsf+VEKk61kYg6T=q0a36utGFVZA@mail.gmail.com>
	<4E9FEB76.9040208@astro.uio.no>
	<CANg26EWtroB2gcoQu0E=8nUv3p_QpvR82zBP80HWMGsJ70kEWw@mail.gmail.com>
	<4EA1AF47.2090908@astro.uio.no>
Message-ID: <CANg26EUe5qPTSfNXkgHVHsjb-0UdzXAukXcS7OXTVVjySdfppw@mail.gmail.com>

On 21 October 2011 18:43, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/20/2011 02:51 PM, mark florisson wrote:
>>
>> On 20 October 2011 10:35, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> ?wrote:
>>>
>>> On 10/20/2011 11:13 AM, mark florisson wrote:
>>>>
>>>> On 20 October 2011 09:42, Dag Sverre Seljebotn
>>>> <d.s.seljebotn at astro.uio.no> ? ?wrote:
>>>>>
>>>>> Meta: I've been meaning to respond to this thread, but can't find the
>>>>> time.
>>>>> What's the time-frame for implementing this? If it's hypothetical at
>>>>> the
>>>>> moment and just is a question of getting things spec-ed, one could
>>>>> perhaps
>>>>> look at discussing it at the next Cython workshop, or perhaps a Skype
>>>>> call
>>>>> with the three of us as some point...
>>>>
>>>> For me this is just about getting this spec-ed, so that when someone
>>>> finds the time, we don't need to discuss it for weeks first. And the
>>>> implementor won't necessarily have to support everything at once, e.g.
>>>> just critical sections or barriers alone would be nice.
>>>>
>>>> Is there any plan for a new workshop then? Because if it's in two
>>>> years I think we could be more time-efficient :)
>>>
>>> At least in William's grant there's plans for 2-3 Cython workshops, so
>>> hopefully there's funding for one next year if we want to. We should ask
>>> him
>>> before planning anything though.
>>>
>>>>> Regarding the tasks: One of my biggest problems with Python is the lack
>>>>> of
>>>>> an elegant syntax for anonymous functions. But since Python has that
>>>>> problem, I feel it is not necesarrily something we should fix (by using
>>>>> the
>>>>> with statements to create tasks). Sometimes Pythonic-ness is more
>>>>> important
>>>>> than elegance (for Cython).
>>>>
>>>> I agree it's not something we should fix, I just think tasks are most
>>>> useful in inline blocks and not in separate functions or closures.
>>>> Although it could certainly work, I think it restricts more, leads to
>>>> more verbose code and possibly questionable semantics, and on top of
>>>> that it would be a pain to implement (although that should not be used
>>>> as a persuasive argument). I'm not saying there is no elegant way
>>>> other than with blocks, I'm just saying that I think closures are not
>>>> the right thing for it.
>>>>
>>>>> In general I'm happy as long as there's a chance of getting things to
>>>>> work
>>>>> in pure Python mode as well (with serial execution). So if, e.g., with
>>>>> statements creating tasks have the same effect when running the same
>>>>> code
>>>>> (serially) in pure Python, I'm less opposed (didn't look at it in
>>>>> detail).
>>>>
>>>> Yes, it would have the same effect. The thing with tasks (and OpenMP
>>>> constructs in general) is that usually if your compiler ignores all
>>>> your pragmas, your code just runs serially in the same way. The same
>>>> would be true for the tasks in with blocks.
>>>
>>> Short note: I like the vision of Konrad Hinsen:
>>>
>>> http://www.euroscipy.org/talk/2011
>>>
>>> The core idea is that the "task-ness" of a block of code is orthogonal to
>>> the place you actually write it. That is, a block of code may often
>>> either
>>> be fit for execution as a task, or not, depending on how heavy it is (=
>>> values of arguments it takes in, not its contents).
>>>
>>> He introduces the "async" expression to drive this point through.
>>>
>>> I think "with task" is fine if used in this way, if you simply call a
>>> function (which itself doesn't know whether it is a task or not). But
>>> once
>>> you start to implement an entire function within the with-statement
>>> there's
>>> a code-smell.
>>
>> Definitely, do you'd just call the function from the task.
>>
>>> Anyway, it's growing on me. But I think his "async" expression is more
>>> Pythonic in the way that it forces you away from making your code smell.
>>>
>>> We could simply have
>>>
>>> async(func)(arg, arg2, somekwarg=4)
>>>
>>
>> That looks good. The question is, does this constitute an expression
>> or a statement? If it's an expression, then you expect a meaningful
>> return value, which means you're going to have to wait for the task to
>> complete. That would be fine if you submit multiple tasks in one
>> expression, from the slides:
>>
>> ? ? max(async expr1, async expr2)
>>
>> or even
>>
>> ? ? [async expr for ... in ...]
>>
>> I must say, it does look really elegant and it doesn't leave the user
>> to question when the task is executed (and if you need a taskwait
>> directive to wait for your variables to become defined). What I don't
>> see is how to do the producer consumer trick, unless you regard using
>> the result of async as a taskwait, and not using it as not having a
>> taskwait, e.g.
>>
>> async func(...) # generate a task and don't wait for it
>> result = async func(...) # generate a task and wait for it.
>>
>> The latter is not useful unless you have multiple expressions in one
>> statement, so we should also allow result1, result2 = async
>> func(data=a), async func(data=b).
>
> I think the idea is that you have a transparent, implicit future. You block
> when you use the result; you are allowed to pass the result back to the
> caller without blocking, and the caller does not need to know whether it is
> a future or not.
>
> Implemented in Python itself, the protocol would be something like
> INCREF/DECREF does not block, but all other operations do block.
>
> Of course, this is rather hard to implement in present-day Cython. Options:
>
> ?a) Have async(func)(x) return a future, must call result().
>
> ?b) Make async part of the type spec, such as "cdef async int x". And coerce
> it to Python using a proxy. Seems messy, and going beyond what current
> Python semantics allow. But I do like it a bit better than explicit futures
> everywhere.

Interesting. However, what happens when I do

cdef async int x

x = async(func)(y)
x = async(func)(z)

print x

? You don't really know what x will be, as you don't know which task
will complete first. This case could be solved by having multiple
different future result storage locations, but what if I do this in a
loop?
You could just define that as a race condition though, but I would
expect the value from the task last specified.

What happens when you return an async value from the task? Do you get
"cdef async async int x"? Or what if you pass in an async variable as
async argument to a new task? Basically we have to restrict async
value usage to "direct parents only". I think it also makes sense to
restrict use to the parallel section/orphaned function only.

What I don't really like about such a declaration is that it's really
only async until you first use it, but you might not know until
runtime. So for every use there will be (a slight) overhead. Also, I
think it's common to just specify a bunch of tasks and then wait for
all of them just once (often but not always at an implicit barrier).
I'm afraid that if you want to implement this and use the result of
just one task, you will have to wait on all of them, which is somewhat
misleading. This is unfortunately all OpenMP provides, if we create
another backend that will probably not be true.

So there are two alternatives to this:

1) allow async without a result, pass in a pointer and once you want
to know results are defined, you wait for all (children) tasks to
finish
2) use mechanism 1) + allow multiple async expressions in a single
statement, e.g. in tuples, list comprehensions and as function call
parameters

Although it would be incompatible with pure mode, I do find an async
keyword more elegant than the function equivalent.

> Dag Sverre
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From adriangeologo at yahoo.es  Fri Oct 21 21:51:51 2011
From: adriangeologo at yahoo.es (=?ISO-8859-1?Q?Adrian_Mart=EDnez_Vargas?=)
Date: Fri, 21 Oct 2011 12:51:51 -0700
Subject: [Cython] ImportError: DLL load failed: The specified module
 could not be found.
In-Reply-To: <CAGRfJ0ZRwHUY44XWkn6g6ytwzpm4de+jYBZu1EWUisFdDWzyLg@mail.gmail.com>
References: <4E9F05D2.9050602@yahoo.es>
	<CAGRfJ0ZRwHUY44XWkn6g6ytwzpm4de+jYBZu1EWUisFdDWzyLg@mail.gmail.com>
Message-ID: <4EA1CD57.2090701@yahoo.es>

I'm 90% sure that the problem is that the pyd file is not registered (it 
works if I put the module in my working directory). I'm trying to 
registered in windows 7  with regsvr32 but don't work.

I need HELP guys!

Regards
Adrian

On 19/10/2011 12:45 PM, Alexander T. Berghage wrote:
> Adrian
>
> I'm a little unclear on the big picture here. Are you trying to
> distribute a module (a .pyd / .dll) that you or someone else can
> import from a .py script, or are you looking to compile a .exe that
> runs your cython code on execution?
>
> ----
>
> Just interpreting the error you're describing (ImportError: DLL load
> failed: could not be found),  the dynamic linker couldn't find a
> library it needed. Most likely this is either a symptom of missing
> dependencies or a path problem. Here's my suggestions for diagnosing
> and fixing the problem:
>
> Missing Dependencies:
>      One very simple way to confirm that all the dependencies of your
> cython module are
>      available is to point the dependency walker utility[1] at it, and
> look for missing DLLs.
>
> Directory Structure:
>      Is the .pyd file you built from your cython module in the
> PYTHONPATH (or your current
>      working directory? If it's not, there's your issue.
>
> [1]  http://www.dependencywalker.com/
>
>
> Hope that helps!
>
> Best,
> -Alex
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>


From markflorisson88 at gmail.com  Fri Oct 21 22:03:07 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Fri, 21 Oct 2011 21:03:07 +0100
Subject: [Cython] ImportError: DLL load failed: The specified module
 could not be found.
In-Reply-To: <4EA1CD57.2090701@yahoo.es>
References: <4E9F05D2.9050602@yahoo.es>
	<CAGRfJ0ZRwHUY44XWkn6g6ytwzpm4de+jYBZu1EWUisFdDWzyLg@mail.gmail.com>
	<4EA1CD57.2090701@yahoo.es>
Message-ID: <CANg26EXeBajXPScUJtVGZETFvgiyiH=F09T1sthagsvLF_0SZg@mail.gmail.com>

Sorry, most of us don't use Windows. In any case, this is something
that belongs on the cython-users list, please continue the discussion
there.

On 21 October 2011 20:51, Adrian Mart?nez Vargas <adriangeologo at yahoo.es> wrote:
> I'm 90% sure that the problem is that the pyd file is not registered (it
> works if I put the module in my working directory). I'm trying to registered
> in windows 7 ?with regsvr32 but don't work.
>
> I need HELP guys!
>
> Regards
> Adrian
>
> On 19/10/2011 12:45 PM, Alexander T. Berghage wrote:
>>
>> Adrian
>>
>> I'm a little unclear on the big picture here. Are you trying to
>> distribute a module (a .pyd / .dll) that you or someone else can
>> import from a .py script, or are you looking to compile a .exe that
>> runs your cython code on execution?
>>
>> ----
>>
>> Just interpreting the error you're describing (ImportError: DLL load
>> failed: could not be found), ?the dynamic linker couldn't find a
>> library it needed. Most likely this is either a symptom of missing
>> dependencies or a path problem. Here's my suggestions for diagnosing
>> and fixing the problem:
>>
>> Missing Dependencies:
>> ? ? One very simple way to confirm that all the dependencies of your
>> cython module are
>> ? ? available is to point the dependency walker utility[1] at it, and
>> look for missing DLLs.
>>
>> Directory Structure:
>> ? ? Is the .pyd file you built from your cython module in the
>> PYTHONPATH (or your current
>> ? ? working directory? If it's not, there's your issue.
>>
>> [1] ?http://www.dependencywalker.com/
>>
>>
>> Hope that helps!
>>
>> Best,
>> -Alex
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From vitja.makarov at gmail.com  Fri Oct 21 22:38:07 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Sat, 22 Oct 2011 00:38:07 +0400
Subject: [Cython] What's wrong with py3k pyregr tests?
In-Reply-To: <4EA14F45.4080707@behnel.de>
References: <CAKGHGPTLNGE7pcWCxpxoEcfyB=H4=p+KE8eFsAYG9KOTLgBM6A@mail.gmail.com>
	<4EA14309.2070900@behnel.de> <4EA14F45.4080707@behnel.de>
Message-ID: <CAKGHGPQkjbVc0cfOJTqYiCJLFsJpsGFN1QNrPGJZxCV+oM9F_g@mail.gmail.com>

2011/10/21 Stefan Behnel <stefan_ml at behnel.de>:
> Stefan Behnel, 21.10.2011 12:01:
>>
>> Vitja Makarov, 21.10.2011 11:44:
>>>
>>> I tried to run pyregr tests on my localhost and it doesn't sigsegv.
>>
>> It's a crash bug in the debug builds of the latest py3k branch.
>>
>>
>>> Perhaps I should try compiled version of Cython.
>>
>> Yes, but it's not required to reproduce the crash.
>
> Hmm, I may have been mistaken. At least it seems to be a problem with
> getattr(), which breaks the lookup of builtin names. My guess is that
> unicode hashing is broken in some way for str subtypes (as we use for
> names).
>


I switched to py3k-opt and it worked! Now we got ~13K/265:

https://sage.math.washington.edu:8091/hudson/view/dev-vitek/job/cython-vitek-tests-pyregr-py3k-c/

-- 
vitja.

From d.s.seljebotn at astro.uio.no  Fri Oct 21 23:27:50 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Fri, 21 Oct 2011 23:27:50 +0200
Subject: [Cython] cython.parallel tasks, single, master, critical,
	barriers
In-Reply-To: <CANg26EUe5qPTSfNXkgHVHsjb-0UdzXAukXcS7OXTVVjySdfppw@mail.gmail.com>
References: <CANg26EXnkmrxDLTXdJgkoftn7gb1BfOi1cCEHzMyPmaC2m3rCg@mail.gmail.com>	<4E919100.8020801@astro.uio.no>
	<4E919A40.2090001@astro.uio.no>	<CADiQ+QBqrGgkVdr1iH9iOxBZJwZL8PDwQVc-xzAeqYD5pC26nA@mail.gmail.com>	<4E955180.1070601@astro.uio.no>	<CADiQ+QD3ic619jYEEC7mJ6U=EwWna7jOc0TDYhWDBPCfW=8kvg@mail.gmail.com>	<CANg26EVx0EZxRdC15tjKc+QVFaQjTT_yozZe47rLAQCs9aoH4Q@mail.gmail.com>	<CADiQ+QDHQnvtRMWVUiPRJC+ZsWg7J_gRw0GCzfTqARPL33-92A@mail.gmail.com>	<CANg26EVvVKGmfAbOJSiOVvWC-nrrCWsiRPOhKBKEmyfFGzqkaQ@mail.gmail.com>	<CADiQ+QC+8Rz2LyjM-BT8jp3kDba5JX0u+rV3xwbMmcOneCa4rA@mail.gmail.com>	<CANg26EXQz6aw4dCDkefikiTHwSUdXV41ngY15nrMJ6g6Kk3Kqg@mail.gmail.com>	<4E9FDEE7.2070301@astro.uio.no>	<CANg26EWDzYh2c04D96mj7bcsf+VEKk61kYg6T=q0a36utGFVZA@mail.gmail.com>	<4E9FEB76.9040208@astro.uio.no>	<CANg26EWtroB2gcoQu0E=8nUv3p_QpvR82zBP80HWMGsJ70kEWw@mail.gmail.com>	<4EA1AF47.2090908@astro.uio.no>
	<CANg26EUe5qPTSfNXkgHVHsjb-0UdzXAukXcS7OXTVVjySdfppw@mail.gmail.com>
Message-ID: <4EA1E3D6.9040103@astro.uio.no>

On 10/21/2011 09:31 PM, mark florisson wrote:
> On 21 October 2011 18:43, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no>  wrote:
>> On 10/20/2011 02:51 PM, mark florisson wrote:
>>>
>>> On 20 October 2011 10:35, Dag Sverre Seljebotn
>>> <d.s.seljebotn at astro.uio.no>    wrote:
>>>>
>>>> On 10/20/2011 11:13 AM, mark florisson wrote:
>>>>>
>>>>> On 20 October 2011 09:42, Dag Sverre Seljebotn
>>>>> <d.s.seljebotn at astro.uio.no>      wrote:
>>>>>>
>>>>>> Meta: I've been meaning to respond to this thread, but can't find the
>>>>>> time.
>>>>>> What's the time-frame for implementing this? If it's hypothetical at
>>>>>> the
>>>>>> moment and just is a question of getting things spec-ed, one could
>>>>>> perhaps
>>>>>> look at discussing it at the next Cython workshop, or perhaps a Skype
>>>>>> call
>>>>>> with the three of us as some point...
>>>>>
>>>>> For me this is just about getting this spec-ed, so that when someone
>>>>> finds the time, we don't need to discuss it for weeks first. And the
>>>>> implementor won't necessarily have to support everything at once, e.g.
>>>>> just critical sections or barriers alone would be nice.
>>>>>
>>>>> Is there any plan for a new workshop then? Because if it's in two
>>>>> years I think we could be more time-efficient :)
>>>>
>>>> At least in William's grant there's plans for 2-3 Cython workshops, so
>>>> hopefully there's funding for one next year if we want to. We should ask
>>>> him
>>>> before planning anything though.
>>>>
>>>>>> Regarding the tasks: One of my biggest problems with Python is the lack
>>>>>> of
>>>>>> an elegant syntax for anonymous functions. But since Python has that
>>>>>> problem, I feel it is not necesarrily something we should fix (by using
>>>>>> the
>>>>>> with statements to create tasks). Sometimes Pythonic-ness is more
>>>>>> important
>>>>>> than elegance (for Cython).
>>>>>
>>>>> I agree it's not something we should fix, I just think tasks are most
>>>>> useful in inline blocks and not in separate functions or closures.
>>>>> Although it could certainly work, I think it restricts more, leads to
>>>>> more verbose code and possibly questionable semantics, and on top of
>>>>> that it would be a pain to implement (although that should not be used
>>>>> as a persuasive argument). I'm not saying there is no elegant way
>>>>> other than with blocks, I'm just saying that I think closures are not
>>>>> the right thing for it.
>>>>>
>>>>>> In general I'm happy as long as there's a chance of getting things to
>>>>>> work
>>>>>> in pure Python mode as well (with serial execution). So if, e.g., with
>>>>>> statements creating tasks have the same effect when running the same
>>>>>> code
>>>>>> (serially) in pure Python, I'm less opposed (didn't look at it in
>>>>>> detail).
>>>>>
>>>>> Yes, it would have the same effect. The thing with tasks (and OpenMP
>>>>> constructs in general) is that usually if your compiler ignores all
>>>>> your pragmas, your code just runs serially in the same way. The same
>>>>> would be true for the tasks in with blocks.
>>>>
>>>> Short note: I like the vision of Konrad Hinsen:
>>>>
>>>> http://www.euroscipy.org/talk/2011
>>>>
>>>> The core idea is that the "task-ness" of a block of code is orthogonal to
>>>> the place you actually write it. That is, a block of code may often
>>>> either
>>>> be fit for execution as a task, or not, depending on how heavy it is (=
>>>> values of arguments it takes in, not its contents).
>>>>
>>>> He introduces the "async" expression to drive this point through.
>>>>
>>>> I think "with task" is fine if used in this way, if you simply call a
>>>> function (which itself doesn't know whether it is a task or not). But
>>>> once
>>>> you start to implement an entire function within the with-statement
>>>> there's
>>>> a code-smell.
>>>
>>> Definitely, do you'd just call the function from the task.
>>>
>>>> Anyway, it's growing on me. But I think his "async" expression is more
>>>> Pythonic in the way that it forces you away from making your code smell.
>>>>
>>>> We could simply have
>>>>
>>>> async(func)(arg, arg2, somekwarg=4)
>>>>
>>>
>>> That looks good. The question is, does this constitute an expression
>>> or a statement? If it's an expression, then you expect a meaningful
>>> return value, which means you're going to have to wait for the task to
>>> complete. That would be fine if you submit multiple tasks in one
>>> expression, from the slides:
>>>
>>>      max(async expr1, async expr2)
>>>
>>> or even
>>>
>>>      [async expr for ... in ...]
>>>
>>> I must say, it does look really elegant and it doesn't leave the user
>>> to question when the task is executed (and if you need a taskwait
>>> directive to wait for your variables to become defined). What I don't
>>> see is how to do the producer consumer trick, unless you regard using
>>> the result of async as a taskwait, and not using it as not having a
>>> taskwait, e.g.
>>>
>>> async func(...) # generate a task and don't wait for it
>>> result = async func(...) # generate a task and wait for it.
>>>
>>> The latter is not useful unless you have multiple expressions in one
>>> statement, so we should also allow result1, result2 = async
>>> func(data=a), async func(data=b).
>>
>> I think the idea is that you have a transparent, implicit future. You block
>> when you use the result; you are allowed to pass the result back to the
>> caller without blocking, and the caller does not need to know whether it is
>> a future or not.
>>
>> Implemented in Python itself, the protocol would be something like
>> INCREF/DECREF does not block, but all other operations do block.
>>
>> Of course, this is rather hard to implement in present-day Cython. Options:
>>
>>   a) Have async(func)(x) return a future, must call result().
>>
>>   b) Make async part of the type spec, such as "cdef async int x". And coerce
>> it to Python using a proxy. Seems messy, and going beyond what current
>> Python semantics allow. But I do like it a bit better than explicit futures
>> everywhere.
>
> Interesting. However, what happens when I do
>
> cdef async int x
>
> x = async(func)(y)
> x = async(func)(z)
>
> print x
>
> ? You don't really know what x will be, as you don't know which task
> will complete first. This case could be solved by having multiple
> different future result storage locations, but what if I do this in a
> loop?
> You could just define that as a race condition though, but I would
> expect the value from the task last specified.

The only intuitive thing to me is that the first x is discarded and you 
block for the second. Yes, that means heap-allocation and reference 
counting (the async function holds a reference, which would be the only 
reference in the case above, so that when the first call returns the 
target heap-allocated int gets deallocated).

Really a better model for changing CPython than Cython..

> What happens when you return an async value from the task? Do you get
> "cdef async async int x"? Or what if you pass in an async variable as
> async argument to a new task? Basically we have to restrict async
> value usage to "direct parents only". I think it also makes sense to
> restrict use to the parallel section/orphaned function only.

No, I imagined these to be heap-allocated things, so you just pass 
around these heap-allocated wrappers containing i) something you wait on 
(pthread semaphore?), ii) refcount, iii) value storage.

(After all, the inspiration is Konrad's slides on "Python 4" (as he'd 
wish it)).

Yes, there's some performance penalty for every read, but there's 
penalty with any task really. Also control flow analysis will likely 
take you as far as one wait per function.

Though I'm still not convinced that channels in the way Go uses them 
aren't "better".

Dag Sverre

From stefan_ml at behnel.de  Sat Oct 22 06:58:39 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 22 Oct 2011 06:58:39 +0200
Subject: [Cython] What's wrong with py3k pyregr tests?
In-Reply-To: <4EA14F45.4080707@behnel.de>
References: <CAKGHGPTLNGE7pcWCxpxoEcfyB=H4=p+KE8eFsAYG9KOTLgBM6A@mail.gmail.com>	<4EA14309.2070900@behnel.de>
	<4EA14F45.4080707@behnel.de>
Message-ID: <4EA24D7F.80400@behnel.de>

Stefan Behnel, 21.10.2011 12:53:
> Stefan Behnel, 21.10.2011 12:01:
>> Vitja Makarov, 21.10.2011 11:44:
>>> I tried to run pyregr tests on my localhost and it doesn't sigsegv.
>>
>> It's a crash bug in the debug builds of the latest py3k branch.
>>
>>> Perhaps I should try compiled version of Cython.
>>
>> Yes, but it's not required to reproduce the crash.
>
> Hmm, I may have been mistaken. At least it seems to be a problem with
> getattr(), which breaks the lookup of builtin names. My guess is that
> unicode hashing is broken in some way for str subtypes (as we use for names).

I've sent a fix to python-dev, let's see when they get it in.

http://thread.gmane.org/gmane.comp.python.devel/127321

Stefan

From aberghage at gmail.com  Sun Oct 23 03:11:04 2011
From: aberghage at gmail.com (Alex Berghage)
Date: Sat, 22 Oct 2011 21:11:04 -0400
Subject: [Cython] ImportError: DLL load failed: The specified module
	could not be found.
In-Reply-To: <4EA1CD57.2090701@yahoo.es>
References: <4E9F05D2.9050602@yahoo.es>
	<CAGRfJ0ZRwHUY44XWkn6g6ytwzpm4de+jYBZu1EWUisFdDWzyLg@mail.gmail.com>
	<4EA1CD57.2090701@yahoo.es>
Message-ID: <D549265C-AF2B-40FB-B8AB-4172E1F6CD2F@gmail.com>

Adrian,

If the import works when the module is in your working directory, the problem is probably your path. Have you tried adding the folder containing the module to your PYTHONPATH environment variable?

Sent from my iPhone (please pardon brevity)

On Oct 21, 2011, at 3:51 PM, Adrian Mart?nez Vargas <adriangeologo at yahoo.es> wrote:

> I'm 90% sure that the problem is that the pyd file is not registered (it works if I put the module in my working directory). I'm trying to registered in windows 7  with regsvr32 but don't work.
> 
> I need HELP guys!
> 
> Regards
> Adrian
> 
> On 19/10/2011 12:45 PM, Alexander T. Berghage wrote:
>> Adrian
>> 
>> I'm a little unclear on the big picture here. Are you trying to
>> distribute a module (a .pyd / .dll) that you or someone else can
>> import from a .py script, or are you looking to compile a .exe that
>> runs your cython code on execution?
>> 
>> ----
>> 
>> Just interpreting the error you're describing (ImportError: DLL load
>> failed: could not be found),  the dynamic linker couldn't find a
>> library it needed. Most likely this is either a symptom of missing
>> dependencies or a path problem. Here's my suggestions for diagnosing
>> and fixing the problem:
>> 
>> Missing Dependencies:
>>     One very simple way to confirm that all the dependencies of your
>> cython module are
>>     available is to point the dependency walker utility[1] at it, and
>> look for missing DLLs.
>> 
>> Directory Structure:
>>     Is the .pyd file you built from your cython module in the
>> PYTHONPATH (or your current
>>     working directory? If it's not, there's your issue.
>> 
>> [1]  http://www.dependencywalker.com/
>> 
>> 
>> Hope that helps!
>> 
>> Best,
>> -Alex
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>> 
> 
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From vitja.makarov at gmail.com  Sun Oct 23 08:39:24 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Sun, 23 Oct 2011 10:39:24 +0400
Subject: [Cython] Compiler crash at parsing stage
Message-ID: <CAKGHGPSK2Tqr0GX04aWCdqShcgaWh8B3A+6BPyt2eqzTQXuu6Q@mail.gmail.com>

Hi!

This simple code crashes compiler:

lambda i=1: i

"""
  File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
line 122, in p_test
    return p_lambdef(s)
  File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
line 102, in p_lambdef
    s, terminator=':', annotated=False)
  File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
line 2741, in p_varargslist
    annotated = annotated)
  File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
line 2388, in p_c_arg_list
    annotated = annotated))
  File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
line 2435, in p_c_arg_decl
    print s.level
AttributeError: 'PyrexScanner' object has no attribute 'level'
"""

I'm not sure what's the best way to fix this.

-- 
vitja.

From stefan_ml at behnel.de  Sun Oct 23 10:15:52 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 23 Oct 2011 10:15:52 +0200
Subject: [Cython] Compiler crash at parsing stage
In-Reply-To: <CAKGHGPSK2Tqr0GX04aWCdqShcgaWh8B3A+6BPyt2eqzTQXuu6Q@mail.gmail.com>
References: <CAKGHGPSK2Tqr0GX04aWCdqShcgaWh8B3A+6BPyt2eqzTQXuu6Q@mail.gmail.com>
Message-ID: <4EA3CD38.90709@behnel.de>

Vitja Makarov, 23.10.2011 08:39:
> This simple code crashes compiler:
>
> lambda i=1: i
>
> """
>    File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
> line 122, in p_test
>      return p_lambdef(s)
>    File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
> line 102, in p_lambdef
>      s, terminator=':', annotated=False)
>    File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
> line 2741, in p_varargslist
>      annotated = annotated)
>    File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
> line 2388, in p_c_arg_list
>      annotated = annotated))
>    File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
> line 2435, in p_c_arg_decl
>      print s.level
> AttributeError: 'PyrexScanner' object has no attribute 'level'
> """
>
> I'm not sure what's the best way to fix this.

I don't see a "print" statement anywhere, but it seems that the "level" 
attribute is really missing from the compiled scanner.

This should do the trick:

diff -r 886697a10602 Cython/Compiler/Scanning.pxd
--- a/Cython/Compiler/Scanning.pxd      Sat Oct 22 19:43:45 2011 +0100
+++ b/Cython/Compiler/Scanning.pxd      Sun Oct 23 10:11:10 2011 +0200
@@ -28,6 +28,7 @@
      cdef public int bracket_nesting_level
      cdef public sy
      cdef public systring
+    cdef public level

      cdef long current_level(self)
      #cpdef commentline(self, text)

I didn't commit it, just go ahead and do so if it works for you.

Stefan

From stefan_ml at behnel.de  Sun Oct 23 10:19:00 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 23 Oct 2011 10:19:00 +0200
Subject: [Cython] What's wrong with py3k pyregr tests?
In-Reply-To: <CAKGHGPQkjbVc0cfOJTqYiCJLFsJpsGFN1QNrPGJZxCV+oM9F_g@mail.gmail.com>
References: <CAKGHGPTLNGE7pcWCxpxoEcfyB=H4=p+KE8eFsAYG9KOTLgBM6A@mail.gmail.com>	<4EA14309.2070900@behnel.de>
	<4EA14F45.4080707@behnel.de>
	<CAKGHGPQkjbVc0cfOJTqYiCJLFsJpsGFN1QNrPGJZxCV+oM9F_g@mail.gmail.com>
Message-ID: <4EA3CDF4.2000303@behnel.de>

Vitja Makarov, 21.10.2011 22:38:
> 2011/10/21 Stefan Behnel:
>> Stefan Behnel, 21.10.2011 12:01:
>>>
>>> Vitja Makarov, 21.10.2011 11:44:
>>>>
>>>> I tried to run pyregr tests on my localhost and it doesn't sigsegv.
>>>
>>> It's a crash bug in the debug builds of the latest py3k branch.
>>>
>>>
>>>> Perhaps I should try compiled version of Cython.
>>>
>>> Yes, but it's not required to reproduce the crash.
>>
>> Hmm, I may have been mistaken. At least it seems to be a problem with
>> getattr(), which breaks the lookup of builtin names. My guess is that
>> unicode hashing is broken in some way for str subtypes (as we use for
>> names).
>
> I switched to py3k-opt and it worked! Now we got ~13K/265:
>
> https://sage.math.washington.edu:8091/hudson/view/dev-vitek/job/cython-vitek-tests-pyregr-py3k-c/

Very cool.

Note that the bug in CPython has finally been fixed, so the debug builds 
should be back to normal again. I re-enabled the tests for the master branch.

Stefan

From vitja.makarov at gmail.com  Sun Oct 23 11:05:09 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Sun, 23 Oct 2011 13:05:09 +0400
Subject: [Cython] Compiler crash at parsing stage
In-Reply-To: <4EA3CD38.90709@behnel.de>
References: <CAKGHGPSK2Tqr0GX04aWCdqShcgaWh8B3A+6BPyt2eqzTQXuu6Q@mail.gmail.com>
	<4EA3CD38.90709@behnel.de>
Message-ID: <CAKGHGPSs42T7cxCDwgS-nxccaYYRgOpBNXZSdu1zQ88nxrhhtg@mail.gmail.com>

2011/10/23 Stefan Behnel <stefan_ml at behnel.de>:
> Vitja Makarov, 23.10.2011 08:39:
>>
>> This simple code crashes compiler:
>>
>> lambda i=1: i
>>
>> """
>> ? File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
>> line 122, in p_test
>> ? ? return p_lambdef(s)
>> ? File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
>> line 102, in p_lambdef
>> ? ? s, terminator=':', annotated=False)
>> ? File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
>> line 2741, in p_varargslist
>> ? ? annotated = annotated)
>> ? File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
>> line 2388, in p_c_arg_list
>> ? ? annotated = annotated))
>> ? File "/home/vitja/work/cython-vitek-git/Cython/Compiler/Parsing.py",
>> line 2435, in p_c_arg_decl
>> ? ? print s.level
>> AttributeError: 'PyrexScanner' object has no attribute 'level'
>> """
>>
>> I'm not sure what's the best way to fix this.
>
> I don't see a "print" statement anywhere, but it seems that the "level"
> attribute is really missing from the compiled scanner.
>

Yes, I've added print for debug purpose actually there is:
        if 'pxd' in s.level:

> This should do the trick:
>
> diff -r 886697a10602 Cython/Compiler/Scanning.pxd
> --- a/Cython/Compiler/Scanning.pxd ? ? ?Sat Oct 22 19:43:45 2011 +0100
> +++ b/Cython/Compiler/Scanning.pxd ? ? ?Sun Oct 23 10:11:10 2011 +0200
> @@ -28,6 +28,7 @@
> ? ? cdef public int bracket_nesting_level
> ? ? cdef public sy
> ? ? cdef public systring
> + ? ?cdef public level
>
> ? ? cdef long current_level(self)
> ? ? #cpdef commentline(self, text)
>
> I didn't commit it, just go ahead and do so if it works for you.
>

Hmm, that will help for compiled cython. I'm running uncompiled.
It seems that when lambda is spotted level is not set yet. Btw it
works fine if def node precedes lamda:

def foo(): pass
lambda i=1: i


-- 
vitja.

From wesmckinn at gmail.com  Mon Oct 24 21:26:03 2011
From: wesmckinn at gmail.com (Wes McKinney)
Date: Mon, 24 Oct 2011 15:26:03 -0400
Subject: [Cython] Buffer interface to boolean arrays with cast=True on
	Python 2.5 failing
Message-ID: <CAJPUwMDJ=0UFO8o3o-70+RtHMh1m=fdmU-Kiq5xsxRY9Q0u7_g@mail.gmail.com>

I've been using

ndarray[uint8_t, cast=True] bool_arr

to work with dtype=bool arrays in Cython lately. When testing using
Python 2.5 / NumPy 1.6.1 on Windows, I'm getting "unknown dtype code
in numpy.pxd (0)". Everything works fine with Python 2.6/2.7 and NumPy
1.6.1. This is with Cython 0.15.1.

Any advice or do I have to (very unhappily) work around this?

thanks,
Wes

From d.s.seljebotn at astro.uio.no  Mon Oct 24 21:37:22 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Mon, 24 Oct 2011 21:37:22 +0200
Subject: [Cython] Buffer interface to boolean arrays with cast=True on
 Python 2.5 failing
In-Reply-To: <CAJPUwMDJ=0UFO8o3o-70+RtHMh1m=fdmU-Kiq5xsxRY9Q0u7_g@mail.gmail.com>
References: <CAJPUwMDJ=0UFO8o3o-70+RtHMh1m=fdmU-Kiq5xsxRY9Q0u7_g@mail.gmail.com>
Message-ID: <4EA5BE72.1090104@astro.uio.no>

On 10/24/2011 09:26 PM, Wes McKinney wrote:
> I've been using
>
> ndarray[uint8_t, cast=True] bool_arr
>
> to work with dtype=bool arrays in Cython lately. When testing using
> Python 2.5 / NumPy 1.6.1 on Windows, I'm getting "unknown dtype code
> in numpy.pxd (0)". Everything works fine with Python 2.6/2.7 and NumPy
> 1.6.1. This is with Cython 0.15.1.
>
> Any advice or do I have to (very unhappily) work around this?

Is this a recent bug in Cython? Try to bisect the the Cython release 
(and if it turns out to be Cython, possible commit).

Dag Sverre

From wesmckinn at gmail.com  Mon Oct 24 21:40:55 2011
From: wesmckinn at gmail.com (Wes McKinney)
Date: Mon, 24 Oct 2011 15:40:55 -0400
Subject: [Cython] Buffer interface to boolean arrays with cast=True on
 Python 2.5 failing
In-Reply-To: <4EA5BE72.1090104@astro.uio.no>
References: <CAJPUwMDJ=0UFO8o3o-70+RtHMh1m=fdmU-Kiq5xsxRY9Q0u7_g@mail.gmail.com>
	<4EA5BE72.1090104@astro.uio.no>
Message-ID: <CAJPUwMBGa=i5aVvSDxudTGPfpr=u6reZ9=B1UbSSuPzax_wLwQ@mail.gmail.com>

On Mon, Oct 24, 2011 at 3:37 PM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/24/2011 09:26 PM, Wes McKinney wrote:
>>
>> I've been using
>>
>> ndarray[uint8_t, cast=True] bool_arr
>>
>> to work with dtype=bool arrays in Cython lately. When testing using
>> Python 2.5 / NumPy 1.6.1 on Windows, I'm getting "unknown dtype code
>> in numpy.pxd (0)". Everything works fine with Python 2.6/2.7 and NumPy
>> 1.6.1. This is with Cython 0.15.1.
>>
>> Any advice or do I have to (very unhappily) work around this?
>
> Is this a recent bug in Cython? Try to bisect the the Cython release (and if
> it turns out to be Cython, possible commit).
>
> Dag Sverre
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

I'll check the HEAD revision and bisect if I can, don't have a lot of
time-- it's just strange that it's Python 2.5 only.

From markflorisson88 at gmail.com  Mon Oct 24 21:50:05 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 24 Oct 2011 20:50:05 +0100
Subject: [Cython] Acquisition counted cdef classes
Message-ID: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>

Hey,

This is in response to
http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f
and http://trac.cython.org/cython_trac/ticket/498 , and some of the
previous discussion on cython.parallel.

Basically I think we should have something more powerful than 'cdef
borrowed CdefClass obj', something that also doesn't rely on new
syntax.

What if we support acquisition counting for every instance of a cdef
class? In Python and Cython GIL mode you use reference counting, and
in Cython nogil mode and for structs attributes, array dtypes etc you
use acquisition counting. This allows you to pass around cdef objects
without the GIL and use their nogil methods. If the acquisition count
is greater than 1, the acquisition count owns a reference to the
object. If it reaches 0 you discard your owned reference (you can
simply acquire the GIL if you don't have it) and when you increment
from zero you obtain it. Perhaps something like libatomic could be
used to efficiently implement this.

The advantages are:

1) allow users to pass around cdef typed objects in nogil mode
2) allow cdef typed objects in as struct attributes or array elements
3) make it easy to implement things like memoryviews (already done but
would have been a lot easier), cython.parallel.async/future objects,
cython.parallel.mutex objects and possibly other things in the future

We should then allow a syntax like

    with mycdefobject:
        ...

to lock the object in GIL or nogil mode (like java's 'synchronized').
For objects that already have __enter__ and __exit__ you could support
something like 'with cython.synchronized(mycdefobject): ...' instead.
Or perhaps you should always require cython.synchronized (or
cython.parallel.synchronized).

In addition to nogil methods a user may provide special cdef nogil methods, i.e.

cdef int __len__(self) nogil:
    ...

which would provide a Cython as well as a Python implementation for
the function (with automatic cpdef behaviour), so you could use it in
both contexts.

There are two options for assignment semantics to a struct attribute
or array element:
    - decref the old value (this implies always initializing the
pointers to NULL first)
    - don't decref the old value (the user has to manually use 'del')

I think 1) is more definitely consistent with how everything else works.

All of this functionality should also get a sane C API (to be provided
by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc.
Every class using this functionality is a subclass of CythonObject
(that contains a PyObject + an acquisition count + a lock). Perhaps if
the user is subclassing something other than object we could allow the
user to specify custom __cython_(un)lock__ and
__cython_acquisition_count__ methods and fields.

Now, building on top of this functionality, Cython could provide
built-in nogil-compatible types, like lists, dicts and maybe tuples
(as a start). These will by default not lock for operations to allow
e.g. one thread to iterate over the list and another thread to index
it without lock contention and other general overhead. If one thread
is somehow changing the size of the list, or writing to indices that
another thread is reading from/writing to, the results will of course
be undefined unless the user synchronizes on the object. So it would
be the user's responsibility. The acquisition counting itself will
always be thread-safe (i.e., it will be atomic if possible, otherwise
it will lock).

It's probably best to not enable this functionality by default as it
would be more expensive to instantiate objects, but it could be
supported through a cdef class decorator and a general directive.

Of course one may still use non-cdef borrowed objects, by simply
casting to a PyObject *.

Thoughts?

Mark

From wesmckinn at gmail.com  Mon Oct 24 21:51:21 2011
From: wesmckinn at gmail.com (Wes McKinney)
Date: Mon, 24 Oct 2011 14:51:21 -0500
Subject: [Cython] Buffer interface to boolean arrays with cast=True on
 Python 2.5 failing
In-Reply-To: <CAJPUwMBGa=i5aVvSDxudTGPfpr=u6reZ9=B1UbSSuPzax_wLwQ@mail.gmail.com>
References: <CAJPUwMDJ=0UFO8o3o-70+RtHMh1m=fdmU-Kiq5xsxRY9Q0u7_g@mail.gmail.com>
	<4EA5BE72.1090104@astro.uio.no>
	<CAJPUwMBGa=i5aVvSDxudTGPfpr=u6reZ9=B1UbSSuPzax_wLwQ@mail.gmail.com>
Message-ID: <CAJPUwMDrQqhHUkkxONe617GqKW1+2+BoMOiMon0W5NieN8Jp0w@mail.gmail.com>

On Mon, Oct 24, 2011 at 2:40 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
> On Mon, Oct 24, 2011 at 3:37 PM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>> On 10/24/2011 09:26 PM, Wes McKinney wrote:
>>>
>>> I've been using
>>>
>>> ndarray[uint8_t, cast=True] bool_arr
>>>
>>> to work with dtype=bool arrays in Cython lately. When testing using
>>> Python 2.5 / NumPy 1.6.1 on Windows, I'm getting "unknown dtype code
>>> in numpy.pxd (0)". Everything works fine with Python 2.6/2.7 and NumPy
>>> 1.6.1. This is with Cython 0.15.1.
>>>
>>> Any advice or do I have to (very unhappily) work around this?
>>
>> Is this a recent bug in Cython? Try to bisect the the Cython release (and if
>> it turns out to be Cython, possible commit).
>>
>> Dag Sverre
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>
> I'll check the HEAD revision and bisect if I can, don't have a lot of
> time-- it's just strange that it's Python 2.5 only.
>

For some reason I can't build Cython (0.15.1 or git HEAD) with mingw32:

C:\cython>python setup.py install
running install
running build
running build_py
running build_ext
building 'Cython.Compiler.Scanning' extension
C:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -IC:\Python25\include -IC:\Pytho
n25\PC -c Cython\Compiler\Scanning.c -o build\temp.win32-2.5\Release\cython\comp
iler\scanning.o
Cython\Compiler\Scanning.c:13340: error: initializer element is not constant
Cython\Compiler\Scanning.c:13340: error: (near initialization for `__pyx_CyFunct
ionType_type.tp_call')
error: command 'gcc' failed with exit status 1

C:\cython>

I've half a mind to drop Python 2.5 support in pandas over this...

From d.s.seljebotn at astro.uio.no  Mon Oct 24 22:09:47 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Mon, 24 Oct 2011 22:09:47 +0200
Subject: [Cython] Buffer interface to boolean arrays with cast=True on
 Python 2.5 failing
In-Reply-To: <CAJPUwMBGa=i5aVvSDxudTGPfpr=u6reZ9=B1UbSSuPzax_wLwQ@mail.gmail.com>
References: <CAJPUwMDJ=0UFO8o3o-70+RtHMh1m=fdmU-Kiq5xsxRY9Q0u7_g@mail.gmail.com>	<4EA5BE72.1090104@astro.uio.no>
	<CAJPUwMBGa=i5aVvSDxudTGPfpr=u6reZ9=B1UbSSuPzax_wLwQ@mail.gmail.com>
Message-ID: <4EA5C60B.2040704@astro.uio.no>

On 10/24/2011 09:40 PM, Wes McKinney wrote:
> On Mon, Oct 24, 2011 at 3:37 PM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no>  wrote:
>> On 10/24/2011 09:26 PM, Wes McKinney wrote:
>>>
>>> I've been using
>>>
>>> ndarray[uint8_t, cast=True] bool_arr
>>>
>>> to work with dtype=bool arrays in Cython lately. When testing using
>>> Python 2.5 / NumPy 1.6.1 on Windows, I'm getting "unknown dtype code
>>> in numpy.pxd (0)". Everything works fine with Python 2.6/2.7 and NumPy
>>> 1.6.1. This is with Cython 0.15.1.
>>>
>>> Any advice or do I have to (very unhappily) work around this?
>>
>> Is this a recent bug in Cython? Try to bisect the the Cython release (and if
>> it turns out to be Cython, possible commit).
>>
>> Dag Sverre
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>
> I'll check the HEAD revision and bisect if I can, don't have a lot of
> time-- it's just strange that it's Python 2.5 only.

So the difference between Python 2.5 and 2.6 is that in 2.5 the 
__getbuffer__ in numpy.pxd will be called, whereas in Python 2.6, NumPy 
is able to do the job itself. (PEP 3118)

Which means...that there's likely a bug in __getbuffer__ in numpy.pxd. 
You can debug

If you do have time, that's the place to start inserting print 
statements etc. to debug this.

It's difficult to say more without a copy&paste directly from your terminal.

Dag Sverre

From greg.ewing at canterbury.ac.nz  Mon Oct 24 23:03:32 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 25 Oct 2011 10:03:32 +1300
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
Message-ID: <4EA5D2A4.3010303@canterbury.ac.nz>

mark florisson wrote:
> These will by default not lock for operations to allow
> e.g. one thread to iterate over the list and another thread to index
> it without lock contention and other general overhead.

I don't think that's safe. You can't say "I'm not modifying
this, so I don't need to lock it" because there may be another
thread that *is* in the midst of modifying it.

-- 
Greg

From wesmckinn at gmail.com  Mon Oct 24 23:03:56 2011
From: wesmckinn at gmail.com (Wes McKinney)
Date: Mon, 24 Oct 2011 17:03:56 -0400
Subject: [Cython] Buffer interface to boolean arrays with cast=True on
 Python 2.5 failing
In-Reply-To: <4EA5C60B.2040704@astro.uio.no>
References: <CAJPUwMDJ=0UFO8o3o-70+RtHMh1m=fdmU-Kiq5xsxRY9Q0u7_g@mail.gmail.com>
	<4EA5BE72.1090104@astro.uio.no>
	<CAJPUwMBGa=i5aVvSDxudTGPfpr=u6reZ9=B1UbSSuPzax_wLwQ@mail.gmail.com>
	<4EA5C60B.2040704@astro.uio.no>
Message-ID: <CAJPUwMCirmhc8GnoYDPO4QwZNstkDJL6TFkkpg4tj0E7hVK-bA@mail.gmail.com>

On Mon, Oct 24, 2011 at 4:09 PM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/24/2011 09:40 PM, Wes McKinney wrote:
>>
>> On Mon, Oct 24, 2011 at 3:37 PM, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> ?wrote:
>>>
>>> On 10/24/2011 09:26 PM, Wes McKinney wrote:
>>>>
>>>> I've been using
>>>>
>>>> ndarray[uint8_t, cast=True] bool_arr
>>>>
>>>> to work with dtype=bool arrays in Cython lately. When testing using
>>>> Python 2.5 / NumPy 1.6.1 on Windows, I'm getting "unknown dtype code
>>>> in numpy.pxd (0)". Everything works fine with Python 2.6/2.7 and NumPy
>>>> 1.6.1. This is with Cython 0.15.1.
>>>>
>>>> Any advice or do I have to (very unhappily) work around this?
>>>
>>> Is this a recent bug in Cython? Try to bisect the the Cython release (and
>>> if
>>> it turns out to be Cython, possible commit).
>>>
>>> Dag Sverre
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>>
>> I'll check the HEAD revision and bisect if I can, don't have a lot of
>> time-- it's just strange that it's Python 2.5 only.
>
> So the difference between Python 2.5 and 2.6 is that in 2.5 the
> __getbuffer__ in numpy.pxd will be called, whereas in Python 2.6, NumPy is
> able to do the job itself. (PEP 3118)
>
> Which means...that there's likely a bug in __getbuffer__ in numpy.pxd. You
> can debug
>
> If you do have time, that's the place to start inserting print statements
> etc. to debug this.
>
> It's difficult to say more without a copy&paste directly from your terminal.
>
> Dag Sverre
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

I need pandas to build off of a released version of Cython so I am
just going to have to work around this by doing taking views of
boolean arrays as np.uint8. I wouldn't mind dropping Python 2.5
support altogether but some people might not like that.

From markflorisson88 at gmail.com  Mon Oct 24 23:51:22 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 24 Oct 2011 22:51:22 +0100
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA5D2A4.3010303@canterbury.ac.nz>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
	<4EA5D2A4.3010303@canterbury.ac.nz>
Message-ID: <CANg26EXc6v6qhLmrsbn4q6yE43oAc5fAJgR0qxL2AZUDwLYo6g@mail.gmail.com>

On 24 October 2011 22:03, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> mark florisson wrote:
>>
>> These will by default not lock for operations to allow
>> e.g. one thread to iterate over the list and another thread to index
>> it without lock contention and other general overhead.
>
> I don't think that's safe. You can't say "I'm not modifying
> this, so I don't need to lock it" because there may be another
> thread that *is* in the midst of modifying it.

Oh yes you're definitely right, that was silly of me. I suppose every
operation needs to lock. This can still be useful though, to allow
more fine-grained parallelism. Then it would be more efficient to use
arrays or memoryviews with acquisition counted objects, and the
dicts/lists/tuples etc for cases where you just need more fine-grained
locking and can deal with that overhead.

> --
> Greg
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Mon Oct 24 23:52:35 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 24 Oct 2011 22:52:35 +0100
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA5D2A4.3010303@canterbury.ac.nz>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
	<4EA5D2A4.3010303@canterbury.ac.nz>
Message-ID: <CANg26EXcaN91J+beqDLSTb9v54FC+_Orm8qmSQFtGQCC2zo94A@mail.gmail.com>

On 24 October 2011 22:03, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> mark florisson wrote:
>>
>> These will by default not lock for operations to allow
>> e.g. one thread to iterate over the list and another thread to index
>> it without lock contention and other general overhead.
>
> I don't think that's safe. You can't say "I'm not modifying
> this, so I don't need to lock it" because there may be another
> thread that *is* in the midst of modifying it.

I was really thinking of the case where you instantiate it in Cython
and then do some parallel work, in which case you're the only user.
But you can't assume that in general.

> --
> Greg
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From robertwb at math.washington.edu  Tue Oct 25 06:47:05 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Mon, 24 Oct 2011 21:47:05 -0700
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <CANg26EXcaN91J+beqDLSTb9v54FC+_Orm8qmSQFtGQCC2zo94A@mail.gmail.com>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
	<4EA5D2A4.3010303@canterbury.ac.nz>
	<CANg26EXcaN91J+beqDLSTb9v54FC+_Orm8qmSQFtGQCC2zo94A@mail.gmail.com>
Message-ID: <CADiQ+QAmKQNP7OjY597zVPXWsYeeS-VsvpC4sqiQSYQtyne7Mw@mail.gmail.com>

On Mon, Oct 24, 2011 at 2:52 PM, mark florisson
<markflorisson88 at gmail.com> wrote:
> On 24 October 2011 22:03, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>> mark florisson wrote:
>>>
>>> These will by default not lock for operations to allow
>>> e.g. one thread to iterate over the list and another thread to index
>>> it without lock contention and other general overhead.
>>
>> I don't think that's safe. You can't say "I'm not modifying
>> this, so I don't need to lock it" because there may be another
>> thread that *is* in the midst of modifying it.
>
> I was really thinking of the case where you instantiate it in Cython
> and then do some parallel work, in which case you're the only user.
> But you can't assume that in general.

It could be useful to assert for a chunk of code that a given object
is read-only and will not be mutated for the duration of the context
(programmer error and strange crash/data corruption if it is). E.g.

with nogil, assert_frozen(my_dict):
    a = (<CdefClass>my_dict[key]).c_attribute
    [...]

All references obtained could be borrowed. Perhaps we could even
enforce this for cdef classes (but perhaps not consistently enough,
and perhaps that would make things even more confusing). Just a
thought.

- Robert

From stefan_ml at behnel.de  Tue Oct 25 09:33:28 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 25 Oct 2011 09:33:28 +0200
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
Message-ID: <4EA66648.8030102@behnel.de>

mark florisson, 24.10.2011 21:50:
> This is in response to
> http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f
> and http://trac.cython.org/cython_trac/ticket/498 , and some of the
> previous discussion on cython.parallel.
>
> Basically I think we should have something more powerful than 'cdef
> borrowed CdefClass obj', something that also doesn't rely on new
> syntax.

We will still need borrowed reference support in the compiler eventually, 
whether we make it a language feature or not.


> What if we support acquisition counting for every instance of a cdef
> class? In Python and Cython GIL mode you use reference counting, and
> in Cython nogil mode and for structs attributes, array dtypes etc you
> use acquisition counting. This allows you to pass around cdef objects
> without the GIL and use their nogil methods. If the acquisition count
> is greater than 1, the acquisition count owns a reference to the
> object. If it reaches 0 you discard your owned reference (you can
> simply acquire the GIL if you don't have it) and when you increment
> from zero you obtain it. Perhaps something like libatomic could be
> used to efficiently implement this.

Where would you store that count? In the object struct? That would increase 
the size of each instance.


> The advantages are:
>
> 1) allow users to pass around cdef typed objects in nogil mode
> 2) allow cdef typed objects in as struct attributes or array elements
> 3) make it easy to implement things like memoryviews (already done but
> would have been a lot easier), cython.parallel.async/future objects,
> cython.parallel.mutex objects and possibly other things in the future

Would it really be easier? You can already call cdef methods in nogil mode, 
AFAIR.


> We should then allow a syntax like
>
>      with mycdefobject:
>          ...
>
> to lock the object in GIL or nogil mode (like java's 'synchronized').
> For objects that already have __enter__ and __exit__ you could support
> something like 'with cython.synchronized(mycdefobject): ...' instead.
> Or perhaps you should always require cython.synchronized (or
> cython.parallel.synchronized).

The latter, I sure hope.


> In addition to nogil methods a user may provide special cdef nogil methods, i.e.
>
> cdef int __len__(self) nogil:
>      ...
>
> which would provide a Cython as well as a Python implementation for
> the function (with automatic cpdef behaviour), so you could use it in
> both contexts.

That can already be done for final types, simply by adding cpdef behaviour 
to all special methods. That would also fix ticket #3, for example.

Note that the DefNode refactoring is still pending, it would help here.


> There are two options for assignment semantics to a struct attribute
> or array element:
>      - decref the old value (this implies always initializing the
> pointers to NULL first)
>      - don't decref the old value (the user has to manually use 'del')
>
> I think 1) is more definitely consistent with how everything else works.

Yes.


> All of this functionality should also get a sane C API (to be provided
> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc.
> Every class using this functionality is a subclass of CythonObject
> (that contains a PyObject + an acquisition count + a lock). Perhaps if
> the user is subclassing something other than object we could allow the
> user to specify custom __cython_(un)lock__ and
> __cython_acquisition_count__ methods and fields.
>
> Now, building on top of this functionality, Cython could provide
> built-in nogil-compatible types, like lists, dicts and maybe tuples
> (as a start). These will by default not lock for operations to allow
> e.g. one thread to iterate over the list and another thread to index
> it without lock contention and other general overhead. If one thread
> is somehow changing the size of the list, or writing to indices that
> another thread is reading from/writing to, the results will of course
> be undefined unless the user synchronizes on the object. So it would
> be the user's responsibility. The acquisition counting itself will
> always be thread-safe (i.e., it will be atomic if possible, otherwise
> it will lock).
>
> It's probably best to not enable this functionality by default as it
> would be more expensive to instantiate objects, but it could be
> supported through a cdef class decorator and a general directive.

It's well known that this would be expensive. One of the approaches that 
tried to get rid of the GIL in CPython introduced fine grained locking, and 
it turned out to be substantially slower, AFAIR by a factor of two.

You could potentially drop the locking for local variables, but you'd loose 
that ability as soon as the 'object' is passed into a function.

Basically, what you are trying to do here is to duplicate the complete 
ref-counting infrastructure of CPython, but without using CPython.


> Of course one may still use non-cdef borrowed objects, by simply
> casting to a PyObject *.

That's very ugly, though, because you loose all access to methods and 
attributes of the object. Basically, it becomes useless that way, except 
for storing away a pointer to it somewhere. You could just as well use a void*.

Stefan

From markflorisson88 at gmail.com  Tue Oct 25 11:11:24 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 25 Oct 2011 10:11:24 +0100
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <CADiQ+QAmKQNP7OjY597zVPXWsYeeS-VsvpC4sqiQSYQtyne7Mw@mail.gmail.com>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
	<4EA5D2A4.3010303@canterbury.ac.nz>
	<CANg26EXcaN91J+beqDLSTb9v54FC+_Orm8qmSQFtGQCC2zo94A@mail.gmail.com>
	<CADiQ+QAmKQNP7OjY597zVPXWsYeeS-VsvpC4sqiQSYQtyne7Mw@mail.gmail.com>
Message-ID: <CANg26EV5afm_kG4qXaw=+vVN_-WkJUQePRO56gWh5HA=u6FT9A@mail.gmail.com>

On 25 October 2011 05:47, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> On Mon, Oct 24, 2011 at 2:52 PM, mark florisson
> <markflorisson88 at gmail.com> wrote:
>> On 24 October 2011 22:03, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>>> mark florisson wrote:
>>>>
>>>> These will by default not lock for operations to allow
>>>> e.g. one thread to iterate over the list and another thread to index
>>>> it without lock contention and other general overhead.
>>>
>>> I don't think that's safe. You can't say "I'm not modifying
>>> this, so I don't need to lock it" because there may be another
>>> thread that *is* in the midst of modifying it.
>>
>> I was really thinking of the case where you instantiate it in Cython
>> and then do some parallel work, in which case you're the only user.
>> But you can't assume that in general.
>
> It could be useful to assert for a chunk of code that a given object
> is read-only and will not be mutated for the duration of the context
> (programmer error and strange crash/data corruption if it is). E.g.
>
> with nogil, assert_frozen(my_dict):
> ? ?a = (<CdefClass>my_dict[key]).c_attribute
> ? ?[...]
>
> All references obtained could be borrowed. Perhaps we could even
> enforce this for cdef classes (but perhaps not consistently enough,
> and perhaps that would make things even more confusing). Just a
> thought.

Hmm, I actually think that passing around references in general
(without having to declare them as borrowed in parameters) would be a
good feature. If my_dict would be e.g. a cython.types.dict, then it
would only accept CythonObjects, so it could just do the acquisition
counting.

For cython.parallel we could provide types more suited for the
cython.parallel kind of fine-grained parallelism, e.g. lock for
writes, don't lock for reads, which allows either to happen
simultaneously, but not any mixing of those two. Through explicit or
implicit barriers one may be sure that operations are correct.

> - Robert
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Tue Oct 25 11:11:47 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 25 Oct 2011 10:11:47 +0100
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA66648.8030102@behnel.de>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
	<4EA66648.8030102@behnel.de>
Message-ID: <CANg26EViHdg6zfRzeBpH7VNvvqqfR+Cy7fu_-HJZBEv8+FdCHA@mail.gmail.com>

On 25 October 2011 08:33, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 24.10.2011 21:50:
>>
>> This is in response to
>>
>> http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f
>> and http://trac.cython.org/cython_trac/ticket/498 , and some of the
>> previous discussion on cython.parallel.
>>
>> Basically I think we should have something more powerful than 'cdef
>> borrowed CdefClass obj', something that also doesn't rely on new
>> syntax.
>
> We will still need borrowed reference support in the compiler eventually,
> whether we make it a language feature or not.
>

I'm not sure I understand why,?acquisition counting could solve these
problems for cdef classes, and general objects may not be used without
the GIL. Do you want this as an optimization?

>> What if we support acquisition counting for every instance of a cdef
>> class? In Python and Cython GIL mode you use reference counting, and
>> in Cython nogil mode and for structs attributes, array dtypes etc you
>> use acquisition counting. This allows you to pass around cdef objects
>> without the GIL and use their nogil methods. If the acquisition count
>> is greater than 1, the acquisition count owns a reference to the
>> object. If it reaches 0 you discard your owned reference (you can
>> simply acquire the GIL if you don't have it) and when you increment
>> from zero you obtain it. Perhaps something like libatomic could be
>> used to efficiently implement this.
>
> Where would you store that count? In the object struct? That would increase
> the size of each instance.

Yes, not just the count, also the lock. This feature would be optional
and may be very useful for people (I think).

>
>> The advantages are:
>>
>> 1) allow users to pass around cdef typed objects in nogil mode
>> 2) allow cdef typed objects in as struct attributes or array elements
>> 3) make it easy to implement things like memoryviews (already done but
>> would have been a lot easier), cython.parallel.async/future objects,
>> cython.parallel.mutex objects and possibly other things in the future
>
> Would it really be easier? You can already call cdef methods in nogil mode,
> AFAIR.
>

Sure, but you cannot store cdef objects as struct attributes, array
elements (you could implement it with reference counting, but not for
nogil mode), and you cannot pass them around without the GIL. This
proposal is about making your life easier without the GIL, and
currently it's kind of a pain.

>> We should then allow a syntax like
>>
>> ? ? with mycdefobject:
>> ? ? ? ? ...
>>
>> to lock the object in GIL or nogil mode (like java's 'synchronized').
>> For objects that already have __enter__ and __exit__ you could support
>> something like 'with cython.synchronized(mycdefobject): ...' instead.
>> Or perhaps you should always require cython.synchronized (or
>> cython.parallel.synchronized).
>
> The latter, I sure hope.
>
>
>> In addition to nogil methods a user may provide special cdef nogil
>> methods, i.e.
>>
>> cdef int __len__(self) nogil:
>> ? ? ...
>>
>> which would provide a Cython as well as a Python implementation for
>> the function (with automatic cpdef behaviour), so you could use it in
>> both contexts.
>
> That can already be done for final types, simply by adding cpdef behaviour
> to all special methods. That would also fix ticket #3, for example.
>
> Note that the DefNode refactoring is still pending, it would help here.
>

Ah I assumed cpdef nogil was invalid, I see it isn't, cool. This
breaks terribly for special methods though.

>> There are two options for assignment semantics to a struct attribute
>> or array element:
>> ? ? - decref the old value (this implies always initializing the
>> pointers to NULL first)
>> ? ? - don't decref the old value (the user has to manually use 'del')
>>
>> I think 1) is more definitely consistent with how everything else works.
>
> Yes.
>
>
>> All of this functionality should also get a sane C API (to be provided
>> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc.
>> Every class using this functionality is a subclass of CythonObject
>> (that contains a PyObject + an acquisition count + a lock). Perhaps if
>> the user is subclassing something other than object we could allow the
>> user to specify custom __cython_(un)lock__ and
>> __cython_acquisition_count__ methods and fields.
>>
>> Now, building on top of this functionality, Cython could provide
>> built-in nogil-compatible types, like lists, dicts and maybe tuples
>> (as a start). These will by default not lock for operations to allow
>> e.g. one thread to iterate over the list and another thread to index
>> it without lock contention and other general overhead. If one thread
>> is somehow changing the size of the list, or writing to indices that
>> another thread is reading from/writing to, the results will of course
>> be undefined unless the user synchronizes on the object. So it would
>> be the user's responsibility. The acquisition counting itself will
>> always be thread-safe (i.e., it will be atomic if possible, otherwise
>> it will lock).
>>
>> It's probably best to not enable this functionality by default as it
>> would be more expensive to instantiate objects, but it could be
>> supported through a cdef class decorator and a general directive.
>
> It's well known that this would be expensive. One of the approaches that
> tried to get rid of the GIL in CPython introduced fine grained locking, and
> it turned out to be substantially slower, AFAIR by a factor of two.

Sure, I am aware of that. Often you can just keep the GIL, in which
case you wouldn't use these types. But when you want to leave the
shiny world of the GIL you still want these goodies. Acquiring the GIL
is too expensive as there is pretty much always contention.

> You could potentially drop the locking for local variables, but you'd loose
> that ability as soon as the 'object' is passed into a function.

Definitely, but you cannot use them with the GIL anyway :)

> Basically, what you are trying to do here is to duplicate the complete
> ref-counting infrastructure of CPython, but without using CPython.
>
>
>> Of course one may still use non-cdef borrowed objects, by simply
>> casting to a PyObject *.
>
> That's very ugly, though, because you loose all access to methods and
> attributes of the object. Basically, it becomes useless that way, except for
> storing away a pointer to it somewhere. You could just as well use a void*.

Indeed, and that's really all you can do without the GIL. I think
we're talking about different things, I'm talking about supporting
nogil, and you're talking about borrowed references in general. I'm
not sure why you'd not just take a reference instead in GIL mode,
unless you were worried about incrementing a counter.

> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From stefan_ml at behnel.de  Tue Oct 25 13:22:08 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 25 Oct 2011 13:22:08 +0200
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <CANg26EViHdg6zfRzeBpH7VNvvqqfR+Cy7fu_-HJZBEv8+FdCHA@mail.gmail.com>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>	<4EA66648.8030102@behnel.de>
	<CANg26EViHdg6zfRzeBpH7VNvvqqfR+Cy7fu_-HJZBEv8+FdCHA@mail.gmail.com>
Message-ID: <4EA69BE0.3060307@behnel.de>

mark florisson, 25.10.2011 11:11:
> On 25 October 2011 08:33, Stefan Behnel wrote:
>> mark florisson, 24.10.2011 21:50:
>>>
>>> This is in response to
>>>
>>> http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f
>>> and http://trac.cython.org/cython_trac/ticket/498 , and some of the
>>> previous discussion on cython.parallel.
>>>
>>> Basically I think we should have something more powerful than 'cdef
>>> borrowed CdefClass obj', something that also doesn't rely on new
>>> syntax.
>>
>> We will still need borrowed reference support in the compiler eventually,
>> whether we make it a language feature or not.
>
> I'm not sure I understand why, acquisition counting could solve these
> problems for cdef classes, and general objects may not be used without
> the GIL. Do you want this as an optimization?

Yes. Think of type(x), for example, or PyDict_GetItem(). They return 
borrowed references, and in many cases, Cython wouldn't have to INCREF and 
DECREF them when they are only being used as part of some specific kinds of 
expressions. The same applies to some utility functions in Cython that 
currently must INCREF their return value unconditionally, simply because 
they can't tell Cython that they could also return a borrowed reference 
instead. If there was a way to do that, we could optimise the reference 
counting away in a couple of more places, which would get us another bit 
closer to hand-tuned code.

However, note that this doesn't necessarily have an impact on nogil code. 
If you took a borrowed reference in one nogil thread, and a gil-holding 
thread deletes the object at the same time or during the lifetime of the 
borrowed reference (e.g. by updating a dict or assigning to a cdef 
attribute), the nogil thread would end up with a dead pointer in its hands. 
That's why the usage of borrowed references needs to be explicit in the 
code ("I know what I'm doing"), and the optimisations require the GIL to be 
held.


>>> What if we support acquisition counting for every instance of a cdef
>>> class? In Python and Cython GIL mode you use reference counting, and
>>> in Cython nogil mode and for structs attributes, array dtypes etc you
>>> use acquisition counting. This allows you to pass around cdef objects
>>> without the GIL and use their nogil methods. If the acquisition count
>>> is greater than 1, the acquisition count owns a reference to the
>>> object. If it reaches 0 you discard your owned reference (you can
>>> simply acquire the GIL if you don't have it) and when you increment
>>> from zero you obtain it. Perhaps something like libatomic could be
>>> used to efficiently implement this.
>>
>> Where would you store that count? In the object struct? That would increase
>> the size of each instance.
>
> Yes, not just the count, also the lock. This feature would be optional
> and may be very useful for people (I think).

Well, as long as it's an optional feature that requires a class decorator, 
the only obvious drawback is that it'll bloat the compiler even more than 
it is already.


>>> The advantages are:
>>>
>>> 1) allow users to pass around cdef typed objects in nogil mode
>>> 2) allow cdef typed objects in as struct attributes or array elements
>>> 3) make it easy to implement things like memoryviews (already done but
>>> would have been a lot easier), cython.parallel.async/future objects,
>>> cython.parallel.mutex objects and possibly other things in the future
>>
>> Would it really be easier? You can already call cdef methods in nogil mode,
>> AFAIR.
>
> Sure, but you cannot store cdef objects as struct attributes, array
> elements (you could implement it with reference counting, but not for
> nogil mode)

You could do that with borrowed references, though, assuming that you keep 
another reference around (or do your own ref-counting). However, I do see 
that keeping a real reference around may be hard to do in some cases.


> and you cannot pass them around without the GIL.

Yes, you can, as long as you only go through cdef functions. Obviously, you 
can't pass them into a Python function call, but you can (and could, if it 
was implemented) do loads of useful things with existing references even in 
nogil sections. The GIL checker is quite fine grained already but could do 
even better.


> This
> proposal is about making your life easier without the GIL, and
> currently it's kind of a pain.

The nogil sections I use are usually quite short, so I can't tell. It's 
certainly a pain to work without the GIL, because it means you have to take 
a lot more care when writing your code. But that won't change just by 
dropping reference counting. And nogil code will definitely become another 
bit harder to get right when using borrowed references.


> Ah I assumed cpdef nogil was invalid, I see it isn't, cool.

It makes perfect sense. Just because a function *can* be called without the 
GIL doesn't mean it can't be called from Python. So the Python wrapper 
requires the GIL, but the underlying cdef function doesn't.


> This breaks terribly for special methods though.

Why? It's just a matter of properly separating out their Python wrapper. 
That's why I was referring to the DefNode refactoring.


>>> All of this functionality should also get a sane C API (to be provided
>>> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc.
>>> Every class using this functionality is a subclass of CythonObject
>>> (that contains a PyObject + an acquisition count + a lock). Perhaps if
>>> the user is subclassing something other than object we could allow the
>>> user to specify custom __cython_(un)lock__ and
>>> __cython_acquisition_count__ methods and fields.
>>>
>>> Now, building on top of this functionality, Cython could provide
>>> built-in nogil-compatible types, like lists, dicts and maybe tuples
>>> (as a start). These will by default not lock for operations to allow
>>> e.g. one thread to iterate over the list and another thread to index
>>> it without lock contention and other general overhead. If one thread
>>> is somehow changing the size of the list, or writing to indices that
>>> another thread is reading from/writing to, the results will of course
>>> be undefined unless the user synchronizes on the object. So it would
>>> be the user's responsibility. The acquisition counting itself will
>>> always be thread-safe (i.e., it will be atomic if possible, otherwise
>>> it will lock).
>>>
>>> It's probably best to not enable this functionality by default as it
>>> would be more expensive to instantiate objects, but it could be
>>> supported through a cdef class decorator and a general directive.
>>
>> It's well known that this would be expensive. One of the approaches that
>> tried to get rid of the GIL in CPython introduced fine grained locking, and
>> it turned out to be substantially slower, AFAIR by a factor of two.
>
> Sure, I am aware of that. Often you can just keep the GIL, in which
> case you wouldn't use these types. But when you want to leave the
> shiny world of the GIL you still want these goodies. Acquiring the GIL
> is too expensive as there is pretty much always contention.

Acquiring a more fine grained lock is more likely to reduce the contention, 
but is not necessarily less expensive. The lock still needs to get acquired 
and released. GIL protected reference counting is a lot cheaper than that, 
as is manual locking in a more coarse grained fashion.


>> You could potentially drop the locking for local variables, but you'd loose
>> that ability as soon as the 'object' is passed into a function.
>
> Definitely, but you cannot use them with the GIL anyway :)

Yes you can. For cdef functions, it's the responsibility of the caller to 
own the references of object arguments it passes. The called function 
doesn't have to do reference counting for them, as long as it doesn't try 
to reassign the variable. And even that could be fixed with borrowed 
references, and also partly by better control flow analysis.


>> Basically, what you are trying to do here is to duplicate the complete
>> ref-counting infrastructure of CPython, but without using CPython.
>>
>>> Of course one may still use non-cdef borrowed objects, by simply
>>> casting to a PyObject *.
>>
>> That's very ugly, though, because you loose all access to methods and
>> attributes of the object. Basically, it becomes useless that way, except for
>> storing away a pointer to it somewhere. You could just as well use a void*.
>
> Indeed, and that's really all you can do without the GIL.

I think you're underestimating what can (or could) be done without holding 
the GIL. There are still some open features that wait for being 
implemented, even without adding new syntax (and thus further increasing 
the complexity of the language).


> I think
> we're talking about different things, I'm talking about supporting
> nogil, and you're talking about borrowed references in general.

Both are related, though. It's certainly a lot easier and cleaner to 
support borrowed references in the compiler, than to implement a whole new 
scheme for handling extension type instances in addition to the normal 
object handling which we need anyway.


> I'm
> not sure why you'd not just take a reference instead in GIL mode,
> unless you were worried about incrementing a counter.

Decrementing it, not incrementing. :)

The problem is not so much the INCREF (which is just an indirect add), it's 
the DECREF, which contains a conditional jump based on an unknown external 
value, that may trigger external code. That can kill several C compiler 
optimisations for the surrounding code. (And that would only get worse by 
using a dedicated locking mechanism.)

Stefan

From d.s.seljebotn at astro.uio.no  Tue Oct 25 15:28:57 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Tue, 25 Oct 2011 15:28:57 +0200
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA66648.8030102@behnel.de>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
	<4EA66648.8030102@behnel.de>
Message-ID: <4EA6B999.7060807@astro.uio.no>

On 10/25/2011 09:33 AM, Stefan Behnel wrote:
> mark florisson, 24.10.2011 21:50:
>> This is in response to
>> http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f
>>
>> and http://trac.cython.org/cython_trac/ticket/498 , and some of the
>> previous discussion on cython.parallel.
>>
>> Basically I think we should have something more powerful than 'cdef
>> borrowed CdefClass obj', something that also doesn't rely on new
>> syntax.
>
> We will still need borrowed reference support in the compiler
> eventually, whether we make it a language feature or not.
>
>
>> What if we support acquisition counting for every instance of a cdef
>> class? In Python and Cython GIL mode you use reference counting, and
>> in Cython nogil mode and for structs attributes, array dtypes etc you
>> use acquisition counting. This allows you to pass around cdef objects
>> without the GIL and use their nogil methods. If the acquisition count
>> is greater than 1, the acquisition count owns a reference to the
>> object. If it reaches 0 you discard your owned reference (you can
>> simply acquire the GIL if you don't have it) and when you increment
>> from zero you obtain it. Perhaps something like libatomic could be
>> used to efficiently implement this.
>
> Where would you store that count? In the object struct? That would
> increase the size of each instance.
>
>
>> The advantages are:
>>
>> 1) allow users to pass around cdef typed objects in nogil mode
>> 2) allow cdef typed objects in as struct attributes or array elements
>> 3) make it easy to implement things like memoryviews (already done but
>> would have been a lot easier), cython.parallel.async/future objects,
>> cython.parallel.mutex objects and possibly other things in the future
>
> Would it really be easier? You can already call cdef methods in nogil
> mode, AFAIR.
>
>
>> We should then allow a syntax like
>>
>> with mycdefobject:
>> ...
>>
>> to lock the object in GIL or nogil mode (like java's 'synchronized').
>> For objects that already have __enter__ and __exit__ you could support
>> something like 'with cython.synchronized(mycdefobject): ...' instead.
>> Or perhaps you should always require cython.synchronized (or
>> cython.parallel.synchronized).
>
> The latter, I sure hope.
>
>
>> In addition to nogil methods a user may provide special cdef nogil
>> methods, i.e.
>>
>> cdef int __len__(self) nogil:
>> ...
>>
>> which would provide a Cython as well as a Python implementation for
>> the function (with automatic cpdef behaviour), so you could use it in
>> both contexts.
>
> That can already be done for final types, simply by adding cpdef
> behaviour to all special methods. That would also fix ticket #3, for
> example.
>
> Note that the DefNode refactoring is still pending, it would help here.
>
>
>> There are two options for assignment semantics to a struct attribute
>> or array element:
>> - decref the old value (this implies always initializing the
>> pointers to NULL first)
>> - don't decref the old value (the user has to manually use 'del')
>>
>> I think 1) is more definitely consistent with how everything else works.
>
> Yes.
>
>
>> All of this functionality should also get a sane C API (to be provided
>> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc.
>> Every class using this functionality is a subclass of CythonObject
>> (that contains a PyObject + an acquisition count + a lock). Perhaps if
>> the user is subclassing something other than object we could allow the
>> user to specify custom __cython_(un)lock__ and
>> __cython_acquisition_count__ methods and fields.
>>
>> Now, building on top of this functionality, Cython could provide
>> built-in nogil-compatible types, like lists, dicts and maybe tuples
>> (as a start). These will by default not lock for operations to allow
>> e.g. one thread to iterate over the list and another thread to index
>> it without lock contention and other general overhead. If one thread
>> is somehow changing the size of the list, or writing to indices that
>> another thread is reading from/writing to, the results will of course
>> be undefined unless the user synchronizes on the object. So it would
>> be the user's responsibility. The acquisition counting itself will
>> always be thread-safe (i.e., it will be atomic if possible, otherwise
>> it will lock).
>>
>> It's probably best to not enable this functionality by default as it
>> would be more expensive to instantiate objects, but it could be
>> supported through a cdef class decorator and a general directive.
>
> It's well known that this would be expensive. One of the approaches that
> tried to get rid of the GIL in CPython introduced fine grained locking,
> and it turned out to be substantially slower, AFAIR by a factor of two.

I'd gladly take a factor two (or even four) slowdown of CPython code any 
day to get rid of the GIL :-). The thing is, sometimes one has 48 cores 
and consider a 10x speedup better than nothing...

Dag Sverre

From stefan_ml at behnel.de  Tue Oct 25 16:37:24 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 25 Oct 2011 16:37:24 +0200
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA6B999.7060807@astro.uio.no>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>	<4EA66648.8030102@behnel.de>
	<4EA6B999.7060807@astro.uio.no>
Message-ID: <4EA6C9A4.9030905@behnel.de>

Dag Sverre Seljebotn, 25.10.2011 15:28:
> On 10/25/2011 09:33 AM, Stefan Behnel wrote:
>> mark florisson, 24.10.2011 21:50:
>>> All of this functionality should also get a sane C API (to be provided
>>> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc.
>>> Every class using this functionality is a subclass of CythonObject
>>> (that contains a PyObject + an acquisition count + a lock). Perhaps if
>>> the user is subclassing something other than object we could allow the
>>> user to specify custom __cython_(un)lock__ and
>>> __cython_acquisition_count__ methods and fields.
>>>
>>> Now, building on top of this functionality, Cython could provide
>>> built-in nogil-compatible types, like lists, dicts and maybe tuples
>>> (as a start). These will by default not lock for operations to allow
>>> e.g. one thread to iterate over the list and another thread to index
>>> it without lock contention and other general overhead. If one thread
>>> is somehow changing the size of the list, or writing to indices that
>>> another thread is reading from/writing to, the results will of course
>>> be undefined unless the user synchronizes on the object. So it would
>>> be the user's responsibility. The acquisition counting itself will
>>> always be thread-safe (i.e., it will be atomic if possible, otherwise
>>> it will lock).
>>>
>>> It's probably best to not enable this functionality by default as it
>>> would be more expensive to instantiate objects, but it could be
>>> supported through a cdef class decorator and a general directive.
>>
>> It's well known that this would be expensive. One of the approaches that
>> tried to get rid of the GIL in CPython introduced fine grained locking,
>> and it turned out to be substantially slower, AFAIR by a factor of two.
>
> I'd gladly take a factor two (or even four) slowdown of CPython code any
> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores and
> consider a 10x speedup better than nothing...

Ah, sorry, that factor was for single-threaded code. How it would scale for 
multi-core code depends on too many factors to make any general statement.

Stefan

From markflorisson88 at gmail.com  Tue Oct 25 18:58:39 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 25 Oct 2011 17:58:39 +0100
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA69BE0.3060307@behnel.de>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
	<4EA66648.8030102@behnel.de>
	<CANg26EViHdg6zfRzeBpH7VNvvqqfR+Cy7fu_-HJZBEv8+FdCHA@mail.gmail.com>
	<4EA69BE0.3060307@behnel.de>
Message-ID: <CANg26EVSJKkr9Y5Pu-wyCjWjhLWa5fN3VHTATzDqTTmnhjDehQ@mail.gmail.com>

On 25 October 2011 12:22, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 25.10.2011 11:11:
>>
>> On 25 October 2011 08:33, Stefan Behnel wrote:
>>>
>>> mark florisson, 24.10.2011 21:50:
>>>>
>>>> This is in response to
>>>>
>>>>
>>>> http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f
>>>> and http://trac.cython.org/cython_trac/ticket/498 , and some of the
>>>> previous discussion on cython.parallel.
>>>>
>>>> Basically I think we should have something more powerful than 'cdef
>>>> borrowed CdefClass obj', something that also doesn't rely on new
>>>> syntax.
>>>
>>> We will still need borrowed reference support in the compiler eventually,
>>> whether we make it a language feature or not.
>>
>> I'm not sure I understand why, acquisition counting could solve these
>> problems for cdef classes, and general objects may not be used without
>> the GIL. Do you want this as an optimization?
>
> Yes. Think of type(x), for example, or PyDict_GetItem(). They return
> borrowed references, and in many cases, Cython wouldn't have to INCREF and
> DECREF them when they are only being used as part of some specific kinds of
> expressions. The same applies to some utility functions in Cython that
> currently must INCREF their return value unconditionally, simply because
> they can't tell Cython that they could also return a borrowed reference
> instead. If there was a way to do that, we could optimise the reference
> counting away in a couple of more places, which would get us another bit
> closer to hand-tuned code.
>
> However, note that this doesn't necessarily have an impact on nogil code. If
> you took a borrowed reference in one nogil thread, and a gil-holding thread
> deletes the object at the same time or during the lifetime of the borrowed
> reference (e.g. by updating a dict or assigning to a cdef attribute), the
> nogil thread would end up with a dead pointer in its hands. That's why the
> usage of borrowed references needs to be explicit in the code ("I know what
> I'm doing"), and the optimisations require the GIL to be held.
>

I see, ok. Thanks, that really helped me see the motivation behind it
(i.e., the INC/DECREF really is a performance issue for you).

>>>> What if we support acquisition counting for every instance of a cdef
>>>> class? In Python and Cython GIL mode you use reference counting, and
>>>> in Cython nogil mode and for structs attributes, array dtypes etc you
>>>> use acquisition counting. This allows you to pass around cdef objects
>>>> without the GIL and use their nogil methods. If the acquisition count
>>>> is greater than 1, the acquisition count owns a reference to the
>>>> object. If it reaches 0 you discard your owned reference (you can
>>>> simply acquire the GIL if you don't have it) and when you increment
>>>> from zero you obtain it. Perhaps something like libatomic could be
>>>> used to efficiently implement this.
>>>
>>> Where would you store that count? In the object struct? That would
>>> increase
>>> the size of each instance.
>>
>> Yes, not just the count, also the lock. This feature would be optional
>> and may be very useful for people (I think).
>
> Well, as long as it's an optional feature that requires a class decorator,
> the only obvious drawback is that it'll bloat the compiler even more than it
> is already.
>

Actually, I think it will help the implementation of mutexes and async
objects if we want those, and possibly other stuff in the future. The
acquisition counting is basically already there (for memoryviews), so
it's easy to track down where and when to apply this. However one
major problem would be circular acquisition counts, so you'd also have
to implement a garbage collector like CPython has (e.g. if you have a
cdef class with a cython.parallel.dict). We should just have a real
garbage collector instead of all the counting crap. Or we could make
it a burden for the user...

I agree that this is really not as feasible as I first thought. It
actually shows me a problem where I can have a memoryview object in a
memoryview with dtype 'object', although the problem here is that the
memoryview object doesn't traverse the object in the Py_buffer, or
when coerced from a memoryview slice to a memoryview object, the
memoryview slice struct object... I suppose I need to fix that (but
I'm not sure how, as you can't provide a manual traverse function in
Cython).

But I really believe that these are much-wanted features. If you're
using threads in Python you can only get concurrency not parallelism,
unless you release the GIL, even if there is some performance overhead
it will still be a lot better than sequential execution. Perhaps when
cython.parallel will be more mature, we may get functionality to
specify data distribution schemes and message passing, in which case
the GIL won't be a problem. But many things would be harder or much
more expensive, e.g. transposing, sending objects etc.

I think I'll just drop this discussion for now. I'm going to look at
how garbage collection works, how pypy works and their GIL, and figure
out what I want.

>>>> The advantages are:
>>>>
>>>> 1) allow users to pass around cdef typed objects in nogil mode
>>>> 2) allow cdef typed objects in as struct attributes or array elements
>>>> 3) make it easy to implement things like memoryviews (already done but
>>>> would have been a lot easier), cython.parallel.async/future objects,
>>>> cython.parallel.mutex objects and possibly other things in the future
>>>
>>> Would it really be easier? You can already call cdef methods in nogil
>>> mode,
>>> AFAIR.
>>
>> Sure, but you cannot store cdef objects as struct attributes, array
>> elements (you could implement it with reference counting, but not for
>> nogil mode)
>
> You could do that with borrowed references, though, assuming that you keep
> another reference around (or do your own ref-counting). However, I do see
> that keeping a real reference around may be hard to do in some cases.
>
>
>> and you cannot pass them around without the GIL.
>
> Yes, you can, as long as you only go through cdef functions. Obviously, you
> can't pass them into a Python function call, but you can (and could, if it
> was implemented) do loads of useful things with existing references even in
> nogil sections. The GIL checker is quite fine grained already but could do
> even better.
>

Ok, so cdef arguments are borrowed, which gets you somewhere but not
very far. It's rather baffling that f(x) is fine in nogil mode, but y
= x isn't.

>> This
>> proposal is about making your life easier without the GIL, and
>> currently it's kind of a pain.
>
> The nogil sections I use are usually quite short, so I can't tell. It's
> certainly a pain to work without the GIL, because it means you have to take
> a lot more care when writing your code. But that won't change just by
> dropping reference counting. And nogil code will definitely become another
> bit harder to get right when using borrowed references.
>
>
>> Ah I assumed cpdef nogil was invalid, I see it isn't, cool.
>
> It makes perfect sense. Just because a function *can* be called without the
> GIL doesn't mean it can't be called from Python. So the Python wrapper
> requires the GIL, but the underlying cdef function doesn't.
>
>
>> This breaks terribly for special methods though.
>
> Why? It's just a matter of properly separating out their Python wrapper.
> That's why I was referring to the DefNode refactoring.
>

I see, ok. All I meant was that it currently gives you compile errors.

>>>> All of this functionality should also get a sane C API (to be provided
>>>> by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc.
>>>> Every class using this functionality is a subclass of CythonObject
>>>> (that contains a PyObject + an acquisition count + a lock). Perhaps if
>>>> the user is subclassing something other than object we could allow the
>>>> user to specify custom __cython_(un)lock__ and
>>>> __cython_acquisition_count__ methods and fields.
>>>>
>>>> Now, building on top of this functionality, Cython could provide
>>>> built-in nogil-compatible types, like lists, dicts and maybe tuples
>>>> (as a start). These will by default not lock for operations to allow
>>>> e.g. one thread to iterate over the list and another thread to index
>>>> it without lock contention and other general overhead. If one thread
>>>> is somehow changing the size of the list, or writing to indices that
>>>> another thread is reading from/writing to, the results will of course
>>>> be undefined unless the user synchronizes on the object. So it would
>>>> be the user's responsibility. The acquisition counting itself will
>>>> always be thread-safe (i.e., it will be atomic if possible, otherwise
>>>> it will lock).
>>>>
>>>> It's probably best to not enable this functionality by default as it
>>>> would be more expensive to instantiate objects, but it could be
>>>> supported through a cdef class decorator and a general directive.
>>>
>>> It's well known that this would be expensive. One of the approaches that
>>> tried to get rid of the GIL in CPython introduced fine grained locking,
>>> and
>>> it turned out to be substantially slower, AFAIR by a factor of two.
>>
>> Sure, I am aware of that. Often you can just keep the GIL, in which
>> case you wouldn't use these types. But when you want to leave the
>> shiny world of the GIL you still want these goodies. Acquiring the GIL
>> is too expensive as there is pretty much always contention.
>
> Acquiring a more fine grained lock is more likely to reduce the contention,
> but is not necessarily less expensive. The lock still needs to get acquired
> and released. GIL protected reference counting is a lot cheaper than that,
> as is manual locking in a more coarse grained fashion.
>

Well, many processors support atomic incrementing and decrementing
counters + checking whether the counter has reached zero. So for most
architectures you wouldn't need to lock for the counting (unless you
reach a count of zero and you're going to decref your object). Any
operation would lock though, which would indeed be expensive.

>>> You could potentially drop the locking for local variables, but you'd
>>> loose
>>> that ability as soon as the 'object' is passed into a function.
>>
>> Definitely, but you cannot use them with the GIL anyway :)
>
> Yes you can. For cdef functions, it's the responsibility of the caller to
> own the references of object arguments it passes. The called function
> doesn't have to do reference counting for them, as long as it doesn't try to
> reassign the variable. And even that could be fixed with borrowed
> references, and also partly by better control flow analysis.
>

Sorry, with "use" I mean "actually do something", like call a method,
lookup an attribute, coerce it, etc.

>>> Basically, what you are trying to do here is to duplicate the complete
>>> ref-counting infrastructure of CPython, but without using CPython.
>>>
>>>> Of course one may still use non-cdef borrowed objects, by simply
>>>> casting to a PyObject *.
>>>
>>> That's very ugly, though, because you loose all access to methods and
>>> attributes of the object. Basically, it becomes useless that way, except
>>> for
>>> storing away a pointer to it somewhere. You could just as well use a
>>> void*.
>>
>> Indeed, and that's really all you can do without the GIL.
>
> I think you're underestimating what can (or could) be done without holding
> the GIL. There are still some open features that wait for being implemented,
> even without adding new syntax (and thus further increasing the complexity
> of the language).
>

Yeah borrowed references definitely somewhere. It's just that for
supporting the parallel types that wouldn't be good enough.

>> I think
>> we're talking about different things, I'm talking about supporting
>> nogil, and you're talking about borrowed references in general.
>
> Both are related, though. It's certainly a lot easier and cleaner to support
> borrowed references in the compiler, than to implement a whole new scheme
> for handling extension type instances in addition to the normal object
> handling which we need anyway.
>
>
>> I'm
>> not sure why you'd not just take a reference instead in GIL mode,
>> unless you were worried about incrementing a counter.
>
> Decrementing it, not incrementing. :)
>
> The problem is not so much the INCREF (which is just an indirect add), it's
> the DECREF, which contains a conditional jump based on an unknown external
> value, that may trigger external code. That can kill several C compiler
> optimisations for the surrounding code. (And that would only get worse by
> using a dedicated locking mechanism.)
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

Anyway, sorry for the long mail. I agree this is likely not feasible
to implement, although I would like the functionality to be there.
Perhaps I'm trying to solve problems which don't really need to be
solved. Maybe we should just use multiprocessing, or MPI and numpy
with global arrays and pickling. Maybe memoryviews could help out with
that as well.

From stefan_ml at behnel.de  Tue Oct 25 20:10:51 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 25 Oct 2011 20:10:51 +0200
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <CANg26EVSJKkr9Y5Pu-wyCjWjhLWa5fN3VHTATzDqTTmnhjDehQ@mail.gmail.com>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>	<4EA66648.8030102@behnel.de>	<CANg26EViHdg6zfRzeBpH7VNvvqqfR+Cy7fu_-HJZBEv8+FdCHA@mail.gmail.com>	<4EA69BE0.3060307@behnel.de>
	<CANg26EVSJKkr9Y5Pu-wyCjWjhLWa5fN3VHTATzDqTTmnhjDehQ@mail.gmail.com>
Message-ID: <4EA6FBAB.5070301@behnel.de>

mark florisson, 25.10.2011 18:58:
> On 25 October 2011 12:22, Stefan Behnel wrote:
>> mark florisson, 25.10.2011 11:11:
>>> On 25 October 2011 08:33, Stefan Behnel wrote:
>>>> mark florisson, 24.10.2011 21:50:
>>>>> What if we support acquisition counting for every instance of a cdef
>>>>> class? In Python and Cython GIL mode you use reference counting, and
>>>>> in Cython nogil mode and for structs attributes, array dtypes etc you
>>>>> use acquisition counting. This allows you to pass around cdef objects
>>>>> without the GIL and use their nogil methods. If the acquisition count
>>>>> is greater than 1, the acquisition count owns a reference to the
>>>>> object. If it reaches 0 you discard your owned reference (you can
>>>>> simply acquire the GIL if you don't have it) and when you increment
>>>>> from zero you obtain it. Perhaps something like libatomic could be
>>>>> used to efficiently implement this.
>>>>
>>>> Where would you store that count? In the object struct? That would
>>>> increase the size of each instance.
>>>
>>> Yes, not just the count, also the lock. This feature would be optional
>>> and may be very useful for people (I think).
>>
>> Well, as long as it's an optional feature that requires a class decorator,
>> the only obvious drawback is that it'll bloat the compiler even more than it
>> is already.
>
> Actually, I think it will help the implementation of mutexes and async
> objects if we want those, and possibly other stuff in the future.

If all you want is to support the regular with statement in nogil blocks, 
part of that is implemented already. I recently added support for 
implementing the context manager's __enter__() method as c(p)def method. 
However, __exit__() isn't there yet, as it's a bit more tricky - maybe 
taking off a C pointer to the cdef method and calling that, or calling the 
cdef method directly instead (not sure), but always making sure that there 
still is a reference to the context manager itself, and eventually freeing 
it. I'm sure it can be done, though, maybe with some restrictions in nogil 
mode. If we additionally fix it up to use the exception propagation and 
try-finally support that you wrote for the with-gil feature, we're 
basically there.


> The
> acquisition counting is basically already there (for memoryviews), so
> it's easy to track down where and when to apply this. However one
> major problem would be circular acquisition counts, so you'd also have
> to implement a garbage collector like CPython has (e.g. if you have a
> cdef class with a cython.parallel.dict). We should just have a real
> garbage collector instead of all the counting crap. Or we could make
> it a burden for the user...

Right, these things can grow endlessly. It took CPython something like a 
dozen years to a) recognise the need for and b) implement a garbage 
collector. Let's hope that Cython will never get one.


> I agree that this is really not as feasible as I first thought. It
> actually shows me a problem where I can have a memoryview object in a
> memoryview with dtype 'object', although the problem here is that the
> memoryview object doesn't traverse the object in the Py_buffer, or
> when coerced from a memoryview slice to a memoryview object, the
> memoryview slice struct object... I suppose I need to fix that (but
> I'm not sure how, as you can't provide a manual traverse function in
> Cython).

No, you may have to descend into C here. Or, you could disable a Python 
object dtype for the time being?


> But I really believe that these are much-wanted features. If you're
> using threads in Python you can only get concurrency not parallelism,
> unless you release the GIL, even if there is some performance overhead
> it will still be a lot better than sequential execution. Perhaps when
> cython.parallel will be more mature, we may get functionality to
> specify data distribution schemes and message passing, in which case
> the GIL won't be a problem. But many things would be harder or much
> more expensive, e.g. transposing, sending objects etc.

See? That's what I mean with language complexity. These things quickly turn 
into an open can of worms. I don't think the language should handle any of 
these. Message passing is up to libraries, for example. If you want 
language support, use Erlang.


>>>>> The advantages are:
>>>>>
>>>>> 1) allow users to pass around cdef typed objects in nogil mode
>>>>> 2) allow cdef typed objects in as struct attributes or array elements
>>>>> 3) make it easy to implement things like memoryviews (already done but
>>>>> would have been a lot easier), cython.parallel.async/future objects,
>>>>> cython.parallel.mutex objects and possibly other things in the future
>>>>
>>>> Would it really be easier? You can already call cdef methods in nogil
>>>> mode,
>>>> AFAIR.
>>>
>>> Sure, but you cannot store cdef objects as struct attributes, array
>>> elements (you could implement it with reference counting, but not for
>>> nogil mode)
>>
>> You could do that with borrowed references, though, assuming that you keep
>> another reference around (or do your own ref-counting). However, I do see
>> that keeping a real reference around may be hard to do in some cases.
>>
>>
>>> and you cannot pass them around without the GIL.
>>
>> Yes, you can, as long as you only go through cdef functions. Obviously, you
>> can't pass them into a Python function call, but you can (and could, if it
>> was implemented) do loads of useful things with existing references even in
>> nogil sections. The GIL checker is quite fine grained already but could do
>> even better.
>>
>
> Ok, so cdef arguments are borrowed, which gets you somewhere but not
> very far. It's rather baffling that f(x) is fine in nogil mode, but y
> = x isn't.

"y = x" could work if it's using borrowed references, though. The 
"borrowed" flag could be inferred automatically in nogil mode. Then it 
would only be an error if the user explicitly declared it as owned.


>>> This
>>> proposal is about making your life easier without the GIL, and
>>> currently it's kind of a pain.
>>
>> The nogil sections I use are usually quite short, so I can't tell. It's
>> certainly a pain to work without the GIL, because it means you have to take
>> a lot more care when writing your code. But that won't change just by
>> dropping reference counting. And nogil code will definitely become another
>> bit harder to get right when using borrowed references.
>>
>>
>>> Ah I assumed cpdef nogil was invalid, I see it isn't, cool.
>>
>> It makes perfect sense. Just because a function *can* be called without the
>> GIL doesn't mean it can't be called from Python. So the Python wrapper
>> requires the GIL, but the underlying cdef function doesn't.
>>
>>
>>> This breaks terribly for special methods though.
>>
>> Why? It's just a matter of properly separating out their Python wrapper.
>> That's why I was referring to the DefNode refactoring.
>
> I see, ok. All I meant was that it currently gives you compile errors.

I know. I've given ticket #3 enough (smaller) tries to know basically all 
problems by now.


> Anyway, sorry for the long mail. I agree this is likely not feasible
> to implement, although I would like the functionality to be there.
> Perhaps I'm trying to solve problems which don't really need to be
> solved. Maybe we should just use multiprocessing, or MPI and numpy
> with global arrays and pickling. Maybe memoryviews could help out with
> that as well.

In any case, I think we should let the existing features settle for a 
while, and see what users come up with. Not every feature that *can* be 
done is worth making a language feature.

Stefan

From markflorisson88 at gmail.com  Tue Oct 25 20:45:46 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 25 Oct 2011 19:45:46 +0100
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA6FBAB.5070301@behnel.de>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
	<4EA66648.8030102@behnel.de>
	<CANg26EViHdg6zfRzeBpH7VNvvqqfR+Cy7fu_-HJZBEv8+FdCHA@mail.gmail.com>
	<4EA69BE0.3060307@behnel.de>
	<CANg26EVSJKkr9Y5Pu-wyCjWjhLWa5fN3VHTATzDqTTmnhjDehQ@mail.gmail.com>
	<4EA6FBAB.5070301@behnel.de>
Message-ID: <CANg26EUp9iRQry2jPOAoruaF10_SFRxAtOC2K-XwdEHgO54_fw@mail.gmail.com>

On 25 October 2011 19:10, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 25.10.2011 18:58:
>>
>> On 25 October 2011 12:22, Stefan Behnel wrote:
>>>
>>> mark florisson, 25.10.2011 11:11:
>>>>
>>>> On 25 October 2011 08:33, Stefan Behnel wrote:
>>>>>
>>>>> mark florisson, 24.10.2011 21:50:
>>>>>>
>>>>>> What if we support acquisition counting for every instance of a cdef
>>>>>> class? In Python and Cython GIL mode you use reference counting, and
>>>>>> in Cython nogil mode and for structs attributes, array dtypes etc you
>>>>>> use acquisition counting. This allows you to pass around cdef objects
>>>>>> without the GIL and use their nogil methods. If the acquisition count
>>>>>> is greater than 1, the acquisition count owns a reference to the
>>>>>> object. If it reaches 0 you discard your owned reference (you can
>>>>>> simply acquire the GIL if you don't have it) and when you increment
>>>>>> from zero you obtain it. Perhaps something like libatomic could be
>>>>>> used to efficiently implement this.
>>>>>
>>>>> Where would you store that count? In the object struct? That would
>>>>> increase the size of each instance.
>>>>
>>>> Yes, not just the count, also the lock. This feature would be optional
>>>> and may be very useful for people (I think).
>>>
>>> Well, as long as it's an optional feature that requires a class
>>> decorator,
>>> the only obvious drawback is that it'll bloat the compiler even more than
>>> it
>>> is already.
>>
>> Actually, I think it will help the implementation of mutexes and async
>> objects if we want those, and possibly other stuff in the future.
>
> If all you want is to support the regular with statement in nogil blocks,
> part of that is implemented already. I recently added support for
> implementing the context manager's __enter__() method as c(p)def method.
> However, __exit__() isn't there yet, as it's a bit more tricky - maybe
> taking off a C pointer to the cdef method and calling that, or calling the
> cdef method directly instead (not sure), but always making sure that there
> still is a reference to the context manager itself, and eventually freeing
> it. I'm sure it can be done, though, maybe with some restrictions in nogil
> mode. If we additionally fix it up to use the exception propagation and
> try-finally support that you wrote for the with-gil feature, we're basically
> there.
>

Cool. I suppose if you combine that with borrowed references you may
just get somewhere implementing the mutexes. On the other hand it
won't really be more convenient than passing OpenMP or Python locks
around, just slightly more pythonic.

>> The
>> acquisition counting is basically already there (for memoryviews), so
>> it's easy to track down where and when to apply this. However one
>> major problem would be circular acquisition counts, so you'd also have
>> to implement a garbage collector like CPython has (e.g. if you have a
>> cdef class with a cython.parallel.dict). We should just have a real
>> garbage collector instead of all the counting crap. Or we could make
>> it a burden for the user...
>
> Right, these things can grow endlessly. It took CPython something like a
> dozen years to a) recognise the need for and b) implement a garbage
> collector. Let's hope that Cython will never get one.
>
>
>> I agree that this is really not as feasible as I first thought. It
>> actually shows me a problem where I can have a memoryview object in a
>> memoryview with dtype 'object', although the problem here is that the
>> memoryview object doesn't traverse the object in the Py_buffer, or
>> when coerced from a memoryview slice to a memoryview object, the
>> memoryview slice struct object... I suppose I need to fix that (but
>> I'm not sure how, as you can't provide a manual traverse function in
>> Cython).
>
> No, you may have to descend into C here. Or, you could disable a Python
> object dtype for the time being?
>

Yes disabling would be easy, but it should be fixed (at some point).
Perhaps I can just override the tp_traverse of the type object in the
module init function (and maybe save that pointer and call it from the
new function + traverse the Py_buffer).

I'm not entire sure how we support Py_buffer, but it is a built-in
thing and it doesn't result in a traverse:

cdef class X(object):
    cdef Py_buffer view

<- this won't have a traverse function. Fixing that won't get me there
though, I need to do the same thing for memoryview objects wrapping a
memoryview struct.

>> But I really believe that these are much-wanted features. If you're
>> using threads in Python you can only get concurrency not parallelism,
>> unless you release the GIL, even if there is some performance overhead
>> it will still be a lot better than sequential execution. Perhaps when
>> cython.parallel will be more mature, we may get functionality to
>> specify data distribution schemes and message passing, in which case
>> the GIL won't be a problem. But many things would be harder or much
>> more expensive, e.g. transposing, sending objects etc.
>
> See? That's what I mean with language complexity. These things quickly turn
> into an open can of worms. I don't think the language should handle any of
> these. Message passing is up to libraries, for example. If you want language
> support, use Erlang.
>

I haven't used Erlang (though I should give it a go), but I find that
built-in support for these things just ends up to be much more
elegant. MPI (and possibly zeromq) just look terrible and complicated
if you compare them to Unified Parallel C, High Performance Fortran or
Co-Array Fortran. I don't know about Go channels. This doesn't mean
that we should support it, but we might consider it.

>>>>>> The advantages are:
>>>>>>
>>>>>> 1) allow users to pass around cdef typed objects in nogil mode
>>>>>> 2) allow cdef typed objects in as struct attributes or array elements
>>>>>> 3) make it easy to implement things like memoryviews (already done but
>>>>>> would have been a lot easier), cython.parallel.async/future objects,
>>>>>> cython.parallel.mutex objects and possibly other things in the future
>>>>>
>>>>> Would it really be easier? You can already call cdef methods in nogil
>>>>> mode,
>>>>> AFAIR.
>>>>
>>>> Sure, but you cannot store cdef objects as struct attributes, array
>>>> elements (you could implement it with reference counting, but not for
>>>> nogil mode)
>>>
>>> You could do that with borrowed references, though, assuming that you
>>> keep
>>> another reference around (or do your own ref-counting). However, I do see
>>> that keeping a real reference around may be hard to do in some cases.
>>>
>>>
>>>> and you cannot pass them around without the GIL.
>>>
>>> Yes, you can, as long as you only go through cdef functions. Obviously,
>>> you
>>> can't pass them into a Python function call, but you can (and could, if
>>> it
>>> was implemented) do loads of useful things with existing references even
>>> in
>>> nogil sections. The GIL checker is quite fine grained already but could
>>> do
>>> even better.
>>>
>>
>> Ok, so cdef arguments are borrowed, which gets you somewhere but not
>> very far. It's rather baffling that f(x) is fine in nogil mode, but y
>> = x isn't.
>
> "y = x" could work if it's using borrowed references, though. The "borrowed"
> flag could be inferred automatically in nogil mode. Then it would only be an
> error if the user explicitly declared it as owned.
>

I think inferring that would be hard, unless x is already borrowed.
E.g. I could do 'with gil: x = None' and it might break. It would be
cool if it could detect that though.

>>>> This
>>>> proposal is about making your life easier without the GIL, and
>>>> currently it's kind of a pain.
>>>
>>> The nogil sections I use are usually quite short, so I can't tell. It's
>>> certainly a pain to work without the GIL, because it means you have to
>>> take
>>> a lot more care when writing your code. But that won't change just by
>>> dropping reference counting. And nogil code will definitely become
>>> another
>>> bit harder to get right when using borrowed references.
>>>
>>>
>>>> Ah I assumed cpdef nogil was invalid, I see it isn't, cool.
>>>
>>> It makes perfect sense. Just because a function *can* be called without
>>> the
>>> GIL doesn't mean it can't be called from Python. So the Python wrapper
>>> requires the GIL, but the underlying cdef function doesn't.
>>>
>>>
>>>> This breaks terribly for special methods though.
>>>
>>> Why? It's just a matter of properly separating out their Python wrapper.
>>> That's why I was referring to the DefNode refactoring.
>>
>> I see, ok. All I meant was that it currently gives you compile errors.
>
> I know. I've given ticket #3 enough (smaller) tries to know basically all
> problems by now.
>
>
>> Anyway, sorry for the long mail. I agree this is likely not feasible
>> to implement, although I would like the functionality to be there.
>> Perhaps I'm trying to solve problems which don't really need to be
>> solved. Maybe we should just use multiprocessing, or MPI and numpy
>> with global arrays and pickling. Maybe memoryviews could help out with
>> that as well.
>
> In any case, I think we should let the existing features settle for a while,
> and see what users come up with. Not every feature that *can* be done is
> worth making a language feature.

Definitely. It's just that you regularly find users who want to do
things in nogil mode that they just can't. Or to have arrays or some
such of objects etc. Lettings things settle is a good idea.

> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From d.s.seljebotn at astro.uio.no  Tue Oct 25 21:01:00 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Tue, 25 Oct 2011 21:01:00 +0200
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <CANg26EVSJKkr9Y5Pu-wyCjWjhLWa5fN3VHTATzDqTTmnhjDehQ@mail.gmail.com>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>	<4EA66648.8030102@behnel.de>	<CANg26EViHdg6zfRzeBpH7VNvvqqfR+Cy7fu_-HJZBEv8+FdCHA@mail.gmail.com>	<4EA69BE0.3060307@behnel.de>
	<CANg26EVSJKkr9Y5Pu-wyCjWjhLWa5fN3VHTATzDqTTmnhjDehQ@mail.gmail.com>
Message-ID: <4EA7076C.1090708@astro.uio.no>

On 10/25/2011 06:58 PM, mark florisson wrote:
> On 25 October 2011 12:22, Stefan Behnel<stefan_ml at behnel.de>  wrote:
>> The problem is not so much the INCREF (which is just an indirect add), it's
>> the DECREF, which contains a conditional jump based on an unknown external
>> value, that may trigger external code. That can kill several C compiler
>> optimisations for the surrounding code. (And that would only get worse by
>> using a dedicated locking mechanism.)

What you could do is a form of psuedo-garbage-collection where, when the 
Cython refcount/acquisition count reaches 0, you enqueue a Python DECREF 
until you're holding the GIL anyway. If sticking it into the queue is 
unlikely(), and it is transparent to the compiler that it doesn't 
dispatch into unknown code.

(And regarding Stefan's comment about Erlang: It's all about available 
libraries. A language for concurrent computing running on CPython and 
able to use all the libraries available for CPython would be awesome. It 
doesn't need to be named Cython -- show me an Erlang port to the CPython 
platform and I'd perhaps jump ship.)


> Anyway, sorry for the long mail. I agree this is likely not feasible
> to implement, although I would like the functionality to be there.
> Perhaps I'm trying to solve problems which don't really need to be
> solved. Maybe we should just use multiprocessing, or MPI and numpy
> with global arrays and pickling. Maybe memoryviews could help out with
> that as well.

Nice conclusion. I think prange was a very nice 80%-there-solution 
(which is also the way we framed it when starting), but the GIL just 
creates to many barriers. Real garbage collection is needed, and CPython 
just isn't there.

What I'd like to see personally is:

  - A convenient utility to allocate an array in shared memory, so that 
when you pickle a view of it and send it to another Python process with 
multiprocessing and it unpickles, it gets a slice into to the same 
shared memory. People already do this but it's just a lot of jumping 
through hoops. A good place would probably be in NumPy.

  - Decent message passing using ZeroMQ in Cython code without any 
Python overhead, for fine-grained communication in Cython code in Python 
processes spawned using multiprocessing. I think this requires some 
syntax candy in Cython to feel natural enough, but perhaps it can be put 
on a form so that it is not ZeroMQ-specific.

Dag Sverre

From d.s.seljebotn at astro.uio.no  Tue Oct 25 21:02:19 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Tue, 25 Oct 2011 21:02:19 +0200
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA7076C.1090708@astro.uio.no>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>	<4EA66648.8030102@behnel.de>	<CANg26EViHdg6zfRzeBpH7VNvvqqfR+Cy7fu_-HJZBEv8+FdCHA@mail.gmail.com>	<4EA69BE0.3060307@behnel.de>
	<CANg26EVSJKkr9Y5Pu-wyCjWjhLWa5fN3VHTATzDqTTmnhjDehQ@mail.gmail.com>
	<4EA7076C.1090708@astro.uio.no>
Message-ID: <4EA707BB.9060408@astro.uio.no>

On 10/25/2011 09:01 PM, Dag Sverre Seljebotn wrote:
> On 10/25/2011 06:58 PM, mark florisson wrote:
>> On 25 October 2011 12:22, Stefan Behnel<stefan_ml at behnel.de> wrote:
>>> The problem is not so much the INCREF (which is just an indirect
>>> add), it's
>>> the DECREF, which contains a conditional jump based on an unknown
>>> external
>>> value, that may trigger external code. That can kill several C compiler
>>> optimisations for the surrounding code. (And that would only get
>>> worse by
>>> using a dedicated locking mechanism.)
>
> What you could do is a form of psuedo-garbage-collection where, when the
> Cython refcount/acquisition count reaches 0, you enqueue a Python DECREF
> until you're holding the GIL anyway. If sticking it into the queue is
> unlikely(), and it is transparent to the compiler that it doesn't
> dispatch into unknown code.

...then the C compiler optimizations should presumably not be killed.

DS

>
> (And regarding Stefan's comment about Erlang: It's all about available
> libraries. A language for concurrent computing running on CPython and
> able to use all the libraries available for CPython would be awesome. It
> doesn't need to be named Cython -- show me an Erlang port to the CPython
> platform and I'd perhaps jump ship.)
>
>
>> Anyway, sorry for the long mail. I agree this is likely not feasible
>> to implement, although I would like the functionality to be there.
>> Perhaps I'm trying to solve problems which don't really need to be
>> solved. Maybe we should just use multiprocessing, or MPI and numpy
>> with global arrays and pickling. Maybe memoryviews could help out with
>> that as well.
>
> Nice conclusion. I think prange was a very nice 80%-there-solution
> (which is also the way we framed it when starting), but the GIL just
> creates to many barriers. Real garbage collection is needed, and CPython
> just isn't there.
>
> What I'd like to see personally is:
>
> - A convenient utility to allocate an array in shared memory, so that
> when you pickle a view of it and send it to another Python process with
> multiprocessing and it unpickles, it gets a slice into to the same
> shared memory. People already do this but it's just a lot of jumping
> through hoops. A good place would probably be in NumPy.
>
> - Decent message passing using ZeroMQ in Cython code without any Python
> overhead, for fine-grained communication in Cython code in Python
> processes spawned using multiprocessing. I think this requires some
> syntax candy in Cython to feel natural enough, but perhaps it can be put
> on a form so that it is not ZeroMQ-specific.
>
> Dag Sverre


From d.s.seljebotn at astro.uio.no  Tue Oct 25 21:15:26 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Tue, 25 Oct 2011 21:15:26 +0200
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <CANg26EUp9iRQry2jPOAoruaF10_SFRxAtOC2K-XwdEHgO54_fw@mail.gmail.com>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>	<4EA66648.8030102@behnel.de>	<CANg26EViHdg6zfRzeBpH7VNvvqqfR+Cy7fu_-HJZBEv8+FdCHA@mail.gmail.com>	<4EA69BE0.3060307@behnel.de>	<CANg26EVSJKkr9Y5Pu-wyCjWjhLWa5fN3VHTATzDqTTmnhjDehQ@mail.gmail.com>	<4EA6FBAB.5070301@behnel.de>
	<CANg26EUp9iRQry2jPOAoruaF10_SFRxAtOC2K-XwdEHgO54_fw@mail.gmail.com>
Message-ID: <4EA70ACE.3000401@astro.uio.no>

On 10/25/2011 08:45 PM, mark florisson wrote:
> On 25 October 2011 19:10, Stefan Behnel<stefan_ml at behnel.de>  wrote:
>> See? That's what I mean with language complexity. These things quickly turn
>> into an open can of worms. I don't think the language should handle any of
>> these. Message passing is up to libraries, for example. If you want language
>> support, use Erlang.
>>
>
> I haven't used Erlang (though I should give it a go), but I find that
> built-in support for these things just ends up to be much more
> elegant. MPI (and possibly zeromq) just look terrible and complicated
> if you compare them to Unified Parallel C, High Performance Fortran or

Using libraries for message passing is sort of like doing complex string 
manipulation only using malloc, free, and string.h :-)

> Co-Array Fortran. I don't know about Go channels. This doesn't mean
> that we should support it, but we might consider it.

I think you should definitely read up on Go channels, they're just like 
what I'd like to write in Cython.

Dag Sverre

From markflorisson88 at gmail.com  Tue Oct 25 21:24:13 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 25 Oct 2011 20:24:13 +0100
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA7076C.1090708@astro.uio.no>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
	<4EA66648.8030102@behnel.de>
	<CANg26EViHdg6zfRzeBpH7VNvvqqfR+Cy7fu_-HJZBEv8+FdCHA@mail.gmail.com>
	<4EA69BE0.3060307@behnel.de>
	<CANg26EVSJKkr9Y5Pu-wyCjWjhLWa5fN3VHTATzDqTTmnhjDehQ@mail.gmail.com>
	<4EA7076C.1090708@astro.uio.no>
Message-ID: <CANg26EWNcjTke1HKLFBM+1Vw8CWANvitzSkoBVvtTMon1BgPeA@mail.gmail.com>

On 25 October 2011 20:01, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/25/2011 06:58 PM, mark florisson wrote:
>>
>> On 25 October 2011 12:22, Stefan Behnel<stefan_ml at behnel.de> ?wrote:
>>>
>>> The problem is not so much the INCREF (which is just an indirect add),
>>> it's
>>> the DECREF, which contains a conditional jump based on an unknown
>>> external
>>> value, that may trigger external code. That can kill several C compiler
>>> optimisations for the surrounding code. (And that would only get worse by
>>> using a dedicated locking mechanism.)
>
> What you could do is a form of psuedo-garbage-collection where, when the
> Cython refcount/acquisition count reaches 0, you enqueue a Python DECREF
> until you're holding the GIL anyway. If sticking it into the queue is
> unlikely(), and it is transparent to the compiler that it doesn't dispatch
> into unknown code.

I thought about that as wel, but the problem is that you can only
defer the DECREF to a garbage collector if your acquisition count
reaches zero and your reference count is one. However, you may reach
an acquisition count of zero with a reference count > 1, which means
you could have the following race:

    1) acquisition count reaches zero, a DECREF is pending in the
garbage collector thread
    2) you obtain a nonzero acquisition count from the object (e.g. by
assigning a non-typed to a typed variable)
    3) you lose your acquisition count again, another DECREF should be pending
    4) the garbage collector figures out it needs to DECREF (it should
actually do this twice)

Now, you could keep a counter for how many times that happens, but
that will likely not be better than an immediate DECREF. In short,
reference counting is terrible. I think unlikely() will help the
compiler here as you said though, and your processor will have branch
prediction, out of order execution and conditional instructions which
may all help.

> (And regarding Stefan's comment about Erlang: It's all about available
> libraries. A language for concurrent computing running on CPython and able
> to use all the libraries available for CPython would be awesome. It doesn't
> need to be named Cython -- show me an Erlang port to the CPython platform
> and I'd perhaps jump ship.)
>
>
>> Anyway, sorry for the long mail. I agree this is likely not feasible
>> to implement, although I would like the functionality to be there.
>> Perhaps I'm trying to solve problems which don't really need to be
>> solved. Maybe we should just use multiprocessing, or MPI and numpy
>> with global arrays and pickling. Maybe memoryviews could help out with
>> that as well.
>
> Nice conclusion. I think prange was a very nice 80%-there-solution (which is
> also the way we framed it when starting), but the GIL just creates to many
> barriers. Real garbage collection is needed, and CPython just isn't there.
>
> What I'd like to see personally is:
>
> ?- A convenient utility to allocate an array in shared memory, so that when
> you pickle a view of it and send it to another Python process with
> multiprocessing and it unpickles, it gets a slice into to the same shared
> memory. People already do this but it's just a lot of jumping through hoops.
> A good place would probably be in NumPy.

I haven't used it myself, but can the global array support help in that regard?

> ?- Decent message passing using ZeroMQ in Cython code without any Python
> overhead, for fine-grained communication in Cython code in Python processes
> spawned using multiprocessing. I think this requires some syntax candy in
> Cython to feel natural enough, but perhaps it can be put on a form so that
> it is not ZeroMQ-specific.
>
> Dag Sverre
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Tue Oct 25 21:24:49 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 25 Oct 2011 20:24:49 +0100
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA70ACE.3000401@astro.uio.no>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
	<4EA66648.8030102@behnel.de>
	<CANg26EViHdg6zfRzeBpH7VNvvqqfR+Cy7fu_-HJZBEv8+FdCHA@mail.gmail.com>
	<4EA69BE0.3060307@behnel.de>
	<CANg26EVSJKkr9Y5Pu-wyCjWjhLWa5fN3VHTATzDqTTmnhjDehQ@mail.gmail.com>
	<4EA6FBAB.5070301@behnel.de>
	<CANg26EUp9iRQry2jPOAoruaF10_SFRxAtOC2K-XwdEHgO54_fw@mail.gmail.com>
	<4EA70ACE.3000401@astro.uio.no>
Message-ID: <CANg26EX1MxeJV8ogB2AM+oz28WWKqUjoaebx0QQdGyENpVE8QA@mail.gmail.com>

On 25 October 2011 20:15, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/25/2011 08:45 PM, mark florisson wrote:
>>
>> On 25 October 2011 19:10, Stefan Behnel<stefan_ml at behnel.de> ?wrote:
>>>
>>> See? That's what I mean with language complexity. These things quickly
>>> turn
>>> into an open can of worms. I don't think the language should handle any
>>> of
>>> these. Message passing is up to libraries, for example. If you want
>>> language
>>> support, use Erlang.
>>>
>>
>> I haven't used Erlang (though I should give it a go), but I find that
>> built-in support for these things just ends up to be much more
>> elegant. MPI (and possibly zeromq) just look terrible and complicated
>> if you compare them to Unified Parallel C, High Performance Fortran or
>
> Using libraries for message passing is sort of like doing complex string
> manipulation only using malloc, free, and string.h :-)
>
>> Co-Array Fortran. I don't know about Go channels. This doesn't mean
>> that we should support it, but we might consider it.
>
> I think you should definitely read up on Go channels, they're just like what
> I'd like to write in Cython.

That's a good motivator :) I'll do that.

> Dag Sverre
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From greg.ewing at canterbury.ac.nz  Wed Oct 26 00:27:19 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 26 Oct 2011 11:27:19 +1300
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA6B999.7060807@astro.uio.no>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
	<4EA66648.8030102@behnel.de> <4EA6B999.7060807@astro.uio.no>
Message-ID: <4EA737C7.5010500@canterbury.ac.nz>

Dag Sverre Seljebotn wrote:

> I'd gladly take a factor two (or even four) slowdown of CPython code any 
> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores 
> and consider a 10x speedup better than nothing...

Another thing to consider is that locking around refcount
changes may not be as expensive in typical Cython code as
it is in Python.

The trouble with Python is that you can't so much as scratch
your nose without touching a big pile of ref counts. But
if the Cython code is only dealing with a few Python objects
and doing most of its work at the C level, the relative
overhead of locking around refcount changes may not be
significant.

So it may be worth trying the strategy of just acquiring
the GIL whenever a refcount needs to be changed in a nogil
section, and damn the consequences.

-- 
Greg

From stefan_ml at behnel.de  Wed Oct 26 09:56:35 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 26 Oct 2011 09:56:35 +0200
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA737C7.5010500@canterbury.ac.nz>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>	<4EA66648.8030102@behnel.de>
	<4EA6B999.7060807@astro.uio.no> <4EA737C7.5010500@canterbury.ac.nz>
Message-ID: <4EA7BD33.60009@behnel.de>

Greg Ewing, 26.10.2011 00:27:
> Dag Sverre Seljebotn wrote:
>
>> I'd gladly take a factor two (or even four) slowdown of CPython code any
>> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores
>> and consider a 10x speedup better than nothing...
>
> Another thing to consider is that locking around refcount
> changes may not be as expensive in typical Cython code as
> it is in Python.
>
> The trouble with Python is that you can't so much as scratch
> your nose without touching a big pile of ref counts. But
> if the Cython code is only dealing with a few Python objects
> and doing most of its work at the C level, the relative
> overhead of locking around refcount changes may not be
> significant.
>
> So it may be worth trying the strategy of just acquiring
> the GIL whenever a refcount needs to be changed in a nogil
> section, and damn the consequences.

Hmm, interesting. That would give new semantics to "nogil" sections, basically:

"""
You can do Python interaction in nogil code, however, this will slow down 
your code. Cython will generate C code to acquire and release the GIL 
around any Python interaction that your code performs, thus serialising any 
calls into the CPython runtime. If you want to avoid this serialisation, 
use "cython -a" to find out where Python interaction happens and use static 
typing to let Cython generate C code instead.
"""

In other words: "with gil" sections hold the GIL by default and give it 
away on explicit request, whereas "nogil" sections have the GIL released by 
default and acquire it on implicit need.

The advantage over object level locking is that this does not increase the 
in-memory size of the object structs, and that it works with *any* Python 
object, not just extension types with a compile time known type.

I kind of like that.

Stefan

From markflorisson88 at gmail.com  Wed Oct 26 11:45:06 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 26 Oct 2011 10:45:06 +0100
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA7BD33.60009@behnel.de>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
	<4EA66648.8030102@behnel.de> <4EA6B999.7060807@astro.uio.no>
	<4EA737C7.5010500@canterbury.ac.nz> <4EA7BD33.60009@behnel.de>
Message-ID: <CANg26EWBWWMDCC0eN6aUXGYn_uKyFD68dz6EF6pCvjNHLeimWA@mail.gmail.com>

On 26 October 2011 08:56, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Greg Ewing, 26.10.2011 00:27:
>>
>> Dag Sverre Seljebotn wrote:
>>
>>> I'd gladly take a factor two (or even four) slowdown of CPython code any
>>> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores
>>> and consider a 10x speedup better than nothing...
>>
>> Another thing to consider is that locking around refcount
>> changes may not be as expensive in typical Cython code as
>> it is in Python.
>>
>> The trouble with Python is that you can't so much as scratch
>> your nose without touching a big pile of ref counts. But
>> if the Cython code is only dealing with a few Python objects
>> and doing most of its work at the C level, the relative
>> overhead of locking around refcount changes may not be
>> significant.
>>
>> So it may be worth trying the strategy of just acquiring
>> the GIL whenever a refcount needs to be changed in a nogil
>> section, and damn the consequences.
>
> Hmm, interesting. That would give new semantics to "nogil" sections,
> basically:
>
> """
> You can do Python interaction in nogil code, however, this will slow down
> your code. Cython will generate C code to acquire and release the GIL around
> any Python interaction that your code performs, thus serialising any calls
> into the CPython runtime. If you want to avoid this serialisation, use
> "cython -a" to find out where Python interaction happens and use static
> typing to let Cython generate C code instead.
> """
>
> In other words: "with gil" sections hold the GIL by default and give it away
> on explicit request, whereas "nogil" sections have the GIL released by
> default and acquire it on implicit need.
>
> The advantage over object level locking is that this does not increase the
> in-memory size of the object structs, and that it works with *any* Python
> object, not just extension types with a compile time known type.
>
> I kind of like that.

My problem with that is that if there if any other python thread,
you're likely just going to sleep for thousands of CPU cycles as that
thread will keep the GIL. Doing this implicitly for operations with
such overhead would be unacceptable. I think writing 'with gil:' is
fine, it's the performance that's the problem in the first place which
prevents you from doing that, not the 9 characters you need to type.

What I would like is having Cython infer whether the GIL is needed for
a function, and mark it "implicitly nogil", so it can be called from
nogil contexts without actually having to declare it nogil. This would
only work for non-extern things, and you would still need to declare
it nogil in your pxd if you want to export it. Apparently many users
(even those that have used Cython quite a bit) are confused with what
nogil on functions actually does (or they are not even aware it
exists).

> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From d.s.seljebotn at astro.uio.no  Wed Oct 26 12:23:15 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Wed, 26 Oct 2011 12:23:15 +0200
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <CANg26EWBWWMDCC0eN6aUXGYn_uKyFD68dz6EF6pCvjNHLeimWA@mail.gmail.com>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>	<4EA66648.8030102@behnel.de>
	<4EA6B999.7060807@astro.uio.no>	<4EA737C7.5010500@canterbury.ac.nz>
	<4EA7BD33.60009@behnel.de>
	<CANg26EWBWWMDCC0eN6aUXGYn_uKyFD68dz6EF6pCvjNHLeimWA@mail.gmail.com>
Message-ID: <4EA7DF93.6090700@astro.uio.no>

On 10/26/2011 11:45 AM, mark florisson wrote:
> On 26 October 2011 08:56, Stefan Behnel<stefan_ml at behnel.de>  wrote:
>> Greg Ewing, 26.10.2011 00:27:
>>>
>>> Dag Sverre Seljebotn wrote:
>>>
>>>> I'd gladly take a factor two (or even four) slowdown of CPython code any
>>>> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores
>>>> and consider a 10x speedup better than nothing...
>>>
>>> Another thing to consider is that locking around refcount
>>> changes may not be as expensive in typical Cython code as
>>> it is in Python.
>>>
>>> The trouble with Python is that you can't so much as scratch
>>> your nose without touching a big pile of ref counts. But
>>> if the Cython code is only dealing with a few Python objects
>>> and doing most of its work at the C level, the relative
>>> overhead of locking around refcount changes may not be
>>> significant.
>>>
>>> So it may be worth trying the strategy of just acquiring
>>> the GIL whenever a refcount needs to be changed in a nogil
>>> section, and damn the consequences.
>>
>> Hmm, interesting. That would give new semantics to "nogil" sections,
>> basically:
>>
>> """
>> You can do Python interaction in nogil code, however, this will slow down
>> your code. Cython will generate C code to acquire and release the GIL around
>> any Python interaction that your code performs, thus serialising any calls
>> into the CPython runtime. If you want to avoid this serialisation, use
>> "cython -a" to find out where Python interaction happens and use static
>> typing to let Cython generate C code instead.
>> """
>>
>> In other words: "with gil" sections hold the GIL by default and give it away
>> on explicit request, whereas "nogil" sections have the GIL released by
>> default and acquire it on implicit need.
>>
>> The advantage over object level locking is that this does not increase the
>> in-memory size of the object structs, and that it works with *any* Python
>> object, not just extension types with a compile time known type.
>>
>> I kind of like that.
>
> My problem with that is that if there if any other python thread,
> you're likely just going to sleep for thousands of CPU cycles as that
> thread will keep the GIL. Doing this implicitly for operations with
> such overhead would be unacceptable. I think writing 'with gil:' is
> fine, it's the performance that's the problem in the first place which
> prevents you from doing that, not the 9 characters you need to type.

You are sure about the complete impossibility of having a seperate 
thread doing all INCREFs and DECREFs posted to it asynchronously (in the 
order they are posted), without race conditions?

>
> What I would like is having Cython infer whether the GIL is needed for
> a function, and mark it "implicitly nogil", so it can be called from
> nogil contexts without actually having to declare it nogil. This would
> only work for non-extern things, and you would still need to declare
> it nogil in your pxd if you want to export it. Apparently many users
> (even those that have used Cython quite a bit) are confused with what
> nogil on functions actually does (or they are not even aware it
> exists).

There's a long thread by me and Robert (and some of Stefan) on this from 
a couple of months back, don't know if you read it. You could support 
exports across pxds as well. Basically for *every* cdef function, export 
two function pointers:

  1) To a wrapper to be called if you hold the GIL (outside nogil sections)

  2) To a wrapper to be called if you don't hold the GIL, or don't know 
whether you hold the GIL (the wrapper can acquire the GIL if needed)

Taking the address of a function (for passing to C, e.g.) would give you 
the one that can be called without holding the GIL.

The implications should hopefully be getting rid of "with gil" and 
"nogil" on function declarations entirely.

Dag Sverre

From d.s.seljebotn at astro.uio.no  Wed Oct 26 12:29:18 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Wed, 26 Oct 2011 12:29:18 +0200
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <CANg26EWBWWMDCC0eN6aUXGYn_uKyFD68dz6EF6pCvjNHLeimWA@mail.gmail.com>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>	<4EA66648.8030102@behnel.de>
	<4EA6B999.7060807@astro.uio.no>	<4EA737C7.5010500@canterbury.ac.nz>
	<4EA7BD33.60009@behnel.de>
	<CANg26EWBWWMDCC0eN6aUXGYn_uKyFD68dz6EF6pCvjNHLeimWA@mail.gmail.com>
Message-ID: <4EA7E0FE.1020605@astro.uio.no>

On 10/26/2011 11:45 AM, mark florisson wrote:
> On 26 October 2011 08:56, Stefan Behnel<stefan_ml at behnel.de>  wrote:
>> Greg Ewing, 26.10.2011 00:27:
>>>
>>> Dag Sverre Seljebotn wrote:
>>>
>>>> I'd gladly take a factor two (or even four) slowdown of CPython code any
>>>> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores
>>>> and consider a 10x speedup better than nothing...
>>>
>>> Another thing to consider is that locking around refcount
>>> changes may not be as expensive in typical Cython code as
>>> it is in Python.
>>>
>>> The trouble with Python is that you can't so much as scratch
>>> your nose without touching a big pile of ref counts. But
>>> if the Cython code is only dealing with a few Python objects
>>> and doing most of its work at the C level, the relative
>>> overhead of locking around refcount changes may not be
>>> significant.
>>>
>>> So it may be worth trying the strategy of just acquiring
>>> the GIL whenever a refcount needs to be changed in a nogil
>>> section, and damn the consequences.
>>
>> Hmm, interesting. That would give new semantics to "nogil" sections,
>> basically:
>>
>> """
>> You can do Python interaction in nogil code, however, this will slow down
>> your code. Cython will generate C code to acquire and release the GIL around
>> any Python interaction that your code performs, thus serialising any calls
>> into the CPython runtime. If you want to avoid this serialisation, use
>> "cython -a" to find out where Python interaction happens and use static
>> typing to let Cython generate C code instead.
>> """
>>
>> In other words: "with gil" sections hold the GIL by default and give it away
>> on explicit request, whereas "nogil" sections have the GIL released by
>> default and acquire it on implicit need.
>>
>> The advantage over object level locking is that this does not increase the
>> in-memory size of the object structs, and that it works with *any* Python
>> object, not just extension types with a compile time known type.
>>
>> I kind of like that.
>
> My problem with that is that if there if any other python thread,
> you're likely just going to sleep for thousands of CPU cycles as that
> thread will keep the GIL. Doing this implicitly for operations with
> such overhead would be unacceptable. I think writing 'with gil:' is
> fine, it's the performance that's the problem in the first place which
> prevents you from doing that, not the 9 characters you need to type.

I'm with Stefan here. We have more or less the exact same problem if you 
inadvertendly do arithmetic with Python floats rather than C doubles. 
The workflow then is to check the HTML for yellow lines. Same with the 
GIL (we could even introduce a new color in the HTML report for where 
you hold the GIL and not).

The advice to get fast code is

But, we should also introduce directives that emit warnings in both of 
these situations, that you can use while developing to quickly pinpoint 
source code lines ("Type of variable not inferred", "GIL automatically 
acquired").

DS

>
> What I would like is having Cython infer whether the GIL is needed for
> a function, and mark it "implicitly nogil", so it can be called from
> nogil contexts without actually having to declare it nogil. This would
> only work for non-extern things, and you would still need to declare
> it nogil in your pxd if you want to export it. Apparently many users
> (even those that have used Cython quite a bit) are confused with what
> nogil on functions actually does (or they are not even aware it
> exists).
>
>> Stefan
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel


From d.s.seljebotn at astro.uio.no  Wed Oct 26 12:30:11 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Wed, 26 Oct 2011 12:30:11 +0200
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA7E0FE.1020605@astro.uio.no>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>	<4EA66648.8030102@behnel.de>
	<4EA6B999.7060807@astro.uio.no>	<4EA737C7.5010500@canterbury.ac.nz>
	<4EA7BD33.60009@behnel.de>
	<CANg26EWBWWMDCC0eN6aUXGYn_uKyFD68dz6EF6pCvjNHLeimWA@mail.gmail.com>
	<4EA7E0FE.1020605@astro.uio.no>
Message-ID: <4EA7E133.3000806@astro.uio.no>

On 10/26/2011 12:29 PM, Dag Sverre Seljebotn wrote:
> On 10/26/2011 11:45 AM, mark florisson wrote:
>> On 26 October 2011 08:56, Stefan Behnel<stefan_ml at behnel.de> wrote:
>>> Greg Ewing, 26.10.2011 00:27:
>>>>
>>>> Dag Sverre Seljebotn wrote:
>>>>
>>>>> I'd gladly take a factor two (or even four) slowdown of CPython
>>>>> code any
>>>>> day to get rid of the GIL :-). The thing is, sometimes one has 48
>>>>> cores
>>>>> and consider a 10x speedup better than nothing...
>>>>
>>>> Another thing to consider is that locking around refcount
>>>> changes may not be as expensive in typical Cython code as
>>>> it is in Python.
>>>>
>>>> The trouble with Python is that you can't so much as scratch
>>>> your nose without touching a big pile of ref counts. But
>>>> if the Cython code is only dealing with a few Python objects
>>>> and doing most of its work at the C level, the relative
>>>> overhead of locking around refcount changes may not be
>>>> significant.
>>>>
>>>> So it may be worth trying the strategy of just acquiring
>>>> the GIL whenever a refcount needs to be changed in a nogil
>>>> section, and damn the consequences.
>>>
>>> Hmm, interesting. That would give new semantics to "nogil" sections,
>>> basically:
>>>
>>> """
>>> You can do Python interaction in nogil code, however, this will slow
>>> down
>>> your code. Cython will generate C code to acquire and release the GIL
>>> around
>>> any Python interaction that your code performs, thus serialising any
>>> calls
>>> into the CPython runtime. If you want to avoid this serialisation, use
>>> "cython -a" to find out where Python interaction happens and use static
>>> typing to let Cython generate C code instead.
>>> """
>>>
>>> In other words: "with gil" sections hold the GIL by default and give
>>> it away
>>> on explicit request, whereas "nogil" sections have the GIL released by
>>> default and acquire it on implicit need.
>>>
>>> The advantage over object level locking is that this does not
>>> increase the
>>> in-memory size of the object structs, and that it works with *any*
>>> Python
>>> object, not just extension types with a compile time known type.
>>>
>>> I kind of like that.
>>
>> My problem with that is that if there if any other python thread,
>> you're likely just going to sleep for thousands of CPU cycles as that
>> thread will keep the GIL. Doing this implicitly for operations with
>> such overhead would be unacceptable. I think writing 'with gil:' is
>> fine, it's the performance that's the problem in the first place which
>> prevents you from doing that, not the 9 characters you need to type.
>
> I'm with Stefan here. We have more or less the exact same problem if you
> inadvertendly do arithmetic with Python floats rather than C doubles.
> The workflow then is to check the HTML for yellow lines. Same with the
> GIL (we could even introduce a new color in the HTML report for where
> you hold the GIL and not).
>
> The advice to get fast code is

Sorry, I keep hitting post to early... "The advice to get fast code is 
still to 'eliminate the yellow lines'".

DS

>
> But, we should also introduce directives that emit warnings in both of
> these situations, that you can use while developing to quickly pinpoint
> source code lines ("Type of variable not inferred", "GIL automatically
> acquired").
>
> DS
>
>>
>> What I would like is having Cython infer whether the GIL is needed for
>> a function, and mark it "implicitly nogil", so it can be called from
>> nogil contexts without actually having to declare it nogil. This would
>> only work for non-extern things, and you would still need to declare
>> it nogil in your pxd if you want to export it. Apparently many users
>> (even those that have used Cython quite a bit) are confused with what
>> nogil on functions actually does (or they are not even aware it
>> exists).
>>
>>> Stefan
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>


From markflorisson88 at gmail.com  Wed Oct 26 19:23:48 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 26 Oct 2011 18:23:48 +0100
Subject: [Cython] Acquisition counted cdef classes
In-Reply-To: <4EA7DF93.6090700@astro.uio.no>
References: <CANg26EWF+u+Qf+QmTjW-UASdDZ2=08+q=OTVGTSW0Fd3azWdXA@mail.gmail.com>
	<4EA66648.8030102@behnel.de> <4EA6B999.7060807@astro.uio.no>
	<4EA737C7.5010500@canterbury.ac.nz> <4EA7BD33.60009@behnel.de>
	<CANg26EWBWWMDCC0eN6aUXGYn_uKyFD68dz6EF6pCvjNHLeimWA@mail.gmail.com>
	<4EA7DF93.6090700@astro.uio.no>
Message-ID: <CANg26EVtXp3hzbq+sX5sOYAUbEw9==dmZMkRA37biAS_aBNV2Q@mail.gmail.com>

On 26 October 2011 11:23, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 10/26/2011 11:45 AM, mark florisson wrote:
>>
>> On 26 October 2011 08:56, Stefan Behnel<stefan_ml at behnel.de> ?wrote:
>>>
>>> Greg Ewing, 26.10.2011 00:27:
>>>>
>>>> Dag Sverre Seljebotn wrote:
>>>>
>>>>> I'd gladly take a factor two (or even four) slowdown of CPython code
>>>>> any
>>>>> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores
>>>>> and consider a 10x speedup better than nothing...
>>>>
>>>> Another thing to consider is that locking around refcount
>>>> changes may not be as expensive in typical Cython code as
>>>> it is in Python.
>>>>
>>>> The trouble with Python is that you can't so much as scratch
>>>> your nose without touching a big pile of ref counts. But
>>>> if the Cython code is only dealing with a few Python objects
>>>> and doing most of its work at the C level, the relative
>>>> overhead of locking around refcount changes may not be
>>>> significant.
>>>>
>>>> So it may be worth trying the strategy of just acquiring
>>>> the GIL whenever a refcount needs to be changed in a nogil
>>>> section, and damn the consequences.
>>>
>>> Hmm, interesting. That would give new semantics to "nogil" sections,
>>> basically:
>>>
>>> """
>>> You can do Python interaction in nogil code, however, this will slow down
>>> your code. Cython will generate C code to acquire and release the GIL
>>> around
>>> any Python interaction that your code performs, thus serialising any
>>> calls
>>> into the CPython runtime. If you want to avoid this serialisation, use
>>> "cython -a" to find out where Python interaction happens and use static
>>> typing to let Cython generate C code instead.
>>> """
>>>
>>> In other words: "with gil" sections hold the GIL by default and give it
>>> away
>>> on explicit request, whereas "nogil" sections have the GIL released by
>>> default and acquire it on implicit need.
>>>
>>> The advantage over object level locking is that this does not increase
>>> the
>>> in-memory size of the object structs, and that it works with *any* Python
>>> object, not just extension types with a compile time known type.
>>>
>>> I kind of like that.
>>
>> My problem with that is that if there if any other python thread,
>> you're likely just going to sleep for thousands of CPU cycles as that
>> thread will keep the GIL. Doing this implicitly for operations with
>> such overhead would be unacceptable. I think writing 'with gil:' is
>> fine, it's the performance that's the problem in the first place which
>> prevents you from doing that, not the 9 characters you need to type.
>
> You are sure about the complete impossibility of having a seperate thread
> doing all INCREFs and DECREFs posted to it asynchronously (in the order they
> are posted), without race conditions?

No I think it is possible, but I don't believe it will solve the
DECREF C compiler optimization prevention problem (unlikely() should
help there though) as it will still have to submit an asynchronous
DECREF without races which means it has to call some kind of
(synchronized or atomically operating) function (which prevented the
optimization). It would be nice to have as it would mean you can pass
stuff around in nogil mode without acquisition counting, and it would
mean you can implement these types that can be used in nogil mode and
can synchronize using their own lock (if needed).

I wonder if deferring INCREFs are safe though. What if you have one
reference, you INCREF (deferred, because you don't have the GIL), you
call some function that steals your reference (after you obtained the
GIL), you somehow cause the program to lose the stolen reference which
causes it to be collected, and then the reference counter thread
decides to do the INCREF (too late). You also cannot atomically
INCREF, and Python doesn't do that, so there could be a race there as
well. So I think you really need the GIL to INCREF, and you need to do
it synchronously (I'm not completely sure, please feel free to poke
holes in my logic any time :).

I think it would be nicer to just fix this in CPython in any case,
though. Reference counting is terrible to work with in general
(regardless of whether you do them immediately or defer them), and
it's part of the reason why we have a GIL (although really not the
only one). As long as CPython does reference counting, removing the
GIL is an absolute no-go (although I wonder how many architectures
don't support atomic reference counting).

Refcounting has upsides too, though. One is more deterministic
collection of objects and destructor calling. Of course this argument
becomes moot if you have a reference cycle somewhere.

Has anyone ever attempted to implement a garbage collector for
CPython? Or did everyone who wanted this feature move to PyPy?

>>
>> What I would like is having Cython infer whether the GIL is needed for
>> a function, and mark it "implicitly nogil", so it can be called from
>> nogil contexts without actually having to declare it nogil. This would
>> only work for non-extern things, and you would still need to declare
>> it nogil in your pxd if you want to export it. Apparently many users
>> (even those that have used Cython quite a bit) are confused with what
>> nogil on functions actually does (or they are not even aware it
>> exists).
>
> There's a long thread by me and Robert (and some of Stefan) on this from a
> couple of months back, don't know if you read it. You could support exports
> across pxds as well. Basically for *every* cdef function, export two
> function pointers:
>
> ?1) To a wrapper to be called if you hold the GIL (outside nogil sections)
>
> ?2) To a wrapper to be called if you don't hold the GIL, or don't know
> whether you hold the GIL (the wrapper can acquire the GIL if needed)
>
> Taking the address of a function (for passing to C, e.g.) would give you the
> one that can be called without holding the GIL.
>
> The implications should hopefully be getting rid of "with gil" and "nogil"
> on function declarations entirely.

Oh, this was about functions. I agree that for functions that would be
neat. For inlined code in functions I don't like it very much,
although (unconditional) warnings help a lot in that regard.

However, this would mean that e.g. adding a print statement to your
function makes it acquire the GIL for nogil contexts, and since it
doesn't automatically release it again it may just call another
function that was really supposed to operate without the GIL (because
it's going to/may take a long time). Overall making all this
transparent to the user would be great, people care about their code,
not about how CPython is implemented.

> Dag Sverre
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From robertwb at math.washington.edu  Fri Oct 28 22:55:19 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 28 Oct 2011 13:55:19 -0700
Subject: [Cython] Cython 0.16
Message-ID: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>

With Mark's fused types and memory views going in, I think it's about
time for a new release. Thoughts? Anyone want to volunteer to take up
the process?

- Robert

From markflorisson88 at gmail.com  Fri Oct 28 22:59:43 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Fri, 28 Oct 2011 21:59:43 +0100
Subject: [Cython] Cython 0.16
In-Reply-To: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
Message-ID: <CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>

On 28 October 2011 21:55, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> With Mark's fused types and memory views going in, I think it's about
> time for a new release. Thoughts? Anyone want to volunteer to take up
> the process?
>
> - Robert
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

That'd be cool. However there are a few outstanding issues:
    a) the compiler is somewhat slower (possible solution: lazy utility codes)
    b) there's a potential memory leak problem for memoryviews with
object dtype that contain themselves, this still needs investigation.

As for a), Stefan mentioned code spending a lot of time in sub.
Stefan, could you post the code for this that made Cython compile very
slowly?

From robertwb at math.washington.edu  Sat Oct 29 00:37:14 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Fri, 28 Oct 2011 15:37:14 -0700
Subject: [Cython] Cython 0.16
In-Reply-To: <CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
Message-ID: <CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>

On Fri, Oct 28, 2011 at 1:59 PM, mark florisson
<markflorisson88 at gmail.com> wrote:
> On 28 October 2011 21:55, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>> With Mark's fused types and memory views going in, I think it's about
>> time for a new release. Thoughts? Anyone want to volunteer to take up
>> the process?
>>
>> - Robert
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>
> That'd be cool. However there are a few outstanding issues:
> ? ?a) the compiler is somewhat slower (possible solution: lazy utility codes)

Yeah, I forgot about that. This should get resolved. Lazy utility
codes (perhaps breaking them up) would probably got us most of the way
there. Long term, I really like the "declaration caching" idea which
could be used for users .pxd files as well as internally.

> ? ?b) there's a potential memory leak problem for memoryviews with
> object dtype that contain themselves, this still needs investigation.

I think this could be mentioned as a caviat rather than being a blocker.

> As for a), Stefan mentioned code spending a lot of time in sub.
> Stefan, could you post the code for this that made Cython compile very
> slowly?
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From d.s.seljebotn at astro.uio.no  Sat Oct 29 11:30:43 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sat, 29 Oct 2011 11:30:43 +0200
Subject: [Cython] Cython 0.16
In-Reply-To: <CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
Message-ID: <f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>

Re b), it would be better to disable object dtypes (or emit a warning about the possible bug when using them) than to delay the release. Object memoryviews are rare in the first place, and those who contain themselves should be very rare.
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Robert Bradshaw <robertwb at math.washington.edu> wrote:

On Fri, Oct 28, 2011 at 1:59 PM, mark florisson <markflorisson88 at gmail.com> wrote: > On 28 October 2011 21:55, Robert Bradshaw <robertwb at math.washington.edu> wrote: >> With Mark's fused types and memory views going in, I think it's about >> time for a new release. Thoughts? Anyone want to volunteer to take up >> the process? >> >> - Robert >>_____________________________________________
>> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd be cool. However there are a few outstanding issues: >    a) the compiler is somewhat slower (possible solution: lazy utility codes) Yeah, I forgot about that. This should get resolved. Lazy utility codes (perhaps breaking them up) would probably got us most of the way there. Long term, I really like the "declaration caching" idea which could be used for users .pxd files as well as internally. >    b) there's a potential memory leak problem for memoryviews with > object dtype that contain themselves, this still needs investigation. I think this could be mentioned as a caviat rather than being a blocker. > As for a), Stefan mentioned code spending a lot of time in sub. > Stefan, could you post the code for this that made Cython compile very > slowly? >_____________________________________________
> cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel >_____________________________________________
cython-devel mailing list cython-devel at python.org http://mail.python.org/mailman/listinfo/cython-devel 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/cython-devel/attachments/20111029/7ffece04/attachment.html>

From markflorisson88 at gmail.com  Sat Oct 29 13:41:34 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 29 Oct 2011 12:41:34 +0100
Subject: [Cython] Cython 0.16
In-Reply-To: <f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
Message-ID: <CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>

Hm ok I'll disable them then. Pointers and some other dtypes are also
not supported yet. As for the documentation, have you guys reviewed
the documentation for fused types and memoryviews? For instance this
is the introduction for memoryviews:

"
Typed memoryviews can be used for efficient access to buffers. It is
similar to the current buffer support, but has more features and
cleaner syntax. A memoryview can be used in any context (function
parameters, module-level, cdef class attribute, etc) and can be
obtained from any object that exposes the PEP 3118 buffer interface.
"

but I'm not sure this new functionality won't confuse users of the old
buffer support.

For fused types, cython.numeric only includes long, double and double
complex. I think that should be changed to short, int, long, float,
double, float complex and double complex. I was deliberately avoiding
long long and long double as they (if not used as a base type) would
be preferred over the others and may be a lot slower. But then, such
usage wouldn't be very useful. Should I include them then?

On 29 October 2011 10:30, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> Re b), it would be better to disable object dtypes (or emit a warning about
> the possible bug when using them) than to delay the release. Object
> memoryviews are rare in the first place, and those who contain themselves
> should be very rare.
> --
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>
> Robert Bradshaw <robertwb at math.washington.edu> wrote:
>>
>> On Fri, Oct 28, 2011 at 1:59 PM, mark florisson
>> <markflorisson88 at gmail.com> wrote: > On 28 October 2011 21:55, Robert
>> Bradshaw <robertwb at math.washington.edu> wrote: >> With Mark's fused types
>> and memory views going in, I think it's about >> time for a new release.
>> Thoughts? Anyone want to volunteer to take up >> the process? >> >> - Robert
>> >>
>> ________________________________
>> >> cython-devel mailing list >> cython-devel at python.org >>
>> >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd be cool.
>> >> However there are a few outstanding issues: > ? ?a) the compiler is somewhat
>> >> slower (possible solution: lazy utility codes) Yeah, I forgot about that.
>> >> This should get resolved. Lazy utility codes (perhaps breaking them up)
>> >> would probably got us most of the way there. Long term, I really like the
>> >> "declaration caching" idea which could be used for users .pxd files as well
>> >> as internally. > ? ?b) there's a potential memory leak problem for
>> >> memoryviews with > object dtype that contain themselves, this still needs
>> >> investigation. I think this could be mentioned as a caviat rather than being
>> >> a blocker. > As for a), Stefan mentioned code spending a lot of time in sub.
>> >> > Stefan, could you post the code for this that made Cython compile very >
>> >> slowly? >
>> ________________________________
>> > cython-devel mailing list > cython-devel at python.org >
>> > http://mail.python.org/mailman/listinfo/cython-devel >
>> ________________________________
>> cython-devel mailing list cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>
>

From markflorisson88 at gmail.com  Sat Oct 29 15:14:12 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 29 Oct 2011 14:14:12 +0100
Subject: [Cython] Cython 0.16
In-Reply-To: <CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
	<CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
Message-ID: <CANg26EW_c88FCYMEaJFAG5QySMenLmLAJ2qsaxx-GhEFbdyvmg@mail.gmail.com>

Before we do a release, would anyone be opposed to a 'chunksize'
keyword argument to prange()? That may have significant performance
impacts.

On 29 October 2011 12:41, mark florisson <markflorisson88 at gmail.com> wrote:
> Hm ok I'll disable them then. Pointers and some other dtypes are also
> not supported yet. As for the documentation, have you guys reviewed
> the documentation for fused types and memoryviews? For instance this
> is the introduction for memoryviews:
>
> "
> Typed memoryviews can be used for efficient access to buffers. It is
> similar to the current buffer support, but has more features and
> cleaner syntax. A memoryview can be used in any context (function
> parameters, module-level, cdef class attribute, etc) and can be
> obtained from any object that exposes the PEP 3118 buffer interface.
> "
>
> but I'm not sure this new functionality won't confuse users of the old
> buffer support.
>
> For fused types, cython.numeric only includes long, double and double
> complex. I think that should be changed to short, int, long, float,
> double, float complex and double complex. I was deliberately avoiding
> long long and long double as they (if not used as a base type) would
> be preferred over the others and may be a lot slower. But then, such
> usage wouldn't be very useful. Should I include them then?
>
> On 29 October 2011 10:30, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>> Re b), it would be better to disable object dtypes (or emit a warning about
>> the possible bug when using them) than to delay the release. Object
>> memoryviews are rare in the first place, and those who contain themselves
>> should be very rare.
>> --
>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>>
>> Robert Bradshaw <robertwb at math.washington.edu> wrote:
>>>
>>> On Fri, Oct 28, 2011 at 1:59 PM, mark florisson
>>> <markflorisson88 at gmail.com> wrote: > On 28 October 2011 21:55, Robert
>>> Bradshaw <robertwb at math.washington.edu> wrote: >> With Mark's fused types
>>> and memory views going in, I think it's about >> time for a new release.
>>> Thoughts? Anyone want to volunteer to take up >> the process? >> >> - Robert
>>> >>
>>> ________________________________
>>> >> cython-devel mailing list >> cython-devel at python.org >>
>>> >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd be cool.
>>> >> However there are a few outstanding issues: > ? ?a) the compiler is somewhat
>>> >> slower (possible solution: lazy utility codes) Yeah, I forgot about that.
>>> >> This should get resolved. Lazy utility codes (perhaps breaking them up)
>>> >> would probably got us most of the way there. Long term, I really like the
>>> >> "declaration caching" idea which could be used for users .pxd files as well
>>> >> as internally. > ? ?b) there's a potential memory leak problem for
>>> >> memoryviews with > object dtype that contain themselves, this still needs
>>> >> investigation. I think this could be mentioned as a caviat rather than being
>>> >> a blocker. > As for a), Stefan mentioned code spending a lot of time in sub.
>>> >> > Stefan, could you post the code for this that made Cython compile very >
>>> >> slowly? >
>>> ________________________________
>>> > cython-devel mailing list > cython-devel at python.org >
>>> > http://mail.python.org/mailman/listinfo/cython-devel >
>>> ________________________________
>>> cython-devel mailing list cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>>
>

From faltet at gmail.com  Sat Oct 29 16:23:53 2011
From: faltet at gmail.com (Francesc Alted)
Date: Sat, 29 Oct 2011 16:23:53 +0200
Subject: [Cython] Cython 0.16
In-Reply-To: <CANg26EW_c88FCYMEaJFAG5QySMenLmLAJ2qsaxx-GhEFbdyvmg@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
	<CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
	<CANg26EW_c88FCYMEaJFAG5QySMenLmLAJ2qsaxx-GhEFbdyvmg@mail.gmail.com>
Message-ID: <CAFrp1vooL3shR_Op9E8E8i14n9SCA-giPq-KWef07brQPTJoWg@mail.gmail.com>

On the contrary, this is an excellent idea!
El 29/10/2011 15:14, "mark florisson" <markflorisson88 at gmail.com> va
escriure:

> Before we do a release, would anyone be opposed to a 'chunksize'
> keyword argument to prange()? That may have significant performance
> impacts.
>
> On 29 October 2011 12:41, mark florisson <markflorisson88 at gmail.com>
> wrote:
> > Hm ok I'll disable them then. Pointers and some other dtypes are also
> > not supported yet. As for the documentation, have you guys reviewed
> > the documentation for fused types and memoryviews? For instance this
> > is the introduction for memoryviews:
> >
> > "
> > Typed memoryviews can be used for efficient access to buffers. It is
> > similar to the current buffer support, but has more features and
> > cleaner syntax. A memoryview can be used in any context (function
> > parameters, module-level, cdef class attribute, etc) and can be
> > obtained from any object that exposes the PEP 3118 buffer interface.
> > "
> >
> > but I'm not sure this new functionality won't confuse users of the old
> > buffer support.
> >
> > For fused types, cython.numeric only includes long, double and double
> > complex. I think that should be changed to short, int, long, float,
> > double, float complex and double complex. I was deliberately avoiding
> > long long and long double as they (if not used as a base type) would
> > be preferred over the others and may be a lot slower. But then, such
> > usage wouldn't be very useful. Should I include them then?
> >
> > On 29 October 2011 10:30, Dag Sverre Seljebotn
> > <d.s.seljebotn at astro.uio.no> wrote:
> >> Re b), it would be better to disable object dtypes (or emit a warning
> about
> >> the possible bug when using them) than to delay the release. Object
> >> memoryviews are rare in the first place, and those who contain
> themselves
> >> should be very rare.
> >> --
> >> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
> >>
> >> Robert Bradshaw <robertwb at math.washington.edu> wrote:
> >>>
> >>> On Fri, Oct 28, 2011 at 1:59 PM, mark florisson
> >>> <markflorisson88 at gmail.com> wrote: > On 28 October 2011 21:55, Robert
> >>> Bradshaw <robertwb at math.washington.edu> wrote: >> With Mark's fused
> types
> >>> and memory views going in, I think it's about >> time for a new
> release.
> >>> Thoughts? Anyone want to volunteer to take up >> the process? >> >> -
> Robert
> >>> >>
> >>> ________________________________
> >>> >> cython-devel mailing list >> cython-devel at python.org >>
> >>> >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd
> be cool.
> >>> >> However there are a few outstanding issues: >    a) the compiler is
> somewhat
> >>> >> slower (possible solution: lazy utility codes) Yeah, I forgot about
> that.
> >>> >> This should get resolved. Lazy utility codes (perhaps breaking them
> up)
> >>> >> would probably got us most of the way there. Long term, I really
> like the
> >>> >> "declaration caching" idea which could be used for users .pxd files
> as well
> >>> >> as internally. >    b) there's a potential memory leak problem for
> >>> >> memoryviews with > object dtype that contain themselves, this still
> needs
> >>> >> investigation. I think this could be mentioned as a caviat rather
> than being
> >>> >> a blocker. > As for a), Stefan mentioned code spending a lot of
> time in sub.
> >>> >> > Stefan, could you post the code for this that made Cython compile
> very >
> >>> >> slowly? >
> >>> ________________________________
> >>> > cython-devel mailing list > cython-devel at python.org >
> >>> > http://mail.python.org/mailman/listinfo/cython-devel >
> >>> ________________________________
> >>> cython-devel mailing list cython-devel at python.org
> >>> http://mail.python.org/mailman/listinfo/cython-devel
> >>
> >> _______________________________________________
> >> cython-devel mailing list
> >> cython-devel at python.org
> >> http://mail.python.org/mailman/listinfo/cython-devel
> >>
> >>
> >
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/cython-devel/attachments/20111029/6b76866c/attachment-0001.html>

From markflorisson88 at gmail.com  Sat Oct 29 16:37:17 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 29 Oct 2011 15:37:17 +0100
Subject: [Cython] Cython 0.16
In-Reply-To: <CAFrp1vooL3shR_Op9E8E8i14n9SCA-giPq-KWef07brQPTJoWg@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
	<CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
	<CANg26EW_c88FCYMEaJFAG5QySMenLmLAJ2qsaxx-GhEFbdyvmg@mail.gmail.com>
	<CAFrp1vooL3shR_Op9E8E8i14n9SCA-giPq-KWef07brQPTJoWg@mail.gmail.com>
Message-ID: <CANg26EXdxW_oyZ3MCfVU190mT-tb2Q7d9f4x+DYrjOxxKpxEhA@mail.gmail.com>

Heh, that's a +1 :)

This makes me wonder, should we organize soms polls to have users vote
on what functionality they would like to see in Cython? Some users may
read the cython-dev mailing list, but many might also not. E.g.
provide a poll where we list some things that we would like to see,
and an option with a form that allows them to fill in something else
entirely.

Maybe we could do that on cython.org to allow anonymous votes, not
everyone may be interested in discussion, just in voting.

On 29 October 2011 15:23, Francesc Alted <faltet at gmail.com> wrote:
> On the contrary, this is an excellent idea!
>
> El 29/10/2011 15:14, "mark florisson" <markflorisson88 at gmail.com> va
> escriure:
>>
>> Before we do a release, would anyone be opposed to a 'chunksize'
>> keyword argument to prange()? That may have significant performance
>> impacts.
>>
>> On 29 October 2011 12:41, mark florisson <markflorisson88 at gmail.com>
>> wrote:
>> > Hm ok I'll disable them then. Pointers and some other dtypes are also
>> > not supported yet. As for the documentation, have you guys reviewed
>> > the documentation for fused types and memoryviews? For instance this
>> > is the introduction for memoryviews:
>> >
>> > "
>> > Typed memoryviews can be used for efficient access to buffers. It is
>> > similar to the current buffer support, but has more features and
>> > cleaner syntax. A memoryview can be used in any context (function
>> > parameters, module-level, cdef class attribute, etc) and can be
>> > obtained from any object that exposes the PEP 3118 buffer interface.
>> > "
>> >
>> > but I'm not sure this new functionality won't confuse users of the old
>> > buffer support.
>> >
>> > For fused types, cython.numeric only includes long, double and double
>> > complex. I think that should be changed to short, int, long, float,
>> > double, float complex and double complex. I was deliberately avoiding
>> > long long and long double as they (if not used as a base type) would
>> > be preferred over the others and may be a lot slower. But then, such
>> > usage wouldn't be very useful. Should I include them then?
>> >
>> > On 29 October 2011 10:30, Dag Sverre Seljebotn
>> > <d.s.seljebotn at astro.uio.no> wrote:
>> >> Re b), it would be better to disable object dtypes (or emit a warning
>> >> about
>> >> the possible bug when using them) than to delay the release. Object
>> >> memoryviews are rare in the first place, and those who contain
>> >> themselves
>> >> should be very rare.
>> >> --
>> >> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>> >>
>> >> Robert Bradshaw <robertwb at math.washington.edu> wrote:
>> >>>
>> >>> On Fri, Oct 28, 2011 at 1:59 PM, mark florisson
>> >>> <markflorisson88 at gmail.com> wrote: > On 28 October 2011 21:55, Robert
>> >>> Bradshaw <robertwb at math.washington.edu> wrote: >> With Mark's fused
>> >>> types
>> >>> and memory views going in, I think it's about >> time for a new
>> >>> release.
>> >>> Thoughts? Anyone want to volunteer to take up >> the process? >> >> -
>> >>> Robert
>> >>> >>
>> >>> ________________________________
>> >>> >> cython-devel mailing list >> cython-devel at python.org >>
>> >>> >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd
>> >>> >> be cool.
>> >>> >> However there are a few outstanding issues: > ? ?a) the compiler is
>> >>> >> somewhat
>> >>> >> slower (possible solution: lazy utility codes) Yeah, I forgot about
>> >>> >> that.
>> >>> >> This should get resolved. Lazy utility codes (perhaps breaking them
>> >>> >> up)
>> >>> >> would probably got us most of the way there. Long term, I really
>> >>> >> like the
>> >>> >> "declaration caching" idea which could be used for users .pxd files
>> >>> >> as well
>> >>> >> as internally. > ? ?b) there's a potential memory leak problem for
>> >>> >> memoryviews with > object dtype that contain themselves, this still
>> >>> >> needs
>> >>> >> investigation. I think this could be mentioned as a caviat rather
>> >>> >> than being
>> >>> >> a blocker. > As for a), Stefan mentioned code spending a lot of
>> >>> >> time in sub.
>> >>> >> > Stefan, could you post the code for this that made Cython compile
>> >>> >> > very >
>> >>> >> slowly? >
>> >>> ________________________________
>> >>> > cython-devel mailing list > cython-devel at python.org >
>> >>> > http://mail.python.org/mailman/listinfo/cython-devel >
>> >>> ________________________________
>> >>> cython-devel mailing list cython-devel at python.org
>> >>> http://mail.python.org/mailman/listinfo/cython-devel
>> >>
>> >> _______________________________________________
>> >> cython-devel mailing list
>> >> cython-devel at python.org
>> >> http://mail.python.org/mailman/listinfo/cython-devel
>> >>
>> >>
>> >
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>
>

From njs at pobox.com  Sat Oct 29 16:50:59 2011
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 29 Oct 2011 07:50:59 -0700
Subject: [Cython] Cython 0.16
In-Reply-To: <CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
	<CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
Message-ID: <CAPJVwBnZgq_JUj8KRZzg11CNe9LUodoYN_kW7UV_YM1nmWjPGA@mail.gmail.com>

On Oct 29, 2011 4:41 AM, "mark florisson" <markflorisson88 at gmail.com> wrote:
> "
> Typed memoryviews can be used for efficient access to buffers. It is
> similar to the current buffer support, but has more features and
> cleaner syntax. A memoryview can be used in any context (function
> parameters, module-level, cdef class attribute, etc) and can be
> obtained from any object that exposes the PEP 3118 buffer interface.
> "

FWIW, I do find this paragraph somewhat confusing, because the main
description of what a typed memoryview is assumes that I already know the
current buffer support. I think that's actually true (the ndarray[int32]
syntax, right?), but I'm not sure, and people coming to this for the first
time probably won't even know that buffers are what they're looking for.

I'd say something like: "Typed memoryviews can be used for efficient access
to buffers. For example, you can use them to read and modify numpy arrays or
<some other awesome things> without incurring any python overhead." And put
a compare/contrast with the old syntax later, like the second paragraph or
so.

My 2?,
- Nathaniel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/cython-devel/attachments/20111029/a5e864ee/attachment.html>

From stefan_ml at behnel.de  Sat Oct 29 16:50:50 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 29 Oct 2011 16:50:50 +0200
Subject: [Cython] Cython 0.16
In-Reply-To: <CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
Message-ID: <4EAC12CA.5040200@behnel.de>

mark florisson, 28.10.2011 22:59:
> On 28 October 2011 21:55, Robert Bradshaw wrote:
>> With Mark's fused types and memory views going in, I think it's about
>> time for a new release.

Agreed.


>> Thoughts?

I still haven't investigated the decorator issue that appeared in the Sage 
tests. I think it's related to decorators on module level def functions, 
which would suggest that it's best to eventually fix it as part of the 
function implementation changes that Vitja has started. But there may still 
be a simpler work-around somewhere that I'm not seeing yet.

I basically broke the Sage tests by resolving a bug (593 IIRC), and both 
don't currently work together. So, a variant would be to revert my changes 
for 0.16 and just leave the bug in, if that keeps us from breaking existing 
code for now.

But even leaving that out, the Sage tests look seriously broken currently:

https://sage.math.washington.edu:8091/hudson/view/All/job/sage-tests/lastCompletedBuild/consoleFull


> That'd be cool. However there are a few outstanding issues:
>      a) the compiler is somewhat slower (possible solution: lazy utility codes)
>      b) there's a potential memory leak problem for memoryviews with
> object dtype that contain themselves, this still needs investigation.
>
> As for a), Stefan mentioned code spending a lot of time in sub.
> Stefan, could you post the code for this that made Cython compile very
> slowly?

At the time, I just ran cProfile on runtests.py with something like 
"withstat with_stat" or so as tests - basically all with-statement related 
ones. It took about 20 seconds or so to build the utility code, just to 
throw it away unused afterwards. The compile/test run itself then took 
about 3 seconds.

Stefan

From markflorisson88 at gmail.com  Sat Oct 29 17:03:11 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 29 Oct 2011 16:03:11 +0100
Subject: [Cython] Cython 0.16
In-Reply-To: <CAPJVwBnZgq_JUj8KRZzg11CNe9LUodoYN_kW7UV_YM1nmWjPGA@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
	<CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
	<CAPJVwBnZgq_JUj8KRZzg11CNe9LUodoYN_kW7UV_YM1nmWjPGA@mail.gmail.com>
Message-ID: <CANg26EUNP11drQURF7=nuoO69jf5153uBYJx-pJ67pcExi5ugg@mail.gmail.com>

On 29 October 2011 15:50, Nathaniel Smith <njs at pobox.com> wrote:
> On Oct 29, 2011 4:41 AM, "mark florisson" <markflorisson88 at gmail.com> wrote:
>> "
>> Typed memoryviews can be used for efficient access to buffers. It is
>> similar to the current buffer support, but has more features and
>> cleaner syntax. A memoryview can be used in any context (function
>> parameters, module-level, cdef class attribute, etc) and can be
>> obtained from any object that exposes the PEP 3118 buffer interface.
>> "
>
> FWIW, I do find this paragraph somewhat confusing, because the main
> description of what a typed memoryview is assumes that I already know the
> current buffer support. I think that's actually true (the ndarray[int32]
> syntax, right?), but I'm not sure, and people coming to this for the first
> time probably won't even know that buffers are what they're looking for.
>
> I'd say something like: "Typed memoryviews can be used for efficient access
> to buffers. For example, you can use them to read and modify numpy arrays or
> <some other awesome things> without incurring any python overhead." And put
> a compare/contrast with the old syntax later, like the second paragraph or
> so.
>
> My 2?,
> - Nathaniel

Good idea, thanks! I'll update the documentation again.

> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>
>

From markflorisson88 at gmail.com  Sat Oct 29 17:03:24 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 29 Oct 2011 16:03:24 +0100
Subject: [Cython] Cython 0.16
In-Reply-To: <4EAC12CA.5040200@behnel.de>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<4EAC12CA.5040200@behnel.de>
Message-ID: <CANg26EWLOY16NtvuWaSUAUKXg=5=s1qbxBozeh2NL=Yx0nG6ug@mail.gmail.com>

On 29 October 2011 15:50, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 28.10.2011 22:59:
>>
>> On 28 October 2011 21:55, Robert Bradshaw wrote:
>>>
>>> With Mark's fused types and memory views going in, I think it's about
>>> time for a new release.
>
> Agreed.
>
>
>>> Thoughts?
>
> I still haven't investigated the decorator issue that appeared in the Sage
> tests. I think it's related to decorators on module level def functions,
> which would suggest that it's best to eventually fix it as part of the
> function implementation changes that Vitja has started. But there may still
> be a simpler work-around somewhere that I'm not seeing yet.
>
> I basically broke the Sage tests by resolving a bug (593 IIRC), and both
> don't currently work together. So, a variant would be to revert my changes
> for 0.16 and just leave the bug in, if that keeps us from breaking existing
> code for now.

If it's a bug I think it's worth fixing, even if it breaks other code.
Unfortunately I lost my trac password, so I don't know which bug that
is.

> But even leaving that out, the Sage tests look seriously broken currently:
>
> https://sage.math.washington.edu:8091/hudson/view/All/job/sage-tests/lastCompletedBuild/consoleFull
>
>
>> That'd be cool. However there are a few outstanding issues:
>> ? ? a) the compiler is somewhat slower (possible solution: lazy utility
>> codes)
>> ? ? b) there's a potential memory leak problem for memoryviews with
>> object dtype that contain themselves, this still needs investigation.
>>
>> As for a), Stefan mentioned code spending a lot of time in sub.
>> Stefan, could you post the code for this that made Cython compile very
>> slowly?
>
> At the time, I just ran cProfile on runtests.py with something like
> "withstat with_stat" or so as tests - basically all with-statement related
> ones. It took about 20 seconds or so to build the utility code, just to
> throw it away unused afterwards. The compile/test run itself then took about
> 3 seconds.

Was that before or after the deferred cython scope loading commit?

> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Sat Oct 29 17:11:56 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 29 Oct 2011 16:11:56 +0100
Subject: [Cython] Cython 0.16
In-Reply-To: <CANg26EW_c88FCYMEaJFAG5QySMenLmLAJ2qsaxx-GhEFbdyvmg@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
	<CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
	<CANg26EW_c88FCYMEaJFAG5QySMenLmLAJ2qsaxx-GhEFbdyvmg@mail.gmail.com>
Message-ID: <CANg26EUWsX-+Q+07bhzFv+L2t7++Sf8=UHLEYn=Yknb4vicwaA@mail.gmail.com>

It seems that, most ironically, OpenMP "isn't defined" to be called
from multithreaded contexts. It seems that even if I use prange only
in another thread that isn't the main thread the program segfaults if
compiled with gcc. That's kind of worrying, I suppose we should
mention that in the documentation. This may be a problem especially
for people who write libraries.

Are NumPy, Scipy or Sage linked with any libraries that use OpenMP?

On 29 October 2011 14:14, mark florisson <markflorisson88 at gmail.com> wrote:
> Before we do a release, would anyone be opposed to a 'chunksize'
> keyword argument to prange()? That may have significant performance
> impacts.
>
> On 29 October 2011 12:41, mark florisson <markflorisson88 at gmail.com> wrote:
>> Hm ok I'll disable them then. Pointers and some other dtypes are also
>> not supported yet. As for the documentation, have you guys reviewed
>> the documentation for fused types and memoryviews? For instance this
>> is the introduction for memoryviews:
>>
>> "
>> Typed memoryviews can be used for efficient access to buffers. It is
>> similar to the current buffer support, but has more features and
>> cleaner syntax. A memoryview can be used in any context (function
>> parameters, module-level, cdef class attribute, etc) and can be
>> obtained from any object that exposes the PEP 3118 buffer interface.
>> "
>>
>> but I'm not sure this new functionality won't confuse users of the old
>> buffer support.
>>
>> For fused types, cython.numeric only includes long, double and double
>> complex. I think that should be changed to short, int, long, float,
>> double, float complex and double complex. I was deliberately avoiding
>> long long and long double as they (if not used as a base type) would
>> be preferred over the others and may be a lot slower. But then, such
>> usage wouldn't be very useful. Should I include them then?
>>
>> On 29 October 2011 10:30, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> wrote:
>>> Re b), it would be better to disable object dtypes (or emit a warning about
>>> the possible bug when using them) than to delay the release. Object
>>> memoryviews are rare in the first place, and those who contain themselves
>>> should be very rare.
>>> --
>>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>>>
>>> Robert Bradshaw <robertwb at math.washington.edu> wrote:
>>>>
>>>> On Fri, Oct 28, 2011 at 1:59 PM, mark florisson
>>>> <markflorisson88 at gmail.com> wrote: > On 28 October 2011 21:55, Robert
>>>> Bradshaw <robertwb at math.washington.edu> wrote: >> With Mark's fused types
>>>> and memory views going in, I think it's about >> time for a new release.
>>>> Thoughts? Anyone want to volunteer to take up >> the process? >> >> - Robert
>>>> >>
>>>> ________________________________
>>>> >> cython-devel mailing list >> cython-devel at python.org >>
>>>> >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd be cool.
>>>> >> However there are a few outstanding issues: > ? ?a) the compiler is somewhat
>>>> >> slower (possible solution: lazy utility codes) Yeah, I forgot about that.
>>>> >> This should get resolved. Lazy utility codes (perhaps breaking them up)
>>>> >> would probably got us most of the way there. Long term, I really like the
>>>> >> "declaration caching" idea which could be used for users .pxd files as well
>>>> >> as internally. > ? ?b) there's a potential memory leak problem for
>>>> >> memoryviews with > object dtype that contain themselves, this still needs
>>>> >> investigation. I think this could be mentioned as a caviat rather than being
>>>> >> a blocker. > As for a), Stefan mentioned code spending a lot of time in sub.
>>>> >> > Stefan, could you post the code for this that made Cython compile very >
>>>> >> slowly? >
>>>> ________________________________
>>>> > cython-devel mailing list > cython-devel at python.org >
>>>> > http://mail.python.org/mailman/listinfo/cython-devel >
>>>> ________________________________
>>>> cython-devel mailing list cython-devel at python.org
>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>>>
>>
>

From stefan_ml at behnel.de  Sat Oct 29 17:42:06 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 29 Oct 2011 17:42:06 +0200
Subject: [Cython] Cython 0.16
In-Reply-To: <CANg26EWLOY16NtvuWaSUAUKXg=5=s1qbxBozeh2NL=Yx0nG6ug@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>	<4EAC12CA.5040200@behnel.de>
	<CANg26EWLOY16NtvuWaSUAUKXg=5=s1qbxBozeh2NL=Yx0nG6ug@mail.gmail.com>
Message-ID: <4EAC1ECE.7060404@behnel.de>

mark florisson, 29.10.2011 17:03:
> On 29 October 2011 15:50, Stefan Behnel wrote:
>> mark florisson, 28.10.2011 22:59:
>>>
>>> On 28 October 2011 21:55, Robert Bradshaw wrote:
>>>>
>>>> With Mark's fused types and memory views going in, I think it's about
>>>> time for a new release.
>>
>> I still haven't investigated the decorator issue that appeared in the Sage
>> tests. I think it's related to decorators on module level def functions,
>> which would suggest that it's best to eventually fix it as part of the
>> function implementation changes that Vitja has started. But there may still
>> be a simpler work-around somewhere that I'm not seeing yet.
>>
>> I basically broke the Sage tests by resolving a bug (593 IIRC), and both
>> don't currently work together. So, a variant would be to revert my changes
>> for 0.16 and just leave the bug in, if that keeps us from breaking existing
>> code for now.
>
> If it's a bug I think it's worth fixing, even if it breaks other code.
> Unfortunately I lost my trac password, so I don't know which bug that
> is.

You should be able to set up a new password, that should get you back in.


>> But even leaving that out, the Sage tests look seriously broken currently:
>>
>> https://sage.math.washington.edu:8091/hudson/view/All/job/sage-tests/lastCompletedBuild/consoleFull
>>
>>
>>> That'd be cool. However there are a few outstanding issues:
>>>      a) the compiler is somewhat slower (possible solution: lazy utility
>>> codes)
>>>      b) there's a potential memory leak problem for memoryviews with
>>> object dtype that contain themselves, this still needs investigation.
>>>
>>> As for a), Stefan mentioned code spending a lot of time in sub.
>>> Stefan, could you post the code for this that made Cython compile very
>>> slowly?
>>
>> At the time, I just ran cProfile on runtests.py with something like
>> "withstat with_stat" or so as tests - basically all with-statement related
>> ones. It took about 20 seconds or so to build the utility code, just to
>> throw it away unused afterwards. The compile/test run itself then took about
>> 3 seconds.
>
> Was that before or after the deferred cython scope loading commit?

Likely before. It looks *much* better now.

Stefan

From vitja.makarov at gmail.com  Sat Oct 29 18:40:05 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Sat, 29 Oct 2011 20:40:05 +0400
Subject: [Cython] Cython 0.16
In-Reply-To: <4EAC12CA.5040200@behnel.de>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<4EAC12CA.5040200@behnel.de>
Message-ID: <CAKGHGPRAysWt1vhHRob2ytEQP-y_oMuHuDtq1QPJScjV4YYmsw@mail.gmail.com>

2011/10/29 Stefan Behnel <stefan_ml at behnel.de>:
> mark florisson, 28.10.2011 22:59:
>>
>> On 28 October 2011 21:55, Robert Bradshaw wrote:
>>>
>>> With Mark's fused types and memory views going in, I think it's about
>>> time for a new release.
>
> Agreed.
>
>
>>> Thoughts?
>
> I still haven't investigated the decorator issue that appeared in the Sage
> tests. I think it's related to decorators on module level def functions,
> which would suggest that it's best to eventually fix it as part of the
> function implementation changes that Vitja has started. But there may still
> be a simpler work-around somewhere that I'm not seeing yet.
>

Recently I've implemented py3k-super and dynamic default args if we
have time. I would see this in release also.

Can you please point me to sage decorators related failure?


> I basically broke the Sage tests by resolving a bug (593 IIRC), and both
> don't currently work together. So, a variant would be to revert my changes
> for 0.16 and just leave the bug in, if that keeps us from breaking existing
> code for now.
>
> But even leaving that out, the Sage tests look seriously broken currently:
>
> https://sage.math.washington.edu:8091/hudson/view/All/job/sage-tests/lastCompletedBuild/consoleFull
>
>
>> That'd be cool. However there are a few outstanding issues:
>> ? ? a) the compiler is somewhat slower (possible solution: lazy utility
>> codes)
>> ? ? b) there's a potential memory leak problem for memoryviews with
>> object dtype that contain themselves, this still needs investigation.
>>
>> As for a), Stefan mentioned code spending a lot of time in sub.
>> Stefan, could you post the code for this that made Cython compile very
>> slowly?
>
> At the time, I just ran cProfile on runtests.py with something like
> "withstat with_stat" or so as tests - basically all with-statement related
> ones. It took about 20 seconds or so to build the utility code, just to
> throw it away unused afterwards. The compile/test run itself then took about
> 3 seconds.
>


-- 
vitja.

From robertwb at math.washington.edu  Sat Oct 29 18:58:48 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sat, 29 Oct 2011 09:58:48 -0700
Subject: [Cython] Cython 0.16
In-Reply-To: <4EAC12CA.5040200@behnel.de>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<4EAC12CA.5040200@behnel.de>
Message-ID: <CADiQ+QDTrvjDxxbmw905SotQM5-X2ac-ui54XK0LK0VhZdVM_A@mail.gmail.com>

On Sat, Oct 29, 2011 at 7:50 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 28.10.2011 22:59:
>>
>> On 28 October 2011 21:55, Robert Bradshaw wrote:
>>>
>>> With Mark's fused types and memory views going in, I think it's about
>>> time for a new release.
>
> Agreed.
>
>
>>> Thoughts?
>
> I still haven't investigated the decorator issue that appeared in the Sage
> tests. I think it's related to decorators on module level def functions,
> which would suggest that it's best to eventually fix it as part of the
> function implementation changes that Vitja has started. But there may still
> be a simpler work-around somewhere that I'm not seeing yet.
>
> I basically broke the Sage tests by resolving a bug (593 IIRC), and both
> don't currently work together. So, a variant would be to revert my changes
> for 0.16 and just leave the bug in, if that keeps us from breaking existing
> code for now.
>
> But even leaving that out, the Sage tests look seriously broken currently:
>
> https://sage.math.washington.edu:8091/hudson/view/All/job/sage-tests/lastCompletedBuild/consoleFull

I recently fixed the Sage build (the errors on public api for non
public types broke it). As for those tests, they seem to be related to
name mangling for double-underscore names. Did something change here
recently? Or is it indirectly due to decorators (I haven't looked too
deeply yet.)

>> That'd be cool. However there are a few outstanding issues:
>> ? ? a) the compiler is somewhat slower (possible solution: lazy utility
>> codes)
>> ? ? b) there's a potential memory leak problem for memoryviews with
>> object dtype that contain themselves, this still needs investigation.
>>
>> As for a), Stefan mentioned code spending a lot of time in sub.
>> Stefan, could you post the code for this that made Cython compile very
>> slowly?
>
> At the time, I just ran cProfile on runtests.py with something like
> "withstat with_stat" or so as tests - basically all with-statement related
> ones. It took about 20 seconds or so to build the utility code, just to
> throw it away unused afterwards. The compile/test run itself then took about
> 3 seconds.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From robertwb at math.washington.edu  Sat Oct 29 19:05:00 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sat, 29 Oct 2011 10:05:00 -0700
Subject: [Cython] Cython 0.16
In-Reply-To: <CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
	<CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
Message-ID: <CADiQ+QAb8YXcEuWRG0dn1e=6TkW9ZDGOyr+y=A_0k+9oCmziQg@mail.gmail.com>

On Sat, Oct 29, 2011 at 4:41 AM, mark florisson
<markflorisson88 at gmail.com> wrote:
> Hm ok I'll disable them then. Pointers and some other dtypes are also
> not supported yet. As for the documentation, have you guys reviewed
> the documentation for fused types and memoryviews?

I looked at the fused types docs.

> For instance this
> is the introduction for memoryviews:
>
> "
> Typed memoryviews can be used for efficient access to buffers. It is
> similar to the current buffer support, but has more features and
> cleaner syntax. A memoryview can be used in any context (function
> parameters, module-level, cdef class attribute, etc) and can be
> obtained from any object that exposes the PEP 3118 buffer interface.
> "
>
> but I'm not sure this new functionality won't confuse users of the old
> buffer support.
>
> For fused types, cython.numeric only includes long, double and double
> complex. I think that should be changed to short, int, long, float,
> double, float complex and double complex.

Yes. What about size_t, ssize_t, and Py_ssize_t?

> I was deliberately avoiding
> long long and long double as they (if not used as a base type) would
> be preferred over the others and may be a lot slower. But then, such
> usage wouldn't be very useful. Should I include them then?

That's a good question. Perhaps these two could be used if explicitly
requested, or for dispatching from a Python long (in Py2) or
non-word-sized int (in Py3).

> On 29 October 2011 10:30, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>> Re b), it would be better to disable object dtypes (or emit a warning about
>> the possible bug when using them) than to delay the release. Object
>> memoryviews are rare in the first place, and those who contain themselves
>> should be very rare.

+1 to a warning, especially if the problem is only related to circular
references.

- Robert

From markflorisson88 at gmail.com  Sat Oct 29 19:44:07 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 29 Oct 2011 18:44:07 +0100
Subject: [Cython] Cython 0.16
In-Reply-To: <CADiQ+QAb8YXcEuWRG0dn1e=6TkW9ZDGOyr+y=A_0k+9oCmziQg@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
	<CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
	<CADiQ+QAb8YXcEuWRG0dn1e=6TkW9ZDGOyr+y=A_0k+9oCmziQg@mail.gmail.com>
Message-ID: <CANg26EW66LeHMs=kvZ74A-TSo8zXEWvpvUBe_LequHyND04dOA@mail.gmail.com>

On 29 October 2011 18:05, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> On Sat, Oct 29, 2011 at 4:41 AM, mark florisson
> <markflorisson88 at gmail.com> wrote:
>> Hm ok I'll disable them then. Pointers and some other dtypes are also
>> not supported yet. As for the documentation, have you guys reviewed
>> the documentation for fused types and memoryviews?
>
> I looked at the fused types docs.
>
>> For instance this
>> is the introduction for memoryviews:
>>
>> "
>> Typed memoryviews can be used for efficient access to buffers. It is
>> similar to the current buffer support, but has more features and
>> cleaner syntax. A memoryview can be used in any context (function
>> parameters, module-level, cdef class attribute, etc) and can be
>> obtained from any object that exposes the PEP 3118 buffer interface.
>> "
>>
>> but I'm not sure this new functionality won't confuse users of the old
>> buffer support.
>>
>> For fused types, cython.numeric only includes long, double and double
>> complex. I think that should be changed to short, int, long, float,
>> double, float complex and double complex.
>
> Yes. What about size_t, ssize_t, and Py_ssize_t?

Hmm, these things don't contain unsigned types as they may be chosen
when calling directly (as they're longer), but they will cause
problems for negative values. I think unsigned types should be
explicit. I think size_t is also more for representing the size of
objects, I'm not sure you'd want the same code operating on size_t and
say, ints. Py_ssize_t is typically used as the type for indices, but
not much else I think, so it might be weird to include it.

>> I was deliberately avoiding
>> long long and long double as they (if not used as a base type) would
>> be preferred over the others and may be a lot slower. But then, such
>> usage wouldn't be very useful. Should I include them then?
>
> That's a good question. Perhaps these two could be used if explicitly
> requested, or for dispatching from a Python long (in Py2) or
> non-word-sized int (in Py3).

I'm not sure I understand, how would you request them explicitly? The
user could always just created a fused type manually if he/she wants
long long, long double, or long double complex.

>> On 29 October 2011 10:30, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> wrote:
>>> Re b), it would be better to disable object dtypes (or emit a warning about
>>> the possible bug when using them) than to delay the release. Object
>>> memoryviews are rare in the first place, and those who contain themselves
>>> should be very rare.
>
> +1 to a warning, especially if the problem is only related to circular
> references.

Hmm, a warning, ok.

Do we desperately want to get a release out, or do we want it for
somewhere e.g. at the end of the week? Because fixing this issue
wouldn't be too hard I think, and it might give us some more time to
review and merge Vitja's code. super() is pretty neat.

> - Robert
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From markflorisson88 at gmail.com  Sat Oct 29 19:47:33 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 29 Oct 2011 18:47:33 +0100
Subject: [Cython] Cython 0.16
In-Reply-To: <CANg26EW66LeHMs=kvZ74A-TSo8zXEWvpvUBe_LequHyND04dOA@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
	<CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
	<CADiQ+QAb8YXcEuWRG0dn1e=6TkW9ZDGOyr+y=A_0k+9oCmziQg@mail.gmail.com>
	<CANg26EW66LeHMs=kvZ74A-TSo8zXEWvpvUBe_LequHyND04dOA@mail.gmail.com>
Message-ID: <CANg26EVgnW8sPOd5KC0P0UMG6L2c0f7XvT3Guc4ZoaN+STAhUQ@mail.gmail.com>

On 29 October 2011 18:44, mark florisson <markflorisson88 at gmail.com> wrote:
> On 29 October 2011 18:05, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>> On Sat, Oct 29, 2011 at 4:41 AM, mark florisson
>> <markflorisson88 at gmail.com> wrote:
>>> Hm ok I'll disable them then. Pointers and some other dtypes are also
>>> not supported yet. As for the documentation, have you guys reviewed
>>> the documentation for fused types and memoryviews?
>>
>> I looked at the fused types docs.
>>
>>> For instance this
>>> is the introduction for memoryviews:
>>>
>>> "
>>> Typed memoryviews can be used for efficient access to buffers. It is
>>> similar to the current buffer support, but has more features and
>>> cleaner syntax. A memoryview can be used in any context (function
>>> parameters, module-level, cdef class attribute, etc) and can be
>>> obtained from any object that exposes the PEP 3118 buffer interface.
>>> "
>>>
>>> but I'm not sure this new functionality won't confuse users of the old
>>> buffer support.
>>>
>>> For fused types, cython.numeric only includes long, double and double
>>> complex. I think that should be changed to short, int, long, float,
>>> double, float complex and double complex.
>>
>> Yes. What about size_t, ssize_t, and Py_ssize_t?
>
> Hmm, these things don't contain unsigned types as they may be chosen
> when calling directly (as they're longer), but they will cause
> problems for negative values. I think unsigned types should be
> explicit. I think size_t is also more for representing the size of
> objects, I'm not sure you'd want the same code operating on size_t and
> say, ints. Py_ssize_t is typically used as the type for indices, but
> not much else I think, so it might be weird to include it.

Yes, I think the long long and long double ones should just be
excluded. If people want them they can fuse their own types.

>>> I was deliberately avoiding
>>> long long and long double as they (if not used as a base type) would
>>> be preferred over the others and may be a lot slower. But then, such
>>> usage wouldn't be very useful. Should I include them then?
>>
>> That's a good question. Perhaps these two could be used if explicitly
>> requested, or for dispatching from a Python long (in Py2) or
>> non-word-sized int (in Py3).
>
> I'm not sure I understand, how would you request them explicitly? The
> user could always just created a fused type manually if he/she wants
> long long, long double, or long double complex.
>
>>> On 29 October 2011 10:30, Dag Sverre Seljebotn
>>> <d.s.seljebotn at astro.uio.no> wrote:
>>>> Re b), it would be better to disable object dtypes (or emit a warning about
>>>> the possible bug when using them) than to delay the release. Object
>>>> memoryviews are rare in the first place, and those who contain themselves
>>>> should be very rare.
>>
>> +1 to a warning, especially if the problem is only related to circular
>> references.
>
> Hmm, a warning, ok.
>
> Do we desperately want to get a release out, or do we want it for
> somewhere e.g. at the end of the week? Because fixing this issue
> wouldn't be too hard I think, and it might give us some more time to
> review and merge Vitja's code. super() is pretty neat.
>
>> - Robert
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>

From markflorisson88 at gmail.com  Sat Oct 29 19:50:57 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 29 Oct 2011 18:50:57 +0100
Subject: [Cython] Cython 0.16
In-Reply-To: <CANg26EVgnW8sPOd5KC0P0UMG6L2c0f7XvT3Guc4ZoaN+STAhUQ@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
	<CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
	<CADiQ+QAb8YXcEuWRG0dn1e=6TkW9ZDGOyr+y=A_0k+9oCmziQg@mail.gmail.com>
	<CANg26EW66LeHMs=kvZ74A-TSo8zXEWvpvUBe_LequHyND04dOA@mail.gmail.com>
	<CANg26EVgnW8sPOd5KC0P0UMG6L2c0f7XvT3Guc4ZoaN+STAhUQ@mail.gmail.com>
Message-ID: <CANg26EVhE7ydQy=o8TsSX1Ujng9J0_8mydDa=BRpCUt577Sdfg@mail.gmail.com>

On 29 October 2011 18:47, mark florisson <markflorisson88 at gmail.com> wrote:
> On 29 October 2011 18:44, mark florisson <markflorisson88 at gmail.com> wrote:
>> On 29 October 2011 18:05, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>>> On Sat, Oct 29, 2011 at 4:41 AM, mark florisson
>>> <markflorisson88 at gmail.com> wrote:
>>>> Hm ok I'll disable them then. Pointers and some other dtypes are also
>>>> not supported yet. As for the documentation, have you guys reviewed
>>>> the documentation for fused types and memoryviews?
>>>
>>> I looked at the fused types docs.
>>>
>>>> For instance this
>>>> is the introduction for memoryviews:
>>>>
>>>> "
>>>> Typed memoryviews can be used for efficient access to buffers. It is
>>>> similar to the current buffer support, but has more features and
>>>> cleaner syntax. A memoryview can be used in any context (function
>>>> parameters, module-level, cdef class attribute, etc) and can be
>>>> obtained from any object that exposes the PEP 3118 buffer interface.
>>>> "
>>>>
>>>> but I'm not sure this new functionality won't confuse users of the old
>>>> buffer support.
>>>>
>>>> For fused types, cython.numeric only includes long, double and double
>>>> complex. I think that should be changed to short, int, long, float,
>>>> double, float complex and double complex.
>>>
>>> Yes. What about size_t, ssize_t, and Py_ssize_t?
>>
>> Hmm, these things don't contain unsigned types as they may be chosen
>> when calling directly (as they're longer), but they will cause
>> problems for negative values. I think unsigned types should be
>> explicit. I think size_t is also more for representing the size of
>> objects, I'm not sure you'd want the same code operating on size_t and
>> say, ints. Py_ssize_t is typically used as the type for indices, but
>> not much else I think, so it might be weird to include it.
>
> Yes, I think the long long and long double ones should just be
> excluded. If people want them they can fuse their own types.
>
>>>> I was deliberately avoiding
>>>> long long and long double as they (if not used as a base type) would
>>>> be preferred over the others and may be a lot slower. But then, such
>>>> usage wouldn't be very useful. Should I include them then?
>>>
>>> That's a good question. Perhaps these two could be used if explicitly
>>> requested, or for dispatching from a Python long (in Py2) or
>>> non-word-sized int (in Py3).
>>
>> I'm not sure I understand, how would you request them explicitly? The
>> user could always just created a fused type manually if he/she wants
>> long long, long double, or long double complex.
>>
>>>> On 29 October 2011 10:30, Dag Sverre Seljebotn
>>>> <d.s.seljebotn at astro.uio.no> wrote:
>>>>> Re b), it would be better to disable object dtypes (or emit a warning about
>>>>> the possible bug when using them) than to delay the release. Object
>>>>> memoryviews are rare in the first place, and those who contain themselves
>>>>> should be very rare.
>>>
>>> +1 to a warning, especially if the problem is only related to circular
>>> references.
>>
>> Hmm, a warning, ok.
>>
>> Do we desperately want to get a release out, or do we want it for
>> somewhere e.g. at the end of the week? Because fixing this issue
>> wouldn't be too hard I think, and it might give us some more time to
>> review and merge Vitja's code. super() is pretty neat.
>>
>>> - Robert
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>>
>

Maybe numpy.pxd could provide a numpy version of integral, floating
and numeric, that will contain all relevant numpy types.

From robertwb at math.washington.edu  Sat Oct 29 19:59:18 2011
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sat, 29 Oct 2011 10:59:18 -0700
Subject: [Cython] Cython 0.16
In-Reply-To: <CANg26EW66LeHMs=kvZ74A-TSo8zXEWvpvUBe_LequHyND04dOA@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
	<CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
	<CADiQ+QAb8YXcEuWRG0dn1e=6TkW9ZDGOyr+y=A_0k+9oCmziQg@mail.gmail.com>
	<CANg26EW66LeHMs=kvZ74A-TSo8zXEWvpvUBe_LequHyND04dOA@mail.gmail.com>
Message-ID: <CADiQ+QAtfvP+0XUsjgYJKxX_ji6io6Si_JnoPQ8M_wQBKpV3WA@mail.gmail.com>

On Sat, Oct 29, 2011 at 10:44 AM, mark florisson
<markflorisson88 at gmail.com> wrote:
> On 29 October 2011 18:05, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>> On Sat, Oct 29, 2011 at 4:41 AM, mark florisson
>> <markflorisson88 at gmail.com> wrote:
>>> Hm ok I'll disable them then. Pointers and some other dtypes are also
>>> not supported yet. As for the documentation, have you guys reviewed
>>> the documentation for fused types and memoryviews?
>>
>> I looked at the fused types docs.
>>
>>> For instance this
>>> is the introduction for memoryviews:
>>>
>>> "
>>> Typed memoryviews can be used for efficient access to buffers. It is
>>> similar to the current buffer support, but has more features and
>>> cleaner syntax. A memoryview can be used in any context (function
>>> parameters, module-level, cdef class attribute, etc) and can be
>>> obtained from any object that exposes the PEP 3118 buffer interface.
>>> "
>>>
>>> but I'm not sure this new functionality won't confuse users of the old
>>> buffer support.
>>>
>>> For fused types, cython.numeric only includes long, double and double
>>> complex. I think that should be changed to short, int, long, float,
>>> double, float complex and double complex.
>>
>> Yes. What about size_t, ssize_t, and Py_ssize_t?
>
> Hmm, these things don't contain unsigned types as they may be chosen
> when calling directly (as they're longer), but they will cause
> problems for negative values. I think unsigned types should be
> explicit.

You're right about unsigned.

> I think size_t is also more for representing the size of
> objects, I'm not sure you'd want the same code operating on size_t and
> say, ints. Py_ssize_t is typically used as the type for indices, but
> not much else I think, so it might be weird to include it.

I was thinking if one had

cdef foo(integral x):
   ...

then foo[ssize_t]

should be available, but perhaps not used implicitly. I suppose this
would be an exceptional case for dispatching, and

cdef fused my_type:
    integral
    ssize_t
    long long

is easy enough for the user to do.

>>> I was deliberately avoiding
>>> long long and long double as they (if not used as a base type) would
>>> be preferred over the others and may be a lot slower. But then, such
>>> usage wouldn't be very useful. Should I include them then?
>>
>> That's a good question. Perhaps these two could be used if explicitly
>> requested, or for dispatching from a Python long (in Py2) or
>> non-word-sized int (in Py3).
>
> I'm not sure I understand, how would you request them explicitly? The
> user could always just created a fused type manually if he/she wants
> long long, long double, or long double complex.
>
>>> On 29 October 2011 10:30, Dag Sverre Seljebotn
>>> <d.s.seljebotn at astro.uio.no> wrote:
>>>> Re b), it would be better to disable object dtypes (or emit a warning about
>>>> the possible bug when using them) than to delay the release. Object
>>>> memoryviews are rare in the first place, and those who contain themselves
>>>> should be very rare.
>>
>> +1 to a warning, especially if the problem is only related to circular
>> references.
>
> Hmm, a warning, ok.
>
> Do we desperately want to get a release out, or do we want it for
> somewhere e.g. at the end of the week? Because fixing this issue
> wouldn't be too hard I think, and it might give us some more time to
> review and merge Vitja's code. super() is pretty neat.

No hurry, but I was thinking it'd be good to get the ball rolling and
get these features released.

- Robert

From markflorisson88 at gmail.com  Sat Oct 29 20:03:16 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 29 Oct 2011 19:03:16 +0100
Subject: [Cython] Cython 0.16
In-Reply-To: <CADiQ+QAtfvP+0XUsjgYJKxX_ji6io6Si_JnoPQ8M_wQBKpV3WA@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
	<CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
	<CADiQ+QAb8YXcEuWRG0dn1e=6TkW9ZDGOyr+y=A_0k+9oCmziQg@mail.gmail.com>
	<CANg26EW66LeHMs=kvZ74A-TSo8zXEWvpvUBe_LequHyND04dOA@mail.gmail.com>
	<CADiQ+QAtfvP+0XUsjgYJKxX_ji6io6Si_JnoPQ8M_wQBKpV3WA@mail.gmail.com>
Message-ID: <CANg26EWJPP=9zwA1J1ViixXT8BPKxmgc9jOmWa3_K_L_oxt6Ww@mail.gmail.com>

On 29 October 2011 18:59, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> On Sat, Oct 29, 2011 at 10:44 AM, mark florisson
> <markflorisson88 at gmail.com> wrote:
>> On 29 October 2011 18:05, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>>> On Sat, Oct 29, 2011 at 4:41 AM, mark florisson
>>> <markflorisson88 at gmail.com> wrote:
>>>> Hm ok I'll disable them then. Pointers and some other dtypes are also
>>>> not supported yet. As for the documentation, have you guys reviewed
>>>> the documentation for fused types and memoryviews?
>>>
>>> I looked at the fused types docs.
>>>
>>>> For instance this
>>>> is the introduction for memoryviews:
>>>>
>>>> "
>>>> Typed memoryviews can be used for efficient access to buffers. It is
>>>> similar to the current buffer support, but has more features and
>>>> cleaner syntax. A memoryview can be used in any context (function
>>>> parameters, module-level, cdef class attribute, etc) and can be
>>>> obtained from any object that exposes the PEP 3118 buffer interface.
>>>> "
>>>>
>>>> but I'm not sure this new functionality won't confuse users of the old
>>>> buffer support.
>>>>
>>>> For fused types, cython.numeric only includes long, double and double
>>>> complex. I think that should be changed to short, int, long, float,
>>>> double, float complex and double complex.
>>>
>>> Yes. What about size_t, ssize_t, and Py_ssize_t?
>>
>> Hmm, these things don't contain unsigned types as they may be chosen
>> when calling directly (as they're longer), but they will cause
>> problems for negative values. I think unsigned types should be
>> explicit.
>
> You're right about unsigned.
>
>> I think size_t is also more for representing the size of
>> objects, I'm not sure you'd want the same code operating on size_t and
>> say, ints. Py_ssize_t is typically used as the type for indices, but
>> not much else I think, so it might be weird to include it.
>
> I was thinking if one had
>
> cdef foo(integral x):
> ? ...
>
> then foo[ssize_t]
>
> should be available, but perhaps not used implicitly. I suppose this
> would be an exceptional case for dispatching, and
>
> cdef fused my_type:
> ? ?integral
> ? ?ssize_t
> ? ?long long
>
> is easy enough for the user to do.

Ah, I see. Yeah, that's not implemented :P

>>>> I was deliberately avoiding
>>>> long long and long double as they (if not used as a base type) would
>>>> be preferred over the others and may be a lot slower. But then, such
>>>> usage wouldn't be very useful. Should I include them then?
>>>
>>> That's a good question. Perhaps these two could be used if explicitly
>>> requested, or for dispatching from a Python long (in Py2) or
>>> non-word-sized int (in Py3).
>>
>> I'm not sure I understand, how would you request them explicitly? The
>> user could always just created a fused type manually if he/she wants
>> long long, long double, or long double complex.
>>
>>>> On 29 October 2011 10:30, Dag Sverre Seljebotn
>>>> <d.s.seljebotn at astro.uio.no> wrote:
>>>>> Re b), it would be better to disable object dtypes (or emit a warning about
>>>>> the possible bug when using them) than to delay the release. Object
>>>>> memoryviews are rare in the first place, and those who contain themselves
>>>>> should be very rare.
>>>
>>> +1 to a warning, especially if the problem is only related to circular
>>> references.
>>
>> Hmm, a warning, ok.
>>
>> Do we desperately want to get a release out, or do we want it for
>> somewhere e.g. at the end of the week? Because fixing this issue
>> wouldn't be too hard I think, and it might give us some more time to
>> review and merge Vitja's code. super() is pretty neat.
>
> No hurry, but I was thinking it'd be good to get the ball rolling and
> get these features released.
>
> - Robert
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
>

From stefan_ml at behnel.de  Sun Oct 30 09:49:11 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 30 Oct 2011 09:49:11 +0100
Subject: [Cython] Cython 0.16
In-Reply-To: <CADiQ+QDTrvjDxxbmw905SotQM5-X2ac-ui54XK0LK0VhZdVM_A@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>	<4EAC12CA.5040200@behnel.de>
	<CADiQ+QDTrvjDxxbmw905SotQM5-X2ac-ui54XK0LK0VhZdVM_A@mail.gmail.com>
Message-ID: <4EAD0F87.5000208@behnel.de>

Robert Bradshaw, 29.10.2011 18:58:
> On Sat, Oct 29, 2011 at 7:50 AM, Stefan Behnel wrote:
>> I still haven't investigated the decorator issue that appeared in the Sage
>> tests. I think it's related to decorators on module level def functions,
>> which would suggest that it's best to eventually fix it as part of the
>> function implementation changes that Vitja has started. But there may still
>> be a simpler work-around somewhere that I'm not seeing yet.
>>
>> I basically broke the Sage tests by resolving a bug (593 IIRC), and both
>> don't currently work together. So, a variant would be to revert my changes
>> for 0.16 and just leave the bug in, if that keeps us from breaking existing
>> code for now.
>>
>> But even leaving that out, the Sage tests look seriously broken currently:
>>
>> https://sage.math.washington.edu:8091/hudson/view/All/job/sage-tests/lastCompletedBuild/consoleFull
>
> I recently fixed the Sage build (the errors on public api for non
> public types broke it). As for those tests, they seem to be related to
> name mangling for double-underscore names. Did something change here
> recently?

Not exactly recently. I implemented private name mangling for cdef classes, 
but I could swear that that was long before the Sage build started showing 
these failures. At least, I'm surprised to see them now.

Stefan

From markflorisson88 at gmail.com  Sun Oct 30 16:39:24 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 30 Oct 2011 15:39:24 +0000
Subject: [Cython] Cython 0.16
In-Reply-To: <CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
Message-ID: <CANg26EWebPdK3+N8qoqktTKP6reVhoDui9XL1CcL-AZSyFmgCw@mail.gmail.com>

On 28 October 2011 21:59, mark florisson <markflorisson88 at gmail.com> wrote:
> On 28 October 2011 21:55, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>> With Mark's fused types and memory views going in, I think it's about
>> time for a new release. Thoughts? Anyone want to volunteer to take up
>> the process?
>>
>> - Robert
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>
> That'd be cool. However there are a few outstanding issues:
> ? ?a) the compiler is somewhat slower (possible solution: lazy utility codes)
> ? ?b) there's a potential memory leak problem for memoryviews with
> object dtype that contain themselves, this still needs investigation.
>
> As for a), Stefan mentioned code spending a lot of time in sub.
> Stefan, could you post the code for this that made Cython compile very
> slowly?
>

It seems that NumPy does not support cyclic references, it has
'(traverseproc)0,                            /* tp_traverse */' in its
source (PyArray_Type is the ndarray right?). Indeed, this code prints
"deallocated!" only if there is no reference cycle:


import numpy

cdef class DeallocateMe(object):
    def __dealloc__(self):
        print "deallocated!"
a = numpy.arange(20, dtype=numpy.object)
a[10] = DeallocateMe()
a[1] = a # <- try commenting out this line
del a
import gc
gc.collect()

Anyway, I got it to traverse and clear the buffer object and the
memoryview slice struct, so it should work if the buffer exporter
supports cycles.

From markflorisson88 at gmail.com  Mon Oct 31 23:13:06 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 31 Oct 2011 22:13:06 +0000
Subject: [Cython] Cython 0.16
In-Reply-To: <CANg26EW_c88FCYMEaJFAG5QySMenLmLAJ2qsaxx-GhEFbdyvmg@mail.gmail.com>
References: <CADiQ+QB28Z566jk_LnRZYw8Op4zfmnF65xYV1W5ME-CVA4Gz5w@mail.gmail.com>
	<CANg26EXNsNsgVWHaDYNKJSsZ2cM905OvumSyok9SCgGeUzd+jw@mail.gmail.com>
	<CADiQ+QB1XyX5Lot7XdRLBboL1vT++z7UEJE49qRK2w0ZfNDoWQ@mail.gmail.com>
	<f3e0ce65-9257-47a9-aedd-68c6a87a956d@email.android.com>
	<CANg26EXDji3grAB8xuvzdnEvSm3JNhuFk=Rz4aabuWi72gt8-w@mail.gmail.com>
	<CANg26EW_c88FCYMEaJFAG5QySMenLmLAJ2qsaxx-GhEFbdyvmg@mail.gmail.com>
Message-ID: <CANg26EWhkvtLTH-wX89oURsMV_DigavtHk3WuXdz+=XOCYG1dg@mail.gmail.com>

We can now pass a chunksize argument into prange:
https://github.com/cython/cython/commit/5c3e77d3c70686fedd5619d7267728fc819b4c60

On 29 October 2011 14:14, mark florisson <markflorisson88 at gmail.com> wrote:
> Before we do a release, would anyone be opposed to a 'chunksize'
> keyword argument to prange()? That may have significant performance
> impacts.
>
> On 29 October 2011 12:41, mark florisson <markflorisson88 at gmail.com> wrote:
>> Hm ok I'll disable them then. Pointers and some other dtypes are also
>> not supported yet. As for the documentation, have you guys reviewed
>> the documentation for fused types and memoryviews? For instance this
>> is the introduction for memoryviews:
>>
>> "
>> Typed memoryviews can be used for efficient access to buffers. It is
>> similar to the current buffer support, but has more features and
>> cleaner syntax. A memoryview can be used in any context (function
>> parameters, module-level, cdef class attribute, etc) and can be
>> obtained from any object that exposes the PEP 3118 buffer interface.
>> "
>>
>> but I'm not sure this new functionality won't confuse users of the old
>> buffer support.
>>
>> For fused types, cython.numeric only includes long, double and double
>> complex. I think that should be changed to short, int, long, float,
>> double, float complex and double complex. I was deliberately avoiding
>> long long and long double as they (if not used as a base type) would
>> be preferred over the others and may be a lot slower. But then, such
>> usage wouldn't be very useful. Should I include them then?
>>
>> On 29 October 2011 10:30, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> wrote:
>>> Re b), it would be better to disable object dtypes (or emit a warning about
>>> the possible bug when using them) than to delay the release. Object
>>> memoryviews are rare in the first place, and those who contain themselves
>>> should be very rare.
>>> --
>>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>>>
>>> Robert Bradshaw <robertwb at math.washington.edu> wrote:
>>>>
>>>> On Fri, Oct 28, 2011 at 1:59 PM, mark florisson
>>>> <markflorisson88 at gmail.com> wrote: > On 28 October 2011 21:55, Robert
>>>> Bradshaw <robertwb at math.washington.edu> wrote: >> With Mark's fused types
>>>> and memory views going in, I think it's about >> time for a new release.
>>>> Thoughts? Anyone want to volunteer to take up >> the process? >> >> - Robert
>>>> >>
>>>> ________________________________
>>>> >> cython-devel mailing list >> cython-devel at python.org >>
>>>> >> http://mail.python.org/mailman/listinfo/cython-devel >> > > That'd be cool.
>>>> >> However there are a few outstanding issues: > ? ?a) the compiler is somewhat
>>>> >> slower (possible solution: lazy utility codes) Yeah, I forgot about that.
>>>> >> This should get resolved. Lazy utility codes (perhaps breaking them up)
>>>> >> would probably got us most of the way there. Long term, I really like the
>>>> >> "declaration caching" idea which could be used for users .pxd files as well
>>>> >> as internally. > ? ?b) there's a potential memory leak problem for
>>>> >> memoryviews with > object dtype that contain themselves, this still needs
>>>> >> investigation. I think this could be mentioned as a caviat rather than being
>>>> >> a blocker. > As for a), Stefan mentioned code spending a lot of time in sub.
>>>> >> > Stefan, could you post the code for this that made Cython compile very >
>>>> >> slowly? >
>>>> ________________________________
>>>> > cython-devel mailing list > cython-devel at python.org >
>>>> > http://mail.python.org/mailman/listinfo/cython-devel >
>>>> ________________________________
>>>> cython-devel mailing list cython-devel at python.org
>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>
>>>
>>
>