From greg.ewing at canterbury.ac.nz  Fri Feb  1 01:11:42 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 01 Feb 2013 13:11:42 +1300
Subject: [Cython] [cython-users] Recommendations for efficient typed
 arrays in Cython?
In-Reply-To: <510AD4B5.9000600@molden.no>
References: <f5cfcb14-ab48-4d3f-abea-e9a581de9df6@googlegroups.com>
	<5107E786.7040202@molden.no>
	<CADiQ+QDAH+WWVVKHhk77dMPvd7htggLWUs=1wGGfmgsjfkdhEg@mail.gmail.com>
	<4cf979ef-8684-4438-912e-4680afb01808@googlegroups.com>
	<510AD4B5.9000600@molden.no>
Message-ID: <510B083E.5030203@canterbury.ac.nz>

On 01/02/13 09:31, Sturla Molden wrote:
> cdef object a
> cdef list b
> cdef foobar c
>
> etc to define Python variables. 'cdef' seems to indicate that it is a C
> declaration, yet here it is not.

Yes, it is. In this context, the cdef isn't about the type of the
variable, it's about where and how it's stored and accessed. The
above declarations result in the generation of C code something like:

    PyObject *a;
    PyListObject *b;
    Foobar *c;

They are then accessed directly by the generated C code.

Without the cdef, these variables would be stored wherever Python
normally stores variables for the relevant scope, which could be
in a module or instance dict, and the usual Python/C API machinery
is used to access them.

Distinguishing between Python and C types would be problematic
anyway, since a PyObject* is both a Python type *and* a C type.

> Neither does this cdef syntax allow us to declare Python int and float
> statically.

I've never found the need to declare a Python int or float
statically, but a way could be provided to access these types
if need be. Maybe Cython has already done this, I don't know.

-- 
Greg

From sturla at molden.no  Fri Feb  1 17:11:44 2013
From: sturla at molden.no (Sturla Molden)
Date: Fri, 01 Feb 2013 17:11:44 +0100
Subject: [Cython] [cython-users] Recommendations for efficient typed
 arrays in Cython?
In-Reply-To: <510B083E.5030203@canterbury.ac.nz>
References: <f5cfcb14-ab48-4d3f-abea-e9a581de9df6@googlegroups.com>
	<5107E786.7040202@molden.no>
	<CADiQ+QDAH+WWVVKHhk77dMPvd7htggLWUs=1wGGfmgsjfkdhEg@mail.gmail.com>
	<4cf979ef-8684-4438-912e-4680afb01808@googlegroups.com>
	<510AD4B5.9000600@molden.no> <510B083E.5030203@canterbury.ac.nz>
Message-ID: <510BE940.3030203@molden.no>

On 01.02.2013 01:11, Greg Ewing wrote:

> Without the cdef, these variables would be stored wherever Python
> normally stores variables for the relevant scope, which could be
> in a module or instance dict, and the usual Python/C API machinery
> is used to access them.

> Distinguishing between Python and C types would be problematic
> anyway, since a PyObject* is both a Python type *and* a C type.

Really?

The way I see it, "object" is a Python type and "PyObject*" is a C type. 
That is, PyObject* is just a raw C pointer with respect to behavior.


Sturla

From greg.ewing at canterbury.ac.nz  Sat Feb  2 01:23:29 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 02 Feb 2013 13:23:29 +1300
Subject: [Cython] [cython-users] Recommendations for efficient typed
 arrays in Cython?
In-Reply-To: <510BE940.3030203@molden.no>
References: <f5cfcb14-ab48-4d3f-abea-e9a581de9df6@googlegroups.com>
	<5107E786.7040202@molden.no>
	<CADiQ+QDAH+WWVVKHhk77dMPvd7htggLWUs=1wGGfmgsjfkdhEg@mail.gmail.com>
	<4cf979ef-8684-4438-912e-4680afb01808@googlegroups.com>
	<510AD4B5.9000600@molden.no> <510B083E.5030203@canterbury.ac.nz>
	<510BE940.3030203@molden.no>
Message-ID: <510C5C81.4060008@canterbury.ac.nz>

Sturla Molden wrote:
> The way I see it, "object" is a Python type and "PyObject*" is a C type. 
> That is, PyObject* is just a raw C pointer with respect to behavior.

Well... while it's possible to declare something as PyObject*
in Cython and get raw pointer behaviour, it's something you
should only do in very rare circumstances, because you're
then totally on your own when it comes to reference counting
and exception handling.

If you're suggesting that 'def object foo' should give Python
reference semantics and 'cdef object foo' raw C pointer
semantics, that would lead to a world of pain.

-- 
Greg

From sturla at molden.no  Mon Feb  4 13:12:56 2013
From: sturla at molden.no (Sturla Molden)
Date: Mon, 04 Feb 2013 13:12:56 +0100
Subject: [Cython] [cython-users] Recommendations for efficient typed
 arrays in Cython?
In-Reply-To: <510C5C81.4060008@canterbury.ac.nz>
References: <f5cfcb14-ab48-4d3f-abea-e9a581de9df6@googlegroups.com>
	<5107E786.7040202@molden.no>
	<CADiQ+QDAH+WWVVKHhk77dMPvd7htggLWUs=1wGGfmgsjfkdhEg@mail.gmail.com>
	<4cf979ef-8684-4438-912e-4680afb01808@googlegroups.com>
	<510AD4B5.9000600@molden.no> <510B083E.5030203@canterbury.ac.nz>
	<510BE940.3030203@molden.no> <510C5C81.4060008@canterbury.ac.nz>
Message-ID: <510FA5C8.2020602@molden.no>

On 02.02.2013 01:23, Greg Ewing wrote:

> If you're suggesting that 'def object foo' should give Python
> reference semantics and 'cdef object foo' raw C pointer
> semantics,

No I was not.

I was suggesting that static declarations of Python and C variables 
should have different keywords.

Because they behave differently e.g. with respect to reference counting, 
it can be confusing to new users. For example I was replying to a Cython 
user who thought anything declared 'cdef' was reference counted. It 
might not be obvious to a new Cython user what can be put in a Python 
list and what can be put in an STL vector.

"cdef" refers to storage in the generated C, not to the semantics of 
Cython. But how and where variables are stored in the generated C is an 
implementation detail. Semantically the difference is between static and 
dynamic variables.


Sturla


From robertwb at gmail.com  Mon Feb  4 23:12:28 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Mon, 4 Feb 2013 14:12:28 -0800
Subject: [Cython] [cython-users] Recommendations for efficient typed
 arrays in Cython?
In-Reply-To: <510FA5C8.2020602@molden.no>
References: <f5cfcb14-ab48-4d3f-abea-e9a581de9df6@googlegroups.com>
	<5107E786.7040202@molden.no>
	<CADiQ+QDAH+WWVVKHhk77dMPvd7htggLWUs=1wGGfmgsjfkdhEg@mail.gmail.com>
	<4cf979ef-8684-4438-912e-4680afb01808@googlegroups.com>
	<510AD4B5.9000600@molden.no>
	<510B083E.5030203@canterbury.ac.nz> <510BE940.3030203@molden.no>
	<510C5C81.4060008@canterbury.ac.nz> <510FA5C8.2020602@molden.no>
Message-ID: <CADiQ+QDNgNR1n01sT_YLfcgTKByFt9q0C_xee1wykuA1M+2MOQ@mail.gmail.com>

On Mon, Feb 4, 2013 at 4:12 AM, Sturla Molden <sturla at molden.no> wrote:
> On 02.02.2013 01:23, Greg Ewing wrote:
>
>> If you're suggesting that 'def object foo' should give Python
>> reference semantics and 'cdef object foo' raw C pointer
>> semantics,
>
>
> No I was not.
>
> I was suggesting that static declarations of Python and C variables should
> have different keywords.
>
> Because they behave differently e.g. with respect to reference counting, it
> can be confusing to new users. For example I was replying to a Cython user
> who thought anything declared 'cdef' was reference counted. It might not be
> obvious to a new Cython user what can be put in a Python list and what can
> be put in an STL vector.

I find the distinction obvious: if Python understands it, it can be
put in a Python list. If C++ understands it, it can be put in an STL
container. Of course I'm the antithesis of a "new user." We should at
least be producing obvious errors.

> "cdef" refers to storage in the generated C, not to the semantics of Cython.
> But how and where variables are stored in the generated C is an
> implementation detail. Semantically the difference is between static and
> dynamic variables.

I think reference counting is much more of an implementation detail
than how and where the variables are stored. When using Cython I
hardly ever think about reference counts, it just does the right thing
everywhere for me. From a performance perspective, aside from being
able to manipulate raw C numeric types, one of the most important
features is that functions and variables (both Python and C types) can
be statically rather than dynamically bound, and that you can specify
where this should happen.

In any case, whether "cdef A a" is reference counted or not depends on A
in a straightforward manner (it's refcounted if and only if it can be,
i.e. A is a subclass of object). Forcing the user to choose between
two different forms of "cdef" based on the type of A would be entirely
redundant.

- Robert

From greg.ewing at canterbury.ac.nz  Tue Feb  5 00:17:43 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 05 Feb 2013 12:17:43 +1300
Subject: [Cython] [cython-users] Recommendations for efficient typed
 arrays in Cython?
In-Reply-To: <510FA5C8.2020602@molden.no>
References: <f5cfcb14-ab48-4d3f-abea-e9a581de9df6@googlegroups.com>
	<5107E786.7040202@molden.no>
	<CADiQ+QDAH+WWVVKHhk77dMPvd7htggLWUs=1wGGfmgsjfkdhEg@mail.gmail.com>
	<4cf979ef-8684-4438-912e-4680afb01808@googlegroups.com>
	<510AD4B5.9000600@molden.no> <510B083E.5030203@canterbury.ac.nz>
	<510BE940.3030203@molden.no> <510C5C81.4060008@canterbury.ac.nz>
	<510FA5C8.2020602@molden.no>
Message-ID: <51104197.9030806@canterbury.ac.nz>

Sturla Molden wrote:
> I was replying to a Cython 
> user who thought anything declared 'cdef' was reference counted

That's a matter of user education. We can't use syntax to
address every possible misconception a user might have.

> "cdef" refers to storage in the generated C, not to the semantics of 
> Cython. But how and where variables are stored in the generated C is an 
> implementation detail.

Not entirely -- you can't access a cdef attribute of an
extension type using getattr(), for example. And external
C code only has direct access to cdef variables and
attributes.

-- 
Greg

From roed.math at gmail.com  Tue Feb  5 01:28:45 2013
From: roed.math at gmail.com (David Roe)
Date: Mon, 4 Feb 2013 18:28:45 -0600
Subject: [Cython] Two generators in one function
Message-ID: <CAChs6_ncRJSFStBfPe488_JLD7pd40MbMQ8+Et52ZeVx3g1sWg@mail.gmail.com>

Hi everyone,
I ran into the following problem using Cython 0.17.4 (current version of
Sage).

If you try to compile a file with the following function in it:

def test_double_gen(L):
    a = all(x != 0 for x in L)
    b = all(x != 1 for x in L)
    return a and b

you get errors from the Cython compiler about 'genexpr' being redefined.

Error compiling Cython file:
------------------------------------------------------------
...


def test_double_gen(L):
    a = all(x != 0 for x in L)
    b = all(x != 1 for x in L)
             ^
------------------------------------------------------------

cython_test.pyx:5:14: 'genexpr' already declared

Error compiling Cython file:
------------------------------------------------------------
...


def test_double_gen(L):
    a = all(x != 0 for x in L)
             ^
------------------------------------------------------------

cython_test.pyx:4:14: Previous declaration is here

Error compiling Cython file:
------------------------------------------------------------
...


def test_double_gen(L):
    a = all(x != 0 for x in L)
    b = all(x != 1 for x in L)
             ^
------------------------------------------------------------

cython_test.pyx:5:14: 'genexpr' redeclared

Are you currently only able to use one inline generator per function?
David

From jrobertray at gmail.com  Tue Feb  5 20:56:04 2013
From: jrobertray at gmail.com (J Robert Ray)
Date: Tue, 5 Feb 2013 11:56:04 -0800
Subject: [Cython] SIGSEGV in __Pyx_CyFunction_traverse
Message-ID: <CADb3U=6A6bnDodZVqdKpv_gBz-8bFqwLK_QpRh7Z+1uc9NsVAA@mail.gmail.com>

I was getting a crash during module init of a Cython module whenever a
garbage collection happened between a call to __Pyx_CyFunction_InitDefaults
and the code that populates the defaults.

The attached patch fixes the crash. This bug affects at least Cython 0.18
and 0.17.1.

__Pyx_CyFunction_InitDefaults was not completely zeroing the newly
allocated 'defaults' buffer.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cython.patch
Type: application/octet-stream
Size: 539 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/cython-devel/attachments/20130205/14123f5e/attachment.obj>

From Samuele.Kaplun at cern.ch  Thu Feb  7 10:16:28 2013
From: Samuele.Kaplun at cern.ch (Samuele Kaplun)
Date: Thu, 7 Feb 2013 10:16:28 +0100
Subject: [Cython] Possible bug when using cython -Wextra
Message-ID: <1737486.SGKB3DSTKz@pcsk4>

Hello,

I am not sure if this is a bug or it is the intended behaviour, however, 
consider for example this snippet:

[...]
def test():
    cdef int i
    for i from 0 <= i < 10:
        print "foo"
[...]

If I save it into x.pyx and compile it with:

$ cython -Wextra x.pyx

I obtain the warning:
[...]
warning: x.pyx:2:13: Unused entry 'i'
[...]

IMHO, this is a false positive since the i variable is indeed used as a
counter in the loop. I guess Cython considers it unused because it neither
appears on the right-hand side of an assignment nor is used as an argument
to a function, is that right?

Best regards,
	Samuele
-- 
Samuele Kaplun
Invenio Developer ** <http://invenio-software.org/>

From stefan_ml at behnel.de  Thu Feb  7 12:11:47 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 07 Feb 2013 12:11:47 +0100
Subject: [Cython] Possible bug when using cython -Wextra
In-Reply-To: <1737486.SGKB3DSTKz@pcsk4>
References: <1737486.SGKB3DSTKz@pcsk4>
Message-ID: <51138BF3.6030606@behnel.de>

Samuele Kaplun, 07.02.2013 10:16:
> I am not sure if this is a bug or it is the intended behaviour, however, 
> consider for example this snippet:
> 
> [...]
> def test():
>     cdef int i
>     for i from 0 <= i < 10:
>         print "foo"
> [...]
> 
> If I save it into x.pyx and compile it with:
> 
> $ cython -Wextra x.pyx
> 
> I obtain the warning:
> [...]
> warning: x.pyx:2:13: Unused entry 'i'
> [...]
> 
> IMHO, this is a false positive since the i variable is indeed used as a 
> counter in the loop. I guess cython considers it unused due to the fact that 
> it does not appear on the right hand side of an assignment nor it is further 
> used as an argument in a function, isn't it?

Yes, it actually is an unused variable in your code. There is no reference
to it, only assignments.

Stefan


From Samuele.Kaplun at cern.ch  Thu Feb  7 13:00:37 2013
From: Samuele.Kaplun at cern.ch (Samuele Kaplun)
Date: Thu, 7 Feb 2013 13:00:37 +0100
Subject: [Cython] Possible bug when using cython -Wextra
In-Reply-To: <51138BF3.6030606@behnel.de>
References: <1737486.SGKB3DSTKz@pcsk4> <51138BF3.6030606@behnel.de>
Message-ID: <1395070.2WGoObUNKK@pcsk4>

Dear Stefan,

On Thursday 7 February 2013 12:11:47, Stefan Behnel wrote:
> > [...]
> > 
> > def test():
> >     cdef int i
> >     
> >     for i from 0 <= i < 10:
> >         print "foo"
> > 
> > [...]
> 
> Yes, it actually is an unused variable in your code. There is no reference
> to it, only assignments.

Hmm. But it is used, albeit indirectly. What pattern would you suggest in
this case (i.e. to repeat a certain body a given number of times), in order to
avoid this warning?

Cheers (and thanks for your time!),
	Samuele

-- 
Samuele Kaplun
Invenio Developer ** <http://invenio-software.org/>

From stefan_ml at behnel.de  Thu Feb  7 18:32:59 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 07 Feb 2013 18:32:59 +0100
Subject: [Cython] analyse_types() refactoring
Message-ID: <5113E54B.6020708@behnel.de>

Hi,

I finally found the time to refactor the analysis phase.

https://github.com/cython/cython/commit/f9c385e08401ed96b5b0afb8411480037dc772b9

The methods now return a node, which allows them to replace themselves with
a different implementation.

Note that the relatively large code impact of this change also means that
you might easily run into merge conflicts with your own local changes, so
here's how to fix them. The transformation pattern is pretty
straightforward. The "analyse_types()" method returns "self", unless it wants to
replace itself, i.e. this

    def analyse_types(self, env):
        self.index.analyse_types(env)

becomes

    def analyse_types(self, env):
        self.index = self.index.analyse_types(env)
        return self

The "analyse_target_types()" method works the same, but because it calls
"analyse_types()" internally in most cases, it's more likely to look like this:

    def analyse_target_types(self, env):
        self.analyse_types(env)
        if self.type.is_pyobject:
            self.type = py_object_type

which now turns into this:

    def analyse_target_types(self, env):
        node = self.analyse_types(env)
        if node.type.is_pyobject:
            node.type = py_object_type
        return node

The same pattern obviously applies in the cases where the node needs to be
replaced in "analyse_types()". It would simply build and return a different
node. This also allows for in-place coercions of the current node, for example.
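
The replacement pattern described above can be sketched in a few lines of plain Python. This is only an illustration: the classes Node, Name, Constant and Index are made up for this note and are not Cython's real node classes, and "env" is simply a dict here.

```python
class Node:
    """Hypothetical stand-in for a compiler tree node."""

    def analyse_types(self, env):
        # Default behaviour: nothing to replace, so hand back the same node.
        return self


class Constant(Node):
    def __init__(self, value):
        self.value = value


class Name(Node):
    def __init__(self, name):
        self.name = name

    def analyse_types(self, env):
        # A node may replace itself with a different implementation.
        if self.name in env:
            return Constant(env[self.name])
        return self


class Index(Node):
    def __init__(self, index):
        self.index = index

    def analyse_types(self, env):
        # The parent has to store the returned node, otherwise the
        # replacement made by the child is silently lost.
        self.index = self.index.analyse_types(env)
        return self


if __name__ == "__main__":
    tree = Index(Name("n")).analyse_types({"n": 3})
    print(type(tree.index).__name__)
```

The point of the sketch is the assignment in Index.analyse_types(): a caller that ignores the return value keeps working only as long as no child ever replaces itself.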

With this change in place, we can now start to clean up old hacks like the
"__class__" replacement in AttributeNode. If anyone wants to give it a try,
please go ahead. :)

Stefan

From markflorisson88 at gmail.com  Thu Feb  7 18:46:22 2013
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 7 Feb 2013 11:46:22 -0600
Subject: [Cython] analyse_types() refactoring
In-Reply-To: <5113E54B.6020708@behnel.de>
References: <5113E54B.6020708@behnel.de>
Message-ID: <CANg26EV21_0nkNivJjQf8jqtYMaCPJ2z8viHBfGV+oWUPyKNhQ@mail.gmail.com>

On 7 February 2013 11:32, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Hi,
>
> I finally found the time to refactor the analysis phase.
>
> https://github.com/cython/cython/commit/f9c385e08401ed96b5b0afb8411480037dc772b9
>
> The methods now return a node, which allows them to replace themselves with
> a different implementation.
>
> Note that the relatively large code impact of this change also means that
> you might easily run into merge conflicts with your own local changes, so
> here's how to fix them. The transformation pattern is pretty straight
> forward. The "analyse_types()" method returns "self", unless it wants to
> replace itself, i.e. this
>
>     def analyse_types(self, env):
>         self.index.analyse_types(env)
>
> becomes
>
>     def analyse_types(self, env):
>         self.index = self.index.analyse_types(env)
>         return self
>
> The "analyse_target_types()" method works the same, but because it calls
> "analyse_types()" internally in most cases, it's more likely to look like this:
>
>     def analyse_target_types(self, env):
>         self.analyse_types(env)
>         if self.type.is_pyobject:
>             self.type = py_object_type
>
> which now turns into this:
>
>     def analyse_target_types(self, env):
>         node = self.analyse_types(env)
>         if node.type.is_pyobject:
>             node.type = py_object_type
>         return node
>
> The same pattern obviously applies in the cases where the node needs to be
> replaced in "analyse_types()". It would simply build and return a different
> node. This also allows for in-place coercions of the current node, for example.
>
> With this change in place, we can now start to clean up old hacks like the
> "__class__" replacement in AttributeNode. If anyone wants to give it a try,
> please go ahead. :)
>
> Stefan

What, you didn't like overriding __class__? :) That's great work
Stefan! Do you eventually want to move these methods to a visitor, or
do you want to keep them as methods?

From stefan_ml at behnel.de  Thu Feb  7 18:53:58 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 07 Feb 2013 18:53:58 +0100
Subject: [Cython] analyse_types() refactoring
In-Reply-To: <CANg26EV21_0nkNivJjQf8jqtYMaCPJ2z8viHBfGV+oWUPyKNhQ@mail.gmail.com>
References: <5113E54B.6020708@behnel.de>
	<CANg26EV21_0nkNivJjQf8jqtYMaCPJ2z8viHBfGV+oWUPyKNhQ@mail.gmail.com>
Message-ID: <5113EA36.3080700@behnel.de>

mark florisson, 07.02.2013 18:46:
> On 7 February 2013 11:32, Stefan Behnel wrote:
>> I finally found the time to refactor the analysis phase.
>>
>> https://github.com/cython/cython/commit/f9c385e08401ed96b5b0afb8411480037dc772b9
>>
>> The methods now return a node, which allows them to replace themselves with
>> a different implementation.
>
> Do you eventually want to move these methods to a visitor, or
> do you want to keep them as methods?

I think it makes more sense to keep them as methods. It's not so uncommon
that the order matters in which children are being analysed, and the result
of one child might even impact how another child is being analysed. There's
really a lot of logic in the analysis methods that makes it difficult to
extract a more general visitor pattern.
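
To illustrate the ordering argument, here is a small plain-Python sketch in which a parent analyses its children in a fixed order and passes the result of one child into the analysis of the other. The classes Literal, Target and Assignment are invented for this note and do not correspond to actual Cython compiler classes; "env" is a dict of declared types.

```python
class Literal:
    def __init__(self, value):
        self.value = value
        self.type = None

    def analyse_types(self, env):
        # Derive a type name from the Python value (illustrative only).
        self.type = type(self.value).__name__
        return self


class Target:
    def __init__(self, name):
        self.name = name
        self.type = None

    def analyse_target_types(self, env, rhs_type):
        # A declared type wins; otherwise fall back to the type of the value.
        self.type = env.get(self.name, rhs_type)
        return self


class Assignment:
    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs

    def analyse_types(self, env):
        # Order matters: the right-hand side goes first because its result
        # feeds into how the left-hand side is analysed.
        self.rhs = self.rhs.analyse_types(env)
        self.lhs = self.lhs.analyse_target_types(env, self.rhs.type)
        return self
```

A generic visitor that walks children in attribute order would have no natural place for the rhs_type argument, which is the kind of coupling the paragraph above refers to.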

Stefan


From stefan_ml at behnel.de  Thu Feb  7 19:05:55 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 07 Feb 2013 19:05:55 +0100
Subject: [Cython] Possible bug when using cython -Wextra
In-Reply-To: <1395070.2WGoObUNKK@pcsk4>
References: <1737486.SGKB3DSTKz@pcsk4> <51138BF3.6030606@behnel.de>
	<1395070.2WGoObUNKK@pcsk4>
Message-ID: <5113ED03.5060307@behnel.de>

Samuele Kaplun, 07.02.2013 13:00:
> On Thursday 7 February 2013 12:11:47, Stefan Behnel wrote:
>>> [...]
>>>
>>> def test():
>>>     cdef int i
>>>     
>>>     for i from 0 <= i < 10:
>>>         print "foo"
>>>
>>> [...]
>>
>> Yes, it actually is an unused variable in your code. There is no reference
>> to it, only assignments.
> 
> Hmm. But it is used, albeit indirectly. What pattern would you suggest in
> this case (i.e. to repeat a certain body a given number of times), in order to
> avoid this warning?

The normal thing to do in Python would be to use an underscore (i.e. "_")
as variable name. I don't think we currently special case that pattern,
though. Maybe we should.

Or maybe we should just drop the "unused variable" warning for loop
variables as they actually do something and serve a purpose, even if they
are never referenced.
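
The underscore convention mentioned above looks like this in plain Python. Note the assumption: this only shows the idiom itself. As said above, Cython may not special-case "_" yet, so the snippet makes no claim about silencing the -Wextra warning. The helper name repeat() is made up for this note.

```python
def repeat(body, times):
    # "_" tells the reader that the counter itself is never read.
    for _ in range(times):
        body()


if __name__ == "__main__":
    def say_foo():
        print("foo")

    repeat(say_foo, 10)
```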

Stefan


From d.s.seljebotn at astro.uio.no  Thu Feb  7 22:04:51 2013
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Thu, 07 Feb 2013 22:04:51 +0100
Subject: [Cython] analyse_types() refactoring
In-Reply-To: <5113E54B.6020708@behnel.de>
References: <5113E54B.6020708@behnel.de>
Message-ID: <511416F3.3050609@astro.uio.no>

On 02/07/2013 06:32 PM, Stefan Behnel wrote:
> Hi,
>
> I finally found the time to refactor the analysis phase.
>
> https://github.com/cython/cython/commit/f9c385e08401ed96b5b0afb8411480037dc772b9
>
> The methods now return a node, which allows them to replace themselves with
> a different implementation.
>
> Note that the relatively large code impact of this change also means that
> you might easily run into merge conflicts with your own local changes, so
> here's how to fix them. The transformation pattern is pretty straight
> forward. The "analyse_types()" method returns "self", unless it wants to
> replace itself, i.e. this
>
>      def analyse_types(self, env):
>          self.index.analyse_types(env)
>
> becomes
>
>      def analyse_types(self, env):
>          self.index = self.index.analyse_types(env)
>          return self

Wohoo!

Dag Sverre

From stefan_ml at behnel.de  Fri Feb  8 00:28:21 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 08 Feb 2013 00:28:21 +0100
Subject: [Cython] Keyword arguments for cdef functions and call for a little
	help
Message-ID: <51143895.9040907@behnel.de>

Hi,

in the current master branch, I implemented support for passing keyword
arguments into cdef functions. The names are mapped statically at compile
time to the names declared in the signature. This means that you can now do
this:

    cdef func(int x, bint flag):
        pass

    func(1, flag=True)

and it will be compiled down into (essentially) this calling code:

    func(1, 1);

Note that optional arguments at the end worked already, so this:

    cdef func(int x, bint flag1=False, bint flag2=True):
        pass

    func(1, flag1=True)

is equivalent to this call:

    func(1, 1, 1);

but you can't (currently) do this:

    func(1, flag2=True)     # error: flag1 is missing!
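
The static mapping described above can be pictured with a small plain-Python sketch that resolves a call into a purely positional argument list, including the "no gaps" restriction from the last example. This is an illustration under stated assumptions, not the code in Cython: map_call_args() and the REQUIRED marker are names invented for this note.

```python
REQUIRED = object()  # marker for parameters without a default value


def map_call_args(signature, positional, keywords):
    """Turn a call with keywords into a purely positional argument list.

    `signature` is a list of (name, default) pairs in declaration order.
    """
    names = [name for name, _ in signature]
    if len(positional) > len(names):
        raise TypeError("too many positional arguments")
    for name in keywords:
        if name not in names:
            raise TypeError("unknown keyword argument: " + name)
        if names.index(name) < len(positional):
            raise TypeError("multiple values for argument: " + name)

    args = list(positional)
    first_omitted = None
    for name, default in signature[len(positional):]:
        if name in keywords:
            if first_omitted is not None:
                # Mirrors the restriction described above: no gaps allowed.
                raise TypeError(first_omitted + " is missing")
            args.append(keywords[name])
        elif default is REQUIRED:
            raise TypeError(name + " is missing")
        else:
            if first_omitted is None:
                first_omitted = name
            args.append(default)
    return args


if __name__ == "__main__":
    sig = [("x", REQUIRED), ("flag1", False), ("flag2", True)]
    print(map_call_args(sig, [1], {"flag1": True}))
```

Because the names are resolved before any code runs, the generated call carries no keyword dictionary at all, which is why the C call in the examples above is a plain positional one.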


Obviously, you also can't use keyword arguments for functions that were
declared without argument names, e.g. in a case like this:

    cdef extern from "some_header.h":
        int somefunc(int, char*)


This feature also works for libc declarations, e.g.

    from libc.string cimport strstr

    print(strstr(needle="abc", haystack="xabcy"))

where strstr() is declared as

    cdef extern from "string.h":
        char *strstr (const char *haystack, const char *needle)

(I keep getting the argument order wrong here, so this really makes it
easier to read for me.)

We keep these declarations here in the source tree:

https://github.com/cython/cython/tree/master/Cython/Includes


However, I only now converted the parameter names in these standard
declarations to lower case, they were previously written as upper case
names (i.e. "HAYSTACK" and "NEEDLE"), which is a bit ugly when used as
keyword arguments (but allowed parameter names like "FROM" instead of the
reserved word "from").

Would someone be so kind as to go over the standard declarations that we ship
and check that the argument names they are declared with are a) available
and b) proper lower case parameter names, as one would expect them?
Preferably someone who has the glibc headers within reach to look them up
if they are missing? I'm pretty sure that the current names were copied
from there anyway, so most of the declarations should be ok already - but
some may not be, and I'd like to get this straight before people start to
rely on them.

I also noticed that many of the C++ function/method declarations and posix
declarations lack names. It would be nice if someone could add them, too.

I admit that this is a boring and somewhat tedious task, but it would
really help us.

And, obviously, this new feature needs a bit of general testing. :)

Thanks for any help,

Stefan

From dave.hirschfeld at gmail.com  Fri Feb  8 17:54:28 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Fri, 8 Feb 2013 16:54:28 +0000 (UTC)
Subject: [Cython] Fused types don't work with cdef classes?
Message-ID: <loom.20130208T174208-532@post.gmane.org>

Is this a bug?
The following code fails to compile on Windows with VS2012 and 32-bit
Python 2.7, using a recent 0.19-pre GitHub checkout of Cython:


cimport cython

ctypedef fused char_or_float:
    cython.char
    cython.float


cdef class FusedExample:
    
    def __init__(self, char_or_float x):
        pass
#

Resulting in the following exception:

C:\temp>C:\dev\bin\Python27\python.exe setup.py build_ext --inplace --compiler=msvc
Compiling example.pyx because it changed.
Cythonizing example.pyx
running build_ext
building 'example' extension
C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\BIN\cl.exe 
    /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -I. 
    -IC:\dev\code\Gazprom.MT\pricing\gazprom\mt\pricing 
    -IC:\dev\bin\Python27\lib\site-packages\numpy\core\include 
    "-IC:\dev\lib\Intel\Composer XE 2013\mkl\include" 
    -IC:\dev\bin\Python27\include 
    -IC:\dev\bin\Python27\PC 
    /Tp example.cpp 
    /Fobuild\temp.win32-2.7\Release\example.obj
    /EHsc /openmp
    example.cpp
example.cpp(1630) : error C2062: type 'int' unexpected
example.cpp(1630) : error C2143: syntax error : missing ';' before '{'
example.cpp(1630) : error C2447: '{' : missing function header (old-style formal list?)
example.cpp(1687) : error C2062: type 'int' unexpected
example.cpp(1687) : error C2143: syntax error : missing ';' before '{'
example.cpp(1687) : error C2447: '{' : missing function header (old-style formal list?)
example.cpp(3869) : error C2440: 'initializing' : cannot convert from 'PyObject *(__cdecl *)(PyObject *,PyObject *,PyObject *)' to 'initproc'
        None of the functions with this name in scope match the target type
error: command '"C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\BIN\cl.exe"' failed with exit status 2


If the cdef is removed from the class it compiles fine. Are fused types supposed 
to work with cdef classes?


Thanks,
Dave


From stefan_ml at behnel.de  Sat Feb  9 10:44:01 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 09 Feb 2013 10:44:01 +0100
Subject: [Cython] How does a fused function differ from an overloaded
	function?
Message-ID: <51161A61.8020101@behnel.de>

Hi,

I noticed that Cython currently fails to do this:

   cdef int (*int_abs)(int x)
   cdef object py_abs
   py_abs = int_abs = abs

Here, abs() is an overloaded function with a couple of C signatures (fabs()
and friends) and a Python signature (the builtin). While there is code in
NameNode.coerce_to() that figures out that the RHS can be replaced by the
Python builtin, the same is lacking for the general case of overloaded entries.

While working on fixing this problem (and after turning ProxyNode into an
actual node proxy when it comes to coercion), I thought it would be a good
idea to make NameNode generally aware of alternative entries and just build
a new NameNode with the right entry in its coerce_to() method. Then I
noticed that the generic coerce_to() contains this code:

    if src_type.is_fused or dst_type.is_fused:
        # See if we are coercing a fused function to a pointer to a
        # specialized function
        if (src_type.is_cfunction and not dst_type.is_fused and
                dst_type.is_ptr and dst_type.base_type.is_cfunction):

            dst_type = dst_type.base_type
            for signature in src_type.get_all_specialized_function_types():
                if signature.same_as(dst_type):
                    src.type = signature
                    src.entry = src.type.entry
                    src.entry.used = True
                    return self

This is essentially the same idea, just done a bit differently (with the
drawback that it modifies the node in place, which coerce_to() must *never*
do).

So, two questions:

1) why is the above code in the generic coerce_to() method and not in
NameNode? It doesn't seem to do anything sensible for most other nodes,
potentially not even AttributeNode. And it might fail silently when working
on things like CloneNode that don't care about entries. Are there other
nodes where it does what it should?

2) couldn't fused functions be mapped to a set of overloaded functions
(read: entries) beforehand, instead of special-casing both in places like
this?
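
The idea of building a new node with the right entry, rather than modifying the node in place, can be sketched in plain Python as follows. Everything here is hypothetical: Entry and Name are simplified stand-ins invented for this note, and signatures are compared as plain strings instead of real type objects.

```python
class Entry:
    """Hypothetical symbol-table entry: one concrete signature of a name."""

    def __init__(self, name, signature):
        self.name = name
        self.signature = signature


class Name:
    """Hypothetical name node that is aware of alternative entries."""

    def __init__(self, entry, alternatives=()):
        self.entry = entry
        self.alternatives = tuple(alternatives)

    def coerce_to(self, dst_signature):
        if self.entry.signature == dst_signature:
            return self
        for candidate in self.alternatives:
            if candidate.signature == dst_signature:
                # Build a fresh node; the original stays untouched.
                return Name(candidate, self.alternatives)
        raise TypeError(
            "no entry of %r matches %r" % (self.entry.name, dst_signature))
```

With this shape, the same source node can be coerced to a C function pointer in one place and to a Python object in another, because neither coercion disturbs the other.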

Stefan

From dave.hirschfeld at gmail.com  Sat Feb  9 14:03:03 2013
From: dave.hirschfeld at gmail.com (David Hirschfeld)
Date: Sat, 9 Feb 2013 13:03:03 +0000
Subject: [Cython] Fwd: MemoryView.is_f_contig sometimes not defined?
In-Reply-To: <CACGp2_NUH0bcYpF5N9EbzQHu-VzfHsCXvAarSfmV9HnqJ3Lstw@mail.gmail.com>
References: <CACGp2_NUH0bcYpF5N9EbzQHu-VzfHsCXvAarSfmV9HnqJ3Lstw@mail.gmail.com>
Message-ID: <CACGp2_NqMJznPjne9aicR0tFMkYgCsXEda6A0Cdbx9yF=1-Fbg@mail.gmail.com>

Reposting because I think my original got blocked because of
attachments. Apologies if this appears twice.

I want to allow arbitrary C/F contiguous arrays as input to a cdef
class so I can dispatch to a different calculation method in each
case, avoiding a potentially costly copy.
Unfortunately, it appears that cython is generating incorrect code.
The following minimal example reproduces the problem:

cimport cython

cdef class TestContig:

    cdef cython.bint contig

    def __init__(self, double[:,:] y):
        if y.is_c_contig():
            self.contig = 1
        elif y.is_f_contig():
            self.contig = 1
        else:
            self.contig = 0

    property contig:
        def __get__(self):
            return self.contig

#


C:\temp> python setup.py build_ext --inplace
running build_ext
building 'example' extension
C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\BIN\cl.exe /c
/nologo /Ox /MD /W3 /GS- /DNDEBUG -IC:\dev\bin\Python27\include
-IC:\dev\bin\Python27\PC /Tpexample.cpp
/Fobuild\temp.win32-2.7\Release\example.obj
example.cpp
example.cpp(1277) : error C3861: '__pyx_memviewslice_is_f_contig2':
identifier not found
error: command '"C:\Program Files (x86)\Microsoft Visual Studio
11.0\VC\BIN\cl.exe"' failed with exit status 2
C:\temp>


I'm on Windows7 with 32bit Python2.7 and I tested that compilation
fails with both VS2012 & MinGW32 4.6.1.

NB: If you only check for f-contiguity (or c-contiguity) in the method, it
compiles fine; the bug seems to appear only when you test for
both f and c contiguity in the same method.


Thanks,
Dave

From robertwb at gmail.com  Sat Feb  9 22:26:08 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sat, 9 Feb 2013 13:26:08 -0800
Subject: [Cython] analyse_types() refactoring
In-Reply-To: <511416F3.3050609@astro.uio.no>
References: <5113E54B.6020708@behnel.de> <511416F3.3050609@astro.uio.no>
Message-ID: <CADiQ+QBGg9SYVezJ1tPwfZ2a5WxjygLnK4oqqqmedxRsciKZGQ@mail.gmail.com>

On Thu, Feb 7, 2013 at 1:04 PM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 02/07/2013 06:32 PM, Stefan Behnel wrote:
>>
>> Hi,
>>
>> I finally found the time to refactor the analysis phase.
>>
>>
>> https://github.com/cython/cython/commit/f9c385e08401ed96b5b0afb8411480037dc772b9
>>
>> The methods now return a node, which allows them to replace themselves
>> with
>> a different implementation.
>>
>> Note that the relatively large code impact of this change also means that
>> you might easily run into merge conflicts with your own local changes, so
>> here's how to fix them. The transformation pattern is pretty straight
>> forward. The "analyse_types()" method returns "self", unless it wants to
>> replace itself, i.e. this
>>
>>      def analyse_types(self, env):
>>          self.index.analyse_types(env)
>>
>> becomes
>>
>>      def analyse_types(self, env):
>>          self.index = self.index.analyse_types(env)
>>          return self
>
>
> Wohoo!

Yay!

- Robert

From markflorisson88 at gmail.com  Sun Feb 10 03:25:33 2013
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 9 Feb 2013 20:25:33 -0600
Subject: [Cython] How does a fused function differ from an overloaded
	function?
In-Reply-To: <51161A61.8020101@behnel.de>
References: <51161A61.8020101@behnel.de>
Message-ID: <CANg26EXr0CeBkdB3-gq5kf9KOEXFekZPRFsOwRRhWtxTSeU8QA@mail.gmail.com>

On 9 February 2013 03:44, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Hi,
>
> I noticed that Cython currently fails to do this:
>
>    cdef int (*int_abs)(int x)
>    cdef object py_abs
>    py_abs = int_abs = abs
>
> Here, abs() is an overloaded function with a couple of C signatures (fabs()
> and friends) and a Python signature (the builtin). While there is code in
> NameNode.coerce_to() that figures out that the RHS can be replaced by the
> Python builtin, the same is lacking for the general case of overloaded entries.
>
> While working on fixing this problem (and after turning ProxyNode into an
> actual node proxy when it comes to coercion), I thought it would be a good
> idea to make NameNode generally aware of alternative entries and just build
> a new NameNode with the right entry in its coerce_to() method. Then I
> noticed that the generic coerce_to() contains this code:
>
>     if src_type.is_fused or dst_type.is_fused:
>         # See if we are coercing a fused function to a pointer to a
>         # specialized function
>         if (src_type.is_cfunction and not dst_type.is_fused and
>                 dst_type.is_ptr and dst_type.base_type.is_cfunction):
>
>             dst_type = dst_type.base_type
>             for signature in src_type.get_all_specialized_function_types():
>                 if signature.same_as(dst_type):
>                     src.type = signature
>                     src.entry = src.type.entry
>                     src.entry.used = True
>                     return self
>
> This is essentially the same idea, just done a bit differently (with the
> drawback that it modifies the node in place, which coerce_to() must *never*
> do).
>
> So, two questions:
>
> 1) why is the above code in the generic coerce_to() method and not in
> NameNode? It doesn't seem to do anything sensible for most other nodes,
> potentially not even AttributeNode. And it might fail silently when working
> on things like CloneNode that don't care about entries. Are there other
> nodes where it does what it should?

I think it works for names and attributes; it allows you to retrieve a
specialized version of the fused c(p)def functions and methods.

> 2) couldn't fused functions be mapped to a set of overloaded functions
> (read: entries) before hand, instead of special casing both in places like
> this?

Quite possibly, although I'd have to dig in the codebase some more to
verify that. You can give it a try, it'd be nice to unify the
approaches under the same model.

> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From markflorisson88 at gmail.com  Sun Feb 10 03:41:12 2013
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 9 Feb 2013 20:41:12 -0600
Subject: [Cython] Fwd: MemoryView.is_f_contig sometimes not defined?
In-Reply-To: <CACGp2_NqMJznPjne9aicR0tFMkYgCsXEda6A0Cdbx9yF=1-Fbg@mail.gmail.com>
References: <CACGp2_NUH0bcYpF5N9EbzQHu-VzfHsCXvAarSfmV9HnqJ3Lstw@mail.gmail.com>
	<CACGp2_NqMJznPjne9aicR0tFMkYgCsXEda6A0Cdbx9yF=1-Fbg@mail.gmail.com>
Message-ID: <CANg26EVr12x1-sJSLTjMua1ZuJNynXASZ1H2VLxuKnsnWw_JnQ@mail.gmail.com>

On 9 February 2013 07:03, David Hirschfeld <dave.hirschfeld at gmail.com> wrote:
> Reposting because I think my original got blocked because of
> attachments. Apologies if this appears twice.
>
> I want to allow arbitrary C/F contiguous arrays as input to a cdef
> class so I can dispatch to a different calculation method in each
> case, avoiding a potentially costly copy.
> Unfortunately, it appears that cython is generating incorrect code.
> The following minimal example reproduces the problem:
>
> cimport cython
>
> cdef class TestContig:
>
>     cdef cython.bint contig
>
>     def __init__(self, double[:,:] y):
>         if y.is_c_contig():
>             self.contig = 1
>         elif y.is_f_contig():
>             self.contig = 1
>         else:
>             self.contig = 0
>
>     property contig:
>         def __get__(self):
>             return self.contig
>
> #
>
>
> C:\temp> python setup.py build_ext --inplace
> running build_ext
> building 'example' extension
> C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\BIN\cl.exe /c
> /nologo /Ox /MD /W3 /GS- /DNDEBUG -IC:\dev\bin\Python27\include
> -IC:\dev\bin\Python27\PC /Tpexample.cpp
> /Fobuild\temp.win32-2.7\Release\example.obj
> example.cpp
> example.cpp(1277) : error C3861: '__pyx_memviewslice_is_f_contig2':
> identifier not found
> error: command '"C:\Program Files (x86)\Microsoft Visual Studio
> 11.0\VC\BIN\cl.exe"' failed with exit status 2
> C:\temp>
>
>
> I'm on Windows7 with 32bit Python2.7 and I tested that compilation
> fails with both VS2012 & MinGW32 4.6.1.
>
> NB: If you only check for f-contiguity (or c-contiguity) in the method it
> compiles fine, it appears that the bug only appears when you test for
> both f and c contiguity in the same method.
>
>
> Thanks,
> Dave

Thanks, Cython seems to think (for some reason) that the second
function is the same as the first and omits the definition.
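
For reference, the dispatch the report is after can be modelled in plain
Python with the built-in memoryview, whose c_contiguous and f_contiguous
attributes play the role of is_c_contig() and is_f_contig() on a Cython
memoryview slice. This is only a sketch of the intended logic (the function
name classify_layout is made up here); it says nothing about the C code
Cython generates and is not a workaround for the compile error.

```python
def classify_layout(view):
    """Return "C", "F" or "strided" for a memoryview (hypothetical helper)."""
    # Check C order first, as in the report; a 1D contiguous buffer is both
    # C- and F-contiguous and therefore reports "C" here.
    if view.c_contiguous:
        return "C"
    elif view.f_contiguous:
        return "F"
    else:
        return "strided"


# A 2x3 view over six bytes is laid out in C (row-major) order.
c_view = memoryview(bytes(6)).cast("B", shape=[2, 3])
# Taking every second element yields a non-contiguous view.
strided_view = memoryview(bytes(6))[::2]

print(classify_layout(c_view))
print(classify_layout(strided_view))
```

Each branch could then call a layout-specific routine without copying.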

From stefan_ml at behnel.de  Sun Feb 10 06:56:04 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 10 Feb 2013 06:56:04 +0100
Subject: [Cython] How does a fused function differ from an overloaded
 function?
In-Reply-To: <CANg26EXr0CeBkdB3-gq5kf9KOEXFekZPRFsOwRRhWtxTSeU8QA@mail.gmail.com>
References: <51161A61.8020101@behnel.de>
	<CANg26EXr0CeBkdB3-gq5kf9KOEXFekZPRFsOwRRhWtxTSeU8QA@mail.gmail.com>
Message-ID: <51173674.1070706@behnel.de>

mark florisson, 10.02.2013 03:25:
> On 9 February 2013 03:44, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> Hi,
>>
>> I noticed that Cython currently fails to do this:
>>
>>    cdef int (*int_abs)(int x)
>>    cdef object py_abs
>>    py_abs = int_abs = abs
>>
>> Here, abs() is an overloaded function with a couple of C signatures (fabs()
>> and friends) and a Python signature (the builtin). While there is code in
>> NameNode.coerce_to() that figures out that the RHS can be replaced by the
>> Python builtin, the same is lacking for the general case of overloaded entries.
>>
>> While working on fixing this problem (and after turning ProxyNode into an
>> actual node proxy when it comes to coercion), I thought it would be a good
>> idea to make NameNode generally aware of alternative entries and just build
>> a new NameNode with the right entry in its coerce_to() method. Then I
>> noticed that the generic coerce_to() contains this code:
>>
>>     if src_type.is_fused or dst_type.is_fused:
>>         # See if we are coercing a fused function to a pointer to a
>>         # specialized function
>>         if (src_type.is_cfunction and not dst_type.is_fused and
>>                 dst_type.is_ptr and dst_type.base_type.is_cfunction):
>>
>>             dst_type = dst_type.base_type
>>             for signature in src_type.get_all_specialized_function_types():
>>                 if signature.same_as(dst_type):
>>                     src.type = signature
>>                     src.entry = src.type.entry
>>                     src.entry.used = True
>>                     return self
>>
>> This is essentially the same idea, just done a bit differently (with the
>> drawback that it modifies the node in place, which coerce_to() must *never*
>> do).
>>
>> So, two questions:
>>
>> 1) why is the above code in the generic coerce_to() method and not in
>> NameNode? It doesn't seem to do anything sensible for most other nodes,
>> potentially not even AttributeNode. And it might fail silently when working
>> on things like CloneNode that don't care about entries. Are there other
>> nodes where it does what it should?
> 
> I think it works for names and attributes, it allows you to retrieve a
> specialized version of the fused c(p)def functions and methods.

That's what I figured. I might have to take a look at AttributeNode a bit
more to see if it really does the right thing in all cases.

I would like to avoid having this in the generic coerce_to() method because
if it's anything but a NameNode or AttributeNode, it can only have one type
(unless I'm missing something), so coercion to different signatures won't
be possible anyway. And I wouldn't mind letting the above two nodes share a
bit more code, in one way or another.

I also think that the idea of having a ProxyNode for reuse was quite right.
I've started playing with it a little to let it support coercion
delegation, i.e. it would have its own coerce_to() method that builds
CloneNodes as needed and coerces either its argument directly or the
CloneNode to the target type, depending on is_simple() and maybe other
criteria.


>> 2) couldn't fused functions be mapped to a set of overloaded functions
>> (read: entries) before hand, instead of special casing both in places like
>> this?
> 
> Quite possibly, although I'd have to dig in the codebase some more to
> verify that. You can give it a try, it'd be nice to unify the
> approaches under the same model.

What I would like to see, eventually, is that NameNode basically just looks
up its entry on type analysis (including all overloaded entries), and then
whatever uses the node (to call or assign it) would pass in the right
signature/type into its coerce_to() method, which would then select the
right entry and return a new NameNode for it (or fail if the signature
can't be matched to any entry).

AttributeNode would essentially do the same thing, just return either an
AttributeNode or a NameNode on type analysis and/or coercion, depending on
what entry it finds (and if more than one).
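
To make the idea a bit more concrete, here is a rough sketch in plain Python.
It is not compiler code: the Entry and NameNode classes below are simplified
stand-ins invented for illustration, and signatures are plain strings rather
than type objects. The point is only that coerce_to() picks the matching
entry and returns a new node instead of modifying the node in place.

```python
class Entry:
    """Hypothetical stand-in for a symbol table entry with one signature."""

    def __init__(self, name, signature):
        self.name = name
        self.signature = signature
        self.used = False


class NameNode:
    """Toy model of a name that may refer to several overloaded entries."""

    def __init__(self, name, entries, entry=None):
        self.name = name
        self.entries = tuple(entries)  # all candidates found at type analysis
        self.entry = entry             # the selected entry, if any

    def coerce_to(self, dst_signature):
        # Select the matching entry and build a *new* node; self stays as is.
        for entry in self.entries:
            if entry.signature == dst_signature:
                entry.used = True
                return NameNode(self.name, self.entries, entry)
        raise TypeError(
            "no overload of %r matches %r" % (self.name, dst_signature))


abs_node = NameNode("abs", [
    Entry("abs", "int (int)"),
    Entry("fabs", "double (double)"),
    Entry("abs", "object (object)"),
])

# py_abs = int_abs = abs: the same source node is coerced twice.
int_abs = abs_node.coerce_to("int (int)")
py_abs = abs_node.coerce_to("object (object)")
```

Because the source node is never changed, coercing the same RHS to two
different targets does not let one coercion leak into the other.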

Does this sound like it could work for fused types?

Stefan


From markflorisson88 at gmail.com  Sun Feb 10 16:11:57 2013
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 10 Feb 2013 09:11:57 -0600
Subject: [Cython] How does a fused function differ from an overloaded
	function?
In-Reply-To: <51173674.1070706@behnel.de>
References: <51161A61.8020101@behnel.de>
	<CANg26EXr0CeBkdB3-gq5kf9KOEXFekZPRFsOwRRhWtxTSeU8QA@mail.gmail.com>
	<51173674.1070706@behnel.de>
Message-ID: <CANg26EWPY=BeQ0r6jG2p4RQTO03UpYp31Ps2Z1+69iqPxcZwXQ@mail.gmail.com>

On 9 February 2013 23:56, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 10.02.2013 03:25:
>> On 9 February 2013 03:44, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>> Hi,
>>>
>>> I noticed that Cython currently fails to do this:
>>>
>>>    cdef int (*int_abs)(int x)
>>>    cdef object py_abs
>>>    py_abs = int_abs = abs
>>>
>>> Here, abs() is an overloaded function with a couple of C signatures (fabs()
>>> and friends) and a Python signature (the builtin). While there is code in
>>> NameNode.coerce_to() that figures out that the RHS can be replaced by the
>>> Python builtin, the same is lacking for the general case of overloaded entries.
>>>
>>> While working on fixing this problem (and after turning ProxyNode into an
>>> actual node proxy when it comes to coercion), I thought it would be a good
>>> idea to make NameNode generally aware of alternative entries and just build
>>> a new NameNode with the right entry in its coerce_to() method. Then I
>>> noticed that the generic coerce_to() contains this code:
>>>
>>>     if src_type.is_fused or dst_type.is_fused:
>>>         # See if we are coercing a fused function to a pointer to a
>>>         # specialized function
>>>         if (src_type.is_cfunction and not dst_type.is_fused and
>>>                 dst_type.is_ptr and dst_type.base_type.is_cfunction):
>>>
>>>             dst_type = dst_type.base_type
>>>             for signature in src_type.get_all_specialized_function_types():
>>>                 if signature.same_as(dst_type):
>>>                     src.type = signature
>>>                     src.entry = src.type.entry
>>>                     src.entry.used = True
>>>                     return self
>>>
>>> This is essentially the same idea, just done a bit differently (with the
>>> drawback that it modifies the node in place, which coerce_to() must *never*
>>> do).
>>>
>>> So, two questions:
>>>
>>> 1) why is the above code in the generic coerce_to() method and not in
>>> NameNode? It doesn't seem to do anything sensible for most other nodes,
>>> potentially not even AttributeNode. And it might fail silently when working
>>> on things like CloneNode that don't care about entries. Are there other
>>> nodes where it does what it should?
>>
>> I think it works for names and attributes, it allows you to retrieve a
>> specialized version of the fused c(p)def functions and methods.
>
> That's what I figured. I might have to take a look at AttributeNode a bit
> more to see if it really does the right thing in all cases.
>
> I would like to avoid having this in the generic coerce_to() method because
> if it's anything but a NameNode or AttributeNode, it can only have one type
> (unless I'm missing something), so coercion to different signatures won't
> be possible anyway. And I wouldn't mind letting the above two nodes share a
> bit more code, in one way or another.
>
> I also think that the idea of having a ProxyNode for reuse was quite right.
> I've started playing with it a little to let it support coercion
> delegation, i.e. it would have its own coerce_to() method that builds
> CloneNodes as needed and coerces either its argument directly or the
> CloneNode to the target type, depending on is_simple() and maybe other
> criteria.
>
>
>>> 2) couldn't fused functions be mapped to a set of overloaded functions
>>> (read: entries) before hand, instead of special casing both in places like
>>> this?
>>
>> Quite possibly, although I'd have to dig in the codebase some more to
>> verify that. You can give it a try, it'd be nice to unify the
>> approaches under the same model.
>
> What I would like to see, eventually, is that NameNode basically just looks
> up its entry on type analysis (including all overloaded entries), and then
> whatever uses the node (to call or assign it) would pass in the right
> signature/type into its coerce_to() method, which would then select the
> right entry and return a new NameNode for it (or fail if the signature
> can't be matched to any entry).
>
> AttributeNode would essentially do the same thing, just return either an
> AttributeNode or a NameNode on type analysis and/or coercion, depending on
> what entry it finds (and if more than one).
>
> Does this sound like it could work for fused types?

It sounds like this approach might be cleaner than catching this in a
global coercion, but on the other hand you want full generality. For
instance, there is also the cast syntax that can specialize a
function. Or I might have a pointer to a known fused function or
method that I want to dereference and specialize.

Maybe we need a nicer way to deal with and register coercions, and
with what an assignment expects and a value generates. A lot of
assignment code seems similar but slightly different in tricky ways.

> Stefan

From robertwb at gmail.com  Wed Feb 13 09:07:45 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Wed, 13 Feb 2013 00:07:45 -0800
Subject: [Cython] Fused types don't work with cdef classes?
In-Reply-To: <loom.20130208T174208-532@post.gmane.org>
References: <loom.20130208T174208-532@post.gmane.org>
Message-ID: <CADiQ+QDF1nvxu0DQtW-vfMd-dQK42m+y+HBazy2vVvmk6s0E_g@mail.gmail.com>

Yes, this is a bug; there is a bad interaction between fused types and
special methods.

I created http://trac.cython.org/cython_trac/ticket/802

On Fri, Feb 8, 2013 at 8:54 AM, Dave Hirschfeld
<dave.hirschfeld at gmail.com> wrote:
> Is this a bug?
> The following code fails to compile on windows VS2012, 32bit Python2.7 with a
> recent 0.19-pre github cython:
>
>
> cimport cython
>
> ctypedef fused char_or_float:
>     cython.char
>     cython.float
>
>
> cdef class FusedExample:
>
>     def __init__(self, char_or_float x):
>         pass
> #
>
> Resulting in the following exception:
>
> C:\temp>C:\dev\bin\Python27\python.exe setup.py build_ext --inplace --compiler=msvc
> Compiling example.pyx because it changed.
> Cythonizing example.pyx
> running build_ext
> building 'example' extension
> C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\BIN\cl.exe
>     /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -I.
>     -IC:\dev\code\Gazprom.MT\pricing\gazprom\mt\pricing
>     -IC:\dev\bin\Python27\lib\site-packages\numpy\core\include
>     "-IC:\dev\lib\Intel\Composer XE 2013\mkl\include"
>     -IC:\dev\bin\Python27\include
>     -IC:\dev\bin\Python27\PC
>     /Tp example.cpp
>     /Fobuild\temp.win32-2.7\Release\example.obj
>     /EHsc /openmp
>     example.cpp
> example.cpp(1630) : error C2062: type 'int' unexpected
> example.cpp(1630) : error C2143: syntax error : missing ';' before '{'
> example.cpp(1630) : error C2447: '{' : missing function header (old-style formal
> list?)
> example.cpp(1687) : error C2062: type 'int' unexpected
> example.cpp(1687) : error C2143: syntax error : missing ';' before '{'
> example.cpp(1687) : error C2447: '{' : missing function header (old-style formal
> list?)
> example.cpp(3869) : error C2440: 'initializing' : cannot convert
>         from 'PyObject *(__cdecl *)(PyObject *,PyObject *,PyObject *)' to
> 'initproc'
>         None of the functions with this name in scope match the target type
> error: command '"C:\Program Files (x86)\Microsoft Visual Studio
> 11.0\VC\BIN\cl.exe"'
> failed with exit status 2
>
>
> If the cdef is removed from the class it compiles fine. Are fused types supposed
> to work with cdef classes?
>
>
> Thanks,
> Dave

From dave.hirschfeld at gmail.com  Wed Feb 13 09:16:07 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Wed, 13 Feb 2013 08:16:07 +0000 (UTC)
Subject: [Cython] Fused types don't work with cdef classes?
References: <loom.20130208T174208-532@post.gmane.org>
	<CADiQ+QDF1nvxu0DQtW-vfMd-dQK42m+y+HBazy2vVvmk6s0E_g@mail.gmail.com>
Message-ID: <loom.20130213T091058-925@post.gmane.org>

Robert Bradshaw <robertwb at ...> writes:

> 
> Yes, this is a bug; there is a bad interaction between fused types and
> special methods.
> 
> I created http://trac.cython.org/cython_trac/ticket/802
> 

Thanks for following up. My actual use-case was to allow either 1D or 2D
MemoryView inputs to a function by simply transforming the 1D MemoryView
to a column vector.

For now the workaround is to simply disallow 1D inputs, but it would be nice to 
have it working.

Thanks,
Dave


From stefan_ml at behnel.de  Wed Feb 13 09:30:22 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 13 Feb 2013 09:30:22 +0100
Subject: [Cython] Fused types don't work with cdef classes?
In-Reply-To: <loom.20130213T091058-925@post.gmane.org>
References: <loom.20130208T174208-532@post.gmane.org>
	<CADiQ+QDF1nvxu0DQtW-vfMd-dQK42m+y+HBazy2vVvmk6s0E_g@mail.gmail.com>
	<loom.20130213T091058-925@post.gmane.org>
Message-ID: <511B4F1E.70409@behnel.de>

Dave Hirschfeld, 13.02.2013 09:16:
> Robert Bradshaw writes:
>> Yes, this is a bug; there is a bad interaction between fused types and
>> special methods.
>>
>> I created http://trac.cython.org/cython_trac/ticket/802
>>
> 
> Thanks for following up. My actual use-case was to allow either 1D or 2D
> MemoryView inputs to a function by simply transforming the 1D MemoryView
> to a column vector.
> 
> For now the workaround is to simply disallow 1D inputs, but it would be nice to 
> have it working.

Depending on what the rest of your code looks like, a work-around might be
to move the code from __init__() to a separate cdef function and just call
that.

Stefan


From d.s.seljebotn at astro.uio.no  Wed Feb 13 20:04:44 2013
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Wed, 13 Feb 2013 20:04:44 +0100
Subject: [Cython] cldoc
Message-ID: <511BE3CC.6050903@astro.uio.no>

Just a heads up about this project; there's bound to be something useful 
there for auto-wrapping.

http://jessevdk.github.com/cldoc/

Dag Sverre

From robertwb at gmail.com  Thu Feb 14 05:49:46 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Wed, 13 Feb 2013 20:49:46 -0800
Subject: [Cython] SIGSEGV in __Pyx_CyFunction_traverse
In-Reply-To: <CADb3U=6A6bnDodZVqdKpv_gBz-8bFqwLK_QpRh7Z+1uc9NsVAA@mail.gmail.com>
References: <CADb3U=6A6bnDodZVqdKpv_gBz-8bFqwLK_QpRh7Z+1uc9NsVAA@mail.gmail.com>
Message-ID: <CADiQ+QAGOy0i8npGfwzOw7kDfsKxPNCLVQ=+r-Dr=JPrcBFAdg@mail.gmail.com>

Thanks.

On Tue, Feb 5, 2013 at 11:56 AM, J Robert Ray <jrobertray at gmail.com> wrote:
> I was getting a crash during module init of a cython module if a garbage
> collection happens between a call to __Pyx_CyFunction_InitDefaults and the
> code to populate the defaults.
>
> The attached patch fixes the crash. This bug affects at least Cython 0.18
> and 0.17.1.
>
> __Pyx_CyFunction_InitDefaults was not completely zeroing the newly allocated
> 'defaults' buffer.

From robertwb at gmail.com  Thu Feb 14 06:08:53 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Wed, 13 Feb 2013 21:08:53 -0800
Subject: [Cython] Possible bug when using cython -Wextra
In-Reply-To: <5113ED03.5060307@behnel.de>
References: <1737486.SGKB3DSTKz@pcsk4> <51138BF3.6030606@behnel.de>
	<1395070.2WGoObUNKK@pcsk4> <5113ED03.5060307@behnel.de>
Message-ID: <CADiQ+QBPutA0=m3SmmdqxH10Q4oCVROxTR8WDjKZvm62g36Cbw@mail.gmail.com>

On Thu, Feb 7, 2013 at 10:05 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Samuele Kaplun, 07.02.2013 13:00:
>> On Thursday, 7 February 2013 at 12:11:47, Stefan Behnel wrote:
>>>> [...]
>>>>
>>>> def test():
>>>>     cdef int i
>>>>
>>>>     for i from 0 <= i < 10:
>>>>         print "foo"
>>>>
>>>> [...]
>>>
>>> Yes, it actually is an unused variable in your code. There is no reference
>>> to it, only assignments.
>>
>> Hmm. But it is used, albeit indirectly. What pattern would you suggest in
>> this case (i.e. to repeat a certain body a given number of times) in order to
>> avoid such a warning?
>
> The normal thing to do in Python would be to use an underscore (i.e. "_")
> as variable name. I don't think we currently special case that pattern,
> though. Maybe we should.

I agree. Done.
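
For anyone unfamiliar with the convention, this is what it looks like in
plain Python. The helper name repeat is made up for the example, and the
snippet only illustrates the naming pattern; whether a given Cython version
suppresses the warning for "_" depends on the change mentioned above.

```python
def repeat(body, times):
    """Call body() the given number of times (hypothetical helper)."""
    # "_" signals that the loop variable is intentionally never read.
    for _ in range(times):
        body()


calls = []
repeat(lambda: calls.append("foo"), 10)
print(len(calls))
```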

> Or maybe we should just drop the "unused variable" warning for loop
> variables as they actually do something and serve a purpose, even if they
> are never referenced.

+1 to this too. (Not yet done.)

- Robert

From robertwb at gmail.com  Thu Feb 14 06:29:48 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Wed, 13 Feb 2013 21:29:48 -0800
Subject: [Cython] Two generators in one function
In-Reply-To: <CAChs6_ncRJSFStBfPe488_JLD7pd40MbMQ8+Et52ZeVx3g1sWg@mail.gmail.com>
References: <CAChs6_ncRJSFStBfPe488_JLD7pd40MbMQ8+Et52ZeVx3g1sWg@mail.gmail.com>
Message-ID: <CADiQ+QDV2ATby6OisF8EDQw32uxjqQadrM-KXUwNLsG=_Gw7FQ@mail.gmail.com>

This is due to the archaic --disable-function-redefinition flag.

On Mon, Feb 4, 2013 at 4:28 PM, David Roe <roed.math at gmail.com> wrote:
> Hi everyone,
> I ran into the following problem using Cython 0.17.4 (current version of
> Sage).
>
> If you try to compile a file with the following function in it:
>
> def test_double_gen(L):
>     a = all(x != 0 for x in L)
>     b = all(x != 1 for x in L)
>     return a and b
>
> you get errors from the Cython compiler about 'genexpr' being redefined.
>
> Error compiling Cython file:
> ------------------------------------------------------------
> ...
>
>
> def test_double_gen(L):
>     a = all(x != 0 for x in L)
>     b = all(x != 1 for x in L)
>              ^
> ------------------------------------------------------------
>
> cython_test.pyx:5:14: 'genexpr' already declared
>
> Error compiling Cython file:
> ------------------------------------------------------------
> ...
>
>
> def test_double_gen(L):
>     a = all(x != 0 for x in L)
>              ^
> ------------------------------------------------------------
>
> cython_test.pyx:4:14: Previous declaration is here
>
> Error compiling Cython file:
> ------------------------------------------------------------
> ...
>
>
> def test_double_gen(L):
>     a = all(x != 0 for x in L)
>     b = all(x != 1 for x in L)
>              ^
> ------------------------------------------------------------
>
> cython_test.pyx:5:14: 'genexpr' redeclared
>
> Are you currently only able to use one inline generator per function?
> David

From roed.math at gmail.com  Thu Feb 14 06:35:04 2013
From: roed.math at gmail.com (David Roe)
Date: Wed, 13 Feb 2013 22:35:04 -0700
Subject: [Cython] Two generators in one function
In-Reply-To: <CADiQ+QDV2ATby6OisF8EDQw32uxjqQadrM-KXUwNLsG=_Gw7FQ@mail.gmail.com>
References: <CAChs6_ncRJSFStBfPe488_JLD7pd40MbMQ8+Et52ZeVx3g1sWg@mail.gmail.com>
	<CADiQ+QDV2ATby6OisF8EDQw32uxjqQadrM-KXUwNLsG=_Gw7FQ@mail.gmail.com>
Message-ID: <CAChs6_mqsTFT4GzzzZEmbZfU7U92FK4gkz6LXJO+46rH3KDMiw@mail.gmail.com>

Thanks.
David


On Wed, Feb 13, 2013 at 10:29 PM, Robert Bradshaw <robertwb at gmail.com> wrote:

> This is due to the archaic --disable-function-redefinition flag.
>
> On Mon, Feb 4, 2013 at 4:28 PM, David Roe <roed.math at gmail.com> wrote:
> > Hi everyone,
> > I ran into the following problem using Cython 0.17.4 (current version of
> > Sage).
> >
> > If you try to compile a file with the following function in it:
> >
> > def test_double_gen(L):
> >     a = all(x != 0 for x in L)
> >     b = all(x != 1 for x in L)
> >     return a and b
> >
> > you get errors from the Cython compiler about 'genexpr' being redefined.
> >
> > Error compiling Cython file:
> > ------------------------------------------------------------
> > ...
> >
> >
> > def test_double_gen(L):
> >     a = all(x != 0 for x in L)
> >     b = all(x != 1 for x in L)
> >              ^
> > ------------------------------------------------------------
> >
> > cython_test.pyx:5:14: 'genexpr' already declared
> >
> > Error compiling Cython file:
> > ------------------------------------------------------------
> > ...
> >
> >
> > def test_double_gen(L):
> >     a = all(x != 0 for x in L)
> >              ^
> > ------------------------------------------------------------
> >
> > cython_test.pyx:4:14: Previous declaration is here
> >
> > Error compiling Cython file:
> > ------------------------------------------------------------
> > ...
> >
> >
> > def test_double_gen(L):
> >     a = all(x != 0 for x in L)
> >     b = all(x != 1 for x in L)
> >              ^
> > ------------------------------------------------------------
> >
> > cython_test.pyx:5:14: 'genexpr' redeclared
> >
> > Are you currently only able to use one inline generator per function?
> > David

From stefan_ml at behnel.de  Thu Feb 14 08:01:25 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 14 Feb 2013 08:01:25 +0100
Subject: [Cython] [cython-users] Re: Python 3 and string frustration
In-Reply-To: <CADiQ+QDL=fF9RD0U=sNsBzrrLgPs3G=JRPp_12BDmU4eQZRirA@mail.gmail.com>
References: <CAJQnXJd+_2T8c5CNZJdb1Lx6OPwiijBGz3V7AZgbv+AZ21L0SA@mail.gmail.com>
	<c7fe54a4-e6d5-496d-8757-b291fdfeebbe@googlegroups.com>
	<510F7512.7070101@behnel.de>
	<c38a6b9a-756d-4a6b-8c13-256c1f627fc5@googlegroups.com>
	<CADiQ+QDL=fF9RD0U=sNsBzrrLgPs3G=JRPp_12BDmU4eQZRirA@mail.gmail.com>
Message-ID: <511C8BC5.7050404@behnel.de>

Robert Bradshaw, 14.02.2013 06:51:
> I've proposed having a compiler
> directive that lets you specify an encoding (e.g. ascii, utf8) and
> automatically encodes/decodes when converting between C and Python
> strings.

My main objection against that is that it would only work in one direction,
from C strings to Python strings. The other direction requires an explicit
intermediate bytes object in order to correctly do the memory management,
so there's really nothing to win there. Doing anything implicit in that
direction would just call for either trouble or inefficiency.

For the first direction, C-to-Python, I don't see the major advantage
between the implicit

    cdef unicode py_string = c_string      # typing required here

and the explicit

    py_string = c_string.decode('utf-8')   # note: no typing here

There is only one case where it's a bit simpler:

    py_string = c_string[:length]          # no typing, auto-coercion

in contrast to

    py_string = c_string[:length].decode('utf-8')

Anyway, it's just a couple of characters difference, which are best hidden
in an explicit "conversion + validation" function anyway. Auto-coercion of
C strings will always be more inefficient and error prone than users should
be asked to bear, and all we could add would only be the unidirectional
conversion part, not any validation or whatever user code has to do in
addition.
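
A minimal sketch of what such an explicit "conversion + validation" helper
could look like, written in plain Python rather than Cython so that it runs
anywhere. The name c_bytes_to_text and the NUL check are assumptions made for
illustration only; what a real helper has to validate depends on the user
code, and a bytes object stands in for the char* here.

```python
def c_bytes_to_text(raw, length=None, encoding="utf-8"):
    """Decode a byte buffer (standing in for a char*) into a text string.

    Hypothetical helper: slicing, decoding and validation in one place.
    """
    if length is not None:
        if length < 0 or length > len(raw):
            raise ValueError("length out of range")
        raw = raw[:length]            # same idea as c_string[:length]
    text = raw.decode(encoding)       # raises UnicodeDecodeError on bad input
    if "\x00" in text:
        # Example of a check that implicit coercion could not know about.
        raise ValueError("embedded NUL character")
    return text


# Seven bytes of UTF-8 input decode to a five character string.
print(c_bytes_to_text(b"gr\xc3\xbc\xc3\x9fe, world", length=7))
```

Keeping the decode call inside one function means the encoding is named
exactly once, which is most of what an implicit directive would buy.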


The situation is entirely different for C++ strings. They have an efficient
two-way auto-coercion and safely copy their content on creation. In their
case, auto-coercion would basically behave like

    from __future__ import unicode_literals

but for string coercion. I have no objections against that. I think it just
needs implementing and then testing against a couple of real, existing code
bases to see what the real-world tradeoff is then. It's just a matter of
whether a user needs to write "<unicode>" or "<bytes>" in the right places.


All of that being said, the proposal sounds like it's actually two: 1)
specify an implicit encoding for coercion between C++ strings and Python
unicode strings, and 2) automatically coerce between C++ strings and Python
unicode strings by default. 1) means that

    cdef libcpp.string cs1 = ..., cs2

    py_string = <unicode>cs1
    cs2 = py_string

would auto-decode and -encode the string, 2) means that

    cdef libcpp.string cs1 = ..., cs2

    py_string = <object>cs1
    cs2 = py_string

would do it (including any implicit coercions to Python objects). If 2) is
desirable at all, I think it makes sense to fold that into two separate
directives, as many users will be better off without the second one.


There's also the question whether you want coercion to and from "unicode"
or to and from "str". Getting the latter right wouldn't be easy, most
likely neither for us nor for users who want to apply it to their code.
However, given that the only use case for that would be Py2 backwards
compatibility, waiting a couple of years longer should nicely solve this
problem for us. No need to burden the compiler with it now.

Stefan


From markflorisson88 at gmail.com  Thu Feb 14 18:32:19 2013
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 14 Feb 2013 17:32:19 +0000
Subject: [Cython] Possible bug when using cython -Wextra
In-Reply-To: <CADiQ+QBPutA0=m3SmmdqxH10Q4oCVROxTR8WDjKZvm62g36Cbw@mail.gmail.com>
References: <1737486.SGKB3DSTKz@pcsk4> <51138BF3.6030606@behnel.de>
	<1395070.2WGoObUNKK@pcsk4> <5113ED03.5060307@behnel.de>
	<CADiQ+QBPutA0=m3SmmdqxH10Q4oCVROxTR8WDjKZvm62g36Cbw@mail.gmail.com>
Message-ID: <CANg26EW6DhBOjrCh165pxfPBN9UVZB0KvPKqkgVMeM=6Rzdb1w@mail.gmail.com>

On 14 February 2013 05:08, Robert Bradshaw <robertwb at gmail.com> wrote:
> On Thu, Feb 7, 2013 at 10:05 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> Samuele Kaplun, 07.02.2013 13:00:
>>> On Thursday, 7 February 2013 12:11:47, Stefan Behnel wrote:
>>>>> [...]
>>>>>
>>>>> def test():
>>>>>     cdef int i
>>>>>
>>>>>     for i from 0 <= i < 10:
>>>>>         print "foo"
>>>>>
>>>>> [...]
>>>>
>>>> Yes, it actually is an unused variable in your code. There is no reference
>>>> to it, only assignments.
>>>
>>> Hmm. But it is used, albeit indirectly. Then what pattern would you suggest in
>>> this case (i.e. to repeat a certain body a given number of times), in order to
>>> avoid such warning?
>>
>> The normal thing to do in Python would be to use an underscore (i.e. "_")
>> as variable name. I don't think we currently special case that pattern,
>> though. Maybe we should.
>
> I agree. Done.
>
>> Or maybe we should just drop the "unused variable" warning for loop
>> variables as they actually do something and serve a purpose, even if they
>> are never referenced.
>
> +1 to this too. (Not yet done.)

Yeah, I think that's the sanest thing. I already implemented this in
Numba which bases its control flow on Cython's (because it's awesome,
thanks to Vitja :)). It simply adds a flag to NameAssignment which is
set for the ForNode's target variable.
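
To illustrate the idea (this is neither Cython's nor Numba's code), here is
a small sketch on top of Python's standard ast module. It flags every name
that is bound as a for-loop target and leaves flagged names, as well as "_",
out of the "assigned but never read" report. The function name
unused_variables is made up, and scopes are ignored to keep the sketch short.

```python
import ast


def unused_variables(source):
    """Return names that are assigned but never read.

    Names bound only as for-loop targets are exempt, and so is "_".
    Simplification: all scopes are treated as one.
    """
    tree = ast.parse(source)

    # Remember which Name nodes are loop targets (the "flag").
    loop_targets = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.For, ast.AsyncFor)):
            for target in ast.walk(node.target):
                if isinstance(target, ast.Name):
                    loop_targets.add(id(target))

    assigned = {}   # name -> True while every assignment seen is a loop target
    read = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                is_loop = id(node) in loop_targets
                assigned[node.id] = assigned.get(node.id, True) and is_loop
            elif isinstance(node.ctx, ast.Load):
                read.add(node.id)

    return sorted(
        name for name, only_loop in assigned.items()
        if name not in read and not only_loop and name != "_"
    )


SAMPLE = """
def test():
    x = 1
    for i in range(10):
        print("foo")
"""
print(unused_variables(SAMPLE))
```

In the sample, x is reported while the loop variable i is not, which is the
behaviour discussed above.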

> - Robert

From stefan_ml at behnel.de  Fri Feb 15 08:53:19 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 15 Feb 2013 08:53:19 +0100
Subject: [Cython] Fwd: A new webpage promoting Compiler technology for
	CPython
In-Reply-To: <3E96A7DD-C8A2-47FB-89C4-D18EB7AEF018@gmail.com>
References: <3E96A7DD-C8A2-47FB-89C4-D18EB7AEF018@gmail.com>
Message-ID: <511DE96F.4010109@behnel.de>

This just came through python-dev:

-------- Original-Message --------
Subject: A new webpage promoting Compiler technology for CPython
Date: Fri, 15 Feb 2013 01:11:12 -0600
From: Travis Oliphant <teoliphant... at gmail.com>

Hey all,

With Numba and Blaze we have been doing a lot of work on what essentially
is compiler technology and realizing more and more that we are treading on
ground that has been plowed before with many other projects.   So, we
wanted to create a web-site and perhaps even a mailing list or forum where
people could coordinate and communicate about compiler projects, compiler
tools, and ways to share efforts and ideas.

The website is:  http://compilers.pydata.org/

This page is specifically for Compiler projects that either integrate with
or work directly with the CPython run-time which is why PyPy is not
presently listed.  The PyPy project is a great project but we just felt
that we wanted to explicitly create a collection of links to compilation
projects that are accessible from CPython which are likely less well known.

But that is just where we started from.   The website is intended to be a
community website constructed from a github repository.   So, we welcome
pull requests from anyone who would like to see the website updated to
reflect their related project.    Jon Riehl (Mython, PyFront, ROFL, and
many other interesting projects) and Stephen Diehl (Blaze) and I will be
moderating the pull requests to begin with.   But, we welcome others with
similar interests to participate in that effort of moderation.

The github repository is here:  https://github.com/pydata/compilers-webpage

This is intended to be a community website for information spreading, and
so we welcome any and all contributions.

Thank you,

Travis Oliphant

-------------- next part --------------
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


From sturla at molden.no  Mon Feb 18 19:32:40 2013
From: sturla at molden.no (Sturla Molden)
Date: Mon, 18 Feb 2013 19:32:40 +0100
Subject: [Cython] PR on refcounting memoryview buffers
Message-ID: <512273C8.4000005@molden.no>

As Stefan suggested, I have posted a PR for a better fix for the issue 
when MinGW for some reason emits the symbol "__sync_fetch_and_add_4" 
instead of generating atomic opcodes for the __sync_fetch_and_add builtin.

The PR is here:
https://github.com/cython/cython/pull/185

The discussion probably belongs on this list instead of cython-users:

The problem this addresses is when GCC does not use atomic builtins and 
emits __sync_fetch_and_add_4 and __sync_fetch_and_sub_4 when Cython 
is internally refcounting memoryview buffers. For some reason it can 
even happen on x86 and amd64.

My PR undoes Mark's quick fix that always uses PyThread_acquire_lock on 
MinGW. PyThread_acquire_lock uses a kernel object (semaphore) on Windows 
and is not very efficient. I want slicing memoryviews to be fast, and 
that means PyThread_acquire_lock must go. My PR uses the Windows API atomic 
function InterlockedAdd to implement the semantics of 
__sync_fetch_and_add_4 and __sync_fetch_and_sub_4 instead of using a 
Python lock.
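
For readers unfamiliar with the builtins: the contract any replacement has
to keep is "update the counter indivisibly and return the value it had
before". The sketch below shows that contract in plain Python, with a
threading.Lock playing the role of the slow, lock-based path. The class name
LockedCounter is made up for illustration and this is not the code from
the PR.

```python
import threading


class LockedCounter:
    """Lock-based stand-in for fetch-and-add / fetch-and-sub semantics."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, amount=1):
        # Read, modify and write happen under one lock, so no update is lost.
        with self._lock:
            old = self._value
            self._value = old + amount
            return old              # the value *before* the update

    def fetch_and_sub(self, amount=1):
        return self.fetch_and_add(-amount)

    @property
    def value(self):
        with self._lock:
            return self._value


def hammer(counter, rounds):
    # Acquire and release a reference many times, as slicing would.
    for _ in range(rounds):
        counter.fetch_and_add(1)
        counter.fetch_and_sub(1)


refcount = LockedCounter(1)
threads = [threading.Thread(target=hammer, args=(refcount, 10000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(refcount.value)
```

An atomic instruction gives the same result without the lock round trip,
which is why the lock-based path is worth avoiding on the hot path.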

Usually MinGW is configured to compile GNU atomic builtins correctly. I 
have yet to see a case where it is not. But obviously one user (JF 
Gallant) has encountered it. I don't think it is a MinGW specific 
problem, but currently it has only been seen on MinGW and the fix is 
MinGW specific (well, it should work on Cygwin too). But whenever MinGW 
does use atomic builtins it just uses them. So it incurs no speed 
penalty on well-behaved MinGW builds.

I took the liberty to use the GNU extensions __inline__ and 
__attribute__((always_inline)). They will make sure the functions always 
behave like macros. The rationale being that it is GCC specific code so 
we can assume GNU extensions are available. If we take them away the 
code should still work, but we have no guarantee the functions will be 
inlined. I did not use macros because __sync_fetch_and_add is emitted 
by the preprocessor, and thus GCC will presumably emit 
__sync_fetch_and_sub_4 after the preprocessing step, which could 
require __sync_fetch_and_sub_4 to be a function instead of another 
macro. (I have no way of finding it out since I cannot test for it.)


Regarding Linux and OSX:

Failure of GCC to use atomic builtins could also happen on other GCC 
builds though. I don't think it is a MinGW-only issue. It's probably due 
to how the GCC build was configured. So we should as a safeguard have 
this for other OSes too.

http://developer.apple.com/library/ios/#DOCUMENTATION/System/Conceptual/ManPages_iPhoneOS/man3/OSAtomicAdd32.3.html

We probably just need similar code to what I wrote for MinGW. I can 
write the code, but I don't have a Mac on which to test it.

Also we should use OSAtomic* on clang/LLVM, which is now the platform C 
compiler on OSX. This will avoid PyThread_acquire_lock being the common 
synch mechanism for refcounting memoryview buffers on OSX.

On Linux I am not sure what to suggest if GCC fails to use atomic 
builtins. I can handcode inline assembly for x86/amd64. I could also use 
pthreads and pth threads locks. But we could also assume that it never 
happens and just let the linker fail on __sync_fetch_and_add_4.


Sturla

From sturla at molden.no  Wed Feb 20 11:55:51 2013
From: sturla at molden.no (Sturla Molden)
Date: Wed, 20 Feb 2013 11:55:51 +0100
Subject: [Cython] PR on refcounting memoryview buffers
In-Reply-To: <512273C8.4000005@molden.no>
References: <512273C8.4000005@molden.no>
Message-ID: <15C80BD0-302E-4576-ACF3-C0FFD700569B@molden.no>


Den 18. feb. 2013 kl. 19:32 skrev Sturla Molden <sturla at molden.no>:

> The problem this addresses is when GCC does not use atomic builtins and emits __sync_fetch_and_add_4 and __sync_fetch_and_sub_4 when Cython is internally refcounting memoryview buffers. For some reason it can even happen on x86 and amd64.
> 

Specifically, atomic builtins are not used when compiling for i386, which is MinGW's default target architecture (unless we specify a different -march). GCC will always encounter this problem when targeting i386.

Thus the correct fix is to use the fallback when GCC is targeting i386, not when GCC is targeting MS Windows. 

So I am closing this PR. But Mark's fix must be corrected, because it does not really address the problem (which is i386, not MinGW)! 

Sturla


From stefan_ml at behnel.de  Thu Feb 21 07:46:35 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 21 Feb 2013 07:46:35 +0100
Subject: [Cython] Sage build broken
Message-ID: <5125C2CB.5000603@behnel.de>

Hi,

I just noticed that the Sage build is broken:

"""
gcc -pthread -shared -L/jenkins/sage/sage-5.2/local/lib
build/temp.linux-x86_64-2.7/sage/rings/polynomial/polydict.o
-L/jenkins/sage/sage-5.2/local/lib -L/release/merger/sage-5.2/local/lib
-lcsage -lstdc++ -lntl -lpython2.7 -o
build/lib.linux-x86_64-2.7/sage/rings/polynomial/polydict.so

/usr/bin/ld: build/temp.linux-x86_64-2.7/sage/rings/polynomial/polydict.o:
relocation R_X86_64_PC32 against `__Pyx_PyDict_IterItems' can not be used
when making a shared object; recompile with -fPIC

/usr/bin/ld: final link failed: Bad value
collect2: ld returned 1 exit status
command 'gcc' failed with exit status 1
"""

Looks like a problem in Sage to me, the gcc command really lacks the -fPIC
here.

Stefan

From stefan_ml at behnel.de  Thu Feb 21 21:48:12 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 21 Feb 2013 21:48:12 +0100
Subject: [Cython] Sage build broken
In-Reply-To: <5125C2CB.5000603@behnel.de>
References: <5125C2CB.5000603@behnel.de>
Message-ID: <5126880C.3090903@behnel.de>

Stefan Behnel, 21.02.2013 07:46:
> I just noticed that the Sage build is broken:
> 
> """
> gcc -pthread -shared -L/jenkins/sage/sage-5.2/local/lib
> build/temp.linux-x86_64-2.7/sage/rings/polynomial/polydict.o
> -L/jenkins/sage/sage-5.2/local/lib -L/release/merger/sage-5.2/local/lib
> -lcsage -lstdc++ -lntl -lpython2.7 -o
> build/lib.linux-x86_64-2.7/sage/rings/polynomial/polydict.so
> 
> /usr/bin/ld: build/temp.linux-x86_64-2.7/sage/rings/polynomial/polydict.o:
> relocation R_X86_64_PC32 against `__Pyx_PyDict_IterItems' can not be used
> when making a shared object; recompile with -fPIC
> 
> /usr/bin/ld: final link failed: Bad value
> collect2: ld returned 1 exit status
> command 'gcc' failed with exit status 1
> """
> 
> Looks like a problem in Sage to me, the gcc command really lacks the -fPIC
> here.

Sorry, my bad. I had a typo in a utility code section name, which prevented
the actual implementation of that function from appearing in the C code. No
idea what makes gcc generate that misleading error message above, though.

Stefan


From stefan_ml at behnel.de  Thu Feb 21 22:37:13 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 21 Feb 2013 22:37:13 +0100
Subject: [Cython] [cython-users] To add datetime.pxd to cython.cpython?
In-Reply-To: <94587c61-6e24-4627-b328-d079c69c2334@googlegroups.com>
References: <19694bda-6d20-49ea-87bd-503dfc16dedb@googlegroups.com>
	<51262838.7040402@behnel.de>
	<94587c61-6e24-4627-b328-d079c69c2334@googlegroups.com>
Message-ID: <51269389.2060400@behnel.de>

Hi,

I think this discussion is actually better suited for the cython-devel
mailing list. We should move it over there.

Zaur Shibzukhov, 21.02.2013 20:59:
> On Thursday, 21 February 2013, 16:59:20 UTC+3, Stefan Behnel wrote:
>> Zaur Shibzukhov, 21.02.2013 11:25: 
>>> Lately I have been actively using the datetime module. Because I needed
>>> fast creation of date/time/datetime instances, I wrote datetime.pxd. It
>>> contains much of the datetime API from datetime.h plus two extended
>>> versions for time/datetime creation. Does it make sense to include
>>> datetime.pxd in cython.cpython?
>>
>> Given that datetime.h is actually part of the header files that CPython 
>> installs, it makes total sense to me to include it. Please provide a pull 
>> request on github for it. 
>
> OK. I will create pull request with datetime.pxd + tests

Great.


>> However, I don't know what you mean by "extended version for time/datetime 
>> creation". Could you show us the code for that first? 
>>
> datetime.h from CPython contains factory functions for creating
> time/datetime objects without timezone info.
> But the datetime module actually contains public definitions of factory
> functions for creating time/date objects with timezone info, which are not
> in CPython's datetime.h.
> I could create datetime_ex.h for these functions in order to include them
> in datetime.pxd. The problem: how to adapt datetime_ex.h to Cython...
> 
> Current datetime.pxd looks like:
> [...] 

I was more interested in the parts that are not in the public header file.
Could you list those?

Letting Cython generate those definitions isn't really all that much of a
problem. We already do this for the stdlib array module, which doesn't have
a public header file at all.

Stefan


From szport at gmail.com  Fri Feb 22 08:01:06 2013
From: szport at gmail.com (ZS)
Date: Fri, 22 Feb 2013 10:01:06 +0300
Subject: [Cython] To Add datetime.pxd to cython.cpython
Message-ID: <CAPOE21SkWaQMYhA_b9zuzC--MugcdvrBZidQXHK1+56xf4GUgQ@mail.gmail.com>

Extended part is in datetime_ex.h:

#include "datetime.h"

#define PyDateTime_FromDateAndTimeEx(year, month, day, hour, min, sec, usec, tzinfo) \
    PyDateTimeAPI->DateTime_FromDateAndTime(year, month, day, hour, \
        min, sec, usec, tzinfo, PyDateTimeAPI->DateTimeType)

#define PyTime_FromTimeEx(hour, minute, second, usecond, tzinfo) \
    PyDateTimeAPI->Time_FromTime(hour, minute, second, usecond, \
        tzinfo, PyDateTimeAPI->TimeType)

These macros allow creating datetime/time objects with tzinfo.
Of course we could do:

    t = PyTime_FromTime(........)
    t = t.replace(tzinfo)

in absence of that.


Zaur Shibzukhov

From szport at gmail.com  Fri Feb 22 08:38:56 2013
From: szport at gmail.com (ZS)
Date: Fri, 22 Feb 2013 10:38:56 +0300
Subject: [Cython] To Add datetime.pxd to cython.cpython
In-Reply-To: <CAPOE21SkWaQMYhA_b9zuzC--MugcdvrBZidQXHK1+56xf4GUgQ@mail.gmail.com>
References: <CAPOE21SkWaQMYhA_b9zuzC--MugcdvrBZidQXHK1+56xf4GUgQ@mail.gmail.com>
Message-ID: <CAPOE21QgWNw42T+wptBMBGr-xNrCgkNK=wN7ROf0SyJS=hRcuQ@mail.gmail.com>

>These macros allow creating datetime/time objects with tzinfo.
>Of course we could do:
>
>    t = PyTime_FromTime(........)
>    t = t.replace(tzinfo)
Sorry, the last line has to be:
    t = t.replace(tzinfo=tzinfo)
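
At the Python level, the difference between the two routes can be sketched
as follows. This is only an illustration with the pure Python datetime API,
not the C macros themselves, and timezone.utc is an arbitrary choice of
tzinfo for the example.

```python
from datetime import datetime, time, timezone

tz = timezone.utc  # any tzinfo instance would do; UTC is just an example

# Two-step route: build a naive object, then attach tzinfo with replace().
# Note that replace() returns a second object; the naive one is thrown away.
t_two_step = time(10, 30, 15, 250).replace(tzinfo=tz)
dt_two_step = datetime(2013, 2, 22, 10, 30, 15, 250).replace(tzinfo=tz)

# One-step route: pass tzinfo at construction time, which is the Python
# level analogue of what the *Ex macros are meant to provide in C.
t_direct = time(10, 30, 15, 250, tzinfo=tz)
dt_direct = datetime(2013, 2, 22, 10, 30, 15, 250, tzinfo=tz)

print(t_two_step == t_direct, dt_two_step == dt_direct)
```

Both routes give equal objects; the one-step route simply avoids creating
the intermediate naive instance.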


Zaur Shibzukhov

From robertwb at gmail.com  Fri Feb 22 09:03:47 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Fri, 22 Feb 2013 00:03:47 -0800
Subject: [Cython] To Add datetime.pxd to cython.cpython
In-Reply-To: <CAPOE21SkWaQMYhA_b9zuzC--MugcdvrBZidQXHK1+56xf4GUgQ@mail.gmail.com>
References: <CAPOE21SkWaQMYhA_b9zuzC--MugcdvrBZidQXHK1+56xf4GUgQ@mail.gmail.com>
Message-ID: <CADiQ+QDayi5E7D5Gg565PYB=fS-T-PhgnSn3+Dvw-Wv73823BA@mail.gmail.com>

These could be provided as inline functions in the pxd rather
than adding another hack like we did for array.

On Thu, Feb 21, 2013 at 11:01 PM, ZS <szport at gmail.com> wrote:
> Extended part is in datetime_ex.h:
>
> #include "datetime.h"
>
> #define PyDateTime_FromDateAndTimeEx(year, month, day, hour, min, sec, usec, tzinfo) \
>     PyDateTimeAPI->DateTime_FromDateAndTime(year, month, day, hour, \
>         min, sec, usec, tzinfo, PyDateTimeAPI->DateTimeType)
>
> #define PyTime_FromTimeEx(hour, minute, second, usecond, tzinfo) \
>     PyDateTimeAPI->Time_FromTime(hour, minute, second, usecond, \
>         tzinfo, PyDateTimeAPI->TimeType)
>
> These macros allow creating datetime/time objects with tzinfo.
> Of course we could do:
>
>     t = PyTime_FromTime(........)
>     t = t.replace(tzinfo)
>
> in absence of that.
>
>
> Zaur Shibzukhov

From stefan_ml at behnel.de  Sun Feb 24 16:58:32 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 24 Feb 2013 16:58:32 +0100
Subject: [Cython] [cython-users] freelist benchmarks
In-Reply-To: <CANg26EXiUESD9RyYeK3-B_5qZCVcgiRqV97s5DHf193fb-pVMw@mail.gmail.com>
References: <512A1B20.4050707@behnel.de>
	<CANg26EXiUESD9RyYeK3-B_5qZCVcgiRqV97s5DHf193fb-pVMw@mail.gmail.com>
Message-ID: <512A38A8.3040905@behnel.de>

mark florisson, 24.02.2013 15:50:
> On 24 February 2013 13:52, Stefan Behnel wrote:
>> for those who haven't noticed my other e-mail, I implemented a new extension
>> type decorator "@cython.freelist(N)" that replaces the normal object
>> creation and deallocation with a freelist of N recently freed objects.
>> Currently, it's only supported for types that do not have a base class (and
>> lifting that restriction is not all that easy).
> 
> Very cool! I've been wanting that for a while now :)

So did I.


> What's the hurdle with base classes?

The problem is that the current way types are being instantiated is
recursive. The top-most base class calls tp_alloc() and then each step in
the hierarchy adds its bit of initialisation. If you want to introduce a
freelist into this scheme, then it's still the top-most class that does the
allocation, so it would need to manage all freelists of all of its children
in order to return the right object struct for a given instantiation request.

This cannot really be done at compile time. Imagine a subtype in a
different module, for example, for which the code requests a freelist. The
compilation of the base type wouldn't even know that it's supposed to
manage a freelist at all, only the subtype knows it.

There are a couple of ways to deal with this. One is to replicate the
freelist in the base type for all subtypes that it finds at runtime. That
might actually be the easiest way to do it, but it requires a bit of memory
management in order to add a new freelist when a new subtype is found at
runtime. It also means that we'd have to find the right freelist before we
can get an object from it (or not, if it's empty), which would likely be an
operation that's linear with the number of subtypes. And the freelist set
would better be bounded in size to prevent users from flooding it with lots
and lots of subtypes.

Another option would be to split the initialisation up into two functions,
one that allocates *and* initialises the instance and one that *only*
initialises it. That would allow each hierarchy level to manage its own
freelists and to take its own decision about where to get the object from.
This approach comes with a couple of tricky details, as CPython doesn't
provide support for this. So we'd need to find a way to handle type
hierarchies that are implemented across modules.

Maybe the best approach would be to let the base type manage everything and
just statically limit the maximum number of subtypes for which it provides
separate freelists, at a first come, first serve basis. And the freelist
selection could be based on the object struct size (tp_basicsize) instead
of the specific type. As long as we don't support inheriting from variable
size objects (like tuple/bytes), that would cut down the problem quite
nicely. I think I should just give it a try at some point.
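
A rough model of that last idea, in plain Python and purely as an
illustration: one pool owned by the base type, a separate freelist per
object size, and a fixed upper bound on both the number of size classes and
the entries per list. The class name SizeKeyedFreelists and its parameters
are made up; a C implementation would more likely scan a small static array
than use a dict.

```python
class SizeKeyedFreelists:
    """Bounded set of freelists, selected by object size (cf. tp_basicsize)."""

    def __init__(self, max_size_classes=4, capacity=8):
        self.max_size_classes = max_size_classes   # static limit on lists
        self.capacity = capacity                   # N objects per freelist
        self._lists = {}                           # basicsize -> recycled objects

    def release(self, obj, basicsize):
        """Try to keep obj for reuse; return True if it was stored."""
        freelist = self._lists.get(basicsize)
        if freelist is None:
            if len(self._lists) >= self.max_size_classes:
                return False        # all slots taken: first come, first served
            freelist = self._lists[basicsize] = []
        if len(freelist) >= self.capacity:
            return False            # list is full, caller frees the object
        freelist.append(obj)
        return True

    def acquire(self, basicsize):
        """Return a recycled object of this size, or None if none is cached."""
        freelist = self._lists.get(basicsize)
        if freelist:
            return freelist.pop()
        return None                 # caller falls back to a normal allocation


pool = SizeKeyedFreelists(max_size_classes=2, capacity=1)
a, b, c = object(), object(), object()
print(pool.release(a, 32), pool.release(b, 32),
      pool.release(b, 48), pool.release(c, 64))
```

Because the key is the struct size and not the type, two subtypes with
identical layouts share one list, and a subtype that arrives after the
limit is reached just gets ordinary allocation.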

Stefan


From markflorisson88 at gmail.com  Sun Feb 24 18:56:06 2013
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 24 Feb 2013 17:56:06 +0000
Subject: [Cython] [cython-users] freelist benchmarks
In-Reply-To: <CANg26EUcM+ZbUzT+GgW-2DHusNn0301y4p_Xwi0Ride2ev_h7w@mail.gmail.com>
References: <512A1B20.4050707@behnel.de>
	<CANg26EXiUESD9RyYeK3-B_5qZCVcgiRqV97s5DHf193fb-pVMw@mail.gmail.com>
	<512A38A8.3040905@behnel.de>
	<CANg26EUcM+ZbUzT+GgW-2DHusNn0301y4p_Xwi0Ride2ev_h7w@mail.gmail.com>
Message-ID: <CANg26EW_QNSWmeZ-QyBaZmyxGtjGycJOm5bV+rEaQPV6LiW7hQ@mail.gmail.com>

On 24 February 2013 17:50, mark florisson <markflorisson88 at gmail.com> wrote:
> On 24 February 2013 15:58, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> mark florisson, 24.02.2013 15:50:
>>> On 24 February 2013 13:52, Stefan Behnel wrote:
>>>> for those who haven't noticed my other e-mail, I implemented a new extension
>>>> type decorator "@cython.freelist(N)" that replaces the normal object
>>>> creation and deallocation with a freelist of N recently freed objects.
>>>> Currently, it's only supported for types that do not have a base class (and
>>>> lifting that restriction is not all that easy).
>>>
>>> Very cool! I've been wanting that for a while now :)
>>
>> So did I.
>>
>>
>>> What's the hurdle with base classes?
>>
>> The problem is that the current way types are being instantiated is
>> recursive. The top-most base class calls tp_alloc() and then each step in
>> the hierarchy adds its bit of initialisation. If you want to introduce a
>> freelist into this scheme, then it's still the top-most class that does the
>> allocation, so it would need to manage all freelists of all of its children
>> in order to return the right object struct for a given instantiation request.
>>
>> This cannot really be done at compile time. Imagine a subtype in a
>> different module, for example, for which the code requests a freelist. The
>> compilation of the base type wouldn't even know that it's supposed to
>> manage a freelist at all, only the subtype knows it.
>>
>> There are a couple of ways to deal with this. One is to replicate the
>> freelist in the base type for all subtypes that it finds at runtime. That
>> might actually be the easiest way to do it, but it requires a bit of memory
>> management in order to add a new freelist when a new subtype is found at
>> runtime. It also means that we'd have to find the right freelist before we
>> can get an object from it (or not, if it's empty), which would likely be an
>> operation that's linear with the number of subtypes. And the freelist set
>> would better be bounded in size to prevent users from flooding it with lots
>> and lots of subtypes.
>>
>> Another option would be to split the initialisation up into two functions,
>> one that allocates *and* initialises the instance and one that *only*
>> initialises it. That would allow each hierarchy level to manage its own
>> freelists and to take its own decision about where to get the object from.
>> This approach comes with a couple of tricky details, as CPython doesn't
>> provide support for this. So we'd need to find a way to handle type
>> hierarchies that are implemented across modules.
>
> Thanks for the explanation Stefan, this is the one I was thinking of,
> but I suppose it'd need an extra pointer to the pure init function in
> the type.

Hm, since extension types don't do multiple inheritance (and excluding
Python subclasses), couldn't you import those init functions across
modules through capsules?

>> Maybe the best approach would be to let the base type manage everything and
>> just statically limit the maximum number of subtypes for which it provides
>> separate freelists, at a first come, first serve basis. And the freelist
>> selection could be based on the object struct size (tp_basicsize) instead
>> of the specific type. As long as we don't support inheriting from variable
>> size objects (like tuple/bytes), that would cut down the problem quite
>> nicely. I think I should just give it a try at some point.
>
> What about using pyextensible type from SEP 200 and using a custom
> freelist entry on the type?
>
>> Stefan
>>
>> --
>>
>> ---
>> You received this message because you are subscribed to the Google Groups "cython-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to cython-users+unsubscribe at googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>>

From markflorisson88 at gmail.com  Sun Feb 24 18:50:47 2013
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 24 Feb 2013 17:50:47 +0000
Subject: [Cython] [cython-users] freelist benchmarks
In-Reply-To: <512A38A8.3040905@behnel.de>
References: <512A1B20.4050707@behnel.de>
	<CANg26EXiUESD9RyYeK3-B_5qZCVcgiRqV97s5DHf193fb-pVMw@mail.gmail.com>
	<512A38A8.3040905@behnel.de>
Message-ID: <CANg26EUcM+ZbUzT+GgW-2DHusNn0301y4p_Xwi0Ride2ev_h7w@mail.gmail.com>

On 24 February 2013 15:58, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 24.02.2013 15:50:
>> On 24 February 2013 13:52, Stefan Behnel wrote:
>>> for those who haven't noticed my other e-mail, I implemented a new extension
>>> type decorator "@cython.freelist(N)" that replaces the normal object
>>> creation and deallocation with a freelist of N recently freed objects.
>>> Currently, it's only supported for types that do not have a base class (and
>>> lifting that restriction is not all that easy).
>>
>> Very cool! I've been wanting that for a while now :)
>
> So did I.
>
>
>> What's the hurdle with base classes?
>
> The problem is that the current way types are being instantiated is
> recursive. The top-most base class calls tp_alloc() and then each step in
> the hierarchy adds its bit of initialisation. If you want to introduce a
> freelist into this scheme, then it's still the top-most class that does the
> allocation, so it would need to manage all freelists of all of its children
> in order to return the right object struct for a given instantiation request.
>
> This cannot really be done at compile time. Imagine a subtype in a
> different module, for example, for which the code requests a freelist. The
> compilation of the base type wouldn't even know that it's supposed to
> manage a freelist at all, only the subtype knows it.
>
> There are a couple of ways to deal with this. One is to replicate the
> freelist in the base type for all subtypes that it finds at runtime. That
> might actually be the easiest way to do it, but it requires a bit of memory
> management in order to add a new freelist when a new subtype is found at
> runtime. It also means that we'd have to find the right freelist before we
> can get an object from it (or not, if it's empty), which would likely be an
> operation that's linear with the number of subtypes. And the freelist set
> would better be bounded in size to prevent users from flooding it with lots
> and lots of subtypes.
>
> Another option would be to split the initialisation up into two functions,
> one that allocates *and* initialises the instance and one that *only*
> initialises it. That would allow each hierarchy level to manage its own
> freelists and to take its own decision about where to get the object from.
> This approach comes with a couple of tricky details, as CPython doesn't
> provide support for this. So we'd need to find a way to handle type
> hierarchies that are implemented across modules.

Thanks for the explanation Stefan, this is the one I was thinking of,
but I suppose it'd need an extra pointer to the pure init function in
the type.

> Maybe the best approach would be to let the base type manage everything and
> just statically limit the maximum number of subtypes for which it provides
> separate freelists, at a first come, first serve basis. And the freelist
> selection could be based on the object struct size (tp_basicsize) instead
> of the specific type. As long as we don't support inheriting from variable
> size objects (like tuple/bytes), that would cut down the problem quite
> nicely. I think I should just give it a try at some point.

What about using pyextensible type from SEP 200 and using a custom
freelist entry on the type?

> Stefan

From stefan_ml at behnel.de  Sun Feb 24 22:45:13 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 24 Feb 2013 22:45:13 +0100
Subject: [Cython] [cython-users] freelist benchmarks
In-Reply-To: <CANg26EW_QNSWmeZ-QyBaZmyxGtjGycJOm5bV+rEaQPV6LiW7hQ@mail.gmail.com>
References: <512A1B20.4050707@behnel.de>
	<CANg26EXiUESD9RyYeK3-B_5qZCVcgiRqV97s5DHf193fb-pVMw@mail.gmail.com>
	<512A38A8.3040905@behnel.de>
	<CANg26EUcM+ZbUzT+GgW-2DHusNn0301y4p_Xwi0Ride2ev_h7w@mail.gmail.com>
	<CANg26EW_QNSWmeZ-QyBaZmyxGtjGycJOm5bV+rEaQPV6LiW7hQ@mail.gmail.com>
Message-ID: <512A89E9.2070104@behnel.de>

mark florisson, 24.02.2013 18:56:
> On 24 February 2013 17:50, mark florisson <markflorisson88 at gmail.com> wrote:
>> On 24 February 2013 15:58, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>> mark florisson, 24.02.2013 15:50:
>>>> On 24 February 2013 13:52, Stefan Behnel wrote:
>>>>> for those who haven't noticed my other e-mail, I implemented a new extension
>>>>> type decorator "@cython.freelist(N)" that replaces the normal object
>>>>> creation and deallocation with a freelist of N recently freed objects.
>>>>> Currently, it's only supported for types that do not have a base class (and
>>>>> lifting that restriction is not all that easy).
>>>>
>>>> Very cool! I've been wanting that for a while now :)
>>>
>>> So did I.
>>>
>>>
>>>> What's the hurdle with base classes?
>>>
>>> The problem is that the current way types are being instantiated is
>>> recursive. The top-most base class calls tp_alloc() and then each step in
>>> the hierarchy adds its bit of initialisation. If you want to introduce a
>>> freelist into this scheme, then it's still the top-most class that does the
>>> allocation, so it would need to manage all freelists of all of its children
>>> in order to return the right object struct for a given instantiation request.
>>>
>>> This cannot really be done at compile time. Imagine a subtype in a
>>> different module, for example, for which the code requests a freelist. The
>>> compilation of the base type wouldn't even know that it's supposed to
>>> manage a freelist at all, only the subtype knows it.
>>>
>>> There are a couple of ways to deal with this. One is to replicate the
>>> freelist in the base type for all subtypes that it finds at runtime. That
>>> might actually be the easiest way to do it, but it requires a bit of memory
>>> management in order to add a new freelist when a new subtype is found at
>>> runtime. It also means that we'd have to find the right freelist before we
>>> can get an object from it (or not, if it's empty), which would likely be an
>>> operation that's linear with the number of subtypes. And the freelist set
>>> would better be bounded in size to prevent users from flooding it with lots
>>> and lots of subtypes.
>>>
>>> Another option would be to split the initialisation up into two functions,
>>> one that allocates *and* initialises the instance and one that *only*
>>> initialises it. That would allow each hierarchy level to manage its own
>>> freelists and to take its own decision about where to get the object from.
>>> This approach comes with a couple of tricky details, as CPython doesn't
>>> provide support for this. So we'd need to find a way to handle type
>>> hierarchies that are implemented across modules.
>>
>> Thanks for the explanation Stefan, this is the one I was thinking of,
>> but I suppose it'd need an extra pointer to the pure init function in
>> the type.
> 
> Hm, since extension types don't do multiple inheritance (and excluding
> Python subclasses), couldn't you import those init functions across
> modules through capsules?

Well, yes, I suppose you could. However, that's quite some overhead. I
think it's way easier to just provision a couple of freelists in advance
and assign them to different subtype sizes as they come in. Even in
somewhat large hierarchies, I doubt that the object structs will have all
that many different sizes. Remember, the size only changes when you add
cdef attributes, and only once when you start adding cdef methods. And even
structs that appear in different subtrees of the hierarchy and that carry
different attributes may end up having the same struct size due to layout
coincidences. I would expect that even a type hierarchy of, say, 20 types,
would have at most some 4-8 different struct sizes. Most of the time,
subtypes are there to change behaviour, not state.

The only real drawback is that you need to enable the base type to do all
that's necessary, which you may not have control over in a few cases. But
then again, if it's worth using a freelist on one subtype, it's probably
worth using it in general, so it's best to fix the base type in any case.


>>> Maybe the best approach would be to let the base type manage everything and
>>> just statically limit the maximum number of subtypes for which it provides
>>> separate freelists, at a first come, first serve basis. And the freelist
>>> selection could be based on the object struct size (tp_basicsize) instead
>>> of the specific type. As long as we don't support inheriting from variable
>>> size objects (like tuple/bytes), that would cut down the problem quite
>>> nicely. I think I should just give it a try at some point.

I changed the current type pointer check to look at tp_basicsize instead.
That made it work for almost all classes in lxml's own Element hierarchy,
with only a couple of exceptions in lxml.objectify that have one additional
object field. So, just extending the freelist support to use two different
lists for different struct sizes instead of just one would make it work for
all of lxml already. Taking a look at Sage to see how the situation appears
over there would be interesting, I guess.
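
To illustrate the selection by struct size, here is a minimal sketch in
plain Python rather than in the generated C code. The class name, the
method names and the limits are made up for the example; it only models
the bookkeeping (a bounded number of lists, handed out on a first come,
first served basis and keyed by tp_basicsize), not the actual memory
handling.

```python
class SizeKeyedFreelists:
    """Toy model of a base type that owns a few freelists keyed by struct size."""

    def __init__(self, max_lists=2, max_items=8):
        self.max_lists = max_lists    # how many distinct struct sizes get a list
        self.max_items = max_items    # corresponds to N in @cython.freelist(N)
        self.lists = {}               # basicsize -> list of recycled objects

    def release(self, obj, basicsize):
        """Try to keep a freed object; return True if it was pooled."""
        slot = self.lists.get(basicsize)
        if slot is None:
            if len(self.lists) >= self.max_lists:
                return False          # all lists taken: first come, first served
            slot = self.lists[basicsize] = []
        if len(slot) >= self.max_items:
            return False              # list is full, the object is really freed
        slot.append(obj)
        return True

    def acquire(self, basicsize):
        """Return a recycled object of this size, or None to allocate normally."""
        slot = self.lists.get(basicsize)
        return slot.pop() if slot else None
```

With max_lists=2, the two struct sizes seen first get a list each and any
further size simply falls back to normal allocation.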

Stefan


From roed.math at gmail.com  Mon Feb 25 00:00:31 2013
From: roed.math at gmail.com (David Roe)
Date: Sun, 24 Feb 2013 16:00:31 -0700
Subject: [Cython] [cython-users] freelist benchmarks
In-Reply-To: <512A89E9.2070104@behnel.de>
References: <512A1B20.4050707@behnel.de>
	<CANg26EXiUESD9RyYeK3-B_5qZCVcgiRqV97s5DHf193fb-pVMw@mail.gmail.com>
	<512A38A8.3040905@behnel.de>
	<CANg26EUcM+ZbUzT+GgW-2DHusNn0301y4p_Xwi0Ride2ev_h7w@mail.gmail.com>
	<CANg26EW_QNSWmeZ-QyBaZmyxGtjGycJOm5bV+rEaQPV6LiW7hQ@mail.gmail.com>
	<512A89E9.2070104@behnel.de>
Message-ID: <CAChs6_k0tcUpjWm1ZStxQswD4Hao7qQ3sW+=KgXwQeeHwsi-dg@mail.gmail.com>

> I changed the current type pointer check to look at tp_basicsize instead.
>
> That made it work for almost all classes in lxml's own Element hierarchy,
> with only a couple of exceptions in lxml.objectify that have one additional
> object field. So, just extending the freelist support to use two different
> lists for different struct sizes instead of just one would make it work for
> all of lxml already. Taking a look at Sage to see how the situation appears
> over there would be interesting, I guess.
>

I found some chains of length 5.  This could be shortened to 4 by putting
the freelist at the level of Element (which is where you most care about
speed of object creation).

SageObject
    -> Element (_parent attribute and cdef methods)
    -> Vector (_degree)
    -> FreeModuleElement (_is_mutable)
    -> FreeModuleElement_generic_dense (_entries)

SageObject
    -> Element (_parent attribute and cdef methods)
    -> sage.structure.element.Matrix (_nrows)
    -> sage.matrix.matrix.Matrix (_base_ring)
    -> Matrix_integer_dense (_entries)

This does look cool to have though.
David

From stefan_ml at behnel.de  Mon Feb 25 10:17:25 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 25 Feb 2013 10:17:25 +0100
Subject: [Cython] [cython-users] freelist benchmarks
In-Reply-To: <CAChs6_k0tcUpjWm1ZStxQswD4Hao7qQ3sW+=KgXwQeeHwsi-dg@mail.gmail.com>
References: <512A1B20.4050707@behnel.de>
	<CANg26EXiUESD9RyYeK3-B_5qZCVcgiRqV97s5DHf193fb-pVMw@mail.gmail.com>
	<512A38A8.3040905@behnel.de>
	<CANg26EUcM+ZbUzT+GgW-2DHusNn0301y4p_Xwi0Ride2ev_h7w@mail.gmail.com>
	<CANg26EW_QNSWmeZ-QyBaZmyxGtjGycJOm5bV+rEaQPV6LiW7hQ@mail.gmail.com>
	<512A89E9.2070104@behnel.de>
	<CAChs6_k0tcUpjWm1ZStxQswD4Hao7qQ3sW+=KgXwQeeHwsi-dg@mail.gmail.com>
Message-ID: <512B2C25.7080009@behnel.de>

Hi,

thanks for looking through it.

David Roe, 25.02.2013 00:00:
> I changed the current type pointer check to look at tp_basicsize instead.
> 
>> That made it work for almost all classes in lxml's own Element hierarchy,
>> with only a couple of exceptions in lxml.objectify that have one additional
>> object field. So, just extending the freelist support to use two different
>> lists for different struct sizes instead of just one would make it work for
>> all of lxml already. Taking a look at Sage to see how the situation appears
>> over there would be interesting, I guess.
> 
> I found some chains of length 5.  This could be shortened to 4 by putting
> the freelist at the level of Element (which is where you most care about
> speed of object creation).

It's substantially easier to keep it in the top-level base class, though.
Otherwise, we'd need a new protocol between inheriting types as I
previously described. That adds a *lot* of complexity.


> SageObject
>     -> Element (_parent attribute and cdef methods)
>     -> Vector (_degree)
>     -> FreeModuleElement (_is_mutable)
>     -> FreeModuleElement_generic_dense (_entries)
> 
> SageObject
>     -> Element (_parent attribute and cdef methods)
>     ->sage.structure.element.Matrix (_nrows)
>     -> sage.matrix.matrix.Matrix (_base_ring)
>     -> Matrix_integer_dense (_entries)

Ok, so even for something as large as Sage, we'd apparently end up with
just a couple of freelists for a given base type. That really makes it
appear reasonable to make that number a compile time constant as well. I
mean, even if you *really* oversize it, all you lose is the static memory
for a couple of pointers. On a 64 bit system, if you use a freelist size of
8 objects and provision freelists for 8 differently sized subtypes, that's
8*8*8 bytes in total, or half a KB, statically allocated. Even a hundred
times that size shouldn't hurt anyone. Unused subtype freelists really take
almost no space and won't hurt performance either.
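
Spelled out as a quick calculation (the constant names are only for
illustration, the numbers are the ones from the paragraph above):

```python
POINTER_SIZE = 8       # bytes per pointer on a 64 bit system
FREELIST_SIZE = 8      # objects kept per freelist
SUBTYPE_LISTS = 8      # freelists provisioned for differently sized subtypes

# Only the pointer slots are counted, not the objects they may point to.
static_bytes = POINTER_SIZE * FREELIST_SIZE * SUBTYPE_LISTS
print(static_bytes)          # 512 bytes, i.e. half a KB
print(static_bytes * 100)    # 51200 bytes, i.e. 50 KB when oversized a hundredfold
```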


> This does look cool to have though.

It definitely is.

Stefan


From dave.hirschfeld at gmail.com  Mon Feb 25 22:56:45 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Mon, 25 Feb 2013 21:56:45 +0000 (UTC)
Subject: [Cython] No matching signature with fused memoryview and None
	default
Message-ID: <loom.20130225T222401-294@post.gmane.org>

With the following code I get a "No matching signature found" error.
Is this a bug?

```
%%cython
cimport cython

ctypedef fused floating:
    cython.double
    cython.float


def nosignature(floating[:] x, floating[:] myarray = None):
    print myarray is None
    return x
```

In [39]: nosignature(ones(1, dtype=np.float32), ones(1, dtype=np.float32))
False
Out[39]: <MemoryView of 'ndarray' at 0x937e4d8>

In [40]: nosignature(ones(1, dtype=np.float64), ones(1, dtype=np.float64))
False
Out[40]: <MemoryView of 'ndarray' at 0x937f2d8>

In [41]: nosignature(ones(1, dtype=np.float64))
True
Out[41]: <MemoryView of 'ndarray' at 0x9381258>

In [42]: nosignature(ones(1, dtype=np.float32))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-c96cfff67289> in <module>()
----> 1 nosignature(ones(1, dtype=np.float32))

ca9.pyd in ca9.__pyx_fused_cpdef (ca9.c:2282)()

TypeError: No matching signature found


Thanks,
Dave


From dave.hirschfeld at gmail.com  Tue Feb 26 13:47:54 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Tue, 26 Feb 2013 12:47:54 +0000 (UTC)
Subject: [Cython] Can't assign memview cast to memview slice
Message-ID: <loom.20130226T133706-870@post.gmane.org>

The following works:

```
%%cython
cimport cython
import numpy as np
cimport numpy as np

def f(double[:,:] arr):
    cdef double[:] res = np.zeros(2*arr.size, dtype=np.float64)
    cdef double[:] tmp
    tmp = <double[:arr.size]>&arr[0,0]
    res[:arr.size] = tmp
    return res
```

whereas the following:

```
%%cython
cimport cython
import numpy as np
cimport numpy as np

def f(double[:,:] arr):
    cdef double[:] res = np.zeros(2*arr.size, dtype=np.float64)
    res[:arr.size] = <double[:arr.size]>&arr[0,0]
    return res
```

...gives the below error:

Error compiling Cython file:
------------------------------------------------------------
...
import numpy as np
cimport numpy as np

def f(double[:,:] arr):
    cdef double[:] res = np.zeros(2*arr.size, dtype=np.float64)
    res[:arr.size] = <double[:arr.size]>&arr[0,0]
                    ^
------------------------------------------------------------

d3ce.pyx:7:21: Cannot assign type 'double[::1]' to 'double'


It would be nice if Cython could take care of the temporary itself,
though the workaround is simple enough that it's not a big issue.

Thanks,
Dave


From robertwb at gmail.com  Tue Feb 26 21:16:52 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Tue, 26 Feb 2013 12:16:52 -0800
Subject: [Cython] [cython-users] freelist benchmarks
In-Reply-To: <512B2C25.7080009@behnel.de>
References: <512A1B20.4050707@behnel.de>
	<CANg26EXiUESD9RyYeK3-B_5qZCVcgiRqV97s5DHf193fb-pVMw@mail.gmail.com>
	<512A38A8.3040905@behnel.de>
	<CANg26EUcM+ZbUzT+GgW-2DHusNn0301y4p_Xwi0Ride2ev_h7w@mail.gmail.com>
	<CANg26EW_QNSWmeZ-QyBaZmyxGtjGycJOm5bV+rEaQPV6LiW7hQ@mail.gmail.com>
	<512A89E9.2070104@behnel.de>
	<CAChs6_k0tcUpjWm1ZStxQswD4Hao7qQ3sW+=KgXwQeeHwsi-dg@mail.gmail.com>
	<512B2C25.7080009@behnel.de>
Message-ID: <CADiQ+QDFW6suPtrcdTwvK5aX6ogXuXy0kW2H7n3ygd+BbuX6cg@mail.gmail.com>

On Mon, Feb 25, 2013 at 1:17 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Hi,
>
> thanks for looking through it.
>
> David Roe, 25.02.2013 00:00:
>> I changed the current type pointer check to look at tp_basicsize instead.
>>
>>> That made it work for almost all classes in lxml's own Element hierarchy,
>>> with only a couple of exceptions in lxml.objectify that have one additional
>>> object field. So, just extending the freelist support to use two different
>>> lists for different struct sizes instead of just one would make it work for
>>> all of lxml already. Taking a look at Sage to see how the situation appears
>>> over there would be interesting, I guess.
>>
>> I found some chains of length 5.  This could be shortened to 4 by putting
>> the freelist at the level of Element (which is where you most care about
>> speed of object creation).
>
> It's substantially easier to keep it in the top-level base class, though.
> Otherwise, we'd need a new protocol between inheriting types as I
> previously described. That adds a *lot* of complexity.
>
>
>> SageObject
>>     -> Element (_parent attribute and cdef methods)
>>     -> Vector (_degree)
>>     -> FreeModuleElement (_is_mutable)
>>     -> FreeModuleElement_generic_dense (_entries)
>>
>> SageObject
>>     -> Element (_parent attribute and cdef methods)
>>     ->sage.structure.element.Matrix (_nrows)
>>     -> sage.matrix.matrix.Matrix (_base_ring)
>>     -> Matrix_integer_dense (_entries)

I don't know that (expensive) matrices are the best example, and often
the chains are larger for elements one really cares about.

sage: def base_tree(x): return [] if x is None else [x] + base_tree(x.__base__)
   ...:

sage: base_tree(Integer)
 [<type 'sage.rings.integer.Integer'>, <type
'sage.structure.element.EuclideanDomainElement'>, <type
'sage.structure.element.PrincipalIdealDomainElement'>, <type
'sage.structure.element.DedekindDomainElement'>, <type
'sage.structure.element.IntegralDomainElement'>, <type
'sage.structure.element.CommutativeRingElement'>, <type
'sage.structure.element.RingElement'>, <type
'sage.structure.element.ModuleElement'>, <type
'sage.structure.element.Element'>, <type
'sage.structure.sage_object.SageObject'>, <type 'object'>]

sage: base_tree(RealDoubleElement)
 [<type 'sage.rings.real_double.RealDoubleElement'>, <type
'sage.structure.element.FieldElement'>, <type
'sage.structure.element.CommutativeRingElement'>, <type
'sage.structure.element.RingElement'>, <type
'sage.structure.element.ModuleElement'>, <type
'sage.structure.element.Element'>, <type
'sage.structure.sage_object.SageObject'>, <type 'object'>]

sage: base_tree(type(mod(1, 10)))
 [<type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>, <type
'sage.rings.finite_rings.integer_mod.IntegerMod_abstract'>, <type
'sage.rings.finite_rings.element_base.FiniteRingElement'>, <type
'sage.structure.element.CommutativeRingElement'>, <type
'sage.structure.element.RingElement'>, <type
'sage.structure.element.ModuleElement'>, <type
'sage.structure.element.Element'>, <type
'sage.structure.sage_object.SageObject'>, <type 'object'>]
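
For anyone who wants to try the same walk outside of Sage, here is a small
sketch in plain Python that runs on builtin types. The helper struct_sizes()
is made up for the example; it uses the __basicsize__ attribute that type
objects expose for tp_basicsize, so it also shows at which levels of a
chain the struct size changes.

```python
def base_tree(tp):
    """Return the single-inheritance chain of a type, following __base__."""
    chain = []
    while tp is not None:
        chain.append(tp)
        tp = tp.__base__       # object.__base__ is None, which ends the loop
    return chain


def struct_sizes(tp):
    """Pair each type in the chain with its tp_basicsize."""
    return [(t.__name__, t.__basicsize__) for t in base_tree(tp)]


print(base_tree(bool))         # [<class 'bool'>, <class 'int'>, <class 'object'>]
print(struct_sizes(bool))      # sizes depend on the platform and Python version
```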

> Ok, so even for something as large as Sage, we'd apparently end up with
> just a couple of freelists for a given base type. That really makes it
> appear reasonable to make that number a compile time constant as well. I
> mean, even if you *really* oversize it, all you lose is the static memory
> for a couple of pointers. On a 64 bit system, if you use a freelist size of
> 8 objects and provision freelists for 8 differently sized subtypes, that's
> 8*8*8 bytes in total, or half a KB, statically allocated. Even a hundred
> times that size shouldn't hurt anyone. Unused subtype freelists really take
> almost no space and won't hurt performance either.

Elements in Sage are typically larger than 8 bytes, and our
experiments for Integer showed that the benefit (for this class)
extended well beyond 8 items. On the other hand lots of elements are
so expensive that they don't merit this at all.

I think one thing to keep in mind is that Python's heap is essentially
a "freelist" of objects of every size up to 128(?) bytes, so what are
we trying to save by putting it at the base type and going up and down
the __cinit__/__dealloc__ chain? I suppose we save the zero-ing out of
memory and a function call or two, but that's not the expensive part.
For our Integer free list, we save going up the __cinit__/__dealloc__
call, initializing a couple of members, setting the vtable pointers,
which turns out to be the bulk of the cost. I'd love to see something
like this work, if just for final classes. It may require the
introduction of new functions to determine exactly how much
cleanup/setup we want to do when inserting/removing stuff from the
pool rather than giving up the memory completely.

>> This does look cool to have though.
>
> It definitely is.

Yes!

- Robert

From stefan_ml at behnel.de  Wed Feb 27 08:24:55 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 27 Feb 2013 08:24:55 +0100
Subject: [Cython] [cython-users] freelist benchmarks
In-Reply-To: <CADiQ+QDFW6suPtrcdTwvK5aX6ogXuXy0kW2H7n3ygd+BbuX6cg@mail.gmail.com>
References: <512A1B20.4050707@behnel.de>
	<CANg26EXiUESD9RyYeK3-B_5qZCVcgiRqV97s5DHf193fb-pVMw@mail.gmail.com>
	<512A38A8.3040905@behnel.de>
	<CANg26EUcM+ZbUzT+GgW-2DHusNn0301y4p_Xwi0Ride2ev_h7w@mail.gmail.com>
	<CANg26EW_QNSWmeZ-QyBaZmyxGtjGycJOm5bV+rEaQPV6LiW7hQ@mail.gmail.com>
	<512A89E9.2070104@behnel.de>
	<CAChs6_k0tcUpjWm1ZStxQswD4Hao7qQ3sW+=KgXwQeeHwsi-dg@mail.gmail.com>
	<512B2C25.7080009@behnel.de>
	<CADiQ+QDFW6suPtrcdTwvK5aX6ogXuXy0kW2H7n3ygd+BbuX6cg@mail.gmail.com>
Message-ID: <512DB4C7.8000405@behnel.de>

Robert Bradshaw, 26.02.2013 21:16:
> On Mon, Feb 25, 2013 at 1:17 AM, Stefan Behnel wrote:
>> David Roe, 25.02.2013 00:00:
>>> I changed the current type pointer check to look at tp_basicsize instead.
>>>
>>>> That made it work for almost all classes in lxml's own Element hierarchy,
>>>> with only a couple of exceptions in lxml.objectify that have one additional
>>>> object field. So, just extending the freelist support to use two different
>>>> lists for different struct sizes instead of just one would make it work for
>>>> all of lxml already. Taking a look at Sage to see how the situation appears
>>>> over there would be interesting, I guess.
>>>
>>> I found some chains of length 5.  This could be shortened to 4 by putting
>>> the freelist at the level of Element (which is where you most care about
>>> speed of object creation).
>>
>> It's substantially easier to keep it in the top-level base class, though.
>> Otherwise, we'd need a new protocol between inheriting types as I
>> previously described. That adds a *lot* of complexity.
>>
>>
>>> SageObject
>>>     -> Element (_parent attribute and cdef methods)
>>>     -> Vector (_degree)
>>>     -> FreeModuleElement (_is_mutable)
>>>     -> FreeModuleElement_generic_dense (_entries)
>>>
>>> SageObject
>>>     -> Element (_parent attribute and cdef methods)
>>>     ->sage.structure.element.Matrix (_nrows)
>>>     -> sage.matrix.matrix.Matrix (_base_ring)
>>>     -> Matrix_integer_dense (_entries)
> 
> I don't know that (expensive) matrices are the best example, and often
> the chains are larger for elements one really cares about.
> 
> sage: def base_tree(x): return [] if x is None else [x] +
> base_tree(x.__base__)
>    ...:
> 
> sage: base_tree(Integer)
>  [<type 'sage.rings.integer.Integer'>, <type
> 'sage.structure.element.EuclideanDomainElement'>, <type
> 'sage.structure.element.PrincipalIdealDomainElement'>, <type
> 'sage.structure.element.DedekindDomainElement'>, <type
> 'sage.structure.element.IntegralDomainElement'>, <type
> 'sage.structure.element.CommutativeRingElement'>, <type
> 'sage.structure.element.RingElement'>, <type
> 'sage.structure.element.ModuleElement'>, <type
> 'sage.structure.element.Element'>, <type
> 'sage.structure.sage_object.SageObject'>, <type 'object'>]
> 
> sage: base_tree(RealDoubleElement)
>  [<type 'sage.rings.real_double.RealDoubleElement'>, <type
> 'sage.structure.element.FieldElement'>, <type
> 'sage.structure.element.CommutativeRingElement'>, <type
> 'sage.structure.element.RingElement'>, <type
> 'sage.structure.element.ModuleElement'>, <type
> 'sage.structure.element.Element'>, <type
> 'sage.structure.sage_object.SageObject'>, <type 'object'>]
> 
> sage: base_tree(type(mod(1, 10)))
>  [<type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>, <type
> 'sage.rings.finite_rings.integer_mod.IntegerMod_abstract'>, <type
> 'sage.rings.finite_rings.element_base.FiniteRingElement'>, <type
> 'sage.structure.element.CommutativeRingElement'>, <type
> 'sage.structure.element.RingElement'>, <type
> 'sage.structure.element.ModuleElement'>, <type
> 'sage.structure.element.Element'>, <type
> 'sage.structure.sage_object.SageObject'>, <type 'object'>]

My original question was if they have differently sized object structs or
not. Those that don't would currently go into the same freelist.


>> Ok, so even for something as large as Sage, we'd apparently end up with
>> just a couple of freelists for a given base type. That really makes it
>> appear reasonable to make that number a compile time constant as well. I
>> mean, even if you *really* oversize it, all you lose is the static memory
>> for a couple of pointers. On a 64 bit system, if you use a freelist size of
>> 8 objects and provision freelists for 8 differently sized subtypes, that's
>> 8*8*8 bytes in total, or half a KB, statically allocated. Even a hundred
>> times that size shouldn't hurt anyone. Unused subtype freelists really take
>> almost no space and won't hurt performance either.
> 
> Elements in Sage are typically larger than 8 bytes

I wasn't adding up the size of the objects, only of the pointers in the
freelists. If the objects end up in the freelist, they'll also be used on
the next instantiation, so their size doesn't really matter.


> and our
> experiments for Integer showed that the benefit (for this class)
> extended well beyond 8 items. On the other hand lots of elements are
> so expensive that they don't merit this at all.
>
> I think one thing to keep in mind is that Python's heap is essentially
> a "freelist" of objects of every size up to 128(?) bytes, so what are
> we trying to save by putting it at the base type and going up and down
> the __cinit__/__dealloc__ chain?

Allocation still seems to take its time.


> I suppose we save the zero-ing out of
> memory and a function call or two, but that's not the expensive part.

And I noticed now that we still have to do the zeroing out in order to
initialise C typed attributes. And that it's actually not trivial to figure
out in what cases we can safely put a subtype into the freelist. There are
a couple of special cases in CPython's object allocation, e.g. for heap
types. Their instances own a reference to the type, which is not the case
for static types.


> For our Integer free list, we save going up the __cinit__/__dealloc__
> call, initializing a couple of members, setting the vtable pointers,
> which turns out to be the bulk of the cost.

And your hierarchy examples above show that they are implemented
across multiple modules. I can imagine that being a major problem as the C
compiler can't inline the tp_new calls in that case, can't really reorder
the struct field assignments, etc.

I imagine that the freelist could leave the initial vtable untouched in
some cases, but that would mean that we need a freelist per actual type,
instead of object struct size.

Now, if we move the freelist handling into each subtype (as you and Mark
proposed already), we'd get some of this for free, because the objects that
get freed are already properly set up for the specific type, including
vtable etc. All that remains to be done is to zero out the (known) C typed
attributes, set the (known) object attributes to None, and call any
__cinit__() methods in the super types to do the rest for us. We might have
to do it in the right order, i.e. initialise some attributes, call the
corresponding __cinit__() method, initialise some more attributes, ...

So, basically, we'd manually inline the bottom-up aggregation of all tp_new
functions into the current one, skipping those operations that we don't
consider necessary in the freelist case, such as the vtable setup.
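
As a rough model of what that would look like, here is a sketch in plain
Python, not the code Cython would generate. Node, new_node(), free_node()
and the "vtable" attribute are all hypothetical stand-ins: the point is
only that a recycled object keeps its type-specific setup and merely gets
its attribute state reset.

```python
class Node:
    """Stand-in for an extension type; 'vtable' models per-type setup."""
    vtable_setups = 0            # counts how often the expensive setup runs

    def __init__(self):
        Node.vtable_setups += 1  # full construction: type-specific setup
        self.vtable = "Node-vtable"
        self.reset()

    def reset(self):
        # The cheap part: zero C typed fields, set object fields to None.
        self.count = 0
        self.payload = None


_freelist = []                   # one list per concrete type (hypothetical)
_MAX_FREE = 8


def new_node():
    if _freelist:
        node = _freelist.pop()   # already set up for this exact type
        node.reset()             # only restore the attribute state
        return node
    return Node()                # slow path: full construction


def free_node(node):
    if len(_freelist) < _MAX_FREE:
        _freelist.append(node)   # keep it, vtable and all
```

An object that goes through free_node() and comes back from new_node()
has a clean state, but the setup counter does not move.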

Now, the only remaining issue is how to get at the __cinit__() functions if
the base type isn't in the same module, but as Mark proposed, that could
still be done if we require it to be exported in a C-API (and assume that
it doesn't exist if not?). Would be better to know it at compile time,
though...

Stefan


From robertwb at gmail.com  Wed Feb 27 09:54:24 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Wed, 27 Feb 2013 00:54:24 -0800
Subject: [Cython] [cython-users] freelist benchmarks
In-Reply-To: <512DB4C7.8000405@behnel.de>
References: <512A1B20.4050707@behnel.de>
	<CANg26EXiUESD9RyYeK3-B_5qZCVcgiRqV97s5DHf193fb-pVMw@mail.gmail.com>
	<512A38A8.3040905@behnel.de>
	<CANg26EUcM+ZbUzT+GgW-2DHusNn0301y4p_Xwi0Ride2ev_h7w@mail.gmail.com>
	<CANg26EW_QNSWmeZ-QyBaZmyxGtjGycJOm5bV+rEaQPV6LiW7hQ@mail.gmail.com>
	<512A89E9.2070104@behnel.de>
	<CAChs6_k0tcUpjWm1ZStxQswD4Hao7qQ3sW+=KgXwQeeHwsi-dg@mail.gmail.com>
	<512B2C25.7080009@behnel.de>
	<CADiQ+QDFW6suPtrcdTwvK5aX6ogXuXy0kW2H7n3ygd+BbuX6cg@mail.gmail.com>
	<512DB4C7.8000405@behnel.de>
Message-ID: <CADiQ+QDKjAqWR4074psBLOLyDFuaALN2uE=sWCgs3dL8+zy80g@mail.gmail.com>

On Tue, Feb 26, 2013 at 11:24 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Robert Bradshaw, 26.02.2013 21:16:
>> On Mon, Feb 25, 2013 at 1:17 AM, Stefan Behnel wrote:
>>> David Roe, 25.02.2013 00:00:
>>>> I changed the current type pointer check to look at tp_basicsize instead.
>>>>
>>>>> That made it work for almost all classes in lxml's own Element hierarchy,
>>>>> with only a couple of exceptions in lxml.objectify that have one additional
>>>>> object field. So, just extending the freelist support to use two different
>>>>> lists for different struct sizes instead of just one would make it work for
>>>>> all of lxml already. Taking a look at Sage to see how the situation appears
>>>>> over there would be interesting, I guess.
>>>>
>>>> I found some chains of length 5.  This could be shortened to 4 by putting
>>>> the freelist at the level of Element (which is where you most care about
>>>> speed of object creation).
>>>
>>> It's substantially easier to keep it in the top-level base class, though.
>>> Otherwise, we'd need a new protocol between inheriting types as I
>>> previously described. That adds a *lot* of complexity.
>>>
>>>
>>>> SageObject
>>>>     -> Element (_parent attribute and cdef methods)
>>>>     -> Vector (_degree)
>>>>     -> FreeModuleElement (_is_mutable)
>>>>     -> FreeModuleElement_generic_dense (_entries)
>>>>
>>>> SageObject
>>>>     -> Element (_parent attribute and cdef methods)
>>>>     ->sage.structure.element.Matrix (_nrows)
>>>>     -> sage.matrix.matrix.Matrix (_base_ring)
>>>>     -> Matrix_integer_dense (_entries)
>>
>> I don't know that (expensive) matrices are the best example, and often
>> the chains are larger for elements one really cares about.
>>
>> sage: def base_tree(x): return [] if x is None else [x] +
>> base_tree(x.__base__)
>>    ...:
>>
>> sage: base_tree(Integer)
>>  [<type 'sage.rings.integer.Integer'>, <type
>> 'sage.structure.element.EuclideanDomainElement'>, <type
>> 'sage.structure.element.PrincipalIdealDomainElement'>, <type
>> 'sage.structure.element.DedekindDomainElement'>, <type
>> 'sage.structure.element.IntegralDomainElement'>, <type
>> 'sage.structure.element.CommutativeRingElement'>, <type
>> 'sage.structure.element.RingElement'>, <type
>> 'sage.structure.element.ModuleElement'>, <type
>> 'sage.structure.element.Element'>, <type
>> 'sage.structure.sage_object.SageObject'>, <type 'object'>]
>>
>> sage: base_tree(RealDoubleElement)
>>  [<type 'sage.rings.real_double.RealDoubleElement'>, <type
>> 'sage.structure.element.FieldElement'>, <type
>> 'sage.structure.element.CommutativeRingElement'>, <type
>> 'sage.structure.element.RingElement'>, <type
>> 'sage.structure.element.ModuleElement'>, <type
>> 'sage.structure.element.Element'>, <type
>> 'sage.structure.sage_object.SageObject'>, <type 'object'>]
>>
>> sage: base_tree(type(mod(1, 10)))
>>  [<type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>, <type
>> 'sage.rings.finite_rings.integer_mod.IntegerMod_abstract'>, <type
>> 'sage.rings.finite_rings.element_base.FiniteRingElement'>, <type
>> 'sage.structure.element.CommutativeRingElement'>, <type
>> 'sage.structure.element.RingElement'>, <type
>> 'sage.structure.element.ModuleElement'>, <type
>> 'sage.structure.element.Element'>, <type
>> 'sage.structure.sage_object.SageObject'>, <type 'object'>]
>
> My original question was if they have differently sized object structs or
> not. Those that don't would currently go into the same freelist.

They all add to the struct at the leaf subclass.

>>> Ok, so even for something as large as Sage, we'd apparently end up with
>>> just a couple of freelists for a given base type. That really makes it
>>> appear reasonable to make that number a compile time constant as well. I
>>> mean, even if you *really* oversize it, all you lose is the static memory
>>> for a couple of pointers. On a 64 bit system, if you use a freelist size of
>>> 8 objects and provision freelists for 8 differently sized subtypes, that's
>>> 8*8*8 bytes in total, or half a KB, statically allocated. Even a hundred
>>> times that size shouldn't hurt anyone. Unused subtype freelists really take
>>> almost no space and won't hurt performance either.
>>
>> Elements in Sage are typically larger than 8 bytes
>
> I wasn't adding up the size of the objects, only of the pointers in the
> freelists. If the objects end up in the freelist, they'll also be used on
> the next instantiation, so their size doesn't really matter.

It does if you use a lot of them, then they just sit around forever,
but I suppose if you use them once you're willing to pay the price. It
also doesn't make sense for a lot of them that are rather expensive
anyways (e.g. every kind of matrix or polynomial specialization).

>> and our
>> experiments for Integer showed that the benefit (for this class)
>> extended well beyond 8 items. On the other hand lots of elements are
>> so expensive that they don't merit this at all.
>>
>> I think one thing to keep in mind is that Python's heap is essentially
>> a "freelist" of objects of every size up to 128(?) bytes, so what are
>> we trying to save by putting it at the base type and going up and down
>> the __cinit__/__dealloc__ chain?
>
> Allocation still seems to take its time.

Yes, it does.

>> I suppose we save the zero-ing out of
>> memory and a function call or two, but that's not the expensive part.
>
> And I noticed now that we still have to do the zeroing out in order to
> initialise C typed attributes.

Yep.

> And that it's actually not trivial to figure
> out in what cases we can safely put a subtype into the freelist. There are
> a couple of special cases in CPython's object allocation, e.g. for heap
> types. Their instances own a reference to the type, which is not the case
> for static types.
>
>
>> For our Integer free list, we save going up the __cinit__/__dealloc__
>> call, initializing a couple of members, setting the vtable pointers,
>> which turns out to be the bulk of the cost.
>
> And your hierarchy examples above show that they are implemented
> across multiple modules. I can imagine that being a major problem as the C
> compiler can't inline the tp_new calls in that case, can't really reorder
> the struct field assignments, etc.

Yes. And some of those modules are already 1000s of lines, so it's not
like we should just put them all in one (though perhaps someday we'll
support some kind of static linking...).

> I imagine that the freelist could leave the initial vtable untouched in
> some cases, but that would mean that we need a freelist per actual type,
> instead of object struct size.
>
> Now, if we move the freelist handling into each subtype (as you and Mark
> proposed already), we'd get some of this for free, because the objects that
> get freed are already properly set up for the specific type, including
> vtable etc. All that remains to be done is to zero out the (known) C typed
> attributes, set the (known) object attributes to None, and call any
> __cinit__() methods in the super types to do the rest for us. We might have
> to do it in the right order, i.e. initialise some attributes, call the
> corresponding __cinit__() method, initialise some more attributes, ...
>
> So, basically, we'd manually inline the bottom-up aggregation of all tp_new
> functions into the current one, skipping those operations that we don't
> consider necessary in the freelist case, such as the vtable setup.
>
> Now, the only remaining issue is how to get at the __cinit__() functions if
> the base type isn't in the same module, but as Mark proposed, that could
> still be done if we require it to be exported in a C-API (and assume that
> it doesn't exist if not?). Would be better to know it at compile time,
> though...

Yes, and that's still going to (potentially) be expensive. I'd rather
have a way of controlling what, if anything, gets zero'd out/set to
None, as most of that (in Sage's case at least) will still be valid
for the newly-reused type or instantly over-written (though perhaps
the default could be to call __dealloc__/__cinit__). With this we
could skip going up and down the type hierarchy at all.

- Robert

From stefan_ml at behnel.de  Wed Feb 27 14:09:16 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 27 Feb 2013 14:09:16 +0100
Subject: [Cython] freelist benchmarks
In-Reply-To: <CADiQ+QDKjAqWR4074psBLOLyDFuaALN2uE=sWCgs3dL8+zy80g@mail.gmail.com>
References: <512A1B20.4050707@behnel.de>
	<CANg26EXiUESD9RyYeK3-B_5qZCVcgiRqV97s5DHf193fb-pVMw@mail.gmail.com>
	<512A38A8.3040905@behnel.de>
	<CANg26EUcM+ZbUzT+GgW-2DHusNn0301y4p_Xwi0Ride2ev_h7w@mail.gmail.com>
	<CANg26EW_QNSWmeZ-QyBaZmyxGtjGycJOm5bV+rEaQPV6LiW7hQ@mail.gmail.com>
	<512A89E9.2070104@behnel.de>
	<CAChs6_k0tcUpjWm1ZStxQswD4Hao7qQ3sW+=KgXwQeeHwsi-dg@mail.gmail.com>
	<512B2C25.7080009@behnel.de>
	<CADiQ+QDFW6suPtrcdTwvK5aX6ogXuXy0kW2H7n3ygd+BbuX6cg@mail.gmail.com>
	<512DB4C7.8000405@behnel.de>
	<CADiQ+QDKjAqWR4074psBLOLyDFuaALN2uE=sWCgs3dL8+zy80g@mail.gmail.com>
Message-ID: <512E057C.2010507@behnel.de>

Robert Bradshaw, 27.02.2013 09:54:
> On Tue, Feb 26, 2013 at 11:24 PM, Stefan Behnel wrote:
>> I imagine that the freelist could leave the initial vtable untouched in
>> some cases, but that would mean that we need a freelist per actual type,
>> instead of object struct size.
>>
>> Now, if we move the freelist handling into each subtype (as you and Mark
>> proposed already), we'd get some of this for free, because the objects that
>> get freed are already properly set up for the specific type, including
>> vtable etc. All that remains to be done is to zero out the (known) C typed
>> attributes, set the (known) object attributes to None, and call any
>> __cinit__() methods in the super types to do the rest for us. We might have
>> to do it in the right order, i.e. initialise some attributes, call the
>> corresponding __cinit__() method, initialise some more attributes, ...
>>
>> So, basically, we'd manually inline the bottom-up aggregation of all tp_new
>> functions into the current one, skipping those operations that we don't
>> consider necessary in the freelist case, such as the vtable setup.
>>
>> Now, the only remaining issue is how to get at the __cinit__() functions if
>> the base type isn't in the same module, but as Mark proposed, that could
>> still be done if we require it to be exported in a C-API (and assume that
>> it doesn't exist if not?). Would be better to know it at compile time,
>> though...
> 
> Yes, and that's still going to (potentially) be expensive. I'd rather
> have a way of controlling what, if anything, gets zero'd out/set to
> None, as most of that (in Sage's case at least) will still be valid
> for the newly-reused type or instantly over-written (though perhaps
> the default could be to call __dealloc__/__cinit__). With this we
> could skip going up and down the type hierarchy at all.

I don't think the zeroing is a problem. Just bursting out static data to
memory should be plenty fast these days and not incur any wait cycles or
pipeline stalls, as long as the compiler/processor can figure out that
there are no interdependencies between the assignments. The None
assignments may be a problem due to the INCREFs, but even in that case, the
C compiler and processor should be able to detect that they are all just
incrementing the same address in memory and may end up reducing a series of
updates into one. The only real problem is the calls to __cinit__(), which
run user code and can thus do anything. If they can't be inlined, the C
compiler needs to lessen a lot of its assumptions.

Would it make sense to require users to implement __cinit__() as an inline
method in a .pxd file if they want to use a freelist on a subtype? Or would
that be overly restrictive? It would prevent them from using module
globals, for example. That's quite a restriction normally, but I'm not sure
how much it hurts the "average" code in the specific case of __cinit__().

Stefan


From dave.hirschfeld at gmail.com  Wed Feb 27 14:17:40 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Wed, 27 Feb 2013 13:17:40 +0000 (UTC)
Subject: [Cython] Non-deterministic behavoiur?
Message-ID: <loom.20130227T135719-981@post.gmane.org>

Using the following test code:

import numpy as np
from lapack import dgelsy
from numpy.linalg import lstsq

A = np.array([[ 0.12, -8.19,  7.69, -2.26, -4.71],
       [-6.91,  2.22, -5.12, -9.08,  9.96],
       [-3.33, -8.94, -6.72, -4.4 , -9.98],
       [ 3.97,  3.33, -2.74, -7.92, -3.2 ]])
#
b = np.array([[ 7.3 ,  0.47, -6.28],
       [ 1.33,  6.58, -3.42],
       [ 2.68, -1.71,  3.46],
       [-9.62, -0.79,  0.41]])
#
print '#################'
print '# ACTUAL RESULT #'
print '#################'
print lstsq(A, b)[0]
print
print '#################'
print '# DGELSY RESULT #'
print '#################'
print dgelsy(A, b)

I get:

#################
# ACTUAL RESULT #
#################
[[-0.6858 -0.2431  0.0642]
 [-0.795  -0.0836  0.2118]
 [ 0.3767  0.1208 -0.6541]
 [ 0.2885 -0.2415  0.4176]
 [ 0.2916  0.3525 -0.3015]]

#################
# DGELSY RESULT #
#################
[[-0.6858 -0.2431  0.0642]
 [-0.795  -0.0836  0.2118]
 [ 0.3767  0.1208 -0.6541]
 [ 0.2885 -0.2415  0.4176]
 [ 0.2916  0.3525 -0.3015]]


All well and good, however if I type the `tmp` variable as a memview in the 
following code

cdef double[:] res
cdef double[:,:] tmp
tmp = np.zeros([ldb, nrhs], order='F', dtype=np.float64)
tmp[:b.shape[0]] = b
res = np.ravel(tmp, order='F')

the result changes?!?

#################
# DGELSY RESULT #
#################
[[-0.7137 -0.2429  0.0876]
 [-0.2882 -0.0884 -0.2117]
 [-0.4282  0.1284  0.0185]
 [ 0.9564 -0.2478 -0.1404]
 [ 0.3625  0.3519 -0.3607]]


Remove the `cdef double[:,:] tmp` and I'm back to the correct result.
Does this make any sense?

To try and figure out what was going on I put in a couple of debugging print 
statements:

print 'res = ', repr(np.asarray(res))
print 'res.flags = {{{flags}}}'.format(flags=np.asarray(res).flags)


Only changing these lines resulted in the same incorrect result

#################
# DGELSY RESULT #
#################
res =  array([ 7.3 ,  1.33,  2.68, -9.62,  0.,
               0.47,  6.58, -1.71, -0.79,  0.,
              -6.28, -3.42,  3.46,  0.41,  0.])

res.flags = {  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False}

[[-0.7137 -0.2429  0.0876]
 [-0.2882 -0.0884 -0.2117]
 [-0.4282  0.1284  0.0185]
 [ 0.9564 -0.2478 -0.1404]
 [ 0.3625  0.3519 -0.3607]]


Removing (only) the print statements again gave me the correct results.

So, it seems either typing the array as a memview or printing res
will screw up the calculation.

The cython code is given below. Any ideas if this is a cython bug or something 
I'm doing wrong?

Thanks,
Dave


cdef extern from "mkl_lapack.h" nogil:
    void DGELSY(const MKL_INT* m,
                const MKL_INT* n,
                const MKL_INT* nrhs,
                double* a,
                const MKL_INT* lda,
                double* b,
                const MKL_INT* ldb,
                MKL_INT* jpvt,
                const double* rcond,
                MKL_INT* rank,
                double* work,
                const MKL_INT* lwork,
                MKL_INT* info)


@cython.embedsignature(True)
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def dgelsy(double[:,:] A,
          double[:,:] b,
          double rcond=1e-15,
          overwrite_A=False,
          overwrite_b=False):
    cdef MKL_INT rank, info
    cdef MKL_INT m = A.shape[0]
    cdef MKL_INT n = A.shape[1]
    cdef MKL_INT nrhs = b.shape[1]
    cdef MKL_INT lda = m
    cdef MKL_INT ldb = max(m, n)
    cdef MKL_INT lwork = -1
    cdef double worksize = 0
    #cdef double[:,:] tmp
    cdef double[:] res, work
    cdef MKL_INT[:] jpvt

    if b.shape[0] != m:
        message = "b.shape[0] must be equal to A.shape[0].\n"
        message += "b.shape[0] = {b.shape[0]}\n"
        message += "A.shape[0] = {A.shape[0]}\n"
        raise MKL_LAPACK_ERROR(message.format(A=A, b=b))

    flags = np.asarray(A).flags
    if not flags['F_CONTIGUOUS'] or not overwrite_A:
        A = A.copy_fortran()

    flags = np.asarray(b).flags
    if not flags['F_CONTIGUOUS'] or not overwrite_b or b.shape[0] < n:
        tmp = np.zeros([ldb, nrhs], order='F', dtype=np.float64)
        tmp[:b.shape[0]] = b
        res = np.ravel(tmp, order='F')
    else:
        res = np.ravel(b, order='F')

    #print 'res = ', repr(np.asarray(res))
    #print 'res.flags = {{{flags}}}'.format(flags=np.asarray(res).flags)

    jpvt = np.empty(n, dtype=np.int32)
    DGELSY(&m, &n, &nrhs, &A[0,0], &lda, &res[0], &ldb, &jpvt[0], &rcond, &rank,
           &worksize, &lwork, &info)
    if info != 0:
        message = "Parameter {i} had an illegal value when calling gelsy."
        raise MKL_LAPACK_ERROR(message.format(i=info))

    lwork = int(worksize)
    work = np.empty(lwork, dtype=np.float64)
    DGELSY(&m, &n, &nrhs, &A[0,0], &lda, &res[0], &ldb, &jpvt[0], &rcond, &rank,
           &worksize, &lwork, &info)
    if info != 0:
        message = "Parameter {i} had an illegal value when calling gelsy."
        raise MKL_LAPACK_ERROR(message.format(i=info))
    return np.asarray(res).reshape(nrhs, -1).T[:n]
#


From robertwb at gmail.com  Wed Feb 27 19:42:26 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Wed, 27 Feb 2013 10:42:26 -0800
Subject: [Cython] freelist benchmarks
In-Reply-To: <512E057C.2010507@behnel.de>
References: <512A1B20.4050707@behnel.de>
	<CANg26EXiUESD9RyYeK3-B_5qZCVcgiRqV97s5DHf193fb-pVMw@mail.gmail.com>
	<512A38A8.3040905@behnel.de>
	<CANg26EUcM+ZbUzT+GgW-2DHusNn0301y4p_Xwi0Ride2ev_h7w@mail.gmail.com>
	<CANg26EW_QNSWmeZ-QyBaZmyxGtjGycJOm5bV+rEaQPV6LiW7hQ@mail.gmail.com>
	<512A89E9.2070104@behnel.de>
	<CAChs6_k0tcUpjWm1ZStxQswD4Hao7qQ3sW+=KgXwQeeHwsi-dg@mail.gmail.com>
	<512B2C25.7080009@behnel.de>
	<CADiQ+QDFW6suPtrcdTwvK5aX6ogXuXy0kW2H7n3ygd+BbuX6cg@mail.gmail.com>
	<512DB4C7.8000405@behnel.de>
	<CADiQ+QDKjAqWR4074psBLOLyDFuaALN2uE=sWCgs3dL8+zy80g@mail.gmail.com>
	<512E057C.2010507@behnel.de>
Message-ID: <CADiQ+QCkhOG8H4R0a0HGYLoXTG+P-1_T8AcPNE8w=PA5tECXnQ@mail.gmail.com>

On Wed, Feb 27, 2013 at 5:09 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Robert Bradshaw, 27.02.2013 09:54:
>> On Tue, Feb 26, 2013 at 11:24 PM, Stefan Behnel wrote:
>>> I imagine that the freelist could leave the initial vtable untouched in
>>> some cases, but that would mean that we need a freelist per actual type,
>>> instead of object struct size.
>>>
>>> Now, if we move the freelist handling into each subtype (as you and Mark
>>> proposed already), we'd get some of this for free, because the objects that
>>> get freed are already properly set up for the specific type, including
>>> vtable etc. All that remains to be done is to zero out the (known) C typed
>>> attributes, set the (known) object attributes to None, and call any
>>> __cinit__() methods in the super types to do the rest for us. We might have
>>> to do it in the right order, i.e. initialise some attributes, call the
>>> corresponding __cinit__() method, initialise some more attributes, ...
>>>
>>> So, basically, we'd manually inline the bottom-up aggregation of all tp_new
>>> functions into the current one, skipping those operations that we don't
>>> consider necessary in the freelist case, such as the vtable setup.
>>>
>>> Now, the only remaining issue is how to get at the __cinit__() functions if
>>> the base type isn't in the same module, but as Mark proposed, that could
>>> still be done if we require it to be exported in a C-API (and assume that
>>> it doesn't exist if not?). Would be better to know it at compile time,
>>> though...
>>
>> Yes, and that's still going to (potentially) be expensive. I'd rather
>> have a way of controlling what, if anything, gets zero'd out/set to
>> None, as most of that (in Sage's case at least) will still be valid
>> for the newly-reused type or instantly over-written (though perhaps
>> the default could be to call __dealloc__/__cinit__). With this we
>> could skip going up and down the type hierarchy at all.
>
> I don't think the zeroing is a problem. Just bursting out static data to
> memory should be plenty fast these days and not incur any wait cycles or
> pipeline stalls, as long as the compiler/processor can figure out that
> there are no interdependencies between the assignments. The None
> assignments may be a problem due to the INCREFs, but even in that case, the
> C compiler and processor should be able to detect that they are all just
> incrementing the same address in memory and may end up reducing a series of
> updates into one. The only real problem is the calls to __cinit__(), which
> run user code and can thus do anything. If they can't be inlined, the C
> compiler needs to lessen a lot of its assumptions.
>
> Would it make sense to require users to implement __cinit__() as an inline
> method in a .pxd file if they want to use a freelist on a subtype? Or would
> that be overly restrictive? It would prevent them from using module
> globals, for example. That's quite a restriction normally, but I'm not sure
> how much it hurts the "average" code in the specific case of __cinit__().

It would hurt in the couple of examples I've thought about (e.g. fast
Sage elements, where one wants to set the Parent field correctly).

- Robert

From dave.hirschfeld at gmail.com  Wed Feb 27 20:05:08 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Wed, 27 Feb 2013 19:05:08 +0000 (UTC)
Subject: [Cython] MemoryViews require writeable arrays?
Message-ID: <loom.20130227T195717-560@post.gmane.org>

%%cython
cimport cython

import numpy as np
cimport numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cpdef double[:] return_one(double[:] x):
    return np.array([1.0])


In [43]: x = randn(3)
    ...: return_one(x)
Out[43]: <MemoryView of 'ndarray' at 0x8ae14e0>

In [44]: x.flags['WRITEABLE'] = False
    ...: return_one(x)
Traceback (most recent call last):

  File "<ipython-input-44-4fbbd1035d56>", line 2, in <module>
    return_one(x)

  File "_cython_magic_7761e77f78c4e321261152684b47c674.pyx", line 11, in _cython_magic_7761e77f78c4e321261152684b47c674.return_one (C:\Users\dhirschfeld\.ipython\cython\_cython_magic_7761e77f78c4e321261152684b47c674.c:1727)

  File "stringsource", line 619, in View.MemoryView.memoryview_cwrapper (C:\Users\dhirschfeld\.ipython\cython\_cython_magic_7761e77f78c4e321261152684b47c674.c:8819)

  File "stringsource", line 327, in View.MemoryView.memoryview.__cinit__ (C:\Users\dhirschfeld\.ipython\cython\_cython_magic_7761e77f78c4e321261152684b47c674.c:5594)

ValueError: buffer source array is read-only


Is this a required restriction? Is there any workaround?
The context is calling cython routines using IPython.parallel.
IIUC any input arrays sent over zmq are necessarily read-only.

As can be seen with the example, even if we don't modify (or use)
the input array at all we still get the error.

Any help, esp. in regards to a workaround would be greatly appreciated!

Thanks,
Dave


From dave.hirschfeld at gmail.com  Wed Feb 27 20:16:08 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Wed, 27 Feb 2013 19:16:08 +0000 (UTC)
Subject: [Cython] MemoryViews require writeable arrays?
References: <loom.20130227T195717-560@post.gmane.org>
Message-ID: <loom.20130227T201339-966@post.gmane.org>

Dave Hirschfeld <dave.hirschfeld at ...> writes:

> 
> cpdef double[:] return_one(double[:] x):
>     return np.array([1.0])
> 
> In [43]: x = randn(3)
>     ...: return_one(x)
> Out[43]: <MemoryView of 'ndarray' at 0x8ae14e0>
> 
> In [44]: x.flags['WRITEABLE'] = False
>     ...: return_one(x)
> ValueError: buffer source array is read-only
> 
> 
> Any help, esp. in regards to a workaround would be greatly appreciated!
> 
> Thanks,
> Dave
> 
> 


It seems using the numpy buffer interface works but I guess it would still
be good if this worked for memviews too:


%%cython
cimport cython

import numpy as np
cimport numpy as np

ctypedef np.float64_t float64_t

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cpdef double[:] return_one_np(np.ndarray[float64_t, ndim=1] x):
    return np.array([1.0])


In [203]: return_one_np(x)
Out[203]: <MemoryView of 'ndarray' at 0x7d2d558>


Cheers,
Dave


From dave.hirschfeld at gmail.com  Thu Feb 28 09:45:21 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Thu, 28 Feb 2013 08:45:21 +0000 (UTC)
Subject: [Cython] Non-deterministic behavoiur?
References: <loom.20130227T135719-981@post.gmane.org>
Message-ID: <loom.20130228T094200-635@post.gmane.org>

Dave Hirschfeld <dave.hirschfeld at ...> writes:

> 
> Using the following test code:
<snip>
> 
> So, it seems either typing the array as a memview or printing res
> will screw up the calculation.
> 
> The cython code is given below. Any ideas if this is a cython bug or something 
> I'm doing wrong?
> 
> Thanks,
> Dave
> 

To answer my own question, it can't be that a simple print statement will
change the program so I must be doing something wrong! It makes it hard
to track down when it gives the right answer most of the time and segfaults
randomly when nothing seems to have changed. I'm sure it's just incorrect
arguments to dgelsy so I'll look into that...

-Dave


From szport at gmail.com  Thu Feb 28 12:49:45 2013
From: szport at gmail.com (ZS)
Date: Thu, 28 Feb 2013 14:49:45 +0300
Subject: [Cython] About IndexNode and unicode[index]
Message-ID: <CAPOE21Ti03-WWad2jEbHMr5029NqrP6zrj-xrZaiigMq8UT+uw@mail.gmail.com>

Looking at the IndexNode class in ExprNodes.py, I see a possibility
for adding a faster code path for unicode[index], as is done in the
method `generate_setitem_code` in the case of lists.

These are the files for evaluating the performance difference:

#### unicode_index.h

/* This is a stripped-down version of __Pyx_GetItemInt_Unicode_Fast */
#include "unicodeobject.h"

static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i);

static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i) {
#if CYTHON_PEP393_ENABLED
    if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
#endif
    return __Pyx_PyUnicode_READ_CHAR(ustring, i);
}

##### unicode_index.pyx

# coding: utf-8

cdef extern from 'unicode_index.h':
    inline Py_UCS4 unicode_char(unicode ustring, int i)

cdef unicode text = u"abcdefghigklmnopqrstuvwxyzabcdefghigklmnopqrstuvwxyz"

def f_1(unicode text):
    cdef int i, j
    cdef int n = len(text)
    cdef Py_UCS4 ch

    for j from 0<=j<=1000000:
        for i from 0<=i<=n-1:
            ch = text[i]

def f_2(unicode text):
    cdef int i, j
    cdef int n = len(text)
    cdef Py_UCS4 ch

    for j from 0<=j<=1000000:
        for i from 0<=i<=n-1:
            ch = unicode_char(text, i)

def test_1():
    f_1(text)

def test_2():
    f_2(text)

Timing results:

(py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from mytests.unicode_index import test_1" "test_1()"
100 loops, best of 10: 89 msec per loop
(py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from mytests.unicode_index import test_2" "test_2()"
100 loops, best of 10: 46.1 msec per loop

in setup.py globally:

       "boundscheck": False
       "wraparound": False
       "nonecheck": False

Zaur Shibzukhov

From dave.hirschfeld at gmail.com  Thu Feb 28 12:55:31 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Thu, 28 Feb 2013 11:55:31 +0000 (UTC)
Subject: [Cython] Non-deterministic behavoiur?
References: <loom.20130227T135719-981@post.gmane.org>
	<loom.20130228T094200-635@post.gmane.org>
Message-ID: <loom.20130228T124723-675@post.gmane.org>

Dave Hirschfeld <dave.hirschfeld at ...> writes:

> 
> Dave Hirschfeld <dave.hirschfeld at ...> writes:
> 
> > 
> > Using the following test code:
> <snip>
> > 
> > So, it seems either typing the array as a memview or printing res
> > will screw up the calculation.
> > 
> > The cython code is given below. Any ideas if this is a cython bug or
> > something I'm doing wrong?
> > 
> > Thanks,
> > Dave
> > 
> 
> To answer my own question, it can't be that a simple print statement will
> change the program so I must be doing something wrong! It makes it hard
> to track down when it gives the right answer most of the time and segfaults
> randomly when nothing seems to have changed. I'm sure it's just incorrect
> arguments to dgelsy so I'll look into that...
> 
> -Dave
> 
> 

And for those following, the obvious error was in passing the double `worksize`
instead of the array `work` (of size `lwork`) in the 2nd call to DGELSY:

DGELSY(&m, &n, &nrhs, &A[0,0], &lda, &res[0], &ldb, &jpvt[0], &rcond, &rank, 
&worksize, &lwork, &info)

Sorry for the noise.

-Dave


From dave.hirschfeld at gmail.com  Thu Feb 28 13:11:07 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Thu, 28 Feb 2013 12:11:07 +0000 (UTC)
Subject: [Cython] MemoryView Casting slow compared to ndarray buffer syntax
Message-ID: <loom.20130228T130710-628@post.gmane.org>

%%cython
cimport cython

import numpy as np
cimport numpy as np

ctypedef np.float64_t float64_t

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def echo_numpy(np.ndarray[float64_t, ndim=1] x):
    return x

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def echo_memview(double[:] x):
    return np.asarray(x)

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def echo_memview_nocast(double[:] x):
    return x


In [19]: %timeit echo_memview(x)
    ...: %timeit echo_memview_nocast(x)
    ...: %timeit echo_numpy(x)
10000 loops, best of 3: 38.1 µs per loop
100000 loops, best of 3: 5.58 µs per loop
1000000 loops, best of 3: 749 ns per loop

In [20]: 38.1e-6/749e-9
Out[20]: 50.86782376502002

In [21]: 5.58e-6/749e-9
Out[21]: 7.449933244325767

So it seems that the MemoryView is 50x slower than using the ndarray buffer
syntax and even 7.5x slower without casting to an array.

Is there anything that can be done about this or is it just something to be
aware of and use each of them in the situations where they perform best?

Thanks,
Dave


From yury at shurup.com  Thu Feb 28 13:58:08 2013
From: yury at shurup.com (Yury V. Zaytsev)
Date: Thu, 28 Feb 2013 13:58:08 +0100
Subject: [Cython] Class methods returning C++ class references are not dealt
 with correctly?
Message-ID: <1362056288.2913.28.camel@newpride>

Hi,

I'm sorry if my question would appear to be trivial, but what am I
supposed to do, if I want to wrap class methods, that return a reference
to another class?

From reading the list, I've gathered that apparently the best strategy
of dealing with references is just not to use them (convert to
pointers immediately), because of some scoping rules issues.

It works for me for a simple case of POD types, like

    cdef extern from "test.h":
        int& foo()

    cdef int* x = &foo()

but in a more complex case, Cython generates incorrect C++ code (first
it declares a reference, then assigns to it, which, of course, doesn't
even compile):

    cdef extern from "token.h":
        cppclass Token:
            Token(const Datum&) except +

    cdef extern from "tokenstack.h":
        cppclass TokenStack:
            Token& top() except +

    cdef Token* tok = &self.pEngine.OStack.top()

<->

    Token *__pyx_v_tok;
    Token &__pyx_t_5;
    __pyx_t_5 = __pyx_v_self->pEngine->OStack.top();
    __pyx_v_tok = (&__pyx_t_5);

I would expect to see this instead:

    Token *__pyx_v_tok = &__pyx_v_self->pEngine->OStack.top();

Am I doing something wrong? Is there any other way to achieve what I
want, other than writing custom C macros?

Thanks,

-- 
Sincerely yours,
Yury V. Zaytsev


From sturla at molden.no  Thu Feb 28 14:29:59 2013
From: sturla at molden.no (Sturla Molden)
Date: Thu, 28 Feb 2013 14:29:59 +0100
Subject: [Cython] MemoryViews require writeable arrays?
In-Reply-To: <loom.20130227T195717-560@post.gmane.org>
References: <loom.20130227T195717-560@post.gmane.org>
Message-ID: <512F5BD7.9080906@molden.no>

On 27.02.2013 20:05, Dave Hirschfeld wrote:

> Is this a required restriction? Is there any workaround?


http://www.python.org/dev/peps/pep-3118/

What you should consider is the "readonly" field in "struct bufferinfo" 
or the access flag "PyBUF_WRITEABLE".

In short:

A PEP3118 buffer can be readonly, and then you shouldn't write to it! 
When you set the readonly flag, Cython cannot retrieve the buffer with 
PyBUF_WRITEABLE. Thus, Cython helps you not to shoot yourself in the 
foot. I don't think you can declare a read-only memoryview in Cython. 
(Well, not by any means I know of.)
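
The `readonly` flag itself can be observed from pure Python with the standard library. The following is only an illustration of the PEP 3118 flag using the built-in `memoryview` (with `toreadonly()`, which needs Python 3.8 or later); it says nothing about how Cython's typed memoryviews are implemented:

```python
from array import array

data = array("d", [1.0, 2.0, 3.0])

writable = memoryview(data)               # array exports a writable buffer
readonly = memoryview(data).toreadonly()  # same memory, exported read-only

print(writable.readonly, readonly.readonly)  # False True

writable[0] = 10.0      # allowed, writes through to 'data'
try:
    readonly[1] = 20.0  # rejected because the view is read-only
except TypeError as exc:
    print("write refused:", exc)
```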


Sturla


From sturla at molden.no  Thu Feb 28 15:34:31 2013
From: sturla at molden.no (Sturla Molden)
Date: Thu, 28 Feb 2013 15:34:31 +0100
Subject: [Cython] Class methods returning C++ class references are not
 dealt with correctly?
In-Reply-To: <1362056288.2913.28.camel@newpride>
References: <1362056288.2913.28.camel@newpride>
Message-ID: <512F6AF7.6040001@molden.no>

On 28.02.2013 13:58, Yury V. Zaytsev wrote:
> Hi,
>
> I'm sorry if my question would appear to be trivial, but what am I
> supposed to do, if I want to wrap class methods, that return a reference
> to another class?
>
> From reading the list, I've gathered that apparently the best strategy
> of dealing with references is just not to use them (convert to
> pointers immediately), because of some scoping rules issues.
>
> It works for me for a simple case of POD types, like
>
>      cdef extern from "test.h":
>          int& foo()
>
>      cdef int* x = &foo()
>
> but in a more complex case, Cython generates incorrect C++ code (first
> it declares a reference, then assigns to it, which, of course, doesn't
> even compile):
>
>      cdef extern from "token.h":
>          cppclass Token:
>              Token(const Datum&) except +
>
>      cdef extern from "tokenstack.h":
>          cppclass TokenStack:
>              Token& top() except +
>
>      cdef Token* tok = &self.pEngine.OStack.top()
>
> <->
>
>      Token *__pyx_v_tok;
>      Token &__pyx_t_5;
>      __pyx_t_5 = __pyx_v_self->pEngine->OStack.top();
>      __pyx_v_tok = (&__pyx_t_5);


This is clearly a bug in Cython. The generated code should be:

       Token *__pyx_v_tok;
       Token &__pyx_t_5 = __pyx_v_self->pEngine->OStack.top();
       __pyx_v_tok = (&__pyx_t_5);


One cannot let a C++ reference dangle:

       Token &__pyx_t_5;  // illegal C++


Sturla


From yury at shurup.com  Thu Feb 28 15:46:48 2013
From: yury at shurup.com (Yury V. Zaytsev)
Date: Thu, 28 Feb 2013 15:46:48 +0100
Subject: [Cython] Class methods returning C++ class references are not
 dealt with correctly?
In-Reply-To: <512F6AF7.6040001@molden.no>
References: <1362056288.2913.28.camel@newpride> <512F6AF7.6040001@molden.no>
Message-ID: <1362062808.2913.62.camel@newpride>

On Thu, 2013-02-28 at 15:34 +0100, Sturla Molden wrote:
> 
> This is clearly a bug in Cython. One cannot let a C++ reference
> dangle.

Hi Sturla,

Thanks for the confirmation! I had a closer look at it, and I think I
know why this happens.

My method call is actually wrapped in a try { ... } catch clause,
because I declared it as being able to throw exceptions, so the
reference can't be defined in this block, or it will not be accessible
to the outside world.

Apparently, Cython should rather do something like this instead:

    Token *__pyx_v_tok;
    Token *__pyx_t_5_p;

    try {
        Token &__pyx_t_5 = __pyx_v_self->pEngine->OStack.top();
        __pyx_t_5_p = (&__pyx_t_5);
    }
    ...

    __pyx_v_tok = __pyx_t_5_p;

I'm sorry, but I don't think that I can personally help fix this,
because even if I manage to come up with a patch to generate
declarations inside try blocks with my non-existent knowledge of Cython
internals, this is simply not going to work.

I believe that some convention should be established regarding
references handling, i.e. stating that Cython will generate correct code
to convert them to pointers if such and such syntax is used...

Hopefully, in the meantime, there is some other solution to the problem
that I have overlooked.

Z.

-- 
Sincerely yours,
Yury V. Zaytsev


From dave.hirschfeld at gmail.com  Thu Feb 28 15:55:15 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Thu, 28 Feb 2013 14:55:15 +0000 (UTC)
Subject: [Cython] MemoryViews require writeable arrays?
References: <loom.20130227T195717-560@post.gmane.org>
	<512F5BD7.9080906@molden.no>
Message-ID: <loom.20130228T153636-110@post.gmane.org>

Sturla Molden <sturla at ...> writes:

> 
> On 27.02.2013 20:05, Dave Hirschfeld wrote:
> 
> > Is this a required restriction? Is there any workaround?
> 
> http://www.python.org/dev/peps/pep-3118/
> 
> What you should consider is the "readonly" field in "struct bufferinfo" 
> or the access flag "PyBUF_WRITEABLE".
> 
> In short:
> 
> A PEP3118 buffer can be readonly, and then you shouldn't write to it! 
> When you set the readonly flag, Cython cannot retrieve the buffer with 
> PyBUF_WRITEABLE. Thus, Cython helps you not to shoot yourself in the 
> foot. I don't think you can declare a read-only memoryview in Cython. 
> (Well, not by any means I know of.)
> 
> Sturla
> 
> 

So the issue is that at present memoryviews can't be readonly? Presumably 
because this works for numpy arrays it would be possible to also make readonly 
memoryviews? I think that would certainly be nice to have, but maybe it's a
niche use case. 

Certainly, for IPython.parallel use it's easy enough to write a shim which sets 
the array to writeable with the understanding that changes don't get propagated 
back.
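
A shim of that kind might look roughly like the sketch below. It uses only the standard library so that it is self-contained; `with_writable_copy` and `scale_in_place` are hypothetical names, and it assumes a one-dimensional buffer whose format is a valid `array` typecode. With NumPy the same idea would amount to handing a copy of the input array to the routine:

```python
from array import array
from functools import wraps


def with_writable_copy(func):
    """Hypothetical shim: hand func a private writable copy of read-only input."""
    @wraps(func)
    def wrapper(buf, *args, **kwargs):
        view = memoryview(buf)
        if view.readonly:
            # Copy into a fresh array; writes no longer reach the caller's buffer.
            buf = array(view.format, view.tolist())
        return func(buf, *args, **kwargs)
    return wrapper


@with_writable_copy
def scale_in_place(buf, factor):
    # Stand-in for a routine that insists on a writable buffer.
    for i in range(len(buf)):
        buf[i] *= factor
    return buf


source = memoryview(array("d", [1.0, 2.0, 3.0])).toreadonly()
result = scale_in_place(source, 2.0)
print(list(result))     # [2.0, 4.0, 6.0]
print(source.tolist())  # [1.0, 2.0, 3.0], the read-only input is unchanged
```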

Thanks,
Dave


From sturla at molden.no  Thu Feb 28 15:58:36 2013
From: sturla at molden.no (Sturla Molden)
Date: Thu, 28 Feb 2013 15:58:36 +0100
Subject: [Cython] Class methods returning C++ class references are not
 dealt with correctly?
In-Reply-To: <1362062808.2913.62.camel@newpride>
References: <1362056288.2913.28.camel@newpride> <512F6AF7.6040001@molden.no>
	<1362062808.2913.62.camel@newpride>
Message-ID: <512F709C.1070405@molden.no>

On 28.02.2013 15:46, Yury V. Zaytsev wrote:

> My method call is actually wrapped in a try { ... } catch clause,
> because I declared it as being able to throw exceptions, so the
> reference can't be defined in this block, or it will not be accessible
> to the outside world.

If Cython generates illegal C++ code (i.e. C++ that doesn't compile), it is 
a bug in Cython.

There must be a general error in the handling of C++ references when 
they are declared without a target.


Sturla


From sturla at molden.no  Thu Feb 28 16:41:27 2013
From: sturla at molden.no (Sturla Molden)
Date: Thu, 28 Feb 2013 16:41:27 +0100
Subject: [Cython] MemoryViews require writeable arrays?
In-Reply-To: <loom.20130228T153636-110@post.gmane.org>
References: <loom.20130227T195717-560@post.gmane.org>
	<512F5BD7.9080906@molden.no>
	<loom.20130228T153636-110@post.gmane.org>
Message-ID: <512F7AA7.9060604@molden.no>

On 28.02.2013 15:55, Dave Hirschfeld wrote:

> So the issue is that at present memoryviews can't be readonly?

https://github.com/cython/cython/blob/master/Cython/Compiler/MemoryView.py#L33

Typed memoryviews are thus acquired with the PyBUF_WRITEABLE flag. If 
the assigned buffer is readonly, the request to acquire the PEP3118 
buffer will fail.

If you remove the PyBUF_WRITEABLE flag from lines 33 to 36, you can 
acquire a readonly buffer with typed memoryviews. But this is not 
recommended. In this case you would have to check for the readonly flag 
yourself and make sure you don't write to a readonly buffer.
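
The readonly flag itself can be inspected from plain Python, without
Cython. The following is only a sketch using the built-in memoryview
(not Cython's typed memoryviews, and not the acquisition code linked
above); the variable names are made up for illustration:

```python
# Pure-Python illustration of the PEP 3118 "readonly" flag.
# bytes exports a read-only buffer; bytearray exports a writable one.
ro = memoryview(b"abc")
rw = memoryview(bytearray(b"abc"))

ro_flag = ro.readonly   # the exporter refuses writes
rw_flag = rw.readonly   # the exporter allows writes

rw[0] = ord("x")        # fine: the underlying bytearray is writable

write_error = None
try:
    ro[0] = ord("x")    # the built-in memoryview rejects this
except TypeError as exc:
    # A consumer that acquires the buffer without PyBUF_WRITEABLE
    # would have to perform this kind of check itself.
    write_error = exc
```

The built-in memoryview does the check for you; a typed memoryview with
the flag removed would not, which is why removing it is not recommended.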


Sturla


From sebastian at sipsolutions.net  Thu Feb 28 16:13:17 2013
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 28 Feb 2013 16:13:17 +0100
Subject: [Cython] Be more forgiving about memoryview strides
Message-ID: <1362064397.2663.14.camel@sebastian-laptop>

Hey,

Maybe someone here already saw it (I don't have a Trac account, or I
would just create a ticket), but it would be nice if Cython was more
forgiving about contiguous requirements on strides. In the future this
would make it easier for numpy to go forward with changing the
contiguous flags to be more reasonable for its purpose, and second also
to allow old (and maybe for the moment remaining) corner cases in numpy
to slip past (as well as possibly the same for other programs...). An
example is (see also https://github.com/numpy/numpy/issues/2956 and the
PR linked there for more details):

def add_one(array):
    cdef double[::1] a = array
    a[0] += 1.
    return array

giving:

>>> add_one(np.ascontiguousarray(np.arange(10.)[::100]))
ValueError: Buffer and memoryview are not contiguous in the same
dimension.

This could easily be changed if MemoryViews check the strides as "can be
interpreted as contiguous". That means that if shape[i] == 1, then
strides[i] are arbitrary (you can just change them if you like). This is
also the case for 0-sized arrays, which are arguably always contiguous,
no matter what their strides are!

Regards,

Sebastian

PS: A similar thing exists with the np.ndarray[...] interface if the user
accesses array.strides. They get the array's strides, not the buffer's.
This is not quite related, but if it would be easy to use the buffer's
strides in that case, it may make it easier if we want to change the
flags in numpy in the long term, since one could clean up strides for
forced contiguous buffer requests.


From brad.froehle at gmail.com  Thu Feb 28 17:01:52 2013
From: brad.froehle at gmail.com (Bradley M. Froehle)
Date: Thu, 28 Feb 2013 08:01:52 -0800
Subject: [Cython] Class methods returning C++ class references are not
 dealt with correctly?
In-Reply-To: <1362056288.2913.28.camel@newpride>
References: <1362056288.2913.28.camel@newpride>
Message-ID: <CAHXv-MhsvXO=GiCPrH6Gc3jP+EzhRO0625Txit26OXSLVoY30g@mail.gmail.com>

On Thu, Feb 28, 2013 at 4:58 AM, Yury V. Zaytsev <yury at shurup.com> wrote:

> Hi,
>
> I'm sorry if my question would appear to be trivial, but what am I
> supposed to do, if I want to wrap class methods, that return a reference
> to another class?


As a workaround you could use:

cdef extern from "test.h":
    int* foo2ptr "&foo" ()

cdef int *x = foo2ptr()

This could be extended to your other example as well.

-Brad

From robertwb at gmail.com  Thu Feb 28 18:50:34 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 28 Feb 2013 09:50:34 -0800
Subject: [Cython] Be more forgiving about memoryview strides
In-Reply-To: <1362064397.2663.14.camel@sebastian-laptop>
References: <1362064397.2663.14.camel@sebastian-laptop>
Message-ID: <CADiQ+QA0vF1YLtDih5iD7gn4MGhPnnnX845mUx3+2cwhyxK9Sg@mail.gmail.com>

On Thu, Feb 28, 2013 at 7:13 AM, Sebastian Berg
<sebastian at sipsolutions.net> wrote:
> Hey,
>
> Maybe someone here already saw it (I don't have a Trac account, or I
> would just create a ticket), but it would be nice if Cython was more
> forgiving about contiguous requirements on strides. In the future this
> would make it easier for numpy to go forward with changing the
> contiguous flags to be more reasonable for its purpose, and second also
> to allow old (and maybe for the moment remaining) corner cases in numpy
> to slip past (as well as possibly the same for other programs...). An
> example is (see also https://github.com/numpy/numpy/issues/2956 and the
> PR linked there for more details):
>
> def add_one(array):
>     cdef double[::1] a = array
>     a[0] += 1.
>     return array
>
> giving:
>
>>>> add_one(np.ascontiguousarray(np.arange(10.)[::100]))
> ValueError: Buffer and memoryview are not contiguous in the same
> dimension.
>
> This could easily be changed if MemoryViews check the strides as "can be
> interpreted as contiguous". That means that if shape[i] == 1, then
> strides[i] are arbitrary (you can just change them if you like). This is
> also the case for 0-sized arrays, which are arguably always contiguous,
> no matter what their strides are!

I was under the impression that the primary value of contiguous is
that a foo[::1] can be interpreted as a foo*. Letting strides be
arbitrary completely breaks this, right?

> PS: A similar thing exists with the np.ndarray[...] interface if the user
> accesses array.strides. They get the array's strides, not the buffer's.
> This is not quite related, but if it would be easy to use the buffer's
> strides in that case, it may make it easier if we want to change the
> flags in numpy in the long term, since one could clean up strides for
> forced contiguous buffer requests.

From yury at shurup.com  Thu Feb 28 19:38:29 2013
From: yury at shurup.com (Yury V. Zaytsev)
Date: Thu, 28 Feb 2013 19:38:29 +0100
Subject: [Cython] Class methods returning C++ class references are not
 dealt with correctly?
In-Reply-To: <CAHXv-MhsvXO=GiCPrH6Gc3jP+EzhRO0625Txit26OXSLVoY30g@mail.gmail.com>
References: <1362056288.2913.28.camel@newpride>
	<CAHXv-MhsvXO=GiCPrH6Gc3jP+EzhRO0625Txit26OXSLVoY30g@mail.gmail.com>
Message-ID: <1362076709.2913.102.camel@newpride>

Hi Brad,

On Thu, 2013-02-28 at 08:01 -0800, Bradley M. Froehle wrote:
> 
> cdef extern from "test.h":
>     int* foo2ptr "&foo" ()
> 
> cdef int *x = foo2ptr()

Thank you for this interesting suggestion, but I must be missing
something, because when I do the following:

    cdef extern from "tokenstack.h":
        cppclass TokenStack:
            Token* top "Token&" () except +

    cdef Token* tok = self.pEngine.OStack.top()

I end up with the following generated code, which, of course, doesn't
compile:

    Token *__pyx_t_5;
    __pyx_t_5 = __pyx_v_self->pEngine->OStack.Token&();

whereas, I'd like to see generated this:

    Token *__pyx_t_5;
    __pyx_t_5 = __pyx_v_self->pEngine->OStack->top();

Any ideas?

-- 
Sincerely yours,
Yury V. Zaytsev


From szport at gmail.com  Thu Feb 28 19:31:28 2013
From: szport at gmail.com (ZS)
Date: Thu, 28 Feb 2013 21:31:28 +0300
Subject: [Cython] About IndexNode and unicode[index]
In-Reply-To: <CAPOE21Ti03-WWad2jEbHMr5029NqrP6zrj-xrZaiigMq8UT+uw@mail.gmail.com>
References: <CAPOE21Ti03-WWad2jEbHMr5029NqrP6zrj-xrZaiigMq8UT+uw@mail.gmail.com>
Message-ID: <CAPOE21Sa7fvs2JjpO+CORNnWFM=Fiqb3M07nUNdwxaL9wD=dCg@mail.gmail.com>

2013/2/28 ZS <szport at gmail.com>:
> Looking into the IndexNode class in ExprNodes.py, I have seen a possibility
> of adding a faster code path for unicode[index], as is done in the
> method `generate_setitem_code` in the case of lists.
>
> These are the files for evaluating the performance difference:
>
> #### unicode_index.h
>
> /* This is a stripped version of __Pyx_GetItemInt_Unicode_Fast */
> #include "unicodeobject.h"
>
> static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i);
>
> static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i) {
> #if CYTHON_PEP393_ENABLED
>     if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
> #endif
>     return __Pyx_PyUnicode_READ_CHAR(ustring, i);
> }
>
> ##### unicode_index.pyx
>
> # coding: utf-8
>
> cdef extern from 'unicode_index.h':
>     inline Py_UCS4 unicode_char(unicode ustring, int i)
>
> cdef unicode text = u"abcdefghigklmnopqrstuvwxyzabcdefghigklmnopqrstuvwxyz"
>
> def f_1(unicode text):
>     cdef int i, j
>     cdef int n = len(text)
>     cdef Py_UCS4 ch
>
>     for j from 0<=j<=1000000:
>         for i from 0<=i<=n-1:
>             ch = text[i]
>
> def f_2(unicode text):
>     cdef int i, j
>     cdef int n = len(text)
>     cdef Py_UCS4 ch
>
>     for j from 0<=j<=1000000:
>         for i from 0<=i<=n-1:
>             ch = unicode_char(text, i)
>
> def test_1():
>     f_1(text)
>
> def test_2():
>     f_2(text)
>
> Timing results:
>
> (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from
> mytests.unicode_index import test_1" "test_1()"
> 100 loops, best of 10: 89 msec per loop
> (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from
> mytests.unicode_index import test_2" "test_2()"
> 100 loops, best of 10: 46.1 msec per loop
>
> in setup.py globally:
>
>        "boundscheck": False
>        "wraparound": False
>        "nonecheck": False
>
For the sake of clarity I would like to add the following... This
optimization is for the case when both `boundscheck(False)` and
`wraparound(False)` are applied. Otherwise the default path of evaluation
(__Pyx_GetItemInt_Unicode) is used.

This makes it possible to write unicode text parsing code almost at C speed,
mostly in python (+ .pxd definitions).

 Zaur Shibzukhov

From brad.froehle at gmail.com  Thu Feb 28 20:00:18 2013
From: brad.froehle at gmail.com (Bradley M. Froehle)
Date: Thu, 28 Feb 2013 11:00:18 -0800
Subject: [Cython] Class methods returning C++ class references are not
 dealt with correctly?
In-Reply-To: <1362076709.2913.102.camel@newpride>
References: <1362056288.2913.28.camel@newpride>
	<CAHXv-MhsvXO=GiCPrH6Gc3jP+EzhRO0625Txit26OXSLVoY30g@mail.gmail.com>
	<1362076709.2913.102.camel@newpride>
Message-ID: <CAHXv-MgYRMhV8y3pos-k4uGvMtLHhkmuRT6K_MdCyqjawSiZig@mail.gmail.com>

Hey Yury:

Yes, you are right.  I was thinking this was a function and not a method.
 As an even ickier workaround:

#define TokenStack_top_p(token_stack)  (&(token_stack)->top())

cdef extern from "............":
    Token* TokenStack_top_p(TokenStack*) except +

cdef Token* tok = TokenStack_top_p(self.pEngine.OStack)

-Brad


On Thu, Feb 28, 2013 at 10:38 AM, Yury V. Zaytsev <yury at shurup.com> wrote:

> Hi Brad,
>
> On Thu, 2013-02-28 at 08:01 -0800, Bradley M. Froehle wrote:
> >
> > cdef extern from "test.h":
> >     int* foo2ptr "&foo" ()
> >
> > cdef int *x = foo2ptr()
>
> Thank you for this interesting suggestion, but I must be missing
> something, because when I do the following:
>
>     cdef extern from "tokenstack.h":
>         cppclass TokenStack:
>             Token* top "Token&" () except +
>
>     cdef Token* tok = self.pEngine.OStack.top()
>
> I end up with the following generated code, which, of course, doesn't
> compile:
>
>     Token *__pyx_t_5;
>     __pyx_t_5 = __pyx_v_self->pEngine->OStack.Token&();
>
> whereas, I'd like to see generated this:
>
>     Token *__pyx_t_5;
>     __pyx_t_5 = __pyx_v_self->pEngine->OStack->top();
>
> Any ideas?
>
> --
> Sincerely yours,
> Yury V. Zaytsev

From njs at pobox.com  Thu Feb 28 20:12:09 2013
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 28 Feb 2013 19:12:09 +0000
Subject: [Cython] Be more forgiving about memoryview strides
In-Reply-To: <CADiQ+QA0vF1YLtDih5iD7gn4MGhPnnnX845mUx3+2cwhyxK9Sg@mail.gmail.com>
References: <1362064397.2663.14.camel@sebastian-laptop>
	<CADiQ+QA0vF1YLtDih5iD7gn4MGhPnnnX845mUx3+2cwhyxK9Sg@mail.gmail.com>
Message-ID: <CAPJVwBnzWzRjv+tYtrv7D4rpN-Zhoj9X+WXp5H7N28p1i3-vLg@mail.gmail.com>

On Thu, Feb 28, 2013 at 5:50 PM, Robert Bradshaw <robertwb at gmail.com> wrote:
> On Thu, Feb 28, 2013 at 7:13 AM, Sebastian Berg
> <sebastian at sipsolutions.net> wrote:
>> Hey,
>>
>> Maybe someone here already saw it (I don't have a Trac account, or I
>> would just create a ticket), but it would be nice if Cython was more
>> forgiving about contiguous requirements on strides. In the future this
>> would make it easier for numpy to go forward with changing the
>> contiguous flags to be more reasonable for its purpose, and second also
>> to allow old (and maybe for the moment remaining) corner cases in numpy
>> to slip past (as well as possibly the same for other programs...). An
>> example is (see also https://github.com/numpy/numpy/issues/2956 and the
>> PR linked there for more details):
>>
>> def add_one(array):
>>     cdef double[::1] a = array
>>     a[0] += 1.
>>     return array
>>
>> giving:
>>
>>>>> add_one(np.ascontiguousarray(np.arange(10.)[::100]))
>> ValueError: Buffer and memoryview are not contiguous in the same
>> dimension.
>>
>> This could easily be changed if MemoryViews check the strides as "can be
>> interpreted as contiguous". That means that if shape[i] == 1, then
>> strides[i] are arbitrary (you can just change them if you like). This is
>> also the case for 0-sized arrays, which are arguably always contiguous,
>> no matter what their strides are!
>
> I was under the impression that the primary value of contiguous is
> that a foo[::1] can be interpreted as a foo*. Letting strides be
> arbitrary completely breaks this, right?

Nope. The natural definition of "C contiguous" is "the array entries
are arranged in memory in the same way they would be if they were a
multidimensional C array" (i.e., what you said.) But it turns out that
this is *not* the definition that numpy and cython use!

The issue is that the above definition is a constraint on the actual
locations of items in memory, i.e., given a shape, it tells you that
for every index,
 (a)  sum(index * strides) == sum(index * cumprod(shape[::-1])[::-1] * itemsize)
Obviously this equality holds if
 (b)  strides == cumprod(shape[::-1])[::-1] * itemsize
(Or for F-contiguity, we have
 (b')  strides == cumprod(shape) * itemsize
)

(a) is the natural definition of "C contiguous". (b) is the definition
of "C contiguous" used by numpy and cython. (b) implies (a). But (a)
does not imply (b), i.e., there are arrays that are C-contiguous which
numpy and cython think are discontiguous. (Also in numpy there are
some weird cases where numpy accidentally uses the correct definition,
I think, which is the point of Sebastian's example.)

In particular, if shape[i] == 1, then the value of strides[i] really
should be irrelevant to judging contiguity, because the only thing you
can do with strides[i] is multiply it by index[i], and if shape[i] ==
1 then index[i] is always 0. So an array of int8's with shape = (10,
1), strides = (1, 73) is contiguous according to (a), but not
according to (b). Also if shape[i] is 0 for any i, then the entire
contents of the strides array becomes irrelevant to judging
contiguity; all zero-sized arrays are contiguous according to (a), but
not (b).
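
A minimal pure-Python sketch of the two definitions may make the
difference concrete. The helper names (c_strides, contiguous_a,
contiguous_b) are invented for illustration and are not numpy or cython
APIs; "canonical" strides here means the usual row-major ones, i.e. the
product of the trailing dimensions times itemsize:

```python
from itertools import product

def c_strides(shape, itemsize):
    """Canonical row-major strides: product of trailing dims times itemsize."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

def contiguous_b(shape, strides, itemsize):
    """Definition (b): the strides must equal the canonical ones exactly."""
    return tuple(strides) == c_strides(shape, itemsize)

def contiguous_a(shape, strides, itemsize):
    """Definition (a): every valid index lands on the canonical byte offset."""
    canon = c_strides(shape, itemsize)
    for index in product(*(range(n) for n in shape)):
        actual = sum(i * s for i, s in zip(index, strides))
        expected = sum(i * s for i, s in zip(index, canon))
        if actual != expected:
            return False
    return True  # also reached when the array has no elements at all

# int8 column vector with a junk stride on the length-1 axis
col = ((10, 1), (1, 73), 1)
# zero-sized array of 8-byte items with arbitrary strides
empty = ((0, 5), (999, 7), 8)
# shape (1,) with stride 800, as in Sebastian's arange(10.)[::100] example
single = ((1,), (800,), 8)

results = {name: (contiguous_a(*arr), contiguous_b(*arr))
           for name, arr in [("col", col), ("empty", empty), ("single", single)]}
```

All three arrays satisfy (a) and fail (b). The brute-force loop in
contiguous_a is only meant to mirror the definition; a real check would
simply skip axes with shape[i] == 1 and accept any zero-sized array.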

(This is really annoying for numpy because given, say, a column vector
with shape (n, 1), it is impossible to be both C- and F-contiguous
according to the (b)-style definition. But people expect
various operations to preserve C versus F contiguity, so there are
heuristics in numpy that try to guess whether various result arrays
should pretend to be C- or F-contiguous, and we don't even have a
consistent idea of what it would mean for this code to be working
correctly, never mind test it and keep it working. OTOH if we just fix
numpy to use the (a) definition, then it turns out a bunch of
third-party code breaks, like, for example, cython.)

-n

From stefan_ml at behnel.de  Thu Feb 28 20:27:08 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 28 Feb 2013 20:27:08 +0100
Subject: [Cython] About IndexNode and unicode[index]
In-Reply-To: <CAPOE21Sa7fvs2JjpO+CORNnWFM=Fiqb3M07nUNdwxaL9wD=dCg@mail.gmail.com>
References: <CAPOE21Ti03-WWad2jEbHMr5029NqrP6zrj-xrZaiigMq8UT+uw@mail.gmail.com>
	<CAPOE21Sa7fvs2JjpO+CORNnWFM=Fiqb3M07nUNdwxaL9wD=dCg@mail.gmail.com>
Message-ID: <512FAF8C.7020008@behnel.de>

ZS, 28.02.2013 19:31:
> 2013/2/28 ZS:
>> Looking into the IndexNode class in ExprNodes.py, I have seen a possibility
>> of adding a faster code path for unicode[index], as is done in the
>> method `generate_setitem_code` in the case of lists.
>>
>> These are the files for evaluating the performance difference:
>>
>> #### unicode_index.h
>>
>> /* This is a stripped version of __Pyx_GetItemInt_Unicode_Fast */
>> #include "unicodeobject.h"
>>
>> static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i);
>>
>> static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i) {
>> #if CYTHON_PEP393_ENABLED
>>     if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
>> #endif
>>     return __Pyx_PyUnicode_READ_CHAR(ustring, i);
>> }

Sure, looks ok.


>> ##### unicode_index.pyx
>>
>> # coding: utf-8
>>
>> cdef extern from 'unicode_index.h':
>>     inline Py_UCS4 unicode_char(unicode ustring, int i)
>>
>> cdef unicode text = u"abcdefghigklmnopqrstuvwxyzabcdefghigklmnopqrstuvwxyz"
>>
>> def f_1(unicode text):
>>     cdef int i, j
>>     cdef int n = len(text)
>>     cdef Py_UCS4 ch
>>
>>     for j from 0<=j<=1000000:

Personally, I find a range() loop much easier to read than this beast.


>>         for i from 0<=i<=n-1:
>>             ch = text[i]
>>
>> def f_2(unicode text):
>>     cdef int i, j
>>     cdef int n = len(text)
>>     cdef Py_UCS4 ch
>>
>>     for j from 0<=j<=1000000:
>>         for i from 0<=i<=n-1:
>>             ch = unicode_char(text, i)
>>
>> def test_1():
>>     f_1(text)
>>
>> def test_2():
>>     f_2(text)
>>
>> Timing results:
>>
>> (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from
>> mytests.unicode_index import test_1" "test_1()"
>> 100 loops, best of 10: 89 msec per loop
>> (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from
>> mytests.unicode_index import test_2" "test_2()"
>> 100 loops, best of 10: 46.1 msec per loop

I seriously doubt that this translates to similar results in real-world
code. In the second example above, the C compiler should be able to remove
a lot of code, certainly including the useless character read. Maybe even
the loops, if it can determine that PyUnicode_READY() will always return
the same result. So you're almost certainly not benchmarking what you think
you are.


>> in setup.py globally:
>>
>>        "boundscheck": False
>>        "wraparound": False
>>        "nonecheck": False
>>
> For the sake of clarity I would like to add the following... This
> optimization is for the case when both `boundscheck(False)` and
> `wraparound(False)` are applied. Otherwise the default path of evaluation
> (__Pyx_GetItemInt_Unicode) is used.
> 
> This makes it possible to write unicode text parsing code almost at C speed,
> mostly in python (+ .pxd definitions).

I suggest simply adding a constant flag argument to the existing function
that states if checking should be done or not. Inlining will let the C
compiler drop the corresponding code, which may or may not make it a little
faster.

Stefan


From szport at gmail.com  Thu Feb 28 21:07:03 2013
From: szport at gmail.com (ZS)
Date: Thu, 28 Feb 2013 23:07:03 +0300
Subject: [Cython] About IndexNode and unicode[index]
In-Reply-To: <512FAF8C.7020008@behnel.de>
References: <CAPOE21Ti03-WWad2jEbHMr5029NqrP6zrj-xrZaiigMq8UT+uw@mail.gmail.com>
	<CAPOE21Sa7fvs2JjpO+CORNnWFM=Fiqb3M07nUNdwxaL9wD=dCg@mail.gmail.com>
	<512FAF8C.7020008@behnel.de>
Message-ID: <CAPOE21SMRYFYnR8AmHyrrgFabHX0RG0T=jn-29QZix0kfBxGOw@mail.gmail.com>

2013/2/28 Stefan Behnel <stefan_ml at behnel.de>:
>> This makes it possible to write unicode text parsing code almost at C speed,
>> mostly in python (+ .pxd definitions).
>
> I suggest simply adding a constant flag argument to the existing function
> that states if checking should be done or not. Inlining will let the C
> compiler drop the corresponding code, which may or may not make it a little
> faster.
It would be great.

To be sure, I changed the tests:

unicode_index.h
-----------------------

#include "unicodeobject.h"

static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i);

static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i) {
#if CYTHON_PEP393_ENABLED
    if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
#endif
    return __Pyx_PyUnicode_READ_CHAR(ustring, i);
}

static inline Py_UCS4 unicode_char2(PyObject* ustring, Py_ssize_t i, int flag);

static inline Py_UCS4 unicode_char2(PyObject* ustring, Py_ssize_t i, int flag) {
    Py_ssize_t length;
#if CYTHON_PEP393_ENABLED
    if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
#endif
    if (flag) {
        length = __Pyx_PyUnicode_GET_LENGTH(ustring);
        if ((0 <= i) & (i < length)) {
            return __Pyx_PyUnicode_READ_CHAR(ustring, i);
        } else if ((-length <= i) & (i < 0)) {
            return __Pyx_PyUnicode_READ_CHAR(ustring, i + length);
        } else {
            PyErr_SetString(PyExc_IndexError, "string index out of range");
            return (Py_UCS4)-1;
        }
    } else {
        return __Pyx_PyUnicode_READ_CHAR(ustring, i);
    }
}

unicode_index.pyx
--------------------------

cdef extern from 'unicode_index.h':
    inline Py_UCS4 unicode_char(unicode ustring, int i)
    inline Py_UCS4 unicode_char2(unicode ustring, int i, int flag)

cdef unicode text = u"abcdefghigklmnopqrstuvwxyzabcdefghigklmnopqrstuvwxyz"

cdef long f_1(unicode text):
    cdef int i, j
    cdef int n = len(text)
    cdef Py_UCS4 ch
    cdef long S = 0

    for j in range(1000000):
        for i in range(n):
            ch = text[i]
            S += <int>ch * j

    return S

cdef long f_2(unicode text):
    cdef int i, j
    cdef int n = len(text)
    cdef Py_UCS4 ch
    cdef long S = 0

    for j in range(1000000):
        for i in range(n):
            ch = unicode_char(text, i)
            S += <int>ch * j

    return S

cdef long f_3(unicode text):
    cdef int i, j
    cdef int n = len(text)
    cdef Py_UCS4 ch
    cdef long S = 0

    for j in range(1000000):
        for i in range(n):
            ch = unicode_char2(text, i, 0)
            S += <int>ch * j

    return S

def test_1():
    f_1(text)

def test_2():
    f_2(text)

def test_3():
    f_3(text)

Here are timings:

(py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from
mytests.unicode_index import test_1" "test_1()"
50 loops, best of 5: 152 msec per loop
(py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from
mytests.unicode_index import test_2" "test_2()"
50 loops, best of 5: 86.5 msec per loop
(py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from
mytests.unicode_index import test_3" "test_3()"
50 loops, best of 5: 86.5 msec per loop

So your suggestion would be preferable.

From stefan_ml at behnel.de  Thu Feb 28 22:16:09 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 28 Feb 2013 22:16:09 +0100
Subject: [Cython] About IndexNode and unicode[index]
In-Reply-To: <CAPOE21SMRYFYnR8AmHyrrgFabHX0RG0T=jn-29QZix0kfBxGOw@mail.gmail.com>
References: <CAPOE21Ti03-WWad2jEbHMr5029NqrP6zrj-xrZaiigMq8UT+uw@mail.gmail.com>
	<CAPOE21Sa7fvs2JjpO+CORNnWFM=Fiqb3M07nUNdwxaL9wD=dCg@mail.gmail.com>
	<512FAF8C.7020008@behnel.de>
	<CAPOE21SMRYFYnR8AmHyrrgFabHX0RG0T=jn-29QZix0kfBxGOw@mail.gmail.com>
Message-ID: <512FC919.4010702@behnel.de>

ZS, 28.02.2013 21:07:
> 2013/2/28 Stefan Behnel:
>>> This makes it possible to write unicode text parsing code almost at C speed,
>>> mostly in python (+ .pxd definitions).
>>
>> I suggest simply adding a constant flag argument to the existing function
>> that states if checking should be done or not. Inlining will let the C
>> compiler drop the corresponding code, which may or may not make it a little
>> faster.
> 
> static inline Py_UCS4 unicode_char2(PyObject* ustring, Py_ssize_t i, int flag) {
>     Py_ssize_t length;
> #if CYTHON_PEP393_ENABLED
>     if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
> #endif
>     if (flag) {
>         length = __Pyx_PyUnicode_GET_LENGTH(ustring);
>         if ((0 <= i) & (i < length)) {
>             return __Pyx_PyUnicode_READ_CHAR(ustring, i);
>         } else if ((-length <= i) & (i < 0)) {
>             return __Pyx_PyUnicode_READ_CHAR(ustring, i + length);
>         } else {
>             PyErr_SetString(PyExc_IndexError, "string index out of range");
>             return (Py_UCS4)-1;
>         }
>     } else {
>         return __Pyx_PyUnicode_READ_CHAR(ustring, i);
>     }
> }

I think you could even pass in two flags, one for wraparound and one for
boundscheck, and then just evaluate them appropriately in the existing "if"
tests above. That should allow both features to be supported independently
in a fast way.
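
As a rough model of that control flow (written in Python rather than as
the C utility code, with a hypothetical function name), the two flags
could be evaluated independently like this:

```python
def resolve_index(i, length, wraparound, boundscheck):
    """Model of the index handling; returns the index that would be read.

    wraparound:  map negative indices to the end of the string.
    boundscheck: raise IndexError for indices outside [0, length).
    In the C version both flags would be compile-time constants, so the
    compiler can drop whichever branch is disabled.
    """
    if wraparound and i < 0:
        i += length
    if boundscheck and not (0 <= i < length):
        raise IndexError("string index out of range")
    return i
```

With both flags off the index is passed through untouched, which
corresponds to the unchecked fast path measured as test_3.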


> Here are timings:
> 
> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from
> mytests.unicode_index import test_1" "test_1()"
> 50 loops, best of 5: 152 msec per loop
> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from
> mytests.unicode_index import test_2" "test_2()"
> 50 loops, best of 5: 86.5 msec per loop
> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from
> mytests.unicode_index import test_3" "test_3()"
> 50 loops, best of 5: 86.5 msec per loop
> 
> So your suggestion would be preferable.

Nice. Yes, looks like it's worth it.

Stefan