From greg.ewing at canterbury.ac.nz Fri Feb 1 01:11:42 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 01 Feb 2013 13:11:42 +1300
Subject: [Cython] [cython-users] Recommendations for efficient typed arrays in Cython?
In-Reply-To: <510AD4B5.9000600@molden.no>
References: <5107E786.7040202@molden.no> <4cf979ef-8684-4438-912e-4680afb01808@googlegroups.com> <510AD4B5.9000600@molden.no>
Message-ID: <510B083E.5030203@canterbury.ac.nz>

On 01/02/13 09:31, Sturla Molden wrote:

> cdef object a
> cdef list b
> cdef foobar c
>
> etc to define Python variables. 'cdef' seems to indicate that it is a C
> declaration, yet here it is not.

Yes, it is. In this context, the cdef isn't about the type of the variable, it's about where and how it's stored and accessed. The above declarations result in the generation of C code something like:

    PyObject *a;
    PyListObject *b;
    Foobar *c;

They are then accessed directly by the generated C code. Without the cdef, these variables would be stored wherever Python normally stores variables for the relevant scope, which could be in a module or instance dict, and the usual Python/C API machinery is used to access them.

Distinguishing between Python and C types would be problematic anyway, since a PyObject* is both a Python type *and* a C type.

> Neither does this cdef syntax allow us to declare Python int and float
> statically.

I've never found the need to declare a Python int or float statically, but a way could be provided to access these types if need be. Maybe Cython has already done this, I don't know.

--
Greg

From sturla at molden.no Fri Feb 1 17:11:44 2013
From: sturla at molden.no (Sturla Molden)
Date: Fri, 01 Feb 2013 17:11:44 +0100
Subject: [Cython] [cython-users] Recommendations for efficient typed arrays in Cython?
In-Reply-To: <510B083E.5030203@canterbury.ac.nz>
References: <5107E786.7040202@molden.no> <4cf979ef-8684-4438-912e-4680afb01808@googlegroups.com> <510AD4B5.9000600@molden.no> <510B083E.5030203@canterbury.ac.nz>
Message-ID: <510BE940.3030203@molden.no>

On 01.02.2013 01:11, Greg Ewing wrote:

> Without the cdef, these variables would be stored wherever Python
> normally stores variables for the relevant scope, which could be
> in a module or instance dict, and the usual Python/C API machinery
> is used to access them.

> Distinguishing between Python and C types would be problematic
> anyway, since a PyObject* is both a Python type *and* a C type.

Really?

The way I see it, "object" is a Python type and "PyObject*" is a C type. That is, PyObject* is just a raw C pointer with respect to behavior.

Sturla

From greg.ewing at canterbury.ac.nz Sat Feb 2 01:23:29 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 02 Feb 2013 13:23:29 +1300
Subject: [Cython] [cython-users] Recommendations for efficient typed arrays in Cython?
In-Reply-To: <510BE940.3030203@molden.no>
References: <5107E786.7040202@molden.no> <4cf979ef-8684-4438-912e-4680afb01808@googlegroups.com> <510AD4B5.9000600@molden.no> <510B083E.5030203@canterbury.ac.nz> <510BE940.3030203@molden.no>
Message-ID: <510C5C81.4060008@canterbury.ac.nz>

Sturla Molden wrote:

> The way I see it, "object" is a Python type and "PyObject*" is a C type.
> That is, PyObject* is just a raw C pointer with respect to behavior.

Well... while it's possible to declare something as PyObject* in Cython and get raw pointer behaviour, it's something you should only do in very rare circumstances, because you're then totally on your own when it comes to reference counting and exception handling.

If you're suggesting that 'def object foo' should give Python reference semantics and 'cdef object foo' raw C pointer semantics, that would lead to a world of pain.
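To make concrete the reference-counting bookkeeping that Greg says you take on yourself with a raw `PyObject*`: when you use ordinary Python references, the runtime keeps the counts up to date for you. This is a minimal, CPython-specific sketch using `sys.getrefcount`. Absolute counts differ between interpreters and versions, so only the differences carry meaning:

```python
import sys

payload = object()
baseline = sys.getrefcount(payload)   # includes the temporary argument reference

holder = [payload]                    # storing a reference increments the count
stored = sys.getrefcount(payload)

del holder                            # freeing the list decrements it again
released = sys.getrefcount(payload)

print(stored - baseline, released - baseline)  # 1 0 on CPython
```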
--
Greg

From sturla at molden.no Mon Feb 4 13:12:56 2013
From: sturla at molden.no (Sturla Molden)
Date: Mon, 04 Feb 2013 13:12:56 +0100
Subject: [Cython] [cython-users] Recommendations for efficient typed arrays in Cython?
In-Reply-To: <510C5C81.4060008@canterbury.ac.nz>
References: <5107E786.7040202@molden.no> <4cf979ef-8684-4438-912e-4680afb01808@googlegroups.com> <510AD4B5.9000600@molden.no> <510B083E.5030203@canterbury.ac.nz> <510BE940.3030203@molden.no> <510C5C81.4060008@canterbury.ac.nz>
Message-ID: <510FA5C8.2020602@molden.no>

On 02.02.2013 01:23, Greg Ewing wrote:

> If you're suggesting that 'def object foo' should give Python
> reference semantics and 'cdef object foo' raw C pointer
> semantics,

No I was not.

I was suggesting that static declarations of Python and C variables should have different keywords.

Because they behave differently e.g. with respect to reference counting, it can be confusing to new users. For example I was replying to a Cython user who thought anything declared 'cdef' was reference counted. It might not be obvious to a new Cython user what can be put in a Python list and what can be put in an STL vector.

"cdef" refers to storage in the generated C, not to the semantics of Cython. But how and where variables are stored in the generated C is an implementation detail. Semantically the difference is between static and dynamic variables.

Sturla

From robertwb at gmail.com Mon Feb 4 23:12:28 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Mon, 4 Feb 2013 14:12:28 -0800
Subject: [Cython] [cython-users] Recommendations for efficient typed arrays in Cython?
In-Reply-To: <510FA5C8.2020602@molden.no>
References: <5107E786.7040202@molden.no> <4cf979ef-8684-4438-912e-4680afb01808@googlegroups.com> <510AD4B5.9000600@molden.no> <510B083E.5030203@canterbury.ac.nz> <510BE940.3030203@molden.no> <510C5C81.4060008@canterbury.ac.nz> <510FA5C8.2020602@molden.no>
Message-ID:

On Mon, Feb 4, 2013 at 4:12 AM, Sturla Molden wrote:
> On 02.02.2013 01:23, Greg Ewing wrote:
>
>> If you're suggesting that 'def object foo' should give Python
>> reference semantics and 'cdef object foo' raw C pointer
>> semantics,
>
>
> No I was not.
>
> I was suggesting that static declarations of Python and C variables should
> have different keywords.
>
> Because they behave differently e.g. with respect to reference counting, it
> can be confusing to new users. For example I was replying to a Cython user
> who thought anything declared 'cdef' was reference counted. It might not be
> obvious to a new Cython user what can be put in a Python list and what can
> be put in an STL vector.

I find the distinction obvious: if Python understands it, it can be put in a Python list. If C++ understands it, it can be put in an STL container. Of course I'm the antithesis of a "new user." We should at least be producing obvious errors.

> "cdef" refers to storage in the generated C, not to the semantics of Cython.
> But how and where variables are stored in the generated C is an
> implementation detail. Semantically the difference is between static and
> dynamic variables.

I think reference counting is much more of an implementation detail than how and where the variables are stored. When using Cython I hardly ever think about reference counts, it just does the right thing everywhere for me. From a performance perspective, aside from being able to manipulate raw C numeric types, one of the most important features is that functions and variables (both Python and C types) can be statically rather than dynamically bound, and specifying where it should be so.
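Robert's rule of thumb ("if Python understands it, it can be put in a Python list") can be seen from plain Python. The sketch below uses the standard-library `ctypes` module as a stand-in for C-level storage. This is only an analogy and not how Cython stores `cdef` variables. A Python list accepts any object, but a fixed-size C `int` array only accepts values that C can represent as an `int`. `try_store` is a hypothetical helper written for this illustration:

```python
import ctypes

# A Python list stores references to arbitrary Python objects.
py_list = [1, "two", 3.0, None]

# A ctypes array stores raw C ints, much like a C array or an STL vector<int>.
c_array = (ctypes.c_int * 4)(1, 2, 3, 4)

def try_store(container, index, value):
    """Attempt an item assignment and report whether it was accepted."""
    try:
        container[index] = value
        return True
    except TypeError:
        return False

print(try_store(py_list, 0, object()))       # True: any Python object fits
print(try_store(c_array, 0, 42))             # True: representable as a C int
print(try_store(c_array, 1, "not an int"))   # False: not a C int
```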
In any case, whether "cdef A a" is reference counted or not depends on A in a straightforward manner (it's refcounted if and only if it can be, i.e. A is a subclass of object). Forcing the user to choose between two different forms of "cdef" based on the type of A would be entirely redundant.

- Robert

From greg.ewing at canterbury.ac.nz Tue Feb 5 00:17:43 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 05 Feb 2013 12:17:43 +1300
Subject: [Cython] [cython-users] Recommendations for efficient typed arrays in Cython?
In-Reply-To: <510FA5C8.2020602@molden.no>
References: <5107E786.7040202@molden.no> <4cf979ef-8684-4438-912e-4680afb01808@googlegroups.com> <510AD4B5.9000600@molden.no> <510B083E.5030203@canterbury.ac.nz> <510BE940.3030203@molden.no> <510C5C81.4060008@canterbury.ac.nz> <510FA5C8.2020602@molden.no>
Message-ID: <51104197.9030806@canterbury.ac.nz>

Sturla Molden wrote:

> I was replying to a Cython
> user who thought anything declared 'cdef' was reference counted

That's a matter of user education. We can't use syntax to address every possible misconception a user might have.

> "cdef" refers to storage in the generated C, not to the semantics of
> Cython. But how and where variables are stored in the generated C is an
> implementation detail.

Not entirely -- you can't access a cdef attribute of an extension type using getattr(), for example. And external C code only has direct access to cdef variables and attributes.

--
Greg

From roed.math at gmail.com Tue Feb 5 01:28:45 2013
From: roed.math at gmail.com (David Roe)
Date: Mon, 4 Feb 2013 18:28:45 -0600
Subject: [Cython] Two generators in one function
Message-ID:

Hi everyone,

I ran into the following problem using Cython 0.17.4 (current version of Sage). If you try to compile a file with the following function in it:

    def test_double_gen(L):
        a = all(x != 0 for x in L)
        b = all(x != 1 for x in L)
        return a and b

you get errors from the Cython compiler about 'genexpr' being redefined.
Error compiling Cython file:
------------------------------------------------------------
...
def test_double_gen(L):
    a = all(x != 0 for x in L)
    b = all(x != 1 for x in L)
             ^
------------------------------------------------------------

cython_test.pyx:5:14: 'genexpr' already declared

Error compiling Cython file:
------------------------------------------------------------
...
def test_double_gen(L):
    a = all(x != 0 for x in L)
             ^
------------------------------------------------------------

cython_test.pyx:4:14: Previous declaration is here

Error compiling Cython file:
------------------------------------------------------------
...
def test_double_gen(L):
    a = all(x != 0 for x in L)
    b = all(x != 1 for x in L)
             ^
------------------------------------------------------------

cython_test.pyx:5:14: 'genexpr' redeclared

Are you currently only able to use one inline generator per function?

David

From jrobertray at gmail.com Tue Feb 5 20:56:04 2013
From: jrobertray at gmail.com (J Robert Ray)
Date: Tue, 5 Feb 2013 11:56:04 -0800
Subject: [Cython] SIGSEGV in __Pyx_CyFunction_traverse
Message-ID:

I was getting a crash during module init of a Cython module if a garbage collection happened between a call to __Pyx_CyFunction_InitDefaults and the code that populates the defaults. The attached patch fixes the crash.

This bug affects at least Cython 0.18 and 0.17.1.

__Pyx_CyFunction_InitDefaults was not completely zeroing the newly allocated 'defaults' buffer.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cython.patch
Type: application/octet-stream
Size: 539 bytes
Desc: not available
URL:

From Samuele.Kaplun at cern.ch Thu Feb 7 10:16:28 2013
From: Samuele.Kaplun at cern.ch (Samuele Kaplun)
Date: Thu, 7 Feb 2013 10:16:28 +0100
Subject: [Cython] Possible bug when using cython -Wextra
Message-ID: <1737486.SGKB3DSTKz@pcsk4>

Hello,

I am not sure if this is a bug or the intended behaviour. Consider, for example, this snippet:

[...]
def test():
    cdef int i
    for i from 0 <= i < 10:
        print "foo"
[...]

If I save it into x.pyx and compile it with:

$ cython -Wextra x.pyx

I obtain the warning:
[...]
warning: x.pyx:2:13: Unused entry 'i'
[...]

IMHO, this is a false positive since the i variable is indeed used as a counter in the loop. I guess Cython considers it unused because it does not appear on the right-hand side of an assignment and is not used as an argument to a function, isn't it?

Best regards,
Samuele

--
Samuele Kaplun
Invenio Developer

From stefan_ml at behnel.de Thu Feb 7 12:11:47 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 07 Feb 2013 12:11:47 +0100
Subject: [Cython] Possible bug when using cython -Wextra
In-Reply-To: <1737486.SGKB3DSTKz@pcsk4>
References: <1737486.SGKB3DSTKz@pcsk4>
Message-ID: <51138BF3.6030606@behnel.de>

Samuele Kaplun, 07.02.2013 10:16:
> I am not sure if this is a bug or it is the intended behaviour, however,
> consider for example this snippet:
>
> [...]
> def test():
>     cdef int i
>     for i from 0 <= i < 10:
>         print "foo"
> [...]
>
> If I save it into x.pyx and compile it with:
>
> $ cython -Wextra x.pyx
>
> I obtain the warning:
> [...]
> warning: x.pyx:2:13: Unused entry 'i'
> [...]
>
> IMHO, this is a false positive since the i variable is indeed used as a
> counter in the loop. I guess cython considers it unused due to the fact that
> it does not appear on the right hand side of an assignment nor it is further
> used as an argument in a function, isn't it?
Yes, it actually is an unused variable in your code. There is no reference to it, only assignments.

Stefan

From Samuele.Kaplun at cern.ch Thu Feb 7 13:00:37 2013
From: Samuele.Kaplun at cern.ch (Samuele Kaplun)
Date: Thu, 7 Feb 2013 13:00:37 +0100
Subject: [Cython] Possible bug when using cython -Wextra
In-Reply-To: <51138BF3.6030606@behnel.de>
References: <1737486.SGKB3DSTKz@pcsk4> <51138BF3.6030606@behnel.de>
Message-ID: <1395070.2WGoObUNKK@pcsk4>

Dear Stefan,

On Thursday 7 February 2013 12:11:47, Stefan Behnel wrote:
> > [...]
> >
> > def test():
> >     cdef int i
> >
> >     for i from 0 <= i < 10:
> >         print "foo"
> >
> > [...]
>
> Yes, it actually is an unused variable in your code. There is no reference
> to it, only assignments.

mmh. But it is used, albeit indirectly. So what pattern would you suggest in this case (i.e. to repeat a certain body a given number of times) to avoid such a warning?

Cheers (and thanks for your time!),
Samuele

--
Samuele Kaplun
Invenio Developer

From stefan_ml at behnel.de Thu Feb 7 18:32:59 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 07 Feb 2013 18:32:59 +0100
Subject: [Cython] analyse_types() refactoring
Message-ID: <5113E54B.6020708@behnel.de>

Hi,

I finally found the time to refactor the analysis phase.

https://github.com/cython/cython/commit/f9c385e08401ed96b5b0afb8411480037dc772b9

The methods now return a node, which allows them to replace themselves with a different implementation.

Note that the relatively large code impact of this change also means that you might easily run into merge conflicts with your own local changes, so here's how to fix them. The transformation pattern is pretty straightforward. The "analyse_types()" method returns "self", unless it wants to replace itself, i.e.
this

    def analyse_types(self, env):
        self.index.analyse_types(env)

becomes

    def analyse_types(self, env):
        self.index = self.index.analyse_types(env)
        return self

The "analyse_target_types()" method works the same, but because it calls "analyse_types()" internally in most cases, it's more likely to look like this:

    def analyse_target_types(self, env):
        self.analyse_types(env)
        if self.type.is_pyobject:
            self.type = py_object_type

which now turns into this:

    def analyse_target_types(self, env):
        node = self.analyse_types(env)
        if node.type.is_pyobject:
            node.type = py_object_type
        return node

The same pattern obviously applies in the cases where the node needs to be replaced in "analyse_types()". It would simply build and return a different node. This also allows for in-place coercions of the current node, for example.

With this change in place, we can now start to clean up old hacks like the "__class__" replacement in AttributeNode. If anyone wants to give it a try, please go ahead. :)

Stefan

From markflorisson88 at gmail.com Thu Feb 7 18:46:22 2013
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 7 Feb 2013 11:46:22 -0600
Subject: [Cython] analyse_types() refactoring
In-Reply-To: <5113E54B.6020708@behnel.de>
References: <5113E54B.6020708@behnel.de>
Message-ID:

On 7 February 2013 11:32, Stefan Behnel wrote:
> Hi,
>
> I finally found the time to refactor the analysis phase.
>
> https://github.com/cython/cython/commit/f9c385e08401ed96b5b0afb8411480037dc772b9
>
> The methods now return a node, which allows them to replace themselves with
> a different implementation.
>
> Note that the relatively large code impact of this change also means that
> you might easily run into merge conflicts with your own local changes, so
> here's how to fix them. The transformation pattern is pretty straight
> forward. The "analyse_types()" method returns "self", unless it wants to
> replace itself, i.e.
> this
>
>     def analyse_types(self, env):
>         self.index.analyse_types(env)
>
> becomes
>
>     def analyse_types(self, env):
>         self.index = self.index.analyse_types(env)
>         return self
>
> The "analyse_target_types()" method works the same, but because it calls
> "analyse_types()" internally in most cases, it's more likely to look like this:
>
>     def analyse_target_types(self, env):
>         self.analyse_types(env)
>         if self.type.is_pyobject:
>             self.type = py_object_type
>
> which now turns into this:
>
>     def analyse_target_types(self, env):
>         node = self.analyse_types(env)
>         if node.type.is_pyobject:
>             node.type = py_object_type
>         return node
>
> The same pattern obviously applies in the cases where the node needs to be
> replaced in "analyse_types()". It would simply build and return a different
> node. This also allows for in-place coercions of the current node, for example.
>
> With this change in place, we can now start to clean up old hacks like the
> "__class__" replacement in AttributeNode. If anyone wants to give it a try,
> please go ahead. :)
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

What, you didn't like overriding __class__? :) That's great work Stefan! Do you eventually want to move these methods to a visitor, or do you want to keep them as methods?

From stefan_ml at behnel.de Thu Feb 7 18:53:58 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 07 Feb 2013 18:53:58 +0100
Subject: [Cython] analyse_types() refactoring
In-Reply-To:
References: <5113E54B.6020708@behnel.de>
Message-ID: <5113EA36.3080700@behnel.de>

mark florisson, 07.02.2013 18:46:
> On 7 February 2013 11:32, Stefan Behnel wrote:
>> I finally found the time to refactor the analysis phase.
>>
>> https://github.com/cython/cython/commit/f9c385e08401ed96b5b0afb8411480037dc772b9
>>
>> The methods now return a node, which allows them to replace themselves with
>> a different implementation.
>
> Do you eventually want to move these methods to a visitor, or
> do you want to keep them as methods?

I think it makes more sense to keep them as methods. It's not so uncommon that the order matters in which children are being analysed, and the result of one child might even impact how another child is being analysed. There's really a lot of logic in the analysis methods that makes it difficult to extract a more general visitor pattern.

Stefan

From stefan_ml at behnel.de Thu Feb 7 19:05:55 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 07 Feb 2013 19:05:55 +0100
Subject: [Cython] Possible bug when using cython -Wextra
In-Reply-To: <1395070.2WGoObUNKK@pcsk4>
References: <1737486.SGKB3DSTKz@pcsk4> <51138BF3.6030606@behnel.de> <1395070.2WGoObUNKK@pcsk4>
Message-ID: <5113ED03.5060307@behnel.de>

Samuele Kaplun, 07.02.2013 13:00:
> On Thursday 7 February 2013 12:11:47, Stefan Behnel wrote:
>>> [...]
>>>
>>> def test():
>>>     cdef int i
>>>
>>>     for i from 0 <= i < 10:
>>>         print "foo"
>>>
>>> [...]
>>
>> Yes, it actually is an unused variable in your code. There is no reference
>> to it, only assignments.
>
> mmh. But it is used, albeit indirectly. Then what pattern would you suggest in
> this case (i.e. to repeat a certain body a given number of times), in order to
> avoid such a warning?

The normal thing to do in Python would be to use an underscore (i.e. "_") as the variable name. I don't think we currently special-case that pattern, though. Maybe we should. Or maybe we should just drop the "unused variable" warning for loop variables, as they actually do something and serve a purpose, even if they are never referenced.
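The two options Stefan mentions (special-casing `_`, or not warning about loop variables at all) can be illustrated outside Cython. Below is a minimal sketch built on Python's `ast` module. It is not Cython's actual warning logic. It reports simple `for` loop targets that are never read anywhere in the parsed source, and it exempts the conventional `_` name. The helper name `unused_loop_targets` is hypothetical, and the sketch deliberately simplifies by checking names module-wide and ignoring scopes:

```python
import ast

def unused_loop_targets(source):
    """Return names bound as simple `for` loop targets but never read.

    Simplification: a name counts as "read" if it is loaded anywhere in the
    module, regardless of scope. The conventional throwaway name "_" is exempt.
    """
    tree = ast.parse(source)
    loaded = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    unused = []
    for node in ast.walk(tree):
        if isinstance(node, ast.For) and isinstance(node.target, ast.Name):
            name = node.target.id
            if name != "_" and name not in loaded:
                unused.append(name)
    return unused

# Python-syntax analogue of the loop from the original report.
print(unused_loop_targets("for i in range(10):\n    print('foo')\n"))  # ['i']
print(unused_loop_targets("for _ in range(10):\n    print('foo')\n"))  # []
```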
Stefan

From d.s.seljebotn at astro.uio.no Thu Feb 7 22:04:51 2013
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Thu, 07 Feb 2013 22:04:51 +0100
Subject: [Cython] analyse_types() refactoring
In-Reply-To: <5113E54B.6020708@behnel.de>
References: <5113E54B.6020708@behnel.de>
Message-ID: <511416F3.3050609@astro.uio.no>

On 02/07/2013 06:32 PM, Stefan Behnel wrote:
> Hi,
>
> I finally found the time to refactor the analysis phase.
>
> https://github.com/cython/cython/commit/f9c385e08401ed96b5b0afb8411480037dc772b9
>
> The methods now return a node, which allows them to replace themselves with
> a different implementation.
>
> Note that the relatively large code impact of this change also means that
> you might easily run into merge conflicts with your own local changes, so
> here's how to fix them. The transformation pattern is pretty straight
> forward. The "analyse_types()" method returns "self", unless it wants to
> replace itself, i.e. this
>
>     def analyse_types(self, env):
>         self.index.analyse_types(env)
>
> becomes
>
>     def analyse_types(self, env):
>         self.index = self.index.analyse_types(env)
>         return self

Wohoo!

Dag Sverre

From stefan_ml at behnel.de Fri Feb 8 00:28:21 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 08 Feb 2013 00:28:21 +0100
Subject: [Cython] Keyword arguments for cdef functions and call for a little help
Message-ID: <51143895.9040907@behnel.de>

Hi,

in the current master branch, I implemented support for passing keyword arguments into cdef functions. The names are mapped statically at compile time to the names declared in the signature.
This means that you can now do this:

    cdef func(int x, bint flag):
        pass

    func(1, flag=True)

and it will be compiled down into (essentially) this calling code:

    func(1, 1);

Note that optional arguments at the end worked already, so this:

    cdef func(int x, bint flag1=False, bint flag2=True):
        pass

    func(1, flag1=True)

is equivalent to this call:

    func(1, 1, 1);

but you can't (currently) do this:

    func(1, flag2=True)   # error: flag1 is missing!

Obviously, you also can't use keyword arguments for functions that were declared without argument names, e.g. in a case like this:

    cdef extern from "some_header.h":
        int somefunc(int, char*)

This feature also works for libc declarations, e.g.

    from libc.string cimport strstr
    print(strstr(needle="abc", haystack="xabcy"))

where strstr() is declared as

    cdef extern from "string.h":
        char *strstr (const char *haystack, const char *needle)

(I keep getting the argument order wrong here, so this really makes it easier to read for me.)

We keep these declarations here in the source tree:

https://github.com/cython/cython/tree/master/Cython/Includes

However, I have only now converted the parameter names in these standard declarations to lower case; they were previously written as upper case names (i.e. "HAYSTACK" and "NEEDLE"), which is a bit ugly when used as keyword arguments (but allowed parameter names like "FROM" instead of the reserved word "from").

Would someone be so kind as to go over the standard declarations that we ship and check that the argument names they are declared with are a) available and b) proper lower case parameter names, as one would expect them? Preferably someone who has the glibc headers within reach to look them up if they are missing? I'm pretty sure that the current names were copied from there anyway, so most of the declarations should be ok already - but some may not be, and I'd like to get this straight before people start to rely on them.
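The name-to-position mapping described above can be modelled in a few lines of Python. This is a hypothetical sketch of the rules as stated in this message: keywords are matched to the declared names, the explicitly passed arguments must form a gap-free prefix of the signature, and any remaining optional parameters take their defaults. It is not Cython's implementation, and `map_to_positional` and `REQUIRED` are illustrative names only:

```python
REQUIRED = object()  # marker for a parameter that has no default value

def map_to_positional(params, args, kwargs):
    """Map a call with keyword arguments onto a purely positional C-style call.

    params: (name, default) pairs in declaration order; use REQUIRED when a
    parameter has no default. Returns the full positional argument list.
    """
    names = [name for name, _ in params]
    if len(args) > len(params):
        raise TypeError("too many arguments")
    slots = list(args) + [REQUIRED] * (len(params) - len(args))

    # Place each keyword argument at the position of its declared name.
    for name, value in kwargs.items():
        if name not in names:
            raise TypeError(f"unknown keyword argument {name!r}")
        index = names.index(name)
        if slots[index] is not REQUIRED:
            raise TypeError(f"argument {name!r} given twice")
        slots[index] = value

    # Explicit arguments must form a prefix: no gap before a later argument.
    n_given = 0
    while n_given < len(slots) and slots[n_given] is not REQUIRED:
        n_given += 1
    if any(value is not REQUIRED for value in slots[n_given:]):
        raise TypeError(f"{names[n_given]!r} is missing")

    # Fill the remaining trailing parameters from their defaults.
    for index in range(n_given, len(params)):
        default = params[index][1]
        if default is REQUIRED:
            raise TypeError(f"required argument {names[index]!r} not given")
        slots[index] = default
    return slots

flags = [("x", REQUIRED), ("flag1", False), ("flag2", True)]
print(map_to_positional(flags, (1,), {"flag1": True}))  # [1, True, True]

strstr_params = [("haystack", REQUIRED), ("needle", REQUIRED)]
print(map_to_positional(strstr_params, (), {"needle": "abc", "haystack": "xabcy"}))
```

Calling `map_to_positional(flags, (1,), {"flag2": True})` raises `TypeError: 'flag1' is missing`, which mirrors the restriction described above.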
I also noticed that many of the C++ function/method declarations and posix declarations lack names. It would be nice if someone could add them, too.

I admit that this is a boring and somewhat tedious task, but it would really help us.

And, obviously, this new feature needs a bit of general testing. :)

Thanks for any help,

Stefan

From dave.hirschfeld at gmail.com Fri Feb 8 17:54:28 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Fri, 8 Feb 2013 16:54:28 +0000 (UTC)
Subject: [Cython] Fused types don't work with cdef classes?
Message-ID:

Is this a bug?

The following code fails to compile on Windows (VS2012, 32-bit Python 2.7) with a recent 0.19-pre GitHub Cython:

    cimport cython

    ctypedef fused char_or_float:
        cython.char
        cython.float

    cdef class FusedExample:

        def __init__(self, char_or_float x):
            pass

#

Resulting in the following exception:

C:\temp>C:\dev\bin\Python27\python.exe setup.py build_ext --inplace --compiler=msvc
Compiling example.pyx because it changed.
Cythonizing example.pyx
running build_ext
building 'example' extension
C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\BIN\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -I. -IC:\dev\code\Gazprom.MT\pricing\gazprom\mt\pricing -IC:\dev\bin\Python27\lib\site-packages\numpy\core\include "-IC:\dev\lib\Intel\Composer XE 2013\mkl\include" -IC:\dev\bin\Python27\include -IC:\dev\bin\Python27\PC /Tp example.cpp /Fobuild\temp.win32-2.7\Release\example.obj /EHsc /openmp
example.cpp
example.cpp(1630) : error C2062: type 'int' unexpected
example.cpp(1630) : error C2143: syntax error : missing ';' before '{'
example.cpp(1630) : error C2447: '{' : missing function header (old-style formal list?)
example.cpp(1687) : error C2062: type 'int' unexpected
example.cpp(1687) : error C2143: syntax error : missing ';' before '{'
example.cpp(1687) : error C2447: '{' : missing function header (old-style formal list?)
example.cpp(3869) : error C2440: 'initializing' : cannot convert from 'PyObject *(__cdecl *)(PyObject *,PyObject *,PyObject *)' to 'initproc'
        None of the functions with this name in scope match the target type
error: command '"C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\BIN\cl.exe"' failed with exit status 2

If the cdef is removed from the class it compiles fine. Are fused types supposed to work with cdef classes?

Thanks,
Dave

From stefan_ml at behnel.de Sat Feb 9 10:44:01 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 09 Feb 2013 10:44:01 +0100
Subject: [Cython] How does a fused function differ from an overloaded function?
Message-ID: <51161A61.8020101@behnel.de>

Hi,

I noticed that Cython currently fails to do this:

    cdef int (*int_abs)(int x)
    cdef object py_abs
    py_abs = int_abs = abs

Here, abs() is an overloaded function with a couple of C signatures (fabs() and friends) and a Python signature (the builtin). While there is code in NameNode.coerce_to() that figures out that the RHS can be replaced by the Python builtin, the same is lacking for the general case of overloaded entries.

While working on fixing this problem (and after turning ProxyNode into an actual node proxy when it comes to coercion), I thought it would be a good idea to make NameNode generally aware of alternative entries and just build a new NameNode with the right entry in its coerce_to() method.
Then I noticed that the generic coerce_to() contains this code:

    if src_type.is_fused or dst_type.is_fused:
        # See if we are coercing a fused function to a pointer to a
        # specialized function
        if (src_type.is_cfunction and not dst_type.is_fused and
                dst_type.is_ptr and dst_type.base_type.is_cfunction):

            dst_type = dst_type.base_type
            for signature in src_type.get_all_specialized_function_types():
                if signature.same_as(dst_type):
                    src.type = signature
                    src.entry = src.type.entry
                    src.entry.used = True
                    return self

This is essentially the same idea, just done a bit differently (with the drawback that it modifies the node in place, which coerce_to() must *never* do).

So, two questions:

1) why is the above code in the generic coerce_to() method and not in NameNode? It doesn't seem to do anything sensible for most other nodes, potentially not even AttributeNode. And it might fail silently when working on things like CloneNode that don't care about entries. Are there other nodes where it does what it should?

2) couldn't fused functions be mapped to a set of overloaded functions (read: entries) beforehand, instead of special casing both in places like this?

Stefan

From dave.hirschfeld at gmail.com Sat Feb 9 14:03:03 2013
From: dave.hirschfeld at gmail.com (David Hirschfeld)
Date: Sat, 9 Feb 2013 13:03:03 +0000
Subject: [Cython] Fwd: MemoryView.is_f_contig sometimes not defined?
In-Reply-To:
References:
Message-ID:

Reposting because I think my original got blocked because of attachments. Apologies if this appears twice.

I want to allow arbitrary C/F contiguous arrays as input to a cdef class so I can dispatch to a different calculation method in each case, avoiding a potentially costly copy.
Unfortunately, it appears that Cython is generating incorrect code.
The following minimal example reproduces the problem:

    cimport cython

    cdef class TestContig:

        cdef cython.bint contig

        def __init__(self, double[:,:] y):
            if y.is_c_contig():
                self.contig = 1
            elif y.is_f_contig():
                self.contig = 1
            else:
                self.contig = 0

        property contig:
            def __get__(self):
                return self.contig

#

C:\temp> python setup.py build_ext --inplace
running build_ext
building 'example' extension
C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\BIN\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -IC:\dev\bin\Python27\include -IC:\dev\bin\Python27\PC /Tpexample.cpp /Fobuild\temp.win32-2.7\Release\example.obj
example.cpp
example.cpp(1277) : error C3861: '__pyx_memviewslice_is_f_contig2': identifier not found
error: command '"C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\BIN\cl.exe"' failed with exit status 2
C:\temp>

I'm on Windows7 with 32bit Python2.7 and I tested that compilation fails with both VS2012 & MinGW32 4.6.1.

NB: If you only check for f-contiguity (or c-contiguity) in the method it compiles fine, it appears that the bug only appears when you test for both f and c contiguity in the same method.

Thanks,
Dave

From robertwb at gmail.com Sat Feb 9 22:26:08 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sat, 9 Feb 2013 13:26:08 -0800
Subject: [Cython] analyse_types() refactoring
In-Reply-To: <511416F3.3050609@astro.uio.no>
References: <5113E54B.6020708@behnel.de> <511416F3.3050609@astro.uio.no>
Message-ID:

On Thu, Feb 7, 2013 at 1:04 PM, Dag Sverre Seljebotn wrote:
> On 02/07/2013 06:32 PM, Stefan Behnel wrote:
>>
>> Hi,
>>
>> I finally found the time to refactor the analysis phase.
>>
>>
>> https://github.com/cython/cython/commit/f9c385e08401ed96b5b0afb8411480037dc772b9
>>
>> The methods now return a node, which allows them to replace themselves
>> with
>> a different implementation.
>>
>> Note that the relatively large code impact of this change also means that
>> you might easily run into merge conflicts with your own local changes, so
>> here's how to fix them. The transformation pattern is pretty straight
>> forward. The "analyse_types()" method returns "self", unless it wants to
>> replace itself, i.e. this
>>
>>     def analyse_types(self, env):
>>         self.index.analyse_types(env)
>>
>> becomes
>>
>>     def analyse_types(self, env):
>>         self.index = self.index.analyse_types(env)
>>         return self
>
>
> Wohoo!

Yay!

- Robert

From markflorisson88 at gmail.com Sun Feb 10 03:25:33 2013
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 9 Feb 2013 20:25:33 -0600
Subject: [Cython] How does a fused function differ from an overloaded function?
In-Reply-To: <51161A61.8020101@behnel.de>
References: <51161A61.8020101@behnel.de>
Message-ID:

On 9 February 2013 03:44, Stefan Behnel wrote:
> Hi,
>
> I noticed that Cython currently fails to do this:
>
>     cdef int (*int_abs)(int x)
>     cdef object py_abs
>     py_abs = int_abs = abs
>
> Here, abs() is an overloaded function with a couple of C signatures (fabs()
> and friends) and a Python signature (the builtin). While there is code in
> NameNode.coerce_to() that figures out that the RHS can be replaced by the
> Python builtin, the same is lacking for the general case of overloaded entries.
>
> While working on fixing this problem (and after turning ProxyNode into an
> actual node proxy when it comes to coercion), I thought it would be a good
> idea to make NameNode generally aware of alternative entries and just build
> a new NameNode with the right entry in its coerce_to() method.
> Then I noticed that the generic coerce_to() contains this code:
>
>     if src_type.is_fused or dst_type.is_fused:
>         # See if we are coercing a fused function to a pointer to a
>         # specialized function
>         if (src_type.is_cfunction and not dst_type.is_fused and
>                 dst_type.is_ptr and dst_type.base_type.is_cfunction):
>
>             dst_type = dst_type.base_type
>             for signature in src_type.get_all_specialized_function_types():
>                 if signature.same_as(dst_type):
>                     src.type = signature
>                     src.entry = src.type.entry
>                     src.entry.used = True
>                     return self
>
> This is essentially the same idea, just done a bit differently (with the
> drawback that it modifies the node in place, which coerce_to() must *never*
> do).
>
> So, two questions:
>
> 1) why is the above code in the generic coerce_to() method and not in
> NameNode? It doesn't seem to do anything sensible for most other nodes,
> potentially not even AttributeNode. And it might fail silently when working
> on things like CloneNode that don't care about entries. Are there other
> nodes where it does what it should?

I think it works for names and attributes; it allows you to retrieve a
specialized version of the fused c(p)def functions and methods.

> 2) couldn't fused functions be mapped to a set of overloaded functions
> (read: entries) beforehand, instead of special casing both in places like
> this?

Quite possibly, although I'd have to dig in the codebase some more to
verify that. You can give it a try, it'd be nice to unify the approaches
under the same model.

> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From markflorisson88 at gmail.com Sun Feb 10 03:41:12 2013
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 9 Feb 2013 20:41:12 -0600
Subject: [Cython] Fwd: MemoryView.is_f_contig sometimes not defined?
In-Reply-To:
References:
Message-ID:

On 9 February 2013 07:03, David Hirschfeld wrote:
> Reposting because I think my original got blocked because of
> attachments. Apologies if this appears twice.
>
> I want to allow arbitrary C/F contiguous arrays as input to a cdef
> class so I can dispatch to a different calculation method in each
> case, avoiding a potentially costly copy.
> Unfortunately, it appears that cython is generating incorrect code.
> The following minimal example reproduces the problem:
>
> [...]

Thanks. Cython seems to think (for some reason) that the second function
is the same as the first and omits the definition.
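Dave's underlying goal, dispatching on memory layout so that C- or Fortran-ordered input can be processed without a copy, can be sketched in plain Python. This is an analogue under stated assumptions, not Cython code: it uses the built-in memoryview attributes `c_contiguous` and `f_contiguous` in place of the Cython memoryview methods `is_c_contig()` and `is_f_contig()`, and the helper name `contiguity` is hypothetical.

```python
def contiguity(obj):
    """Classify a buffer by memory layout, checking C order first (as in the repro)."""
    view = memoryview(obj)
    if view.c_contiguous:
        return "C"
    if view.f_contiguous:
        return "F"
    return "strided"


# A 2x3 array of doubles laid out in C (row-major) order, built without NumPy.
grid = memoryview(bytearray(2 * 3 * 8)).cast("d", (2, 3))
print(contiguity(grid))                           # C
print(contiguity(memoryview(bytearray(8))[::2]))  # strided: every other byte
```

Checking C order first only matters for buffers that are both C- and F-contiguous (any contiguous 1D buffer, for instance); those are claimed by the first branch.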

From stefan_ml at behnel.de Sun Feb 10 06:56:04 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 10 Feb 2013 06:56:04 +0100
Subject: [Cython] How does a fused function differ from an overloaded function?
In-Reply-To:
References: <51161A61.8020101@behnel.de>
Message-ID: <51173674.1070706@behnel.de>

mark florisson, 10.02.2013 03:25:
> On 9 February 2013 03:44, Stefan Behnel wrote:
>> Hi,
>>
>> [...]
>>
>> So, two questions:
>>
>> 1) why is the above code in the generic coerce_to() method and not in
>> NameNode?
>> It doesn't seem to do anything sensible for most other nodes,
>> potentially not even AttributeNode. And it might fail silently when working
>> on things like CloneNode that don't care about entries. Are there other
>> nodes where it does what it should?
>
> I think it works for names and attributes; it allows you to retrieve a
> specialized version of the fused c(p)def functions and methods.

That's what I figured. I might have to take a look at AttributeNode a bit
more to see if it really does the right thing in all cases.

I would like to avoid having this in the generic coerce_to() method because
if it's anything but a NameNode or AttributeNode, it can only have one type
(unless I'm missing something), so coercion to different signatures won't
be possible anyway. And I wouldn't mind letting the above two nodes share a
bit more code, in one way or another.

I also think that the idea of having a ProxyNode for reuse was quite right.
I've started playing with it a little to let it support coercion
delegation, i.e. it would have its own coerce_to() method that builds
CloneNodes as needed and coerces either its argument directly or the
CloneNode to the target type, depending on is_simple() and maybe other
criteria.


>> 2) couldn't fused functions be mapped to a set of overloaded functions
>> (read: entries) beforehand, instead of special casing both in places like
>> this?
>
> Quite possibly, although I'd have to dig in the codebase some more to
> verify that. You can give it a try, it'd be nice to unify the
> approaches under the same model.

What I would like to see, eventually, is that NameNode basically just looks
up its entry on type analysis (including all overloaded entries), and then
whatever uses the node (to call or assign it) would pass in the right
signature/type into its coerce_to() method, which would then select the
right entry and return a new NameNode for it (or fail if the signature
can't be matched to any entry).
AttributeNode would essentially do the same thing, just return either an
AttributeNode or a NameNode on type analysis and/or coercion, depending on
what entry it finds (and if more than one).

Does this sound like it could work for fused types?

Stefan

From markflorisson88 at gmail.com Sun Feb 10 16:11:57 2013
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 10 Feb 2013 09:11:57 -0600
Subject: [Cython] How does a fused function differ from an overloaded function?
In-Reply-To: <51173674.1070706@behnel.de>
References: <51161A61.8020101@behnel.de> <51173674.1070706@behnel.de>
Message-ID:

On 9 February 2013 23:56, Stefan Behnel wrote:
> mark florisson, 10.02.2013 03:25:
>> On 9 February 2013 03:44, Stefan Behnel wrote:
>>> [...]
>
> [...]
>
> What I would like to see, eventually, is that NameNode basically just looks
> up its entry on type analysis (including all overloaded entries), and then
> whatever uses the node (to call or assign it) would pass in the right
> signature/type into its coerce_to() method, which would then select the
> right entry and return a new NameNode for it (or fail if the signature
> can't be matched to any entry).
>
> AttributeNode would essentially do the same thing, just return either an
> AttributeNode or a NameNode on type analysis and/or coercion, depending on
> what entry it finds (and if more than one).
>
> Does this sound like it could work for fused types?

It sounds like this approach might be cleaner than catching this in a
global coercion, but on the other hand you want full generality. For
instance, there is also the cast syntax that can specialize a function.
Or I might have a pointer to a known fused function or method that I
want to dereference and specialize.

Maybe we need a nicer way to deal with and register coercions, and with
what an assignment expects and a value generates. A lot of assignment
code seems similar but slightly different in tricky ways.
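Stefan's proposal, in which a name node carries all of its overloaded or fused entries and coerce_to() returns a new node bound to the entry that matches the target signature instead of mutating itself, can be modeled in a few lines of plain Python. Everything below (CFuncType, Entry, NameNode and their fields) is a hypothetical toy, not Cython's Compiler classes, and it covers only exact-signature matching, not the cast or pointer-dereference cases Mark raises. Returning a fresh node follows the same convention as the analyse_types() refactoring, where methods return a (possibly replaced) node.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class CFuncType:
    """Toy C function signature: a return type plus argument types."""
    return_type: str
    arg_types: Tuple[str, ...]

    def same_as(self, other):
        return self == other


@dataclass(frozen=True)
class Entry:
    """Toy symbol-table entry: the C name and its signature."""
    cname: str
    type: CFuncType


@dataclass(frozen=True)
class NameNode:
    """Toy name node that remembers every overload (or fused specialization)."""
    name: str
    entries: Tuple[Entry, ...]

    @property
    def entry(self):
        return self.entries[0]

    def coerce_to(self, dst_type):
        """Return a NEW node bound to the entry whose signature matches.

        The receiver is never modified, so other users of the same node keep
        seeing every alternative.
        """
        for entry in self.entries:
            if entry.type.same_as(dst_type):
                return replace(self, entries=(entry,))
        raise TypeError(f"no signature of {self.name!r} matches {dst_type}")


abs_node = NameNode("abs", (
    Entry("abs", CFuncType("int", ("int",))),
    Entry("fabs", CFuncType("double", ("double",))),
))
int_abs = abs_node.coerce_to(CFuncType("int", ("int",)))
print(int_abs.entry.cname)    # abs
print(len(abs_node.entries))  # 2: the original node is untouched
```

Because the nodes are frozen, any attempt to coerce in place would fail loudly, which mirrors Stefan's rule that coerce_to() must never modify the node it is called on.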

From robertwb at gmail.com Wed Feb 13 09:07:45 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Wed, 13 Feb 2013 00:07:45 -0800
Subject: [Cython] Fused types don't work with cdef classes?
In-Reply-To:
References:
Message-ID:

Yes, this is a bug; there is a bad interaction between fused types and
special methods.

I created http://trac.cython.org/cython_trac/ticket/802

On Fri, Feb 8, 2013 at 8:54 AM, Dave Hirschfeld wrote:
> Is this a bug?
> The following code fails to compile on windows VS2012, 32bit Python2.7 with a
> recent 0.19-pre github cython:
>
>
>     cimport cython
>
>     ctypedef fused char_or_float:
>         cython.char
>         cython.float
>
>
>     cdef class FusedExample:
>
>         def __init__(self, char_or_float x):
>             pass
>     #
>
> Resulting in the following exception:
>
> C:\temp>C:\dev\bin\Python27\python.exe setup.py build_ext --inplace --compiler=msvc
> Compiling example.pyx because it changed.
> Cythonizing example.pyx
> running build_ext
> building 'example' extension
> C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\BIN\cl.exe
> /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -I.
> -IC:\dev\code\Gazprom.MT\pricing\gazprom\mt\pricing
> -IC:\dev\bin\Python27\lib\site-packages\numpy\core\include
> "-IC:\dev\lib\Intel\Composer XE 2013\mkl\include"
> -IC:\dev\bin\Python27\include
> -IC:\dev\bin\Python27\PC
> /Tp example.cpp
> /Fobuild\temp.win32-2.7\Release\example.obj
> /EHsc /openmp
> example.cpp
> example.cpp(1630) : error C2062: type 'int' unexpected
> example.cpp(1630) : error C2143: syntax error : missing ';' before '{'
> example.cpp(1630) : error C2447: '{' : missing function header (old-style formal
> list?)
> example.cpp(1687) : error C2062: type 'int' unexpected
> example.cpp(1687) : error C2143: syntax error : missing ';' before '{'
> example.cpp(1687) : error C2447: '{' : missing function header (old-style formal
> list?)
> example.cpp(3869) : error C2440: 'initializing' : cannot convert
> from 'PyObject *(__cdecl *)(PyObject *,PyObject *,PyObject *)' to
> 'initproc'
>         None of the functions with this name in scope match the target type
> error: command '"C:\Program Files (x86)\Microsoft Visual Studio
> 11.0\VC\BIN\cl.exe"'
> failed with exit status 2
>
>
> If the cdef is removed from the class it compiles fine. Are fused types supposed
> to work with cdef classes?
>
>
> Thanks,
> Dave

From dave.hirschfeld at gmail.com Wed Feb 13 09:16:07 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Wed, 13 Feb 2013 08:16:07 +0000 (UTC)
Subject: [Cython] Fused types don't work with cdef classes?
References:
Message-ID:

Robert Bradshaw writes:
>
> Yes, this is a bug; there is a bad interaction between fused types and
> special methods.
>
> I created http://trac.cython.org/cython_trac/ticket/802
>

Thanks for following up. My actual use-case was to allow either 1D or 2D
MemoryView inputs to a function by simply transforming the 1D MemoryView
to a column vector.

For now the workaround is to simply disallow 1D inputs, but it would be nice to
have it working.

Thanks,
Dave

From stefan_ml at behnel.de Wed Feb 13 09:30:22 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 13 Feb 2013 09:30:22 +0100
Subject: [Cython] Fused types don't work with cdef classes?
In-Reply-To:
References:
Message-ID: <511B4F1E.70409@behnel.de>

Dave Hirschfeld, 13.02.2013 09:16:
> Robert Bradshaw writes:
>> Yes, this is a bug; there is a bad interaction between fused types and
>> special methods.
>>
>> I created http://trac.cython.org/cython_trac/ticket/802
>>
>
> Thanks for following up. My actual use-case was to allow either 1D or 2D
> MemoryView inputs to a function by simply transforming the 1D MemoryView
> to a column vector.
>
> For now the workaround is to simply disallow 1D inputs, but it would be nice to
> have it working.

Depending on what the rest of your code looks like, a work-around might be
to move the code from __init__() to a separate cdef function and just call
that.

Stefan

From d.s.seljebotn at astro.uio.no Wed Feb 13 20:04:44 2013
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Wed, 13 Feb 2013 20:04:44 +0100
Subject: [Cython] cldoc
Message-ID: <511BE3CC.6050903@astro.uio.no>

Just a heads up about this project; there's bound to be something useful
there for auto-wrapping.

http://jessevdk.github.com/cldoc/

Dag Sverre

From robertwb at gmail.com Thu Feb 14 05:49:46 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Wed, 13 Feb 2013 20:49:46 -0800
Subject: [Cython] SIGSEGV in __Pyx_CyFunction_traverse
In-Reply-To:
References:
Message-ID:

Thanks.

On Tue, Feb 5, 2013 at 11:56 AM, J Robert Ray wrote:
> I was getting a crash during module init of a cython module if a garbage
> collection happens between a call to __Pyx_CyFunction_InitDefaults and the
> code to populate the defaults.
>
> The attached patch fixes the crash. This bug affects at least Cython 0.18
> and 0.17.1.
>
> __Pyx_CyFunction_InitDefaults was not completely zeroing the newly allocated
> 'defaults' buffer.

From robertwb at gmail.com Thu Feb 14 06:08:53 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Wed, 13 Feb 2013 21:08:53 -0800
Subject: [Cython] Possible bug when using cython -Wextra
In-Reply-To: <5113ED03.5060307@behnel.de>
References: <1737486.SGKB3DSTKz@pcsk4> <51138BF3.6030606@behnel.de> <1395070.2WGoObUNKK@pcsk4> <5113ED03.5060307@behnel.de>
Message-ID:

On Thu, Feb 7, 2013 at 10:05 AM, Stefan Behnel wrote:
> Samuele Kaplun, 07.02.2013 13:00:
>> On Thursday 7 February 2013 12:11:47, Stefan Behnel wrote:
>>>> [...]
>>>>
>>>> def test():
>>>>     cdef int i
>>>>
>>>>     for i from 0 <= i < 10:
>>>>         print "foo"
>>>>
>>>> [...]
>>>
>>> Yes, it actually is an unused variable in your code. There is no reference
>>> to it, only assignments.
>>
>> Hmm. But it is used, albeit indirectly. Then what pattern would you suggest in
>> this case (i.e. to repeat a certain body a given number of times), in order to
>> avoid such a warning?
>
> The normal thing to do in Python would be to use an underscore (i.e. "_")
> as variable name. I don't think we currently special case that pattern,
> though. Maybe we should.

I agree. Done.

> Or maybe we should just drop the "unused variable" warning for loop
> variables as they actually do something and serve a purpose, even if they
> are never referenced.

+1 to this too. (Not yet done.)

- Robert

From robertwb at gmail.com Thu Feb 14 06:29:48 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Wed, 13 Feb 2013 21:29:48 -0800
Subject: [Cython] Two generators in one function
In-Reply-To:
References:
Message-ID:

This is due to the archaic --disable-function-redefinition flag.

On Mon, Feb 4, 2013 at 4:28 PM, David Roe wrote:
> Hi everyone,
> I ran into the following problem using Cython 0.17.4 (current version of
> Sage).
>
> If you try to compile a file with the following function in it:
>
> def test_double_gen(L):
>     a = all(x != 0 for x in L)
>     b = all(x != 1 for x in L)
>     return a and b
>
> you get errors from the Cython compiler about 'genexpr' being redefined.
>
> Error compiling Cython file:
> ------------------------------------------------------------
> ...
>
>
> def test_double_gen(L):
>     a = all(x != 0 for x in L)
>     b = all(x != 1 for x in L)
>               ^
> ------------------------------------------------------------
>
> cython_test.pyx:5:14: 'genexpr' already declared
>
> Error compiling Cython file:
> ------------------------------------------------------------
> ...
>
>
> def test_double_gen(L):
>     a = all(x != 0 for x in L)
>               ^
> ------------------------------------------------------------
>
> cython_test.pyx:4:14: Previous declaration is here
>
> Error compiling Cython file:
> ------------------------------------------------------------
> ...
>
>
> def test_double_gen(L):
>     a = all(x != 0 for x in L)
>     b = all(x != 1 for x in L)
>               ^
> ------------------------------------------------------------
>
> cython_test.pyx:5:14: 'genexpr' redeclared
>
> Are you currently only able to use one inline generator per function?
> David

From roed.math at gmail.com Thu Feb 14 06:35:04 2013
From: roed.math at gmail.com (David Roe)
Date: Wed, 13 Feb 2013 22:35:04 -0700
Subject: [Cython] Two generators in one function
In-Reply-To:
References:
Message-ID:

Thanks.
David

On Wed, Feb 13, 2013 at 10:29 PM, Robert Bradshaw wrote:
> This is due to the archaic --disable-function-redefinition flag.
>
> [...]

From stefan_ml at behnel.de Thu Feb 14 08:01:25 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 14 Feb 2013 08:01:25 +0100
Subject: [Cython] [cython-users] Re: Python 3 and string frustration
In-Reply-To:
References: <510F7512.7070101@behnel.de>
Message-ID: <511C8BC5.7050404@behnel.de>

Robert Bradshaw, 14.02.2013 06:51:
> I've proposed having a compiler
> directive that lets you specify an encoding (e.g. ascii, utf8) and
> automatically encodes/decodes when converting between C and Python
> strings.

My main objection against that is that it would only work in one
direction, from C strings to Python strings. The other direction requires
an explicit intermediate bytes object in order to correctly do the memory
management, so there's really nothing to win there. Doing anything implicit
in that direction would just call for either trouble or inefficiency.

For the first direction, C-to-Python, I don't see a major advantage of the
implicit

    cdef unicode py_string = c_string  # typing required here

over the explicit

    py_string = c_string.decode('utf-8')  # note: no typing here

There is only one case where it's a bit simpler:

    py_string = c_string[:length]  # no typing, auto-coercion

in contrast to

    py_string = c_string[:length].decode('utf-8')

Anyway, it's just a couple of characters difference, which are best hidden
in an explicit "conversion + validation" function anyway. Auto-coercion of
C strings will always be more inefficient and error prone than users should
be asked to bear, and all we could add would only be the unidirectional
conversion part, not any validation or whatever user code has to do in
addition.

The situation is entirely different for C++ strings. They have an efficient
two-way auto-coercion and safely copy their content on creation. In their
case, auto-coercion would basically behave like

    from __future__ import unicode_literals

but for string coercion. I have no objections against that.
I think it just needs implementing and then testing against a couple of
real, existing code bases to see what the real-world tradeoff is then. It's
just a matter of whether a user needs to write "" or "" in the right places.

All of that being said, the proposal sounds like it's actually two:

1) specify an implicit encoding for coercion between C++ strings and Python
unicode strings, and

2) automatically coerce between C++ strings and Python unicode strings by
default.

1) means that

    cdef libcpp.string cs1 = ..., cs2
    py_string = cs1
    cs2 = py_string

would auto-decode and -encode the string, 2) means that

    cdef libcpp.string cs1 = ..., cs2
    py_string = cs1
    cs2 = py_string

would do it (including any implicit coercions to Python objects).

If 2) is desirable at all, I think it makes sense to fold that into two
separate directives, as many users will be better off without the second
one.

There's also the question whether you want coercion to and from "unicode"
or to and from "str". Getting the latter right wouldn't be easy, most likely
neither for us nor for users who want to apply it to their code. However,
given that the only use case for that would be Py2 backwards compatibility,
waiting a couple of years longer should nicely solve this problem for us.
No need to burden the compiler with it now.

Stefan

From markflorisson88 at gmail.com Thu Feb 14 18:32:19 2013
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 14 Feb 2013 17:32:19 +0000
Subject: [Cython] Possible bug when using cython -Wextra
In-Reply-To:
References: <1737486.SGKB3DSTKz@pcsk4> <51138BF3.6030606@behnel.de> <1395070.2WGoObUNKK@pcsk4> <5113ED03.5060307@behnel.de>
Message-ID:

On 14 February 2013 05:08, Robert Bradshaw wrote:
> On Thu, Feb 7, 2013 at 10:05 AM, Stefan Behnel wrote:
>> [...]
>>
>> Or maybe we should just drop the "unused variable" warning for loop
>> variables as they actually do something and serve a purpose, even if they
>> are never referenced.
>
> +1 to this too. (Not yet done.)

Yeah, I think that's the sanest thing. I already implemented this in Numba,
which bases its control flow on Cython's (because it's awesome, thanks to
Vitja :)). It simply adds a flag to NameAssignment which is set for the
ForNode's target variable.
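The rule the thread converges on (do not report a for-loop target as an unused variable, and ignore `_`) can be sketched with Python's standard `ast` module. This is only an illustration of the idea, not Cython's or Numba's control-flow code: it compares names module-wide instead of tracking per-scope NameAssignment objects with a flag, and `unused_variables` is a hypothetical helper.

```python
import ast


def unused_variables(source):
    """Return names that are assigned but never read, sorted.

    Names bound by a for-loop header and the name "_" are exempt.
    Simplification: scopes are ignored and names are compared module-wide.
    """
    tree = ast.parse(source)
    stored, loaded, loop_targets = set(), set(), set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.For, ast.AsyncFor)):
            # Record every name bound by the loop header, e.g. i or (k, v).
            for target in ast.walk(node.target):
                if isinstance(target, ast.Name):
                    loop_targets.add(target.id)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                stored.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
    return sorted(stored - loaded - loop_targets - {"_"})


# Python spelling of the thread's example (Cython's "for i from ..." syntax
# is not valid Python, so range() is used instead).
sample = """
def test():
    for i in range(10):
        print("foo")
    for _ in range(3):
        pass
    x = 1
"""
print(unused_variables(sample))  # ['x']
```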

From stefan_ml at behnel.de Fri Feb 15 08:53:19 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 15 Feb 2013 08:53:19 +0100
Subject: [Cython] Fwd: A new webpage promoting Compiler technology for CPython
In-Reply-To: <3E96A7DD-C8A2-47FB-89C4-D18EB7AEF018@gmail.com>
References: <3E96A7DD-C8A2-47FB-89C4-D18EB7AEF018@gmail.com>
Message-ID: <511DE96F.4010109@behnel.de>

This just came through python-dev:

-------- Original-Message --------
Subject: A new webpage promoting Compiler technology for CPython
Date: Fri, 15 Feb 2013 01:11:12 -0600
From: Travis Oliphant

Hey all,

With Numba and Blaze we have been doing a lot of work on what essentially
is compiler technology and realizing more and more that we are treading on
ground that has been plowed before with many other projects. So, we wanted
to create a web-site and perhaps even a mailing list or forum where people
could coordinate and communicate about compiler projects, compiler tools,
and ways to share efforts and ideas.

The website is: http://compilers.pydata.org/

This page is specifically for Compiler projects that either integrate with
or work directly with the CPython run-time which is why PyPy is not
presently listed. The PyPy project is a great project but we just felt that
we wanted to explicitly create a collection of links to compilation
projects that are accessible from CPython which are likely less well known.

But that is just where we started from. The website is intended to be a
community website constructed from a github repository. So, we welcome
pull requests from anyone who would like to see the website updated to
reflect their related project. Jon Riehl (Mython, PyFront, ROFL, and many
other interesting projects) and Stephen Diehl (Blaze) and I will be
moderating the pull requests to begin with.
But, we welcome others with similar interests to participate in that effort
of moderation.

The github repository is here: https://github.com/pydata/compilers-webpage

This is intended to be a community website for information spreading, and
so we welcome any and all contributions.

Thank you,

Travis Oliphant

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

From sturla at molden.no Mon Feb 18 19:32:40 2013
From: sturla at molden.no (Sturla Molden)
Date: Mon, 18 Feb 2013 19:32:40 +0100
Subject: [Cython] PR on refcounting memoryview buffers
Message-ID: <512273C8.4000005@molden.no>

As Stefan suggested, I have posted a PR for a better fix for the issue when
MinGW for some reason emits the symbol "__sync_fetch_and_add_4" instead of
generating atomic opcode for the __sync_fetch_and_add builtin.

The PR is here: https://github.com/cython/cython/pull/185

The discussion probably belongs on this list instead of cython-users:

The problem this addresses is when GCC does not use atomic builtins and
emits __sync_fetch_and_add_4 and __sync_fetch_and_sub_4 when Cython is
internally refcounting memoryview buffers. For some reason it can even
happen on x86 and amd64.

My PR undoes Mark's quick fix that always uses PyThread_acquire_lock on
MinGW. PyThread_acquire_lock uses a kernel object (semaphore) on Windows and
is not very efficient. I want slicing memoryviews to be fast, and that means
PyThread_acquire_lock must go. My PR uses the Windows API atomic function
InterlockedAdd to implement the semantics of __sync_fetch_and_add_4 and
__sync_fetch_and_sub_4 instead of using a Python lock.

Usually MinGW is configured to compile GNU atomic builtins correctly. I have
yet to see a case where it is not. But obviously one user (JF Gallant) has
encountered it.
I don't think it is a MinGW-specific problem, but so far it has only been seen on MinGW, and the fix is MinGW-specific (well, it should work on Cygwin too). When MinGW does compile the atomic builtins natively, the code simply uses them, so it incurs no speed penalty on well-behaved MinGW builds. I took the liberty of using the GNU extensions __inline__ and __attribute__((always_inline)). They make sure the functions are always inlined, so they behave like macros. The rationale is that this is GCC-specific code, so we can assume GNU extensions are available. If we take them away the code should still work, but we would have no guarantee that the functions are inlined. I did not use macros because the calls to __sync_fetch_and_add_4 and __sync_fetch_and_sub_4 are presumably emitted by GCC after the preprocessing step, which would require them to be real functions rather than macros. (I have no way of finding out, since I cannot test for it.) Regarding Linux and OSX: failure of GCC to use atomic builtins could also happen on other GCC builds. I don't think it is a MinGW-only issue; it's probably due to how the GCC build was configured. So as a safeguard we should have this for other OSes too. http://developer.apple.com/library/ios/#DOCUMENTATION/System/Conceptual/ManPages_iPhoneOS/man3/OSAtomicAdd32.3.html We probably just need code similar to what I wrote for MinGW. I can write the code, but I don't have a Mac on which to test it. Also, we should use OSAtomic* on clang/LLVM, which is now the platform C compiler on OSX. This will avoid PyThread_acquire_lock being the common synchronization mechanism for refcounting memoryview buffers on OSX. On Linux I am not sure what to suggest if GCC fails to use atomic builtins. I can handcode inline assembly for x86/amd64. I could also use pthreads and pth threads locks. But we could also assume that it never happens and just let the linker fail on __sync_fetch_and_add_4.
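For readers unfamiliar with the mechanism under discussion: a shared memoryview buffer carries an acquisition count that is bumped for each new slice and dropped when a slice goes away. Fetch-and-add semantics (the builtin returns the value *before* the update) let the releasing side detect that it dropped the last reference. The sketch below illustrates that pattern with the GCC/Clang __sync builtins. It is not Cython's generated code; the struct and function names (buffer_view, view_acquire, view_release) are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for a buffer shared by several memoryview slices. */
typedef struct {
    int acquisition_count;  /* number of slices currently sharing the buffer */
    int released;           /* set once the last slice has let go */
} buffer_view;

/* __sync_fetch_and_add returns the value held *before* the addition. */
static int view_acquire(buffer_view *v)
{
    return __sync_fetch_and_add(&v->acquisition_count, 1);
}

/* Returns 1 if this call dropped the last reference, 0 otherwise. */
static int view_release(buffer_view *v)
{
    int old = __sync_fetch_and_sub(&v->acquisition_count, 1);
    assert(old > 0);        /* releasing an unacquired buffer is a bug */
    if (old == 1) {
        v->released = 1;    /* a real implementation would free the buffer here */
        return 1;
    }
    return 0;
}
```

On a 4-byte int, these two builtins are exactly the ones that turn into references to __sync_fetch_and_add_4 and __sync_fetch_and_sub_4 when GCC cannot inline them for the target, which is the link failure described in this message.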
Sturla From sturla at molden.no Wed Feb 20 11:55:51 2013 From: sturla at molden.no (Sturla Molden) Date: Wed, 20 Feb 2013 11:55:51 +0100 Subject: [Cython] PR on refcounting memoryview buffers In-Reply-To: <512273C8.4000005@molden.no> References: <512273C8.4000005@molden.no> Message-ID: <15C80BD0-302E-4576-ACF3-C0FFD700569B@molden.no> On 18 Feb 2013, at 19:32, Sturla Molden wrote: > The problem this addresses is when GCC does not use atomic builtins and emits __sync_fetch_and_add_4 and __sync_fetch_and_sub_4 when Cython is internally refcounting memoryview buffers. For some reason it can even happen on x86 and amd64. > Specifically, atomic builtins are not used when compiling for i386, which is MinGW's default target architecture (unless we specify a different -march). GCC will always encounter this problem when targeting i386. Thus the correct fix is to use a fallback when GCC is targeting i386, not when GCC is targeting MS Windows. So I am closing this PR. But Mark's fix must be corrected, because it does not really address the problem (which is i386, not MinGW)!
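Sturla's conclusion, that the fallback should be keyed on the target's capabilities rather than on the operating system, can be expressed as a compile-time test. The sketch below assumes GCC 4.3+ or Clang, which predefine __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 when the target can perform 4-byte atomic operations inline (it should be absent for -march=i386). The fallback uses a pthreads mutex, one of the options mentioned earlier in the thread, and therefore assumes pthreads is available. The helper name fetch_add_int is hypothetical; this is not the code in the PR.

```c
#include <pthread.h>

#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
/* The target (e.g. i486 or later, x86-64) supports 4-byte atomics inline,
   so the builtin compiles to atomic instructions, not a library call. */
#define FETCH_ADD_IS_LOCK_FREE 1

static int fetch_add_int(int *p, int delta)
{
    return __sync_fetch_and_add(p, delta);
}
#else
/* Assumed fallback for targets such as -march=i386: serialize with a
   mutex instead of referencing __sync_fetch_and_add_4 at link time. */
#define FETCH_ADD_IS_LOCK_FREE 0

static pthread_mutex_t fetch_add_lock = PTHREAD_MUTEX_INITIALIZER;

static int fetch_add_int(int *p, int delta)
{
    int old;
    pthread_mutex_lock(&fetch_add_lock);
    old = *p;
    *p += delta;
    pthread_mutex_unlock(&fetch_add_lock);
    return old;
}
#endif
```

Both branches keep the same fetch-and-add contract (the previous value is returned), so callers do not need to know which one was compiled in.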
Sturla From stefan_ml at behnel.de Thu Feb 21 07:46:35 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 21 Feb 2013 07:46:35 +0100 Subject: [Cython] Sage build broken Message-ID: <5125C2CB.5000603@behnel.de> Hi, I just noticed that the Sage build is broken: """ gcc -pthread -shared -L/jenkins/sage/sage-5.2/local/lib build/temp.linux-x86_64-2.7/sage/rings/polynomial/polydict.o -L/jenkins/sage/sage-5.2/local/lib -L/release/merger/sage-5.2/local/lib -lcsage -lstdc++ -lntl -lpython2.7 -o build/lib.linux-x86_64-2.7/sage/rings/polynomial/polydict.so /usr/bin/ld: build/temp.linux-x86_64-2.7/sage/rings/polynomial/polydict.o: relocation R_X86_64_PC32 against `__Pyx_PyDict_IterItems' can not be used when making a shared object; recompile with -fPIC /usr/bin/ld: final link failed: Bad value collect2: ld returned 1 exit status command 'gcc' failed with exit status 1 """ Looks like a problem in Sage to me, the gcc command really lacks the -fPIC here. Stefan From stefan_ml at behnel.de Thu Feb 21 21:48:12 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 21 Feb 2013 21:48:12 +0100 Subject: [Cython] Sage build broken In-Reply-To: <5125C2CB.5000603@behnel.de> References: <5125C2CB.5000603@behnel.de> Message-ID: <5126880C.3090903@behnel.de> Stefan Behnel, 21.02.2013 07:46: > I just noticed that the Sage build is broken: > > """ > gcc -pthread -shared -L/jenkins/sage/sage-5.2/local/lib > build/temp.linux-x86_64-2.7/sage/rings/polynomial/polydict.o > -L/jenkins/sage/sage-5.2/local/lib -L/release/merger/sage-5.2/local/lib > -lcsage -lstdc++ -lntl -lpython2.7 -o > build/lib.linux-x86_64-2.7/sage/rings/polynomial/polydict.so > > /usr/bin/ld: build/temp.linux-x86_64-2.7/sage/rings/polynomial/polydict.o: > relocation R_X86_64_PC32 against `__Pyx_PyDict_IterItems' can not be used > when making a shared object; recompile with -fPIC > > /usr/bin/ld: final link failed: Bad value > collect2: ld returned 1 exit status > command 'gcc' failed with exit status 1 > """ > > 
Looks like a problem in Sage to me, the gcc command really lacks the -fPIC > here. Sorry, my bad. I had a typo in a utility code section name, which prevented the actual implementation of that function from appearing in the C code. No idea what makes gcc generate that misleading error message above, though. Stefan From stefan_ml at behnel.de Thu Feb 21 22:37:13 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 21 Feb 2013 22:37:13 +0100 Subject: [Cython] [cython-users] To add datetime.pxd to cython.cpython? In-Reply-To: <94587c61-6e24-4627-b328-d079c69c2334@googlegroups.com> References: <19694bda-6d20-49ea-87bd-503dfc16dedb@googlegroups.com> <51262838.7040402@behnel.de> <94587c61-6e24-4627-b328-d079c69c2334@googlegroups.com> Message-ID: <51269389.2060400@behnel.de> Hi, I think this discussion is actually better suited for the cython-devel mailing list. We should move it over there. Zaur Shibzukhov, 21.02.2013 20:59: > On Thursday, 21 February 2013, 16:59:20 UTC+3, Stefan Behnel wrote: >> Zaur Shibzukhov, 21.02.2013 11:25: >>> Last time I actively used datetime module. Because I needed fast >>> creation >>> of date/time/datetime instances I wrote datetime.pxd. It contains much >>> of >>> datetime API from datetime.h + two extended version for time/datetime >>> creation. Does it make sense to include datetime.pxd in cython.cpython? >> >> Given that datetime.h is actually part of the header files that CPython >> installs, it makes total sense to me to include it. Please provide a pull >> request on github for it. > > OK. I will create pull request with datetime.pxd + tests Great. >> However, I don't know what you mean by "extended version for time/datetime >> creation". Could you show us the code for that first? >> > Datetime.h from cpython contains factory functions for creation > time/datetime without timezone info.
> But actually datetime module contains public definition of factory > functions for creation time/date with timezone info, which are not in > cpython's datetime.h. > I could create datetime_ex.h for these functions in order to include them > in datetime.pxd. The problem: how to adopt datetime_ex.h to Cython... > > Current datetime.pxd looks like: > [...] I was more interested in the parts that are not in the public header file. Could you list those? Letting Cython generate those definitions isn't really all that much of a problem. We already do this for the stdlib array module, which doesn't have a public header file at all. Stefan From szport at gmail.com Fri Feb 22 08:01:06 2013 From: szport at gmail.com (ZS) Date: Fri, 22 Feb 2013 10:01:06 +0300 Subject: [Cython] To Add datetime.pxd to cython.cpython Message-ID: Extended part is in datetime_ex.h:

#include "datetime.h"

#define PyDateTime_FromDateAndTimeEx(year, month, day, hour, min, sec, usec, tzinfo) \
    PyDateTimeAPI->DateTime_FromDateAndTime(year, month, day, hour, \
        min, sec, usec, tzinfo, PyDateTimeAPI->DateTimeType)

#define PyTime_FromTimeEx(hour, minute, second, usecond, tzinfo) \
    PyDateTimeAPI->Time_FromTime(hour, minute, second, usecond, \
        tzinfo, PyDateTimeAPI->TimeType)

These macros allow creating datetime/time objects with tzinfo. Of course we could do:

    t = PyTime_FromTime(........)
    t = t.replace(tzinfo)

in absence of that. Zaur Shibzukhov From szport at gmail.com Fri Feb 22 08:38:56 2013 From: szport at gmail.com (ZS) Date: Fri, 22 Feb 2013 10:38:56 +0300 Subject: [Cython] To Add datetime.pxd to cython.cpython In-Reply-To: References: Message-ID: >These macros allow creating datetime/time objects with tzinfo. >Of course we could do: > > t = PyTime_FromTime(........)
> t = t.replace(tzinfo) Sorry, the last line should be: t = t.replace(tzinfo=tzinfo) Zaur Shibzukhov From robertwb at gmail.com Fri Feb 22 09:03:47 2013 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 22 Feb 2013 00:03:47 -0800 Subject: [Cython] To Add datetime.pxd to cython.cpython In-Reply-To: References: Message-ID: These could be provided as inline functions in the pxd rather than adding another hack like we did for array. On Thu, Feb 21, 2013 at 11:01 PM, ZS wrote: > Extended part is in datetime_ex.h: > > #include "datetime.h" > > #define PyDateTime_FromDateAndTimeEx(year, month, day, hour, min, sec, > usec, tzinfo) \ > PyDateTimeAPI->DateTime_FromDateAndTime(year, month, day, hour, \ > min, sec, usec, tzinfo, PyDateTimeAPI->DateTimeType) > > #define PyTime_FromTimeEx(hour, minute, second, usecond, tzinfo) \ > PyDateTimeAPI->Time_FromTime(hour, minute, second, usecond, \ > tzinfo, PyDateTimeAPI->TimeType) > > These macros allow creating datetime/time objects with tzinfo. > Of course we could do: > > t = PyTime_FromTime(........) > t = t.replace(tzinfo) > > in absence of that. > > > Zaur Shibzukhov From stefan_ml at behnel.de Sun Feb 24 16:58:32 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 24 Feb 2013 16:58:32 +0100 Subject: [Cython] [cython-users] freelist benchmarks In-Reply-To: References: <512A1B20.4050707@behnel.de> Message-ID: <512A38A8.3040905@behnel.de> mark florisson, 24.02.2013 15:50: > On 24 February 2013 13:52, Stefan Behnel wrote: >> for those who haven't notice my other e-mail, I implemented a new extension >> type decorator "@cython.freelist(N)" that replaces the normal object >> creation and deallocation with a freelist of N recently freed objects.
>> Currently, it's only supported for types that do not have a base class (and >> lifting that restriction is not all that easy). > > Very cool! I've been wanting that for a while now :) So did I. > What's the hurdle with base classes? The problem is that the current way types are being instantiated is recursive. The top-most base class calls tp_alloc() and then each step in the hierarchy adds its bit of initialisation. If you want to introduce a freelist into this scheme, then it's still the top-most class that does the allocation, so it would need to manage all freelists of all of its children in order to return the right object struct for a given instantiation request. This cannot really be done at compile time. Imagine a subtype in a different module, for example, for which the code requests a freelist. The compilation of the base type wouldn't even know that it's supposed to manage a freelist at all, only the subtype knows it. There are a couple of ways to deal with this. One is to replicate the freelist in the base type for all subtypes that it finds at runtime. That might actually be the easiest way to do it, but it requires a bit of memory management in order to add a new freelist when a new subtype is found at runtime. It also means that we'd have to find the right freelist before we can get an object from it (or not, if it's empty), which would likely be an operation that's linear with the number of subtypes. And the freelist set would better be bounded in size to prevent users from flooding it with lots and lots of subtypes. Another option would be to split the initialisation up into two functions, one that allocates *and* initialises the instance and one that *only* initialises it. That would allow each hierarchy level to manage its own freelists and to take its own decision about where to get the object from. This approach comes with a couple of tricky details, as CPython doesn't provide support for this. 
So we'd need to find a way to handle type hierarchies that are implemented across modules. Maybe the best approach would be to let the base type manage everything and just statically limit the maximum number of subtypes for which it provides separate freelists, at a first come, first serve basis. And the freelist selection could be based on the object struct size (tp_basicsize) instead of the specific type. As long as we don't support inheriting from variable size objects (like tuple/bytes), that would cut down the problem quite nicely. I think I should just give it a try at some point. Stefan From markflorisson88 at gmail.com Sun Feb 24 18:56:06 2013 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 24 Feb 2013 17:56:06 +0000 Subject: [Cython] [cython-users] freelist benchmarks In-Reply-To: References: <512A1B20.4050707@behnel.de> <512A38A8.3040905@behnel.de> Message-ID: On 24 February 2013 17:50, mark florisson wrote: > On 24 February 2013 15:58, Stefan Behnel wrote: >> mark florisson, 24.02.2013 15:50: >>> On 24 February 2013 13:52, Stefan Behnel wrote: >>>> for those who haven't notice my other e-mail, I implemented a new extension >>>> type decorator "@cython.freelist(N)" that replaces the normal object >>>> creation and deallocation with a freelist of N recently freed objects. >>>> Currently, it's only supported for types that do not have a base class (and >>>> lifting that restriction is not all that easy). >>> >>> Very cool! I've been wanting that for a while now :) >> >> So did I. >> >> >>> What's the hurdle with base classes? >> >> The problem is that the current way types are being instantiated is >> recursive. The top-most base class calls tp_alloc() and then each step in >> the hierarchy adds its bit of initialisation. 
If you want to introduce a >> freelist into this scheme, then it's still the top-most class that does the >> allocation, so it would need to manage all freelists of all of its children >> in order to return the right object struct for a given instantiation request. >> >> This cannot really be done at compile time. Imagine a subtype in a >> different module, for example, for which the code requests a freelist. The >> compilation of the base type wouldn't even know that it's supposed to >> manage a freelist at all, only the subtype knows it. >> >> There are a couple of ways to deal with this. One is to replicate the >> freelist in the base type for all subtypes that it finds at runtime. That >> might actually be the easiest way to do it, but it requires a bit of memory >> management in order to add a new freelist when a new subtype is found at >> runtime. It also means that we'd have to find the right freelist before we >> can get an object from it (or not, if it's empty), which would likely be an >> operation that's linear with the number of subtypes. And the freelist set >> would better be bounded in size to prevent users from flooding it with lots >> and lots of subtypes. >> >> Another option would be to split the initialisation up into two functions, >> one that allocates *and* initialises the instance and one that *only* >> initialises it. That would allow each hierarchy level to manage its own >> freelists and to take its own decision about where to get the object from. >> This approach comes with a couple of tricky details, as CPython doesn't >> provide support for this. So we'd need to find a way to handle type >> hierarchies that are implemented across modules. > > Thanks for the explanation Stefan, this is the one I was thinking of, > but I suppose it'd need an extra pointer to the pure init function in > the type. 
Hm, since extension types don't do multiple inheritance (and excluding Python subclasses), couldn't you import those init functions across modules through capsules? >> Maybe the best approach would be to let the base type manage everything and >> just statically limit the maximum number of subtypes for which it provides >> separate freelists, at a first come, first serve basis. And the freelist >> selection could be based on the object struct size (tp_basicsize) instead >> of the specific type. As long as we don't support inheriting from variable >> size objects (like tuple/bytes), that would cut down the problem quite >> nicely. I think I should just give it a try at some point. > > What about using pyextensible type from SEP 200 and using a custom > freelist entry on the type? > >> Stefan >> >> -- >> >> --- >> You received this message because you are subscribed to the Google Groups "cython-users" group. >> To unsubscribe from this group and stop receiving emails from it, send an email to cython-users+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. >> >> From markflorisson88 at gmail.com Sun Feb 24 18:50:47 2013 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 24 Feb 2013 17:50:47 +0000 Subject: [Cython] [cython-users] freelist benchmarks In-Reply-To: <512A38A8.3040905@behnel.de> References: <512A1B20.4050707@behnel.de> <512A38A8.3040905@behnel.de> Message-ID: On 24 February 2013 15:58, Stefan Behnel wrote: > mark florisson, 24.02.2013 15:50: >> On 24 February 2013 13:52, Stefan Behnel wrote: >>> for those who haven't notice my other e-mail, I implemented a new extension >>> type decorator "@cython.freelist(N)" that replaces the normal object >>> creation and deallocation with a freelist of N recently freed objects. >>> Currently, it's only supported for types that do not have a base class (and >>> lifting that restriction is not all that easy). >> >> Very cool! 
I've been wanting that for a while now :) > > So did I. > > >> What's the hurdle with base classes? > > The problem is that the current way types are being instantiated is > recursive. The top-most base class calls tp_alloc() and then each step in > the hierarchy adds its bit of initialisation. If you want to introduce a > freelist into this scheme, then it's still the top-most class that does the > allocation, so it would need to manage all freelists of all of its children > in order to return the right object struct for a given instantiation request. > > This cannot really be done at compile time. Imagine a subtype in a > different module, for example, for which the code requests a freelist. The > compilation of the base type wouldn't even know that it's supposed to > manage a freelist at all, only the subtype knows it. > > There are a couple of ways to deal with this. One is to replicate the > freelist in the base type for all subtypes that it finds at runtime. That > might actually be the easiest way to do it, but it requires a bit of memory > management in order to add a new freelist when a new subtype is found at > runtime. It also means that we'd have to find the right freelist before we > can get an object from it (or not, if it's empty), which would likely be an > operation that's linear with the number of subtypes. And the freelist set > would better be bounded in size to prevent users from flooding it with lots > and lots of subtypes. > > Another option would be to split the initialisation up into two functions, > one that allocates *and* initialises the instance and one that *only* > initialises it. That would allow each hierarchy level to manage its own > freelists and to take its own decision about where to get the object from. > This approach comes with a couple of tricky details, as CPython doesn't > provide support for this. So we'd need to find a way to handle type > hierarchies that are implemented across modules. 
Thanks for the explanation Stefan, this is the one I was thinking of, but I suppose it'd need an extra pointer to the pure init function in the type. > Maybe the best approach would be to let the base type manage everything and > just statically limit the maximum number of subtypes for which it provides > separate freelists, at a first come, first serve basis. And the freelist > selection could be based on the object struct size (tp_basicsize) instead > of the specific type. As long as we don't support inheriting from variable > size objects (like tuple/bytes), that would cut down the problem quite > nicely. I think I should just give it a try at some point. What about using pyextensible type from SEP 200 and using a custom freelist entry on the type? > Stefan From stefan_ml at behnel.de Sun Feb 24 22:45:13 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 24 Feb 2013 22:45:13 +0100 Subject: [Cython] [cython-users] freelist benchmarks In-Reply-To: References: <512A1B20.4050707@behnel.de> <512A38A8.3040905@behnel.de> Message-ID: <512A89E9.2070104@behnel.de> mark florisson, 24.02.2013 18:56: > On 24 February 2013 17:50, mark florisson wrote: >> On 24 February 2013 15:58, Stefan Behnel wrote: >>> mark florisson, 24.02.2013 15:50: >>>> On 24 February 2013 13:52, Stefan Behnel wrote: >>>>> for those who haven't notice my other e-mail, I implemented a new extension >>>>> type decorator "@cython.freelist(N)" that replaces the normal object >>>>> creation and deallocation with a freelist of N recently freed objects. >>>>> Currently, it's only supported for types that do not have a base class (and >>>>> lifting that restriction is not all that easy).
>>>> >>>> Very cool! I've been wanting that for a while now :) >>> >>> So did I. >>> >>> >>>> What's the hurdle with base classes? >>> >>> The problem is that the current way types are being instantiated is >>> recursive. The top-most base class calls tp_alloc() and then each step in >>> the hierarchy adds its bit of initialisation. If you want to introduce a >>> freelist into this scheme, then it's still the top-most class that does the >>> allocation, so it would need to manage all freelists of all of its children >>> in order to return the right object struct for a given instantiation request. >>> >>> This cannot really be done at compile time. Imagine a subtype in a >>> different module, for example, for which the code requests a freelist. The >>> compilation of the base type wouldn't even know that it's supposed to >>> manage a freelist at all, only the subtype knows it. >>> >>> There are a couple of ways to deal with this. One is to replicate the >>> freelist in the base type for all subtypes that it finds at runtime. That >>> might actually be the easiest way to do it, but it requires a bit of memory >>> management in order to add a new freelist when a new subtype is found at >>> runtime. It also means that we'd have to find the right freelist before we >>> can get an object from it (or not, if it's empty), which would likely be an >>> operation that's linear with the number of subtypes. And the freelist set >>> would better be bounded in size to prevent users from flooding it with lots >>> and lots of subtypes. >>> >>> Another option would be to split the initialisation up into two functions, >>> one that allocates *and* initialises the instance and one that *only* >>> initialises it. That would allow each hierarchy level to manage its own >>> freelists and to take its own decision about where to get the object from. >>> This approach comes with a couple of tricky details, as CPython doesn't >>> provide support for this. 
So we'd need to find a way to handle type >>> hierarchies that are implemented across modules. >> >> Thanks for the explanation Stefan, this is the one I was thinking of, >> but I suppose it'd need an extra pointer to the pure init function in >> the type. > > Hm, since extension types don't do multiple inheritance (and excluding > Python subclasses), couldn't you import those init functions across > modules through capsules? Well, yes, I suppose you could. However, that's quite some overhead. I think it's way easier to just provision a couple of freelists in advance and assign them to different subtype sizes as they come in. Even in somewhat large hierarchies, I doubt that the object structs will have all that many different sizes. Remember, the size only changes when you add cdef attributes, and only once when you start adding cdef methods. And even structs that appear in different subtrees of the hierarchy and that carry different attributes may end up having the same struct size due to layout coincidences. I would expect that even a type hierarchy of, say, 20 types, would have at most some 4-8 different struct sizes. Most of the time, subtypes are there to change behaviour, not state. The only real drawback is that you need to enable the base type to do all that's necessary, which you may not have control over in a few cases. But then again, if it's worth using a freelist on one subtype, it's probably worth using it in general, so it's best to fix the base type in any way. >>> Maybe the best approach would be to let the base type manage everything and >>> just statically limit the maximum number of subtypes for which it provides >>> separate freelists, at a first come, first serve basis. And the freelist >>> selection could be based on the object struct size (tp_basicsize) instead >>> of the specific type. As long as we don't support inheriting from variable >>> size objects (like tuple/bytes), that would cut down the problem quite >>> nicely. 
I think I should just give it a try at some point. I changed the current type pointer check to look at tp_basicsize instead. That made it work for almost all classes in lxml's own Element hierarchy, with only a couple of exceptions in lxml.objectify that have one additional object field. So, just extending the freelist support to use two different lists for different struct sizes instead of just one would make it work for all of lxml already. Taking a look at Sage to see how the situation appears over there would be interesting, I guess. Stefan From roed.math at gmail.com Mon Feb 25 00:00:31 2013 From: roed.math at gmail.com (David Roe) Date: Sun, 24 Feb 2013 16:00:31 -0700 Subject: [Cython] [cython-users] freelist benchmarks In-Reply-To: <512A89E9.2070104@behnel.de> References: <512A1B20.4050707@behnel.de> <512A38A8.3040905@behnel.de> <512A89E9.2070104@behnel.de> Message-ID: I changed the current type pointer check to look at tp_basicsize instead. > That made it work for almost all classes in lxml's own Element hierarchy, > with only a couple of exceptions in lxml.objectify that have one additional > object field. So, just extending the freelist support to use two different > lists for different struct sizes instead of just one would make it work for > all of lxml already. Taking a look at Sage to see how the situation appears > over there would be interesting, I guess. > I found some chains of length 5. This could be shortened to 4 by putting the freelist at the level of Element (which is where you most care about speed of object creation). SageObject -> Element (_parent attribute and cdef methods) -> Vector (_degree) -> FreeModuleElement (_is_mutable) -> FreeModuleElement_generic_dense (_entries) SageObject -> Element (_parent attribute and cdef methods) ->sage.structure.element.Matrix (_nrows) -> sage.matrix.matrix.Matrix (_base_ring) -> Matrix_integer_dense (_entries) This does look cool to have though. 
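The size-keyed scheme Stefan describes (a small, statically bounded set of freelists, each claimed on a first-come, first-served basis by one object size such as tp_basicsize) can be sketched in plain C as follows. This is not Cython's implementation: the names, the two constants, and the use of malloc/free in place of tp_alloc/tp_free are assumptions made to keep the example self-contained. The lookup is a linear scan, which stays cheap as long as a hierarchy only produces a handful of distinct struct sizes.

```c
#include <stdlib.h>

/* Hypothetical limits: how many distinct object sizes get their own
   freelist, and how many freed objects each freelist keeps. */
#define MAX_SIZE_CLASSES 8
#define FREELIST_CAPACITY 8

typedef struct {
    size_t size;                     /* 0 means the slot is still unclaimed */
    int count;                       /* number of cached objects */
    void *items[FREELIST_CAPACITY];  /* recently freed objects of this size */
} size_freelist;

static size_freelist freelists[MAX_SIZE_CLASSES];

/* Linear scan for the freelist serving `size`; the first unclaimed slot is
   assigned to a new size. Returns NULL once every slot is taken by other
   sizes, in which case callers fall back to plain malloc/free. */
static size_freelist *find_freelist(size_t size)
{
    for (int i = 0; i < MAX_SIZE_CLASSES; i++) {
        if (freelists[i].size == size)
            return &freelists[i];
        if (freelists[i].size == 0) {
            freelists[i].size = size;
            return &freelists[i];
        }
    }
    return NULL;
}

/* Reuse a cached object of the same size if there is one. Its memory is
   NOT cleared, so the caller must reinitialise every field. */
static void *obj_alloc(size_t size)
{
    size_freelist *fl = find_freelist(size);
    if (fl != NULL && fl->count > 0)
        return fl->items[--fl->count];
    return malloc(size);
}

/* Cache the object if its freelist has room, otherwise release it. */
static void obj_free(void *obj, size_t size)
{
    size_freelist *fl = find_freelist(size);
    if (fl != NULL && fl->count < FREELIST_CAPACITY) {
        fl->items[fl->count++] = obj;
        return;
    }
    free(obj);
}
```

With both constants at 8, the pointer table holds 8 * 8 entries, i.e. 512 bytes on a 64-bit system, in line with the size estimate given later in the thread.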
David From stefan_ml at behnel.de Mon Feb 25 10:17:25 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 25 Feb 2013 10:17:25 +0100 Subject: [Cython] [cython-users] freelist benchmarks In-Reply-To: References: <512A1B20.4050707@behnel.de> <512A38A8.3040905@behnel.de> <512A89E9.2070104@behnel.de> Message-ID: <512B2C25.7080009@behnel.de> Hi, thanks for looking through it. David Roe, 25.02.2013 00:00: > I changed the current type pointer check to look at tp_basicsize instead. > >> That made it work for almost all classes in lxml's own Element hierarchy, >> with only a couple of exceptions in lxml.objectify that have one additional >> object field. So, just extending the freelist support to use two different >> lists for different struct sizes instead of just one would make it work for >> all of lxml already. Taking a look at Sage to see how the situation appears >> over there would be interesting, I guess. > > I found some chains of length 5. This could be shortened to 4 by putting > the freelist at the level of Element (which is where you most care about > speed of object creation). It's substantially easier to keep it in the top-level base class, though. Otherwise, we'd need a new protocol between inheriting types as I previously described. That adds a *lot* of complexity. > SageObject > -> Element (_parent attribute and cdef methods) > -> Vector (_degree) > -> FreeModuleElement (_is_mutable) > -> FreeModuleElement_generic_dense (_entries) > > SageObject > -> Element (_parent attribute and cdef methods) > ->sage.structure.element.Matrix (_nrows) > -> sage.matrix.matrix.Matrix (_base_ring) > -> Matrix_integer_dense (_entries) Ok, so even for something as large as Sage, we'd apparently end up with just a couple of freelists for a given base type. That really makes it appear reasonable to make that number a compile time constant as well.
I mean, even if you *really* oversize it, all you lose is the static memory for a couple of pointers. On a 64-bit system, if you use a freelist size of 8 objects and provision freelists for 8 differently sized subtypes, that's 8*8*8 bytes in total, or half a KB, statically allocated. Even a hundred times that size shouldn't hurt anyone. Unused subtype freelists really take almost no space and won't hurt performance either. > This does look cool to have though. It definitely is. Stefan From dave.hirschfeld at gmail.com Mon Feb 25 22:56:45 2013 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Mon, 25 Feb 2013 21:56:45 +0000 (UTC) Subject: [Cython] No matching signature with fused memoryview and None default Message-ID: With the following code I get a "No matching signature found" error. Is this a bug?
```
%%cython
cimport cython

ctypedef fused floating:
    cython.double
    cython.float

def nosignature(floating[:] x, floating[:] myarray = None):
    print myarray is None
    return x
```
In [39]: nosignature(ones(1, dtype=np.float32), ones(1, dtype=np.float32)) False Out[39]: In [40]: nosignature(ones(1, dtype=np.float64), ones(1, dtype=np.float64)) False Out[40]: In [41]: nosignature(ones(1, dtype=np.float64)) True Out[41]: In [42]: nosignature(ones(1, dtype=np.float32)) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) in () ----> 1 nosignature(ones(1, dtype=np.float32)) ca9.pyd in ca9.__pyx_fused_cpdef ca9.c:2282)() TypeError: No matching signature found Thanks, Dave From dave.hirschfeld at gmail.com Tue Feb 26 13:47:54 2013 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Tue, 26 Feb 2013 12:47:54 +0000 (UTC) Subject: [Cython] Can't assign memview cast to memview slice Message-ID: The following works: ``` %%cython cimport cython import numpy as np cimport numpy as np def f(double[:,:] arr): cdef double[:] res = np.zeros(2*arr.size, dtype=np.float64) cdef double[:] tmp tmp = &arr[0,0]
res[:arr.size] = tmp return res ``` whereas the following: ``` %%cython cimport cython import numpy as np cimport numpy as np def f(double[:,:] arr): cdef double[:] res = np.zeros(2*arr.size, dtype=np.float64) res[:arr.size] = &arr[0,0] return res ``` ...gives the below error: Error compiling Cython file: ------------------------------------------------------------ ... import numpy as np cimport numpy as np def f(double[:,:] arr): cdef double[:] res = np.zeros(2*arr.size, dtype=np.float64) res[:arr.size] = &arr[0,0] ^ ------------------------------------------------------------ d3ce.pyx:7:21: Cannot assign type 'double[::1]' to 'double' It would be nice if cython could take care of the temporary itself though the workaround is certainly simple enough that it's not a big issue at all. Thanks, Dave From robertwb at gmail.com Tue Feb 26 21:16:52 2013 From: robertwb at gmail.com (Robert Bradshaw) Date: Tue, 26 Feb 2013 12:16:52 -0800 Subject: [Cython] [cython-users] freelist benchmarks In-Reply-To: <512B2C25.7080009@behnel.de> References: <512A1B20.4050707@behnel.de> <512A38A8.3040905@behnel.de> <512A89E9.2070104@behnel.de> <512B2C25.7080009@behnel.de> Message-ID: On Mon, Feb 25, 2013 at 1:17 AM, Stefan Behnel wrote: > Hi, > > thanks for looking through it. > > David Roe, 25.02.2013 00:00: >> I changed the current type pointer check to look at tp_basicsize instead. >> >>> That made it work for almost all classes in lxml's own Element hierarchy, >>> with only a couple of exceptions in lxml.objectify that have one additional >>> object field. So, just extending the freelist support to use two different >>> lists for different struct sizes instead of just one would make it work for >>> all of lxml already. Taking a look at Sage to see how the situation appears >>> over there would be interesting, I guess. >> >> I found some chains of length 5. 
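This could be shortened to 4 by putting >> the freelist at the level of Element (which is where you most care about >> speed of object creation). > > It's substantially easier to keep it in the top-level base class, though. > Otherwise, we'd need a new protocol between inheriting types as I > previously described. That adds a *lot* of complexity. > > >> SageObject >> -> Element (_parent attribute and cdef methods) >> -> Vector (_degree) >> -> FreeModuleElement (_is_mutable) >> -> FreeModuleElement_generic_dense (_entries) >> >> SageObject >> -> Element (_parent attribute and cdef methods) >> ->sage.structure.element.Matrix (_nrows) >> -> sage.matrix.matrix.Matrix (_base_ring) >> -> Matrix_integer_dense (_entries) I don't know that (expensive) matrices are the best example, and often the chains are larger for elements one really cares about.

sage: def base_tree(x): return [] if x is None else [x] + base_tree(x.__base__)
...:

sage: base_tree(Integer)
[<type 'sage.rings.integer.Integer'>, <type 'sage.structure.element.EuclideanDomainElement'>, <type 'sage.structure.element.PrincipalIdealDomainElement'>, <type 'sage.structure.element.DedekindDomainElement'>, <type 'sage.structure.element.IntegralDomainElement'>, <type 'sage.structure.element.CommutativeRingElement'>, <type 'sage.structure.element.RingElement'>, <type 'sage.structure.element.ModuleElement'>, <type 'sage.structure.element.Element'>, <type 'sage.structure.sage_object.SageObject'>, <type 'object'>]

sage: base_tree(RealDoubleElement)
[<type 'sage.rings.real_double.RealDoubleElement'>, <type 'sage.structure.element.FieldElement'>, <type 'sage.structure.element.CommutativeRingElement'>, <type 'sage.structure.element.RingElement'>, <type 'sage.structure.element.ModuleElement'>, <type 'sage.structure.element.Element'>, <type 'sage.structure.sage_object.SageObject'>, <type 'object'>]

sage: base_tree(type(mod(1, 10)))
[<type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>, <type 'sage.rings.finite_rings.integer_mod.IntegerMod_abstract'>, <type 'sage.rings.finite_rings.element_base.FiniteRingElement'>, <type 'sage.structure.element.CommutativeRingElement'>, <type 'sage.structure.element.RingElement'>, <type 'sage.structure.element.ModuleElement'>, <type 'sage.structure.element.Element'>, <type 'sage.structure.sage_object.SageObject'>, <type 'object'>]

> Ok, so even for something as large as Sage, we'd apparently end up with > just a couple of freelists for a given base type. That really makes it > appear reasonable to make that number a compile time constant as well. I > mean, even if you *really* oversize it, all you lose is the static memory > for a couple of pointers. On a 64 bit system, if you use a freelist size of > 8 objects and provision freelists for 8 differently sized subtypes, that's > 8*8*8 bytes in total, or half a KB, statically allocated. Even a hundred > times that size shouldn't hurt anyone. Unused subtype freelists really take > almost no space and won't hurt performance either. Elements in Sage are typically larger than 8 bytes, and our experiments for Integer showed that the benefit (for this class) extended well beyond 8 items. On the other hand lots of elements are so expensive that they don't merit this at all.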
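The recursive `base_tree` above can equally be written as a loop. CPython also exposes each type's `tp_basicsize` to Python as `__basicsize__`, which is the struct size that the tp_basicsize-based freelist check discussed earlier in this thread uses to group subtypes. The sketch below uses plain CPython classes rather than Sage's. `struct_sizes`, `Base` and `Leaf` are hypothetical names. Following `__base__` gives the same chain as `__mro__` only under single inheritance, which is the case for cdef class hierarchies.

```python
import struct

def base_tree(cls):
    """Loop version of the recursive base_tree() shown above."""
    chain = []
    while cls is not None:           # object.__base__ is None, which ends the walk
        chain.append(cls)
        cls = cls.__base__
    return chain

def struct_sizes(cls):
    """Pair each class in the chain with its tp_basicsize (__basicsize__)."""
    return [(c.__name__, c.__basicsize__) for c in base_tree(cls)]

# Hypothetical hierarchy: every __slots__ entry adds one PyObject* to the
# instance struct, much like a cdef class adding an attribute at each level.
class Base:
    __slots__ = ("parent",)

class Leaf(Base):
    __slots__ = ("entries",)

POINTER_SIZE = struct.calcsize("P")  # size of a C pointer on this build
```

For example, `struct_sizes(Leaf)` lists `Leaf`, `Base` and `object`, with `Leaf` exactly one pointer larger than `Base`. A freelist keyed on struct size would therefore keep the two apart.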
I think one thing to keep in mind is that Python's heap is essentially a "freelist" of objects of every size up to 128(?) bytes, so what are we trying to save by putting it at the base type and going up and down the __cinit__/__dealloc__ chain? I suppose we save the zero-ing out of memory and a function call or two, but that's not the expensive part. For our Integer free list, we save going up the __cinit__/__dealloc__ call, initializing a couple of members, setting the vtable pointers, which turns out to be the bulk of the cost. I'd love to see something like this work, if just for final classes. It may require the introduction of new functions to determine exactly how much cleanup/setup we want to do when inserting/removing stuff from the pool rather than giving up the memory completely. >> This does look cool to have though. > > It definitely is. Yes! - Robert From stefan_ml at behnel.de Wed Feb 27 08:24:55 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 27 Feb 2013 08:24:55 +0100 Subject: [Cython] [cython-users] freelist benchmarks In-Reply-To: References: <512A1B20.4050707@behnel.de> <512A38A8.3040905@behnel.de> <512A89E9.2070104@behnel.de> <512B2C25.7080009@behnel.de> Message-ID: <512DB4C7.8000405@behnel.de> Robert Bradshaw, 26.02.2013 21:16: > On Mon, Feb 25, 2013 at 1:17 AM, Stefan Behnel wrote: >> David Roe, 25.02.2013 00:00: >>> I changed the current type pointer check to look at tp_basicsize instead. >>> >>>> That made it work for almost all classes in lxml's own Element hierarchy, >>>> with only a couple of exceptions in lxml.objectify that have one additional >>>> object field. So, just extending the freelist support to use two different >>>> lists for different struct sizes instead of just one would make it work for >>>> all of lxml already. Taking a look at Sage to see how the situation appears >>>> over there would be interesting, I guess. >>> >>> I found some chains of length 5. 
This could be shortened to 4 by putting >>> the freelist at the level of Element (which is where you most care about >>> speed of object creation). >> >> It's substantially easier to keep it in the top-level base class, though. >> Otherwise, we'd need a new protocol between inheriting types as I >> previously described. That add a *lot* of complexity. >> >> >>> SageObject >>> -> Element (_parent attribute and cdef methods) >>> -> Vector (_degree) >>> -> FreeModuleElement (_is_mutable) >>> -> FreeModuleElement_generic_dense (_entries) >>> >>> SageObject >>> -> Element (_parent attribute and cdef methods) >>> ->sage.structure.element.Matrix (_nrows) >>> -> sage.matrix.matrix.Matrix (_base_ring) >>> -> Matrix_integer_dense (_entries) > > I don't know that (expensive) matrices are the best example, and often > the chains are larger for elements one really cares about. > > sage: def base_tree(x): return [] if x is None else [x] + > base_tree(x.__base__) > ...: > > sage: base_tree(Integer) > [, 'sage.structure.element.EuclideanDomainElement'>, 'sage.structure.element.PrincipalIdealDomainElement'>, 'sage.structure.element.DedekindDomainElement'>, 'sage.structure.element.IntegralDomainElement'>, 'sage.structure.element.CommutativeRingElement'>, 'sage.structure.element.RingElement'>, 'sage.structure.element.ModuleElement'>, 'sage.structure.element.Element'>, 'sage.structure.sage_object.SageObject'>, ] > > sage: base_tree(RealDoubleElement) > [, 'sage.structure.element.FieldElement'>, 'sage.structure.element.CommutativeRingElement'>, 'sage.structure.element.RingElement'>, 'sage.structure.element.ModuleElement'>, 'sage.structure.element.Element'>, 'sage.structure.sage_object.SageObject'>, ] > > sage: base_tree(type(mod(1, 10))) > [, 'sage.rings.finite_rings.integer_mod.IntegerMod_abstract'>, 'sage.rings.finite_rings.element_base.FiniteRingElement'>, 'sage.structure.element.CommutativeRingElement'>, 'sage.structure.element.RingElement'>, 
'sage.structure.element.ModuleElement'>, 'sage.structure.element.Element'>, 'sage.structure.sage_object.SageObject'>, ] My original question was if they have differently sized object structs or not. Those that don't would currently go into the same freelist. >> Ok, so even for something as large as Sage, we'd apparently end up with >> just a couple of freelists for a given base type. That really makes it >> appear reasonable to make that number a compile time constant as well. I >> mean, even if you *really* oversize it, all you loose is the static memory >> for a couple of pointers. On a 64 bit system, if you use a freelist size of >> 8 objects and provision freelists for 8 differently sized subtypes, that's >> 8*8*8 bytes in total, or half a KB, statically allocated. Even a hundred >> times that size shouldn't hurt anyone. Unused subtype freelists really take >> almost no space and won't hurt performance either. > > Elements in Sage are typically larger than 8 bytes I wasn't adding up the size of the objects, only of the pointers in the freelists. If the objects end up in the freelist, they'll also be used on the next instantiation, so their size doesn't really matter. > and our > experiments for Integer showed that the benefit (for this class) > extended well beyond 8 items. On the other hand lots of elements are > so expensive that they don't merit this at all. > > I think one thing to keep in mind is that Python's heap is essentially > a "freelist" of objects of every size up to 128(?) bytes, so what are > we trying to save by putting it at the base type and going up and down > the __cinit__/__dealloc__ chain? Allocation still seems to take its time. > I suppose we save the zero-ing out of > memory and a function call or two, but that's not the expensive part. And I noticed now that we still have to do the zeroing out in order to initialise C typed attributes. 
And that it's actually not trivial to figure out in what cases we can safely put a subtype into the freelist. There are a couple of special cases in CPython's object allocation, e.g. for heap types. Their instances own a reference to the type, which is not the case for static types. > For our Integer free list, we save going up the __cinit__/__dealloc__ > call, initializing a couple of members, setting the vtable pointers, > which turns out to be the bulk of the cost. And your hierarchy examples above show that they are implemented across multiple modules. I can imagine that being a major problem, as the C compiler can't inline the tp_new calls in that case, can't really reorder the struct field assignments, etc. I imagine that the freelist could leave the initial vtable untouched in some cases, but that would mean that we need a freelist per actual type, instead of object struct size. Now, if we move the freelist handling into each subtype (as you and Mark proposed already), we'd get some of this for free, because the objects that get freed are already properly set up for the specific type, including vtable etc. All that remains to be done is to zero out the (known) C typed attributes, set the (known) object attributes to None, and call any __cinit__() methods in the super types to do the rest for us. We might have to do it in the right order, i.e. initialise some attributes, call the corresponding __cinit__() method, initialise some more attributes, ... So, basically, we'd manually inline the bottom-up aggregation of all tp_new functions into the current one, skipping those operations that we don't consider necessary in the freelist case, such as the vtable setup. Now, the only remaining issue is how to get at the __cinit__() functions if the base type isn't in the same module, but as Mark proposed, that could still be done if we require it to be exported in a C-API (and assume that it doesn't exist if not?).
Would be better to know it at compile time, though... Stefan From robertwb at gmail.com Wed Feb 27 09:54:24 2013 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 27 Feb 2013 00:54:24 -0800 Subject: [Cython] [cython-users] freelist benchmarks In-Reply-To: <512DB4C7.8000405@behnel.de> References: <512A1B20.4050707@behnel.de> <512A38A8.3040905@behnel.de> <512A89E9.2070104@behnel.de> <512B2C25.7080009@behnel.de> <512DB4C7.8000405@behnel.de> Message-ID: On Tue, Feb 26, 2013 at 11:24 PM, Stefan Behnel wrote: > Robert Bradshaw, 26.02.2013 21:16: >> On Mon, Feb 25, 2013 at 1:17 AM, Stefan Behnel wrote: >>> David Roe, 25.02.2013 00:00: >>>> I changed the current type pointer check to look at tp_basicsize instead. >>>> >>>>> That made it work for almost all classes in lxml's own Element hierarchy, >>>>> with only a couple of exceptions in lxml.objectify that have one additional >>>>> object field. So, just extending the freelist support to use two different >>>>> lists for different struct sizes instead of just one would make it work for >>>>> all of lxml already. Taking a look at Sage to see how the situation appears >>>>> over there would be interesting, I guess. >>>> >>>> I found some chains of length 5. This could be shortened to 4 by putting >>>> the freelist at the level of Element (which is where you most care about >>>> speed of object creation). >>> >>> It's substantially easier to keep it in the top-level base class, though. >>> Otherwise, we'd need a new protocol between inheriting types as I >>> previously described. That add a *lot* of complexity. 
>>> >>> >>>> SageObject >>>> -> Element (_parent attribute and cdef methods) >>>> -> Vector (_degree) >>>> -> FreeModuleElement (_is_mutable) >>>> -> FreeModuleElement_generic_dense (_entries) >>>> >>>> SageObject >>>> -> Element (_parent attribute and cdef methods) >>>> ->sage.structure.element.Matrix (_nrows) >>>> -> sage.matrix.matrix.Matrix (_base_ring) >>>> -> Matrix_integer_dense (_entries) >> >> I don't know that (expensive) matrices are the best example, and often >> the chains are larger for elements one really cares about. >> >> sage: def base_tree(x): return [] if x is None else [x] + >> base_tree(x.__base__) >> ...: >> >> sage: base_tree(Integer) >> [, > 'sage.structure.element.EuclideanDomainElement'>, > 'sage.structure.element.PrincipalIdealDomainElement'>, > 'sage.structure.element.DedekindDomainElement'>, > 'sage.structure.element.IntegralDomainElement'>, > 'sage.structure.element.CommutativeRingElement'>, > 'sage.structure.element.RingElement'>, > 'sage.structure.element.ModuleElement'>, > 'sage.structure.element.Element'>, > 'sage.structure.sage_object.SageObject'>, ] >> >> sage: base_tree(RealDoubleElement) >> [, > 'sage.structure.element.FieldElement'>, > 'sage.structure.element.CommutativeRingElement'>, > 'sage.structure.element.RingElement'>, > 'sage.structure.element.ModuleElement'>, > 'sage.structure.element.Element'>, > 'sage.structure.sage_object.SageObject'>, ] >> >> sage: base_tree(type(mod(1, 10))) >> [, > 'sage.rings.finite_rings.integer_mod.IntegerMod_abstract'>, > 'sage.rings.finite_rings.element_base.FiniteRingElement'>, > 'sage.structure.element.CommutativeRingElement'>, > 'sage.structure.element.RingElement'>, > 'sage.structure.element.ModuleElement'>, > 'sage.structure.element.Element'>, > 'sage.structure.sage_object.SageObject'>, ] > > My original question was if they have differently sized object structs or > not. Those that don't would currently go into the same freelist. They all add to the struct at the leaf subclass. 
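As Robert notes, each leaf subclass adds fields, so each leaf gets its own struct size. The minimal Python model below shows freelists keyed on that size, with a fixed number of size classes and a bounded list for each, to make the bookkeeping concrete. `SizeKeyedFreeList` and its methods are hypothetical and do not reproduce Cython's implementation. In C, the parked memory block would be retyped and re-initialised on reuse. Here, `acquire` simply hands back the parked object.

```python
from collections import defaultdict

class SizeKeyedFreeList:
    """Hypothetical model: bounded freelists keyed by struct size (tp_basicsize).

    Types whose instances have the same __basicsize__ share one list, as with
    the tp_basicsize check discussed in the thread; distinct sizes get their own.
    """

    def __init__(self, max_sizes=8, slots_per_size=8):
        self.max_sizes = max_sizes            # number of size classes provisioned
        self.slots_per_size = slots_per_size  # parked objects per size class
        self.lists = defaultdict(list)

    def release(self, obj):
        """Park obj for reuse; False means the caller must really free it."""
        size = type(obj).__basicsize__
        if size not in self.lists and len(self.lists) >= self.max_sizes:
            return False                      # no size class left for this size
        bucket = self.lists[size]
        if len(bucket) >= self.slots_per_size:
            return False                      # this size class is full
        bucket.append(obj)
        return True

    def acquire(self, cls):
        """Return a parked object with cls's struct size, or None if there is none."""
        bucket = self.lists.get(cls.__basicsize__)
        return bucket.pop() if bucket else None
```

With the defaults of 8 size classes and 8 slots each, and 8-byte pointers on a 64-bit build, the static bookkeeping is 8 × 8 × 8 B = 512 B, which matches Stefan's half-KB estimate. The parked objects themselves are extra heap memory: at most 8 times the sum of the per-class struct sizes, ignoring allocator overhead.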
>>> Ok, so even for something as large as Sage, we'd apparently end up with >>> just a couple of freelists for a given base type. That really makes it >>> appear reasonable to make that number a compile time constant as well. I >>> mean, even if you *really* oversize it, all you loose is the static memory >>> for a couple of pointers. On a 64 bit system, if you use a freelist size of >>> 8 objects and provision freelists for 8 differently sized subtypes, that's >>> 8*8*8 bytes in total, or half a KB, statically allocated. Even a hundred >>> times that size shouldn't hurt anyone. Unused subtype freelists really take >>> almost no space and won't hurt performance either. >> >> Elements in Sage are typically larger than 8 bytes > > I wasn't adding up the size of the objects, only of the pointers in the > freelists. If the objects end up in the freelist, they'll also be used on > the next instantiation, so their size doesn't really matter. It does if you use a lot of them, then they just sit around forever, but I suppose if you use them once you're willing to pay the price. It also doesn't make sense for a lot of them that are rather expensive anyways (e.g. every kind of matrix or polynomial specialization). >> and our >> experiments for Integer showed that the benefit (for this class) >> extended well beyond 8 items. On the other hand lots of elements are >> so expensive that they don't merit this at all. >> >> I think one thing to keep in mind is that Python's heap is essentially >> a "freelist" of objects of every size up to 128(?) bytes, so what are >> we trying to save by putting it at the base type and going up and down >> the __cinit__/__dealloc__ chain? > > Allocation still seems to take its time. Yes, it does. >> I suppose we save the zero-ing out of >> memory and a function call or two, but that's not the expensive part. > > And I noticed now that we still have to do the zeroing out in order to > initialise C typed attributes. Yep. 
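One way to picture the per-type freelist being weighed here: released objects keep their type fully set up, and reuse clears only what needs clearing. Following Stefan's description earlier in the thread, that means zeroing the C-typed attributes and setting object attributes to None. Fields that are always overwritten, such as a Sage element's parent, are left alone, in the spirit of Robert's point below. `Element`, `_reset`, `_c_fields` and `_obj_fields` are hypothetical names. In Cython this logic would live in the generated tp_new/tp_dealloc rather than in user code.

```python
class Element:
    """Hypothetical pooled type: reuse skips full re-initialisation.

    _c_fields stand in for C-typed attributes (zeroed on reuse) and
    _obj_fields for object attributes (reset to None).
    """
    __slots__ = ("parent", "value", "cache")
    _c_fields = ("value",)
    _obj_fields = ("cache",)
    _pool = []          # per-type freelist (shared with subclasses unless overridden)
    _pool_max = 8

    @classmethod
    def new(cls, parent):
        if cls._pool:
            self = cls._pool.pop()
            self._reset()            # cheap path: clear only what must be cleared
        else:
            self = cls.__new__(cls)  # full path: fresh allocation
            self.value = 0
            self.cache = None
        self.parent = parent         # always overwritten, so never cleared
        return self

    def _reset(self):
        for name in self._c_fields:
            setattr(self, name, 0)
        for name in self._obj_fields:
            setattr(self, name, None)

    def release(self):
        """Park self for reuse if the pool has room; otherwise let it be freed."""
        if len(self._pool) < self._pool_max:
            self._pool.append(self)
```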
> And that it's actually not trivial to figure > out in what cases we can safely put a subtype into the freelist. There are > a couple of special cases in CPython's object allocation, e.g. for heap > types. Their instances own a reference to the type, which is not the case > for static types. > > >> For our Integer free list, we save going up the __cinit__/__dealloc__ >> call, initializing a couple of members, setting the vtable pointers, >> which turns out to be the bulk of the cost. > > And your hierarchy examples above show that that they are implemented > across multiple modules. I can imagine that being a major problem as the C > compiler can't inline the tp_new calls in that case, can't really reorder > the struct field assignments, etc. Yes. And some of those modules are already 1000s of lines, so it's not like we should just put them all in one (though perhaps someday we'll support some kind of static linking...). > I imagine that the freelist could leave the initial vtable untouched in > some cases, but that would mean that we need a freelist per actual type, > instead of object struct size. > > Now, if we move the freelist handling into each subtype (as you and Mark > proposed already), we'd get some of this for free, because the objects that > get freed are already properly set up for the specific type, including > vtable etc. All that remains to be done is to zero out the (known) C typed > attributes, set the (known) object attributes to None, and call any > __cinit__() methods in the super types to do the rest for us. We might have > to do it in the right order, i.e. initialise some attributes, call the > corresponding __cinit__() method, initialise some more attributes, ... > > So, basically, we'd manually inline the bottom-up aggregation of all tp_new > functions into the current one, skipping those operations that we don't > consider necessary in the freelist case, such as the vtable setup. 
> > Now, the only remaining issue is how to get at the __cinit__() functions if > the base type isn't in the same module, but as Mark proposed, that could > still be done if we require it to be exported in a C-API (and assume that > it doesn't exist if not?). Would be better to know it at compile time, > though... Yes, and that's still going to (potentially) be expensive. I'd rather have a way of controlling what, if anything, gets zero'd out/set to None, as most of that (in Sage's case at least) will still be valid for the newly-reused type or instantly over-written (though perhaps the default could be to call __dealloc__/__cinit__). With this we could skip going up and down the type hierarchy at all. - Robert From stefan_ml at behnel.de Wed Feb 27 14:09:16 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 27 Feb 2013 14:09:16 +0100 Subject: [Cython] freelist benchmarks In-Reply-To: References: <512A1B20.4050707@behnel.de> <512A38A8.3040905@behnel.de> <512A89E9.2070104@behnel.de> <512B2C25.7080009@behnel.de> <512DB4C7.8000405@behnel.de> Message-ID: <512E057C.2010507@behnel.de> Robert Bradshaw, 27.02.2013 09:54: > On Tue, Feb 26, 2013 at 11:24 PM, Stefan Behnel wrote: >> I imagine that the freelist could leave the initial vtable untouched in >> some cases, but that would mean that we need a freelist per actual type, >> instead of object struct size. >> >> Now, if we move the freelist handling into each subtype (as you and Mark >> proposed already), we'd get some of this for free, because the objects that >> get freed are already properly set up for the specific type, including >> vtable etc. All that remains to be done is to zero out the (known) C typed >> attributes, set the (known) object attributes to None, and call any >> __cinit__() methods in the super types to do the rest for us. We might have >> to do it in the right order, i.e. initialise some attributes, call the >> corresponding __cinit__() method, initialise some more attributes, ... 
>> >> So, basically, we'd manually inline the bottom-up aggregation of all tp_new >> functions into the current one, skipping those operations that we don't >> consider necessary in the freelist case, such as the vtable setup. >> >> Now, the only remaining issue is how to get at the __cinit__() functions if >> the base type isn't in the same module, but as Mark proposed, that could >> still be done if we require it to be exported in a C-API (and assume that >> it doesn't exist if not?). Would be better to know it at compile time, >> though... > > Yes, and that's still going to (potentially) be expensive. I'd rather > have a way of controlling what, if anything, gets zero'd out/set to > None, as most of that (in Sage's case at least) will still be valid > for the newly-reused type or instantly over-written (though perhaps > the default could be to call __dealloc__/__cinit__). With this we > could skip going up and down the type hierarchy at all. I don't think the zeroing is a problem. Just bursting out static data to memory should be plenty fast these days and not incur any wait cycles or pipeline stalls, as long as the compiler/processor can figure out that there are no interdependencies between the assignments. The None assignments may be a problem due to the INCREFs, but even in that case, the C compiler and processor should be able to detect that they are all just incrementing the same address in memory and may end up reducing a series of updates into one. The only real problem is the calls to __cinit__(), which run user code and can thus do anything. If they can't be inlined, the C compiler needs to lessen a lot of its assumptions. Would it make sense to require users to implement __cinit__() as an inline method in a .pxd file if they want to use a freelist on a subtype? Or would that be overly restrictive? It would prevent them from using module globals, for example.
That's quite a restriction normally, but I'm not sure how much it hurts the "average" code in the specific case of __cinit__(). Stefan From dave.hirschfeld at gmail.com Wed Feb 27 14:17:40 2013 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Wed, 27 Feb 2013 13:17:40 +0000 (UTC) Subject: [Cython] Non-deterministic behavoiur? Message-ID: Using the following test code:

import numpy as np
from lapack import dgelsy
from numpy.linalg import lstsq

A = np.array([[ 0.12, -8.19, 7.69, -2.26, -4.71],
              [-6.91, 2.22, -5.12, -9.08, 9.96],
              [-3.33, -8.94, -6.72, -4.4 , -9.98],
              [ 3.97, 3.33, -2.74, -7.92, -3.2 ]])
#
b = np.array([[ 7.3 , 0.47, -6.28],
              [ 1.33, 6.58, -3.42],
              [ 2.68, -1.71, 3.46],
              [-9.62, -0.79, 0.41]])
#
print '#################'
print '# ACTUAL RESULT #'
print '#################'
print lstsq(A, b)[0]
print
print '#################'
print '# DGELSY RESULT #'
print '#################'
print dgelsy(A, b)

I get:

#################
# ACTUAL RESULT #
#################
[[-0.6858 -0.2431 0.0642]
 [-0.795 -0.0836 0.2118]
 [ 0.3767 0.1208 -0.6541]
 [ 0.2885 -0.2415 0.4176]
 [ 0.2916 0.3525 -0.3015]]

#################
# DGELSY RESULT #
#################
[[-0.6858 -0.2431 0.0642]
 [-0.795 -0.0836 0.2118]
 [ 0.3767 0.1208 -0.6541]
 [ 0.2885 -0.2415 0.4176]
 [ 0.2916 0.3525 -0.3015]]

All well and good, however if I type the `tmp` variable as a memview in the following code

cdef double[:] res
cdef double[:,:] tmp
tmp = np.zeros([ldb, nrhs], order='F', dtype=np.float64)
tmp[:b.shape[0]] = b
res = np.ravel(tmp, order='F')

the result changes?!?

#################
# DGELSY RESULT #
#################
[[-0.7137 -0.2429 0.0876]
 [-0.2882 -0.0884 -0.2117]
 [-0.4282 0.1284 0.0185]
 [ 0.9564 -0.2478 -0.1404]
 [ 0.3625 0.3519 -0.3607]]

Remove the `cdef double[:,:] tmp` and I'm back to the correct result. Does this make any sense?
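LAPACK's ?GELSY follows the usual two-call workspace convention. A first call with LWORK = -1 only writes the optimal workspace size into WORK(1). The second call must then receive a buffer of at least that many doubles. In the `dgelsy` wrapper quoted later in this message, the second call still passes `&worksize` (a single stack double) rather than `&work[0]`, so the routine may write past it. Stack corruption of that kind would be consistent with results that change when an unrelated local variable or print statement is added. Separately, the LAPACK reference documentation says JPVT is also read on entry: a non-zero entry moves that column to the front. An `np.empty` array therefore leaves the pivoting to whatever is in memory, and `np.zeros` is the neutral choice. The sketch below mimics only the calling convention, using a hypothetical `fake_gelsy`; it solves nothing.

```python
from array import array

OPTIMAL_LWORK = 37  # hypothetical size a real routine might report

def fake_gelsy(work, lwork):
    """Hypothetical stand-in mimicking only ?GELSY's LWORK = -1 convention.

    Returns an INFO-style code: 0 on success, -12 for a bad LWORK
    (LWORK is the 12th argument of ?GELSY; the minimum-size check is simplified).
    """
    if lwork == -1:
        work[0] = float(OPTIMAL_LWORK)   # query: report the size in WORK(1) only
        return 0
    if lwork < 1:
        return -12
    # Like the Fortran routine, trust lwork blindly: it cannot see len(work).
    # In C an undersized buffer is silently overrun; a Python array raises IndexError.
    for k in range(lwork):
        work[k] = 0.0
    return 0

def solve_with_query():
    worksize = array("d", [0.0])        # one-element buffer, used for the query only
    if fake_gelsy(worksize, -1) != 0:
        raise RuntimeError("workspace query failed")
    lwork = int(worksize[0])
    work = array("d", [0.0]) * lwork    # the real workspace
    info = fake_gelsy(work, lwork)      # pass `work` here, not `worksize`
    return lwork, info
```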
To try and figure out what was going on I put in a couple of debugging print statements:

print 'res = ', repr(np.asarray(res))
print 'res.flags = {{{flags}}}'.format(flags=np.asarray(res).flags)

Only changing these lines resulted in the same incorrect result

#################
# DGELSY RESULT #
#################
res = array([ 7.3 , 1.33, 2.68, -9.62, 0., 0.47, 6.58, -1.71, -0.79, 0., -6.28, -3.42, 3.46, 0.41, 0.])
res.flags = { C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False}
[[-0.7137 -0.2429 0.0876]
 [-0.2882 -0.0884 -0.2117]
 [-0.4282 0.1284 0.0185]
 [ 0.9564 -0.2478 -0.1404]
 [ 0.3625 0.3519 -0.3607]]

Removing (only) the print statements again gave me the correct results. So, it seems either typing the array as a memview or printing res will screw up the calculation. The cython code is given below. Any ideas if this is a cython bug or something I'm doing wrong? Thanks, Dave

cdef extern from "mkl_lapack.h" nogil:
    void DGELSY(const MKL_INT* m, const MKL_INT* n, const MKL_INT* nrhs,
                double* a, const MKL_INT* lda, double* b, const MKL_INT* ldb,
                MKL_INT* jpvt, const double* rcond, MKL_INT* rank,
                double* work, const MKL_INT* lwork, MKL_INT* info)

@cython.embedsignature(True)
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def dgelsy(double[:,:] A, double[:,:] b, double rcond=1e-15,
           overwrite_A=False, overwrite_b=False):
    cdef MKL_INT rank, info
    cdef MKL_INT m = A.shape[0]
    cdef MKL_INT n = A.shape[1]
    cdef MKL_INT nrhs = b.shape[1]
    cdef MKL_INT lda = m
    cdef MKL_INT ldb = max(m, n)
    cdef MKL_INT lwork = -1
    cdef double worksize = 0
    #cdef double[:,:] tmp
    cdef double[:] res, work
    cdef MKL_INT[:] jpvt

    if b.shape[0] != m:
        message = "b.shape[0] must be equal to A.shape[0].\n"
        message += "b.shape[0] = {b.shape[0]}\n"
        message += "A.shape[0] = {A.shape[0]}\n"
        raise MKL_LAPACK_ERROR(message.format(A=A, b=b))

    flags = np.asarray(A).flags
    if not flags['F_CONTIGUOUS'] or not overwrite_A:
        A = A.copy_fortran()

    flags = np.asarray(b).flags
    if not flags['F_CONTIGUOUS'] or not overwrite_b or b.shape[0] < n:
        tmp = np.zeros([ldb, nrhs], order='F', dtype=np.float64)
        tmp[:b.shape[0]] = b
        res = np.ravel(tmp, order='F')
    else:
        res = np.ravel(b, order='F')

    #print 'res = ', repr(np.asarray(res))
    #print 'res.flags = {{{flags}}}'.format(flags=np.asarray(res).flags)

    jpvt = np.empty(n, dtype=np.int32)
    DGELSY(&m, &n, &nrhs, &A[0,0], &lda, &res[0], &ldb, &jpvt[0],
           &rcond, &rank, &worksize, &lwork, &info)
    if info != 0:
        message = "Parameter {i} had an illegal value when calling gelsy."
        raise MKL_LAPACK_ERROR(message.format(i=info))

    lwork = int(worksize)
    work = np.empty(lwork, dtype=np.float64)
    DGELSY(&m, &n, &nrhs, &A[0,0], &lda, &res[0], &ldb, &jpvt[0],
           &rcond, &rank, &worksize, &lwork, &info)
    if info != 0:
        message = "Parameter {i} had an illegal value when calling gelsy."
        raise MKL_LAPACK_ERROR(message.format(i=info))

    return np.asarray(res).reshape(nrhs, -1).T[:n]
#
From robertwb at gmail.com Wed Feb 27 19:42:26 2013 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 27 Feb 2013 10:42:26 -0800 Subject: [Cython] freelist benchmarks In-Reply-To: <512E057C.2010507@behnel.de> References: <512A1B20.4050707@behnel.de> <512A38A8.3040905@behnel.de> <512A89E9.2070104@behnel.de> <512B2C25.7080009@behnel.de> <512DB4C7.8000405@behnel.de> <512E057C.2010507@behnel.de> Message-ID: On Wed, Feb 27, 2013 at 5:09 AM, Stefan Behnel wrote: > Robert Bradshaw, 27.02.2013 09:54: >> On Tue, Feb 26, 2013 at 11:24 PM, Stefan Behnel wrote: >>> I imagine that the freelist could leave the initial vtable untouched in >>> some cases, but that would mean that we need a freelist per actual type, >>> instead of object struct size. >>> >>> Now, if we move the freelist handling into each subtype (as you and Mark >>> proposed already), we'd get some of this for free, because the objects that >>> get freed are already properly set up for the specific type, including >>> vtable etc.
All that remains to be done is to zero out the (known) C typed >>> attributes, set the (known) object attributes to None, and call any >>> __cinit__() methods in the super types to do the rest for us. We might have >>> to do it in the right order, i.e. initialise some attributes, call the >>> corresponding __cinit__() method, initialise some more attributes, ... >>> >>> So, basically, we'd manually inline the bottom-up aggregation of all tp_new >>> functions into the current one, skipping those operations that we don't >>> consider necessary in the freelist case, such as the vtable setup. >>> >>> Now, the only remaining issue is how to get at the __cinit__() functions if >>> the base type isn't in the same module, but as Mark proposed, that could >>> still be done if we require it to be exported in a C-API (and assume that >>> it doesn't exist if not?). Would be better to know it at compile time, >>> though... >> >> Yes, and that's still going to (potentially) be expensive. I'd rather >> have a way of controlling what, if anything, gets zero'd out/set to >> None, as most of that (in Sage's case at least) will still be valid >> for the newly-reused type or instantly over-written (though perhaps >> the default could be to call __dealloc__/__cinit__). With this we >> could skip going up and down the type hierarchy at all. > > I don't think the zeroing is a problem. Just bursting out static data to > memory should be plenty fast these days and not incur any wait cycles or > pipeline stalls, as long as the compiler/processor can figure out that > there are no interdependencies between the assignments. The None > assignments may be a problem due to the INCREFs, but even in that case, the > C compiler and processor should be able to detect that they are all just > incrementing the same address in memory and may end up reducing a series of > updates into one. The only real problem are the calls to __cinit__(), which > run user code and can thus do anything. 
If they can't be inlined, the C > compiler needs to lessen a lot of its assumptions. > > Would it make sense to require users to implement __cinit__() as an inline > method in a .pxd file if they want to use a freelist on a subtype? Or would > that be overly restrictive? It would prevent them from using module > globals, for example. That's quite a restriction normally, but I'm not sure > how much it hurts the "average" code in the specific case of __cinit__(). It would hurt in the couple of examples I've thought about (e.g. fast Sage elements, where one wants to set the Parent field correctly). - Robert From dave.hirschfeld at gmail.com Wed Feb 27 20:05:08 2013 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Wed, 27 Feb 2013 19:05:08 +0000 (UTC) Subject: [Cython] MemoryViews require writeable arrays? Message-ID:

%%cython
cimport cython
import numpy as np
cimport numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cpdef double[:] return_one(double[:] x):
    return np.array([1.0])

In [43]: x = randn(3)
    ...: return_one(x)
Out[43]:

In [44]: x.flags['WRITEABLE'] = False
    ...: return_one(x)
Traceback (most recent call last):
  File "", line 2, in <module>
    return_one(x)
  File "_cython_magic_7761e77f78c4e321261152684b47c674.pyx", line 11, in _cython_magic_7761e77f78c4e321261152684b47c674.return_one (C:\Users\dhirschfeld\.ipython\cython\_cython_magic_7761e77f78c4e321261152684b47c674.c:1727)
  File "stringsource", line 619, in View.MemoryView.memoryview_cwrapper (C:\Users\dhirschfeld\.ipython\cython\_cython_magic_7761e77f78c4e321261152684b47c674.c:8819)
  File "stringsource", line 327, in View.MemoryView.memoryview.__cinit__ (C:\Users\dhirschfeld\.ipython\cython\_cython_magic_7761e77f78c4e321261152684b47c674.c:5594)
ValueError: buffer source array is read-only

Is this a required restriction? Is there any workaround? The context is calling cython routines using IPython.parallel. IIUC any input arrays sent over zmq are necessarily read-only.
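The `ValueError` comes from buffer acquisition rather than from anything `return_one` does with `x`. The traceback shows it is raised in `memoryview.__cinit__`, which, going by the message, requests a writable buffer that a read-only array cannot provide. The same situation can be reproduced with only the standard library, together with the usual workaround of taking a private writable copy. `as_writable` is a hypothetical helper; with NumPy, the equivalent would be calling `x.copy()` before passing the array in.

```python
from array import array

def as_writable(buf):
    """Return a writable memoryview with buf's format and shape (hypothetical helper).

    A read-only exporter cannot hand out a writable buffer, so copy its bytes
    into a bytearray (which is writable) and re-cast to the original format.
    """
    view = memoryview(buf)
    if not view.readonly:
        return view                                   # no copy needed
    raw = memoryview(bytearray(view.tobytes()))       # 1-D, format 'B', writable
    return raw.cast(view.format, view.shape)

# A read-only float64 buffer, similar in spirit to an array received over zmq.
ro = memoryview(array("d", [1.0, 2.0, 3.0]).tobytes()).cast("d")
```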
As can be seen with the example, even if we don't modify (or use) the input array at all we still get the error. Any help, esp. in regards to a workaround would be greatly appreciated! Thanks, Dave From dave.hirschfeld at gmail.com Wed Feb 27 20:16:08 2013 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Wed, 27 Feb 2013 19:16:08 +0000 (UTC) Subject: [Cython] MemoryViews require writeable arrays? References: Message-ID: Dave Hirschfeld writes: > > cpdef double[:] return_one(double[:] x): > return np.array([1.0]) > > In [43]: x = randn(3) > ...: return_one(x) > Out[43]: > > In [44]: x.flags['WRITEABLE'] = False > ...: return_one(x) > ValueError: buffer source array is read-only > > > Any help, esp. in regards to a workaround would be greatly appreciated! > > Thanks, > Dave > > It seems using the numpy buffer interface works but I guess it would still be good if this worked for memviews too:

%%cython
cimport cython
import numpy as np
cimport numpy as np

ctypedef np.float64_t float64_t

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cpdef double[:] return_one_np(np.ndarray[float64_t, ndim=1] x):
    return np.array([1.0])

In [203]: return_one_np(x)
Out[203]:

Cheers, Dave From dave.hirschfeld at gmail.com Thu Feb 28 09:45:21 2013 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Thu, 28 Feb 2013 08:45:21 +0000 (UTC) Subject: [Cython] Non-deterministic behavoiur? References: Message-ID: Dave Hirschfeld writes: > > Using the following test code: > > So, it seems either typing the array as a memview or printing res > will screw up the calculation. > > The cython code is given below. Any ideas if this is a cython bug or something > I'm doing wrong? > > Thanks, > Dave > To answer my own question, it can't be that a simple print statement will change the program so I must be doing something wrong!
It makes it hard to track down when it gives the right answer most of the time and segfaults randomly when nothing seems to have changed. I'm sure it's just incorrect arguments to dgelsy, so I'll look into that...

-Dave

From szport at gmail.com  Thu Feb 28 12:49:45 2013
From: szport at gmail.com (ZS)
Date: Thu, 28 Feb 2013 14:49:45 +0300
Subject: [Cython] About IndexNode and unicode[index]
Message-ID:

Looking into the IndexNode class in ExprNodes.py, I see an opportunity to add a faster code path for unicode[index], as is already done in the method `generate_setitem_code` for lists.

Here are the files for evaluating the performance difference:

#### unicode_index.h

/* This is a stripped-down version of __Pyx_GetItemInt_Unicode_Fast */
#include "unicodeobject.h"

static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i);

static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i) {
#if CYTHON_PEP393_ENABLED
    if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
#endif
    return __Pyx_PyUnicode_READ_CHAR(ustring, i);
}

##### unicode_index.pyx

# coding: utf-8

cdef extern from 'unicode_index.h':
    inline Py_UCS4 unicode_char(unicode ustring, int i)

cdef unicode text = u"abcdefghigklmnopqrstuvwxyzabcdefghigklmnopqrstuvwxyz"

def f_1(unicode text):
    cdef int i, j
    cdef int n = len(text)
    cdef Py_UCS4 ch

    for j from 0<=j<=1000000:
        for i from 0<=i<=n-1:
            ch = text[i]

def f_2(unicode text):
    cdef int i, j
    cdef int n = len(text)
    cdef Py_UCS4 ch

    for j from 0<=j<=1000000:
        for i from 0<=i<=n-1:
            ch = unicode_char(text, i)

def test_1():
    f_1(text)

def test_2():
    f_2(text)

Timing results:

(py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from mytests.unicode_index import test_1" "test_1()"
100 loops, best of 10: 89 msec per loop
(py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from mytests.unicode_index import test_2" "test_2()"
100 loops, best of 10: 46.1 msec per loop

Compiler directives set globally in setup.py:

"boundscheck": False
"wraparound": False
"nonecheck": False

Zaur Shibzukhov

From
dave.hirschfeld at gmail.com Thu Feb 28 12:55:31 2013 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Thu, 28 Feb 2013 11:55:31 +0000 (UTC) Subject: [Cython] Non-deterministic behavoiur? References: Message-ID: Dave Hirschfeld writes: > > Dave Hirschfeld writes: > > > > > Using the following test code: > > > > > So, it seems either typing the array as a memview or printing res > > will screw up the calculation. > > > > The cython code is given below. Any ideas if this is a cython bug or something > > I'm doing wrong? > > > > Thanks, > > Dave > > > > To answer my own question, it can't be that a simple print statement will > change the program so I must be doing something wrong! It makes it hard > to track down when it gives the right answer most of the time and segfaults > randomly when nothing seems to have changed. I'm sure it's just incorrect > arguments to dgelsy so I'll look into that... > > -Dave > > And for those following, the obvious error was in using the double `worksize` instead of the array of size n, `work` in the 2nd call to DGELSY. DGELSY(&m, &n, &nrhs, &A[0,0], &lda, &res[0], &ldb, &jpvt[0], &rcond, &rank, &worksize, &lwork, &info) Sorry for the noise. 
-Dave

From dave.hirschfeld at gmail.com  Thu Feb 28 13:11:07 2013
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Thu, 28 Feb 2013 12:11:07 +0000 (UTC)
Subject: [Cython] MemoryView Casting slow compared to ndarray buffer syntax
Message-ID:

%%cython
cimport cython
import numpy as np
cimport numpy as np

ctypedef np.float64_t float64_t

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def echo_numpy(np.ndarray[float64_t, ndim=1] x):
    return x

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def echo_memview(double[:] x):
    return np.asarray(x)

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def echo_memview_nocast(double[:] x):
    return x

In [19]: %timeit echo_memview(x)
    ...: %timeit echo_memview_nocast(x)
    ...: %timeit echo_numpy(x)
10000 loops, best of 3: 38.1 µs per loop
100000 loops, best of 3: 5.58 µs per loop
1000000 loops, best of 3: 749 ns per loop

In [20]: 38.1e-6/749e-9
Out[20]: 50.86782376502002

In [21]: 5.58e-6/749e-9
Out[21]: 7.449933244325767

So it seems that the MemoryView is 50x slower than the ndarray buffer syntax, and still 7.5x slower even without casting back to an array. Is there anything that can be done about this, or is it just something to be aware of, using each of them in the situations where it performs best?

Thanks,
Dave

From yury at shurup.com  Thu Feb 28 13:58:08 2013
From: yury at shurup.com (Yury V. Zaytsev)
Date: Thu, 28 Feb 2013 13:58:08 +0100
Subject: [Cython] Class methods returning C++ class references are not dealt with correctly?
Message-ID: <1362056288.2913.28.camel@newpride>

Hi,

I'm sorry if my question appears trivial, but what am I supposed to do if I want to wrap class methods that return a reference to another class?

From reading the list, I've gathered that apparently the best strategy for dealing with references is just not to use them (convert to pointers immediately), because of some scoping rules issues.
It works for me for a simple case of POD types, like cdef extern from "test.h": int& foo() cdef int* x = &foo() but in a more complex case, Cython generates incorrect C++ code (first it declares a reference, then assigns to it, which, of course, doesn't even compile): cdef extern from "token.h": cppclass Token: Token(const Datum&) except + cdef extern from "tokenstack.h": cppclass TokenStack: Token& top() except + cdef Token* tok = &self.pEngine.OStack.top() <-> Token *__pyx_v_tok; Token &__pyx_t_5; __pyx_t_5 = __pyx_v_self->pEngine->OStack.top(); __pyx_v_tok = (&__pyx_t_5); I would expect to see this instead: Token *__pyx_v_tok = &__pyx_v_self->pEngine->OStack.top(); Am I doing something wrong? Is there any other way to achieve what I want, other than writing custom C macros? Thanks, -- Sincerely yours, Yury V. Zaytsev From sturla at molden.no Thu Feb 28 14:29:59 2013 From: sturla at molden.no (Sturla Molden) Date: Thu, 28 Feb 2013 14:29:59 +0100 Subject: [Cython] MemoryViews require writeable arrays? In-Reply-To: References: Message-ID: <512F5BD7.9080906@molden.no> On 27.02.2013 20:05, Dave Hirschfeld wrote: > Is this a required restriction? Is there any workaround? http://www.python.org/dev/peps/pep-3118/ What you should consider is the "readonly" field in "struct bufferinfo" or the access flag "PyBUF_WRITEABLE". In short: A PEP3118 buffer can be readonly, and then you shouldn't write to it! When you set the readonly flag, Cython cannot retrieve the buffer with PyBUF_WRITEABLE. Thus, Cython helps you not to shoot yourself in the foot. I don't think you can declare a read-only memoryview in Cython. (Well, not by any means I know of.) Sturla From sturla at molden.no Thu Feb 28 15:34:31 2013 From: sturla at molden.no (Sturla Molden) Date: Thu, 28 Feb 2013 15:34:31 +0100 Subject: [Cython] Class methods returning C++ class references are not dealt with correctly? 
In-Reply-To: <1362056288.2913.28.camel@newpride> References: <1362056288.2913.28.camel@newpride> Message-ID: <512F6AF7.6040001@molden.no> On 28.02.2013 13:58, Yury V. Zaytsev wrote: > Hi, > > I'm sorry if my question would appear to be trivial, but what am I > supposed to do, if I want to wrap class methods, that return a reference > to another class? > > From reading the list, I've gathered that apparently the best strategy > of dealing with references is just to not to use them (convert to > pointers immediately), because of some scoping rules issues. > > It works for me for a simple case of POD types, like > > cdef extern from "test.h": > int& foo() > > cdef int* x = &foo() > > but in a more complex case, Cython generates incorrect C++ code (first > it declares a reference, then assigns to it, which, of course, doesn't > even compile): > > cdef extern from "token.h": > cppclass Token: > Token(const Datum&) except + > > cdef extern from "tokenstack.h": > cppclass TokenStack: > Token& top() except + > > cdef Token* tok = &self.pEngine.OStack.top() > > <-> > > Token *__pyx_v_tok; > Token &__pyx_t_5; > __pyx_t_5 = __pyx_v_self->pEngine->OStack.top(); > __pyx_v_tok = (&__pyx_t_5); This is clearly a bug in Cython. The generated code should be: Token *__pyx_v_tok; Token &__pyx_t_5 = __pyx_v_self->pEngine->OStack.top(); __pyx_v_tok = (&__pyx_t_5); One cannot let a C++ reference dangle: Token &__pyx_t_5; // illegal C++ Sturla From yury at shurup.com Thu Feb 28 15:46:48 2013 From: yury at shurup.com (Yury V. Zaytsev) Date: Thu, 28 Feb 2013 15:46:48 +0100 Subject: [Cython] Class methods returning C++ class references are not dealt with correctly? In-Reply-To: <512F6AF7.6040001@molden.no> References: <1362056288.2913.28.camel@newpride> <512F6AF7.6040001@molden.no> Message-ID: <1362062808.2913.62.camel@newpride> On Thu, 2013-02-28 at 15:34 +0100, Sturla Molden wrote: > > This is clearly a bug in Cython. One cannot let a C++ reference > dangle. 
Hi Sturla, Thanks for the confirmation! I had a closer look at it, and I think I know why this happens. My method call is actually wrapped in a try { ... } catch clause, because I declared it as being able to throw exceptions, so the reference can't be defined in this block, or it will not be accessible to the outside world. Apparently, Cython should rather do something like this instead: Token *__pyx_v_tok; Token *__pyx_t_5_p; try { Token &__pyx_t_5 = __pyx_v_self->pEngine->OStack.top(); __pyx_t_5_p = (&__pyx_t_5); } ... __pyx_v_tok = __pyx_t_5_p; I'm sorry, but I don't think that I can personally help fixing this, because even if I manage to come up with a patch to generate declarations inside try blocks with my non-existing knowledge of Cython internals, this simply not gonna work. I believe that some convention should be established regarding references handling, i.e. stating that Cython will generate correct code to convert them to pointers if such and such syntax is used... Hopefully, in the mean time, there is some other solution to the problem that I have overlooked. Z. -- Sincerely yours, Yury V. Zaytsev From dave.hirschfeld at gmail.com Thu Feb 28 15:55:15 2013 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Thu, 28 Feb 2013 14:55:15 +0000 (UTC) Subject: [Cython] MemoryViews require writeable arrays? References: <512F5BD7.9080906@molden.no> Message-ID: Sturla Molden writes: > > On 27.02.2013 20:05, Dave Hirschfeld wrote: > > > Is this a required restriction? Is there any workaround? > > http://www.python.org/dev/peps/pep-3118/ > > What you should consider is the "readonly" field in "struct bufferinfo" > or the access flag "PyBUF_WRITEABLE". > > In short: > > A PEP3118 buffer can be readonly, and then you shouldn't write to it! > When you set the readonly flag, Cython cannot retrieve the buffer with > PyBUF_WRITEABLE. Thus, Cython helps you not to shoot yourself in the > foot. I don't think you can declare a read-only memoryview in Cython. 
> (Well, not by any means I know of.) > > Sturla > > So the issue is that at present memoryviews can't be readonly? Presumably because this works for numpy arrays it would be possible to also make readonly memoryviews? I think that would certainly be nice to have, but maybe it's a niche use case. Certainly, for IPython.parallel use it's easy enough to write a shim which sets the array to writeable with the understanding that changes don't get propagated back. Thanks, Dave From sturla at molden.no Thu Feb 28 15:58:36 2013 From: sturla at molden.no (Sturla Molden) Date: Thu, 28 Feb 2013 15:58:36 +0100 Subject: [Cython] Class methods returning C++ class references are not dealt with correctly? In-Reply-To: <1362062808.2913.62.camel@newpride> References: <1362056288.2913.28.camel@newpride> <512F6AF7.6040001@molden.no> <1362062808.2913.62.camel@newpride> Message-ID: <512F709C.1070405@molden.no> On 28.02.2013 15:46, Yury V. Zaytsev wrote: > My method call is actually wrapped in a try { ... } catch clause, > because I declared it as being able to throw exceptions, so the > reference can't be defined in this block, or it will not be accessible > to the outside world. If Cython generates illegal C++ code (i.e. C++ that don't compile) it is a bug in Cython. There must be a general error in the handling of C++ references when they are declared without a target. Sturla From sturla at molden.no Thu Feb 28 16:41:27 2013 From: sturla at molden.no (Sturla Molden) Date: Thu, 28 Feb 2013 16:41:27 +0100 Subject: [Cython] MemoryViews require writeable arrays? In-Reply-To: References: <512F5BD7.9080906@molden.no> Message-ID: <512F7AA7.9060604@molden.no> On 28.02.2013 15:55, Dave Hirschfeld wrote: > So the issue is that at present memoryviews can't be readonly? https://github.com/cython/cython/blob/master/Cython/Compiler/MemoryView.py#L33 Typed memoryviews are thus acquired with the PyBUF_WRITEABLE flag. 
If the assigned buffer is readonly, the request to acquire the PEP3118 buffer will fail. If you remove the PyBUF_WRITEABLE flag from lines 33 to 36, you can acquire a readonly buffer with typed memoryviews. But this is not recommended: in that case you would have to check the readonly flag yourself and make sure you never write to a readonly buffer.

Sturla

From sebastian at sipsolutions.net  Thu Feb 28 16:13:17 2013
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 28 Feb 2013 16:13:17 +0100
Subject: [Cython] Be more forgiving about memoryview strides
Message-ID: <1362064397.2663.14.camel@sebastian-laptop>

Hey,

Maybe someone here already saw it (I don't have a Trac account, or I would just create a ticket), but it would be nice if Cython were more forgiving about contiguity requirements on strides. In the future this would make it easier for numpy to go forward with changing the contiguous flags to be more reasonable for its purpose, and it would also allow old (and maybe, for the moment, remaining) corner cases in numpy to slip past (as well as possibly the same for other programs...). An example (see also https://github.com/numpy/numpy/issues/2956 and the PR linked there for more details):

def add_one(array):
    cdef double[::1] a = array
    a[0] += 1.
    return array

giving:

>>> add_one(np.ascontiguousarray(np.arange(10.)[::100]))
ValueError: Buffer and memoryview are not contiguous in the same dimension.

This could easily be changed if memoryviews checked whether the strides "can be interpreted as contiguous". That means that if shape[i] == 1, then strides[i] is arbitrary (you could change it to anything you like). The same holds for 0-sized arrays, which are arguably always contiguous, no matter what their strides are!

Regards,
Sebastian

PS: A similar thing exists with the np.ndarray[...] interface if the user accesses array.strides: they get the array's strides, not the buffer's.
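The relaxed rule proposed above ("can be interpreted as contiguous") fits in a few lines of stride arithmetic. The following is a hypothetical C sketch, not Cython's or NumPy's actual code (both function names are made up for illustration), contrasting it with a strict check that compares every stride. A dimension of extent 1 is skipped because its only valid index is 0, so its stride never contributes to an address; an array with any zero extent has no elements and is accepted outright.

```c
#include <stddef.h>

/* Relaxed C-contiguity (hypothetical helper): every element sits where a
 * dense C array of the same shape would put it. */
static int is_c_contiguous_relaxed(int ndim, const ptrdiff_t *shape,
                                   const ptrdiff_t *strides, ptrdiff_t itemsize)
{
    ptrdiff_t expected = itemsize;  /* stride a dense C array would use */
    int i;

    for (i = 0; i < ndim; i++)
        if (shape[i] == 0)
            return 1;               /* no elements: strides are irrelevant */

    for (i = ndim - 1; i >= 0; i--) {
        /* extent 1: the only index is 0, so this stride is never used */
        if (shape[i] != 1 && strides[i] != expected)
            return 0;
        expected *= shape[i];
    }
    return 1;
}

/* Strict C-contiguity (hypothetical helper): every stride must match,
 * including those of extent-1 and zero-size dimensions. */
static int is_c_contiguous_strict(int ndim, const ptrdiff_t *shape,
                                  const ptrdiff_t *strides, ptrdiff_t itemsize)
{
    ptrdiff_t expected = itemsize;
    int i;

    for (i = ndim - 1; i >= 0; i--) {
        if (strides[i] != expected)
            return 0;
        expected *= shape[i];
    }
    return 1;
}
```

For a one-element float64 array with an 800-byte stride (what np.arange(10.)[::100] produces), the relaxed check accepts and the strict one rejects. Since the relaxed test only ignores strides that are always multiplied by a zero index, a nonempty buffer that passes it can still be walked as a flat `T*`, with element k at byte offset k * itemsize.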
This is not quite related, but if it would be easy to use the buffer's strides in that case, it may make it easier to change the flags in numpy in the long term, since one could clean up the strides for forced contiguous buffer requests.

From brad.froehle at gmail.com  Thu Feb 28 17:01:52 2013
From: brad.froehle at gmail.com (Bradley M. Froehle)
Date: Thu, 28 Feb 2013 08:01:52 -0800
Subject: [Cython] Class methods returning C++ class references are not dealt with correctly?
In-Reply-To: <1362056288.2913.28.camel@newpride>
References: <1362056288.2913.28.camel@newpride>
Message-ID:

On Thu, Feb 28, 2013 at 4:58 AM, Yury V. Zaytsev wrote:
> Hi,
>
> I'm sorry if my question would appear to be trivial, but what am I
> supposed to do, if I want to wrap class methods, that return a reference
> to another class?

As a workaround you could use:

cdef extern from "test.h":
    int* foo2ptr "&foo" ()

cdef int *x = foo2ptr()

This could be extended to your other example as well.

-Brad

From robertwb at gmail.com  Thu Feb 28 18:50:34 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 28 Feb 2013 09:50:34 -0800
Subject: [Cython] Be more forgiving about memoryview strides
In-Reply-To: <1362064397.2663.14.camel@sebastian-laptop>
References: <1362064397.2663.14.camel@sebastian-laptop>
Message-ID:

On Thu, Feb 28, 2013 at 7:13 AM, Sebastian Berg wrote:
> Hey,
>
> Maybe someone here already saw it (I don't have a track account, or I
> would just create a ticket), but it would be nice if Cython was more
> forgiving about contiguous requirements on strides. In the future this
> would make it easier for numpy to go forward with changing the
> contiguous flags to be more reasonable for its purpose, and second also
> to allow old (and maybe for the moment remaining) corner cases in numpy
> to slip past (as well as possibly the same for other programs...).
> An example is (see also https://github.com/numpy/numpy/issues/2956 and the
> PR linked there for more details):
>
> def add_one(array):
>     cdef double[::1] a = array
>     a[0] += 1.
>     return array
>
> giving:
>
>>>> add_one(np.ascontiguousarray(np.arange(10.)[::100]))
> ValueError: Buffer and memoryview are not contiguous in the same
> dimension.
>
> This could easily be changed if MemoryViews check the strides as "can be
> interpreted as contiguous". That means that if shape[i] == 1, then
> strides[i] are arbitrary (you can just change them if you like). This is
> also the case for 0-sized arrays, which are arguably always contiguous,
> no matter their strides are!

I was under the impression that the primary value of contiguity is that a foo[::1] can be interpreted as a foo*. Letting strides be arbitrary completely breaks this, right?

> PS: A similar thing exists with np.ndarray[...] interface if the user
> accesses array.strides. They get the arrays strides not the buffers.
> This is not quite related, but if it would be easy to use the buffer's
> strides in that case, it may make it easier if we want to change the
> flags in numpy in the long term, since one could clean up strides for
> forced contiguous buffer requests.
>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From yury at shurup.com  Thu Feb 28 19:38:29 2013
From: yury at shurup.com (Yury V. Zaytsev)
Date: Thu, 28 Feb 2013 19:38:29 +0100
Subject: [Cython] Class methods returning C++ class references are not dealt with correctly?
In-Reply-To:
References: <1362056288.2913.28.camel@newpride>
Message-ID: <1362076709.2913.102.camel@newpride>

Hi Brad,

On Thu, 2013-02-28 at 08:01 -0800, Bradley M.
Froehle wrote: > > cdef extern from "test.h": > int* foo2ptr "&foo" () > > cdef int *x = foo2ptr() Thank you for this interesting suggestion, but I must be missing something, because when I do the following: cdef extern from "tokenstack.h": cppclass TokenStack: Token* top "Token&" () except + cdef Token* tok = self.pEngine.OStack.top() I end up with the following generated code, which, of course, doesn't compile: Token *__pyx_t_5; __pyx_t_5 = __pyx_v_self->pEngine->OStack.Token&(); whereas, I'd like to see generated this: Token *__pyx_t_5; __pyx_t_5 = __pyx_v_self->pEngine->OStack->top(); Any ideas? -- Sincerely yours, Yury V. Zaytsev From szport at gmail.com Thu Feb 28 19:31:28 2013 From: szport at gmail.com (ZS) Date: Thu, 28 Feb 2013 21:31:28 +0300 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: References: Message-ID: 2013/2/28 ZS : > Looking into IndexNode class in ExprNode.py I have seen a possibility > for addition of more fast code path for unicode[index] as it done in > method `generate_setitem_code` in case of lists. 
> > This is files for evaluation of performance difference: > > #### unicode_index.h > > /* This is striped version of __Pyx_GetItemInt_Unicode_Fast */ > #include "unicodeobject.h" > > static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i); > > static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i) { > #if CYTHON_PEP393_ENABLED > if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1; > #endif > return __Pyx_PyUnicode_READ_CHAR(ustring, i); > } > > ##### unicode_index.pyx > > # coding: utf-8 > > cdef extern from 'unicode_index.h': > inline Py_UCS4 unicode_char(unicode ustring, int i) > > cdef unicode text = u"abcdefghigklmnopqrstuvwxyzabcdefghigklmnopqrstuvwxyz" > > def f_1(unicode text): > cdef int i, j > cdef int n = len(text) > cdef Py_UCS4 ch > > for j from 0<=j<=1000000: > for i from 0<=i<=n-1: > ch = text[i] > > def f_2(unicode text): > cdef int i, j > cdef int n = len(text) > cdef Py_UCS4 ch > > for j from 0<=j<=1000000: > for i from 0<=i<=n-1: > ch = unicode_char(text, i) > > def test_1(): > f_1(text) > > def test_2(): > f_2(text) > > Timing results: > > (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from > mytests.unicode_index import test_1" "test_1()" > 100 loops, best of 10: 89 msec per loop > (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from > mytests.unicode_index import test_2" "test_2()" > 100 loops, best of 10: 46.1 msec per loop > > in setup.py globally: > > "boundscheck": False > "wraparound": False > "nonecheck": False > For the sake of clarity I would like to add the following... This optimization is for the case when both `boundscheck(False)` and `wraparound(False)` is applied. Otherwise default path of evaluation (__Pyx_GetItemInt_Unicode) is applied. This allows to write unicode text parsing code almost at C speed mostly in python (+ .pxd defintions). Zaur Shibzukhov From brad.froehle at gmail.com Thu Feb 28 20:00:18 2013 From: brad.froehle at gmail.com (Bradley M. 
Froehle) Date: Thu, 28 Feb 2013 11:00:18 -0800 Subject: [Cython] Class methods returning C++ class references are not dealt with correctly? In-Reply-To: <1362076709.2913.102.camel@newpride> References: <1362056288.2913.28.camel@newpride> <1362076709.2913.102.camel@newpride> Message-ID: Hey Yury: Yes, you are right. I was thinking this was a function and not a method. As an even ickier workaround: #define TokenStack_top_p(token_stack) &token_stack->top() cdef extern from "............": Token* TokenStack_top_p(TokenStack*) except + cdef Token* tok = TokenStack_top_p(self.pEngine.OStack) -Brad On Thu, Feb 28, 2013 at 10:38 AM, Yury V. Zaytsev wrote: > Hi Brad, > > On Thu, 2013-02-28 at 08:01 -0800, Bradley M. Froehle wrote: > > > > cdef extern from "test.h": > > int* foo2ptr "&foo" () > > > > cdef int *x = foo2ptr() > > Thank you for this interesting suggestion, but I must be missing > something, because when I do the following: > > cdef extern from "tokenstack.h": > cppclass TokenStack: > Token* top "Token&" () except + > > cdef Token* tok = self.pEngine.OStack.top() > > I end up with the following generated code, which, of course, doesn't > compile: > > Token *__pyx_t_5; > __pyx_t_5 = __pyx_v_self->pEngine->OStack.Token&(); > > whereas, I'd like to see generated this: > > Token *__pyx_t_5; > __pyx_t_5 = __pyx_v_self->pEngine->OStack->top(); > > Any ideas? > > -- > Sincerely yours, > Yury V. Zaytsev > > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Thu Feb 28 20:12:09 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 28 Feb 2013 19:12:09 +0000 Subject: [Cython] Be more forgiving about memoryview strides In-Reply-To: References: <1362064397.2663.14.camel@sebastian-laptop> Message-ID: On Thu, Feb 28, 2013 at 5:50 PM, Robert Bradshaw wrote: > On Thu, Feb 28, 2013 at 7:13 AM, Sebastian Berg > wrote: >> Hey, >> >> Maybe someone here already saw it (I don't have a track account, or I >> would just create a ticket), but it would be nice if Cython was more >> forgiving about contiguous requirements on strides. In the future this >> would make it easier for numpy to go forward with changing the >> contiguous flags to be more reasonable for its purpose, and second also >> to allow old (and maybe for the moment remaining) corner cases in numpy >> to slip past (as well as possibly the same for other programs...). An >> example is (see also https://github.com/numpy/numpy/issues/2956 and the >> PR linked there for more details): >> >> def add_one(array): >> cdef double[::1] a = array >> a[0] += 1. >> return array >> >> giving: >> >>>>> add_one(np.ascontiguousarray(np.arange(10.)[::100])) >> ValueError: Buffer and memoryview are not contiguous in the same >> dimension. >> >> This could easily be changed if MemoryViews check the strides as "can be >> interpreted as contiguous". That means that if shape[i] == 1, then >> strides[i] are arbitrary (you can just change them if you like). This is >> also the case for 0-sized arrays, which are arguably always contiguous, >> no matter their strides are! > > I was under the impression that the primary value for contiguous is > that it a foo[::1] can be interpreted as a foo*. Letting strides be > arbitrary completely breaks this, right? Nope. The natural definition of "C contiguous" is "the array entries are arranged in memory in the same way they would be if they were a multidimensional C array" (i.e., what you said.) 
But it turns out that this is *not* the definition that numpy and cython use! The issue is that the above definition is a constraint on the actual locations of items in memory, i.e., given a shape, it tells you that for every index,

(a) sum(index[i] * strides[i] for i in range(ndim)) == itemsize * sum(index[i] * prod(shape[i+1:]) for i in range(ndim))

Obviously this equality holds if

(b) strides[i] == itemsize * prod(shape[i+1:]) for every i

(Or for F-contiguity, we have

(b') strides[i] == itemsize * prod(shape[:i]) for every i

)

(a) is the natural definition of "C contiguous". (b) is the definition of "C contiguous" used by numpy and cython. (b) implies (a). But (a) does not imply (b), i.e., there are arrays that are C-contiguous which numpy and cython think are discontiguous. (Also, in numpy there are some weird cases where numpy accidentally uses the correct definition, I think, which is the point of Sebastian's example.)

In particular, if shape[i] == 1, then the value of strides[i] really should be irrelevant to judging contiguity, because the only thing you can do with strides[i] is multiply it by index[i], and if shape[i] == 1 then index[i] is always 0. So an array of int8's with shape = (10, 1), strides = (1, 73) is contiguous according to (a), but not according to (b). Also, if shape[i] is 0 for any i, then the entire contents of the strides array become irrelevant to judging contiguity; all zero-sized arrays are contiguous according to (a), but not (b).

(This is really annoying for numpy because given, say, a column vector with shape (n, 1), it is impossible to be both C- and F-contiguous according to the (b)-style definition. But people expect various operations to preserve C versus F contiguity, so there are heuristics in numpy that try to guess whether various result arrays should pretend to be C- or F-contiguous, and we don't even have a consistent idea of what it would mean for this code to be working correctly, never mind test it and keep it working.
OTOH if we just fix numpy to use the (a) definition, then it turns out a bunch of third-party code breaks, like, for example, cython.) -n From stefan_ml at behnel.de Thu Feb 28 20:27:08 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 28 Feb 2013 20:27:08 +0100 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: References: Message-ID: <512FAF8C.7020008@behnel.de> ZS, 28.02.2013 19:31: > 2013/2/28 ZS: >> Looking into IndexNode class in ExprNode.py I have seen a possibility >> for addition of more fast code path for unicode[index] as it done in >> method `generate_setitem_code` in case of lists. >> >> This is files for evaluation of performance difference: >> >> #### unicode_index.h >> >> /* This is striped version of __Pyx_GetItemInt_Unicode_Fast */ >> #include "unicodeobject.h" >> >> static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i); >> >> static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i) { >> #if CYTHON_PEP393_ENABLED >> if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1; >> #endif >> return __Pyx_PyUnicode_READ_CHAR(ustring, i); >> } Sure, looks ok. >> ##### unicode_index.pyx >> >> # coding: utf-8 >> >> cdef extern from 'unicode_index.h': >> inline Py_UCS4 unicode_char(unicode ustring, int i) >> >> cdef unicode text = u"abcdefghigklmnopqrstuvwxyzabcdefghigklmnopqrstuvwxyz" >> >> def f_1(unicode text): >> cdef int i, j >> cdef int n = len(text) >> cdef Py_UCS4 ch >> >> for j from 0<=j<=1000000: Personally, I find a range() loop much easier to read than this beast. 
>>         for i from 0<=i<=n-1:
>>             ch = text[i]
>>
>> def f_2(unicode text):
>>     cdef int i, j
>>     cdef int n = len(text)
>>     cdef Py_UCS4 ch
>>
>>     for j from 0<=j<=1000000:
>>         for i from 0<=i<=n-1:
>>             ch = unicode_char(text, i)
>>
>> def test_1():
>>     f_1(text)
>>
>> def test_2():
>>     f_2(text)
>>
>> Timing results:
>>
>> (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from
>> mytests.unicode_index import test_1" "test_1()"
>> 100 loops, best of 10: 89 msec per loop
>> (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from
>> mytests.unicode_index import test_2" "test_2()"
>> 100 loops, best of 10: 46.1 msec per loop

I seriously doubt that this translates to similar results in real-world code. In the second example above, the C compiler should be able to remove a lot of code, certainly including the useless character read. Maybe even the loops, if it can determine that PyUnicode_READY() will always return the same result. So you're almost certainly not benchmarking what you think you are.

>> in setup.py globally:
>>
>> "boundscheck": False
>> "wraparound": False
>> "nonecheck": False
>>
> For the sake of clarity I would like to add the following... This
> optimization is for the case when both `boundscheck(False)` and
> `wraparound(False)` is applied. Otherwise default path of evaluation
> (__Pyx_GetItemInt_Unicode) is applied.
>
> This allows to write unicode text parsing code almost at C speed
> mostly in python (+ .pxd defintions).

I suggest simply adding a constant flag argument to the existing function that states whether checking should be done or not. Inlining will let the C compiler drop the corresponding code, which may or may not make it a little faster.
Stefan

From szport at gmail.com  Thu Feb 28 21:07:03 2013
From: szport at gmail.com (ZS)
Date: Thu, 28 Feb 2013 23:07:03 +0300
Subject: [Cython] About IndexNode and unicode[index]
In-Reply-To: <512FAF8C.7020008@behnel.de>
References: <512FAF8C.7020008@behnel.de>
Message-ID:

2013/2/28 Stefan Behnel :
>> This allows writing unicode text parsing code almost at C speed
>> mostly in python (+ .pxd definitions).
>
> I suggest simply adding a constant flag argument to the existing function
> that states if checking should be done or not. Inlining will let the C
> compiler drop the corresponding code, which may or may not make it a little
> faster.

It would be great. To be sure, I changed the tests:

unicode_index.h
-----------------------

#include "unicodeobject.h"

static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i);

static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i) {
#if CYTHON_PEP393_ENABLED
    if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
#endif
    return __Pyx_PyUnicode_READ_CHAR(ustring, i);
}

static inline Py_UCS4 unicode_char2(PyObject* ustring, Py_ssize_t i, int flag);

static inline Py_UCS4 unicode_char2(PyObject* ustring, Py_ssize_t i, int flag) {
    Py_ssize_t length;
#if CYTHON_PEP393_ENABLED
    if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
#endif
    if (flag) {
        length = __Pyx_PyUnicode_GET_LENGTH(ustring);
        if ((0 <= i) & (i < length)) {
            return __Pyx_PyUnicode_READ_CHAR(ustring, i);
        } else if ((-length <= i) & (i < 0)) {
            return __Pyx_PyUnicode_READ_CHAR(ustring, i + length);
        } else {
            PyErr_SetString(PyExc_IndexError, "string index out of range");
            return (Py_UCS4)-1;
        }
    } else {
        return __Pyx_PyUnicode_READ_CHAR(ustring, i);
    }
}

unicode_index.pyx
--------------------------

cdef extern from 'unicode_index.h':
    inline Py_UCS4 unicode_char(unicode ustring, int i)
    inline Py_UCS4 unicode_char2(unicode ustring, int i, int flag)

cdef unicode text = u"abcdefghigklmnopqrstuvwxyzabcdefghigklmnopqrstuvwxyz"

cdef long f_1(unicode text):
    cdef int i, j
    cdef int n = len(text)
    cdef Py_UCS4 ch
    cdef long S = 0
    for j in range(1000000):
        for i in range(n):
            ch = text[i]
            S += ch * j
    return S

cdef long f_2(unicode text):
    cdef int i, j
    cdef int n = len(text)
    cdef Py_UCS4 ch
    cdef long S = 0
    for j in range(1000000):
        for i in range(n):
            ch = unicode_char(text, i)
            S += ch * j
    return S

cdef long f_3(unicode text):
    cdef int i, j
    cdef int n = len(text)
    cdef Py_UCS4 ch
    cdef long S = 0
    for j in range(1000000):
        for i in range(n):
            ch = unicode_char2(text, i, 0)
            S += ch * j
    return S

def test_1():
    f_1(text)

def test_2():
    f_2(text)

def test_3():
    f_3(text)

Here are the timings:

(py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from mytests.unicode_index import test_1" "test_1()"
50 loops, best of 5: 152 msec per loop
(py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from mytests.unicode_index import test_2" "test_2()"
50 loops, best of 5: 86.5 msec per loop
(py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from mytests.unicode_index import test_3" "test_3()"
50 loops, best of 5: 86.5 msec per loop

So your suggestion would be preferable.

From stefan_ml at behnel.de  Thu Feb 28 22:16:09 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 28 Feb 2013 22:16:09 +0100
Subject: [Cython] About IndexNode and unicode[index]
In-Reply-To:
References: <512FAF8C.7020008@behnel.de>
Message-ID: <512FC919.4010702@behnel.de>

ZS, 28.02.2013 21:07:
> 2013/2/28 Stefan Behnel:
>>> This allows writing unicode text parsing code almost at C speed
>>> mostly in python (+ .pxd definitions).
>>
>> I suggest simply adding a constant flag argument to the existing function
>> that states if checking should be done or not. Inlining will let the C
>> compiler drop the corresponding code, which may or may not make it a little
>> faster.
>
> static inline Py_UCS4 unicode_char2(PyObject* ustring, Py_ssize_t i, int flag) {
>     Py_ssize_t length;
> #if CYTHON_PEP393_ENABLED
>     if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
> #endif
>     if (flag) {
>         length = __Pyx_PyUnicode_GET_LENGTH(ustring);
>         if ((0 <= i) & (i < length)) {
>             return __Pyx_PyUnicode_READ_CHAR(ustring, i);
>         } else if ((-length <= i) & (i < 0)) {
>             return __Pyx_PyUnicode_READ_CHAR(ustring, i + length);
>         } else {
>             PyErr_SetString(PyExc_IndexError, "string index out of range");
>             return (Py_UCS4)-1;
>         }
>     } else {
>         return __Pyx_PyUnicode_READ_CHAR(ustring, i);
>     }
> }

I think you could even pass in two flags, one for wraparound and one for
boundscheck, and then just evaluate them appropriately in the existing "if"
tests above. That should allow both features to be supported independently
in a fast way.

> Here are timings:
>
> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from
> mytests.unicode_index import test_1" "test_1()"
> 50 loops, best of 5: 152 msec per loop
> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from
> mytests.unicode_index import test_2" "test_2()"
> 50 loops, best of 5: 86.5 msec per loop
> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from
> mytests.unicode_index import test_3" "test_3()"
> 50 loops, best of 5: 86.5 msec per loop
>
> So your suggestion would be preferable.

Nice. Yes, looks like it's worth it.

Stefan
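Stefan's two-flag suggestion (one flag for wraparound, one for boundscheck) can be sketched roughly as follows. This is a hypothetical, self-contained C illustration, not code from the thread or from Cython. To compile without Python.h, it reads from a plain array of code points (`ucs4_t`, an assumed stand-in for `Py_UCS4`) instead of a unicode object, and it returns a sentinel instead of raising IndexError. The names `char_at` and `UCS4_ERROR` are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Py_UCS4 so the sketch compiles without Python.h. */
typedef uint32_t ucs4_t;

/* Sentinel for an out-of-range index, mirroring the (Py_UCS4)-1 convention
   of unicode_char2(); the real helper would also set IndexError. */
#define UCS4_ERROR ((ucs4_t)-1)

/* Hypothetical helper: read the code point at index i from a buffer of
   `length` code points.
   - wraparound:  accept Python-style negative indices (-1 is the last item).
   - boundscheck: reject indices outside [0, length) after wrapping.
   The two flags are independent, so all four combinations are supported. */
static inline ucs4_t char_at(const ucs4_t *buf, ptrdiff_t length,
                             ptrdiff_t i, int wraparound, int boundscheck) {
    if (wraparound && i < 0) {
        i += length;                  /* map [-length, -1] onto [0, length-1] */
    }
    if (boundscheck && !((0 <= i) & (i < length))) {
        return UCS4_ERROR;            /* still out of range: report an error */
    }
    return buf[i];                    /* unchecked read when both flags are 0 */
}
```

When such a helper is inlined and called with literal flags, for example `char_at(buf, n, i, 0, 0)` in a loop like f_3 above, the disabled tests become dead code that an optimizing C compiler can typically remove. Whether it actually does so depends on the compiler and optimization level.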