From wesmckinn at gmail.com Tue May 1 00:05:34 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 30 Apr 2012 18:05:34 -0400 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: <4F9EFABD.5080408@astro.uio.no> <339e72d3-3ef5-44d9-ac89-79dd494cc460@email.android.com> Message-ID: On Mon, Apr 30, 2012 at 5:36 PM, William Stein wrote: > On Mon, Apr 30, 2012 at 2:32 PM, Dag Sverre Seljebotn > wrote: >> >> >> Wes McKinney wrote: >> >>>On Mon, Apr 30, 2012 at 4:55 PM, Nathaniel Smith wrote: >>>> On Mon, Apr 30, 2012 at 9:49 PM, Dag Sverre Seljebotn >>>> wrote: >>>>> JIT is really the way to go. It is one thing that a JIT could >>>optimize the >>>>> case where you pass a callback to a function and inline it run-time. >>>But >>>>> even if it doesn't get that fancy, it'd be great to just be able to >>>write >>>>> something like "cython.eval(s)" and have that be compiled (I guess >>>you could >>>>> do that now, but the sheer overhead of the C compiler and all the >>>.so files >>>>> involved means nobody would sanely use that as the main way of >>>stringing >>>>> together something like pandas). >>>> >>>> The overhead of running a fully optimizing compiler over pandas on >>>> every import is pretty high, though. You can come up with various >>>> caching mechanisms, but they all mean introducing some kind of >>>compile >>>> time/run time distinction. So I'm skeptical we'll just be able to get >>>> rid of that concept, even in a brave new LLVM/PyPy/Julia world. >>>> >>>> -- Nathaniel >>>> _______________________________________________ >>>> cython-devel mailing list >>>> cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>> >>>I'd be perfectly OK with just having to compile pandas's "data engine" >>>and generate loads of C/C++ code. JIT-compiling little array >>>expressions would be cool too. I've got enough of an itch that I might >>>have to start scratching pretty soon. >> >> I think a good start is: >> >> Myself I'd look into just using Jinja2 to generate all the Cython code, rather than those horrible Python interpolated strings...that should give you something that's at least rather pleasant for you to work with once you are used to it (even if it is a bit horrible to newcomers to the code base). >> >> You can even check in the generated sources. >> >> And we've discussed letting cython be smart with templating languages and report errors on a line in the original template; such features will certainly be accepted once somebody codes it up. >> >> (I can give you my breakdown of how I eliminated other templating languages than Jinja2 for this purpose tomorrow if you are interested). > > Can you point us to a good example of you using jinja2 for this purpose? > > I'm a big fan of Jinja2 in general (e.g., for HTML)... > >> >> Dag >> >>>_______________________________________________ >>>cython-devel mailing list >>>cython-devel at python.org >>>http://mail.python.org/mailman/listinfo/cython-devel >> >> -- >> Sent from my Android phone with K-9 Mail. Please excuse my brevity. >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > > > -- > William Stein > Professor of Mathematics > University of Washington > http://wstein.org > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I agree, it'd be cool to see an example or two.
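To make the request concrete, here is a minimal sketch of the kind of Jinja2-driven Cython code generation being discussed. It is not from the thread; the file name "engine.pyx" and the dtype list are hypothetical, and the template simply renders one specialized reduction function per C type:

from jinja2 import Template

pyx_template = Template("""\
{% for name, ctype in specs %}
def sum_{{ name }}({{ ctype }}[:] values):
    cdef Py_ssize_t i
    cdef {{ ctype }} total = 0
    for i in range(values.shape[0]):
        total += values[i]
    return total
{% endfor %}
""")

# Render one function per (Python-visible name, C type) pair; the
# generated .pyx can then be checked into the source tree if desired.
specs = [("float64", "double"), ("int64", "long long")]
with open("engine.pyx", "w") as f:
    f.write(pyx_template.render(specs=specs))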
I have some ideas for a mini DSL / code-generation framework that might suit my needs; jinja2 might be then the right tool for doing the templating / codegen. If I could cut the amount of Cython code I have in half (and make it easier to write simple functions, which are currently more than 50% boilerplate) that would be a big win for me. - Wes From ask at linet.dk Tue May 1 09:53:17 2012 From: ask at linet.dk (Ask F. Jakobsen) Date: Tue, 1 May 2012 09:53:17 +0200 (CEST) Subject: [Cython] Code generated for the expression int(x)+1 In-Reply-To: <2012339557.72.1335857921632.JavaMail.root@pippin.linet.dk> Message-ID: <1359224693.78.1335858797533.JavaMail.root@pippin.linet.dk> Hi all, I am having a simple performance problem that can be resolved by splitting up an expression in two lines. I don't know if it is a bug or I am missing something. The piece of code below is translated to slow code 1) cdef int i i=int(x)+1 whereas the code below is translated to fast code 2) cdef int i i=int(x) i=i+1 Snippet of generated code by cython 1) /* "test.pyx":4 * cdef double x=3.2 * cdef int i * i=int(x)+1 # <<<<<<<<<<<<<< * return i * */ __pyx_t_1 = PyFloat_FromDouble(__pyx_v_x); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = PyTuple_New(1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_2)); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_1); __Pyx_GIVEREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyObject_Call(((PyObject *)((PyObject*)(&PyInt_Type))), ((PyObject *)__pyx_t_2), NULL); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __Pyx_DECREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; __pyx_t_2 = PyNumber_Add(__pyx_t_1, __pyx_int_1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_3 = __Pyx_PyInt_AsInt(__pyx_t_2); if (unlikely((__pyx_t_3 == (int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_v_i = __pyx_t_3; 2) /* "test.pyx":11 * cdef double x=3.2 * cdef int i * i=int(x) # <<<<<<<<<<<<<< * i=i+1 * return i */ __pyx_v_i = ((int)__pyx_v_x); /* "test.pyx":12 * cdef int i * i=int(x) * i=i+1 # <<<<<<<<<<<<<< * return i * */ __pyx_v_i = (__pyx_v_i + 1); I am using Cython-0.15.1 Best regards, Ask From stefan_ml at behnel.de Tue May 1 10:28:58 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 01 May 2012 10:28:58 +0200 Subject: [Cython] Code generated for the expression int(x)+1 In-Reply-To: <1359224693.78.1335858797533.JavaMail.root@pippin.linet.dk> References: <1359224693.78.1335858797533.JavaMail.root@pippin.linet.dk> Message-ID: <4F9F9ECA.6010102@behnel.de> Ask F. Jakobsen, 01.05.2012 09:53: > I am having a simple performance problem that can be resolved by splitting up an expression in two lines. I don't know if it is a bug or I am missing something. > > The piece of code below is translated to slow code > > 1) > cdef int i > i=int(x)+1 What you are saying here is: Convert x (known to be a C double) to an arbitrary size Python integer value, add 1, convert it to a C int and assign it to i. 
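(For concreteness, a sketch of the fast variant that Stefan's explanation arrives at below -- writing the C cast yourself keeps the whole expression in C, with the overflow responsibility that entails:)

cdef double x = 3.2
cdef int i
i = <int>x + 1   # bare C cast plus C addition; no Python integer object involved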
> whereas the code below is translated to fast code > > 2) > cdef int i > i=int(x) > i=i+1 This means: Convert x (known to be a C double) to an arbitrary size Python integer value, convert that to a C int and assign it to i, then add 1 and assign the result to i. In the first case, Cython cannot safely assume that the result of the int() conversion will fit into a C int and will therefore evaluate the expression in Python space. Note that your "+1" just happens to be a specific case that looks safe; if you had written "int(x) // 200", this decision would make a lot more sense, because the intermediate result of int(x) could really be larger than a C int, even though the result of the division will have to fit into one (or will be made to fit, because you say so). In the second case, you explicitly tell Cython that the result of the int() conversion will fit into a C int and that *you* accept the responsibility for any overflows, so Cython can safely optimise the Python coercion away and reduce the int() call to a bare C cast from double to int. You can get the same result by writing down the cast yourself. Stefan From d.s.seljebotn at astro.uio.no Tue May 1 10:29:04 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 01 May 2012 10:29:04 +0200 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: <4F9EFABD.5080408@astro.uio.no> <339e72d3-3ef5-44d9-ac89-79dd494cc460@email.android.com> Message-ID: <4F9F9ED0.2010905@astro.uio.no> On 04/30/2012 11:36 PM, William Stein wrote: > On Mon, Apr 30, 2012 at 2:32 PM, Dag Sverre Seljebotn > wrote: >> >> >> Wes McKinney wrote: >> >>> On Mon, Apr 30, 2012 at 4:55 PM, Nathaniel Smith wrote: >>>> On Mon, Apr 30, 2012 at 9:49 PM, Dag Sverre Seljebotn >>>> wrote: >>>>> JIT is really the way to go. It is one thing that a JIT could >>> optimize the >>>>> case where you pass a callback to a function and inline it run-time. >>> But >>>>> even if it doesn't get that fancy, it'd be great to just be able to >>> write >>>>> something like "cython.eval(s)" and have that be compiled (I guess >>> you could >>>>> do that now, but the sheer overhead of the C compiler and all the >>> .so files >>>>> involved means nobody would sanely use that as the main way of >>> stringing >>>>> together something like pandas). >>>> >>>> The overhead of running a fully optimizing compiler over pandas on >>>> every import is pretty high, though. You can come up with various >>>> caching mechanisms, but they all mean introducing some kind of >>> compile >>>> time/run time distinction. So I'm skeptical we'll just be able to get >>>> rid of that concept, even in a brave new LLVM/PyPy/Julia world. >>>> >>>> -- Nathaniel >>>> _______________________________________________ >>>> cython-devel mailing list >>>> cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>> >>> I'd be perfectly OK with just having to compile pandas's "data engine" >>> and generate loads of C/C++ code. JIT-compiling little array >>> expressions would be cool too. I've got enough of an itch that I might >>> have to start scratching pretty soon. >> >> I think a good start is: >> >> Myself I'd look into just using Jinja2 to generate all the Cython code, rather than those horrible Python interpolated strings...that should give you something that's at least rather pleasant for you to work with once you are used to it (even if it is a bit horrible to newcomers to the code base). >> >> You can even check in the generated sources.
>> >> And we've discussed letting cython be smart with templating languages and report errors on a line in the original template; such features will certainly be accepted once somebody codes it up. >> >> (I can give you my breakdown of how I eliminated other templating languages than Jinja2 for this purpose tomorrow if you are interested). > > Can you point us to a good example of you using jinja2 for this purpose? Sure, I just needed some sleep... I only use it for C code, haven't used it for Cython so far (I tend to write things in C and wrap it in Cython). 1) https://github.com/dagss/elemental4py/blob/master/src/elemental_wrapper.cpp.in (work-in-progress) Here I use Jinja2 to write a C wrapper around Elemental (Elemental is a library for dense linear algebra over MPI). The C++ library is a heavy user of templates; I replace the templates with run-time dispatches using if-tests, so that rather than "DistMatrix<double, MC, MR>" you have an elem_matrix struct with ELEM_DOUBLE, ELEM_MC, ELEM_MR. 2) https://github.com/wavemoth/wavemoth/blob/master/src/legendre_transform.c.in This is a numerical kernel where I do loop unrolling etc. using metaprogramming (with Tempita, not Jinja2). 3) https://github.com/wavemoth/wavemoth/blob/cuda/wavemoth/cuda/legendre_transform.cu.in Numerical kernel in templated CUDA using Tempita. On the templating languages I tried: I've scanned through a few and actually tried Tempita, Mako, Jinja2. The features I need: - Pythonic syntax and ability to embed arbitrary Python code - A "call-block", such as this {% call catch('A->grid->ctx') %} BODY {% endcall %} i.e. in Jinja 2, one of the arguments to the function "catch" here is "caller", which when called invokes the body (and can be called multiple times with different arguments) I started out with Tempita because it's so simple to ship, but the lack of a call-block construct + the inability to break lines where I wanted drove me crazy. Then I tried Mako, because it has the largest set of features, but the syntax was simply too gruesome. I first tried to ignore this, but simply couldn't, it made my code totally unreadable. Finally, Jinja2 has most of what I need. Slight disadvantage is it tries to be "pure" and not allow too much arbitrary Python, Ideally what I'd like is something like Tempita but developed further to allow line-breaks and call-blocks, but lacking that I use Jinja2. I don't remember why I didn't like Cheetah (perhaps it doesn't do call-blocks?) Dag From d.s.seljebotn at astro.uio.no Tue May 1 10:32:48 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 01 May 2012 10:32:48 +0200 Subject: [Cython] Wacky idea: proper macros In-Reply-To: <4F9F9ED0.2010905@astro.uio.no> References: <4F9EFABD.5080408@astro.uio.no> <339e72d3-3ef5-44d9-ac89-79dd494cc460@email.android.com> <4F9F9ED0.2010905@astro.uio.no> Message-ID: <4F9F9FB0.7060600@astro.uio.no> On 05/01/2012 10:29 AM, Dag Sverre Seljebotn wrote: > On 04/30/2012 11:36 PM, William Stein wrote: >> On Mon, Apr 30, 2012 at 2:32 PM, Dag Sverre Seljebotn >> wrote: >>> >>> >>> Wes McKinney wrote: >>> >>>> On Mon, Apr 30, 2012 at 4:55 PM, Nathaniel Smith wrote: >>>>> On Mon, Apr 30, 2012 at 9:49 PM, Dag Sverre Seljebotn >>>>> wrote: >>>>>> JIT is really the way to go. It is one thing that a JIT could >>>> optimize the >>>>>> case where you pass a callback to a function and inline it run-time.
>>>> But >>>>>> even if it doesn't get that fancy, it'd be great to just be able to >>>> write >>>>>> something like "cython.eval(s)" and have that be compiled (I guess >>>> you could >>>>>> do that now, but the sheer overhead of the C compiler and all the >>>> .so files >>>>>> involved means nobody would sanely use that as the main way of >>>> stringing >>>>>> together something like pandas). >>>>> >>>>> The overhead of running a fully optimizing compiler over pandas on >>>>> every import is pretty high, though. You can come up with various >>>>> caching mechanisms, but they all mean introducing some kind of >>>> compile >>>>> time/run time distinction. So I'm skeptical we'll just be able to get >>>>> rid of that concept, even in a brave new LLVM/PyPy/Julia world. >>>>> >>>>> -- Nathaniel >>>>> _______________________________________________ >>>>> cython-devel mailing list >>>>> cython-devel at python.org >>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>> >>>> I'd be perfectly OK with just having to compile pandas's "data engine" >>>> and generate loads of C/C++ code. JIT-compiling little array >>>> expressions would be cool too. I've got enough of an itch that I might >>>> have to start scratching pretty soon. >>> >>> I think a good start is: >>> >>> Myself I'd look into just using Jinja2 to generate all the Cython >>> code, rather than those horrible Python interpolated strings...that >>> should give you something that's at least rather pleasant for you to >>> work with once you are used to it (even if it is a bit horrible to >>> newcomers to the code base). >>> >>> You can even check in the generated sources. >>> >>> And we've discussed letting cython be smart with templating languages >>> and report errors on a line in the original template; such features >>> will certainly be accepted once somebody codes it up. >>> >>> (I can give you my breakdown of how I eliminated other templating >>> languages than Jinja2 for this purpose tomorrow if you are interested). >> >> Can you point us to a good example of you using jinja2 for this purpose? > > Sure, I just needed some sleep... > > I only use it for C code, haven't used it for Cython so far (I tend to > write things in C and wrap it in Cython). > > 1) > > https://github.com/dagss/elemental4py/blob/master/src/elemental_wrapper.cpp.in > > > (work-in-progress) Here I use Jinja2 to write a C wrapper around > Elemental (Elemental is a library for dense linear algebra over MPI). > The C++ library is a heavy user of templates; I replace the templates > with run-time dispatches using if-tests, so that rather than > "DistMatrix<double, MC, MR>" you have an elem_matrix struct with > ELEM_DOUBLE, ELEM_MC, ELEM_MR. > > 2) > > https://github.com/wavemoth/wavemoth/blob/master/src/legendre_transform.c.in > > > This is a numerical kernel where I do loop unrolling etc. using > metaprogramming (with Tempita, not Jinja2). > > 3) > > https://github.com/wavemoth/wavemoth/blob/cuda/wavemoth/cuda/legendre_transform.cu.in > > > Numerical kernel in templated CUDA using Tempita. > > On the templating languages I tried: > > I've scanned through a few and actually tried Tempita, Mako, Jinja2. > > The features I need: > > - Pythonic syntax and ability to embed arbitrary Python code > > - A "call-block", such as this > > {% call catch('A->grid->ctx') %} > BODY > {% endcall %} > > > i.e.
in Jinja 2, one of the arguments to the function "catch" here is > "caller", which when called invokes the body (and can be called multiple > times with different arguments) > > I started out with Tempita because it's so simple to ship, but the lack > of a call-block construct + the inability to break lines where I wanted > drove me crazy. > > Then I tried Mako, because it has the largest set of features, but the > syntax was simply too gruesome. I first tried to ignore this, but simply > couldn't, it made my code totally unreadable. > > Finally, Jinja2 has most of what I need. Slight disadvantage is it tries > to be "pure" and not allow too much arbitrary Python, [sorry:] Slight disadvantage is it tries to be "pure" and not allow too much arbitrary Python, but one can work around that by using an auxiliary Python module and pass that Python module to the template when instantiating it -- so one kind of can use arbitrary Python in the templating process, one just needs to edit separate files. (Which is perhaps better -- I'm unable to make up my mind on such trivial issues.) Dag > > Ideally what I'd like is something like Tempita but developed further to > allow line-breaks and call-blocks, but lacking that I use Jinja2. > > I don't remember why I didn't like Cheetah (perhaps it doesn't do > call-blocks?) > > Dag From stefan_ml at behnel.de Tue May 1 11:21:12 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 01 May 2012 11:21:12 +0200 Subject: [Cython] [cython-users] Conditional import in pure Python mode In-Reply-To: References: Message-ID: <4F9FAB08.2050506@behnel.de> >>> On 29 April 2012 01:33, Ian Bell wrote: >>>> idiom like >>>> >>>> if cython.compiled: >>>> cython.import('from libc.math cimport sin') >>>> else: >>>> from math import sin Actually, in this particular case, I would even accept a solution that special cases the "math" module internally by automatically cimporting libc.math as an override (or rather an adapted version as plain "math.pxd"). This CEP describes a general approach: http://wiki.cython.org/enhancements/overlaypythonmodules It's partly outdated, so things may have become easier these days. Stefan From stefan_ml at behnel.de Tue May 1 21:14:34 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 01 May 2012 21:14:34 +0200 Subject: [Cython] [cython-users] Conditional import in pure Python mode In-Reply-To: References: <4F9FAB08.2050506@behnel.de> Message-ID: <4FA0361A.4040005@behnel.de> Ian Bell, 01.05.2012 15:50: > On Tue, May 1, 2012 at 9:21 PM, Stefan Behnel wrote: >>>>> On 29 April 2012 01:33, Ian Bell wrote: >>>>>> idiom like >>>>>> >>>>>> if cython.compiled: >>>>>> cython.import('from libc.math cimport sin') >>>>>> else: >>>>>> from math import sin >> >> Actually, in this particular case, I would even accept a solution that >> special cases the "math" module internally by automatically cimporting >> libc.math as an override (or rather an adapted version as plain >> "math.pxd"). >> >> This CEP describes a general approach: >> >> http://wiki.cython.org/enhancements/overlaypythonmodules >> >> It's partly outdated, so things may have become easier these days. > > That is exactly what I was looking for. If we could implement that, it > would solve all my problems. It would meet all my needs - on this front at > least. There are two things to do here: 1) Write up a math.pxd that contains declarations equivalent to Python's math module.
Note that this may not be entirely trivial because the math module does some error handling and type special casing under the hood. Some of this may still be required for the C level equivalents, although the type special casing would better be done by overriding function signatures using this feature: http://docs.cython.org/src/userguide/external_C_code.html#resolving-naming-conflicts-c-name-specifications Basically, you would declare two (or more) function signatures under the same name, but with different C names. 2) Use math.pxd as an override for the math module. I'm not sure yet how that would best be made to work, but it shouldn't be all that complex. It already works (mostly?) for numpy.pxd, for example, although that's done explicitly in user code. I think we should start with 2) to see how to get this to work in general, before we put too much work into 1). Could you sign up for the cython-devel mailing list please, so that we can coordinate the work there? Stefan From stefan_ml at behnel.de Tue May 1 21:22:21 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 01 May 2012 21:22:21 +0200 Subject: [Cython] Conditional import in pure Python mode In-Reply-To: <4FA0361A.4040005@behnel.de> References: <4F9FAB08.2050506@behnel.de> <4FA0361A.4040005@behnel.de> Message-ID: <4FA037ED.90205@behnel.de> Stefan Behnel, 01.05.2012 21:14: > 2) Use math.pxd as an override for the math module. I'm not sure yet how > that would best be made to work, but it shouldn't be all that complex. It > already works (mostly?) for numpy.pxd, for example, although that's done > explicitly in user code. BTW, I think it would be helpful to make the numpy.pxd cimport automatic as well whenever someone does "import numpy" and friends, right? Stefan From markflorisson88 at gmail.com Tue May 1 21:39:12 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 1 May 2012 20:39:12 +0100 Subject: [Cython] Conditional import in pure Python mode In-Reply-To: <4FA037ED.90205@behnel.de> References: <4F9FAB08.2050506@behnel.de> <4FA0361A.4040005@behnel.de> <4FA037ED.90205@behnel.de> Message-ID: On 1 May 2012 20:22, Stefan Behnel wrote: > Stefan Behnel, 01.05.2012 21:14: >> 2) Use math.pxd as an override for the math module. I'm not sure yet how >> that would best be made to work, but it shouldn't be all that complex. It >> already works (mostly?) for numpy.pxd, for example, although that's done >> explicitly in user code. > > BTW, I think it would be helpful to make the numpy.pxd cimport automatic as > well whenever someone does "import numpy" and friends, right? > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I'm not sure, it means the user has to have numpy development headers. From faltet at pytables.org Tue May 1 21:49:40 2012 From: faltet at pytables.org (Francesc Alted) Date: Tue, 01 May 2012 14:49:40 -0500 Subject: [Cython] Conditional import in pure Python mode In-Reply-To: References: <4F9FAB08.2050506@behnel.de> <4FA0361A.4040005@behnel.de> <4FA037ED.90205@behnel.de> Message-ID: <4FA03E54.1020406@pytables.org> On 5/1/12 2:39 PM, mark florisson wrote: > On 1 May 2012 20:22, Stefan Behnel wrote: >> Stefan Behnel, 01.05.2012 21:14: >>> 2) Use math.pxd as an override for the math module. I'm not sure yet how >>> that would best be made to work, but it shouldn't be all that complex. It >>> already works (mostly?) 
for numpy.pxd, for example, although that's done >>> explicitly in user code. >> BTW, I think it would be helpful to make the numpy.pxd cimport automatic as >> well whenever someone does "import numpy" and friends, right? >> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > I'm not sure, it means the user has to have numpy development headers. But if the user is going to compile a NumPy application, it sounds like strange to me that she should not be required to install the NumPy development headers, right? -- Francesc Alted From stefan_ml at behnel.de Tue May 1 21:51:18 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 01 May 2012 21:51:18 +0200 Subject: [Cython] Conditional import in pure Python mode In-Reply-To: References: <4F9FAB08.2050506@behnel.de> <4FA0361A.4040005@behnel.de> <4FA037ED.90205@behnel.de> Message-ID: <4FA03EB6.6010801@behnel.de> mark florisson, 01.05.2012 21:39: > On 1 May 2012 20:22, Stefan Behnel wrote: >> Stefan Behnel, 01.05.2012 21:14: >>> 2) Use math.pxd as an override for the math module. I'm not sure yet how >>> that would best be made to work, but it shouldn't be all that complex. It >>> already works (mostly?) for numpy.pxd, for example, although that's done >>> explicitly in user code. >> >> BTW, I think it would be helpful to make the numpy.pxd cimport automatic as >> well whenever someone does "import numpy" and friends, right? > > I'm not sure, it means the user has to have numpy development headers. Hmm, right. What about making it an explicit compile time option then? Something like # cython: override_modules = math,numpy Or should we go for an opt-out? # cython: python_modules = math,numpy Sounds like it would hit the more common case by default. Stefan From stefan_ml at behnel.de Tue May 1 22:02:28 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 01 May 2012 22:02:28 +0200 Subject: [Cython] Conditional import in pure Python mode In-Reply-To: <4FA03E54.1020406@pytables.org> References: <4F9FAB08.2050506@behnel.de> <4FA0361A.4040005@behnel.de> <4FA037ED.90205@behnel.de> <4FA03E54.1020406@pytables.org> Message-ID: <4FA04154.5020402@behnel.de> Francesc Alted, 01.05.2012 21:49: > On 5/1/12 2:39 PM, mark florisson wrote: >> On 1 May 2012 20:22, Stefan Behnel wrote: >>> Stefan Behnel, 01.05.2012 21:14: >>>> 2) Use math.pxd as an override for the math module. I'm not sure yet how >>>> that would best be made to work, but it shouldn't be all that complex. It >>>> already works (mostly?) for numpy.pxd, for example, although that's done >>>> explicitly in user code. >>> BTW, I think it would be helpful to make the numpy.pxd cimport automatic as >>> well whenever someone does "import numpy" and friends, right? >>> >> I'm not sure, it means the user has to have numpy development headers. > > But if the user is going to compile a NumPy application, it sounds like > strange to me that she should not be required to install the NumPy > development headers, right? Let's say it's not impossible that someone uses NumPy and Cython without any accelerated C level connection between the two, but it's rather unlikely, given that Cython already has all dependencies that this connection would require as well, except for the NumPy header files. 
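(For context, the explicit pairing in question is what user code has to write today; a sketch, with a hypothetical function name:)

import numpy as np    # Python-level module, resolved at runtime
cimport numpy as np   # C-level declarations; this is what needs the NumPy headers

def total(np.ndarray[np.float64_t, ndim=1] a):
    cdef Py_ssize_t i
    cdef double s = 0
    for i in range(a.shape[0]):
        s += a[i]
    return s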
So I would suggest to make the automatic override the default for any module for which a .pxd file with the same fully qualified name is found in the search path, and to require users to explicitly disable this feature for a given module using a module level (or external) compiler directive if they feel like getting slower code (or working around a bug or whatever). Anyway, given that this feature isn't even implemented yet, it may appear a bit premature to discuss these details. Stefan From robertwb at gmail.com Wed May 2 08:56:08 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Tue, 1 May 2012 23:56:08 -0700 Subject: [Cython] Conditional import in pure Python mode In-Reply-To: <4FA04154.5020402@behnel.de> References: <4F9FAB08.2050506@behnel.de> <4FA0361A.4040005@behnel.de> <4FA037ED.90205@behnel.de> <4FA03E54.1020406@pytables.org> <4FA04154.5020402@behnel.de> Message-ID: On Tue, May 1, 2012 at 1:02 PM, Stefan Behnel wrote: > Francesc Alted, 01.05.2012 21:49: >> On 5/1/12 2:39 PM, mark florisson wrote: >>> On 1 May 2012 20:22, Stefan Behnel wrote: >>>> Stefan Behnel, 01.05.2012 21:14: >>>>> 2) Use math.pxd as an override for the math module. I'm not sure yet how >>>>> that would best be made to work, but it shouldn't be all that complex. It >>>>> already works (mostly?) for numpy.pxd, for example, although that's done >>>>> explicitly in user code. math.pxd would be a bit trickier, as we're trying to shadow python functions with independent c implementations (rather than declaring structure to the single numpy array object and exposing c-level only methods. We'd need to support stuff like double x = ... double y = sin(x) # fast cdef object f = sin # grab the builtin one? but this is by no means insurmountable and could be really useful. >>>> BTW, I think it would be helpful to make the numpy.pxd cimport automatic as >>>> well whenever someone does "import numpy" and friends, right? >>>> >>> I'm not sure, it means the user has to have numpy development headers. >> >> But if the user is going to compile a NumPy application, it sounds like >> strange to me that she should not be required to install the NumPy >> development headers, right? > > Let's say it's not impossible that someone uses NumPy and Cython without > any accelerated C level connection between the two, but it's rather > unlikely, given that Cython already has all dependencies that this > connection would require as well, except for the NumPy header files. > > So I would suggest to make the automatic override the default for any > module for which a .pxd file with the same fully qualified name is found in > the search path, and to require users to explicitly disable this feature > for a given module using a module level (or external) compiler directive if > they feel like getting slower code (or working around a bug or whatever). There is another consideration: this can introduce unnecessary and potentially costly dependencies. For example, in Sage one has sage/rings/integer.pxd. Not everything that imports from this file needs c-level access to the Integer type, and requiring everything that imports from sage.rings.integer to be re-compiled when this file changes would increase the (admittedly already lengthy) re-compile, as well as sucking in a (chain of) un-needed declarations. As Cython becomes more and more common, a similar effect could happen between projects as well. This may be the exception rather than the rule, so perhaps it's not *that* bad to let opt-in be the default, i.e. 
# cython: cimport_on_import = __all__ - Robert From robertwb at gmail.com Wed May 2 09:15:21 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 2 May 2012 00:15:21 -0700 Subject: [Cython] [cython-users] cimport numpy fails with Python 3 semantics In-Reply-To: <4F9E35AF.6050209@behnel.de> References: <15456913.1041.1335634688690.JavaMail.geo-discussion-forums@vbli11> <4F9C3BDC.9040108@behnel.de> <4F9E35AF.6050209@behnel.de> Message-ID: On Sun, Apr 29, 2012 at 11:48 PM, Stefan Behnel wrote: > mark florisson, 28.04.2012 21:57: >> On 28 April 2012 19:50, Stefan Behnel wrote: >>> mark florisson, 28.04.2012 20:33: >>>> I think each module should have its own language level, so I think >>>> that's a bug. I think the rules should be: >>>> >>>> - if passed as command line argument, use that for all cimported >>>> modules, unless they define their own language level through the >>>> directive >>>> - if set as a directive, the language level will apply only to that module >>> >>> That's how it works. We don't run the tests with language level 3 in >>> Jenkins because the majority of the tests is not meant to be run with Py3 >>> semantics. Maybe it's time to add a numpy_cy3 test. >>> >>> If there are more problems than just this (which was a bug in numpy.pxd), >>> we may consider setting language level 2 explicitly in numpy.pxd. >> >> Ah, great. Do we have any documentation for that? > > We do now. ;) > > However, I'm not sure cimported .pxd files should always inherit the > language_level setting. It's somewhat of a grey area because user provided > .pxd files would benefit from it since they likely all use the same > language level as the main module, whereas the Cython shipped (and > otherwise globally installed) .pxd files wouldn't gain anything and could > potentially break. > > I think we may want to keep the current behaviour and set the language > level explicitly in the few shipped .pxd files that are not language level > agnostic (i.e. those that actually contain code). +1, I'm not worried about breaking the ones that ship with Cython, as we can manually specify the language level on those if necessary. This does have implications for automatically cimporting on import when a .pxd is found though, especially if one pulls in .pxd files from another project. - Robert From stefan_ml at behnel.de Wed May 2 09:33:43 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 02 May 2012 09:33:43 +0200 Subject: [Cython] Conditional import in pure Python mode In-Reply-To: References: <4F9FAB08.2050506@behnel.de> <4FA0361A.4040005@behnel.de> <4FA037ED.90205@behnel.de> <4FA03E54.1020406@pytables.org> <4FA04154.5020402@behnel.de> Message-ID: <4FA0E357.1000303@behnel.de> Robert Bradshaw, 02.05.2012 08:56: > On Tue, May 1, 2012 at 1:02 PM, Stefan Behnel wrote: >> Francesc Alted, 01.05.2012 21:49: >>> On 5/1/12 2:39 PM, mark florisson wrote: >>>> On 1 May 2012 20:22, Stefan Behnel wrote: >>>>> Stefan Behnel, 01.05.2012 21:14: >>>>>> 2) Use math.pxd as an override for the math module. I'm not sure yet how >>>>>> that would best be made to work, but it shouldn't be all that complex. It >>>>>> already works (mostly?) for numpy.pxd, for example, although that's done >>>>>> explicitly in user code. > > math.pxd would be a bit trickier, as we're trying to shadow python > functions with independent c implementations (rather than declaring > the structure of the single numpy array object and exposing c-level-only > methods). We'd need to support stuff like > > double x = ...
> double y = sin(x) # fast > cdef object f = sin # grab the builtin one? > > but this is by no means insurmountable and could be really useful. I already did that for the builtin abs() function. Works nicely so far, although not from a .pxd but declared internally in Builtin.py. It's not currently supported for methods (I tried it for one of the builtin types and it seemed to require more work than I wanted to invest at that point), but I don't think we need that here. Module level functions should totally be enough for math.pxd. >>>>> BTW, I think it would be helpful to make the numpy.pxd cimport automatic as >>>>> well whenever someone does "import numpy" and friends, right? >>>>> >>>> I'm not sure, it means the user has to have numpy development headers. >>> >>> But if the user is going to compile a NumPy application, it sounds like >>> strange to me that she should not be required to install the NumPy >>> development headers, right? >> >> Let's say it's not impossible that someone uses NumPy and Cython without >> any accelerated C level connection between the two, but it's rather >> unlikely, given that Cython already has all dependencies that this >> connection would require as well, except for the NumPy header files. >> >> So I would suggest to make the automatic override the default for any >> module for which a .pxd file with the same fully qualified name is found in >> the search path, and to require users to explicitly disable this feature >> for a given module using a module level (or external) compiler directive if >> they feel like getting slower code (or working around a bug or whatever). > > There is another consideration: this can introduce unnecessary and > potentially costly dependencies. For example, in Sage one has > sage/rings/integer.pxd. Not everything that imports from this file > needs c-level access to the Integer type, and requiring everything > that imports from sage.rings.integer to be re-compiled when this file > changes would increase the (admittedly already lengthy) re-compile, as > well as sucking in a (chain of) un-needed declarations. Ah, yes, I see your point. The .pxd may actually introduce substantially more (and different) dependencies than the already compiled .so file. Just because the import happens in Cython code and not in Python code doesn't mean it should do different things. > As Cython > becomes more and more common, a similar effect could happen between > projects as well. Agreed. A compile time C level dependency is much more fragile and version dependent than a Python level import. I can see a lot of cases where this matters. > This may be the exception rather than the rule, so perhaps it's not > *that* bad to let opt-in be the default, i.e. > > # cython: cimport_on_import = __all__ So, you would allow it to receive either a sequence of explicit module names or "__all__" to enable it by default, right? Sounds like a reasonable directive to me. 
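(For illustration, a minimal sketch of the module-level part such a math.pxd might contain -- an assumption-laden sketch with direct libm calls, double-only signatures, and none of the error handling discussed above; it uses the C-name-specification feature linked earlier to keep the C names out of the way:)

cdef extern from "math.h" nogil:
    double c_sin "sin"(double x)
    double c_cos "cos"(double x)
    double c_sqrt "sqrt"(double x)

cdef inline double sin(double x) nogil: return c_sin(x)
cdef inline double cos(double x) nogil: return c_cos(x)
cdef inline double sqrt(double x) nogil: return c_sqrt(x)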
Stefan From stefan_ml at behnel.de Wed May 2 09:36:55 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 02 May 2012 09:36:55 +0200 Subject: [Cython] [cython-users] cimport numpy fails with Python 3 semantics In-Reply-To: References: <15456913.1041.1335634688690.JavaMail.geo-discussion-forums@vbli11> <4F9C3BDC.9040108@behnel.de> <4F9E35AF.6050209@behnel.de> Message-ID: <4FA0E417.70802@behnel.de> Robert Bradshaw, 02.05.2012 09:15: > On Sun, Apr 29, 2012 at 11:48 PM, Stefan Behnel wrote: >> mark florisson, 28.04.2012 21:57: >>> On 28 April 2012 19:50, Stefan Behnel wrote: >>>> mark florisson, 28.04.2012 20:33: >>>>> I think each module should have its own language level, so I think >>>>> that's a bug. I think the rules should be: >>>>> >>>>> - if passed as command line argument, use that for all cimported >>>>> modules, unless they define their own language level through the >>>>> directive >>>>> - if set as a directive, the language level will apply only to that module >>>> >>>> That's how it works. We don't run the tests with language level 3 in >>>> Jenkins because the majority of the tests is not meant to be run with Py3 >>>> semantics. Maybe it's time to add a numpy_cy3 test. >>>> >>>> If there are more problems than just this (which was a bug in numpy.pxd), >>>> we may consider setting language level 2 explicitly in numpy.pxd. >>> >>> Ah, great. Do we have any documentation for that? >> >> We do now. ;) >> >> However, I'm not sure cimported .pxd files should always inherit the >> language_level setting. It's somewhat of a grey area because user provided >> .pxd files would benefit from it since they likely all use the same >> language level as the main module, whereas the Cython shipped (and >> otherwise globally installed) .pxd files wouldn't gain anything and could >> potentially break. >> >> I think we may want to keep the current behaviour and set the language >> level explicitly in the few shipped .pxd files that are not language level >> agnostic (i.e. those that actually contain code). > > +1, I'm not worried about breaking the ones that ship with Cython, as > we can manually specify the language level on those if necessary. This > does have implications for automatically cimporting on import when a > .pxd is found though, especially if one pulls in .pxd files from > another project. Right. Given that some people have started distributing .pxd files for some C libraries on PyPI, we should advertise it in a visible part of the documentation that authors of .pxd files have to take care to define a language level if they depend on it. Stefan From robertwb at gmail.com Wed May 2 09:59:46 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 2 May 2012 00:59:46 -0700 Subject: [Cython] Conditional import in pure Python mode In-Reply-To: <4FA0E357.1000303@behnel.de> References: <4F9FAB08.2050506@behnel.de> <4FA0361A.4040005@behnel.de> <4FA037ED.90205@behnel.de> <4FA03E54.1020406@pytables.org> <4FA04154.5020402@behnel.de> <4FA0E357.1000303@behnel.de> Message-ID: On Wed, May 2, 2012 at 12:33 AM, Stefan Behnel wrote: > Robert Bradshaw, 02.05.2012 08:56: >> On Tue, May 1, 2012 at 1:02 PM, Stefan Behnel wrote: >>> Francesc Alted, 01.05.2012 21:49: >>>> On 5/1/12 2:39 PM, mark florisson wrote: >>>>> On 1 May 2012 20:22, Stefan Behnel wrote: >>>>>> Stefan Behnel, 01.05.2012 21:14: >>>>>>> 2) Use math.pxd as an override for the math module. I'm not sure yet how >>>>>>> that would best be made to work, but it shouldn't be all that complex. It >>>>>>> already works (mostly?)
for numpy.pxd, for example, although that's done >>>>>>> explicitly in user code. >> >> math.pxd would be a bit trickier, as we're trying to shadow python >> functions with independent c implementations (rather than declaring >> structure to the single numpy array object and exposing c-level only >> methods. We'd need to support stuff like >> >> double x = ... >> double y = sin(x) # fast >> cdef object f = sin # grab the builtin one? >> >> but this is by no means insurmountable and could be really useful. > > I already did that for the builtin abs() function. Works nicely so far, > although not from a .pxd but declared internally in Builtin.py. > > It's not currently supported for methods (I tried it for one of the builtin > types and it seemed to require more work than I wanted to invest at that > point), but I don't think we need that here. Module level functions should > totally be enough for math.pxd. Yep. >>>>>> BTW, I think it would be helpful to make the numpy.pxd cimport automatic as >>>>>> well whenever someone does "import numpy" and friends, right? >>>>>> >>>>> I'm not sure, it means the user has to have numpy development headers. >>>> >>>> But if the user is going to compile a NumPy application, it sounds like >>>> strange to me that she should not be required to install the NumPy >>>> development headers, right? >>> >>> Let's say it's not impossible that someone uses NumPy and Cython without >>> any accelerated C level connection between the two, but it's rather >>> unlikely, given that Cython already has all dependencies that this >>> connection would require as well, except for the NumPy header files. >>> >>> So I would suggest to make the automatic override the default for any >>> module for which a .pxd file with the same fully qualified name is found in >>> the search path, and to require users to explicitly disable this feature >>> for a given module using a module level (or external) compiler directive if >>> they feel like getting slower code (or working around a bug or whatever). >> >> There is another consideration: this can introduce unnecessary and >> potentially costly dependencies. For example, in Sage one has >> sage/rings/integer.pxd. Not everything that imports from this file >> needs c-level access to the Integer type, and requiring everything >> that imports from sage.rings.integer to be re-compiled when this file >> changes would increase the (admittedly already lengthy) re-compile, as >> well as sucking in a (chain of) un-needed declarations. > > Ah, yes, I see your point. The .pxd may actually introduce substantially > more (and different) dependencies than the already compiled .so file. Just > because the import happens in Cython code and not in Python code doesn't > mean it should do different things. > > >> As Cython >> becomes more and more common, a similar effect could happen between >> projects as well. > > Agreed. A compile time C level dependency is much more fragile and version > dependent than a Python level import. I can see a lot of cases where this > matters. > > >> This may be the exception rather than the rule, so perhaps it's not >> *that* bad to let opt-in be the default, i.e. >> >> # cython: cimport_on_import = __all__ > > So, you would allow it to receive either a sequence of explicit module > names or "__all__" to enable it by default, right? Sounds like a reasonable > directive to me. Perhaps the default could be "those that ship with Cython" or even some other hand-picked list. 
(In this case, we'd want users to be able to add to and remove from the default set, e.g. # cython: cimport_on_import = +my_module, -math It'd also be nice to be able to specify a package and all its submodules...) - Robert From stefan_ml at behnel.de Wed May 2 15:49:44 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 02 May 2012 15:49:44 +0200 Subject: [Cython] Conditional import in pure Python mode In-Reply-To: <4FA0361A.4040005@behnel.de> References: <4F9FAB08.2050506@behnel.de> <4FA0361A.4040005@behnel.de> Message-ID: <4FA13B78.5070003@behnel.de> Stefan Behnel, 01.05.2012 21:14: > 1) Write up a math.pxd that contains declarations equivalent to Python's > math module. Note that this may not be entirely trivial because the math > module does some error handling and type special casing under the hood. Having taken a slightly deeper look at this now, I think it would make sense to start with this part. An equivalent implementation will need to do the same kind of error handling, so there will be more than just libm function declarations in the file. Inline functions work in .pxd files (although I'm not sure inlining is really a good idea here). > Some of this may still be required for the C level equivalents, although > the type special casing would better be done by overriding function > signatures using this feature: > > http://docs.cython.org/src/userguide/external_C_code.html#resolving-naming-conflicts-c-name-specifications > > Basically, you would declare two (or more) function signatures under the > same name, but with different C names. Given that most functions work on double values and return double, this won't be an issue. The functions that return integers are an issue, though, because Python can easily handle this under the hood by simply returning arbitrary sized integer objects. A C version cannot safely return C integers without risking overflows. We should leave those out entirely for now and just fall back to the normal Python implementation. Ian, could you give the .pxd file a try? You should be able to test this by importing math, followed by a cimport of your new math module. Stefan From ian.h.bell at gmail.com Thu May 3 03:09:08 2012 From: ian.h.bell at gmail.com (Ian Bell) Date: Thu, 3 May 2012 13:09:08 +1200 Subject: [Cython] PXD file for overriding math functions Message-ID: Ok, I think I am missing something - I am a bit lost in the Cython jargon. Here is where I stand, let me know where I am going wrong... I compiled with cython.py -a test.py, but it isn't working. If I am completely not on the right track, feel free to let me know :) Ian

##### test.py (Pure python file) #####
from math import sin
def f(r):
    s=sin(r)
    return s
r=3.141592654/3.0
print f(r)

#### test.pxd #####
cimport cython
import cython
cimport math_override
@cython.locals(s=cython.double)
cpdef f(double r)

#### math_override.pxd #####
cdef extern from "math.h":
    double sin_d "sin" (double x)
    float sin_f "sin" (float x)
cpdef inline double sin(double x):
    return sin_d(x)
cpdef inline float sin(float x):
    return sin_f(x)
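(One concrete problem in the snippet above: Cython does not allow two cpdef functions to be defined under the same name, so the double/float pair cannot both be called sin. A minimal sketch of a variant that compiles, keeping only the double signature:)

#### math_override.pxd (double-only sketch) #####
cdef extern from "math.h":
    double sin_d "sin" (double x)

cpdef inline double sin(double x):
    return sin_d(x)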
From d.s.seljebotn at astro.uio.no Thu May 3 14:24:32 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 03 May 2012 14:24:32 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F9010B0.5080100@astro.uio.no> Message-ID: <4FA27900.80101@astro.uio.no> I'm afraid I'm going to try to kick this thread alive again. I want us to have something that Travis can implement in numba and "his" portion of SciPy, and also that could be used by NumPy devs. Since the decisions are rather arbitrary, perhaps we can try to quickly get to the "+1" stage (or, depending on how things turn out, a tournament starting with at most one proposal per person). On 04/20/2012 09:30 AM, Robert Bradshaw wrote: > On Thu, Apr 19, 2012 at 6:18 AM, Dag Sverre Seljebotn > wrote: >> On 04/19/2012 01:20 PM, Nathaniel Smith wrote: >>> >>> On Thu, Apr 19, 2012 at 11:56 AM, Dag Sverre Seljebotn >>> wrote: >>>> >>>> I thought of some drawbacks of getfuncptr: >>>> >>>> - Important: Doesn't allow you to actually inspect the supported >>>> signatures, which is needed (or at least convenient) if you want to use >>>> an >>>> FFI library or do some JIT-ing. So an iteration mechanism is still needed >>>> in >>>> addition, meaning the number of things for the object to implement grows >>>> a >>>> bit large. Default implementations help -- OTOH there really wasn't a >>>> major >>>> drawback with the table approach as long as JIT's can just replace it? >>> >>> >>> But this is orthogonal to the table vs. getfuncptr discussion. We're >>> assuming that the table might be extended at runtime, which means you >>> can't use it to determine which signatures are supported. So we need >>> some sort of extra interface for the caller and callee to negotiate a >>> type anyway. (I'm intentionally agnostic about whether it makes more >>> sense for the caller or the callee to be doing the iterating... in >>> general type negotiation could be quite complicated, and I don't think >>> we know enough to get that interface right yet.) >> >> >> Hmm. Right. Let's define an explicit goal for the CEP then. >> >> What I care about is getting the spec right enough such that, e.g., NumPy >> and SciPy, and other (mostly manually written) C extensions with slow >> development pace, can be forward-compatible with whatever crazy things >> Cython or Numba does. >> >> There's 4 cases: >> >> 1) JIT calls JIT (ruled out straight away) >> >> 2) JIT calls static: Say that Numba wants to optimize calls to np.sin etc. >> without special-casing; this seems to require reading a table of static >> signatures >> >> 3) Static calls JIT: This is the case when scipy.integrate routines call a >> Numba callback and Numba generates a specialization for the dtype they >> explicitly need. This calls for getfuncptr (but perhaps in a form which we >> can't quite determine yet?). >> >> 4) Static calls static: Either table or getfuncptr works. >> >> My gut feeling is go for 2) and 4) in this round => table. > > getfuncptr is really simple and flexible, but I'm with you on both of > these two points, and the overhead was not trivial. It's interesting to hear you say the overhead was not trivial (that was my hunch too but I sort of yielded to peer pressure). I think SAGE has some history with this -- isn't one of the reasons for the "cpdef" vs.
"cdef" split that "cpdef" has the cost of a single lookup for the presence of a __dict__ on the object, which was an unacceptable penalty for parts of Sage? That can't have been much more than a 1ns penalty per instance. > Of course we could offer both, i.e. look at the table first, if it's > not there call getfuncptr if it's non-null, then fall back to "slow" > call or error. These are all opt-in depending on how hard you want to > try to optimize things. That's actually exactly what I was envisioning -- in time (with JITs on both ends) the table could act sort of as a cache for commonly used overloads, and getfuncptr would access the others more slowly. > As far as keys vs. interning, I'm also tempted to try to have my cake > and eat it too. Define a space-friendly encoding for signatures and > require interning for anything that doesn't fit into a single > sizeof(void*). The fact that this cutoff would vary for 32 vs 64-bit > would require some care, but could be done with macros in C. If the > signatures produce non-aligned "pointer" values there won't be any > collisions, and this way libraries only have to share in the global > (Python-level?) interning scheme iff they want to expose/use "large" > signatures. That was the approach I described to Nathaniel as having the "worst features of both" -- lack of readable gdb dumps of the keys, and having to define an interning mechanism for use by the 5% cases that don't fit. To sum up hat's been said earlier: The only thing that would blow the key size above 64 bits except very many arguments would be things like classes/interfaces/vtables. But in that case, reasonable-sized keys for the vtables can be computed (whether by interning, cryptographic hashing, or a GUID like Microsoft COM). So I'm still +1 on my proposal; but I would be happy with an intern-based proposal if somebody bothers to flesh it out a bit (I don't quite know how I'd do it and would get lost in PyObject* vs. char* and cross-language state sharing...). My proposal in summary: - Table with variable-sized entries (not getfuncptr, not interning) that can be scanned by the caller in 128-bit increments. - Only use 64 bit pointers, in order to keep table format the same on 32 bit and 64 bit. - Do encoding of the signature strings. Utility functions to work with this (both to scan tables and encode/decode a format string) will be provided as C code by the CEP that can be bundled. Pros: - Table format is not specific to Python world (it makes as much sense to use, e.g., internally in Julia) - No state needs to be shared between packages run-time (they can use the bundled C code in isolation if they wish) - No need for an interning machinery - More easily compatible with multiple interpreter states (?) - Minor performance benefit of table over getfuncptr (intern vs. key didn't matter). [Cue comment that this doesn't matter.] Cons: - Lack of instant low-level debuggability, like in the interned case (a human needs to run a function on the key constant to see what it corresponds to) - Not as extendable as getfuncptr (though currently we don't quite know how we would extend it, and it's easy to add getfuncptr in the future) Notes: - When extended to handle vtable argument types, these still needs to be some interning or crypto-hashing. But that is likely to come up anyway as part of a COM-like queryInterface protocol, and at that point we will be better at making those decisions and design a good interning mechanism. 
Dag From robertwb at gmail.com Thu May 3 22:18:03 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 3 May 2012 13:18:03 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4FA27900.80101@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F9010B0.5080100@astro.uio.no> <4FA27900.80101@astro.uio.no> Message-ID: On Thu, May 3, 2012 at 5:24 AM, Dag Sverre Seljebotn wrote: > I'm afraid I'm going to try to kick this thread alive again. I want us to > have something that Travis can implement in numba and "his" portion of > SciPy, and also that could be used by NumPy devs. That's great, I'd like to get things moving forward on this. > Since the decisions are rather arbitrary, perhaps we can try to quickly get > to the "+1" stage (or, depending on how things turn out, a tournament > starting with at most one proposal per person). > > > On 04/20/2012 09:30 AM, Robert Bradshaw wrote: >> >> On Thu, Apr 19, 2012 at 6:18 AM, Dag Sverre Seljebotn >> wrote: >>> >>> On 04/19/2012 01:20 PM, Nathaniel Smith wrote: >>>> >>>> >>>> On Thu, Apr 19, 2012 at 11:56 AM, Dag Sverre Seljebotn >>>> wrote: >>>>> >>>>> >>>>> I thought of some drawbacks of getfuncptr: >>>>> >>>>> - Important: Doesn't allow you to actually inspect the supported >>>>> signatures, which is needed (or at least convenient) if you want to use >>>>> an >>>>> FFI library or do some JIT-ing. So an iteration mechanism is still >>>>> needed >>>>> in >>>>> addition, meaning the number of things for the object to implement >>>>> grows >>>>> a >>>>> bit large. Default implementations help -- OTOH there really wasn't a >>>>> major >>>>> drawback with the table approach as long as JIT's can just replace it? >>>> >>>> >>>> >>>> But this is orthogonal to the table vs. getfuncptr discussion. We're >>>> assuming that the table might be extended at runtime, which means you >>>> can't use it to determine which signatures are supported. So we need >>>> some sort of extra interface for the caller and callee to negotiate a >>>> type anyway. (I'm intentionally agnostic about whether it makes more >>>> sense for the caller or the callee to be doing the iterating... in >>>> general type negotiation could be quite complicated, and I don't think >>>> we know enough to get that interface right yet.) >>> >>> >>> >>> Hmm. Right. Let's define an explicit goal for the CEP then. >>> >>> What I care about is getting the spec right enough such that, e.g., >>> NumPy >>> and SciPy, and other (mostly manually written) C extensions with slow >>> development pace, can be forward-compatible with whatever crazy things >>> Cython or Numba does. >>> >>> There's 4 cases: >>> >>> 1) JIT calls JIT (ruled out straight away) >>> >>> 2) JIT calls static: Say that Numba wants to optimize calls to np.sin >>> etc. >>> without special-casing; this seems to require reading a table of static >>> signatures >>> >>> 3) Static calls JIT: This is the case when scipy.integrate routines >>> call a >>> Numba callback and Numba generates a specialization for the dtype they >>> explicitly need. This calls for getfuncptr (but perhaps in a form which >>> we >>> can't quite determine yet?). >>> >>> 4) Static calls static: Either table or getfuncptr works. >>> >>> My gut feeling is go for 2) and 4) in this round => table.
>> >> >> getfuncptr is really simple and flexible, but I'm with you on both of >> these to points, and the overhead was not trivial. > > > It's interesting to hear you say the overhead was not trivial (that was my > hunch too but I sort of yielded to peer pressure). I think SAGE has some > history with this -- isn't one of the reasons for the "cpdef" vs. "cdef" > split that "cpdef" has the cost of a single lookup for the presence of a > __dict__ on the object, which was an unacceptable penalty for parts of Sage? > That can't have been much more than a 1ns penalty per instance. It's mostly historical, as a lot of Sage was written before cpdef existed (and people following this pattern after the fact). There are also some cases where cdef is used because the "leaf" classes are often in Python but have no need to override the given method, and an actual dictionary lookup would be required otherwise (e.g. in the coercion model). >> Of course we could offer both, i.e. look at the table first, if it's >> not there call getfuncptr if it's non-null, then fall back to "slow" >> call or error. These are all opt-in depending on how hard you want to >> try to optimize things. > > > That's actually exactly what I was envisioning -- in time (with JITs on both > ends) the table could act sort of as a cache for commonly used overloads, > and getfuncptr would access the others more slowly. OK, then +1 >> As far as keys vs. interning, I'm also tempted to try to have my cake >> and eat it too. Define a space-friendly encoding for signatures and >> require interning for anything that doesn't fit into a single >> sizeof(void*). The fact that this cutoff would vary for 32 vs 64-bit >> would require some care, but could be done with macros in C. If the >> signatures produce non-aligned "pointer" values there won't be any >> collisions, and this way libraries only have to share in the global >> (Python-level?) interning scheme iff they want to expose/use "large" >> signatures. > > > That was the approach I described to Nathaniel as having the "worst features > of both" -- lack of readable gdb dumps of the keys, and having to define an > interning mechanism for use by the 5% cases that don't fit. Yes, it has the best and worst features of both. > To sum up hat's been said earlier: The only thing that would blow the key > size above 64 bits except very many arguments would be things like > classes/interfaces/vtables. But in that case, reasonable-sized keys for the > vtables can be computed (whether by interning, cryptographic hashing, or a > GUID like Microsoft COM). > > So I'm still +1 on my proposal; but I would be happy with an intern-based > proposal if somebody bothers to flesh it out a bit (I don't quite know how > I'd do it and would get lost in PyObject* vs. char* and cross-language state > sharing...). > > My proposal in summary: > > ?- Table with variable-sized entries (not getfuncptr, not interning) that > can be scanned by the caller in 128-bit increments. > > ?- Only use 64 bit pointers, in order to keep table format the same on 32 > bit and 64 bit. > > ?- Do encoding of the signature strings. Utility functions to work with this > (both to scan tables and encode/decode a format string) will be provided as > C code by the CEP that can be bundled. 
>
> Pros:
>
>  - Table format is not specific to Python world (it makes as much sense to
> use, e.g., internally in Julia)
>
>  - No state needs to be shared between packages at run-time (they can use the
> bundled C code in isolation if they wish)
>
>  - No need for an interning machinery
>
>  - More easily compatible with multiple interpreter states (?)
>
>  - Minor performance benefit of table over getfuncptr (intern vs. key didn't
> matter). [Cue comment that this doesn't matter.]
>
> Cons:
>
>  - Lack of instant low-level debuggability, like in the interned case (a
> human needs to run a function on the key constant to see what it corresponds
> to)
>
>  - Not as extendable as getfuncptr (though currently we don't quite know how
> we would extend it, and it's easy to add getfuncptr in the future)
>
> Notes:
>
>  - When extended to handle vtable argument types, there still needs to be
> some interning or crypto-hashing. But that is likely to come up anyway as
> part of a COM-like queryInterface protocol, and at that point we will be
> better at making those decisions and design a good interning mechanism.

+1 to going with this, with the following suggestions for future
interoperability:

1) Even if we don't flesh out getfuncptr at this point, let's leave a slot
in the spec for it which must be set to NULL.

2) Let's define the encoding to emit odd first words, to allow using
(aligned) pointers in some future interning extension without worrying
about collision. This could be used to prevent matching on the 2n+1th
words as well when scanning the table.

- Robert

From ian.h.bell at gmail.com  Sat May  5 11:55:38 2012
From: ian.h.bell at gmail.com (Ian Bell)
Date: Sat, 5 May 2012 21:55:38 +1200
Subject: [Cython] Suggestion of adding working examples to website
Message-ID:

One "feature" that matplotlib (Python's 2D plotting library) has which makes it easy to jump into matplotlib is the huge section of working examples: http://matplotlib.sourceforge.net/examples/index.html and http://matplotlib.sourceforge.net/gallery.html . From this, within a couple of days you can get minimally proficient with matplotlib.

Having been (and continuing to be) a new user of Cython, I have found the learning curve to be very steep. The documentation online is pretty good (though it could use some work in places). Sometimes all it would take would be some working examples and the documentation would be completely clear. I taught myself the use of matplotlib through the old cut&paste and iterate method.

I find that the one thing that is consistently the most challenging about the Cython docs is the lack of distutils setup.py files for more interesting configurations. Without them it requires a certain amount of guessing, playing, and Googling to make sense of how the pieces are supposed to go together. A working examples section could be VERY helpful in this regard. Also, the options for the distutils extensions are not documented at all so far as I can tell. Since the docs were built with Sphinx, it ought to be pretty easy to pull in docstrings if they exist.

Just my 2 cents. I would be happy to work with you all to compile some simple examples for common uses - like the numpy convolve example for instance, and the integration example as well.

Regards,
Ian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From markflorisson88 at gmail.com Sat May 5 13:08:59 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 5 May 2012 12:08:59 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4FA27900.80101@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F9010B0.5080100@astro.uio.no> <4FA27900.80101@astro.uio.no> Message-ID: On 3 May 2012 13:24, Dag Sverre Seljebotn wrote: > I'm afraid I'm going to try to kick this thread alive again. I want us to > have something that Travis can implement in numba and "his" portion of > SciPy, and also that could be used by NumPy devs. > > Since the decisions are rather arbitrary, perhaps we can try to quickly get > to the "+1" stage (or, depending on how things turn out, a tournament > starting with at most one proposal per person). > > > On 04/20/2012 09:30 AM, Robert Bradshaw wrote: >> >> On Thu, Apr 19, 2012 at 6:18 AM, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 04/19/2012 01:20 PM, Nathaniel Smith wrote: >>>> >>>> >>>> On Thu, Apr 19, 2012 at 11:56 AM, Dag Sverre Seljebotn >>>> ? ?wrote: >>>>> >>>>> >>>>> I thought of some drawbacks of getfuncptr: >>>>> >>>>> ?- Important: Doesn't allow you to actually inspect the supported >>>>> signatures, which is needed (or at least convenient) if you want to use >>>>> an >>>>> FFI library or do some JIT-ing. So an iteration mechanism is still >>>>> needed >>>>> in >>>>> addition, meaning the number of things for the object to implement >>>>> grows >>>>> a >>>>> bit large. Default implementations help -- OTOH there really wasn't a >>>>> major >>>>> drawback with the table approach as long as JIT's can just replace it? >>>> >>>> >>>> >>>> But this is orthogonal to the table vs. getfuncptr discussion. We're >>>> assuming that the table might be extended at runtime, which means you >>>> can't use it to determine which signatures are supported. So we need >>>> some sort of extra interface for the caller and callee to negotiate a >>>> type anyway. (I'm intentionally agnostic about whether it makes more >>>> sense for the caller or the callee to be doing the iterating... in >>>> general type negotiation could be quite complicated, and I don't think >>>> we know enough to get that interface right yet.) >>> >>> >>> >>> Hmm. Right. Let's define an explicit goal for the CEP then. >>> >>> What I care about at is getting the spec right enough such that, e.g., >>> NumPy >>> and SciPy, and other (mostly manually written) C extensions with slow >>> development pace, can be forward-compatible with whatever crazy things >>> Cython or Numba does. >>> >>> There's 4 cases: >>> >>> ?1) JIT calls JIT (ruled out straight away) >>> >>> ?2) JIT calls static: Say that Numba wants to optimize calls to np.sin >>> etc. >>> without special-casing; this seem to require reading a table of static >>> signatures >>> >>> ?3) Static calls JIT: This is the case when scipy.integrate routines >>> calls a >>> Numba callback and Numba generates a specialization for the dtype they >>> explicitly needs. This calls for getfuncptr (but perhaps in a form which >>> we >>> can't quite determine yet?). >>> >>> ?4) Static calls static: Either table or getfuncptr works. >>> >>> My gut feeling is go for 2) and 4) in this round => ?table. >> >> >> getfuncptr is really simple and flexible, but I'm with you on both of >> these to points, and the overhead was not trivial. 
> > > It's interesting to hear you say the overhead was not trivial (that was my > hunch too but I sort of yielded to peer pressure). I think SAGE has some > history with this -- isn't one of the reasons for the "cpdef" vs. "cdef" > split that "cpdef" has the cost of a single lookup for the presence of a > __dict__ on the object, which was an unacceptable penalty for parts of Sage? > That can't have been much more than a 1ns penalty per instance. > > >> Of course we could offer both, i.e. look at the table first, if it's >> not there call getfuncptr if it's non-null, then fall back to "slow" >> call or error. These are all opt-in depending on how hard you want to >> try to optimize things. > > > That's actually exactly what I was envisioning -- in time (with JITs on both > ends) the table could act sort of as a cache for commonly used overloads, > and getfuncptr would access the others more slowly. > > >> As far as keys vs. interning, I'm also tempted to try to have my cake >> and eat it too. Define a space-friendly encoding for signatures and >> require interning for anything that doesn't fit into a single >> sizeof(void*). The fact that this cutoff would vary for 32 vs 64-bit >> would require some care, but could be done with macros in C. If the >> signatures produce non-aligned "pointer" values there won't be any >> collisions, and this way libraries only have to share in the global >> (Python-level?) interning scheme iff they want to expose/use "large" >> signatures. > > > That was the approach I described to Nathaniel as having the "worst features > of both" -- lack of readable gdb dumps of the keys, and having to define an > interning mechanism for use by the 5% cases that don't fit. > > To sum up hat's been said earlier: The only thing that would blow the key > size above 64 bits except very many arguments would be things like > classes/interfaces/vtables. But in that case, reasonable-sized keys for the > vtables can be computed (whether by interning, cryptographic hashing, or a > GUID like Microsoft COM). > > So I'm still +1 on my proposal; but I would be happy with an intern-based > proposal if somebody bothers to flesh it out a bit (I don't quite know how > I'd do it and would get lost in PyObject* vs. char* and cross-language state > sharing...). > > My proposal in summary: > > ?- Table with variable-sized entries (not getfuncptr, not interning) that > can be scanned by the caller in 128-bit increments. Hm, so the caller knows what kind of key it needs to compare to, so if it has a 64 bits key then it won't need to compare 128 bits (padded with zeroes?). But if it doesn't compare 128 bits, then it means 128 bit keys cannot have 64 bit keys as prefix. Will that be a problem, or would it make sense to make the first entry a pointer pointing to 128 bit keys, and the rest are all 64 bit keys (or even 32 bit keys and two pointers)? e.g. a contiguous list of [128 bit key/pointer list-pointer, 64-bit keys & func pointers, 128 bit keys & func pointers, NULL] Even with a naive encoding scheme you could encode 3 scalar arguments and a return value in 32 bits (e.g. 'dddd'). That might be better on x86? > ?- Only use 64 bit pointers, in order to keep table format the same on 32 > bit and 64 bit. Pointer to the function? I think that would only be harder to use than native pointers? > ?- Do encoding of the signature strings. Utility functions to work with this > (both to scan tables and encode/decode a format string) will be provided as > C code by the CEP that can be bundled. 
> > Pros: > > ?- Table format is not specific to Python world (it makes as much sense to > use, e.g., internally in Julia) > > ?- No state needs to be shared between packages run-time (they can use the > bundled C code in isolation if they wish) > > ?- No need for an interning machinery > > ?- More easily compatible with multiple interpreter states (?) > > ?- Minor performance benefit of table over getfuncptr (intern vs. key didn't > matter). [Cue comment that this doesn't matter.] > > Cons: > > ?- Lack of instant low-level debuggability, like in the interned case (a > human needs to run a function on the key constant to see what it corresponds > to) > > ?- Not as extendable as getfuncptr (though currently we don't quite know how > we would extend it, and it's easy to add getfuncptr in the future) > > Notes: > > ?- When extended to handle vtable argument types, these still needs to be > some interning or crypto-hashing. But that is likely to come up anyway as > part of a COM-like queryInterface protocol, and at that point we will be > better at making those decisions and design a good interning mechanism. > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Sat May 5 18:27:11 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 05 May 2012 18:27:11 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F9010B0.5080100@astro.uio.no> <4FA27900.80101@astro.uio.no> Message-ID: <4FA554DF.9040702@astro.uio.no> On 05/05/2012 01:08 PM, mark florisson wrote: > On 3 May 2012 13:24, Dag Sverre Seljebotn wrote: >> I'm afraid I'm going to try to kick this thread alive again. I want us to >> have something that Travis can implement in numba and "his" portion of >> SciPy, and also that could be used by NumPy devs. >> >> Since the decisions are rather arbitrary, perhaps we can try to quickly get >> to the "+1" stage (or, depending on how things turn out, a tournament >> starting with at most one proposal per person). >> >> >> On 04/20/2012 09:30 AM, Robert Bradshaw wrote: >>> >>> On Thu, Apr 19, 2012 at 6:18 AM, Dag Sverre Seljebotn >>> wrote: >>>> >>>> On 04/19/2012 01:20 PM, Nathaniel Smith wrote: >>>>> >>>>> >>>>> On Thu, Apr 19, 2012 at 11:56 AM, Dag Sverre Seljebotn >>>>> wrote: >>>>>> >>>>>> >>>>>> I thought of some drawbacks of getfuncptr: >>>>>> >>>>>> - Important: Doesn't allow you to actually inspect the supported >>>>>> signatures, which is needed (or at least convenient) if you want to use >>>>>> an >>>>>> FFI library or do some JIT-ing. So an iteration mechanism is still >>>>>> needed >>>>>> in >>>>>> addition, meaning the number of things for the object to implement >>>>>> grows >>>>>> a >>>>>> bit large. Default implementations help -- OTOH there really wasn't a >>>>>> major >>>>>> drawback with the table approach as long as JIT's can just replace it? >>>>> >>>>> >>>>> >>>>> But this is orthogonal to the table vs. getfuncptr discussion. We're >>>>> assuming that the table might be extended at runtime, which means you >>>>> can't use it to determine which signatures are supported. So we need >>>>> some sort of extra interface for the caller and callee to negotiate a >>>>> type anyway. 
(I'm intentionally agnostic about whether it makes more >>>>> sense for the caller or the callee to be doing the iterating... in >>>>> general type negotiation could be quite complicated, and I don't think >>>>> we know enough to get that interface right yet.) >>>> >>>> >>>> >>>> Hmm. Right. Let's define an explicit goal for the CEP then. >>>> >>>> What I care about at is getting the spec right enough such that, e.g., >>>> NumPy >>>> and SciPy, and other (mostly manually written) C extensions with slow >>>> development pace, can be forward-compatible with whatever crazy things >>>> Cython or Numba does. >>>> >>>> There's 4 cases: >>>> >>>> 1) JIT calls JIT (ruled out straight away) >>>> >>>> 2) JIT calls static: Say that Numba wants to optimize calls to np.sin >>>> etc. >>>> without special-casing; this seem to require reading a table of static >>>> signatures >>>> >>>> 3) Static calls JIT: This is the case when scipy.integrate routines >>>> calls a >>>> Numba callback and Numba generates a specialization for the dtype they >>>> explicitly needs. This calls for getfuncptr (but perhaps in a form which >>>> we >>>> can't quite determine yet?). >>>> >>>> 4) Static calls static: Either table or getfuncptr works. >>>> >>>> My gut feeling is go for 2) and 4) in this round => table. >>> >>> >>> getfuncptr is really simple and flexible, but I'm with you on both of >>> these to points, and the overhead was not trivial. >> >> >> It's interesting to hear you say the overhead was not trivial (that was my >> hunch too but I sort of yielded to peer pressure). I think SAGE has some >> history with this -- isn't one of the reasons for the "cpdef" vs. "cdef" >> split that "cpdef" has the cost of a single lookup for the presence of a >> __dict__ on the object, which was an unacceptable penalty for parts of Sage? >> That can't have been much more than a 1ns penalty per instance. >> >> >>> Of course we could offer both, i.e. look at the table first, if it's >>> not there call getfuncptr if it's non-null, then fall back to "slow" >>> call or error. These are all opt-in depending on how hard you want to >>> try to optimize things. >> >> >> That's actually exactly what I was envisioning -- in time (with JITs on both >> ends) the table could act sort of as a cache for commonly used overloads, >> and getfuncptr would access the others more slowly. >> >> >>> As far as keys vs. interning, I'm also tempted to try to have my cake >>> and eat it too. Define a space-friendly encoding for signatures and >>> require interning for anything that doesn't fit into a single >>> sizeof(void*). The fact that this cutoff would vary for 32 vs 64-bit >>> would require some care, but could be done with macros in C. If the >>> signatures produce non-aligned "pointer" values there won't be any >>> collisions, and this way libraries only have to share in the global >>> (Python-level?) interning scheme iff they want to expose/use "large" >>> signatures. >> >> >> That was the approach I described to Nathaniel as having the "worst features >> of both" -- lack of readable gdb dumps of the keys, and having to define an >> interning mechanism for use by the 5% cases that don't fit. >> >> To sum up hat's been said earlier: The only thing that would blow the key >> size above 64 bits except very many arguments would be things like >> classes/interfaces/vtables. But in that case, reasonable-sized keys for the >> vtables can be computed (whether by interning, cryptographic hashing, or a >> GUID like Microsoft COM). 
>> >> So I'm still +1 on my proposal; but I would be happy with an intern-based >> proposal if somebody bothers to flesh it out a bit (I don't quite know how >> I'd do it and would get lost in PyObject* vs. char* and cross-language state >> sharing...). >> >> My proposal in summary: >> >> - Table with variable-sized entries (not getfuncptr, not interning) that >> can be scanned by the caller in 128-bit increments. > > Hm, so the caller knows what kind of key it needs to compare to, so if > it has a 64 bits key then it won't need to compare 128 bits (padded > with zeroes?). But if it doesn't compare 128 bits, then it means 128 > bit keys cannot have 64 bit keys as prefix. Will that be a problem, or Did you read the CEP? I also clarified this in a post in response to Nathaniel. The idea is that the scanner doesn't need to branch on the key-length anywhere. This requires a) making each key n*64 bits long where n is odd => function pointers are always at (m*128 + 64) bits from the start for some non-negative integer m, b) insert some protective prefix for every 128 bits in the key. > would it make sense to make the first entry a pointer pointing to 128 > bit keys, and the rest are all 64 bit keys (or even 32 bit keys and > two pointers)? e.g. a contiguous list of [128 bit key/pointer > list-pointer, 64-bit keys& func pointers, 128 bit keys& func > pointers, NULL] I don't really understand this description, but in general I'm sceptical about the pipelining abilities of pointer-chasing code. It may be OK, but it would require a benchmark, and if there's not a reason to have it... > Even with a naive encoding scheme you could encode 3 scalar arguments > and a return value in 32 bits (e.g. 'dddd'). That might be better on > x86? Me and Robert have been assuming some non-ASCII encoding that would allow many more arguments in 64 bits. > >> - Only use 64 bit pointers, in order to keep table format the same on 32 >> bit and 64 bit. > > Pointer to the function? I think that would only be harder to use than > native pointers? That was to make the multiple-of-128-bit-entry idea work without having to require that keys are different between 32 bits and 64 bits platforms. Dag >> - Do encoding of the signature strings. Utility functions to work with this >> (both to scan tables and encode/decode a format string) will be provided as >> C code by the CEP that can be bundled. >> >> Pros: >> >> - Table format is not specific to Python world (it makes as much sense to >> use, e.g., internally in Julia) >> >> - No state needs to be shared between packages run-time (they can use the >> bundled C code in isolation if they wish) >> >> - No need for an interning machinery >> >> - More easily compatible with multiple interpreter states (?) >> >> - Minor performance benefit of table over getfuncptr (intern vs. key didn't >> matter). [Cue comment that this doesn't matter.] >> >> Cons: >> >> - Lack of instant low-level debuggability, like in the interned case (a >> human needs to run a function on the key constant to see what it corresponds >> to) >> >> - Not as extendable as getfuncptr (though currently we don't quite know how >> we would extend it, and it's easy to add getfuncptr in the future) >> >> Notes: >> >> - When extended to handle vtable argument types, these still needs to be >> some interning or crypto-hashing. But that is likely to come up anyway as >> part of a COM-like queryInterface protocol, and at that point we will be >> better at making those decisions and design a good interning mechanism. 
>> >> Dag >> >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Sat May 5 19:13:04 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 5 May 2012 18:13:04 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4FA554DF.9040702@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F9010B0.5080100@astro.uio.no> <4FA27900.80101@astro.uio.no> <4FA554DF.9040702@astro.uio.no> Message-ID: On 5 May 2012 17:27, Dag Sverre Seljebotn wrote: > On 05/05/2012 01:08 PM, mark florisson wrote: >> >> On 3 May 2012 13:24, Dag Sverre Seljebotn >> ?wrote: >>> >>> I'm afraid I'm going to try to kick this thread alive again. I want us to >>> have something that Travis can implement in numba and "his" portion of >>> SciPy, and also that could be used by NumPy devs. >>> >>> Since the decisions are rather arbitrary, perhaps we can try to quickly >>> get >>> to the "+1" stage (or, depending on how things turn out, a tournament >>> starting with at most one proposal per person). >>> >>> >>> On 04/20/2012 09:30 AM, Robert Bradshaw wrote: >>>> >>>> >>>> On Thu, Apr 19, 2012 at 6:18 AM, Dag Sverre Seljebotn >>>> ? ?wrote: >>>>> >>>>> >>>>> On 04/19/2012 01:20 PM, Nathaniel Smith wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Apr 19, 2012 at 11:56 AM, Dag Sverre Seljebotn >>>>>> ? ? ?wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> I thought of some drawbacks of getfuncptr: >>>>>>> >>>>>>> ?- Important: Doesn't allow you to actually inspect the supported >>>>>>> signatures, which is needed (or at least convenient) if you want to >>>>>>> use >>>>>>> an >>>>>>> FFI library or do some JIT-ing. So an iteration mechanism is still >>>>>>> needed >>>>>>> in >>>>>>> addition, meaning the number of things for the object to implement >>>>>>> grows >>>>>>> a >>>>>>> bit large. Default implementations help -- OTOH there really wasn't a >>>>>>> major >>>>>>> drawback with the table approach as long as JIT's can just replace >>>>>>> it? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> But this is orthogonal to the table vs. getfuncptr discussion. We're >>>>>> assuming that the table might be extended at runtime, which means you >>>>>> can't use it to determine which signatures are supported. So we need >>>>>> some sort of extra interface for the caller and callee to negotiate a >>>>>> type anyway. (I'm intentionally agnostic about whether it makes more >>>>>> sense for the caller or the callee to be doing the iterating... in >>>>>> general type negotiation could be quite complicated, and I don't think >>>>>> we know enough to get that interface right yet.) >>>>> >>>>> >>>>> >>>>> >>>>> Hmm. Right. Let's define an explicit goal for the CEP then. >>>>> >>>>> What I care about at is getting the spec right enough such that, e.g., >>>>> NumPy >>>>> and SciPy, and other (mostly manually written) C extensions with slow >>>>> development pace, can be forward-compatible with whatever crazy things >>>>> Cython or Numba does. 
>>>>> >>>>> There's 4 cases: >>>>> >>>>> ?1) JIT calls JIT (ruled out straight away) >>>>> >>>>> ?2) JIT calls static: Say that Numba wants to optimize calls to np.sin >>>>> etc. >>>>> without special-casing; this seem to require reading a table of static >>>>> signatures >>>>> >>>>> ?3) Static calls JIT: This is the case when scipy.integrate routines >>>>> calls a >>>>> Numba callback and Numba generates a specialization for the dtype they >>>>> explicitly needs. This calls for getfuncptr (but perhaps in a form >>>>> which >>>>> we >>>>> can't quite determine yet?). >>>>> >>>>> ?4) Static calls static: Either table or getfuncptr works. >>>>> >>>>> My gut feeling is go for 2) and 4) in this round => ? ?table. >>>> >>>> >>>> >>>> getfuncptr is really simple and flexible, but I'm with you on both of >>>> these to points, and the overhead was not trivial. >>> >>> >>> >>> It's interesting to hear you say the overhead was not trivial (that was >>> my >>> hunch too but I sort of yielded to peer pressure). I think SAGE has some >>> history with this -- isn't one of the reasons for the "cpdef" vs. "cdef" >>> split that "cpdef" has the cost of a single lookup for the presence of a >>> __dict__ on the object, which was an unacceptable penalty for parts of >>> Sage? >>> That can't have been much more than a 1ns penalty per instance. >>> >>> >>>> Of course we could offer both, i.e. look at the table first, if it's >>>> not there call getfuncptr if it's non-null, then fall back to "slow" >>>> call or error. These are all opt-in depending on how hard you want to >>>> try to optimize things. >>> >>> >>> >>> That's actually exactly what I was envisioning -- in time (with JITs on >>> both >>> ends) the table could act sort of as a cache for commonly used overloads, >>> and getfuncptr would access the others more slowly. >>> >>> >>>> As far as keys vs. interning, I'm also tempted to try to have my cake >>>> and eat it too. Define a space-friendly encoding for signatures and >>>> require interning for anything that doesn't fit into a single >>>> sizeof(void*). The fact that this cutoff would vary for 32 vs 64-bit >>>> would require some care, but could be done with macros in C. If the >>>> signatures produce non-aligned "pointer" values there won't be any >>>> collisions, and this way libraries only have to share in the global >>>> (Python-level?) interning scheme iff they want to expose/use "large" >>>> signatures. >>> >>> >>> >>> That was the approach I described to Nathaniel as having the "worst >>> features >>> of both" -- lack of readable gdb dumps of the keys, and having to define >>> an >>> interning mechanism for use by the 5% cases that don't fit. >>> >>> To sum up hat's been said earlier: The only thing that would blow the key >>> size above 64 bits except very many arguments would be things like >>> classes/interfaces/vtables. But in that case, reasonable-sized keys for >>> the >>> vtables can be computed (whether by interning, cryptographic hashing, or >>> a >>> GUID like Microsoft COM). >>> >>> So I'm still +1 on my proposal; but I would be happy with an intern-based >>> proposal if somebody bothers to flesh it out a bit (I don't quite know >>> how >>> I'd do it and would get lost in PyObject* vs. char* and cross-language >>> state >>> sharing...). >>> >>> My proposal in summary: >>> >>> ?- Table with variable-sized entries (not getfuncptr, not interning) that >>> can be scanned by the caller in 128-bit increments. 
>> >> >> Hm, so the caller knows what kind of key it needs to compare to, so if >> it has a 64 bits key then it won't need to compare 128 bits (padded >> with zeroes?). But if it doesn't compare 128 bits, then it means 128 >> bit keys cannot have 64 bit keys as prefix. Will that be a problem, or > > > Did you read the CEP? I also clarified this in a post in response to > Nathaniel. The idea is that the scanner doesn't need to branch on the > key-length anywhere. This requires a) making each key n*64 bits long where n > is odd => function pointers are always at (m*128 + 64) bits from the start > for some non-negative integer m, b) insert some protective prefix for every > 128 bits in the key. > > Oh sorry, I didn't read the updated CEP, you want arbitrarily sized keys. I assumed you would hash any key larger than 64 bits to a 128 bit key (e.g. md5). For instance if you have a large(r) number of signatures, some of which are complex (greater than 64 bits, so hashed) and some of which are simple, then if you know the signature you need in advance, you can either follow the pointer to the 128 bit keys, or skip the pointer entirely and continue with the 64 bit keys. I suppose the common case is a few signatures, in which case a linear scan is likely faster in the 128 bit case (which is the uncommon case). >> would it make sense to make the first entry a pointer pointing to 128 >> bit keys, and the rest are all 64 bit keys (or even 32 bit keys and >> two pointers)? e.g. a contiguous list of [128 bit key/pointer >> list-pointer, 64-bit keys& ?func pointers, 128 bit keys& ?func >> pointers, NULL] > > > I don't really understand this description, but in general I'm sceptical > about the pipelining abilities of pointer-chasing code. It may be OK, but it > would require a benchmark, and if there's not a reason to have it... > > >> Even with a naive encoding scheme you could encode 3 scalar arguments >> and a return value in 32 bits (e.g. 'dddd'). That might be better on >> x86? > > > Me and Robert have been assuming some non-ASCII encoding that would allow > many more arguments in 64 bits. > Sure. The point was that the worst encoding scheme can already serve the common case. >> >>> ?- Only use 64 bit pointers, in order to keep table format the same on 32 >>> bit and 64 bit. >> >> >> Pointer to the function? I think that would only be harder to use than >> native pointers? > > > That was to make the multiple-of-128-bit-entry idea work without having to > require that keys are different between 32 bits and 64 bits platforms. Right, I got that now (reading the CEP is kind of mandatory :). Thanks. > Dag > > >>> ?- Do encoding of the signature strings. Utility functions to work with >>> this >>> (both to scan tables and encode/decode a format string) will be provided >>> as >>> C code by the CEP that can be bundled. >>> >>> Pros: >>> >>> ?- Table format is not specific to Python world (it makes as much sense >>> to >>> use, e.g., internally in Julia) >>> >>> ?- No state needs to be shared between packages run-time (they can use >>> the >>> bundled C code in isolation if they wish) >>> >>> ?- No need for an interning machinery >>> >>> ?- More easily compatible with multiple interpreter states (?) >>> >>> ?- Minor performance benefit of table over getfuncptr (intern vs. key >>> didn't >>> matter). [Cue comment that this doesn't matter.] 
>>>
>>> Cons:
>>>
>>>  - Lack of instant low-level debuggability, like in the interned case (a
>>> human needs to run a function on the key constant to see what it
>>> corresponds
>>> to)
>>>
>>>  - Not as extendable as getfuncptr (though currently we don't quite know
>>> how
>>> we would extend it, and it's easy to add getfuncptr in the future)
>>>
>>> Notes:
>>>
>>>  - When extended to handle vtable argument types, there still needs to be
>>> some interning or crypto-hashing. But that is likely to come up anyway as
>>> part of a COM-like queryInterface protocol, and at that point we will be
>>> better at making those decisions and design a good interning mechanism.
>>>
>>> Dag
>>>
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From stefan_ml at behnel.de  Sat May  5 21:50:51 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 05 May 2012 21:50:51 +0200
Subject: [Cython] Python array support (#113)
In-Reply-To:
References:
Message-ID: <4FA5849B.5090004@behnel.de>

> https://github.com/cython/cython/pull/113

This looks ok to me now. There have been objections back when we discussed the initial patch for array.array support, so what do you think about merging this in?

Stefan

From stefan_ml at behnel.de  Sun May  6 07:42:18 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 06 May 2012 07:42:18 +0200
Subject: [Cython] Python array support (#113)
In-Reply-To: <4FA5849B.5090004@behnel.de>
References: <4FA5849B.5090004@behnel.de>
Message-ID: <4FA60F3A.2020101@behnel.de>

Stefan Behnel, 05.05.2012 21:50:
>> https://github.com/cython/cython/pull/113
>
> This looks ok to me now. There have been objections back when we discussed
> the initial patch for array.array support, so what do you think about
> merging this in?

One thing I'm not sure about is how to deal with the header file. It would be nice to not rely on an external dependency that users need to ship with their code. Moving this into our utility code to write it into the C file would remove that need, but we don't currently have a way to trigger utility code insertion from .pxd files explicitly. Should we special case "cimport cpython.array" for this?

Oh, and maybe we should also provide a fused type for the supported array item types to make it easier for users to write generic array code? (Although the mass of types may be overkill for most users...)

Stefan

From robertwb at gmail.com  Sun May  6 09:16:03 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sun, 6 May 2012 00:16:03 -0700
Subject: [Cython] Python array support (#113)
In-Reply-To: <4FA60F3A.2020101@behnel.de>
References: <4FA5849B.5090004@behnel.de> <4FA60F3A.2020101@behnel.de>
Message-ID:

On Sat, May 5, 2012 at 10:42 PM, Stefan Behnel wrote:
> Stefan Behnel, 05.05.2012 21:50:
>>> https://github.com/cython/cython/pull/113
>>
>> This looks ok to me now. There have been objections back when we discussed
>> the initial patch for array.array support, so what do you think about
>> merging this in?
>
> One thing I'm not sure about is how to deal with the header file.
It would > be nice to not rely on an external dependency that users need to ship with > their code. Moving this into our utility code to write it into the C file > would remove that need, but we don't currently have a way to trigger > utility code insertion from .pxd files explicitly. Should we special case > "cimport cpython.array" for this? That's my biggest concern as well (though I only quickly skimmed through the code). I would be OK with special casing this Python builtin. > Oh, and maybe we should also provide a fused type for the supported array > item types to make it easier for users to write generic array code? > (Although the mass of types may be overkill for most users...) I think it's easy enough for the end user to make a fused type of the specializations they care about. - Robert From markflorisson88 at gmail.com Sun May 6 10:56:17 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 6 May 2012 09:56:17 +0100 Subject: [Cython] Python array support (#113) In-Reply-To: <4FA5849B.5090004@behnel.de> References: <4FA5849B.5090004@behnel.de> Message-ID: On 5 May 2012 20:50, Stefan Behnel wrote: >> ? https://github.com/cython/cython/pull/113 > > This looks ok to me now. There have been objections back when we discussed > the initial patch for array.array support, so what do you think about > merging this in? > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Great, I think it can be quite useful for some people. I think only some documentation is missing. Also, very minor complaint, the way it allocates shape, strides and the format string in __getbuffer__ is weird and complicated for no good reason. I think it's better to declare two Py_ssize_t scalars or one-sized arrays as class attributes and one char[2] array, and use those (then you can also get rid of __releasebuffer__). From stefan_ml at behnel.de Sun May 6 11:05:51 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 06 May 2012 11:05:51 +0200 Subject: [Cython] Python array support (#113) In-Reply-To: References: <4FA5849B.5090004@behnel.de> Message-ID: <4FA63EEF.5030604@behnel.de> mark florisson, 06.05.2012 10:56: > On 5 May 2012 20:50, Stefan Behnel wrote: >>> https://github.com/cython/cython/pull/113 >> >> This looks ok to me now. There have been objections back when we discussed >> the initial patch for array.array support, so what do you think about >> merging this in? > > Great, I think it can be quite useful for some people. I think only > some documentation is missing. > > Also, very minor complaint, the way it allocates shape, strides and > the format string in __getbuffer__ is weird and complicated for no > good reason. Maybe the reason is just that it wasn't written by you. ;) I take it that it's best to merge this pull request then, and to further fix it up afterwards. > I think it's better to declare two Py_ssize_t scalars or > one-sized arrays as class attributes and one char[2] array, and use > those (then you can also get rid of __releasebuffer__). Yes, that would be good. Note that itemsize is specifically designed so that it can be pointed to by strides for 1D arrays, and I guess shape can similarly just point to ob_size. 
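For illustration, an exporter along those lines could look roughly like this -- a minimal sketch with a made-up class name, not the actual cpython.array code, using fixed-size class attributes so that __getbuffer__ allocates nothing and no __releasebuffer__ is needed:

from cpython.buffer cimport PyBUF_ND, PyBUF_STRIDES, PyBUF_FORMAT
from cpython.mem cimport PyMem_Malloc, PyMem_Free

cdef class DoubleArray:
    cdef double* data
    cdef Py_ssize_t shape[1]      # fixed-size attributes: nothing to
    cdef Py_ssize_t strides[1]    # allocate per view, nothing to release
    cdef char fmt[2]

    def __cinit__(self, Py_ssize_t n):
        self.data = <double*> PyMem_Malloc(n * sizeof(double))
        if self.data == NULL:
            raise MemoryError()
        self.shape[0] = n
        self.strides[0] = sizeof(double)
        self.fmt[0] = c'd'
        self.fmt[1] = 0

    def __dealloc__(self):
        PyMem_Free(self.data)

    def __getbuffer__(self, Py_buffer* view, int flags):
        view.buf = <void*> self.data
        view.obj = self
        view.len = self.shape[0] * sizeof(double)
        view.readonly = 0
        view.ndim = 1
        view.itemsize = sizeof(double)
        view.suboffsets = NULL
        view.internal = NULL
        if (flags & PyBUF_ND) == PyBUF_ND:
            view.shape = self.shape
        else:
            view.shape = NULL
        if (flags & PyBUF_STRIDES) == PyBUF_STRIDES:
            view.strides = self.strides
        else:
            view.strides = NULL
        if (flags & PyBUF_FORMAT) == PyBUF_FORMAT:
            view.format = self.fmt
        else:
            view.format = NULL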
Stefan From markflorisson88 at gmail.com Sun May 6 11:05:58 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 6 May 2012 10:05:58 +0100 Subject: [Cython] Python array support (#113) In-Reply-To: References: <4FA5849B.5090004@behnel.de> Message-ID: On 6 May 2012 09:56, mark florisson wrote: > On 5 May 2012 20:50, Stefan Behnel wrote: >>> ? https://github.com/cython/cython/pull/113 >> >> This looks ok to me now. There have been objections back when we discussed >> the initial patch for array.array support, so what do you think about >> merging this in? >> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Great, I think it can be quite useful for some people. I think only > some documentation is missing. > > Also, very minor complaint, the way it allocates shape, strides and > the format string in __getbuffer__ is weird and complicated for no > good reason. I think it's better to declare two Py_ssize_t scalars or > one-sized arrays as class attributes and one char[2] array, and use > those (then you can also get rid of __releasebuffer__). Or hm, that might be a problem with their variable nature? Then the shape of the buffer would suddenly change... Maybe malloc is better in __getbuffer__ then. From markflorisson88 at gmail.com Sun May 6 11:07:14 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 6 May 2012 10:07:14 +0100 Subject: [Cython] Python array support (#113) In-Reply-To: <4FA63EEF.5030604@behnel.de> References: <4FA5849B.5090004@behnel.de> <4FA63EEF.5030604@behnel.de> Message-ID: On 6 May 2012 10:05, Stefan Behnel wrote: > mark florisson, 06.05.2012 10:56: >> On 5 May 2012 20:50, Stefan Behnel wrote: >>>> ? https://github.com/cython/cython/pull/113 >>> >>> This looks ok to me now. There have been objections back when we discussed >>> the initial patch for array.array support, so what do you think about >>> merging this in? >> >> Great, I think it can be quite useful for some people. I think only >> some documentation is missing. >> >> Also, very minor complaint, the way it allocates shape, strides and >> the format string in __getbuffer__ is weird and complicated for no >> good reason. > > Maybe the reason is just that it wasn't written by you. ;) > > I take it that it's best to merge this pull request then, and to further > fix it up afterwards. > Definitely, +1. >> I think it's better to declare two Py_ssize_t scalars or >> one-sized arrays as class attributes and one char[2] array, and use >> those (then you can also get rid of __releasebuffer__). > > Yes, that would be good. Note that itemsize is specifically designed so > that it can be pointed to by strides for 1D arrays, and I guess shape can > similarly just point to ob_size. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Sun May 6 11:32:24 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 06 May 2012 11:32:24 +0200 Subject: [Cython] Python array support (#113) In-Reply-To: References: <4FA5849B.5090004@behnel.de> Message-ID: <4FA64528.5010305@behnel.de> mark florisson, 06.05.2012 11:05: > On 6 May 2012 09:56, mark florisson wrote: >> On 5 May 2012 20:50, Stefan Behnel wrote: >>>> https://github.com/cython/cython/pull/113 >>> >>> This looks ok to me now. 
There have been objections back when we discussed >>> the initial patch for array.array support, so what do you think about >>> merging this in? >> >> Great, I think it can be quite useful for some people. I think only >> some documentation is missing. >> >> Also, very minor complaint, the way it allocates shape, strides and >> the format string in __getbuffer__ is weird and complicated for no >> good reason. I think it's better to declare two Py_ssize_t scalars or >> one-sized arrays as class attributes and one char[2] array, and use >> those (then you can also get rid of __releasebuffer__). > > Or hm, that might be a problem with their variable nature? Then the > shape of the buffer would suddenly change... I'm fine with saying that any user who changes the size of an array while a buffer view on it is being held (and used) is just plain out of warranty. After all, a realloc() is allowed to move the memory buffer around and may really do it in some cases, so the length is the least of our problems, even if the array doesn't shrink but only grows. I just noticed that the array module supports the buffer interface natively in Python 3. That makes this whole patch somewhat less interesting, because it's essentially just a work-around for a missing feature in Py2. Py3 does the setup like this: view->buf = (void *)self->ob_item; view->obj = (PyObject*)self; Py_INCREF(self); if (view->buf == NULL) view->buf = (void *)emptybuf; view->len = (Py_SIZE(self)) * self->ob_descr->itemsize; view->readonly = 0; view->ndim = 1; view->itemsize = self->ob_descr->itemsize; view->suboffsets = NULL; view->shape = NULL; if ((flags & PyBUF_ND)==PyBUF_ND) { view->shape = &((Py_SIZE(self))); } view->strides = NULL; if ((flags & PyBUF_STRIDES)==PyBUF_STRIDES) view->strides = &(view->itemsize); view->format = NULL; view->internal = NULL; if ((flags & PyBUF_FORMAT) == PyBUF_FORMAT) view->format = self->ob_descr->formats; It also counts the number of exports and prevents resizing while a buffer is being exported. The current .pxd implementation cannot achieve that, which is really unfortunate. ISTM that it should fall through to the native buffer interface if the underlying array supports it. Stefan From markflorisson88 at gmail.com Sun May 6 16:28:43 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 6 May 2012 15:28:43 +0100 Subject: [Cython] 0.17 Message-ID: Hey, I think we already have quite a bit of functionality (nearly) ready, after merging some pending pull requests maybe it will be a good time for a 0.17 release? I think it would be good to also document to what extent pypy support works, what works and what doesn't. Stefan, since you added a large majority of the features, would you want to be the release manager? In summary, the following pull requests should likely go in - array.array support (unless further discussion prevents that) - fused types runtime buffer dispatch - newaxis - more? The memoryview documentation should also be reworked a bit. Matthew, are you still willing to have a go at that? Otherwise I can clean up the mess first, some things are no longer true and simply outdated, and then have a second opinion. 
Mark From d.s.seljebotn at astro.uio.no Sun May 6 19:51:56 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 06 May 2012 19:51:56 +0200 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: <4FA6BA3C.3030001@astro.uio.no> On 05/06/2012 04:28 PM, mark florisson wrote: > Hey, > > I think we already have quite a bit of functionality (nearly) ready, > after merging some pending pull requests maybe it will be a good time > for a 0.17 release? I think it would be good to also document to what > extent pypy support works, what works and what doesn't. Stefan, since > you added a large majority of the features, would you want to be the > release manager? > > In summary, the following pull requests should likely go in > - array.array support (unless further discussion prevents that) > - fused types runtime buffer dispatch > - newaxis > - more? Sounds more like a 0.16.1? (Did we have any rules for that -- except the obvious one that breaking backwards compatibility in noticeable ways has to increment the major?) Dag > > The memoryview documentation should also be reworked a bit. Matthew, > are you still willing to have a go at that? Otherwise I can clean up > the mess first, some things are no longer true and simply outdated, > and then have a second opinion. > > Mark > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Sun May 6 20:22:51 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 06 May 2012 20:22:51 +0200 Subject: [Cython] 0.17 In-Reply-To: <4FA6BA3C.3030001@astro.uio.no> References: <4FA6BA3C.3030001@astro.uio.no> Message-ID: <4FA6C17B.1080303@behnel.de> Dag Sverre Seljebotn, 06.05.2012 19:51: > On 05/06/2012 04:28 PM, mark florisson wrote: >> I think we already have quite a bit of functionality (nearly) ready, >> after merging some pending pull requests maybe it will be a good time >> for a 0.17 release? I think it would be good to also document to what >> extent pypy support works, what works and what doesn't. Stefan, since >> you added a large majority of the features, would you want to be the >> release manager? >> >> In summary, the following pull requests should likely go in >> - array.array support (unless further discussion prevents that) >> - fused types runtime buffer dispatch >> - newaxis >> - more? > > > Sounds more like a 0.16.1? (Did we have any rules for that -- except the > obvious one that breaking backwards compatibility in noticeable ways has to > increment the major?) Those are only the pending pull requests, the current feature set in the master branch is way larger than that. I'll start writing up the release notes soon. Stefan From vitja.makarov at gmail.com Sun May 6 20:29:30 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sun, 6 May 2012 22:29:30 +0400 Subject: [Cython] 0.17 In-Reply-To: <4FA6BA3C.3030001@astro.uio.no> References: <4FA6BA3C.3030001@astro.uio.no> Message-ID: 2012/5/6 Dag Sverre Seljebotn : > On 05/06/2012 04:28 PM, mark florisson wrote: >> >> Hey, >> >> I think we already have quite a bit of functionality (nearly) ready, >> after merging some pending pull requests maybe it will be a good time >> for a 0.17 release? I think it would be good to also document to what >> extent pypy support works, what works and what doesn't. Stefan, since >> you added a large majority of the features, would you want to be the >> release manager? 
>> >> In summary, the following pull requests should likely go in >> ? ? - array.array support (unless further discussion prevents that) >> ? ? - fused types runtime buffer dispatch >> ? ? - newaxis >> ? ? - more? > > > > Sounds more like a 0.16.1? (Did we have any rules for that -- except the > obvious one that breaking backwards compatibility in noticeable ways has to > increment the major?) > > Dag > > >> >> The memoryview documentation should also be reworked a bit. Matthew, >> are you still willing to have a go at that? Otherwise I can clean up >> the mess first, some things are no longer true and simply outdated, >> and then have a second opinion. >> +1, I think that 0.16 has some bugs and we should make a bugfix release. -- vitja. From stefan_ml at behnel.de Sun May 6 20:38:15 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 06 May 2012 20:38:15 +0200 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: <4FA6C517.8080208@behnel.de> mark florisson, 06.05.2012 16:28: > I think we already have quite a bit of functionality (nearly) ready, > after merging some pending pull requests maybe it will be a good time > for a 0.17 release? I think it would be good to also document to what > extent pypy support works, what works and what doesn't. Sure, although it's basically just "what works, works". However, there are certainly things that users must know in order to make their own code work. > Stefan, since > you added a large majority of the features, would you want to be the > release manager? I agree that it would make sense, but I'll be head under water for the next month or so. Maybe it would be better to put out a 0.16.1 with only selected fixes in the meantime? Stefan From markflorisson88 at gmail.com Sun May 6 20:41:04 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 6 May 2012 19:41:04 +0100 Subject: [Cython] 0.17 In-Reply-To: References: <4FA6BA3C.3030001@astro.uio.no> Message-ID: On 6 May 2012 19:29, Vitja Makarov wrote: > 2012/5/6 Dag Sverre Seljebotn : >> On 05/06/2012 04:28 PM, mark florisson wrote: >>> >>> Hey, >>> >>> I think we already have quite a bit of functionality (nearly) ready, >>> after merging some pending pull requests maybe it will be a good time >>> for a 0.17 release? I think it would be good to also document to what >>> extent pypy support works, what works and what doesn't. Stefan, since >>> you added a large majority of the features, would you want to be the >>> release manager? >>> >>> In summary, the following pull requests should likely go in >>> ? ? - array.array support (unless further discussion prevents that) >>> ? ? - fused types runtime buffer dispatch >>> ? ? - newaxis >>> ? ? - more? >> >> >> >> Sounds more like a 0.16.1? (Did we have any rules for that -- except the >> obvious one that breaking backwards compatibility in noticeable ways has to >> increment the major?) >> >> Dag >> >> >>> >>> The memoryview documentation should also be reworked a bit. Matthew, >>> are you still willing to have a go at that? Otherwise I can clean up >>> the mess first, some things are no longer true and simply outdated, >>> and then have a second opinion. >>> > > +1, I think that 0.16 has some bugs and we should make a bugfix release. > > -- > vitja. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Stefan bumped the version number to 0.17pre a while back, we had a discussion about this before that. 
I think the features are large enough to warrant a major release. If we do want a bugfix release, we'll probably have to cherrypick the fixes over, that would be fine as well. From markflorisson88 at gmail.com Sun May 6 20:41:43 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 6 May 2012 19:41:43 +0100 Subject: [Cython] 0.17 In-Reply-To: <4FA6C517.8080208@behnel.de> References: <4FA6C517.8080208@behnel.de> Message-ID: On 6 May 2012 19:38, Stefan Behnel wrote: > mark florisson, 06.05.2012 16:28: >> I think we already have quite a bit of functionality (nearly) ready, >> after merging some pending pull requests maybe it will be a good time >> for a 0.17 release? I think it would be good to also document to what >> extent pypy support works, what works and what doesn't. > > Sure, although it's basically just "what works, works". However, there are > certainly things that users must know in order to make their own code work. > > >> Stefan, since >> you added a large majority of the features, would you want to be the >> release manager? > > I agree that it would make sense, but I'll be head under water for the next > month or so. > > Maybe it would be better to put out a 0.16.1 with only selected fixes in > the meantime? Ok, if no one else wants to take it up (please do speak up if you do), I could give it another go. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From matthew.brett at gmail.com Sun May 6 21:41:41 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 6 May 2012 12:41:41 -0700 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: Hi, On Sun, May 6, 2012 at 7:28 AM, mark florisson wrote: > Hey, > > I think we already have quite a bit of functionality (nearly) ready, > after merging some pending pull requests maybe it will be a good time > for a 0.17 release? I think it would be good to also document to what > extent pypy support works, what works and what doesn't. Stefan, since > you added a large majority of the features, would you want to be the > release manager? > > In summary, the following pull requests should likely go in > ? ?- array.array support (unless further discussion prevents that) > ? ?- fused types runtime buffer dispatch > ? ?- newaxis > ? ?- more? > > The memoryview documentation should also be reworked a bit. Matthew, > are you still willing to have a go at that? Otherwise I can clean up > the mess first, some things are no longer true and simply outdated, > and then have a second opinion. Yes, sorry, I have been taken up by releasing my own project. What's the deadline do you think? I have another big release to do for the end of next week, but I might be able to carve out some time, See you, Matthew From markflorisson88 at gmail.com Sun May 6 23:24:17 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 6 May 2012 22:24:17 +0100 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: On 6 May 2012 20:41, Matthew Brett wrote: > Hi, > > On Sun, May 6, 2012 at 7:28 AM, mark florisson > wrote: >> Hey, >> >> I think we already have quite a bit of functionality (nearly) ready, >> after merging some pending pull requests maybe it will be a good time >> for a 0.17 release? I think it would be good to also document to what >> extent pypy support works, what works and what doesn't. Stefan, since >> you added a large majority of the features, would you want to be the >> release manager? 
>> >> In summary, the following pull requests should likely go in >>    - array.array support (unless further discussion prevents that) >>    - fused types runtime buffer dispatch >>    - newaxis >>    - more? >> >> The memoryview documentation should also be reworked a bit. Matthew, >> are you still willing to have a go at that? Otherwise I can clean up >> the mess first, some things are no longer true and simply outdated, >> and then have a second opinion. > > Yes, sorry, I have been taken up by releasing my own project. What's > the deadline do you think? I have another big release to do for the > end of next week, but I might be able to carve out some time, > > See you, > > Matthew Great, I'd say we're probably not going to release anything within the next two weeks, so take your time, there is no hurry really :). From d.s.seljebotn at astro.uio.no Mon May 7 12:40:50 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 07 May 2012 12:40:50 +0200 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: <4FA7A618.4000503@astro.uio.no> References: <4FA7A618.4000503@astro.uio.no> Message-ID: <4FA7A6B2.5000801@astro.uio.no> [moving to dev list] On 05/07/2012 11:17 AM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 07.05.2012 10:44: >> On 05/07/2012 07:48 AM, Stefan Behnel wrote: >>> shaunc, 07.05.2012 07:13: >>>> The following code: >>>> >>>> cdef int foo( double[:] bar ) nogil: >>>> return bar is None >>>> >>>> causes: "Converting to Python object not allowed without gil" >>>> >>>> However, I was under the impression that: "When comparing a value with >>>> None, >>>> keep in mind that, if x is a Python object, x is None and x is not None are >>>> very efficient because they translate directly to C pointer comparisons," >>>> >>>> I guess the problem is that the memoryview is not a python object -- >>>> indeed, this compiles in the form: >>>> >>>> cdef int foo( object bar ) nogil: >>>> >>>> return bar is None >>>> >>>> But this is a bit counterintuitive... do I need to do "with gil" to check >>>> if a memoryview is None? And in a nogil function, I'm not necessarily >>>> guaranteed that I don't have the gil -- what is the best way to ensure I have >>>> the gil? (Is there a "secret system call" or should I use a try block?) >>>> >>>> It would seem more appropriate (IMHO, of course :)) to allow "bar is None" >>>> also when bar is a memoryview.... >>> >>> I wonder why a memory view should be allowed to be None in the first place. >>> Buffer arguments aren't (because they get unpacked on entry), so why should >>> memory views? >> >> At least when I implemented it, buffers get unpacked but the case of a >> None buffer is treated specially, and you're fully allowed (and segfault if >> you [] it). > > Hmm, ok, maybe I just got confused by the code then. > > I think the docs should state that buffer arguments are best used together > with the "not None" declaration then. I use them with "=None" default values all the time... then do a None-check manually. It's really no different from cdef classes. > And I remember that we wanted to change the default settings for extension > type arguments from "or None" to "not None" years ago but never actually > did it. I remember that there was such a debate, but I certainly don't remember that this was the conclusion :-) I didn't agree with that view then and I don't now. I don't remember what Robert's view was...
As far as I can remember (which might be biased towards my personal view), the conclusion was that we left the current semantics in place, relying on better control flow analysis to make None-checks cheaper, and when those are cheap enough, make the nonecheck directive default to True (Java is sort of prior art that this can indeed be done?). Dag From stefan_ml at behnel.de Mon May 7 13:10:56 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 May 2012 13:10:56 +0200 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: <4FA7A6B2.5000801@astro.uio.no> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> Message-ID: <4FA7ADC0.40501@behnel.de> Dag Sverre Seljebotn, 07.05.2012 12:40: > moving to dev list Makes sense. > On 05/07/2012 11:17 AM, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 07.05.2012 10:44: >>> On 05/07/2012 07:48 AM, Stefan Behnel wrote: >>>> I wonder why a memory view should be allowed to be None in the first >>>> place. >>>> Buffer arguments aren't (because they get unpacked on entry), so why >>>> should memory views? >>> >>> At least when I implemented it, buffers get unpacked but the case of a >>> None buffer is treated specially, and you're fully allowed (and segfault if >>> you [] it). >> >> Hmm, ok, maybe I just got confused by the code then. >> >> I think the docs should state that buffer arguments are best used together >> with the "not None" declaration then. ... which made me realise that that wasn't even supported. I can't believe no-one ever reported that as a bug... https://github.com/cython/cython/commit/f2de49fd0ac82a02a070b931bf4d2dab47135d0b It's still not supported for memory views. BTW, is there a reason why we shouldn't allow a "not None" declaration for cdef functions? Obviously, the caller would have to do the check in that case. Hmm, maybe it's not that important, because None checks are best done at entry points from user code, which usually means Python code. It seems like "not None" is not supported on cpdef functions, though. > I use them with "=None" default values all the time... then do a > None-check manually. Interesting. Could you give an example? What's the advantage over letting Cython raise an error for you? And, since you are using it as a default argument, why would someone want to call your code entirely without a buffer argument? > It's really no different from cdef classes. I find it at least a bit more surprising because a buffer unpacking argument is a rather strong hint that you expect something that supports this protocol. The fact that you type your function argument with it hints at the intention to properly unpack it on entry. I'm sure there are lots of users who were or will be surprised when they realise that that doesn't exclude None values. >> And I remember that we wanted to change the default settings for extension >> type arguments from "or None" to "not None" years ago but never actually >> did it. > > I remember that there was such a debate, but I certainly don't remember > that this was the conclusion :-) Maybe not, yes. > I didn't agree with that view then and > I don't now. I don't remember what Robert's view was...
> > As far as I can remember (which might be biased towards my personal > view), the conclusion was that we left the current semantics in place, > relying on better control flow analysis to make None-checks cheaper, and > when those are cheap enough, make the nonecheck directive default to > True At least for buffer arguments, it silently corrupts data or segfaults in the current state of affairs, as you pointed out. Not exactly ideal. That's another reason why I see a difference between the behaviour of extension types and that of buffer arguments. Buffer indexing is also way more performance critical than the average method call or attribute access on a cdef class. > (Java is sort of prior art that this can indeed be done?). Java was designed to have a JIT compiler underneath which handles external parameters, and its compilers are way smarter than Cython. I agree that there is still a lot we can do based on better static analysis, but there will always be limits. Stefan From d.s.seljebotn at astro.uio.no Mon May 7 13:48:18 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 07 May 2012 13:48:18 +0200 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: <4FA7ADC0.40501@behnel.de> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> Message-ID: <4FA7B682.5050300@astro.uio.no> On 05/07/2012 01:10 PM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 07.05.2012 12:40: >> moving to dev list > > Makes sense. > >> On 05/07/2012 11:17 AM, Stefan Behnel wrote: >>> Dag Sverre Seljebotn, 07.05.2012 10:44: >>>> On 05/07/2012 07:48 AM, Stefan Behnel wrote: >>>>> I wonder why a memory view should be allowed to be None in the first >>>>> place. >>>>> Buffer arguments aren't (because they get unpacked on entry), so why >>>>> should memory views? >>>> >>>> ? At least when I implemented it, buffers get unpacked but the case of a >>>> None buffer is treated specially, and you're fully allowed (and segfault if >>>> you [] it). >>> >>> Hmm, ok, maybe I just got confused by the code then. >>> >>> I think the docs should state that buffer arguments are best used together >>> with the "not None" declaration then. > > ... which made me realise that that wasn't even supported. I can't believe > no-one ever reported that as a bug... > > https://github.com/cython/cython/commit/f2de49fd0ac82a02a070b931bf4d2dab47135d0b > > It's still not supported for memory views. > > BTW, is there a reason why we shouldn't allow a "not None" declaration for > cdef functions? Obviously, the caller would have to do the check in that > case. Hmm, maybe it's not that important, because None checks are best done > at entry points from user code, which usually means Python code. It seems > like "not None" is not supported on cpdef functions, though. > > >> I use them with "=None" default values all the time... then do a >> None-check manually. > > Interesting. Could you given an example? What's the advantage over letting > Cython raise an error for you? And, since you are using it as a default > argument, why would someone want to call your code entirely without a > buffer argument? 
Here you go:

def foo(np.ndarray[double] a, np.ndarray[double] out=None):
    if out is None:
        out = np.empty_like(a)
    # compute result in out
    return out

The pattern of handing in the memory area to write to is one of the fundamental basics of numerical computing; you often just can't implement an algorithm if the called function returns the result in a newly-allocated array. I can explain why that is in detail, but I'd rather you just trusted the testimony of somebody doing numerical computation... It's just a convenience, but often (in particular when testing) it's incredibly convenient to not have to bother with allocating the output array. Another pattern is:

def do_something(np.ndarray[double] a,
                 np.ndarray[double] sin_of_a=None):
    ...

so if your caller happened to already have computed something, the function uses it, but OTOH the "something" is a function of the inputs and can be computed on the fly. AND, sometimes it can be computed on the fly in ways more efficient than what the caller could have done, because of memory bus issues etc. etc. Both of these can be "fixed" by a) not allowing the convenient shorthand, or b) declaring the argument "object" first and then typing it after the "preamble". So the REAL reason I'm arguing this case is consistency with cdef classes. > > >> >> It's really no different from cdef classes. > > I find it at least a bit more surprising because a buffer unpacking > argument is a rather strong hint that you expect something that supports > this protocol. The fact that you type your function argument with it hints > at the intention to properly unpack it on entry. I'm sure there are lots of > users who were or will be surprised when they realise that that doesn't > exclude None values. Whereas I think there would be more users surprised by the opposite. So there -- we won't know who's right without actually finding some users. And chances are we are both right, since users are different from one another. > > >>> And I remember that we wanted to change the default settings for extension >>> type arguments from "or None" to "not None" years ago but never actually >>> did it. >> >> I remember that there was such a debate, but I certainly don't remember >> that this was the conclusion :-) > > Maybe not, yes. > > >> I didn't agree with that view then and >> I don't now. I don't remember what Robert's view was... >> >> As far as I can remember (which might be biased towards my personal >> view), the conclusion was that we left the current semantics in place, >> relying on better control flow analysis to make None-checks cheaper, and >> when those are cheap enough, make the nonecheck directive default to >> True > > At least for buffer arguments, it silently corrupts data or segfaults in > the current state of affairs, as you pointed out. Not exactly ideal. No different than writing to a field in a cdef class... > > That's another reason why I see a difference between the behaviour of > extension types and that of buffer arguments. Buffer indexing is also way > more performance critical than the average method call or attribute access > on a cdef class. Perhaps, but that's a bit hand-wavy to turn into a principle of language design? "This is performance critical, so therefore we suddenly invert the normal rule"? I just think we should be consistent, not have more special rules for buffers than we need to. The intention all the time was that "np.ndarray[double]" is just a glorified "np.ndarray". People expect it to behave like an optimized "np.ndarray".
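(To make the consistency I mean concrete -- an untested sketch; both declarations compile today and both happily accept None:)

cimport numpy as np

def untyped(np.ndarray a):          # plain extension type argument, may be None
    pass

def typed(np.ndarray[double] a):    # buffer argument, may also be None
    pass

untyped(None)   # fine
typed(None)     # also fine today -- it only blows up once you index 'a'
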
If "np.ndarray" can be None, why can't "np.ndarray[double]"? BTW, with the coming of memoryviews, me and Mark talked about just deprecating the "mytype[...]" meaning buffers, and rather treat it as np.ndarray, array.array etc. being some sort of "template types". That is, we disallow "object[int]" and require some special declarations in the relevant pxd files. >> (Java is sort of prior art that this can indeed be done?). > > Java was designed to have a JIT compiler underneath which handles external > parameters, and its compilers are way smarter than Cython. I agree that > there is still a lot we can do based on better static analysis, but there > will always be limits. Any static analysis will be able to get you to the point of "not None" if the user has a manual test. And the Python way is often to just spell things out rather than brevity; I think an explicit if-test is much more newbie friendly than "not None", "or None", etc. Performance beyond that is rather theoretical for the moment. I agree that for memoryviews that can be passed in acquired-state to cdef functions there is the question of eliminating an extra branch or so, but that is still far-fetched, and I'd rather Mark raise the issue if it comes an issue than the two of us bikeshedding over it. I'll try to make this my last post to this thread, I feel we're slipping into Dag-and-Stefan-endless-thread territory... Dag From d.s.seljebotn at astro.uio.no Mon May 7 13:51:00 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 07 May 2012 13:51:00 +0200 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: <4FA7B682.5050300@astro.uio.no> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> Message-ID: <4FA7B724.5050008@astro.uio.no> On 05/07/2012 01:48 PM, Dag Sverre Seljebotn wrote: > On 05/07/2012 01:10 PM, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 07.05.2012 12:40: >>> moving to dev list >> >> Makes sense. >> >>> On 05/07/2012 11:17 AM, Stefan Behnel wrote: >>>> Dag Sverre Seljebotn, 07.05.2012 10:44: >>>>> On 05/07/2012 07:48 AM, Stefan Behnel wrote: >>>>>> I wonder why a memory view should be allowed to be None in the first >>>>>> place. >>>>>> Buffer arguments aren't (because they get unpacked on entry), so why >>>>>> should memory views? >>>>> >>>>> ? At least when I implemented it, buffers get unpacked but the case >>>>> of a >>>>> None buffer is treated specially, and you're fully allowed (and >>>>> segfault if >>>>> you [] it). >>>> >>>> Hmm, ok, maybe I just got confused by the code then. >>>> >>>> I think the docs should state that buffer arguments are best used >>>> together >>>> with the "not None" declaration then. >> >> ... which made me realise that that wasn't even supported. I can't >> believe >> no-one ever reported that as a bug... >> >> https://github.com/cython/cython/commit/f2de49fd0ac82a02a070b931bf4d2dab47135d0b >> >> >> It's still not supported for memory views. >> >> BTW, is there a reason why we shouldn't allow a "not None" declaration >> for >> cdef functions? Obviously, the caller would have to do the check in that >> case. Hmm, maybe it's not that important, because None checks are best >> done >> at entry points from user code, which usually means Python code. It seems >> like "not None" is not supported on cpdef functions, though. >> >> >>> I use them with "=None" default values all the time... then do a >>> None-check manually. >> >> Interesting. 
Could you give an example? What's the advantage over >> letting >> Cython raise an error for you? And, since you are using it as a default >> argument, why would someone want to call your code entirely without a >> buffer argument? > > Here you go: > > def foo(np.ndarray[double] a, np.ndarray[double] out=None): > if out is None: > out = np.empty_like(a) > # compute result in out > return out > > The pattern of handing in the memory area to write to is one of the > fundamental basics of numerical computing; you often just can't > implement an algorithm if the called function returns the result in a > newly-allocated array. I can explain why that is in detail, but I'd > rather you just trusted the testimony of somebody doing numerical > computation... > > It's just a convenience, but often (in particular when testing) it's > incredibly convenient to not have to bother with allocating the output > array. > > Another pattern is: > > def do_something(np.ndarray[double] a, > np.ndarray[double] sin_of_a=None): > ... > > so if your caller happened to already have computed something, the > function uses it, but OTOH the "something" is a function of the inputs > and can be computed on the fly. AND, sometimes it can be computed on the > fly in ways more efficient than what the caller could have done, because > of memory bus issues etc. etc. > > Both of these can be "fixed" by a) not allowing the convenient > shorthand, or b) declare the argument "object" first and then type it > after the "preamble". > > So the REAL reason I'm arguing this case is consistency with cdef classes. > > > >> >> >>> It's really no different from cdef classes. >> >> I find it at least a bit more surprising because a buffer unpacking >> argument is a rather strong hint that you expect something that supports >> this protocol. The fact that you type your function argument with it >> hints >> at the intention to properly unpack it on entry. I'm sure there are >> lots of >> users who were or will be surprised when they realise that that doesn't >> exclude None values. > > Whereas I think there would be more users surprised by the opposite. > > So there -- we won't know who's right without actually finding some > users. And chances are we are both right, since users are different from > one another. > >> >> >>>> And I remember that we wanted to change the default settings for >>>> extension >>>> type arguments from "or None" to "not None" years ago but never >>>> actually >>>> did it. >>> >>> I remember that there was such a debate, but I certainly don't remember >>> that this was the conclusion :-) >> >> Maybe not, yes. >> >> >>> I didn't agree with that view then and >>> I don't now. I don't remember what Robert's view was... >>> >>> As far as I can remember (which might be biased towards my personal >>> view), the conclusion was that we left the current semantics in place, >>> relying on better control flow analysis to make None-checks cheaper, and >>> when those are cheap enough, make the nonecheck directive default to >>> True >> >> At least for buffer arguments, it silently corrupts data or segfaults in >> the current state of affairs, as you pointed out. Not exactly ideal. > > No different than writing to a field in a cdef class... Also, I believe that in the strided case, the strides are all set to 0, and the data-pointer is NULL, so you will never corrupt data, you will always try to access *NULL and segfault. Though if you put mode='c' and a very high index you'll corrupt data.
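(Spelled out, the corrupting case I mean is roughly this -- an untested sketch, assuming bounds checking is turned off:)

cimport cython
cimport numpy as np

@cython.boundscheck(False)
def f(np.ndarray[double, mode='c'] buf):
    # with buf=None the unpacked data pointer is NULL, so this writes
    # to NULL + 8*1000000 -- an address that may well be mapped
    buf[1000000] = 42.0
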
Dag > >> >> That's another reason why I see a difference between the behaviour of >> extension types and that of buffer arguments. Buffer indexing is also way >> more performance critical than the average method call or attribute >> access >> on a cdef class. > > Perhaps, but that's a bit hand-wavy to turn into a principle of language > design? "This is performance critical, so therefore we suddenly invert > the normal rule"? > > I just think we should be consistent, not have more special rules for > buffers than we need to. > > The intention all the time was that "np.ndarray[double]" is just a > glorified "np.ndarray". People expect it to behave like an optimized > "np.ndarray". If "np.ndarray" can be None, why can't "np.ndarray[double]"? > > BTW, with the coming of memoryviews, me and Mark talked about just > deprecating the "mytype[...]" meaning buffers, and rather treat it as > np.ndarray, array.array etc. being some sort of "template types". That > is, we disallow "object[int]" and require some special declarations in > the relevant pxd files. > >>> (Java is sort of prior art that this can indeed be done?). >> >> Java was designed to have a JIT compiler underneath which handles >> external >> parameters, and its compilers are way smarter than Cython. I agree that >> there is still a lot we can do based on better static analysis, but there >> will always be limits. > > Any static analysis will be able to get you to the point of "not None" > if the user has a manual test. And the Python way is often to just spell > things out rather than brevity; I think an explicit if-test is much more > newbie friendly than "not None", "or None", etc. > > Performance beyond that is rather theoretical for the moment. > > I agree that for memoryviews that can be passed in acquired-state to > cdef functions there is the question of eliminating an extra branch or > so, but that is still far-fetched, and I'd rather Mark raise the issue > if it comes an issue than the two of us bikeshedding over it. > > I'll try to make this my last post to this thread, I feel we're slipping > into Dag-and-Stefan-endless-thread territory... > > Dag From stefan_ml at behnel.de Mon May 7 15:04:18 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 May 2012 15:04:18 +0200 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: <4FA7B682.5050300@astro.uio.no> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> Message-ID: <4FA7C852.9020004@behnel.de> Dag Sverre Seljebotn, 07.05.2012 13:48: > On 05/07/2012 01:10 PM, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 07.05.2012 12:40: >>> On 05/07/2012 11:17 AM, Stefan Behnel wrote: >>>> Dag Sverre Seljebotn, 07.05.2012 10:44: >>>>> On 05/07/2012 07:48 AM, Stefan Behnel wrote: >>>>>> I wonder why a memory view should be allowed to be None in the first >>>>>> place. >>>>>> Buffer arguments aren't (because they get unpacked on entry), so why >>>>>> should memory views? >>>>> >>>>> ? At least when I implemented it, buffers get unpacked but the case of a >>>>> None buffer is treated specially, and you're fully allowed (and >>>>> segfault if you [] it). >>>> >>>> Hmm, ok, maybe I just got confused by the code then. >>>> >>>> I think the docs should state that buffer arguments are best used together >>>> with the "not None" declaration then. >>> >>> I use them with "=None" default values all the time... then do a >>> None-check manually. >> >> Interesting. 
Could you give an example? What's the advantage over letting >> Cython raise an error for you? And, since you are using it as a default >> argument, why would someone want to call your code entirely without a >> buffer argument? > > Here you go: > > def foo(np.ndarray[double] a, np.ndarray[double] out=None): >     if out is None: >         out = np.empty_like(a) Ah, right - output arguments. Hadn't thought of those. Still, since you pass None explicitly as a default argument, this code wouldn't be impacted by disallowing None for buffers by default. That case is already handled specially in the compiler. But a better default would prevent the *first* argument from being None. So, basically, it would do the right thing straight away in your case and generate safer and more efficient code for it, whereas now you have to test 'a' for being None explicitly and Cython won't understand that hint due to insufficient static analysis. At least, since my last commit you can make Cython do the same thing by declaring it "not None". >>> It's really no different from cdef classes. >> >> I find it at least a bit more surprising because a buffer unpacking >> argument is a rather strong hint that you expect something that supports >> this protocol. The fact that you type your function argument with it hints >> at the intention to properly unpack it on entry. I'm sure there are lots of >> users who were or will be surprised when they realise that that doesn't >> exclude None values. > > Whereas I think there would be more users surprised by the opposite. We've had enough complaints from users about None being allowed for typed arguments already to consider it at least a gotcha of the language. The main reason we didn't change this behaviour back then was that it would clearly break user code and we thought we could do without that. That's different from considering it "right" and "good". >>>> And I remember that we wanted to change the default settings for extension >>>> type arguments from "or None" to "not None" years ago but never actually >>>> did it. >>> >>> I remember that there was such a debate, but I certainly don't remember >>> that this was the conclusion :-) >> >> Maybe not, yes. >> >> >>> I didn't agree with that view then and >>> I don't now. I don't remember what Robert's view was... >>> >>> As far as I can remember (which might be biased towards my personal >>> view), the conclusion was that we left the current semantics in place, >>> relying on better control flow analysis to make None-checks cheaper, >>> and >>> when those are cheap enough, make the nonecheck directive default to >>> True >> >> At least for buffer arguments, it silently corrupts data or segfaults in >> the current state of affairs, as you pointed out. Not exactly ideal. > > No different than writing to a field in a cdef class... Hmm, aren't those None checked? At least cdef method calls are AFAIR. I think we should really get back to the habit of making code safe first and fast afterwards. >> That's another reason why I see a difference between the behaviour of >> extension types and that of buffer arguments. Buffer indexing is also way >> more performance critical than the average method call or attribute access >> on a cdef class. > > Perhaps, but that's a bit hand-wavy to turn into a principle of language > design? "This is performance critical, so therefore we suddenly invert the > normal rule"? Since when is the "normal rule" to consider performance so important that we prefer a crash over raising an exception?
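(For the record, the crash is a two-liner -- untested sketch; with the default bounds checking you instead get a confusing IndexError, since the unpacked shape of a None buffer is all zeros:)

cimport cython
cimport numpy as np

@cython.boundscheck(False)
def first(np.ndarray[double] a):
    return a[0]    # first(None) dereferences a NULL data pointer
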
That's the current state of buffer arguments, after all, so we already inverted the "normal rule", IMHO. > I just think we should be consistent, not have more special rules for > buffers than we need to. Agreed. So, would you accept that we add a None check to every buffer indexing access now and try to eliminate them over time (or with user interaction)? > The intention all the time was that "np.ndarray[double]" is just a > glorified "np.ndarray". People expect it to behave like an optimized > "np.ndarray". If "np.ndarray" can be None, why can't "np.ndarray[double]"? Because it uses syntax that is expected to unpack the buffer. If that buffer doesn't exist, I'd expect an error. It's like using interfaces: I want something here that implements the buffer interface. If it doesn't - reject it. Besides, I hope you are aware that your argumentation stands on the (IMHO questionable) fact that "np.ndarray" by itself can be None by default. If np.ndarray should not be allowed to be None by default, why should np.ndarray[double]? That argument works both ways. > BTW, with the coming of memoryviews, me and Mark talked about just > deprecating the "mytype[...]" meaning buffers, and rather treat it as > np.ndarray, array.array etc. being some sort of "template types". That is, > we disallow "object[int]" and require some special declarations in the > relevant pxd files. Hmm, yes, it's unfortunate that we have two different types of syntax now, one that declares the item type before the brackets and one that declares it afterwards. >>> (Java is sort of prior art that this can indeed be done?). >> >> Java was designed to have a JIT compiler underneath which handles external >> parameters, and its compilers are way smarter than Cython. I agree that >> there is still a lot we can do based on better static analysis, but there >> will always be limits. > > Any static analysis will be able to get you to the point of "not None" if > the user has a manual test. Sure. It will at least expect a fatal error at the first operation that won't work with a None value and know that it can't be None afterwards. > And the Python way is often to just spell > things out rather than brevity; I think an explicit if-test is much more > newbie friendly than "not None", "or None", etc. ... with a good default being even more pythonic. ;) Stefan From stefan_ml at behnel.de Mon May 7 16:16:32 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 May 2012 16:16:32 +0200 Subject: [Cython] buffer syntax vs. memory view syntax (was: Re: checking for "None" in nogil function) In-Reply-To: <4FA7C852.9020004@behnel.de> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> Message-ID: <4FA7D940.5030607@behnel.de> Stefan Behnel, 07.05.2012 15:04: > Dag Sverre Seljebotn, 07.05.2012 13:48: >> BTW, with the coming of memoryviews, me and Mark talked about just >> deprecating the "mytype[...]" meaning buffers, and rather treat it as >> np.ndarray, array.array etc. being some sort of "template types". That is, >> we disallow "object[int]" and require some special declarations in the >> relevant pxd files. > > Hmm, yes, it's unfortunate that we have two different types of syntax now, > one that declares the item type before the brackets and one that declares > it afterwards. I actually think this merits some more discussion. Should we consider the buffer interface syntax deprecated and focus on the memory view syntax?
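(Side by side, for reference:)

cimport numpy as np

def f_buf(np.ndarray[double, ndim=2] a):   # buffer interface syntax
    pass

def f_mv(double[:, :] a):                  # memory view syntax
    pass
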
The words-to-punctuation ratio of the latter may hurt the eyes when encountering it unprepared, but at least it doesn't require two type names, of which the one before the brackets (i.e. "object") is mostly useless. (Although it does reflect the notion that we are dealing with an object here ...) Stefan From vitja.makarov at gmail.com Mon May 7 17:08:09 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Mon, 7 May 2012 19:08:09 +0400 Subject: [Cython] How do you trigger a jenkins build? Message-ID: I've noticed that the old URL hook doesn't work for me anymore. I tried to check "Build when a change is pushed to GitHub" and set "Jenkins Hook URL" to https://sage.math.washington.edu:8091/hudson/github-webhook/ That doesn't work. What is the right way? -- vitja. From d.s.seljebotn at astro.uio.no Mon May 7 17:48:14 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 07 May 2012 17:48:14 +0200 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: <4FA7C852.9020004@behnel.de> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> Message-ID: <4FA7EEBE.7060508@astro.uio.no> On 05/07/2012 03:04 PM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 07.05.2012 13:48: >> Here you go: >> >> def foo(np.ndarray[double] a, np.ndarray[double] out=None): >>     if out is None: >>         out = np.empty_like(a) > > Ah, right - output arguments. Hadn't thought of those. > > Still, since you pass None explicitly as a default argument, this code > wouldn't be impacted by disallowing None for buffers by default. That case > is already handled specially in the compiler. But a better default would > prevent the *first* argument from being None. > > So, basically, it would do the right thing straight away in your case and > generate safer and more efficient code for it, whereas now you have to test > 'a' for being None explicitly and Cython won't understand that hint due to > insufficient static analysis. At least, since my last commit you can make > Cython do the same thing by declaring it "not None". Yes, thanks! >>>> It's really no different from cdef classes. >>> >>> I find it at least a bit more surprising because a buffer unpacking >>> argument is a rather strong hint that you expect something that supports >>> this protocol. The fact that you type your function argument with it hints >>> at the intention to properly unpack it on entry. I'm sure there are lots of >>> users who were or will be surprised when they realise that that doesn't >>> exclude None values. >> >> Whereas I think there would be more users surprised by the opposite. > > We've had enough complaints from users about None being allowed for typed > arguments already to consider it at least a gotcha of the language. > > The main reason we didn't change this behaviour back then was that it would > clearly break user code and we thought we could do without that. That's > different from considering it "right" and "good". > > >>>>> And I remember that we wanted to change the default settings for extension >>>>> type arguments from "or None" to "not None" years ago but never actually >>>>> did it. >>>> >>>> I remember that there was such a debate, but I certainly don't remember >>>> that this was the conclusion :-) >>> >>> Maybe not, yes. >>> >>> >>>> I didn't agree with that view then and >>>> I don't now. I don't remember what Robert's view was...
>>>> >>>> As far as I can remember (which might be biased towards my personal >>>> view), the conclusion was that we left the current semantics in place, >>>> relying on better control flow analysis to make None-checks cheaper, and >>>> when those are cheap enough, make the nonecheck directive default to >>>> True >>> >>> At least for buffer arguments, it silently corrupts data or segfaults in >>> the current state of affairs, as you pointed out. Not exactly ideal. >> >> No different than writing to a field in a cdef class... > > Hmm, aren't those None checked? At least cdef method calls are AFAIR. Not at all. That's my whole point -- currently, the rule for None in Cython is "it's your responsibility to never do a native operation on None". I don't like that either, but that's just inherited from Pyrex (and many projects would get speed regressions etc.). I'm not against changing that to "we safely None-check", if done nicely -- it's just that that should be done everywhere at once. In current master (and as far back as I can remember), this code:

cdef class A:
    cdef int field
    cdef int method(self):
        print self.field

def f():
    cdef A a = None
    a.field = 3
    a.method()

Turns into:

    __pyx_v_a = ((struct __pyx_obj_5test2_A *)Py_None);
    __pyx_v_a->field = 3;
    ((struct __pyx_vtabstruct_5test2_A *)
        __pyx_v_a->__pyx_vtab)->method(__pyx_v_a);

> I think we should really get back to the habit of making code safe first > and fast afterwards. Nobody has argued otherwise for some time (since the cdivision thread I believe), this is all about Pyrex legacy. Guess part of the story is that there's lots of performance-sensitive code in SAGE using cdef classes which was written in Pyrex before Cython was around... In fact, the nonecheck directive was written by yours truly! And I argued for making it the default at the time! > Because it uses syntax that is expected to unpack the buffer. If that > buffer doesn't exist, I'd expect an error. It's like using interfaces: I > want something here that implements the buffer interface. If it doesn't - > reject it. > > Besides, I hope you are aware that your argumentation stands on the (IMHO > questionable) fact that "np.ndarray" by itself can be None by default. If > np.ndarray should not be allowed to be None by default, why should > np.ndarray[double]? That argument works both ways. I'm well aware of it... Dag From d.s.seljebotn at astro.uio.no Mon May 7 18:00:20 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 07 May 2012 18:00:20 +0200 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: <4FA7D940.5030607@behnel.de> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> Message-ID: <4FA7F194.5080008@astro.uio.no> On 05/07/2012 04:16 PM, Stefan Behnel wrote: > Stefan Behnel, 07.05.2012 15:04: >> Dag Sverre Seljebotn, 07.05.2012 13:48: >>> BTW, with the coming of memoryviews, me and Mark talked about just >>> deprecating the "mytype[...]" meaning buffers, and rather treat it as >>> np.ndarray, array.array etc. being some sort of "template types". That is, >>> we disallow "object[int]" and require some special declarations in the >>> relevant pxd files. >> >> Hmm, yes, it's unfortunate that we have two different types of syntax now, >> one that declares the item type before the brackets and one that declares >> it afterwards. > > I actually think this merits some more discussion.
Should we consider the > buffer interface syntax deprecated and focus on the memory view syntax? I think that's the very-long-term intention. Then again, it may be too early to really tell yet, we just need to see how the memory views play out in real life and whether they'll be able to replace np.ndarray[double] among real users. We don't want to shove things down users' throats. But the use of the trailing-[] syntax needs some cleaning up. Me and Mark agreed we'd put this proposal forward when we got around to it:

- Deprecate the "object[double]" form, where [dtype] can be stuck on any extension type

- But, do NOT (for the next year at least) deprecate np.ndarray[double], array.array[double], etc.

Basically, there should be a magic flag in extension type declarations saying "I can be a buffer". For one thing, that is sort of needed to open up things for templated cdef classes/fused types cdef classes, if that is ever implemented. The semantic meaning of trailing [] is still sort of like the C++ meaning; that it templates the argument types (except it's lots of special cases in the compiler for various things rather than a Turing-complete template language...) Dag > > The words-to-punctuation ratio of the latter may hurt the eyes when > encountering it unprepared, but at least it doesn't require two type names, > of which the one before the brackets (i.e. "object") is mostly useless. > (Although it does reflect the notion that we are dealing with an object > here ...) > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Mon May 7 18:03:44 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 07 May 2012 18:03:44 +0200 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: <4FA7F194.5080008@astro.uio.no> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> Message-ID: <4FA7F260.7010403@astro.uio.no> On 05/07/2012 06:00 PM, Dag Sverre Seljebotn wrote: > On 05/07/2012 04:16 PM, Stefan Behnel wrote: >> Stefan Behnel, 07.05.2012 15:04: >>> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>> BTW, with the coming of memoryviews, me and Mark talked about just >>>> deprecating the "mytype[...]" meaning buffers, and rather treat it as >>>> np.ndarray, array.array etc. being some sort of "template types". >>>> That is, >>>> we disallow "object[int]" and require some special declarations in the >>>> relevant pxd files. >>> >>> Hmm, yes, it's unfortunate that we have two different types of syntax >>> now, >>> one that declares the item type before the brackets and one that >>> declares >>> it afterwards. >> >> I actually think this merits some more discussion. Should we consider the >> buffer interface syntax deprecated and focus on the memory view syntax? > > I think that's the very-long-term intention. Then again, it may be too > early to really tell yet, we just need to see how the memory views play > out in real life and whether they'll be able to replace > np.ndarray[double] among real users. We don't want to shove things down > users' throats. > > But the use of the trailing-[] syntax needs some cleaning up.
Me and > Mark agreed we'd put this proposal forward when we got around to it: > > - Deprecate the "object[double]" form, where [dtype] can be stuck on any > extension type > > - But, do NOT (for the next year at least) deprecate np.ndarray[double], > array.array[double], etc. Basically, there should be a magic flag in > extension type declarations saying "I can be a buffer". > > For one thing, that is sort of needed to open up things for templated > cdef classes/fused types cdef classes, if that is ever implemented. > > The semantic meaning of trailing [] is still sort of like the C++ > meaning; that it templates the argument types (except it's lots of > special cases in the compiler for various things rather than a > Turing-complete template language...) s/argument types/base type/ Dag > > Dag > >> >> The words-to-punctuation ratio of the latter may hurt the eyes when >> encountering it unprepared, but at least it doesn't require two type >> names, >> of which the one before the brackets (i.e. "object") is mostly useless. >> (Although it does reflect the notion that we are dealing with an object >> here ...) >> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Mon May 7 18:03:43 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 7 May 2012 17:03:43 +0100 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: <4FA7B724.5050008@astro.uio.no> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7B724.5050008@astro.uio.no> Message-ID: On 7 May 2012 12:51, Dag Sverre Seljebotn wrote: > On 05/07/2012 01:48 PM, Dag Sverre Seljebotn wrote: >> >> On 05/07/2012 01:10 PM, Stefan Behnel wrote: >>> >>> Dag Sverre Seljebotn, 07.05.2012 12:40: >>>> >>>> moving to dev list >>> >>> >>> Makes sense. >>> >>>> On 05/07/2012 11:17 AM, Stefan Behnel wrote: >>>>> >>>>> Dag Sverre Seljebotn, 07.05.2012 10:44: >>>>>> >>>>>> On 05/07/2012 07:48 AM, Stefan Behnel wrote: >>>>>>> >>>>>>> I wonder why a memory view should be allowed to be None in the first >>>>>>> place. >>>>>>> Buffer arguments aren't (because they get unpacked on entry), so why >>>>>>> should memory views? >>>>>> >>>>>> >>>>>> ? At least when I implemented it, buffers get unpacked but the case >>>>>> of a >>>>>> None buffer is treated specially, and you're fully allowed (and >>>>>> segfault if >>>>>> you [] it). >>>>> >>>>> >>>>> Hmm, ok, maybe I just got confused by the code then. >>>>> >>>>> I think the docs should state that buffer arguments are best used >>>>> together >>>>> with the "not None" declaration then. >>> >>> >>> ... which made me realise that that wasn't even supported. I can't >>> believe >>> no-one ever reported that as a bug... >>> >>> >>> https://github.com/cython/cython/commit/f2de49fd0ac82a02a070b931bf4d2dab47135d0b >>> >>> >>> It's still not supported for memory views. >>> >>> BTW, is there a reason why we shouldn't allow a "not None" declaration >>> for >>> cdef functions? Obviously, the caller would have to do the check in that >>> case. 
Hmm, maybe it's not that important, because None checks are best >>> done >>> at entry points from user code, which usually means Python code. It seems >>> like "not None" is not supported on cpdef functions, though. >>> >>> >>>> I use them with "=None" default values all the time... then do a >>>> None-check manually. >>> >>> >>> Interesting. Could you given an example? What's the advantage over >>> letting >>> Cython raise an error for you? And, since you are using it as a default >>> argument, why would someone want to call your code entirely without a >>> buffer argument? >> >> >> Here you go: >> >> def foo(np.ndarray[double] a, np.ndarray[double] out=None): >> if out is None: >> out = np.empty_like(a) >> # compute result in out >> return out >> >> The pattern of handing in the memory area to write to is one of the >> fundamental basics of numerical computing; you often just can't >> implement an algorithm if the called function returns the result in a >> newly-allocated array. I can explain why that is in detail, but I'd >> rather you just trusted the testimony of somebody doing numerical >> computation... >> >> It's just a convenience, but often (in particular when testing) it's >> incredibly convenient to not have to bother with allocating the output >> array. >> >> Another pattern is: >> >> def do_something(np.ndarray[double] a, >> np.ndarray[double] sin_of_a=None): >> ... >> >> so if your caller happened to already have computed something, the >> function uses it, but OTOH the "something" is a function of the inputs >> and can be computed on the fly. AND, sometimes it can be computed on the >> fly in ways more efficient than what the caller could have done, because >> of memory bus issues etc. etc. >> >> Both of these can be "fixed" by a) not allowing the convenient >> shorthand, or b) declare the argument "object" first and then type it >> after the "preamble". >> >> So the REAL reason I'm arguing this case is consistency with cdef classes. >> >> >> >>> >>> >>>> It's really no different from cdef classes. >>> >>> >>> I find it at least a bit more surprising because a buffer unpacking >>> argument is a rather strong hint that you expect something that supports >>> this protocol. The fact that you type your function argument with it >>> hints >>> at the intention to properly unpack it on entry. I'm sure there are >>> lots of >>> users who were or will be surprised when they realise that that doesn't >>> exclude None values. >> >> >> Whereas I think there would be more users surprised by the opposite. >> >> So there -- we won't know who's right without actually finding some >> users. And chances are we are both right, since users are different from >> one another. >> >>> >>> >>>>> And I remember that we wanted to change the default settings for >>>>> extension >>>>> type arguments from "or None" to "not None" years ago but never >>>>> actually >>>>> did it. >>>> >>>> >>>> I remember that there was such a debate, but I certainly don't remember >>>> that this was the conclusion :-) >>> >>> >>> Maybe not, yes. >>> >>> >>>> I didn't agree with that view then and >>>> I don't now. I don't remember what Robert's view was... 
>>>> >>>> As far as I can remember (which might be biased towards my personal >>>> view), the conclusion was that we left the current semantics in place, >>>> relying on better control flow analysis to make None-checks cheaper, and >>>> when those are cheap enough, make the nonecheck directive default to >>>> True >>> >>> >>> At least for buffer arguments, it silently corrupts data or segfaults in >>> the current state of affairs, as you pointed out. Not exactly ideal. >> >> >> No different than writing to a field in a cdef class... > > > Also, I believe that in the strided case, the strides are all set to 0, and > the data-pointer is NULL, so you will never corrupt data, you will always > try to access *NULL and segfault. > > Though If you put mode='c' and a very high index you'll corrupt data. > > Dag > If you have boundschecking on, you'll get an out of bounds error, which is pretty weird :) >> >>> >>> That's another reason why I see a difference between the behaviour of >>> extension types and that of buffer arguments. Buffer indexing is also way >>> more performance critical than the average method call or attribute >>> access >>> on a cdef class. >> >> >> Perhaps, but that's a bit hand-wavy to turn into a principle of language >> design? "This is performance critical, so therefore we suddenly invert >> the normal rule"? >> >> I just think we should be consistent, not have more special rules for >> buffers than we need to. >> >> The intention all the time was that "np.ndarray[double]" is just a >> glorified "np.ndarray". People expect it to behave like an optimized >> "np.ndarray". If "np.ndarray" can be None, why can't "np.ndarray[double]"? >> >> BTW, with the coming of memoryviews, me and Mark talked about just >> deprecating the "mytype[...]" meaning buffers, and rather treat it as >> np.ndarray, array.array etc. being some sort of "template types". That >> is, we disallow "object[int]" and require some special declarations in >> the relevant pxd files. >> >>>> (Java is sort of prior art that this can indeed be done?). >>> >>> >>> Java was designed to have a JIT compiler underneath which handles >>> external >>> parameters, and its compilers are way smarter than Cython. I agree that >>> there is still a lot we can do based on better static analysis, but there >>> will always be limits. >> >> >> Any static analysis will be able to get you to the point of "not None" >> if the user has a manual test. And the Python way is often to just spell >> things out rather than brevity; I think an explicit if-test is much more >> newbie friendly than "not None", "or None", etc. >> >> Performance beyond that is rather theoretical for the moment. >> >> I agree that for memoryviews that can be passed in acquired-state to >> cdef functions there is the question of eliminating an extra branch or >> so, but that is still far-fetched, and I'd rather Mark raise the issue >> if it comes an issue than the two of us bikeshedding over it. >> >> I'll try to make this my last post to this thread, I feel we're slipping >> into Dag-and-Stefan-endless-thread territory... 
>> >> Dag > > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From vitja.makarov at gmail.com Mon May 7 18:04:01 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Mon, 7 May 2012 20:04:01 +0400 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: 2012/5/7 mark florisson : > On 6 May 2012 20:41, Matthew Brett wrote: >> Hi, >> >> On Sun, May 6, 2012 at 7:28 AM, mark florisson >> wrote: >>> Hey, >>> >>> I think we already have quite a bit of functionality (nearly) ready, >>> after merging some pending pull requests maybe it will be a good time >>> for a 0.17 release? I think it would be good to also document to what >>> extent pypy support works, what works and what doesn't. Stefan, since >>> you added a large majority of the features, would you want to be the >>> release manager? >>> >>> In summary, the following pull requests should likely go in >>> ? ?- array.array support (unless further discussion prevents that) >>> ? ?- fused types runtime buffer dispatch >>> ? ?- newaxis >>> ? ?- more? >>> >>> The memoryview documentation should also be reworked a bit. Matthew, >>> are you still willing to have a go at that? Otherwise I can clean up >>> the mess first, some things are no longer true and simply outdated, >>> and then have a second opinion. >> >> Yes, sorry, I have been taken up by releasing my own project. What's >> the deadline do you think? ?I have another big release to do for the >> end of next week, but I might be able to carve out some time, >> >> See you, >> >> Matthew > > Great, I'd say we're probably not going to release anything within the > next two weeks, so take your time, there is no hurry really :). Hmm, it seems to me that master is currently broken: https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py27-ext/ -- vitja. From markflorisson88 at gmail.com Mon May 7 18:04:01 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 7 May 2012 17:04:01 +0100 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: <4FA7ADC0.40501@behnel.de> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> Message-ID: On 7 May 2012 12:10, Stefan Behnel wrote: > Dag Sverre Seljebotn, 07.05.2012 12:40: >> moving to dev list > > Makes sense. > >> On 05/07/2012 11:17 AM, Stefan Behnel wrote: >>> Dag Sverre Seljebotn, 07.05.2012 10:44: >>>> On 05/07/2012 07:48 AM, Stefan Behnel wrote: >>>>> I wonder why a memory view should be allowed to be None in the first >>>>> place. >>>>> Buffer arguments aren't (because they get unpacked on entry), so why >>>>> should memory views? >>>> >>>> ? At least when I implemented it, buffers get unpacked but the case of a >>>> None buffer is treated specially, and you're fully allowed (and segfault if >>>> you [] it). >>> >>> Hmm, ok, maybe I just got confused by the code then. >>> >>> I think the docs should state that buffer arguments are best used together >>> with the "not None" declaration then. > > ... which made me realise that that wasn't even supported. I can't believe > no-one ever reported that as a bug... > > https://github.com/cython/cython/commit/f2de49fd0ac82a02a070b931bf4d2dab47135d0b > > It's still not supported for memory views. Yeah, that was never implemented, but probably should be. > BTW, is there a reason why we shouldn't allow a "not None" declaration for > cdef functions? 
Obviously, the caller would have to do the check in that > case. Why can't the callee just check it? If it's None, just raise an exception like usual? > Hmm, maybe it's not that important, because None checks are best done > at entry points from user code, which usually means Python code. It seems > like "not None" is not supported on cpdef functions, though. > > >> I use them with "=None" default values all the time... then do a >> None-check manually. > > Interesting. Could you given an example? What's the advantage over letting > Cython raise an error for you? And, since you are using it as a default > argument, why would someone want to call your code entirely without a > buffer argument? > > >> It's really no different from cdef classes. > > I find it at least a bit more surprising because a buffer unpacking > argument is a rather strong hint that you expect something that supports > this protocol. The fact that you type your function argument with it hints > at the intention to properly unpack it on entry. I'm sure there are lots of > users who were or will be surprised when they realise that that doesn't > exclude None values. > > >>> And I remember that we wanted to change the default settings for extension >>> type arguments from "or None" to "not None" years ago but never actually >>> did it. >> >> I remember that there was such a debate, but I certainly don't remember >> that this was the conclusion :-) > > Maybe not, yes. > > >> I didn't agree with that view then and >> I don't now. I don't remember what Robert's view was... >> >> As far as I can remember (which might be biased towards my personal >> view), the conclusion was that we left the current semantics in place, >> relying on better control flow analysis to make None-checks cheaper, and >> when those are cheap enough, make the nonecheck directive default to >> True > > At least for buffer arguments, it silently corrupts data or segfaults in > the current state of affairs, as you pointed out. Not exactly ideal. > > That's another reason why I see a difference between the behaviour of > extension types and that of buffer arguments. Buffer indexing is also way > more performance critical than the average method call or attribute access > on a cdef class. > > >> (Java is sort of prior art that this can indeed be done?). > > Java was designed to have a JIT compiler underneath which handles external > parameters, and its compilers are way smarter than Cython. I agree that > there is still a lot we can do based on better static analysis, but there > will always be limits. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Mon May 7 18:07:22 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 07 May 2012 18:07:22 +0200 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> Message-ID: <4FA7F33A.9020903@astro.uio.no> On 05/07/2012 06:04 PM, mark florisson wrote: > On 7 May 2012 12:10, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 07.05.2012 12:40: >>> moving to dev list >> >> Makes sense. 
>> >>> On 05/07/2012 11:17 AM, Stefan Behnel wrote: >>>> Dag Sverre Seljebotn, 07.05.2012 10:44: >>>>> On 05/07/2012 07:48 AM, Stefan Behnel wrote: >>>>>> I wonder why a memory view should be allowed to be None in the first >>>>>> place. >>>>>> Buffer arguments aren't (because they get unpacked on entry), so why >>>>>> should memory views? >>>>> >>>>> At least when I implemented it, buffers get unpacked but the case of a >>>>> None buffer is treated specially, and you're fully allowed (and segfault if >>>>> you [] it). >>>> >>>> Hmm, ok, maybe I just got confused by the code then. >>>> >>>> I think the docs should state that buffer arguments are best used together >>>> with the "not None" declaration then. >> >> ... which made me realise that that wasn't even supported. I can't believe >> no-one ever reported that as a bug... >> >> https://github.com/cython/cython/commit/f2de49fd0ac82a02a070b931bf4d2dab47135d0b >> >> It's still not supported for memory views. > > Yeah, that was never implemented, but probably should be. > >> BTW, is there a reason why we shouldn't allow a "not None" declaration for >> cdef functions? Obviously, the caller would have to do the check in that >> case. > > Why can't the callee just check it? If it's None, just raise an > exception like usual? It's just that there's a lot more potential for rather easy optimization if the caller does it. Dag From stefan_ml at behnel.de Mon May 7 18:12:39 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 May 2012 18:12:39 +0200 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: <4FA7F33A.9020903@astro.uio.no> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7F33A.9020903@astro.uio.no> Message-ID: <4FA7F477.30701@behnel.de> Dag Sverre Seljebotn, 07.05.2012 18:07: > On 05/07/2012 06:04 PM, mark florisson wrote: >> On 7 May 2012 12:10, Stefan Behnel wrote: >>> BTW, is there a reason why we shouldn't allow a "not None" declaration for >>> cdef functions? Obviously, the caller would have to do the check in that >>> case. >> >> Why can't the callee just check it? If it's None, just raise an >> exception like usual? > > It's just that there's a lot more potential for rather easy optimization if > the caller does it. Exactly. The NoneCheckNode is easy to get rid of at any stage in the pipeline, whereas a hard coded None check has a fixed cost at runtime. Stefan From markflorisson88 at gmail.com Mon May 7 18:12:56 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 7 May 2012 17:12:56 +0100 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: <4FA7EEBE.7060508@astro.uio.no> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7EEBE.7060508@astro.uio.no> Message-ID: On 7 May 2012 16:48, Dag Sverre Seljebotn wrote: > On 05/07/2012 03:04 PM, Stefan Behnel wrote: >> >> Dag Sverre Seljebotn, 07.05.2012 13:48: >> >>> Here you go:
>>>
>>> def foo(np.ndarray[double] a, np.ndarray[double] out=None):
>>>     if out is None:
>>>         out = np.empty_like(a)
>>
>> Ah, right - output arguments. Hadn't thought of those. >> >> Still, since you pass None explicitly as a default argument, this code >> wouldn't be impacted by disallowing None for buffers by default. That case >> is already handled specially in the compiler.
But a better default would >> prevent the *first* argument from being None. >> >> So, basically, it would do the right thing straight away in your case and >> generate safer and more efficient code for it, whereas now you have to >> test >> 'a' for being None explicitly and Cython won't understand that hint due to >> insufficient static analysis. At least, since my last commit you can make >> Cython do the same thing by declaring it "not None". > > > Yes, thanks! > > >>>>> It's really no different from cdef classes. >>>> >>>> >>>> I find it at least a bit more surprising because a buffer unpacking >>>> argument is a rather strong hint that you expect something that supports >>>> this protocol. The fact that you type your function argument with it >>>> hints >>>> at the intention to properly unpack it on entry. I'm sure there are lots >>>> of >>>> users who were or will be surprised when they realise that that doesn't >>>> exclude None values. >>> >>> >>> Whereas I think there would be more users surprised by the opposite. >> >> >> We've had enough complaints from users about None being allowed for typed >> arguments already to consider it at least a gotcha of the language. >> >> The main reason we didn't change this behaviour back then was that it >> would >> clearly break user code and we thought we could do without that. That's >> different from considering it "right" and "good". >> >> >>>>>> And I remember that we wanted to change the default settings for >>>>>> extension >>>>>> type arguments from "or None" to "not None" years ago but never >>>>>> actually >>>>>> did it. >>>>> >>>>> >>>>> I remember that there was such a debate, but I certainly don't remember >>>>> that this was the conclusion :-) >>>> >>>> >>>> Maybe not, yes. >>>> >>>> >>>>> I didn't agree with that view then and >>>>> I don't now. I don't remember what Robert's view was... >>>>> >>>>> As far as I can remember (which might be biased towards my personal >>>>> view), the conclusion was that we left the current semantics in place, >>>>> relying on better control flow analysis to make None-checks cheaper, >>>>> and >>>>> when those are cheap enough, make the nonecheck directive default to >>>>> True >>>> >>>> >>>> At least for buffer arguments, it silently corrupts data or segfaults in >>>> the current state of affairs, as you pointed out. Not exactly ideal. >>> >>> >>> No different than writing to a field in a cdef class... >> >> >> Hmm, aren't those None checked? At least cdef method calls are AFAIR. > > > Not at all. That's my whole point -- currently, the rule for None in Cython > is "it's your responsibility to never do a native operation on None". > > I don't like that either, but that's just inherited from Pyrex (and many > projects would get speed regressions etc.). > > I'm not against changing that to "we safely None-check", if done nicely -- > it's just that that should be done everywhere at once. > > In current master (and as far back as I can remember), this code:
>
> cdef class A:
>     cdef int field
>     cdef int method(self):
>         print self.field
> def f():
>     cdef A a = None
>     a.field = 3
>     a.method()
>
> Turns into:
>
>   __pyx_v_a = ((struct __pyx_obj_5test2_A *)Py_None);
>   __pyx_v_a->field = 3;
>   ((struct __pyx_vtabstruct_5test2_A *)
> __pyx_v_a->__pyx_vtab)->method(__pyx_v_a);
>
> >> I think we should really get back to the habit of making code safe first >> and fast afterwards.
> > > Nobody has argued otherwise for some time (since the cdivision thread I > believe), this is all about Pyrex legacy. Guess part of the story is that > there's lots of performance-sensitive code in SAGE using cdef classes which > was written in Pyrex before Cython was around... > > In fact, the nonecheck directive was written by yours truly! And I argued > for making it the default at the time! > > >> Because it uses syntax that is expected to unpack the buffer. If that >> buffer doesn't exist, I'd expect an error. It's like using interfaces: I >> want something here that implements the buffer interface. If it doesn't - >> reject it. >> >> Besides, I hope you are aware that your argumentation stands on the (IMHO >> questionable) fact that "np.ndarray" by itself can be None by default. If >> np.ndarray should not be be allowed to be None by default, why should >> np.ndarray[double]? That argument works in both ways. > > > I'm well aware of it... > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel None checking could and should be optimized, it can be done but is a bit tricky. A problem are class attributes, as you can at certain point determine that it's not None, but after any function call etc it can suddenly be None because some code decided to set it to None (maybe weird, but possible). We could do well for local variables of which no address is taken, though. You'd have to recheck after each assignment though. We should probably start implementing single static assignment first, also to implement boundschecking and eliminating some common subexpressions. From markflorisson88 at gmail.com Mon May 7 18:16:42 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 7 May 2012 17:16:42 +0100 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: <4FA7F477.30701@behnel.de> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7F33A.9020903@astro.uio.no> <4FA7F477.30701@behnel.de> Message-ID: On 7 May 2012 17:12, Stefan Behnel wrote: > Dag Sverre Seljebotn, 07.05.2012 18:07: >> On 05/07/2012 06:04 PM, mark florisson wrote: >>> On 7 May 2012 12:10, Stefan Behnel wrote: >>>> BTW, is there a reason why we shouldn't allow a "not None" declaration for >>>> cdef functions? Obviously, the caller would have to do the check in that >>>> case. >>> >>> Why can't the callee just check it? If it's None, just raise an >>> exception like usual? >> >> It's just that there's a lot more potential for rather easy optimization if >> the caller does it. > > Exactly. The NoneCheckNode is easy to get rid of at any stage in the > pipeline, whereas a hard coded None check has a fixed cost at runtime. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I see, yes. I expect a pointer comparison to be reasonably insignificant compared to function call overhead, but it would also reduce the code in the instruction cache. If you take the address of the function though, or if you declare it public in a pxd, you probably don't want to do that, as you still want to be safe when called from C. Could do the same trick as in the 'less annotations' CEP though, that would be nice. 
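To make the trade-off being discussed here concrete, a minimal sketch of the two placements of the None check; the function names are invented for illustration, and the comments describe the intended contract rather than what Cython actually emits today:

    cimport numpy as np

    # Callee-side check: a fixed pointer comparison on every call,
    # paid even when the caller could already prove 'a' is not None.
    cdef int callee_checks(np.ndarray a) except -1:
        if a is None:
            raise TypeError("a must not be None")
        return a.ndim

    # Caller-side contract: the body assumes 'a' is valid; each call
    # site would emit the check, so the compiler is free to drop it
    # wherever control flow analysis proves it redundant.
    cdef int caller_checks(np.ndarray a):
        return a.ndim

    def entry_point(np.ndarray a not None):
        # checked once at the Python boundary, then passed on unchecked
        return caller_checks(a)

The caller-side variant is what makes eliminating the NoneCheckNode possible; the callee-side variant is what an external C caller would need to stay safe.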
From markflorisson88 at gmail.com Mon May 7 18:18:09 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 7 May 2012 17:18:09 +0100 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7F33A.9020903@astro.uio.no> <4FA7F477.30701@behnel.de> Message-ID: On 7 May 2012 17:16, mark florisson wrote: > On 7 May 2012 17:12, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 07.05.2012 18:07: >>> On 05/07/2012 06:04 PM, mark florisson wrote: >>>> On 7 May 2012 12:10, Stefan Behnel wrote: >>>>> BTW, is there a reason why we shouldn't allow a "not None" declaration for >>>>> cdef functions? Obviously, the caller would have to do the check in that >>>>> case. >>>> >>>> Why can't the callee just check it? If it's None, just raise an >>>> exception like usual? >>> >>> It's just that there's a lot more potential for rather easy optimization if >>> the caller does it. >> >> Exactly. The NoneCheckNode is easy to get rid of at any stage in the >> pipeline, whereas a hard coded None check has a fixed cost at runtime. >> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > I see, yes. I expect a pointer comparison to be reasonably > insignificant compared to function call overhead, but it would also > reduce the code in the instruction cache. If you take the address of > the function though, or if you declare it public in a pxd, you > probably don't want to do that, as you still want to be safe when > called from C. Could do the same trick as in the 'less annotations' > CEP though, that would be nice. ... or you could document that 'not None' means the caller cannot pass it in, but that would be weird as you could do it from Cython and get an exception, but not from C :) That would be better specified in the documentation of the function as its contract or whatever. From markflorisson88 at gmail.com Mon May 7 18:19:47 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 7 May 2012 17:19:47 +0100 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: On 7 May 2012 17:04, Vitja Makarov wrote: > 2012/5/7 mark florisson : >> On 6 May 2012 20:41, Matthew Brett wrote: >>> Hi, >>> >>> On Sun, May 6, 2012 at 7:28 AM, mark florisson >>> wrote: >>>> Hey, >>>> >>>> I think we already have quite a bit of functionality (nearly) ready, >>>> after merging some pending pull requests maybe it will be a good time >>>> for a 0.17 release? I think it would be good to also document to what >>>> extent pypy support works, what works and what doesn't. Stefan, since >>>> you added a large majority of the features, would you want to be the >>>> release manager? >>>> >>>> In summary, the following pull requests should likely go in >>>> - array.array support (unless further discussion prevents that) >>>> - fused types runtime buffer dispatch >>>> - newaxis >>>> - more? >>>> >>>> The memoryview documentation should also be reworked a bit. Matthew, >>>> are you still willing to have a go at that? Otherwise I can clean up >>>> the mess first, some things are no longer true and simply outdated, >>>> and then have a second opinion. >>> >>> Yes, sorry, I have been taken up by releasing my own project. What's >>> the deadline do you think?
I have another big release to do for the >>> end of next week, but I might be able to carve out some time, >>> >>> See you, >>> >>> Matthew >> >> Great, I'd say we're probably not going to release anything within the >> next two weeks, so take your time, there is no hurry really :). > > Hmm, it seems to me that master is currently broken: > > https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py27-ext/ > > -- > vitja. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Quite broken, in fact :) It doesn't ever print error messages properly anymore. From d.s.seljebotn at astro.uio.no Mon May 7 18:22:38 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 07 May 2012 18:22:38 +0200 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7F33A.9020903@astro.uio.no> <4FA7F477.30701@behnel.de> Message-ID: <4FA7F6CE.3090005@astro.uio.no> On 05/07/2012 06:18 PM, mark florisson wrote: > On 7 May 2012 17:16, mark florisson wrote: >> On 7 May 2012 17:12, Stefan Behnel wrote: >>> Dag Sverre Seljebotn, 07.05.2012 18:07: >>>> On 05/07/2012 06:04 PM, mark florisson wrote: >>>>> On 7 May 2012 12:10, Stefan Behnel wrote: >>>>>> BTW, is there a reason why we shouldn't allow a "not None" declaration for >>>>>> cdef functions? Obviously, the caller would have to do the check in that >>>>>> case. >>>>> >>>>> Why can't the callee just check it? If it's None, just raise an >>>>> exception like usual? >>>> >>>> It's just that there's a lot more potential for rather easy optimization if >>>> the caller does it. >>> >>> Exactly. The NoneCheckNode is easy to get rid of at any stage in the >>> pipeline, whereas a hard coded None check has a fixed cost at runtime. >>> >>> Stefan >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> I see, yes. I expect a pointer comparison to be reasonably >> insignificant compared to function call overhead, but it would also >> reduce the code in the instruction cache. If you take the address of >> the function though, or if you declare it public in a pxd, you >> probably don't want to do that, as you still want to be safe when >> called from C. Could do the same trick as in the 'less annotations' >> CEP though, that would be nice. > > ... or you could document that 'not None' means the caller cannot pass > it in, but that would be weird as you could do it from Cython and get > an exception, but not from C :) That would be better specified in the > documentation of the function as its contract or whatever. We're going to need a "Cython ABI" at some point anyway. "Caller checks for None" goes in the ABI docs. Dag From markflorisson88 at gmail.com Mon May 7 18:28:19 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 7 May 2012 17:28:19 +0100 Subject: [Cython] buffer syntax vs.
memory view syntax In-Reply-To: <4FA7F194.5080008@astro.uio.no> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> Message-ID: On 7 May 2012 17:00, Dag Sverre Seljebotn wrote: > On 05/07/2012 04:16 PM, Stefan Behnel wrote: >> >> Stefan Behnel, 07.05.2012 15:04: >>> >>> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>> >>>> BTW, with the coming of memoryviews, me and Mark talked about just >>>> deprecating the "mytype[...]" meaning buffers, and rather treat it as >>>> np.ndarray, array.array etc. being some sort of "template types". That >>>> is, >>>> we disallow "object[int]" and require some special declarations in the >>>> relevant pxd files. >>> >>> >>> Hmm, yes, it's unfortunate that we have two different types of syntax >>> now, >>> one that declares the item type before the brackets and one that declares >>> it afterwards. >> >> >> I actually think this merits some more discussion. Should we consider the >> buffer interface syntax deprecated and focus on the memory view syntax? > > I think that's the very-long-term intention. Then again, it may be too early > to really tell yet, we just need to see how the memory views play out in > real life and whether they'll be able to replace np.ndarray[double] among > real users. We don't want to shove things down users' throats. > > But the use of the trailing-[] syntax needs some cleaning up. Me and Mark > agreed we'd put this proposal forward when we got around to it: > > - Deprecate the "object[double]" form, where [dtype] can be stuck on any > extension type > > - But, do NOT (for the next year at least) deprecate np.ndarray[double], > array.array[double], etc. Basically, there should be a magic flag in > extension type declarations saying "I can be a buffer". > > For one thing, that is sort of needed to open up things for templated cdef > classes/fused types cdef classes, if that is ever implemented. Deprecating is definitely a good start. I think at least if you only allow two types as buffers it will be at least reasonably clear when one is dealing with fused types or buffers. Basically, I think memoryviews should live up to demands of the users, which would mean there would be no reason to keep the buffer syntax. One thing to do is make memoryviews coerce cheaply back to the original objects if wanted (which is likely). Writing np.asarray(mymemview) is kind of annoying. Also, OT (sorry), but I'm kind of worried about the memoryview ABI. If it changes (and I intend to do so), cython modules compiled with different cython versions will become incompatible if they call each other through pxds. Maybe that should be defined as UB... > The semantic meaning of trailing [] is still sort of like the C++ meaning; > that it templates the argument types (except it's lots of special cases in > the compiler for various things rather than a Turing-complete template > language...) > > Dag > >> >> The words-to-punctuation ratio of the latter may hurt the eyes when >> encountering it unprepared, but at least it doesn't require two type >> names, >> of which the one before the brackets (i.e. "object") is mostly useless. >> (Although it does reflect the notion that we are dealing with an object >> here ...)
>> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Mon May 7 18:52:56 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 May 2012 18:52:56 +0200 Subject: [Cython] checking for "None" in nogil function In-Reply-To: <4FA7EEBE.7060508@astro.uio.no> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7EEBE.7060508@astro.uio.no> Message-ID: <4FA7FDE8.9050807@behnel.de> Dag Sverre Seljebotn, 07.05.2012 17:48: > On 05/07/2012 03:04 PM, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>>> As far as I can remember (which might be biased towards my personal >>>>> view), the conclusion was that we left the current semantics in place, >>>>> relying on better control flow analysis to make None-checks cheaper, and >>>>> when those are cheap enough, make the nonecheck directive default to >>>>> True >>>> >>>> At least for buffer arguments, it silently corrupts data or segfaults in >>>> the current state of affairs, as you pointed out. Not exactly ideal. >>> >>> No different than writing to a field in a cdef class... >> >> Hmm, aren't those None checked? At least cdef method calls are AFAIR. > > Not at all. That's my whole point -- currently, the rule for None in Cython > is "it's your responsibility to never do a native operation on None". > > I don't like that either, but that's just inherited from Pyrex (and many > projects would get speed regressions etc.). > > I'm not against changing that to "we safely None-check", if done nicely -- > it's just that that should be done everywhere at once. I think that gets both of us back on the same track then. :) > In current master (and as far back as I can remember), this code:
>
> cdef class A:
>     cdef int field
>     cdef int method(self):
>         print self.field
> def f():
>     cdef A a = None
>     a.field = 3
>     a.method()
>
> Turns into:
>
>   __pyx_v_a = ((struct __pyx_obj_5test2_A *)Py_None);
>   __pyx_v_a->field = 3;
>   ((struct __pyx_vtabstruct_5test2_A *)
> __pyx_v_a->__pyx_vtab)->method(__pyx_v_a);
Guess I've just been working on the builtins optimiser too long. There, it's obviously not allowed to inject unprotected code like this automatically. It would be fun if we could eventually get to the point where Cython replaces all of the code in f() with an AttributeError, as a combined effort of control flow analysis and dead code removal. A part of that is already there, i.e. Cython would know that 'a' "may be None" in the last two lines and would thus generate a None check with an AttributeError if we allowed it to do that. It wouldn't know that it's always going to be raised, though, so the dead code removal can't strike. I guess that case is just not important enough to implement. BTW, I recently tried to enable None checks in a couple of places and it broke memory views for some reason that I didn't want to investigate. The main problems really seem to be unknown argument values and the lack of proper exception prediction, e.g.
in this case:

  def add_one_2d(int[:,:] buf):
      for x in xrange(buf.shape[0]):
          for y in xrange(buf.shape[1]):
              buf[x,y] += 1

it's statically obvious that only the first access to .shape (outside of all loops) needs a None check and will raise an AttributeError for None, so the check for the second loop can be eliminated as well as the None check on indexing. >> I think we should really get back to the habit of making code safe first >> and fast afterwards. > > Nobody has argued otherwise for some time (since the cdivision thread I > believe), this is all about Pyrex legacy. Guess part of the story is that > there's lots of performance-sensitive code in SAGE using cdef classes which > was written in Pyrex before Cython was around... > > In fact, the nonecheck directive was written by yours truly! And I argued > for making it the default at the time! I've been working on the None checks (and on removing them) repeatedly, although I didn't remember the particular details of discussing the nonecheck directive. Stefan From stefan_ml at behnel.de Mon May 7 19:00:47 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 May 2012 19:00:47 +0200 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> Message-ID: <4FA7FFBF.4010905@behnel.de> mark florisson, 07.05.2012 18:28: > On 7 May 2012 17:00, Dag Sverre Seljebotn wrote: >> On 05/07/2012 04:16 PM, Stefan Behnel wrote: >>> Stefan Behnel, 07.05.2012 15:04: >>>> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>>> BTW, with the coming of memoryviews, me and Mark talked about just >>>>> deprecating the "mytype[...]" meaning buffers, and rather treat it as >>>>> np.ndarray, array.array etc. being some sort of "template types". That >>>>> is, >>>>> we disallow "object[int]" and require some special declarations in the >>>>> relevant pxd files. >>>> >>>> Hmm, yes, it's unfortunate that we have two different types of syntax >>>> now, >>>> one that declares the item type before the brackets and one that declares >>>> it afterwards. >>> >>> I actually think this merits some more discussion. Should we consider the >>> buffer interface syntax deprecated and focus on the memory view syntax? >> >> I think that's the very-long-term intention. Then again, it may be too early >> to really tell yet, we just need to see how the memory views play out in >> real life and whether they'll be able to replace np.ndarray[double] among >> real users. We don't want to shove things down users' throats. >> >> But the use of the trailing-[] syntax needs some cleaning up. Me and Mark >> agreed we'd put this proposal forward when we got around to it: >> >> - Deprecate the "object[double]" form, where [dtype] can be stuck on any >> extension type >> >> - But, do NOT (for the next year at least) deprecate np.ndarray[double], >> array.array[double], etc. Basically, there should be a magic flag in >> extension type declarations saying "I can be a buffer". >> >> For one thing, that is sort of needed to open up things for templated cdef >> classes/fused types cdef classes, if that is ever implemented. > > Deprecating is definitely a good start. Then the first step on that road is to rework the documentation so that it pushes users into going for memory views instead of the plain buffer syntax.
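For the docs, the two spellings under discussion are easy to show side by side; a minimal sketch (the function names are invented):

    cimport numpy as np

    # old-style buffer syntax: the item type sits inside the trailing
    # brackets, after the object type
    def scale_buffer(np.ndarray[double, ndim=1] a, double factor):
        for i in range(a.shape[0]):
            a[i] *= factor

    # memory view syntax: the item type comes first, and any object
    # supporting the buffer protocol is accepted, not just np.ndarray
    def scale_memview(double[:] a, double factor):
        for i in range(a.shape[0]):
            a[i] *= factor

Both index at C speed; the memory view version just doesn't tie the argument to one extension type.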
> I think at least if you only > allow two types as buffers it will be at least reasonably clear when > one is dealing with fused types or buffers. > > Basically, I think memoryviews should live up to demands of the users, > which would mean there would be no reason to keep the buffer syntax. > One thing to do is make memoryviews coerce cheaply back to the > original objects if wanted (which is likely). Writting > np.asarray(mymemview) is kind of annoying. ... and also doesn't do the same thing, I believe. > Also, OT (sorry), but I'm kind of worried about the memoryview ABI. If > it changes (and I intend to do so), cython modules compiled with > different cython versions will become incompatible if they call each > other through pxds. Maybe that should be defined as UB... Would there be a way to only use the plain buffer interface for cross module memory view exchange? That could be an acceptable overhead to pay for ABI independence. Stefan From stefan_ml at behnel.de Mon May 7 19:06:17 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 May 2012 19:06:17 +0200 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7F33A.9020903@astro.uio.no> <4FA7F477.30701@behnel.de> Message-ID: <4FA80109.9020201@behnel.de> mark florisson, 07.05.2012 18:18: > On 7 May 2012 17:16, mark florisson wrote: >> On 7 May 2012 17:12, Stefan Behnel wrote: >>> Dag Sverre Seljebotn, 07.05.2012 18:07: >>>> On 05/07/2012 06:04 PM, mark florisson wrote: >>>>> On 7 May 2012 12:10, Stefan Behnel wrote: >>>>>> BTW, is there a reason why we shouldn't allow a "not None" declaration for >>>>>> cdef functions? Obviously, the caller would have to do the check in that >>>>>> case. >>>>> >>>>> Why can't the callee just check it? If it's None, just raise an >>>>> exception like usual? >>>> >>>> It's just that there's a lot more potential for rather easy optimization if >>>> the caller does it. >>> >>> Exactly. The NoneCheckNode is easy to get rid of at any stage in the >>> pipeline, whereas a hard coded None check has a fixed cost at runtime. >> >> I see, yes. I expect a pointer comparison to be reasonably >> insignificant compared to function call overhead, but it would also >> reduce the code in the instruction cache. If you take the address of >> the function though, or if you declare it public in a pxd, you >> probably don't want to do that, as you still want to be safe when >> called from C. Could do the same trick as in the 'less annotations' >> CEP though, that would be nice. > > ... or you could document that 'not None' means the caller cannot pass > it in, but that would be weird as you could do it from Cython and get > an exception, but not from C :) That would be better specified in the > documentation of the function as its contract or whatever. "not None" on a cdef function means what all declarations on cdef functions mean: the caller is responsible for doing the appropriate type conversions and checks. If a function accepts an int32 and the caller puts a float32 on the stack, it's not the fault of the callee. The same applies to extension type arguments and None checks. 
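A minimal sketch of that contract (invented names): for plain C arguments the conversion already happens at the call site, and "not None" on a cdef argument would slot into the same place:

    cdef double twice(double x):
        # no validation here; the body trusts its C-level arguments
        return 2 * x

    def call_it(obj):
        # the *caller* converts the Python object to a C double and
        # raises a TypeError on failure, before twice() ever runs
        return twice(obj)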
Stefan From stefan_ml at behnel.de Mon May 7 19:08:40 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 07 May 2012 19:08:40 +0200 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: <4FA80198.8070303@behnel.de> mark florisson, 07.05.2012 18:19: > On 7 May 2012 17:04, Vitja Makarov wrote: >> Hmm, it seems to me that master is currently broken: >> >> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py27-ext/ >> > Quite broken, in fact :) It doesn't ever print error messages property anymore. Yes, Robert broke the compiler error processing while trying to fix it up for parallel compilation. https://github.com/cython/cython/commit/5d1fddb87fd68991e7fbc79c469273398638b6ff Stefan From markflorisson88 at gmail.com Mon May 7 19:08:52 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 7 May 2012 18:08:52 +0100 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: <4FA7FFBF.4010905@behnel.de> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <4FA7FFBF.4010905@behnel.de> Message-ID: On 7 May 2012 18:00, Stefan Behnel wrote: > mark florisson, 07.05.2012 18:28: >> On 7 May 2012 17:00, Dag Sverre Seljebotn wrote: >>> On 05/07/2012 04:16 PM, Stefan Behnel wrote: >>>> Stefan Behnel, 07.05.2012 15:04: >>>>> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>>>> BTW, with the coming of memoryviews, me and Mark talked about just >>>>>> deprecating the "mytype[...]" meaning buffers, and rather treat it as >>>>>> np.ndarray, array.array etc. being some sort of "template types". That >>>>>> is, >>>>>> we disallow "object[int]" and require some special declarations in the >>>>>> relevant pxd files. >>>>> >>>>> Hmm, yes, it's unfortunate that we have two different types of syntax >>>>> now, >>>>> one that declares the item type before the brackets and one that declares >>>>> it afterwards. >>>> >>>> I actually think this merits some more discussion. Should we consider the >>>> buffer interface syntax deprecated and focus on the memory view syntax? >>> >>> I think that's the very-long-term intention. Then again, it may be too early >>> to really tell yet, we just need to see how the memory views play out in >>> real life and whether they'll be able to replace np.ndarray[double] among >>> real users. We don't want to shove things down users throats. >>> >>> But the use of the trailing-[] syntax needs some cleaning up. Me and Mark >>> agreed we'd put this proposal forward when we got around to it: >>> >>> ?- Deprecate the "object[double]" form, where [dtype] can be stuck on any >>> extension type >>> >>> ?- But, do NOT (for the next year at least) deprecate np.ndarray[double], >>> array.array[double], etc. Basically, there should be a magic flag in >>> extension type declarations saying "I can be a buffer". >>> >>> For one thing, that is sort of needed to open up things for templated cdef >>> classes/fused types cdef classes, if that is ever implemented. >> >> Deprecating is definitely a good start. > > Then the first step on that road is to rework the documentation so that it > pushes users into going for memory views instead of the plain buffer syntax. > Well, memoryviews are not yet entirely bug free (although the next release will aim to fix the problems pointed out by users so far), and they also have some other problems. 
>> I think at least if you only >> allow two types as buffers it will be at least reasonably clear when >> one is dealing with fused types or buffers. >> >> Basically, I think memoryviews should live up to demands of the users, >> which would mean there would be no reason to keep the buffer syntax. >> One thing to do is make memoryviews coerce cheaply back to the >> original objects if wanted (which is likely). Writting >> np.asarray(mymemview) is kind of annoying. > > ... and also doesn't do the same thing, I believe. > > >> Also, OT (sorry), but I'm kind of worried about the memoryview ABI. If >> it changes (and I intend to do so), cython modules compiled with >> different cython versions will become incompatible if they call each >> other through pxds. Maybe that should be defined as UB... > > Would there be a way to only use the plain buffer interface for cross > module memory view exchange? That could be an acceptable overhead to pay > for ABI independence. I want to store extra flags and pointers in there as well, so I don't think that will be enough. It will also be rather annoying and complicate calling code. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Mon May 7 19:10:42 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 07 May 2012 19:10:42 +0200 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: <4FA7FFBF.4010905@behnel.de> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <4FA7FFBF.4010905@behnel.de> Message-ID: <4FA80212.1030808@astro.uio.no> On 05/07/2012 07:00 PM, Stefan Behnel wrote: > mark florisson, 07.05.2012 18:28: >> On 7 May 2012 17:00, Dag Sverre Seljebotn wrote: >>> On 05/07/2012 04:16 PM, Stefan Behnel wrote: >>>> Stefan Behnel, 07.05.2012 15:04: >>>>> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>>>> BTW, with the coming of memoryviews, me and Mark talked about just >>>>>> deprecating the "mytype[...]" meaning buffers, and rather treat it as >>>>>> np.ndarray, array.array etc. being some sort of "template types". That >>>>>> is, >>>>>> we disallow "object[int]" and require some special declarations in the >>>>>> relevant pxd files. >>>>> >>>>> Hmm, yes, it's unfortunate that we have two different types of syntax >>>>> now, >>>>> one that declares the item type before the brackets and one that declares >>>>> it afterwards. >>>> >>>> I actually think this merits some more discussion. Should we consider the >>>> buffer interface syntax deprecated and focus on the memory view syntax? >>> >>> I think that's the very-long-term intention. Then again, it may be too early >>> to really tell yet, we just need to see how the memory views play out in >>> real life and whether they'll be able to replace np.ndarray[double] among >>> real users. We don't want to shove things down users throats. >>> >>> But the use of the trailing-[] syntax needs some cleaning up. Me and Mark >>> agreed we'd put this proposal forward when we got around to it: >>> >>> - Deprecate the "object[double]" form, where [dtype] can be stuck on any >>> extension type >>> >>> - But, do NOT (for the next year at least) deprecate np.ndarray[double], >>> array.array[double], etc. 
Basically, there should be a magic flag in >>> extension type declarations saying "I can be a buffer". >>> >>> For one thing, that is sort of needed to open up things for templated cdef >>> classes/fused types cdef classes, if that is ever implemented. >> >> Deprecating is definitely a good start. > > Then the first step on that road is to rework the documentation so that it > pushes users into going for memory views instead of the plain buffer syntax. -1, premature. Dag > > >> I think at least if you only >> allow two types as buffers it will be at least reasonably clear when >> one is dealing with fused types or buffers. >> >> Basically, I think memoryviews should live up to demands of the users, >> which would mean there would be no reason to keep the buffer syntax. >> One thing to do is make memoryviews coerce cheaply back to the >> original objects if wanted (which is likely). Writting >> np.asarray(mymemview) is kind of annoying. > > ... and also doesn't do the same thing, I believe. > > >> Also, OT (sorry), but I'm kind of worried about the memoryview ABI. If >> it changes (and I intend to do so), cython modules compiled with >> different cython versions will become incompatible if they call each >> other through pxds. Maybe that should be defined as UB... > > Would there be a way to only use the plain buffer interface for cross > module memory view exchange? That could be an acceptable overhead to pay > for ABI independence. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Mon May 7 19:13:15 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 7 May 2012 18:13:15 +0100 Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function In-Reply-To: <4FA80109.9020201@behnel.de> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7F33A.9020903@astro.uio.no> <4FA7F477.30701@behnel.de> <4FA80109.9020201@behnel.de> Message-ID: On 7 May 2012 18:06, Stefan Behnel wrote: > mark florisson, 07.05.2012 18:18: >> On 7 May 2012 17:16, mark florisson wrote: >>> On 7 May 2012 17:12, Stefan Behnel wrote: >>>> Dag Sverre Seljebotn, 07.05.2012 18:07: >>>>> On 05/07/2012 06:04 PM, mark florisson wrote: >>>>>> On 7 May 2012 12:10, Stefan Behnel wrote: >>>>>>> BTW, is there a reason why we shouldn't allow a "not None" declaration for >>>>>>> cdef functions? Obviously, the caller would have to do the check in that >>>>>>> case. >>>>>> >>>>>> Why can't the callee just check it? If it's None, just raise an >>>>>> exception like usual? >>>>> >>>>> It's just that there's a lot more potential for rather easy optimization if >>>>> the caller does it. >>>> >>>> Exactly. The NoneCheckNode is easy to get rid of at any stage in the >>>> pipeline, whereas a hard coded None check has a fixed cost at runtime. >>> >>> I see, yes. I expect a pointer comparison to be reasonably >>> insignificant compared to function call overhead, but it would also >>> reduce the code in the instruction cache. If you take the address of >>> the function though, or if you declare it public in a pxd, you >>> probably don't want to do that, as you still want to be safe when >>> called from C. Could do the same trick as in the 'less annotations' >>> CEP though, that would be nice. >> >> ... 
or you could document that 'not None' means the caller cannot pass >> it in, but that would be weird as you could do it from Cython and get >> an exception, but not from C :) That would be better specified in the >> documentation of the function as its contract or whatever. > > "not None" on a cdef function means what all declarations on cdef functions > mean: the caller is responsible for doing the appropriate type conversions > and checks. > > If a function accepts an int32 and the caller puts a float32 on the stack, > it's not the fault of the callee. The same applies to extension type > arguments and None checks. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Well, 'with gil' makes the callee do something. I would personally expect not None to be enforced at least conceptually in the function itself. In any case, I also think it's really not an important issue, as it's likely pretty uncommon to call it from C. If it does break, it will be easy enough to figure out (unless you accidentally corrupt your memory :) So either solution would be fine with me. From markflorisson88 at gmail.com Mon May 7 19:18:08 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 7 May 2012 18:18:08 +0100 Subject: [Cython] checking for "None" in nogil function In-Reply-To: <4FA7FDE8.9050807@behnel.de> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7EEBE.7060508@astro.uio.no> <4FA7FDE8.9050807@behnel.de> Message-ID: On 7 May 2012 17:52, Stefan Behnel wrote: > Dag Sverre Seljebotn, 07.05.2012 17:48: >> On 05/07/2012 03:04 PM, Stefan Behnel wrote: >>> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>>>> As far as I can remember (which might be biased towards my personal >>>>>> view), the conclusion was that we left the current semantics in place, >>>>>> relying on better control flow analysis to make None-checks cheaper, and >>>>>> when those are cheap enough, make the nonecheck directive default to >>>>>> True >>>>> >>>>> At least for buffer arguments, it silently corrupts data or segfaults in >>>>> the current state of affairs, as you pointed out. Not exactly ideal. >>>> >>>> No different than writing to a field in a cdef class... >>> >>> Hmm, aren't those None checked? At least cdef method calls are AFAIR. >> >> Not at all. That's my whole point -- currently, the rule for None in Cython >> is "it's your responsibility to never do a native operation on None". >> >> I don't like that either, but that's just inherited from Pyrex (and many >> projects would get speed regressions etc.). >> >> I'm not against changing that to "we safely None-check", if done nicely -- >> it's just that that should be done everywhere at once. > > I think that gets both of us back on the same track then. :) > > >> In current master (and as far back as I can remember), this code: >> >> cdef class A: >> ? ? cdef int field >> ? ? cdef int method(self): >> ? ? ? ? print self.field >> def f(): >> ? ? cdef A a = None >> ? ? a.field = 3 >> ? ? a.method() >> >> Turns into: >> >> ? __pyx_v_a = ((struct __pyx_obj_5test2_A *)Py_None); >> ? __pyx_v_a->field = 3; >> ? ((struct __pyx_vtabstruct_5test2_A *) >> __pyx_v_a->__pyx_vtab)->method(__pyx_v_a); > > Guess I've just been working on the builtins optimiser too long. 
There, > it's obviously not allowed to inject unprotected code like this automatically. > > It would be fun if we could eventually get to the point where Cython > replaces all of the code in f() with an AttributeError, as a combined > effort of control flow analysis and dead code removal. A part of that is > already there, i.e. Cython would know that 'a' "may be None" in the last > two lines and would thus generate a None check with an AttributeError if we > allowed it to do that. It wouldn't know that it's always going to be > raised, though, so the dead code removal can't strike. I guess that case is > just not important enough to implement. > > BTW, I recently tried to enable None checks in a couple of places and it > broke memory views for some reason that I didn't want to investigate. If you do want to implement it, don't hesitate to ask about any memoryview shenanigans a certain person implemented. > The > main problems really seem to be unknown argument values and the lack of > proper exception prediction, e.g. in this case: > > ?def add_one_2d(int[:,:] buf): > ? ? ?for x in xrange(buf.shape[0]): > ? ? ? ? ?for y in xrange(buf.shape[1]): > ? ? ? ? ? ? ?buf[x,y] += 1 > > it's statically obvious that only the first access to .shape (outside of > all loops) needs a None check and will raise an AttributeError for None, so > the check for the second loop can be eliminated as well as the None check > on indexing. > Yes. This can be generalized to common subexpression elimination, for bounds checking, for nonechecking, even for wraparound. >>> I think we should really get back to the habit of making code safe first >>> and fast afterwards. >> >> Nobody has argued otherwise for some time (since the cdivision thread I >> believe), this is all about Pyrex legacy. Guess part of the story is that >> there's lots of performance-sensitive code in SAGE using cdef classes which >> was written in Pyrex before Cython was around... >> >> In fact, the nonecheck directive was written by yours truly! And I argued >> for making it the default at the time! > > I've been working on the None checks (and on removing them) repeatedly, > although I didn't remember the particular details of discussing the > nonecheck directive. 
> > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Mon May 7 19:20:44 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 7 May 2012 18:20:44 +0100 Subject: [Cython] checking for "None" in nogil function In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7EEBE.7060508@astro.uio.no> <4FA7FDE8.9050807@behnel.de> Message-ID: On 7 May 2012 18:18, mark florisson wrote: > On 7 May 2012 17:52, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 07.05.2012 17:48: >>> On 05/07/2012 03:04 PM, Stefan Behnel wrote: >>>> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>>>>> As far as I can remember (which might be biased towards my personal >>>>>>> view), the conclusion was that we left the current semantics in place, >>>>>>> relying on better control flow analysis to make None-checks cheaper, and >>>>>>> when those are cheap enough, make the nonecheck directive default to >>>>>>> True >>>>>> >>>>>> At least for buffer arguments, it silently corrupts data or segfaults in >>>>>> the current state of affairs, as you pointed out. Not exactly ideal. >>>>> >>>>> No different than writing to a field in a cdef class... >>>> >>>> Hmm, aren't those None checked? At least cdef method calls are AFAIR. >>> >>> Not at all. That's my whole point -- currently, the rule for None in Cython >>> is "it's your responsibility to never do a native operation on None". >>> >>> I don't like that either, but that's just inherited from Pyrex (and many >>> projects would get speed regressions etc.). >>> >>> I'm not against changing that to "we safely None-check", if done nicely -- >>> it's just that that should be done everywhere at once. >> >> I think that gets both of us back on the same track then. :) >> >> >>> In current master (and as far back as I can remember), this code: >>> >>> cdef class A: >>> ? ? cdef int field >>> ? ? cdef int method(self): >>> ? ? ? ? print self.field >>> def f(): >>> ? ? cdef A a = None >>> ? ? a.field = 3 >>> ? ? a.method() >>> >>> Turns into: >>> >>> ? __pyx_v_a = ((struct __pyx_obj_5test2_A *)Py_None); >>> ? __pyx_v_a->field = 3; >>> ? ((struct __pyx_vtabstruct_5test2_A *) >>> __pyx_v_a->__pyx_vtab)->method(__pyx_v_a); >> >> Guess I've just been working on the builtins optimiser too long. There, >> it's obviously not allowed to inject unprotected code like this automatically. >> >> It would be fun if we could eventually get to the point where Cython >> replaces all of the code in f() with an AttributeError, as a combined >> effort of control flow analysis and dead code removal. A part of that is >> already there, i.e. Cython would know that 'a' "may be None" in the last >> two lines and would thus generate a None check with an AttributeError if we >> allowed it to do that. It wouldn't know that it's always going to be >> raised, though, so the dead code removal can't strike. I guess that case is >> just not important enough to implement. >> >> BTW, I recently tried to enable None checks in a couple of places and it >> broke memory views for some reason that I didn't want to investigate. > > If you do want to implement it, don't hesitate to ask about any > memoryview shenanigans a certain person implemented. 
> >> The >> main problems really seem to be unknown argument values and the lack of >> proper exception prediction, e.g. in this case: >> >> ?def add_one_2d(int[:,:] buf): >> ? ? ?for x in xrange(buf.shape[0]): >> ? ? ? ? ?for y in xrange(buf.shape[1]): >> ? ? ? ? ? ? ?buf[x,y] += 1 >> >> it's statically obvious that only the first access to .shape (outside of >> all loops) needs a None check and will raise an AttributeError for None, so >> the check for the second loop can be eliminated as well as the None check >> on indexing. >> > > Yes. This can be generalized to common subexpression elimination, for > bounds checking, for nonechecking, even for wraparound. Given the awesome control flow we have now, I don't think implementing SSA is very hard at all. From there it's also not too hard to implement these things. Pulling these things out of loops is slightly harder though, given guards etc, so you need two implementations, one with all checks in there, and one without any checks. You take the checking version when your conditions outside the loop don't match, as you need to raise the (potential) exception at the right point. >>>> I think we should really get back to the habit of making code safe first >>>> and fast afterwards. >>> >>> Nobody has argued otherwise for some time (since the cdivision thread I >>> believe), this is all about Pyrex legacy. Guess part of the story is that >>> there's lots of performance-sensitive code in SAGE using cdef classes which >>> was written in Pyrex before Cython was around... >>> >>> In fact, the nonecheck directive was written by yours truly! And I argued >>> for making it the default at the time! >> >> I've been working on the None checks (and on removing them) repeatedly, >> although I didn't remember the particular details of discussing the >> nonecheck directive. >> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Mon May 7 20:40:50 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 07 May 2012 20:40:50 +0200 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> Message-ID: <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> mark florisson wrote: >On 7 May 2012 17:00, Dag Sverre Seljebotn >wrote: >> On 05/07/2012 04:16 PM, Stefan Behnel wrote: >>> >>> Stefan Behnel, 07.05.2012 15:04: >>>> >>>> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>>> >>>>> BTW, with the coming of memoryviews, me and Mark talked about just >>>>> deprecating the "mytype[...]" meaning buffers, and rather treat it >as >>>>> np.ndarray, array.array etc. being some sort of "template types". >That >>>>> is, >>>>> we disallow "object[int]" and require some special declarations in >the >>>>> relevant pxd files. >>>> >>>> >>>> Hmm, yes, it's unfortunate that we have two different types of >syntax >>>> now, >>>> one that declares the item type before the brackets and one that >declares >>>> it afterwards. >>> >>> >>> I actually think this merits some more discussion. Should we >consider the >>> buffer interface syntax deprecated and focus on the memory view >syntax? >> >> >> I think that's the very-long-term intention. 
Then again, it may be >too early >> to really tell yet, we just need to see how the memory views play out >in >> real life and whether they'll be able to replace np.ndarray[double] >among >> real users. We don't want to shove things down users throats. >> >> But the use of the trailing-[] syntax needs some cleaning up. Me and >Mark >> agreed we'd put this proposal forward when we got around to it: >> >> ?- Deprecate the "object[double]" form, where [dtype] can be stuck on >any >> extension type >> >> ?- But, do NOT (for the next year at least) deprecate >np.ndarray[double], >> array.array[double], etc. Basically, there should be a magic flag in >> extension type declarations saying "I can be a buffer". >> >> For one thing, that is sort of needed to open up things for templated >cdef >> classes/fused types cdef classes, if that is ever implemented. > >Deprecating is definitely a good start. I think at least if you only >allow two types as buffers it will be at least reasonably clear when >one is dealing with fused types or buffers. > >Basically, I think memoryviews should live up to demands of the users, >which would mean there would be no reason to keep the buffer syntax. But they are different approaches -- use a different type/API, or just try to speed up parts of NumPy.. >One thing to do is make memoryviews coerce cheaply back to the >original objects if wanted (which is likely). Writting >np.asarray(mymemview) is kind of annoying. > It is going to be very confusing to have type(mymemview), repr(mymemview), and so on come out as NumPy arrays, but not have the full API of NumPy. Unless you auto-convert on getattr to... If you want to eradicate the distinction between the backing array and the memory view and make it transparent, I really suggest you kick back alive np.ndarray (it can exist in some 'unrealized' state with delayed construction after slicing, and so on). Implementation much the same either way, it is all about how it is presented to the user. Something like mymemview.asobject() could work though, and while not much shorter, it would have some polymorphism that np.asarray does not have (based probably on some custom PEP 3118 extension) Dag >Also, OT (sorry), but I'm kind of worried about the memoryview ABI. If >it changes (and I intend to do so), cython modules compiled with >different cython versions will become incompatible if they call each >other through pxds. Maybe that should be defined as UB... > >> The semantic meaning of trailing [] is still sort of like the C++ >meaning; >> that it templates the argument types (except it's lots of special >cases in >> the compiler for various things rather than a Turing-complete >template >> language...) >> >> Dag >> >>> >>> The words-to-punctuation ratio of the latter may hurt the eyes when >>> encountering it unprepared, but at least it doesn't require two type >>> names, >>> of which the one before the brackets (i.e. "object") is mostly >useless. >>> (Although it does reflect the notion that we are dealing with an >object >>> here ...) 
>>> >>> Stefan >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From markflorisson88 at gmail.com Mon May 7 23:21:04 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 7 May 2012 22:21:04 +0100 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> Message-ID: On 7 May 2012 19:40, Dag Sverre Seljebotn wrote: > > > mark florisson wrote: > >>On 7 May 2012 17:00, Dag Sverre Seljebotn >>wrote: >>> On 05/07/2012 04:16 PM, Stefan Behnel wrote: >>>> >>>> Stefan Behnel, 07.05.2012 15:04: >>>>> >>>>> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>>>> >>>>>> BTW, with the coming of memoryviews, me and Mark talked about just >>>>>> deprecating the "mytype[...]" meaning buffers, and rather treat it >>as >>>>>> np.ndarray, array.array etc. being some sort of "template types". >>That >>>>>> is, >>>>>> we disallow "object[int]" and require some special declarations in >>the >>>>>> relevant pxd files. >>>>> >>>>> >>>>> Hmm, yes, it's unfortunate that we have two different types of >>syntax >>>>> now, >>>>> one that declares the item type before the brackets and one that >>declares >>>>> it afterwards. >>>> >>>> >>>> I actually think this merits some more discussion. Should we >>consider the >>>> buffer interface syntax deprecated and focus on the memory view >>syntax? >>> >>> >>> I think that's the very-long-term intention. Then again, it may be >>too early >>> to really tell yet, we just need to see how the memory views play out >>in >>> real life and whether they'll be able to replace np.ndarray[double] >>among >>> real users. We don't want to shove things down users throats. >>> >>> But the use of the trailing-[] syntax needs some cleaning up. Me and >>Mark >>> agreed we'd put this proposal forward when we got around to it: >>> >>> ?- Deprecate the "object[double]" form, where [dtype] can be stuck on >>any >>> extension type >>> >>> ?- But, do NOT (for the next year at least) deprecate >>np.ndarray[double], >>> array.array[double], etc. Basically, there should be a magic flag in >>> extension type declarations saying "I can be a buffer". >>> >>> For one thing, that is sort of needed to open up things for templated >>cdef >>> classes/fused types cdef classes, if that is ever implemented. >> >>Deprecating is definitely a good start. I think at least if you only >>allow two types as buffers it will be at least reasonably clear when >>one is dealing with fused types or buffers. >> >>Basically, I think memoryviews should live up to demands of the users, >>which would mean there would be no reason to keep the buffer syntax. > > But they are different approaches -- use a different type/API, or just try to speed up parts of NumPy.. 
> >>One thing to do is make memoryviews coerce cheaply back to the >>original objects if wanted (which is likely). Writting >>np.asarray(mymemview) is kind of annoying. >> > > > It is going to be very confusing to have type(mymemview), repr(mymemview), and so on come out as NumPy arrays, but not have the full API of NumPy. Unless you auto-convert on getattr to... Yeah, the idea is as very simple, as you mention, just keep the object around cached, and when you slice construct one lazily. > If you want to eradicate the distinction between the backing array and the memory view and make it transparent, I really suggest you kick back alive np.ndarray (it can exist in some 'unrealized' state with delayed construction after slicing, and so on). Implementation much the same either way, it is all about how it is presented to the user. You mean the buffer syntax? > Something like mymemview.asobject() could work though, and while not much shorter, it would have some polymorphism that np.asarray does not have (based probably on some custom PEP 3118 extension) I was thinking you could allow the user to register a callback, and use that to coerce from a memoryview back to an object (given a memoryview object). For numpy this would be np.asarray, and the implementation is allowed to cache the result (which it will). It may be too magicky though... but it will be convenient. The memoryview will act as a subclass, meaning that any of its methods will override methods of the converted object. > Dag > > > >>Also, OT (sorry), but I'm kind of worried about the memoryview ABI. If >>it changes (and I intend to do so), cython modules compiled with >>different cython versions will become incompatible if they call each >>other through pxds. Maybe that should be defined as UB... >> >>> The semantic meaning of trailing [] is still sort of like the C++ >>meaning; >>> that it templates the argument types (except it's lots of special >>cases in >>> the compiler for various things rather than a Turing-complete >>template >>> language...) >>> >>> Dag >>> >>>> >>>> The words-to-punctuation ratio of the latter may hurt the eyes when >>>> encountering it unprepared, but at least it doesn't require two type >>>> names, >>>> of which the one before the brackets (i.e. "object") is mostly >>useless. >>>> (Although it does reflect the notion that we are dealing with an >>object >>>> here ...) >>>> >>>> Stefan >>>> _______________________________________________ >>>> cython-devel mailing list >>>> cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>> >>> >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >>_______________________________________________ >>cython-devel mailing list >>cython-devel at python.org >>http://mail.python.org/mailman/listinfo/cython-devel > > -- > Sent from my Android phone with K-9 Mail. Please excuse my brevity. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From robertwb at gmail.com Tue May 8 00:35:29 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Mon, 7 May 2012 15:35:29 -0700 Subject: [Cython] buffer syntax vs. 
memory view syntax In-Reply-To: <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> Message-ID: On Mon, May 7, 2012 at 11:40 AM, Dag Sverre Seljebotn wrote: > > mark florisson wrote: > >>On 7 May 2012 17:00, Dag Sverre Seljebotn >>wrote: >>> On 05/07/2012 04:16 PM, Stefan Behnel wrote: >>>> >>>> Stefan Behnel, 07.05.2012 15:04: >>>>> >>>>> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>>>> >>>>>> BTW, with the coming of memoryviews, me and Mark talked about just >>>>>> deprecating the "mytype[...]" meaning buffers, and rather treat it >>as >>>>>> np.ndarray, array.array etc. being some sort of "template types". >>That >>>>>> is, >>>>>> we disallow "object[int]" and require some special declarations in >>the >>>>>> relevant pxd files. >>>>> >>>>> >>>>> Hmm, yes, it's unfortunate that we have two different types of >>syntax >>>>> now, >>>>> one that declares the item type before the brackets and one that >>declares >>>>> it afterwards. >>>> >>>> >>>> I actually think this merits some more discussion. Should we >>consider the >>>> buffer interface syntax deprecated and focus on the memory view >>syntax? >>> >>> >>> I think that's the very-long-term intention. Then again, it may be >>too early >>> to really tell yet, we just need to see how the memory views play out >>in >>> real life and whether they'll be able to replace np.ndarray[double] >>among >>> real users. We don't want to shove things down users throats. >>> >>> But the use of the trailing-[] syntax needs some cleaning up. Me and >>Mark >>> agreed we'd put this proposal forward when we got around to it: >>> >>> ?- Deprecate the "object[double]" form, where [dtype] can be stuck on >>any >>> extension type >>> >>> ?- But, do NOT (for the next year at least) deprecate >>np.ndarray[double], >>> array.array[double], etc. Basically, there should be a magic flag in >>> extension type declarations saying "I can be a buffer". >>> >>> For one thing, that is sort of needed to open up things for templated >>cdef >>> classes/fused types cdef classes, if that is ever implemented. >> >>Deprecating is definitely a good start. I think at least if you only >>allow two types as buffers it will be at least reasonably clear when >>one is dealing with fused types or buffers. >> >>Basically, I think memoryviews should live up to demands of the users, >>which would mean there would be no reason to keep the buffer syntax. > > But they are different approaches -- use a different type/API, or just try to speed up parts of NumPy.. Part of the question here is whether using np.ndarray[...] currently (or will) offer any additional functionality. While we should likely start steering people this direction, especially over object[...], it seems too soon to deprecate the old-style buffer access. >>One thing to do is make memoryviews coerce cheaply back to the >>original objects if wanted (which is likely). Writting >>np.asarray(mymemview) is kind of annoying. >> > > > It is going to be very confusing to have type(mymemview), repr(mymemview), and so on come out as NumPy arrays, but not have the full API of NumPy. Unless you auto-convert on getattr to... 
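To make the coercion in question concrete -- the function below is a sketch of what already works today (the np.asarray() round-trip called annoying above), with an invented name; the trailing comment describes the proposed transparent behaviour, which is not implemented:

    import numpy as np

    def twice(double[:] mv):
        arr = np.asarray(mv)   # explicit trip back to an ndarray
        return arr * 2         # full ndarray API available again

    # proposed (hypothetical): mv.mean(), repr(mv), etc. would dispatch to a
    # cached backing object automatically, with no np.asarray() call needed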
>
> If you want to eradicate the distinction between the backing array and the memory view and make it transparent, I really suggest you bring np.ndarray back to life (it can exist in some 'unrealized' state with delayed construction after slicing, and so on). The implementation is much the same either way; it is all about how it is presented to the user.
>
> Something like mymemview.asobject() could work though, and while not much shorter, it would have some polymorphism that np.asarray does not have (based probably on some custom PEP 3118 extension)

I think it's valuable to have a single name refer to both the Python object (on which methods can be called, and a new one might have to be created if there was slicing) and the memory view. In this light, being able to specify that something is both a NumPy array (to use some (overlay optimized?) methods on it) and a memory view (for fast indexing), without having two different variables, can result in much cleaner code (and an easier transition from untyped NumPy).

>> Also, OT (sorry), but I'm kind of worried about the memoryview ABI. If it changes (and I intend to do so), cython modules compiled with different cython versions will become incompatible if they call each other through pxds. Maybe that should be defined as UB...

>>> The semantic meaning of trailing [] is still sort of like the C++ meaning: it templates the argument types (except it's lots of special cases in the compiler for various things rather than a Turing-complete template language...)
>>>
>>> Dag

>>>> The words-to-punctuation ratio of the latter may hurt the eyes when encountering it unprepared, but at least it doesn't require two type names, of which the one before the brackets (i.e. "object") is mostly useless.
>>>> (Although it does reflect the notion that we are dealing with an object here ...)
>>>>
>>>> Stefan

From robertwb at gmail.com Tue May 8 00:42:04 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Mon, 7 May 2012 15:42:04 -0700
Subject: [Cython] 0.17
In-Reply-To: <4FA80198.8070303@behnel.de>
References: <4FA80198.8070303@behnel.de>
Message-ID:

On Mon, May 7, 2012 at 10:08 AM, Stefan Behnel wrote:
> mark florisson, 07.05.2012 18:19:
>> On 7 May 2012 17:04, Vitja Makarov wrote:
>>> Hmm, it seems to me that master is currently broken:
>>>
>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py27-ext/
>>>
>> Quite broken, in fact :) It doesn't ever print error messages properly anymore.
>
> Yes, Robert broke the compiler error processing while trying to fix it up for parallel compilation.
>
> https://github.com/cython/cython/commit/5d1fddb87fd68991e7fbc79c469273398638b6ff

Argh, I made that change at the last minute when I was removing a couple of debug print statements. I say we wait another week or so at least to see if any new bug reports come in, but prep a release to be cut soon, without holding it up for any new features that have not gone in yet.

- Robert

From robertwb at gmail.com Tue May 8 00:52:41 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Mon, 7 May 2012 15:52:41 -0700
Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function
In-Reply-To: <4FA7A6B2.5000801@astro.uio.no>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no>
Message-ID:

On Mon, May 7, 2012 at 3:40 AM, Dag Sverre Seljebotn wrote:
> [moving to dev list]
>
> On 05/07/2012 11:17 AM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 07.05.2012 10:44:
>>> On 05/07/2012 07:48 AM, Stefan Behnel wrote:
>>>> shaunc, 07.05.2012 07:13:
>>>>> The following code:
>>>>>
>>>>> cdef int foo( double[:] bar ) nogil:
>>>>>     return bar is None
>>>>>
>>>>> causes: "Converting to Python object not allowed without gil"
>>>>>
>>>>> However, I was under the impression that: "When comparing a value with None, keep in mind that, if x is a Python object, x is None and x is not None are very efficient because they translate directly to C pointer comparisons."
>>>>>
>>>>> I guess the problem is that the memoryview is not a python object -- indeed, this compiles in the form:
>>>>>
>>>>> cdef int foo( object bar ) nogil:
>>>>>     return bar is None
>>>>>
>>>>> But this is a bit counterintuitive... do I need to do "with gil" to check if a memoryview is None? And in a nogil function, I'm not necessarily guaranteed that I don't have the gil -- what is the best way to ensure I have the gil? (Is there a "secret system call" or should I use a try block?)
>>>>>
>>>>> It would seem more appropriate (IMHO, of course :)) to allow "bar is None" also when bar is a memoryview....
>>>>
>>>> I wonder why a memory view should be allowed to be None in the first place. Buffer arguments aren't (because they get unpacked on entry), so why should memory views?
>>>
>>> At least when I implemented it, buffers get unpacked but the case of a None buffer is treated specially, and you're fully allowed (and segfault if you [] it).
>>
>> Hmm, ok, maybe I just got confused by the code then.
>>
>> I think the docs should state that buffer arguments are best used together with the "not None" declaration then.
>
> I use them with "=None" default values all the time... then do a None-check manually.
>
> It's really no different from cdef classes.
>
>> And I remember that we wanted to change the default settings for extension type arguments from "or None" to "not None" years ago but never actually did it.
>
> I remember that there was such a debate, but I certainly don't remember that this was the conclusion :-) I didn't agree with that view then and I don't now. I don't remember what Robert's view was...
>
> As far as I can remember (which might be biased towards my personal view), the conclusion was that we left the current semantics in place, relying on better control flow analysis to make None-checks cheaper, and when those are cheap enough, make the nonecheck directive default to True (Java is sort of prior art that this can indeed be done?).

Yes, that was exactly my point of view.

- Robert

From robertwb at gmail.com Tue May 8 01:13:33 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Mon, 7 May 2012 16:13:33 -0700
Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function
In-Reply-To: <4FA7EEBE.7060508@astro.uio.no>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7EEBE.7060508@astro.uio.no>
Message-ID:

On Mon, May 7, 2012 at 8:48 AM, Dag Sverre Seljebotn wrote:
> On 05/07/2012 03:04 PM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 07.05.2012 13:48:
>>> Here you go:
>>>
>>> def foo(np.ndarray[double] a, np.ndarray[double] out=None):
>>>     if out is None:
>>>         out = np.empty_like(a)
>>
>> Ah, right - output arguments. Hadn't thought of those.
>>
>> Still, since you pass None explicitly as a default argument, this code wouldn't be impacted by disallowing None for buffers by default. That case is already handled specially in the compiler. But a better default would prevent the *first* argument from being None.
>>
>> So, basically, it would do the right thing straight away in your case and generate safer and more efficient code for it, whereas now you have to test 'a' for being None explicitly and Cython won't understand that hint due to insufficient static analysis. At least, since my last commit you can make Cython do the same thing by declaring it "not None".
>
> Yes, thanks!
>
>>>>> It's really no different from cdef classes.
>>>>
>>>> I find it at least a bit more surprising because a buffer unpacking argument is a rather strong hint that you expect something that supports this protocol. The fact that you type your function argument with it hints at the intention to properly unpack it on entry. I'm sure there are lots of users who were or will be surprised when they realise that that doesn't exclude None values.
>>>
>>> Whereas I think there would be more users surprised by the opposite.
>>
>> We've had enough complaints from users about None being allowed for typed arguments already to consider it at least a gotcha of the language.
>>
>> The main reason we didn't change this behaviour back then was that it would clearly break user code and we thought we could do without that. That's different from considering it "right" and "good".
>>
>>>>>> And I remember that we wanted to change the default settings for extension type arguments from "or None" to "not None" years ago but never actually did it.
>>>>>
>>>>> I remember that there was such a debate, but I certainly don't remember that this was the conclusion :-)
>>>>
>>>> Maybe not, yes.
>>>>
>>>>> I didn't agree with that view then and I don't now. I don't remember what Robert's view was...
>>>>>
>>>>> As far as I can remember (which might be biased towards my personal view), the conclusion was that we left the current semantics in place, relying on better control flow analysis to make None-checks cheaper, and when those are cheap enough, make the nonecheck directive default to True
>>>>
>>>> At least for buffer arguments, it silently corrupts data or segfaults in the current state of affairs, as you pointed out. Not exactly ideal.
>>>
>>> No different than writing to a field in a cdef class...
>>
>> Hmm, aren't those None checked? At least cdef method calls are AFAIR.
>
> Not at all. That's my whole point -- currently, the rule for None in Cython is "it's your responsibility to never do a native operation on None".
>
> I don't like that either, but that's just inherited from Pyrex (and many projects would get speed regressions etc.).
>
> I'm not against changing that to "we safely None-check", if done nicely -- it's just that that should be done everywhere at once.
>
> In current master (and as far back as I can remember), this code:
>
> cdef class A:
>     cdef int field
>     cdef int method(self):
>         print self.field
>
> def f():
>     cdef A a = None
>     a.field = 3
>     a.method()
>
> Turns into:
>
>     __pyx_v_a = ((struct __pyx_obj_5test2_A *)Py_None);
>     __pyx_v_a->field = 3;
>     ((struct __pyx_vtabstruct_5test2_A *)__pyx_v_a->__pyx_vtab)->method(__pyx_v_a);
>
>> I think we should really get back to the habit of making code safe first and fast afterwards.
>
> Nobody has argued otherwise for some time (since the cdivision thread, I believe); this is all about Pyrex legacy. I guess part of the story is that there's lots of performance-sensitive code in SAGE using cdef classes which was written in Pyrex before Cython was around...

I think there's a difference between making a new feature fast instead of safe, and introducing a (significant) performance regression to add safety to existing code.

Also, the proposed change of "or None" is backwards incompatible, and the eventual solution (as far as I understand it) is to switch back to allowing None (for consistency with everywhere else None occurs) once we have cheap None checks in place. We can't get around the fact that cdef classes might be None, due to attributes (which must be initialized to something initially). Doing a None check on every buffer access in a loop falls into the significant-performance-regression category, but ideally we could pull it out of the loop.

- Robert

From greg.ewing at canterbury.ac.nz Tue May 8 02:05:16 2012
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 08 May 2012 12:05:16 +1200
Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function
In-Reply-To: <4FA7C852.9020004@behnel.de>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de>
Message-ID: <4FA8633C.2040500@canterbury.ac.nz>

Stefan Behnel wrote:
> The main reason we didn't change this behaviour back then was that it would
> clearly break user code and we thought we could do without that. That's
> different from considering it "right" and "good".

I changed the None-checking behaviour in Pyrex because I *wanted* to break user code. Or rather, I didn't think it would be a bad thing to make people revisit their code and think properly about whether they really wanted to allow None in each case.
--
Greg

From robertwb at gmail.com Tue May 8 02:19:58 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Mon, 7 May 2012 17:19:58 -0700
Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function
In-Reply-To: <4FA8633C.2040500@canterbury.ac.nz>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA8633C.2040500@canterbury.ac.nz>
Message-ID:

On Mon, May 7, 2012 at 5:05 PM, Greg Ewing wrote:
> Stefan Behnel wrote:
>> The main reason we didn't change this behaviour back then was that it would
>> clearly break user code and we thought we could do without that. That's
>> different from considering it "right" and "good".
>
> I changed the None-checking behaviour in Pyrex because I *wanted*
> to break user code. Or rather, I didn't think it would be a
> bad thing to make people revisit their code and think properly
> about whether they really wanted to allow None in each case.

That's great if you have the time, but revisiting half a million lines of code (e.g. Sage) can be quite expensive... especially as a short-term patch for a better long-term solution (mostly optimized-away None-checks on access).

My bigger issue, and why I don't think this is the right long-term solution, is that

(cp)def foo(ExnClass arg): ...

should behave the same as

(cp)def foo(arg):
    cdef ExnClass a = arg

I think part of the difference is also how strongly the line is drawn between the compiled and un-compiled portions of the program. Cython blurs the line (more) between "called from Cython" and "called from Python," and only the latter doing None-checking is inconsistent too.

- Robert

From stefan_ml at behnel.de Tue May 8 06:41:04 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 08 May 2012 06:41:04 +0200
Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function
In-Reply-To: <4FA8633C.2040500@canterbury.ac.nz>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA8633C.2040500@canterbury.ac.nz>
Message-ID: <4FA8A3E0.2010109@behnel.de>

Greg Ewing, 08.05.2012 02:05:
> Stefan Behnel wrote:
>> The main reason we didn't change this behaviour back then was that it would
>> clearly break user code and we thought we could do without that. That's
>> different from considering it "right" and "good".
>
> I changed the None-checking behaviour in Pyrex because I *wanted*
> to break user code. Or rather, I didn't think it would be a
> bad thing to make people revisit their code and think properly
> about whether they really wanted to allow None in each case.

The problem here is that it's not very likely that people specifically tested their code with None values, especially if they didn't carefully think of it already when writing it.

So changing the default to make people think may not result in making them think before their code starts throwing exceptions somewhere in production. And having to revisit a large amount of code at that point may turn out to be rather expensive.

Stefan

From stefan_ml at behnel.de Tue May 8 08:03:49 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 08 May 2012 08:03:49 +0200
Subject: [Cython] How do you trigger a jenkins build?
In-Reply-To:
References:
Message-ID: <4FA8B745.6080700@behnel.de>

Vitja Makarov, 07.05.2012 17:08:
> I've noticed that old one URL hook doesn't work for me now.
>
> I tried to check "Build when a change is pushed to GitHub"

That should work.

> and set "Jenkins Hook URL" to
> https://sage.math.washington.edu:8091/hudson/github-webhook/

That isn't configured in Jenkins but in your own GitHub repo as a "post receive URL" (admin->service hooks).

Stefan

From stefan_ml at behnel.de Tue May 8 08:12:18 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 08 May 2012 08:12:18 +0200
Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function
In-Reply-To:
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7F33A.9020903@astro.uio.no> <4FA7F477.30701@behnel.de> <4FA80109.9020201@behnel.de>
Message-ID: <4FA8B942.5040306@behnel.de>

mark florisson, 07.05.2012 19:13:
> On 7 May 2012 18:06, Stefan Behnel wrote:
>> mark florisson, 07.05.2012 18:18:
>>> On 7 May 2012 17:16, mark florisson wrote:
>>>> On 7 May 2012 17:12, Stefan Behnel wrote:
>>>>> Dag Sverre Seljebotn, 07.05.2012 18:07:
>>>>>> On 05/07/2012 06:04 PM, mark florisson wrote:
>>>>>>> On 7 May 2012 12:10, Stefan Behnel wrote:
>>>>>>>> BTW, is there a reason why we shouldn't allow a "not None" declaration for cdef functions? Obviously, the caller would have to do the check in that case.
>>>>>>>
>>>>>>> Why can't the callee just check it? If it's None, just raise an exception like usual?
>>>>>>
>>>>>> It's just that there's a lot more potential for rather easy optimization if the caller does it.
>>>>>
>>>>> Exactly. The NoneCheckNode is easy to get rid of at any stage in the pipeline, whereas a hard-coded None check has a fixed cost at runtime.
>>>>
>>>> I see, yes. I expect a pointer comparison to be reasonably insignificant compared to function call overhead, but it would also reduce the code in the instruction cache. If you take the address of the function though, or if you declare it public in a pxd, you probably don't want to do that, as you still want to be safe when called from C. Could do the same trick as in the 'less annotations' CEP though, that would be nice.
>>>
>>> ... or you could document that 'not None' means the caller cannot pass it in, but that would be weird as you could do it from Cython and get an exception, but not from C :) That would be better specified in the documentation of the function as its contract or whatever.
>>
>> "not None" on a cdef function means what all declarations on cdef functions mean: the caller is responsible for doing the appropriate type conversions and checks.
>>
>> If a function accepts an int32 and the caller puts a float32 on the stack, it's not the fault of the callee. The same applies to extension type arguments and None checks.
>
> Well, 'with gil' makes the callee do something.

There are two sides to this one. A "with gil" function can be called from nogil code, so, in a way, the "with gil" declaration is only shorthand for a "nogil" declaration with a "with gil" block inside the function. It's also a historical artefact of the (long) time before we actually had "with gil" blocks, but it's convenient in that it saves a level of indentation that would otherwise uselessly cover the whole function.

> I would personally expect not None to be enforced at least conceptually in the function itself.

As usual: in Python functions, yes, in C functions, no. Vitek's Python wrapper split was a good step toward a better design here, one that reflects this separation of concerns.
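A short sketch of that split, with invented names -- the cdef callee trusts its caller, while the "not None" declaration on the def wrapper enforces the check at the Python boundary:

    cdef class A:
        cdef public int field

    cdef int use(A obj):           # C-level callee: assumes obj is not None
        return obj.field

    def entry(A obj not None):     # Python-level entry: the check lives here
        return use(obj)

Calling entry(None) from Python raises a TypeError before use() ever runs; calling use() directly from C code with a None argument would be the caller's bug, just like passing the wrong numeric type.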
> In any case, I also think it's really not an important issue, > as it's likely pretty uncommon to call it from C. If it does break, it > will be easy enough to figure out (unless you accidentally corrupt > your memory :) So either solution would be fine with me. Good. Stefan From stefan_ml at behnel.de Tue May 8 08:22:55 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 08 May 2012 08:22:55 +0200 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: <4FA80212.1030808@astro.uio.no> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <4FA7FFBF.4010905@behnel.de> <4FA80212.1030808@astro.uio.no> Message-ID: <4FA8BBBF.5080808@behnel.de> Dag Sverre Seljebotn, 07.05.2012 19:10: > On 05/07/2012 07:00 PM, Stefan Behnel wrote: >> mark florisson, 07.05.2012 18:28: >>> On 7 May 2012 17:00, Dag Sverre Seljebotn wrote: >>>> On 05/07/2012 04:16 PM, Stefan Behnel wrote: >>>>> Stefan Behnel, 07.05.2012 15:04: >>>>>> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>>>>> BTW, with the coming of memoryviews, me and Mark talked about just >>>>>>> deprecating the "mytype[...]" meaning buffers, and rather treat it as >>>>>>> np.ndarray, array.array etc. being some sort of "template types". That >>>>>>> is, >>>>>>> we disallow "object[int]" and require some special declarations in the >>>>>>> relevant pxd files. >>>>>> >>>>>> Hmm, yes, it's unfortunate that we have two different types of syntax >>>>>> now, >>>>>> one that declares the item type before the brackets and one that >>>>>> declares it afterwards. >>>>> >>>>> I actually think this merits some more discussion. Should we consider the >>>>> buffer interface syntax deprecated and focus on the memory view syntax? >>>> >>>> I think that's the very-long-term intention. Then again, it may be too >>>> early >>>> to really tell yet, we just need to see how the memory views play out in >>>> real life and whether they'll be able to replace np.ndarray[double] among >>>> real users. We don't want to shove things down users throats. >>>> >>>> But the use of the trailing-[] syntax needs some cleaning up. Me and Mark >>>> agreed we'd put this proposal forward when we got around to it: >>>> >>>> - Deprecate the "object[double]" form, where [dtype] can be stuck on any >>>> extension type >>>> >>>> - But, do NOT (for the next year at least) deprecate np.ndarray[double], >>>> array.array[double], etc. Basically, there should be a magic flag in >>>> extension type declarations saying "I can be a buffer". >>>> >>>> For one thing, that is sort of needed to open up things for templated cdef >>>> classes/fused types cdef classes, if that is ever implemented. >>> >>> Deprecating is definitely a good start. >> >> Then the first step on that road is to rework the documentation so that it >> pushes users into going for memory views instead of the plain buffer syntax. > > -1, premature. Ok, fine. Then we should at least put them next to each other in the NumPy docs and explain a) what the differences are and b) which one users should choose for use cases X, Y and Z. The docs should also make it clear that using "np.ndarray" is only useful for making code work with CPython < 2.6 (and maybe some other cases where NumPy's C-API is leveraged internally), but that this declaration has the drawback of making code less versatile, e.g. 
because it will *not* work with memoryviews and other kinds of buffers but only with plain NumPy arrays. Currently, it basically tells people that statically typed NumPy arrays are the only way to get things working.

If it's known to be likely that something will become less important or even deprecated at some point in the future, it's best to make users aware by adapting the documentation ASAP, so that less impacted code gets written in the meantime.

Stefan

From russ at perspexis.com Tue May 8 08:25:11 2012
From: russ at perspexis.com (Russell Warren)
Date: Tue, 8 May 2012 02:25:11 -0400
Subject: [Cython] Bug report: enumerate does not accept the "start" argument
Message-ID:

Python's built-in function 'enumerate' has a lesser-known second argument that allows the start value of the enumeration to be set. See the python docs here:
http://docs.python.org/library/functions.html#enumerate

Cython 0.16 doesn't like it, and only allows one argument. Here is a simple file to reproduce the failure:

for i in enumerate("abc", 1):
    print i

And the resulting complaint:

Error compiling Cython file:
------------------------------------------------------------
...
for i in enumerate("abc", 1):
                  ^
------------------------------------------------------------
deploy/_working/_cython_test.pyx:1:18: enumerate() takes at most 1 argument

I have requested a trac login to file bugs like this, but the request is pending (just sent).

From robertwb at gmail.com Tue May 8 08:36:02 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Mon, 7 May 2012 23:36:02 -0700
Subject: [Cython] Fwd: Re: [cython-users] checking for "None" in nogil function
In-Reply-To: <4FA8A3E0.2010109@behnel.de>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA8633C.2040500@canterbury.ac.nz> <4FA8A3E0.2010109@behnel.de>
Message-ID:

On Mon, May 7, 2012 at 9:41 PM, Stefan Behnel wrote:
> Greg Ewing, 08.05.2012 02:05:
>> Stefan Behnel wrote:
>>> The main reason we didn't change this behaviour back then was that it would
>>> clearly break user code and we thought we could do without that. That's
>>> different from considering it "right" and "good".
>>
>> I changed the None-checking behaviour in Pyrex because I *wanted*
>> to break user code. Or rather, I didn't think it would be a
>> bad thing to make people revisit their code and think properly
>> about whether they really wanted to allow None in each case.
>
> The problem here is that it's not very likely that people specifically
> tested their code with None values, especially if they didn't carefully
> think of it already when writing it.
>
> So changing the default to make people think may not result in making them
> think before their code starts throwing exceptions somewhere in production.
> And having to revisit a large amount of code at that point may turn out to
> be rather expensive.

There's also the problem of people (including me) who wrote a lot of code that *does* correctly handle the None case which, with this change, would suddenly start (erroneously) throwing exceptions without all being revisited.

- Robert

From vitja.makarov at gmail.com Tue May 8 08:43:33 2012
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Tue, 8 May 2012 10:43:33 +0400
Subject: [Cython] How do you trigger a jenkins build?
In-Reply-To: <4FA8B745.6080700@behnel.de> References: <4FA8B745.6080700@behnel.de> Message-ID: 2012/5/8 Stefan Behnel : > Vitja Makarov, 07.05.2012 17:08: >> I've noticed that old one URL hook doesn't work for me now. >> >> I tried to check "Build when a change is pushed to GitHub" > > That should work. > > >> and set "Jenkins Hook URL" ?to >> https://sage.math.washington.edu:8091/hudson/github-webhook/ > > That isn't configured in Jenkins but in your own GitHub repo as a "post > receive URL" (admin->service hooks). > Thanks! -- vitja. From stefan_ml at behnel.de Tue May 8 09:37:00 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 08 May 2012 09:37:00 +0200 Subject: [Cython] Bug report: enumerate does not accept the "start" argument In-Reply-To: References: Message-ID: <4FA8CD1C.6030803@behnel.de> Russell Warren, 08.05.2012 08:25: > Python's built-in function 'enumerate' has a lesser-known 2nd argument that > allows the start value of the enumeration to be set. See the python docs > here: > http://docs.python.org/library/functions.html#enumerate > > Cython 0.16 doesn't like it, and only allows one argument. > > Here is a simple file to reproduce the failure: > > for i in enumerate("abc", 1): >> print i > > > And the resulting output complaint: > > Error compiling Cython file: >> ------------------------------------------------------------ >> ... >> for i in enumerate("abc", 1): >> ^ >> ------------------------------------------------------------ >> deploy/_working/_cython_test.pyx:1:18: enumerate() takes at most 1 argument Thanks for the report, here is a fix: https://github.com/cython/cython/commit/2e3a306d0b624993d41a02f790725d8b2100e57d > I have requested a trac login to file bugs like this, but the request is > pending (just sent). Please file it anyway (when you get your account) so that we can document in the tracker that it's fixed. Stefan From d.s.seljebotn at astro.uio.no Tue May 8 09:57:44 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 08 May 2012 09:57:44 +0200 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> Message-ID: <4FA8D1F8.5020109@astro.uio.no> On 05/07/2012 11:21 PM, mark florisson wrote: > On 7 May 2012 19:40, Dag Sverre Seljebotn wrote: >> >> >> mark florisson wrote: >> >>> On 7 May 2012 17:00, Dag Sverre Seljebotn >>> wrote: >>>> On 05/07/2012 04:16 PM, Stefan Behnel wrote: >>>>> >>>>> Stefan Behnel, 07.05.2012 15:04: >>>>>> >>>>>> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>>>>> >>>>>>> BTW, with the coming of memoryviews, me and Mark talked about just >>>>>>> deprecating the "mytype[...]" meaning buffers, and rather treat it >>> as >>>>>>> np.ndarray, array.array etc. being some sort of "template types". >>> That >>>>>>> is, >>>>>>> we disallow "object[int]" and require some special declarations in >>> the >>>>>>> relevant pxd files. >>>>>> >>>>>> >>>>>> Hmm, yes, it's unfortunate that we have two different types of >>> syntax >>>>>> now, >>>>>> one that declares the item type before the brackets and one that >>> declares >>>>>> it afterwards. >>>>> >>>>> >>>>> I actually think this merits some more discussion. Should we >>> consider the >>>>> buffer interface syntax deprecated and focus on the memory view >>> syntax? 
>>>>
>>>> I think that's the very-long-term intention. Then again, it may be too early to really tell yet; we just need to see how the memory views play out in real life and whether they'll be able to replace np.ndarray[double] among real users. We don't want to shove things down users' throats.
>>>>
>>>> But the use of the trailing-[] syntax needs some cleaning up. Mark and I agreed we'd put this proposal forward when we got around to it:
>>>>
>>>> - Deprecate the "object[double]" form, where [dtype] can be stuck on any extension type
>>>>
>>>> - But, do NOT (for the next year at least) deprecate np.ndarray[double], array.array[double], etc. Basically, there should be a magic flag in extension type declarations saying "I can be a buffer".
>>>>
>>>> For one thing, that is sort of needed to open up things for templated cdef classes/fused types cdef classes, if that is ever implemented.
>>>
>>> Deprecating is definitely a good start. I think at least if you only allow two types as buffers it will be reasonably clear when one is dealing with fused types or buffers.
>>>
>>> Basically, I think memoryviews should live up to the demands of the users, which would mean there would be no reason to keep the buffer syntax.
>>
>> But they are different approaches -- use a different type/API, or just try to speed up parts of NumPy.
>>
>>> One thing to do is make memoryviews coerce cheaply back to the original objects if wanted (which is likely). Writing np.asarray(mymemview) is kind of annoying.
>>
>> It is going to be very confusing to have type(mymemview), repr(mymemview), and so on come out as NumPy arrays, but not have the full API of NumPy. Unless you auto-convert on getattr to...
>
> Yeah, the idea is very simple, as you mention: just keep the object around cached, and when you slice, construct one lazily.
>
>> If you want to eradicate the distinction between the backing array and the memory view and make it transparent, I really suggest you bring np.ndarray back to life (it can exist in some 'unrealized' state with delayed construction after slicing, and so on). The implementation is much the same either way; it is all about how it is presented to the user.
>
> You mean the buffer syntax?
>
>> Something like mymemview.asobject() could work though, and while not much shorter, it would have some polymorphism that np.asarray does not have (based probably on some custom PEP 3118 extension)
>
> I was thinking you could allow the user to register a callback, and use that to coerce from a memoryview back to an object (given a memoryview object). For numpy this would be np.asarray, and the implementation is allowed to cache the result (which it will). It may be too magicky though... but it will be convenient. The memoryview will act as a subclass, meaning that any of its methods will override methods of the converted object.

My point was that this seems *way* too magicky.

Beyond "confusing users" and so on, which is sort of subjective, here's a fundamental problem for you: we're making it very difficult to type-infer memoryviews. Consider:

cdef double[:] x = ...
y = x
print y.shape

Now, because y is not typed, you're semantically throwing in a conversion on line 2, so that line 3 says that you want the attribute access to be invoked on "whatever object x coerced back to". And we have no idea what kind of object that is.
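Spelled out with a runnable variant of that snippet (np.zeros stands in for the elided initializer, purely for illustration):

    import numpy as np

    def demo():
        cdef double[:] x = np.zeros(3)
        y = x           # transparent coercion: y is "some unknown object";
                        # no coercion: y could be inferred as double[:]
        print y.shape   # Python getattr on an unknown type, or a
                        # compile-time memoryview attribute

Either reading happens to print (3,) here; the difference is whether the compiler can know y's type.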
If you don't transparently convert to object, it'd be safe to automatically infer y as a double[:].

On a related note, I've said before that I dislike the notion of

cdef double[:] mview = obj

I'd rather like

cdef double[:] mview = double[:](obj)

I support Robert in that "np.ndarray[double]" is the syntax to use when you want this kind of transparent "be an object when I need to and a memory view when I need to".

Proposal:

1) We NEVER deprecate "np.ndarray[double]"; we commit to keeping that in the language. It means exactly what you would like double[:] to mean, i.e. a variable that is a memoryview when you need it to be and an object otherwise. When you use this type, you bear the consequences of early-binding things that could in theory be overridden.

2) double[:] is for when you want to access data of *any* Python object in a generic way. Raw PEP 3118. In those situations, access to the underlying object is much less useful.

2a) Therefore we require that you do "mview.asobject()" manually; doing "mview.foo()" is a compile-time error.

2b) To drive the point home among users, and to aid type inference and overall language clarity, we REMOVE the auto-acquisition and require that you do

cdef double[:] mview = double[:](obj)

2c) Perhaps: do not even coerce to a Python memoryview and disallow "print mview"; instead require that you do "print mview.asmemoryview()" or "print memoryview(mview)" or somesuch.

(A related proposal that's been up earlier has been that a variable can be annotated with many interfaces; e.g.

cdef A|B|C obj

...and then when you do "obj.method", it is first looked up in C, then B, then A, then Python getattr. Not sure if we want to reopen that can of worms...)

Dag

From stefan_ml at behnel.de Tue May 8 10:18:49 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 08 May 2012 10:18:49 +0200
Subject: [Cython] buffer syntax vs. memory view syntax
In-Reply-To: <4FA8D1F8.5020109@astro.uio.no>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no>
Message-ID: <4FA8D6E9.9090004@behnel.de>

Dag Sverre Seljebotn, 08.05.2012 09:57:
> On 05/07/2012 11:21 PM, mark florisson wrote:
>> On 7 May 2012 19:40, Dag Sverre Seljebotn wrote:
>>> mark florisson wrote:
>>>> On 7 May 2012 17:00, Dag Sverre Seljebotn wrote:
>>>>> On 05/07/2012 04:16 PM, Stefan Behnel wrote:
>>>>>> Stefan Behnel, 07.05.2012 15:04:
>>>>>>> Dag Sverre Seljebotn, 07.05.2012 13:48:
>>>>>>>> BTW, with the coming of memoryviews, Mark and I talked about just deprecating the "mytype[...]" meaning buffers, and rather treating np.ndarray, array.array etc. as some sort of "template types". That is, we disallow "object[int]" and require some special declarations in the relevant pxd files.
>>>>>>>
>>>>>>> Hmm, yes, it's unfortunate that we have two different types of syntax now, one that declares the item type before the brackets and one that declares it afterwards.
>>>>>> Should we consider the buffer interface syntax deprecated and focus on the memory view syntax?
>>>>>
>>>>>> I think that's the very-long-term intention.
Then again, it may be >>>>> too early >>>>> to really tell yet, we just need to see how the memory views play out >>>>> in >>>>> real life and whether they'll be able to replace np.ndarray[double] >>>>> among real users. We don't want to shove things down users throats. >>>>> >>>>> But the use of the trailing-[] syntax needs some cleaning up. Me and >>>>> Mark agreed we'd put this proposal forward when we got around to it: >>>>> >>>>> - Deprecate the "object[double]" form, where [dtype] can be stuck on >>>>> any extension type >>>>> >>>>> - But, do NOT (for the next year at least) deprecate >>>>> np.ndarray[double], >>>>> array.array[double], etc. Basically, there should be a magic flag in >>>>> extension type declarations saying "I can be a buffer". >>>>> >>>>> For one thing, that is sort of needed to open up things for templated >>>>> cdef classes/fused types cdef classes, if that is ever implemented. >>>> >>>> Deprecating is definitely a good start. I think at least if you only >>>> allow two types as buffers it will be at least reasonably clear when >>>> one is dealing with fused types or buffers. >>>> >>>> Basically, I think memoryviews should live up to demands of the users, >>>> which would mean there would be no reason to keep the buffer syntax. >>> >>> But they are different approaches -- use a different type/API, or just >>> try to speed up parts of NumPy.. >>> >>>> One thing to do is make memoryviews coerce cheaply back to the >>>> original objects if wanted (which is likely). Writting >>>> np.asarray(mymemview) is kind of annoying. >>> >>> It is going to be very confusing to have type(mymemview), >>> repr(mymemview), and so on come out as NumPy arrays, but not have the >>> full API of NumPy. Unless you auto-convert on getattr to... >> >> Yeah, the idea is as very simple, as you mention, just keep the object >> around cached, and when you slice construct one lazily. >> >>> If you want to eradicate the distinction between the backing array and >>> the memory view and make it transparent, I really suggest you kick back >>> alive np.ndarray (it can exist in some 'unrealized' state with delayed >>> construction after slicing, and so on). Implementation much the same >>> either way, it is all about how it is presented to the user. >> >> You mean the buffer syntax? >> >>> Something like mymemview.asobject() could work though, and while not >>> much shorter, it would have some polymorphism that np.asarray does not >>> have (based probably on some custom PEP 3118 extension) >> >> I was thinking you could allow the user to register a callback, and >> use that to coerce from a memoryview back to an object (given a >> memoryview object). For numpy this would be np.asarray, and the >> implementation is allowed to cache the result (which it will). >> It may be too magicky though... but it will be convenient. The >> memoryview will act as a subclass, meaning that any of its methods >> will override methods of the converted object. > > My point was that this seems *way* to magicky. > > Beyond "confusing users" and so on that are sort of subjective, here's a > fundamental problem for you: We're making it very difficult to type-infer > memoryviews. Consider: > > cdef double[:] x = ... > y = x > print y.shape > > Now, because y is not typed, you're semantically throwing in a conversion > on line 2, so that line 3 says that you want the attribute access to be > invoked on "whatever object x coerced back to". And we have no idea what > kind of object that is. 
> > If you don't transparently convert to object, it'd be safe to automatically > infer y as a double[:]. Why can't y be inferred as the type of x due to the assignment? > On a related note, I've said before that I dislike the notion of > > cdef double[:] mview = obj > > I'd rather like > > cdef double[:] mview = double[:](obj) Why? We currently allow cdef char* s = some_py_bytes_string Auto-coercion is a serious part of the language, and I don't see the advantage of requiring the redundancy in the case above. It's clear enough to me what the typed assignment is intended to mean: get me a buffer view on the object, regardless of what it is. > I support Robert in that "np.ndarray[double]" is the syntax to use when you > want this kind of transparent "be an object when I need to and a memory > view when I need to". > > Proposal: > > 1) We NEVER deprecate "np.ndarray[double]", we commit to keeping that in > the language. It means exactly what you would like double[:] to mean, i.e. > a variable that is memoryview when you need to and an object otherwise. > When you use this type, you bear the consequences of early-binding things > that could in theory be overridden. > > 2) double[:] is for when you want to access data of *any* Python object in > a generic way. Raw PEP 3118. In those situations, access to the underlying > object is much less useful. > > 2a) Therefore we require that you do "mview.asobject()" manually; doing > "mview.foo()" is a compile-time error Sounds good. I think that would clean up the current syntax overlap very nicely. > 2b) To drive the point home among users, and aid type inference and > overall language clarity, we REMOVE the auto-acquisition and require that > you do > > cdef double[:] mview = double[:](obj) I don't see the point, as noted above. Either "obj" is statically typed and the bare assignment becomes a no-op, or it's not typed and the assignment coerces by creating a view. As with all other typed assignments. > 2c) Perhaps: Do not even coerce to a Python memoryview and disallow > "print mview"; instead require that you do "print mview.asmemoryview()" or > "print memoryview(mview)" or somesuch. This seems to depend on 2b. > (A related proposal that's been up earlier has been that a variable can be > annotated with many interfaces; e.g. > > cdef A|B|C obj > > ...and then when you do "obj.method", it is first looked up in C, then B, > then A, then Python getattr. Not sure if we want to reopen that can of > worms...) Different topic - new thread? Stefan From d.s.seljebotn at astro.uio.no Tue May 8 10:31:32 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 08 May 2012 10:31:32 +0200 Subject: [Cython] buffer syntax vs. 
memory view syntax In-Reply-To: <4FA8D6E9.9090004@behnel.de> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> Message-ID: <4FA8D9E4.1040102@astro.uio.no> On 05/08/2012 10:18 AM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 08.05.2012 09:57: >> On 05/07/2012 11:21 PM, mark florisson wrote: >>> On 7 May 2012 19:40, Dag Sverre Seljebotn wrote: >>>> mark florisson wrote: >>>>> On 7 May 2012 17:00, Dag Sverre Seljebotn wrote: >>>>>> On 05/07/2012 04:16 PM, Stefan Behnel wrote: >>>>>>> Stefan Behnel, 07.05.2012 15:04: >>>>>>>> Dag Sverre Seljebotn, 07.05.2012 13:48: >>>>>>>>> BTW, with the coming of memoryviews, me and Mark talked about just >>>>>>>>> deprecating the "mytype[...]" meaning buffers, and rather treat it >>>>>>>>> as np.ndarray, array.array etc. being some sort of "template types". >>>>>>>>> That is, >>>>>>>>> we disallow "object[int]" and require some special declarations in >>>>>>>>> the relevant pxd files. >>>>>>>> >>>>>>>> Hmm, yes, it's unfortunate that we have two different types of >>>>>>>> syntax now, >>>>>>>> one that declares the item type before the brackets and one that >>>>>>>> declares it afterwards. >>>>>>> Should we consider the >>>>>>> buffer interface syntax deprecated and focus on the memory view >>>>>>> syntax? >>>>>> >>>>>> I think that's the very-long-term intention. Then again, it may be >>>>>> too early >>>>>> to really tell yet, we just need to see how the memory views play out >>>>>> in >>>>>> real life and whether they'll be able to replace np.ndarray[double] >>>>>> among real users. We don't want to shove things down users throats. >>>>>> >>>>>> But the use of the trailing-[] syntax needs some cleaning up. Me and >>>>>> Mark agreed we'd put this proposal forward when we got around to it: >>>>>> >>>>>> - Deprecate the "object[double]" form, where [dtype] can be stuck on >>>>>> any extension type >>>>>> >>>>>> - But, do NOT (for the next year at least) deprecate >>>>>> np.ndarray[double], >>>>>> array.array[double], etc. Basically, there should be a magic flag in >>>>>> extension type declarations saying "I can be a buffer". >>>>>> >>>>>> For one thing, that is sort of needed to open up things for templated >>>>>> cdef classes/fused types cdef classes, if that is ever implemented. >>>>> >>>>> Deprecating is definitely a good start. I think at least if you only >>>>> allow two types as buffers it will be at least reasonably clear when >>>>> one is dealing with fused types or buffers. >>>>> >>>>> Basically, I think memoryviews should live up to demands of the users, >>>>> which would mean there would be no reason to keep the buffer syntax. >>>> >>>> But they are different approaches -- use a different type/API, or just >>>> try to speed up parts of NumPy.. >>>> >>>>> One thing to do is make memoryviews coerce cheaply back to the >>>>> original objects if wanted (which is likely). Writting >>>>> np.asarray(mymemview) is kind of annoying. >>>> >>>> It is going to be very confusing to have type(mymemview), >>>> repr(mymemview), and so on come out as NumPy arrays, but not have the >>>> full API of NumPy. Unless you auto-convert on getattr to... 
>>> >>> Yeah, the idea is as very simple, as you mention, just keep the object >>> around cached, and when you slice construct one lazily. >>> >>>> If you want to eradicate the distinction between the backing array and >>>> the memory view and make it transparent, I really suggest you kick back >>>> alive np.ndarray (it can exist in some 'unrealized' state with delayed >>>> construction after slicing, and so on). Implementation much the same >>>> either way, it is all about how it is presented to the user. >>> >>> You mean the buffer syntax? >>> >>>> Something like mymemview.asobject() could work though, and while not >>>> much shorter, it would have some polymorphism that np.asarray does not >>>> have (based probably on some custom PEP 3118 extension) >>> >>> I was thinking you could allow the user to register a callback, and >>> use that to coerce from a memoryview back to an object (given a >>> memoryview object). For numpy this would be np.asarray, and the >>> implementation is allowed to cache the result (which it will). >>> It may be too magicky though... but it will be convenient. The >>> memoryview will act as a subclass, meaning that any of its methods >>> will override methods of the converted object. >> >> My point was that this seems *way* to magicky. >> >> Beyond "confusing users" and so on that are sort of subjective, here's a >> fundamental problem for you: We're making it very difficult to type-infer >> memoryviews. Consider: >> >> cdef double[:] x = ... >> y = x >> print y.shape >> >> Now, because y is not typed, you're semantically throwing in a conversion >> on line 2, so that line 3 says that you want the attribute access to be >> invoked on "whatever object x coerced back to". And we have no idea what >> kind of object that is. >> >> If you don't transparently convert to object, it'd be safe to automatically >> infer y as a double[:]. > > Why can't y be inferred as the type of x due to the assignment? > > >> On a related note, I've said before that I dislike the notion of >> >> cdef double[:] mview = obj >> >> I'd rather like >> >> cdef double[:] mview = double[:](obj) > > Why? We currently allow > > cdef char* s = some_py_bytes_string > > Auto-coercion is a serious part of the language, and I don't see the > advantage of requiring the redundancy in the case above. It's clear enough > to me what the typed assignment is intended to mean: get me a buffer view > on the object, regardless of what it is. Good point. I admit defeat. There's slight difference in that there's more of a 1:1 between a bytes and a char*, whereas there's a many:1 for buffers. But it doesn't seem to matter, since "char*" doesn't coerce back to object automatically. (Though that fact is an argument against letting memoryviews coerce to objects automatically) (Also I happen to not like this part of the language -- I think it's making us be further from Python than we would need to -- but that's not relevant in this thread at all, but rather in some pure Python mode thread.) > > >> I support Robert in that "np.ndarray[double]" is the syntax to use when you >> want this kind of transparent "be an object when I need to and a memory >> view when I need to". >> >> Proposal: >> >> 1) We NEVER deprecate "np.ndarray[double]", we commit to keeping that in >> the language. It means exactly what you would like double[:] to mean, i.e. >> a variable that is memoryview when you need to and an object otherwise. 
>> On a related note, I've said before that I dislike the notion of
>>
>> cdef double[:] mview = obj
>>
>> I'd rather like
>>
>> cdef double[:] mview = double[:](obj)
>
> Why? We currently allow
>
>     cdef char* s = some_py_bytes_string
>
> Auto-coercion is a serious part of the language, and I don't see the
> advantage of requiring the redundancy in the case above. It's clear
> enough to me what the typed assignment is intended to mean: get me a
> buffer view on the object, regardless of what it is.

Good point. I admit defeat. There's a slight difference in that there's
more of a 1:1 between a bytes and a char*, whereas there's a many:1 for
buffers. But it doesn't seem to matter, since "char*" doesn't coerce
back to object automatically. (Though that fact is an argument against
letting memoryviews coerce to objects automatically.)

(Also, I happen to not like this part of the language -- I think it's
moving us further from Python than we would need to be -- but that's
not relevant in this thread at all, rather in some pure Python mode
thread.)

>> I support Robert in that "np.ndarray[double]" is the syntax to use
>> when you want this kind of transparent "be an object when I need to
>> and a memory view when I need to".
>>
>> Proposal:
>>
>>  1) We NEVER deprecate "np.ndarray[double]"; we commit to keeping
>> that in the language. It means exactly what you would like double[:]
>> to mean, i.e. a variable that is a memoryview when you need it to be
>> and an object otherwise. When you use this type, you bear the
>> consequences of early-binding things that could in theory be
>> overridden.
>>
>>  2) double[:] is for when you want to access data of *any* Python
>> object in a generic way. Raw PEP 3118. In those situations, access to
>> the underlying object is much less useful.
>>
>>   2a) Therefore we require that you do "mview.asobject()" manually;
>> doing "mview.foo()" is a compile-time error.
>
> Sounds good. I think that would clean up the current syntax overlap
> very nicely.
>
>>   2b) To drive the point home among users, and to aid type inference
>> and overall language clarity, we REMOVE the auto-acquisition and
>> require that you do
>>
>>     cdef double[:] mview = double[:](obj)
>
> I don't see the point, as noted above. Either "obj" is statically
> typed and the bare assignment becomes a no-op, or it's not typed and
> the assignment coerces by creating a view. As with all other typed
> assignments.
>
>>   2c) Perhaps: Do not even coerce to a Python memoryview and disallow
>> "print mview"; instead require that you do
>> "print mview.asmemoryview()" or "print memoryview(mview)" or
>> somesuch.
>
> This seems to depend on 2b.
>
>> (A related proposal that's been up earlier has been that a variable
>> can be annotated with many interfaces; e.g.
>>
>> cdef A|B|C obj
>>
>> ...and then when you do "obj.method", it is first looked up in C,
>> then B, then A, then Python getattr. Not sure if we want to reopen
>> that can of worms...)
>
> Different topic - new thread?

It's very related, since np.ndarray[double] would essentially be
"np.ndarray | double[:]".

Dag

From d.s.seljebotn at astro.uio.no Tue May 8 10:36:18 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Tue, 08 May 2012 10:36:18 +0200
Subject: [Cython] buffer syntax vs. memory view syntax
In-Reply-To: <4FA8D6E9.9090004@behnel.de>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de>
Message-ID: <4FA8DB02.2020902@astro.uio.no>

On 05/08/2012 10:18 AM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 08.05.2012 09:57:
> [...]
>>   2c) Perhaps: Do not even coerce to a Python memoryview and disallow
>> "print mview"; instead require that you do
>> "print mview.asmemoryview()" or "print memoryview(mview)" or
>> somesuch.
>
> This seems to depend on 2b.

This I don't understand. The question of 2c) is the analogue to
auto-coercion of "char*" to bytes; approving 2c) would put memoryviews
in line with char*.

Then again, we could in future auto-coerce char* to a ctypes pointer,
and in that case, coercing a memoryview to an object representing that
memoryview would be OK.

Either way, you would never get back the same object that you coerced
from!

Dag
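The char* analogue in code form -- a round-trip sketch (assumes the
bytes value has no embedded NUL bytes):

    def roundtrip(bytes b):
        cdef char* s = b    # auto-coercion: borrows a pointer into b
        cdef bytes b2 = s   # auto-coercion back: copies into NEW bytes
        return b2 is b      # False -- you never get the original object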
From stefan_ml at behnel.de Tue May 8 10:49:56 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 08 May 2012 10:49:56 +0200
Subject: [Cython] buffer syntax vs. memory view syntax
In-Reply-To: <4FA8DB02.2020902@astro.uio.no>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no>
Message-ID: <4FA8DE34.6050806@behnel.de>

Dag Sverre Seljebotn, 08.05.2012 10:36:
> On 05/08/2012 10:18 AM, Stefan Behnel wrote:
> [...]
> This I don't understand. The question of 2c) is the analogue to
> auto-coercion of "char*" to bytes; approving 2c) would put memoryviews
> in line with char*.
>
> Then again, we could in future auto-coerce char* to a ctypes pointer,
> and in that case, coercing a memoryview to an object representing that
> memoryview would be OK.
>
> Either way, you would never get back the same object that you coerced
> from!

Ah, that's what you meant. I thought you were referring to getting a
memoryview from an object.

I agree that a buffer view shouldn't auto-coerce back to its owner (or
to a Python object in general); that's the whole point of the syntax
cleanup.

In simple cases, buffer.obj would be the thing to talk to, except for
memory views, where only the view knows the mapped memory layout but
the underlying exporter has the methods to deal with the buffer. In
that case, we may really want to leave it to the user to handle this.
I don't think the compiler can do the right thing in all cases, and the
user is really the only one who knows what kind of object should be
used or even instantiated to wrap a buffer. Nothing we can do is
shorter or more clearly readable than np.asarray() or whatever function
a specific library has for this.

So, what about just keeping buffer.obj visible and leaving everything
else to users?

Stefan
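The "leave it to users" pattern, concretely -- a minimal sketch of the
explicit re-wrap being described (assumes NumPy):

    import numpy as np

    def process(double[:] mv):
        arr = np.asarray(mv)  # user explicitly picks the wrapper type
        return arr.sum()      # full NumPy API available on the wrapper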
From markflorisson88 at gmail.com Tue May 8 11:21:04 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 8 May 2012 10:21:04 +0100
Subject: [Cython] buffer syntax vs. memory view syntax
In-Reply-To: <4FA8DE34.6050806@behnel.de>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no> <4FA8DE34.6050806@behnel.de>
Message-ID:

On 8 May 2012 09:49, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 08.05.2012 10:36:
> [...]
> So, what about just keeping buffer.obj visible and leaving everything
> else to users?

buffer.base gets you the original object.
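That attribute in use -- a sketch (assumes NumPy; Cython's typed
memoryviews expose the exporting object as .base):

    import numpy as np

    def get_back(double[:] mv):
        return mv.base  # the object the buffer was acquired from

    # a = np.zeros(4); get_back(a) is a  ->  True: the original
    # exporter, not a copy or a wrapper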
From markflorisson88 at gmail.com Tue May 8 11:22:24 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 8 May 2012 10:22:24 +0100
Subject: [Cython] buffer syntax vs. memory view syntax
In-Reply-To: <4FA8DB02.2020902@astro.uio.no>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no>
Message-ID:

On 8 May 2012 09:36, Dag Sverre Seljebotn wrote:
> On 05/08/2012 10:18 AM, Stefan Behnel wrote:
> [...]
> This I don't understand. The question of 2c) is the analogue to
> auto-coercion of "char*" to bytes; approving 2c) would put memoryviews
> in line with char*.
>
> Then again, we could in future auto-coerce char* to a ctypes pointer,
> and in that case, coercing a memoryview to an object representing that
> memoryview would be OK.

Character pointers coerce to strings. Hell, even structs coerce to and
from python dicts, so disallowing the same for memoryviews would just
be inconsistent and inconvenient.

> Either way, you would never get back the same object that you coerced
> from!
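The struct coercion referenced above, for concreteness -- a minimal
sketch (Point and roundtrip are illustrative names):

    cdef struct Point:
        double x
        double y

    def roundtrip(d):
        cdef Point p = d  # dict -> struct (keys must match the fields)
        p.x += 1
        return p          # struct -> dict on returning to Python

    # roundtrip({'x': 1.0, 'y': 2.0})  ->  {'x': 2.0, 'y': 2.0}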
From markflorisson88 at gmail.com Tue May 8 11:24:56 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 8 May 2012 10:24:56 +0100
Subject: [Cython] buffer syntax vs. memory view syntax
In-Reply-To:
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no>
Message-ID:

On 8 May 2012 10:22, mark florisson wrote:
> On 8 May 2012 09:36, Dag Sverre Seljebotn wrote:
> [...]
> Character pointers coerce to strings. Hell, even structs coerce to and
> from python dicts, so disallowing the same for memoryviews would just
> be inconsistent and inconvenient.

Also, if you don't allow coercion from python, then it means they also
cannot be used as 'def' function arguments and be called from python.
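What that coercion buys in practice -- a sketch (assumes NumPy; any
object exporting a contiguous double buffer would do):

    import numpy as np

    def total(double[:] data):  # coercion makes this callable from Python
        cdef double s = 0
        cdef Py_ssize_t i
        for i in range(data.shape[0]):
            s += data[i]
        return s

    # total(np.arange(5.0))  ->  10.0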
From markflorisson88 at gmail.com Tue May 8 11:26:00 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 8 May 2012 10:26:00 +0100
Subject: [Cython] buffer syntax vs. memory view syntax
In-Reply-To: <4FA8DE34.6050806@behnel.de>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no> <4FA8DE34.6050806@behnel.de>
Message-ID:

On 8 May 2012 09:49, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 08.05.2012 10:36:
> [...]
> So, what about just keeping buffer.obj visible and leaving everything
> else to users?

What about allowing a user callback to trigger when accessing
buffer.obj, whose results may be cached? buffer.base will then still
remain the 'original base object'.
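No such hook exists; purely a sketch of the kind of registration being
floated here (register_coercion and asobject are invented names):

    import numpy as np

    _coercer = None

    def register_coercion(callback):
        # let the user say how a raw view becomes a "real" object
        global _coercer
        _coercer = callback

    def asobject(mview):
        # the result could be cached per view, as proposed above
        return _coercer(mview) if _coercer is not None else mview

    register_coercion(np.asarray)  # what NumPy users would register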
 memory view syntax
In-Reply-To: 
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no>
 <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no>
 <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de>
 <4FA7F194.5080008@astro.uio.no>
 <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com>
 <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de>
 <4FA8DB02.2020902@astro.uio.no>
Message-ID: <4FA8E79A.4040402@astro.uio.no>

On 05/08/2012 11:22 AM, mark florisson wrote:
> On 8 May 2012 09:36, Dag Sverre Seljebotn wrote:
>> On 05/08/2012 10:18 AM, Stefan Behnel wrote:
>>> Dag Sverre Seljebotn, 08.05.2012 09:57:
>>>> On 05/07/2012 11:21 PM, mark florisson wrote:
>>>>> On 7 May 2012 19:40, Dag Sverre Seljebotn wrote:
>>>>>> mark florisson wrote:
>>>>>>> On 7 May 2012 17:00, Dag Sverre Seljebotn wrote:
>>>>>>>> On 05/07/2012 04:16 PM, Stefan Behnel wrote:
>>>>>>>>> Stefan Behnel, 07.05.2012 15:04:
>>>>>>>>>> Dag Sverre Seljebotn, 07.05.2012 13:48:
>>>>>>>>>>> BTW, with the coming of memoryviews, me and Mark talked about just
>>>>>>>>>>> deprecating the "mytype[...]" meaning buffers, and rather treat it
>>>>>>>>>>> as np.ndarray, array.array etc. being some sort of "template
>>>>>>>>>>> types". That is,
>>>>>>>>>>> we disallow "object[int]" and require some special declarations in
>>>>>>>>>>> the relevant pxd files.
>>>>>>>>>>
>>>>>>>>>> Hmm, yes, it's unfortunate that we have two different types of
>>>>>>>>>> syntax now, one that declares the item type before the brackets
>>>>>>>>>> and one that declares it afterwards.
>>>>>>>>>
>>>>>>>>> Should we consider the buffer interface syntax deprecated and focus
>>>>>>>>> on the memory view syntax?
>>>>>>>>
>>>>>>>> I think that's the very-long-term intention. Then again, it may be
>>>>>>>> too early to really tell yet, we just need to see how the memory
>>>>>>>> views play out in real life and whether they'll be able to replace
>>>>>>>> np.ndarray[double] among real users. We don't want to shove things
>>>>>>>> down users' throats.
>>>>>>>>
>>>>>>>> But the use of the trailing-[] syntax needs some cleaning up. Me and
>>>>>>>> Mark agreed we'd put this proposal forward when we got around to it:
>>>>>>>>
>>>>>>>>  - Deprecate the "object[double]" form, where [dtype] can be stuck
>>>>>>>> on any extension type
>>>>>>>>
>>>>>>>>  - But, do NOT (for the next year at least) deprecate
>>>>>>>> np.ndarray[double], array.array[double], etc. Basically, there
>>>>>>>> should be a magic flag in extension type declarations saying
>>>>>>>> "I can be a buffer".
>>>>>>>>
>>>>>>>> For one thing, that is sort of needed to open up things for templated
>>>>>>>> cdef classes/fused types cdef classes, if that is ever implemented.
>>>>>>>
>>>>>>> Deprecating is definitely a good start. I think at least if you only
>>>>>>> allow two types as buffers it will be at least reasonably clear when
>>>>>>> one is dealing with fused types or buffers.
>>>>>>>
>>>>>>> Basically, I think memoryviews should live up to demands of the users,
>>>>>>> which would mean there would be no reason to keep the buffer syntax.
>>>>>>
>>>>>> But they are different approaches -- use a different type/API, or just
>>>>>> try to speed up parts of NumPy..
>>>>>>
>>>>>>> One thing to do is make memoryviews coerce cheaply back to the
>>>>>>> original objects if wanted (which is likely). Writing
>>>>>>> np.asarray(mymemview) is kind of annoying.
>>>>>>
>>>>>> It is going to be very confusing to have type(mymemview),
>>>>>> repr(mymemview), and so on come out as NumPy arrays, but not have the
>>>>>> full API of NumPy. Unless you auto-convert on getattr to...
>>>>>
>>>>> Yeah, the idea is very simple, as you mention: just keep the object
>>>>> around cached, and when you slice construct one lazily.
>>>>>
>>>>>> If you want to eradicate the distinction between the backing array and
>>>>>> the memory view and make it transparent, I really suggest you kick back
>>>>>> alive np.ndarray (it can exist in some 'unrealized' state with delayed
>>>>>> construction after slicing, and so on). Implementation much the same
>>>>>> either way, it is all about how it is presented to the user.
>>>>>
>>>>> You mean the buffer syntax?
>>>>>
>>>>>> Something like mymemview.asobject() could work though, and while not
>>>>>> much shorter, it would have some polymorphism that np.asarray does not
>>>>>> have (based probably on some custom PEP 3118 extension)
>>>>>
>>>>> I was thinking you could allow the user to register a callback, and
>>>>> use that to coerce from a memoryview back to an object (given a
>>>>> memoryview object). For numpy this would be np.asarray, and the
>>>>> implementation is allowed to cache the result (which it will).
>>>>> It may be too magicky though... but it will be convenient. The
>>>>> memoryview will act as a subclass, meaning that any of its methods
>>>>> will override methods of the converted object.
>>>>
>>>> My point was that this seems *way* too magicky.
>>>>
>>>> Beyond "confusing users" and so on that are sort of subjective, here's a
>>>> fundamental problem for you: We're making it very difficult to type-infer
>>>> memoryviews. Consider:
>>>>
>>>> cdef double[:] x = ...
>>>> y = x
>>>> print y.shape
>>>>
>>>> Now, because y is not typed, you're semantically throwing in a conversion
>>>> on line 2, so that line 3 says that you want the attribute access to be
>>>> invoked on "whatever object x coerced back to". And we have no idea what
>>>> kind of object that is.
>>>>
>>>> If you don't transparently convert to object, it'd be safe to
>>>> automatically infer y as a double[:].
>>>
>>> Why can't y be inferred as the type of x due to the assignment?
>>>
>>>> On a related note, I've said before that I dislike the notion of
>>>>
>>>> cdef double[:] mview = obj
>>>>
>>>> I'd rather like
>>>>
>>>> cdef double[:] mview = double[:](obj)
>>>
>>> Why? We currently allow
>>>
>>>     cdef char* s = some_py_bytes_string
>>>
>>> Auto-coercion is a serious part of the language, and I don't see the
>>> advantage of requiring the redundancy in the case above. It's clear enough
>>> to me what the typed assignment is intended to mean: get me a buffer view
>>> on the object, regardless of what it is.
>>>
>>>> I support Robert in that "np.ndarray[double]" is the syntax to use when
>>>> you want this kind of transparent "be an object when I need to and a
>>>> memory view when I need to".
>>>>
>>>> Proposal:
>>>>
>>>>  1) We NEVER deprecate "np.ndarray[double]", we commit to keeping that in
>>>> the language. It means exactly what you would like double[:] to mean,
>>>> i.e. a variable that is memoryview when you need to and an object
>>>> otherwise. When you use this type, you bear the consequences of
>>>> early-binding things that could in theory be overridden.
>>>>
>>>>  2) double[:] is for when you want to access data of *any* Python object
>>>> in a generic way. Raw PEP 3118. In those situations, access to the
>>>> underlying object is much less useful.
>>>>
>>>>   2a) Therefore we require that you do "mview.asobject()" manually; doing
>>>> "mview.foo()" is a compile-time error
>>>
>>> Sounds good. I think that would clean up the current syntax overlap very
>>> nicely.
>>>
>>>>   2b) To drive the point home among users, and aid type inference and
>>>> overall language clarity, we REMOVE the auto-acquisition and require that
>>>> you do
>>>>
>>>>      cdef double[:] mview = double[:](obj)
>>>
>>> I don't see the point, as noted above. Either "obj" is statically typed
>>> and the bare assignment becomes a no-op, or it's not typed and the
>>> assignment coerces by creating a view. As with all other typed
>>> assignments.
>>>
>>>>   2c) Perhaps: Do not even coerce to a Python memoryview and disallow
>>>> "print mview"; instead require that you do "print mview.asmemoryview()"
>>>> or "print memoryview(mview)" or somesuch.
>>>
>>> This seems to depend on 2b.
>>
>> This I don't understand. The question of 2c) is the analogue to
>> auto-coercion of "char*" to bytes; approving 2c) would put memoryviews in
>> line with char*.
>>
>> Then again, we could in future auto-coerce char* to a ctypes pointer, and
>> in that case, coercing a memoryview to an object representing that
>> memoryview would be OK.
>
> Character pointers coerce to strings. Hell, even structs coerce to and
> from python dicts, so disallowing the same for memoryviews would just
> be inconsistent and inconvenient.

OK, but even structs don't coerce back to some arbitrary type, it's always
a dict. I don't necessarily oppose coercing memoryviews to some Python
memoryview object (not necessarily the builtin).

I agree that some mview.asobject() triggering a callback defined by some
CEP 1xxx ("cross-language CEP") would be really useful; and that could
form the basis of a new, improved np.ndarray[double] that allows fast
slicing etc. (where that is used automatically whenever needed).

Dag
From d.s.seljebotn at astro.uio.no  Tue May  8 11:47:26 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Tue, 08 May 2012 11:47:26 +0200
Subject: [Cython] buffer syntax vs. memory view syntax
In-Reply-To: <4FA8E79A.4040402@astro.uio.no>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no>
 <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no>
 <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de>
 <4FA7F194.5080008@astro.uio.no>
 <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com>
 <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de>
 <4FA8DB02.2020902@astro.uio.no> <4FA8E79A.4040402@astro.uio.no>
Message-ID: <4FA8EBAE.3010106@astro.uio.no>

On 05/08/2012 11:30 AM, Dag Sverre Seljebotn wrote:
> [...]
>
> OK, but even structs don't coerce back to some arbitrary type, it's
> always a dict. I don't necessarily oppose coercing memoryviews to some
> Python memoryview object (not necessarily the builtin).
>
> I agree that some mview.asobject() triggering a callback defined by some
> CEP 1xxx ("cross-language CEP") would be really useful; and that could
> form the basis of a new, improved np.ndarray[double] that allows fast
> slicing etc. (where that is used automatically whenever needed).

After some thinking I believe I can see more clearly where Mark is coming
from. To sum up, it's either

A) Keep both np.ndarray[double] and double[:] around, with clearly defined
and separate roles. np.ndarray[double] implementation is revamped to allow
fast slicing etc., based on the double[:] implementation.

B) Deprecate np.ndarray[double] sooner rather than later, but make
double[:] have functionality that is *really* close to what
np.ndarray[double] currently does.
In most cases one should be able to basically replace np.ndarray[double]
with double[:] and the code should continue to work just like before; the
difference is that if you pass in anything other than a NumPy array, it
will likely fail with a runtime AttributeError at some point rather than
fail a PyType_Check.

Between those two I believe it's a matter of design taste, not so much
rational argument, and I don't know where I stand yet. And I'm going to
stop thinking about it until I see what Robert says...

Dag

From stefan_ml at behnel.de  Tue May  8 11:48:51 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 08 May 2012 11:48:51 +0200
Subject: [Cython] buffer syntax vs. memory view syntax
In-Reply-To: 
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no>
 <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no>
 <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de>
 <4FA7F194.5080008@astro.uio.no>
 <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com>
 <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de>
 <4FA8DB02.2020902@astro.uio.no>
Message-ID: <4FA8EC03.4000201@behnel.de>

mark florisson, 08.05.2012 11:24:
>>>> Dag Sverre Seljebotn, 08.05.2012 09:57:
>>>>>  1) We NEVER deprecate "np.ndarray[double]", we commit to keeping that
>>>>> in the language. It means exactly what you would like double[:] to
>>>>> mean, i.e. a variable that is memoryview when you need to and an
>>>>> object otherwise. When you use this type, you bear the consequences
>>>>> of early-binding things that could in theory be overridden.
>>>>>
>>>>>  2) double[:] is for when you want to access data of *any* Python
>>>>> object in a generic way. Raw PEP 3118. In those situations, access to
>>>>> the underlying object is much less useful.
>>>>>
>>>>>   2a) Therefore we require that you do "mview.asobject()" manually;
>>>>> doing "mview.foo()" is a compile-time error
> [...]
> Character pointers coerce to strings. Hell, even structs coerce to and
> from python dicts, so disallowing the same for memoryviews would just
> be inconsistent and inconvenient.

Two separate things to discuss here: the original exporter and a Python
level wrapper.

As long as wrapping the memoryview in a new object can easily be done by
users, I don't see a reason to provide compiler support for getting at the
exporter. After all, a user may have a memory view that is backed by a
NumPy array but wants to reinterpret it as a PIL image. Just because the
underlying object has a specific object type doesn't mean that's the one
to use for a given use case. If a user requires a specific object
*instead* of a bare memory view, we have the object type buffer syntax
for that.

It's also not necessarily more efficient to access the underlying object
than to create a new one if the underlying exporter has to learn about the
mapped layout first.

Regarding the coercion to Python, I do not see a problem with providing a
general Python view object for memory views that arbitrary Cython memory
views can coerce to. In fact, I consider that a useful feature. The builtin
memoryview type in Python (at least the one in CPython 3.3) should be quite
capable of providing this, although I don't mind what exactly this becomes.

> Also, if you don't allow coercion from python, then it means they also
> cannot be used as 'def' function arguments and be called from python.

Coercion *from* Python is not being questioned. We have syntax for that,
and a Python memory view wrapper can easily be unboxed (even transitively)
through the buffer interface when entering back into Cython.

Stefan
_______________________________________________
cython-devel mailing list
cython-devel at python.org
http://mail.python.org/mailman/listinfo/cython-devel
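That unboxing path is easy to see in code. A small sketch: a def function
with a typed memoryview argument accepts any object exporting a compatible
PEP 3118 buffer, so the same function can be called from Python with a
NumPy array, an array.array, or a Python memoryview wrapper:

    # Sketch: coercion *from* Python through the buffer interface.
    # Any compatible PEP 3118 exporter is accepted automatically.
    def first_item(double[:] mv):
        return mv[0]

    # From Python:
    #   import numpy as np; first_item(np.arange(3.0))          # -> 0.0
    #   from array import array; first_item(array('d', [7.0]))  # -> 7.0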
From markflorisson88 at gmail.com  Tue May  8 12:35:13 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 8 May 2012 11:35:13 +0100
Subject: [Cython] buffer syntax vs. memory view syntax
In-Reply-To: <4FA8EBAE.3010106@astro.uio.no>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no>
 <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no>
 <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de>
 <4FA7F194.5080008@astro.uio.no>
 <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com>
 <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de>
 <4FA8DB02.2020902@astro.uio.no> <4FA8E79A.4040402@astro.uio.no>
 <4FA8EBAE.3010106@astro.uio.no>
Message-ID:

On 8 May 2012 10:47, Dag Sverre Seljebotn wrote:
> [...]
>
> After some thinking I believe I can see more clearly where Mark is coming
> from. To sum up, it's either
>
> A) Keep both np.ndarray[double] and double[:] around, with clearly defined
> and separate roles. np.ndarray[double] implementation is revamped to allow
> fast slicing etc., based on the double[:] implementation.
>
> B) Deprecate np.ndarray[double] sooner rather than later, but make
> double[:] have functionality that is *really* close to what
> np.ndarray[double] currently does. In most cases one should be able to
> basically replace np.ndarray[double] with double[:] and the code should
> continue to work just like before; the difference is that if you pass in
> anything other than a NumPy array, it will likely fail with a runtime
> AttributeError at some point rather than fail a PyType_Check.

That's a good summary. I have a big preference for B here, but I agree
that treating a typed memoryview as both a user object (possibly converted
through callback) and a typed memoryview "subclass" is quite magicky. I
wouldn't particularly mind something concise like 'm.obj'. The
AttributeError would occur as usual, when a Python object doesn't have
the right interface.

> Between those two I believe it's a matter of design taste, not so much
> rational argument, and I don't know where I stand yet. And I'm going to
> stop thinking about it until I see what Robert says...
>
> Dag
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From markflorisson88 at gmail.com  Tue May  8 12:35:30 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 8 May 2012 11:35:30 +0100
Subject: [Cython] buffer syntax vs. memory view syntax
In-Reply-To: <4FA8EC03.4000201@behnel.de>
References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no>
 <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no>
 <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de>
 <4FA7F194.5080008@astro.uio.no>
 <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com>
 <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de>
 <4FA8DB02.2020902@astro.uio.no> <4FA8EC03.4000201@behnel.de>
Message-ID:

On 8 May 2012 10:48, Stefan Behnel wrote:
> mark florisson, 08.05.2012 11:24:
>>>>> Dag Sverre Seljebotn, 08.05.2012 09:57:
>>>>>>  1) We NEVER deprecate "np.ndarray[double]", we commit to keeping
>>>>>> that in the language. [...]
>>>>>>
>>>>>>  2) double[:] is for when you want to access data of *any* Python
>>>>>> object in a generic way. Raw PEP 3118. In those situations, access
>>>>>> to the underlying object is much less useful.
>>>>>>
>>>>>>   2a) Therefore we require that you do "mview.asobject()" manually;
>>>>>> doing "mview.foo()" is a compile-time error
>>> [...]
>>> Character pointers coerce to strings.
>>> Hell, even structs coerce to and
>>> from python dicts, so disallowing the same for memoryviews would just
>>> be inconsistent and inconvenient.
>
> Two separate things to discuss here: the original exporter and a Python
> level wrapper.
>
> As long as wrapping the memoryview in a new object can easily be done by
> users, I don't see a reason to provide compiler support for getting at
> the exporter.

Well, the support is already there :) It's basically to be consistent with
numpy's attributes.

> After all, a user may have a memory view that is backed by a
> NumPy array but wants to reinterpret it as a PIL image. Just because the
> underlying object has a specific object type doesn't mean that's the one
> to use for a given use case. If a user requires a specific object
> *instead* of a bare memory view, we have the object type buffer syntax
> for that.

Which is better deprecated to allow only one way to do things, and to make
fused extension types less confusing.

> It's also not necessarily more efficient to access the underlying object
> than to create a new one if the underlying exporter has to learn about
> the mapped layout first.
>
> Regarding the coercion to Python, I do not see a problem with providing a
> general Python view object for memory views that arbitrary Cython memory
> views can coerce to. In fact, I consider that a useful feature. The
> builtin memoryview type in Python (at least the one in CPython 3.3)
> should be quite capable of providing this, although I don't mind what
> exactly this becomes.

There are two ways to argue this entire problem, one from a theoretical
standpoint and one from a pragmatic one. Theoretically your points are
sound, but in practice 99% of the uses will be numpy arrays, and in 99% of
those uses people will want one back. If one does not allow easy,
compiler-supported conversion, then any numpy operation will go from typed
memoryview slice -> memoryview object -> buffer interface -> some
computation in numpy -> buffer interface -> typed memoryview.

The compiler can help here by maintaining cached views aided by a user
callback. In the case you're not slicing, you can just return the original
object. I'm not sure how to register those callbacks though, as making
them global may interfere between projects. Maybe it should be a module
level thing?

>> Also, if you don't allow coercion from python, then it means they also
>> cannot be used as 'def' function arguments and be called from python.
>
> Coercion *from* Python is not being questioned. We have syntax for that,
> and a Python memory view wrapper can easily be unboxed (even transitively)
> through the buffer interface when entering back into Cython.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
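The round trip described here is visible in today's code. A sketch,
assuming NumPy: np.asarray() re-wraps the buffer through the PEP 3118
interface without copying, so the object is recreated but the data itself
is shared with the view:

    # Sketch of today's round trip: typed view -> NumPy object -> shared data.
    import numpy as np

    def scale_inplace(double[:] mv):
        arr = np.asarray(mv)   # object rebuilt via the buffer interface, no copy
        arr *= 2.0             # computation happens in NumPy
        # mv already sees the scaled values, since the memory is shared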
From vitja.makarov at gmail.com  Tue May  8 13:27:28 2012
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Tue, 8 May 2012 15:27:28 +0400
Subject: [Cython] callable() optimization
Message-ID:

I've noticed a regression related to the callable() optimization:

https://github.com/cython/cython/commit/a40112b0461eae5ab22fbdd07ae798d4a72ff523

    class C:
        pass
    print callable(C())

It prints True: the optimized version checks the condition
((obj)->ob_type->tp_call != NULL), which is true for both the class and
the instance.

>>> help(callable)
callable(...)
    callable(object) -> bool

    Return whether the object is callable (i.e., some kind of function).
    Note that classes are callable, as are instances with a __call__() method.

--
vitja.

From stefan_ml at behnel.de  Tue May  8 14:24:25 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 08 May 2012 14:24:25 +0200
Subject: [Cython] CF based type inference
Message-ID: <4FA91079.5090503@behnel.de>

Hi,

Vitja has rebased the type inference on the control flow, so I wonder if
this will enable us to properly infer this:

  def partial_validity():
    """
    >>> partial_validity()
    ('Python object', 'double', 'str object')
    """
    a = 1.0
    b = a + 2   # definitely double
    a = 'test'
    c = a + 'toast'  # definitely str
    return typeof(a), typeof(b), typeof(c)

I think what is mainly needed for this is that a NameNode with an
undeclared type should not report its own entry as dependency but that of
its own cf_assignments. Would this work?

(Haven't got the time to try it out right now, so I'm dumping it here.)

Stefan

From stefan_ml at behnel.de  Tue May  8 15:20:11 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 08 May 2012 15:20:11 +0200
Subject: [Cython] [cython-users] Confusing behavior of memoryview
 (assigning value to slice fails without any indication)
In-Reply-To: 
References: <11786512.1729.1336392567694.JavaMail.geo-discussion-forums@vbvx4>
Message-ID: <4FA91D8B.4080804@behnel.de>

mark florisson, 08.05.2012 11:11:
> On 7 May 2012 13:09, Maxim wrote:
>> Consider the following code:
>>
>>> # a.pyx:
>>> cdef class Base(object):
>>>     cdef public double[:,:] arr
>>
>>> # b.py:
>>> from a import Base
>>> import numpy as np
>>> class MyClass(Base):
>>>     def __init__(self):
>>>         self.arr = np.zeros((10, 10), dtype=np.float64)
>>>         self.arr[1, :] = 10  # this line will execute correctly,
>>>                              # but won't have any effect
>>>         print self.arr[1,5]
>>
>> Is it possible to somehow warn the user here that assigning value to
>> memoryview slice is not supported? Finding this out after some debugging
>> was a little annoying.
>
> Thanks for the report, that's a silly bug. It works with typed
> memoryviews, but with objects it passes in the wrong ndim, and the
> second dimension is 0, which means it does nothing. It is fixed in the
> cython master branch.
>
> BTW, using 'self.arr' in the python subclass means you don't get your
> numpy array back, but rather a cython memoryview that is far less
> capable.

I think it would be good to cherry pick this kind of fix directly over
into the release branch so that we can start building up our pile of fixes
for 0.16.1 there.

Stefan
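For reference, the behaviour the fix restores looks like this. A sketch,
assuming NumPy: assigning a scalar to a typed memoryview slice broadcasts
it over the slice, also when the view lives on a cdef class attribute:

    # Sketch of the intended semantics: scalar assignment to a typed
    # memoryview slice broadcasts over the whole slice.
    import numpy as np

    cdef class Base:
        cdef public double[:, :] arr

    def demo():
        b = Base()
        b.arr = np.zeros((10, 10), dtype=np.float64)
        b.arr[1, :] = 10.0                # fills the entire second row
        return np.asarray(b.arr)[1, 5]    # -> 10.0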
From vitja.makarov at gmail.com  Tue May  8 15:47:29 2012
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Tue, 8 May 2012 17:47:29 +0400
Subject: [Cython] CF based type inference
In-Reply-To: <4FA91079.5090503@behnel.de>
References: <4FA91079.5090503@behnel.de>
Message-ID:

2012/5/8 Stefan Behnel :
> [...]
>
> I think what is mainly needed for this is that a NameNode with an
> undeclared type should not report its own entry as dependency but that
> of its own cf_assignments. Would this work?

Yeah, that might work. The other way to go is to split entries:

  def partial_validity():
    """
    >>> partial_validity()
    ('str object', 'double', 'str object')
    """
    a_1 = 1.0
    b = a_1 + 2   # definitely double
    a_2 = 'test'
    c = a_2 + 'toast'  # definitely str
    return typeof(a_2), typeof(b), typeof(c)

And this should work better because it allows to infer a_1 as a double
and a_2 as a string.

--
vitja.

From d.s.seljebotn at astro.uio.no  Tue May  8 18:52:24 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Tue, 08 May 2012 18:52:24 +0200
Subject: [Cython] CF based type inference
In-Reply-To: 
References: <4FA91079.5090503@behnel.de>
Message-ID: <5ff2d356-9b49-4954-82df-cd972a403f8c@email.android.com>

Vitja Makarov wrote:
> [...]
>
> And this should work better because it allows to infer a_1 as a double
> and a_2 as a string.

+1 (as also Mark has hinted several times). I also happen to like that
typeof returns str rather than object...

I don't think type inferred code has to restrict itself to what you could
do using *only* declarations. To go out on a hyperbole: Reinventing
compiler theory to make things fit better with our current tree and the
Pyrex legacy isn't sustainable forever; at some point we should do things
the standard way and refactor some code if necessary.

Dag

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

From markflorisson88 at gmail.com  Tue May  8 20:36:50 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 8 May 2012 19:36:50 +0100
Subject: [Cython] 0.16.1
Message-ID:

Ok, so for the bugfix release 0.16.1 I propose that everyone cherry-picks
over their own fixes into the release branch (at least Stefan, since your
fixes pertain to your newly merged branches and sometimes to the master
branch itself). This branch should not be merged back into master, and any
additional fixes should go into master and be picked over to release.

Some things that should still be fixed:
    - nonechecks for memoryviews
    - memoryview documentation
    - more?

We can then, shortly-ish after, release 0.17 with actual features (and new
bugs, let's call those features too), depending on how many bugs are still
found in 0.16.1.
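To see what is at stake in the entry-splitting proposal above, consider a
case where a single shared entry forces the weaker type (a sketch; the
exact inferred types depend on the rules under discussion, and Robert
raises this loop-variable pattern further down the thread):

    # Sketch: with one entry per name, the later string assignment forces
    # 'i' to be a Python object throughout, so the loop cannot run as a
    # plain C int loop; with split entries, the loop's 'i' could be
    # inferred as a C int independently of the final assignment.
    def count_then_label():
        for i in range(1000000):
            pass
        i = 'done'
        return i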
From markflorisson88 at gmail.com  Tue May  8 20:52:56 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 8 May 2012 19:52:56 +0100
Subject: [Cython] 0.16.1
In-Reply-To: 
References: 
Message-ID:

On 8 May 2012 19:36, mark florisson wrote:
> [...]
>
> We can then, shortly-ish after, release 0.17 with actual features (and
> new bugs, let's call those features too), depending on how many bugs are
> still found in 0.16.1.

TBH, if we're actually close to a major release, the usefulness of a
bugfix release is imho not that great.

From vitja.makarov at gmail.com  Tue May  8 21:04:02 2012
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Tue, 8 May 2012 23:04:02 +0400
Subject: [Cython] 0.16.1
In-Reply-To: 
References: 
Message-ID:

2012/5/8 mark florisson :
> [...]
>
> TBH, if we're actually close to a major release, the usefulness of a
> bugfix release is imho not that great.

There are some fixes to the generators implementation that depend on
"yield from" and can't easily be cherry-picked. So I think you're right
about the 0.17 release. But new features may introduce new bugs and we'll
have to release 0.17.1 soon.

--
vitja.

From robertwb at gmail.com  Wed May  9 00:12:29 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Tue, 8 May 2012 15:12:29 -0700
Subject: [Cython] CF based type inference
In-Reply-To: 
References: <4FA91079.5090503@behnel.de>
Message-ID:

On Tue, May 8, 2012 at 6:47 AM, Vitja Makarov wrote:
> [...]
>
> Yeah, that might work. The other way to go is to split entries:
>
>   def partial_validity():
>     """
>     >>> partial_validity()
>     ('str object', 'double', 'str object')
>     """
>     a_1 = 1.0
>     b = a_1 + 2   # definitely double
>     a_2 = 'test'
>     c = a_2 + 'toast'  # definitely str
>     return typeof(a_2), typeof(b), typeof(c)
>
> And this should work better because it allows to infer a_1 as a double
> and a_2 as a string.

This already works, right? I agree it's nicer in general to split
things up, but not being able to optimize a loop variable because it
was used earlier or later in a different context is a disadvantage of
the current system.

- Robert

From robertwb at gmail.com  Wed May  9 00:16:44 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Tue, 8 May 2012 15:16:44 -0700
Subject: [Cython] 0.16.1
In-Reply-To: 
References: 
Message-ID:

On Tue, May 8, 2012 at 12:04 PM, Vitja Makarov wrote:
> [...]
>
> There are some fixes to the generators implementation that depend on
> "yield from" and can't easily be cherry-picked. So I think you're right
> about the 0.17 release. But new features may introduce new bugs and
> we'll have to release 0.17.1 soon.

If we're looking at doing 0.17 soon, let's just do that. In the future,
we could have a bugfix branch that all bugfixes get checked into,
regularly merged into master, which we could release more often as
x.y.z releases.

- Robert

From stefan_ml at behnel.de  Wed May  9 08:22:07 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 09 May 2012 08:22:07 +0200
Subject: [Cython] CF based type inference
In-Reply-To: 
References: <4FA91079.5090503@behnel.de>
Message-ID: <4FAA0D0F.9090508@behnel.de>

Robert Bradshaw, 09.05.2012 00:12:
> On Tue, May 8, 2012 at 6:47 AM, Vitja Makarov wrote:
>> [...]
>>
>> And this should work better because it allows to infer a_1 as a double
>> and a_2 as a string.
>
> This already works, right?
It would work if it was implemented. *wink*

> I agree it's nicer in general to split
> things up, but not being able to optimize a loop variable because it
> was used earlier or later in a different context is a disadvantage of
> the current system.

Absolutely. I was considering entry splitting more of a "soon, maybe not
now" type of thing because it isn't entirely clear to me what needs to be
done. It may not even be all that hard to implement, but I think it's more
than just a local change in the scope implementation because the current
lookup_here() doesn't know what node is asking.

Stefan

From stefan_ml at behnel.de  Wed May  9 08:41:03 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 09 May 2012 08:41:03 +0200
Subject: [Cython] 0.16.1
In-Reply-To: 
References: 
Message-ID: <4FAA117F.1080402@behnel.de>

Robert Bradshaw, 09.05.2012 00:16:
> [...]
>
> If we're looking at doing 0.17 soon, let's just do that.

I think it's close enough to be released. I'll try to get around to
listing the changes in the release notes (and maybe even add a note about
alpha-quality PyPy support to the docs), but I wouldn't mind if someone
else was quicker, at least for a start. ;)

> In the future,
> we could have a bugfix branch that all bugfixes get checked into,
> regularly merged into master, which we could release more often as
> x.y.z releases.

+1. We have the release branch for that, it just hasn't been used much
since the last release.

I also don't mind releasing a 0.16.1 shortly before (or even after) a
0.17. Distributors (e.g. Debian) often try to stick to a given release
series during their support time frame (usually more than a year), so
unless we release fixes, they'll end up cherry-picking or porting their
own fixes, each on their own. Applying at least the obvious fixes to the
release branch and then merging it into the master from there would make
it easier for them.

Stefan
Stefan From stefan_ml at behnel.de Wed May 9 09:19:36 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 09:19:36 +0200 Subject: [Cython] callable() optimization In-Reply-To: References: Message-ID: <4FAA1A88.8020801@behnel.de> Vitja Makarov, 08.05.2012 13:27: > I've noticed regression related to callable() optimization. > > https://github.com/cython/cython/commit/a40112b0461eae5ab22fbdd07ae798d4a72ff523 > > class C: > pass > print callable(C()) > > It prints True optimized version checks ((obj)->ob_type->tp_call != > NULL) condition that is True for both class and instance. > > >>> help(callable) > callable(...) > callable(object) -> bool > > Return whether the object is callable (i.e., some kind of function). > Note that classes are callable, as are instances with a __call__() method. Ah, right - old style classes are special cased in Py2. I'll make this a Py3-only optimisation then. Stefan From vitja.makarov at gmail.com Wed May 9 09:43:58 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Wed, 9 May 2012 11:43:58 +0400 Subject: [Cython] callable() optimization In-Reply-To: <4FAA1A88.8020801@behnel.de> References: <4FAA1A88.8020801@behnel.de> Message-ID: 2012/5/9 Stefan Behnel : > Vitja Makarov, 08.05.2012 13:27: >> I've noticed regression related to callable() optimization. >> >> https://github.com/cython/cython/commit/a40112b0461eae5ab22fbdd07ae798d4a72ff523 >> >> class C: >> ? ? pass >> print callable(C()) >> >> It prints True optimized version checks ((obj)->ob_type->tp_call != >> NULL) condition that is True for both class and instance. >> >> >>> help(callable) >> callable(...) >> ? ? callable(object) -> bool >> >> ? ? Return whether the object is callable (i.e., some kind of function). >> ? ? Note that classes are callable, as are instances with a __call__() method. > > Ah, right - old style classes are special cased in Py2. > > I'll make this a Py3-only optimisation then. > I don't see difference between py2 and py3 here: Python 3.2.3 (default, May 3 2012, 15:51:42) [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> class Foo: pass ... >>> callable(Foo()) False >>> There is PyCallable_Check() CPython function: int PyCallable_Check(PyObject *x) { if (x == NULL) return 0; if (PyInstance_Check(x)) { PyObject *call = PyObject_GetAttrString(x, "__call__"); if (call == NULL) { PyErr_Clear(); return 0; } /* Could test recursively but don't, for fear of endless recursion if some joker sets self.__call__ = self */ Py_DECREF(call); return 1; } else { return x->ob_type->tp_call != NULL; } } -- vitja. From stefan_ml at behnel.de Wed May 9 09:49:04 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 09:49:04 +0200 Subject: [Cython] 0.16.1 In-Reply-To: <4FAA117F.1080402@behnel.de> References: <4FAA117F.1080402@behnel.de> Message-ID: <4FAA2170.4050808@behnel.de> Stefan Behnel, 09.05.2012 08:41: > Robert Bradshaw, 09.05.2012 00:16: >> If we're looking at doing 0.17 soon, lets just do that. > I think it's close enough to be released. ... although one thing I just noticed is that the "numpy_memoryview" test is still disabled because it lead to crashes in recent Py3.2 releases (and thus most likely also in the latest Py3k). Not sure if it still crashes, but should be checked before going for a release. 
Stefan From vitja.makarov at gmail.com Wed May 9 10:02:20 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Wed, 9 May 2012 12:02:20 +0400 Subject: [Cython] CF based type inference In-Reply-To: <4FAA0D0F.9090508@behnel.de> References: <4FA91079.5090503@behnel.de> <4FAA0D0F.9090508@behnel.de> Message-ID: 2012/5/9 Stefan Behnel : > Robert Bradshaw, 09.05.2012 00:12: >> On Tue, May 8, 2012 at 6:47 AM, Vitja Makarov wrote: >>> 2012/5/8 Stefan Behnel: >>>> Vitja has rebased the type inference on the control flow, so I wonder if >>>> this will enable us to properly infer this: >>>> >>>> ?def partial_validity(): >>>> ? ?""" >>>> ? ?>>> partial_validity() >>>> ? ?('Python object', 'double', 'str object') >>>> ? ?""" >>>> ? ?a = 1.0 >>>> ? ?b = a + 2 ? # definitely double >>>> ? ?a = 'test' >>>> ? ?c = a + 'toast' ?# definitely str >>>> ? ?return typeof(a), typeof(b), typeof(c) >>>> >>>> I think, what is mainly needed for this is that a NameNode with an >>>> undeclared type should not report its own entry as dependency but that of >>>> its own cf_assignments. Would this work? >>>> >>>> (Haven't got the time to try it out right now, so I'm dumping it here.) >>>> >>> >>> Yeah, that might work. The other way to go is to split entries: >>> >>> ?def partial_validity(): >>> ? """ >>> ? >>> partial_validity() >>> ? ('str object', 'double', 'str object') >>> ? """ >>> ? a_1 = 1.0 >>> ? b = a_1 + 2 ? # definitely double >>> ? a_2 = 'test' >>> ? c = a_2 + 'toast' ?# definitely str >>> ? return typeof(a_2), typeof(b), typeof(c) >>> >>> And this should work better because it allows to infer a_1 as a double >>> and a_2 as a string. >> >> This already works, right? > > It would work if it was implemented. *wink* > > >> I agree it's nicer in general to split >> things up, but not being able to optimize a loop variable because it >> was used earlier or later in a different context is a disadvantage of >> the current system. > > Absolutely. I was considering entry splitting more of a "soon, maybe not > now" type of thing because it isn't entire clear to me what needs to be > done. It may not even be all that hard to implement, but I think it's more > than just a local change in the scope implementation because the current > lookup_here() doesn't know what node is asking. > That could be done the following way: - Before running type inference find independent assignment groups and split entries - Run type infrerence - Join entries of the same type or of PyObject base type - Then change names to private ones "{old_name}.{index}" -- vitja. From stefan_ml at behnel.de Wed May 9 10:02:51 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 10:02:51 +0200 Subject: [Cython] callable() optimization In-Reply-To: References: <4FAA1A88.8020801@behnel.de> Message-ID: <4FAA24AB.5010101@behnel.de> Vitja Makarov, 09.05.2012 09:43: > 2012/5/9 Stefan Behnel : >> Vitja Makarov, 08.05.2012 13:27: >>> I've noticed regression related to callable() optimization. >>> >>> https://github.com/cython/cython/commit/a40112b0461eae5ab22fbdd07ae798d4a72ff523 >>> >>> class C: >>> pass >>> print callable(C()) >>> >>> It prints True optimized version checks ((obj)->ob_type->tp_call != >>> NULL) condition that is True for both class and instance. >>> >>>>>> help(callable) >>> callable(...) >>> callable(object) -> bool >>> >>> Return whether the object is callable (i.e., some kind of function). >>> Note that classes are callable, as are instances with a __call__() method. 
>> >> Ah, right - old style classes are special cased in Py2. >> >> I'll make this a Py3-only optimisation then. >> > > I don't see difference between py2 and py3 here: > > Python 3.2.3 (default, May 3 2012, 15:51:42) > [GCC 4.6.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> class Foo: pass > ... > >>> callable(Foo()) > False > >>> > > There is PyCallable_Check() CPython function: > > int > PyCallable_Check(PyObject *x) > { > if (x == NULL) > return 0; > if (PyInstance_Check(x)) { > PyObject *call = PyObject_GetAttrString(x, "__call__"); > if (call == NULL) { > PyErr_Clear(); > return 0; > } > /* Could test recursively but don't, for fear of endless > recursion if some joker sets self.__call__ = self */ > Py_DECREF(call); > return 1; > } > else { > return x->ob_type->tp_call != NULL; > } > } That's the Py2 version. In Py3, it looks as follows, because old-style "instances" no longer exist: """ int PyCallable_Check(PyObject *x) { if (x == NULL) return 0; return x->ob_type->tp_call != NULL; } """ That's what I had initially based my optimisation on. Stefan From vitja.makarov at gmail.com Wed May 9 10:21:54 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Wed, 9 May 2012 12:21:54 +0400 Subject: [Cython] callable() optimization In-Reply-To: <4FAA24AB.5010101@behnel.de> References: <4FAA1A88.8020801@behnel.de> <4FAA24AB.5010101@behnel.de> Message-ID: 2012/5/9 Stefan Behnel : > Vitja Makarov, 09.05.2012 09:43: >> 2012/5/9 Stefan Behnel : >>> Vitja Makarov, 08.05.2012 13:27: >>>> I've noticed regression related to callable() optimization. >>>> >>>> https://github.com/cython/cython/commit/a40112b0461eae5ab22fbdd07ae798d4a72ff523 >>>> >>>> class C: >>>> ? ? pass >>>> print callable(C()) >>>> >>>> It prints True optimized version checks ((obj)->ob_type->tp_call != >>>> NULL) condition that is True for both class and instance. >>>> >>>>>>> help(callable) >>>> callable(...) >>>> ? ? callable(object) -> bool >>>> >>>> ? ? Return whether the object is callable (i.e., some kind of function). >>>> ? ? Note that classes are callable, as are instances with a __call__() method. >>> >>> Ah, right - old style classes are special cased in Py2. >>> >>> I'll make this a Py3-only optimisation then. >>> >> >> I don't see difference between py2 and py3 here: >> >> Python 3.2.3 (default, May ?3 2012, 15:51:42) >> [GCC 4.6.3] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> >>> class Foo: pass >> ... >> >>> callable(Foo()) >> False >> >>> >> >> There is PyCallable_Check() CPython function: >> >> int >> PyCallable_Check(PyObject *x) >> { >> ? ? if (x == NULL) >> ? ? ? ? return 0; >> ? ? if (PyInstance_Check(x)) { >> ? ? ? ? PyObject *call = PyObject_GetAttrString(x, "__call__"); >> ? ? ? ? if (call == NULL) { >> ? ? ? ? ? ? PyErr_Clear(); >> ? ? ? ? ? ? return 0; >> ? ? ? ? } >> ? ? ? ? /* Could test recursively but don't, for fear of endless >> ? ? ? ? ? ?recursion if some joker sets self.__call__ = self */ >> ? ? ? ? Py_DECREF(call); >> ? ? ? ? return 1; >> ? ? } >> ? ? else { >> ? ? ? ? return x->ob_type->tp_call != NULL; >> ? ? } >> } > > That's the Py2 version. In Py3, it looks as follows, because old-style > "instances" no longer exist: > > """ > int > PyCallable_Check(PyObject *x) > { > ? ? ? ?if (x == NULL) > ? ? ? ? ? ? ? ?return 0; > ? ? ? ?return x->ob_type->tp_call != NULL; > } > """ > > That's what I had initially based my optimisation on. > Ok, so why don't you want to use PyCallable_Check() in all cases? 
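For illustration, a version-gated helper could keep the fast path on Py3
and stay correct on Py2 - this is only a sketch of the idea, not Cython's
actual generated code, and the helper name is made up for the example:

    #if PY_MAJOR_VERSION >= 3
      /* Py3: old-style instances are gone, the type slot check is exact */
      #define __Pyx_Callable_Check(obj)  ((obj)->ob_type->tp_call != NULL)
    #else
      /* Py2: PyCallable_Check() handles old-style instances, which only
         resolve __call__ on the instance at call time */
      #define __Pyx_Callable_Check(obj)  PyCallable_Check(obj)
    #endif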
-- vitja. From markflorisson88 at gmail.com Wed May 9 10:28:31 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 9 May 2012 09:28:31 +0100 Subject: [Cython] CF based type inference In-Reply-To: References: <4FA91079.5090503@behnel.de> <4FAA0D0F.9090508@behnel.de> Message-ID: On 9 May 2012 09:02, Vitja Makarov wrote: > 2012/5/9 Stefan Behnel : >> Robert Bradshaw, 09.05.2012 00:12: >>> On Tue, May 8, 2012 at 6:47 AM, Vitja Makarov wrote: >>>> 2012/5/8 Stefan Behnel: >>>>> Vitja has rebased the type inference on the control flow, so I wonder if >>>>> this will enable us to properly infer this: >>>>> >>>>> ?def partial_validity(): >>>>> ? ?""" >>>>> ? ?>>> partial_validity() >>>>> ? ?('Python object', 'double', 'str object') >>>>> ? ?""" >>>>> ? ?a = 1.0 >>>>> ? ?b = a + 2 ? # definitely double >>>>> ? ?a = 'test' >>>>> ? ?c = a + 'toast' ?# definitely str >>>>> ? ?return typeof(a), typeof(b), typeof(c) >>>>> >>>>> I think, what is mainly needed for this is that a NameNode with an >>>>> undeclared type should not report its own entry as dependency but that of >>>>> its own cf_assignments. Would this work? >>>>> >>>>> (Haven't got the time to try it out right now, so I'm dumping it here.) >>>>> >>>> >>>> Yeah, that might work. The other way to go is to split entries: >>>> >>>> ?def partial_validity(): >>>> ? """ >>>> ? >>> partial_validity() >>>> ? ('str object', 'double', 'str object') >>>> ? """ >>>> ? a_1 = 1.0 >>>> ? b = a_1 + 2 ? # definitely double >>>> ? a_2 = 'test' >>>> ? c = a_2 + 'toast' ?# definitely str >>>> ? return typeof(a_2), typeof(b), typeof(c) >>>> >>>> And this should work better because it allows to infer a_1 as a double >>>> and a_2 as a string. >>> >>> This already works, right? >> >> It would work if it was implemented. *wink* >> >> >>> I agree it's nicer in general to split >>> things up, but not being able to optimize a loop variable because it >>> was used earlier or later in a different context is a disadvantage of >>> the current system. >> >> Absolutely. I was considering entry splitting more of a "soon, maybe not >> now" type of thing because it isn't entire clear to me what needs to be >> done. It may not even be all that hard to implement, but I think it's more >> than just a local change in the scope implementation because the current >> lookup_here() doesn't know what node is asking. >> > > That could be done the following way: > ?- Before running type inference find independent assignment groups > and split entries > ?- Run type infrerence > ?- Join entries of the same type or of PyObject base type > ?- Then change names to private ones "{old_name}.{index}" Sounds like a good approach. Do you think it would be useful if a variable can be type inferred at some point, but at no other point in the function, to specialize for both the first type you find and object? i.e. i = 0 while something: use i i = something_not_inferred() and specialize on 'i' being an int? Bonus points maybe :) If these entries are different depending on control flow, it's basically a form of ssa, which is cool. Then optimizations like none-checking, boundschecking, wraparound etc can, for each new variable insert a single check (for bounds checking it depends on the entire expression, but...). The only thing I'm not entirely sure about is this when the user eliminates your check through try/finally or try/except, e.g. try: buf[i] except IndexError: print "no worries" buf[i] Here you basically want a new (virtual) reference of "i". 
Maybe that could just be handled in the optimization transform though, where it invalidates the previous check (especially since there is no assignment here). > -- > vitja. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Wed May 9 10:32:00 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 9 May 2012 09:32:00 +0100 Subject: [Cython] 0.16.1 In-Reply-To: <4FAA2170.4050808@behnel.de> References: <4FAA117F.1080402@behnel.de> <4FAA2170.4050808@behnel.de> Message-ID: On 9 May 2012 08:49, Stefan Behnel wrote: > Stefan Behnel, 09.05.2012 08:41: >> Robert Bradshaw, 09.05.2012 00:16: >>> If we're looking at doing 0.17 soon, lets just do that. >> I think it's close enough to be released. > > ... although one thing I just noticed is that the "numpy_memoryview" test > is still disabled because it lead to crashes in recent Py3.2 releases (and > thus most likely also in the latest Py3k). Not sure if it still crashes, > but should be checked before going for a release. Hm, all the tests or just one? Was that the problem with gc_refs != 0? That should be fixed now. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Wed May 9 10:33:27 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 9 May 2012 09:33:27 +0100 Subject: [Cython] 0.16.1 In-Reply-To: <4FAA117F.1080402@behnel.de> References: <4FAA117F.1080402@behnel.de> Message-ID: On 9 May 2012 07:41, Stefan Behnel wrote: > Robert Bradshaw, 09.05.2012 00:16: >> On Tue, May 8, 2012 at 12:04 PM, Vitja Makarov wrote: >>> 2012/5/8 mark florisson: >>>> On 8 May 2012 19:36, mark florisson wrote: >>>>> Ok, so for the bugfix release 0.16.1 I propose that everyone cherry >>>>> picks over its own fixes into the release branch (at least Stefan, >>>>> since your fixes pertain to your newly merged branches and sometimes >>>>> to the master branch itself). This branch should not be merged back >>>>> into master, and any additional fixes should go into master and be >>>>> picked over to release. >>>>> >>>>> Some things that should still be fixed: >>>>> ? ?- nonechecks for memoryviews >>>>> ? ?- memoryview documentation >>>>> ? ?- more? >>>>> >>>>> We can then shortly-ish after release 0.17 with actual features (and >>>>> new bugs, lets call those features too), depending on how many bugs >>>>> are still found in 0.16.1. >>>> >>>> TBH, if we're actually close to a major release, the usefulness of a >>>> bugfix release is imho not that great. >>> >>> There are some fixes to generators implementation that depend on >>> "yield from" that can't be easily cherry-picked. >>> So I think you're right about 0.17 release. But new features may >>> introduce new bugs and we'll have to release 0.17.1 soon. >> >> If we're looking at doing 0.17 soon, lets just do that. > > I think it's close enough to be released. I'll try to get around to list > the changes in the release notes (and maybe even add a note about alpha > quality PyPy support to the docs), but I wouldn't mind if someone else was > quicker, at least for a start. ;) > > >> In the future, >> we could have a bugfix branch that all bugfixes get checked into, >> regularly merged into master, which we could release more often as >> x.y.z releases. > > +11. 
We have the release branch for that, it just hasn't been used much > since the last release. Yeah, I like it too. It's much easier than cherry-picking stuff over in a large history, where fixes may depend (partially) on features. > I also don't mind releasing a 0.16.1 shortly before (or even after) a 0.17. > Distributors (e.g. Debian) often try to stick to a given release series > during their support time frame (usually more than a year), so unless we > release fixes, they'll end up cherry picking or porting their own fixes, > each on their own. Applying at least the obvious fixes to the release > branch and then merging it into the master from there would make it easier > for them. Debian stable? :) Good point though, I think we can manage that. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Wed May 9 10:35:59 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 10:35:59 +0200 Subject: [Cython] callable() optimization In-Reply-To: References: <4FAA1A88.8020801@behnel.de> <4FAA24AB.5010101@behnel.de> Message-ID: <4FAA2C6F.5040501@behnel.de> Vitja Makarov, 09.05.2012 10:21: > 2012/5/9 Stefan Behnel : >> Vitja Makarov, 09.05.2012 09:43: >>> 2012/5/9 Stefan Behnel : >>>> Vitja Makarov, 08.05.2012 13:27: >>>>> I've noticed regression related to callable() optimization. >>>>> >>>>> https://github.com/cython/cython/commit/a40112b0461eae5ab22fbdd07ae798d4a72ff523 >>>>> >>>>> class C: >>>>> pass >>>>> print callable(C()) >>>>> >>>>> It prints True optimized version checks ((obj)->ob_type->tp_call != >>>>> NULL) condition that is True for both class and instance. >>>>> >>>>>>>> help(callable) >>>>> callable(...) >>>>> callable(object) -> bool >>>>> >>>>> Return whether the object is callable (i.e., some kind of function). >>>>> Note that classes are callable, as are instances with a __call__() method. >>>> >>>> Ah, right - old style classes are special cased in Py2. >>>> >>>> I'll make this a Py3-only optimisation then. >>>> >>> >>> I don't see difference between py2 and py3 here: >>> >>> Python 3.2.3 (default, May 3 2012, 15:51:42) >>> [GCC 4.6.3] on linux2 >>> Type "help", "copyright", "credits" or "license" for more information. >>>>>> class Foo: pass >>> ... >>>>>> callable(Foo()) >>> False >>>>>> >>> >>> There is PyCallable_Check() CPython function: >>> >>> int >>> PyCallable_Check(PyObject *x) >>> { >>> if (x == NULL) >>> return 0; >>> if (PyInstance_Check(x)) { >>> PyObject *call = PyObject_GetAttrString(x, "__call__"); >>> if (call == NULL) { >>> PyErr_Clear(); >>> return 0; >>> } >>> /* Could test recursively but don't, for fear of endless >>> recursion if some joker sets self.__call__ = self */ >>> Py_DECREF(call); >>> return 1; >>> } >>> else { >>> return x->ob_type->tp_call != NULL; >>> } >>> } >> >> That's the Py2 version. In Py3, it looks as follows, because old-style >> "instances" no longer exist: >> >> """ >> int >> PyCallable_Check(PyObject *x) >> { >> if (x == NULL) >> return 0; >> return x->ob_type->tp_call != NULL; >> } >> """ >> >> That's what I had initially based my optimisation on. > > Ok, so why don't you want to use PyCallable_Check() in all cases? Well, maybe this isn't performance critical enough to merit inlining. Do you think it matters? 
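For reference, a quick Py2 session shows why the plain slot test misfires
for old-style instances (assuming CPython 2.x semantics as discussed above):

    class Old:               # old-style class in Py2
        pass

    class New(object):       # new-style class
        pass

    print callable(Old())    # False: PyCallable_Check() looks up __call__
    print callable(New())    # False: tp_call of the instance's type is NULL

The type of an old-style instance (types.InstanceType) always fills in
tp_call, so an inlined "tp_call != NULL" test would wrongly report True
for Old() here.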
Stefan From markflorisson88 at gmail.com Wed May 9 10:37:58 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 9 May 2012 09:37:58 +0100 Subject: [Cython] callable() optimization In-Reply-To: <4FAA24AB.5010101@behnel.de> References: <4FAA1A88.8020801@behnel.de> <4FAA24AB.5010101@behnel.de> Message-ID: On 9 May 2012 09:02, Stefan Behnel wrote: > Vitja Makarov, 09.05.2012 09:43: >> 2012/5/9 Stefan Behnel : >>> Vitja Makarov, 08.05.2012 13:27: >>>> I've noticed regression related to callable() optimization. >>>> >>>> https://github.com/cython/cython/commit/a40112b0461eae5ab22fbdd07ae798d4a72ff523 >>>> >>>> class C: >>>> ? ? pass >>>> print callable(C()) >>>> >>>> It prints True optimized version checks ((obj)->ob_type->tp_call != >>>> NULL) condition that is True for both class and instance. >>>> >>>>>>> help(callable) >>>> callable(...) >>>> ? ? callable(object) -> bool >>>> >>>> ? ? Return whether the object is callable (i.e., some kind of function). >>>> ? ? Note that classes are callable, as are instances with a __call__() method. >>> >>> Ah, right - old style classes are special cased in Py2. >>> >>> I'll make this a Py3-only optimisation then. >>> >> >> I don't see difference between py2 and py3 here: >> >> Python 3.2.3 (default, May ?3 2012, 15:51:42) >> [GCC 4.6.3] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> >>> class Foo: pass >> ... >> >>> callable(Foo()) >> False >> >>> >> >> There is PyCallable_Check() CPython function: >> >> int >> PyCallable_Check(PyObject *x) >> { >> ? ? if (x == NULL) >> ? ? ? ? return 0; >> ? ? if (PyInstance_Check(x)) { >> ? ? ? ? PyObject *call = PyObject_GetAttrString(x, "__call__"); >> ? ? ? ? if (call == NULL) { >> ? ? ? ? ? ? PyErr_Clear(); >> ? ? ? ? ? ? return 0; >> ? ? ? ? } >> ? ? ? ? /* Could test recursively but don't, for fear of endless >> ? ? ? ? ? ?recursion if some joker sets self.__call__ = self */ >> ? ? ? ? Py_DECREF(call); >> ? ? ? ? return 1; >> ? ? } >> ? ? else { >> ? ? ? ? return x->ob_type->tp_call != NULL; >> ? ? } >> } > > That's the Py2 version. In Py3, it looks as follows, because old-style > "instances" no longer exist: > > """ > int > PyCallable_Check(PyObject *x) > { > ? ? ? ?if (x == NULL) > ? ? ? ? ? ? ? ?return 0; > ? ? ? ?return x->ob_type->tp_call != NULL; > } > """ > > That's what I had initially based my optimisation on. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Huh, so __call__ in a user defined new style class could never end up in ob_type.tp_call right? So how can it avoid that dict lookup? From markflorisson88 at gmail.com Wed May 9 10:44:49 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 9 May 2012 09:44:49 +0100 Subject: [Cython] CF based type inference In-Reply-To: References: <4FA91079.5090503@behnel.de> <4FAA0D0F.9090508@behnel.de> Message-ID: On 9 May 2012 09:28, mark florisson wrote: > On 9 May 2012 09:02, Vitja Makarov wrote: >> 2012/5/9 Stefan Behnel : >>> Robert Bradshaw, 09.05.2012 00:12: >>>> On Tue, May 8, 2012 at 6:47 AM, Vitja Makarov wrote: >>>>> 2012/5/8 Stefan Behnel: >>>>>> Vitja has rebased the type inference on the control flow, so I wonder if >>>>>> this will enable us to properly infer this: >>>>>> >>>>>> ?def partial_validity(): >>>>>> ? ?""" >>>>>> ? ?>>> partial_validity() >>>>>> ? ?('Python object', 'double', 'str object') >>>>>> ? ?""" >>>>>> ? ?a = 1.0 >>>>>> ? ?b = a + 2 ? 
# definitely double >>>>>> ? ?a = 'test' >>>>>> ? ?c = a + 'toast' ?# definitely str >>>>>> ? ?return typeof(a), typeof(b), typeof(c) >>>>>> >>>>>> I think, what is mainly needed for this is that a NameNode with an >>>>>> undeclared type should not report its own entry as dependency but that of >>>>>> its own cf_assignments. Would this work? >>>>>> >>>>>> (Haven't got the time to try it out right now, so I'm dumping it here.) >>>>>> >>>>> >>>>> Yeah, that might work. The other way to go is to split entries: >>>>> >>>>> ?def partial_validity(): >>>>> ? """ >>>>> ? >>> partial_validity() >>>>> ? ('str object', 'double', 'str object') >>>>> ? """ >>>>> ? a_1 = 1.0 >>>>> ? b = a_1 + 2 ? # definitely double >>>>> ? a_2 = 'test' >>>>> ? c = a_2 + 'toast' ?# definitely str >>>>> ? return typeof(a_2), typeof(b), typeof(c) >>>>> >>>>> And this should work better because it allows to infer a_1 as a double >>>>> and a_2 as a string. >>>> >>>> This already works, right? >>> >>> It would work if it was implemented. *wink* >>> >>> >>>> I agree it's nicer in general to split >>>> things up, but not being able to optimize a loop variable because it >>>> was used earlier or later in a different context is a disadvantage of >>>> the current system. >>> >>> Absolutely. I was considering entry splitting more of a "soon, maybe not >>> now" type of thing because it isn't entire clear to me what needs to be >>> done. It may not even be all that hard to implement, but I think it's more >>> than just a local change in the scope implementation because the current >>> lookup_here() doesn't know what node is asking. >>> >> >> That could be done the following way: >> ?- Before running type inference find independent assignment groups >> and split entries >> ?- Run type infrerence >> ?- Join entries of the same type or of PyObject base type >> ?- Then change names to private ones "{old_name}.{index}" > > Sounds like a good approach. Do you think it would be useful if a > variable can be type inferred at some point, but at no other point in > the function, to specialize for both the first type you find and > object? i.e. > > i = 0 > while something: > ? ?use i > ? ?i = something_not_inferred() > > and specialize on 'i' being an int? Bonus points maybe :) > > If these entries are different depending on control flow, it's > basically a form of ssa, which is cool. You could reuse entry cnames if you re-encounter the same type though, but it would be nice if they were different, uniquely referencable objects if they originate from different assignment or merge points. > Then optimizations like > none-checking, boundschecking, wraparound etc can, for each new > variable insert a single check (for bounds checking it depends on the > entire expression, but...). The only thing I'm not entirely sure about > is this when the user eliminates your check through try/finally or > try/except, e.g. > > try: > ? ?buf[i] > except IndexError: > ? ?print "no worries" > > buf[i] > > Here you basically want a new (virtual) reference of "i". Maybe that > could just be handled in the optimization transform though, where it > invalidates the previous check (especially since there is no > assignment here). > >> -- >> vitja. 
>> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Wed May 9 10:47:07 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 10:47:07 +0200 Subject: [Cython] callable() optimization In-Reply-To: References: <4FAA1A88.8020801@behnel.de> <4FAA24AB.5010101@behnel.de> Message-ID: <4FAA2F0B.1010406@behnel.de> mark florisson, 09.05.2012 10:37: > On 9 May 2012 09:02, Stefan Behnel wrote: >> Vitja Makarov, 09.05.2012 09:43: >>> 2012/5/9 Stefan Behnel : >>>> Vitja Makarov, 08.05.2012 13:27: >>>>> I've noticed regression related to callable() optimization. >>>>> >>>>> https://github.com/cython/cython/commit/a40112b0461eae5ab22fbdd07ae798d4a72ff523 >>>>> >>>>> class C: >>>>> pass >>>>> print callable(C()) >>>>> >>>>> It prints True optimized version checks ((obj)->ob_type->tp_call != >>>>> NULL) condition that is True for both class and instance. >>>>> >>>>>>>> help(callable) >>>>> callable(...) >>>>> callable(object) -> bool >>>>> >>>>> Return whether the object is callable (i.e., some kind of function). >>>>> Note that classes are callable, as are instances with a __call__() method. >>>> >>>> Ah, right - old style classes are special cased in Py2. >>>> >>>> I'll make this a Py3-only optimisation then. >>>> >>> >>> I don't see difference between py2 and py3 here: >>> >>> Python 3.2.3 (default, May 3 2012, 15:51:42) >>> [GCC 4.6.3] on linux2 >>> Type "help", "copyright", "credits" or "license" for more information. >>>>>> class Foo: pass >>> ... >>>>>> callable(Foo()) >>> False >>>>>> >>> >>> There is PyCallable_Check() CPython function: >>> >>> int >>> PyCallable_Check(PyObject *x) >>> { >>> if (x == NULL) >>> return 0; >>> if (PyInstance_Check(x)) { >>> PyObject *call = PyObject_GetAttrString(x, "__call__"); >>> if (call == NULL) { >>> PyErr_Clear(); >>> return 0; >>> } >>> /* Could test recursively but don't, for fear of endless >>> recursion if some joker sets self.__call__ = self */ >>> Py_DECREF(call); >>> return 1; >>> } >>> else { >>> return x->ob_type->tp_call != NULL; >>> } >>> } >> >> That's the Py2 version. In Py3, it looks as follows, because old-style >> "instances" no longer exist: >> >> """ >> int >> PyCallable_Check(PyObject *x) >> { >> if (x == NULL) >> return 0; >> return x->ob_type->tp_call != NULL; >> } >> """ >> >> That's what I had initially based my optimisation on. > > Huh, so __call__ in a user defined new style class could never end up > in ob_type.tp_call right? Yes it does. CPython special cases these method names. Stefan From vitja.makarov at gmail.com Wed May 9 10:50:24 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Wed, 9 May 2012 12:50:24 +0400 Subject: [Cython] callable() optimization In-Reply-To: <4FAA2C6F.5040501@behnel.de> References: <4FAA1A88.8020801@behnel.de> <4FAA24AB.5010101@behnel.de> <4FAA2C6F.5040501@behnel.de> Message-ID: 2012/5/9 Stefan Behnel : > Vitja Makarov, 09.05.2012 10:21: >> 2012/5/9 Stefan Behnel : >>> Vitja Makarov, 09.05.2012 09:43: >>>> 2012/5/9 Stefan Behnel : >>>>> Vitja Makarov, 08.05.2012 13:27: >>>>>> I've noticed regression related to callable() optimization. >>>>>> >>>>>> https://github.com/cython/cython/commit/a40112b0461eae5ab22fbdd07ae798d4a72ff523 >>>>>> >>>>>> class C: >>>>>> ? ? pass >>>>>> print callable(C()) >>>>>> >>>>>> It prints True optimized version checks ((obj)->ob_type->tp_call != >>>>>> NULL) condition that is True for both class and instance. 
>>>>>> >>>>>>>>> help(callable) >>>>>> callable(...) >>>>>> ? ? callable(object) -> bool >>>>>> >>>>>> ? ? Return whether the object is callable (i.e., some kind of function). >>>>>> ? ? Note that classes are callable, as are instances with a __call__() method. >>>>> >>>>> Ah, right - old style classes are special cased in Py2. >>>>> >>>>> I'll make this a Py3-only optimisation then. >>>>> >>>> >>>> I don't see difference between py2 and py3 here: >>>> >>>> Python 3.2.3 (default, May ?3 2012, 15:51:42) >>>> [GCC 4.6.3] on linux2 >>>> Type "help", "copyright", "credits" or "license" for more information. >>>>>>> class Foo: pass >>>> ... >>>>>>> callable(Foo()) >>>> False >>>>>>> >>>> >>>> There is PyCallable_Check() CPython function: >>>> >>>> int >>>> PyCallable_Check(PyObject *x) >>>> { >>>> ? ? if (x == NULL) >>>> ? ? ? ? return 0; >>>> ? ? if (PyInstance_Check(x)) { >>>> ? ? ? ? PyObject *call = PyObject_GetAttrString(x, "__call__"); >>>> ? ? ? ? if (call == NULL) { >>>> ? ? ? ? ? ? PyErr_Clear(); >>>> ? ? ? ? ? ? return 0; >>>> ? ? ? ? } >>>> ? ? ? ? /* Could test recursively but don't, for fear of endless >>>> ? ? ? ? ? ?recursion if some joker sets self.__call__ = self */ >>>> ? ? ? ? Py_DECREF(call); >>>> ? ? ? ? return 1; >>>> ? ? } >>>> ? ? else { >>>> ? ? ? ? return x->ob_type->tp_call != NULL; >>>> ? ? } >>>> } >>> >>> That's the Py2 version. In Py3, it looks as follows, because old-style >>> "instances" no longer exist: >>> >>> """ >>> int >>> PyCallable_Check(PyObject *x) >>> { >>> ? ? ? ?if (x == NULL) >>> ? ? ? ? ? ? ? ?return 0; >>> ? ? ? ?return x->ob_type->tp_call != NULL; >>> } >>> """ >>> >>> That's what I had initially based my optimisation on. >> >> Ok, so why don't you want to use PyCallable_Check() in all cases? > > Well, maybe this isn't performance critical enough to merit inlining. Do > you think it matters? > Py3k case is quite simple expression so I think it may be inlined. On the other hand it's not often used. -- vitja. From stefan_ml at behnel.de Wed May 9 10:51:27 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 10:51:27 +0200 Subject: [Cython] 0.16.1 In-Reply-To: <4FAA117F.1080402@behnel.de> References: <4FAA117F.1080402@behnel.de> Message-ID: <4FAA300F.80507@behnel.de> Stefan Behnel, 09.05.2012 08:41: > Robert Bradshaw, 09.05.2012 00:16: >> If we're looking at doing 0.17 soon, lets just do that. > > I think it's close enough to be released. I'll try to get around to list > the changes in the release notes (and maybe even add a note about alpha > quality PyPy support to the docs), but I wouldn't mind if someone else was > quicker, at least for a start. ;) Well, here's a start: http://wiki.cython.org/ReleaseNotes-0.17 Please add to it if you see anything missing. Stefan From stefan_ml at behnel.de Wed May 9 10:57:28 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 10:57:28 +0200 Subject: [Cython] 0.16.1 In-Reply-To: <4FAA300F.80507@behnel.de> References: <4FAA117F.1080402@behnel.de> <4FAA300F.80507@behnel.de> Message-ID: <4FAA3178.70805@behnel.de> Stefan Behnel, 09.05.2012 10:51: > Stefan Behnel, 09.05.2012 08:41: >> Robert Bradshaw, 09.05.2012 00:16: >>> If we're looking at doing 0.17 soon, lets just do that. >> >> I think it's close enough to be released. I'll try to get around to list >> the changes in the release notes (and maybe even add a note about alpha >> quality PyPy support to the docs), but I wouldn't mind if someone else was >> quicker, at least for a start. 
;) > > Well, here's a start: > > http://wiki.cython.org/ReleaseNotes-0.17 Oh, and I think this makes it pretty clear that this is a 0.17 and not a 0.16.1. Stefan From stefan_ml at behnel.de Wed May 9 11:13:48 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 11:13:48 +0200 Subject: [Cython] 0.17 In-Reply-To: <4FA6C17B.1080303@behnel.de> References: <4FA6BA3C.3030001@astro.uio.no> <4FA6C17B.1080303@behnel.de> Message-ID: <4FAA354C.1040401@behnel.de> Stefan Behnel, 06.05.2012 20:22: > Dag Sverre Seljebotn, 06.05.2012 19:51: >> On 05/06/2012 04:28 PM, mark florisson wrote: >>> I think we already have quite a bit of functionality (nearly) ready, >>> after merging some pending pull requests maybe it will be a good time >>> for a 0.17 release? I think it would be good to also document to what >>> extent pypy support works, what works and what doesn't. Stefan, since >>> you added a large majority of the features, would you want to be the >>> release manager? >>> >>> In summary, the following pull requests should likely go in >>> - array.array support (unless further discussion prevents that) >>> - fused types runtime buffer dispatch >>> - newaxis >>> - more? >> >> >> Sounds more like a 0.16.1? (Did we have any rules for that -- except the >> obvious one that breaking backwards compatibility in noticeable ways has to >> increment the major?) > > Those are only the pending pull requests, the current feature set in the > master branch is way larger than that. I'll start writing up the release > notes soon. Reviving this thread because it's the proper one to discuss 0.17 (instead of the "0.16.1" thread). So, here are the release notes so far: http://wiki.cython.org/ReleaseNotes-0.17 There are a couple of bugs targeted for 0.17 that have not been closed (or worked on?) yet. Please look through them as well to see if they a) have been fixed, b) will be fixed soon or c) should be postponed. Stefan From stefan_ml at behnel.de Wed May 9 11:34:18 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 11:34:18 +0200 Subject: [Cython] CF based type inference In-Reply-To: <4FA91079.5090503@behnel.de> References: <4FA91079.5090503@behnel.de> Message-ID: <4FAA3A1A.7070008@behnel.de> Stefan Behnel, 08.05.2012 14:24: > Vitja has rebased the type inference on the control flow On a related note, is this fixable now? def test(): x = 1 # inferred as int del x # error: Deletion of non-Python, non-C++ object http://trac.cython.org/cython_trac/ticket/768 It might be enough to infer "object" for names that are being del-ed for now, and to fix "del" The Right Way when we split entries. Stefan From stefan_ml at behnel.de Wed May 9 11:43:26 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 11:43:26 +0200 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: <4FAA3C3E.6020100@behnel.de> mark florisson, 06.05.2012 16:28: > I think we already have quite a bit of functionality (nearly) ready, > after merging some pending pull requests maybe it will be a good time > for a 0.17 release? I think it would be good to also document to what > extent pypy support works, what works and what doesn't. Stefan, since > you added a large majority of the features, would you want to be the > release manager? > > In summary, the following pull requests should likely go in > - array.array support (unless further discussion prevents that) > - fused types runtime buffer dispatch > - newaxis > - more? 
Looks like it was not a good idea to disable the numpy_memoryview tests:

"""
numpy_memoryview.cpp: In function 'PyObject* __pyx_pf_16numpy_memoryview_32test_coerce_to_numpy(PyObject*)':
numpy_memoryview.cpp:15069: error: cannot convert 'td_h_short*' to 'int*' in assignment
numpy_memoryview.cpp:15118: error: cannot convert 'td_h_double*' to 'float*' in assignment
"""

https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=cpp,PYVERSION=py32-ext/374/consoleFull

Stefan

From vitja.makarov at gmail.com  Wed May  9 14:21:37 2012
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Wed, 9 May 2012 16:21:37 +0400
Subject: [Cython] CF based type inference
In-Reply-To: <4FAA3A1A.7070008@behnel.de>
References: <4FA91079.5090503@behnel.de> <4FAA3A1A.7070008@behnel.de>
Message-ID: 

2012/5/9 Stefan Behnel :
> Stefan Behnel, 08.05.2012 14:24:
>> Vitja has rebased the type inference on the control flow
>
> On a related note, is this fixable now?
>
>  def test():
>      x = 1    # inferred as int
>      del x    # error: Deletion of non-Python, non-C++ object
>
> http://trac.cython.org/cython_trac/ticket/768
>
> It might be enough to infer "object" for names that are being del-ed for
> now, and to fix "del" The Right Way when we split entries.

Do you mean that `x` should be inferred as "python object" in your example?

Yes, we may add workaround for del case.
Del is represented now by NameDeletion with the same rhs and lhs.

We can add method infer_type() to NameAssignment and use it instead of
Node.infer_type()

-- 
vitja.

From vitja.makarov at gmail.com  Wed May  9 14:39:38 2012
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Wed, 9 May 2012 16:39:38 +0400
Subject: [Cython] CF based type inference
In-Reply-To: 
References: <4FA91079.5090503@behnel.de> <4FAA3A1A.7070008@behnel.de>
Message-ID: 

2012/5/9 Vitja Makarov :
> 2012/5/9 Stefan Behnel :
>> Stefan Behnel, 08.05.2012 14:24:
>>> Vitja has rebased the type inference on the control flow
>>
>> On a related note, is this fixable now?
>>
>>  def test():
>>      x = 1    # inferred as int
>>      del x    # error: Deletion of non-Python, non-C++ object
>>
>> http://trac.cython.org/cython_trac/ticket/768
>>
>> It might be enough to infer "object" for names that are being del-ed for
>> now, and to fix "del" The Right Way when we split entries.
>
> Do you mean that `x` should be inferred as "python object" in your example?
>
> Yes, we may add workaround for del case.
> Del is represented now by NameDeletion with the same rhs and lhs.
>
> We can add method infer_type() to NameAssignment and use it instead of
> Node.infer_type()

Here I've tried to fix it, now deletion always infers as python_object:

https://github.com/vitek/cython/commit/225c9c60bed6406db46e87da31596e053056f8b7

That may break C++ object deletion

-- 
vitja.

From markflorisson88 at gmail.com  Wed May  9 14:43:17 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 9 May 2012 13:43:17 +0100
Subject: [Cython] CF based type inference
In-Reply-To: 
References: <4FA91079.5090503@behnel.de> <4FAA3A1A.7070008@behnel.de>
Message-ID: 

On 9 May 2012 13:39, Vitja Makarov wrote:
> 2012/5/9 Vitja Makarov :
>> 2012/5/9 Stefan Behnel :
>>> Stefan Behnel, 08.05.2012 14:24:
>>>> Vitja has rebased the type inference on the control flow
>>>
>>> On a related note, is this fixable now?
>>>
>>>  def test():
>>>      x = 1    # inferred as int
>>>      del x    # error: Deletion of non-Python, non-C++ object
>>>
>>> http://trac.cython.org/cython_trac/ticket/768
>>>
>>> It might be enough to infer "object" for names that are being del-ed for
>>> now, and to fix "del" The Right Way when we split entries.
>>
>> Do you mean that `x` should be inferred as "python object" in your example?
>>
>> Yes, we may add workaround for del case.
>> Del is represented now by NameDeletion with the same rhs and lhs.
>>
>> We can add method infer_type() to NameAssignment and use it instead of
>> Node.infer_type()
>
> Here I've tried to fix it, now deletion always infers as python_object
>
> https://github.com/vitek/cython/commit/225c9c60bed6406db46e87da31596e053056f8b7
>
> That may break C++ object deletion
>
> --
> vitja.
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

Memoryviews can be deleted as well.

From vitja.makarov at gmail.com  Wed May  9 14:47:59 2012
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Wed, 9 May 2012 16:47:59 +0400
Subject: [Cython] CF based type inference
In-Reply-To: 
References: <4FA91079.5090503@behnel.de> <4FAA3A1A.7070008@behnel.de>
Message-ID: 

2012/5/9 mark florisson :
> On 9 May 2012 13:39, Vitja Makarov wrote:
>> 2012/5/9 Vitja Makarov :
>>> 2012/5/9 Stefan Behnel :
>>>> Stefan Behnel, 08.05.2012 14:24:
>>>>> Vitja has rebased the type inference on the control flow
>>>>
>>>> On a related note, is this fixable now?
>>>>
>>>>  def test():
>>>>      x = 1    # inferred as int
>>>>      del x    # error: Deletion of non-Python, non-C++ object
>>>>
>>>> http://trac.cython.org/cython_trac/ticket/768
>>>>
>>>> It might be enough to infer "object" for names that are being del-ed for
>>>> now, and to fix "del" The Right Way when we split entries.
>>>
>>> Do you mean that `x` should be inferred as "python object" in your example?
>>>
>>> Yes, we may add workaround for del case.
>>> Del is represented now by NameDeletion with the same rhs and lhs.
>>>
>>> We can add method infer_type() to NameAssignment and use it instead of
>>> Node.infer_type()
>>
>> Here I've tried to fix it, now deletion always infers as python_object
>>
>> https://github.com/vitek/cython/commit/225c9c60bed6406db46e87da31596e053056f8b7
>>
>> That may break C++ object deletion
>
> Memoryviews can be deleted as well.

That code is run for entries with unspecified_type only

-- 
vitja.

From vitja.makarov at gmail.com  Wed May  9 14:58:03 2012
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Wed, 9 May 2012 16:58:03 +0400
Subject: [Cython] CF based type inference
In-Reply-To: 
References: <4FA91079.5090503@behnel.de> <4FAA3A1A.7070008@behnel.de>
Message-ID: 

2012/5/9 Vitja Makarov :
> 2012/5/9 mark florisson :
>> On 9 May 2012 13:39, Vitja Makarov wrote:
>>> 2012/5/9 Vitja Makarov :
>>>> 2012/5/9 Stefan Behnel :
>>>>> Stefan Behnel, 08.05.2012 14:24:
>>>>>> Vitja has rebased the type inference on the control flow
>>>>>
>>>>> On a related note, is this fixable now?
>>>>>
>>>>>  def test():
>>>>>      x = 1    # inferred as int
>>>>>      del x    # error: Deletion of non-Python, non-C++ object
>>>>>
>>>>> http://trac.cython.org/cython_trac/ticket/768
>>>>>
>>>>> It might be enough to infer "object" for names that are being del-ed for
>>>>> now, and to fix "del" The Right Way when we split entries.
>>>>
>>>> Do you mean that `x` should be inferred as "python object" in your example?
>>>>
>>>> Yes, we may add workaround for del case.
>>>> Del is represented now by NameDeletion with the same rhs and lhs.
>>>>
>>>> We can add method infer_type() to NameAssignment and use it instead of
>>>> Node.infer_type()
>>>
>>> Here I've tried to fix it, now deletion always infers as python_object
>>>
>>> https://github.com/vitek/cython/commit/225c9c60bed6406db46e87da31596e053056f8b7
>>>
>>> That may break C++ object deletion
>>
>> Memoryviews can be deleted as well.
>
> That code is run for entries with unspecified_type only

Yeah, this code doesn't work now:

cdef extern from "foo.h":
    cdef cppclass Foo:
        Foo()

def foo():
    foo = new Foo()
    print typeof(foo)
    del foo

And I'm not sure how to fix it.

-- 
vitja.

From vitja.makarov at gmail.com  Wed May  9 15:16:25 2012
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Wed, 9 May 2012 17:16:25 +0400
Subject: [Cython] CF based type inference
In-Reply-To: 
References: <4FA91079.5090503@behnel.de> <4FAA3A1A.7070008@behnel.de>
Message-ID: 

2012/5/9 Vitja Makarov :
> 2012/5/9 Vitja Makarov :
>> 2012/5/9 mark florisson :
>>> On 9 May 2012 13:39, Vitja Makarov wrote:
>>>> 2012/5/9 Vitja Makarov :
>>>>> 2012/5/9 Stefan Behnel :
>>>>>> Stefan Behnel, 08.05.2012 14:24:
>>>>>>> Vitja has rebased the type inference on the control flow
>>>>>>
>>>>>> On a related note, is this fixable now?
>>>>>>
>>>>>>  def test():
>>>>>>      x = 1    # inferred as int
>>>>>>      del x    # error: Deletion of non-Python, non-C++ object
>>>>>>
>>>>>> http://trac.cython.org/cython_trac/ticket/768
>>>>>>
>>>>>> It might be enough to infer "object" for names that are being del-ed for
>>>>>> now, and to fix "del" The Right Way when we split entries.
>>>>>
>>>>> Do you mean that `x` should be inferred as "python object" in your example?
>>>>>
>>>>> Yes, we may add workaround for del case.
>>>>> Del is represented now by NameDeletion with the same rhs and lhs.
>>>>>
>>>>> We can add method infer_type() to NameAssignment and use it instead of
>>>>> Node.infer_type()
>>>>
>>>> Here I've tried to fix it, now deletion always infers as python_object
>>>>
>>>> https://github.com/vitek/cython/commit/225c9c60bed6406db46e87da31596e053056f8b7
>>>>
>>>> That may break C++ object deletion
>>>
>>> Memoryviews can be deleted as well.
>>
>> That code is run for entries with unspecified_type only
>
> Yeah, this code doesn't work now:
>
> cdef extern from "foo.h":
>     cdef cppclass Foo:
>         Foo()
>
> def foo():
>     foo = new Foo()
>     print typeof(foo)
>     del foo
>
> And I'm not sure how to fix it.

I've fixed cppclasses:

https://github.com/vitek/cython/commit/f5acf44be0f647bdcbb5a23c8bfbceff48f4414e

About memoryviews:

from cython cimport typeof

def foo(float[::1] a):
    b = a
    #del b
    print typeof(b)
    print typeof(a)

In this example `b` is inferred as 'Python object' and not
`float[::1]`, is that correct?

-- 
vitja.
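Until the inference changes, a minimal workaround sketch - assuming the
0.16 memoryview syntax - is to declare the target explicitly so it keeps
the slice type:

    from cython cimport typeof

    def foo(float[::1] a):
        cdef float[::1] b = a   # explicit declaration keeps the slice type
        print typeof(b)
        print typeof(a)

With the explicit cdef, both variables report float[::1] instead of
falling back to a generic Python object.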
From markflorisson88 at gmail.com Wed May 9 15:18:44 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 9 May 2012 14:18:44 +0100 Subject: [Cython] CF based type inference In-Reply-To: References: <4FA91079.5090503@behnel.de> <4FAA3A1A.7070008@behnel.de> Message-ID: On 9 May 2012 14:16, Vitja Makarov wrote: > 2012/5/9 Vitja Makarov : >> 2012/5/9 Vitja Makarov : >>> 2012/5/9 mark florisson : >>>> On 9 May 2012 13:39, Vitja Makarov wrote: >>>>> 2012/5/9 Vitja Makarov : >>>>>> 2012/5/9 Stefan Behnel : >>>>>>> Stefan Behnel, 08.05.2012 14:24: >>>>>>>> Vitja has rebased the type inference on the control flow >>>>>>> >>>>>>> On a related note, is this fixable now? >>>>>>> >>>>>>> ?def test(): >>>>>>> ? ? ?x = 1 ? ?# inferred as int >>>>>>> ? ? ?del x ? ?# error: Deletion of non-Python, non-C++ object >>>>>>> >>>>>>> http://trac.cython.org/cython_trac/ticket/768 >>>>>>> >>>>>>> It might be enough to infer "object" for names that are being del-ed for >>>>>>> now, and to fix "del" The Right Way when we split entries. >>>>>>> >>>>>> >>>>>> Do you mean that `x` should be inferred as "python object" in your example? >>>>>> >>>>>> Yes, we may add workaround for del case. >>>>>> Del is represented now by NameDeletion with the same rhs and lhs. >>>>>> >>>>>> We can add method infer_type() to NameAssignment and use it instead of >>>>>> Node.infer_type() >>>>>> >>>>>> >>>>> >>>>> Here I've tried to fix it, now deletion always infers as python_object >>>>> >>>>> https://github.com/vitek/cython/commit/225c9c60bed6406db46e87da31596e053056f8b7 >>>>> >>>>> >>>>> That may break C++ object deletion >>>>> >>>> >>>> Memoryviews can be deleted as well. >>> >>> >>> That code is run for entries with unspecified_type only >>> >>> >> >> Yeah, this code doesn't work now: >> >> cdef extern from "foo.h": >> ? ?cdef cppclass Foo: >> ? ? ? ?Foo() >> >> def foo(): >> ? ?foo = new Foo() >> ? ?print typeof(foo) >> ? ?del foo >> >> And I'm not sure how to fix it. > > I've fixed cppclasses: > > https://github.com/vitek/cython/commit/f5acf44be0f647bdcbb5a23c8bfbceff48f4414e > > About memoryviews: > > from cython cimport typeof > > def foo(float[::1] a): > ? ?b = a > ? ?#del b > ? ?print typeof(b) > ? ?print typeof(a) > > > In this example `b` is inferred as 'Python object' and not > `float[::1]`, is that correct? > > -- > vitja. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel That's the current behaviour, but it would be better if it inferred a memoryview slice instead. From stefan_ml at behnel.de Wed May 9 15:27:25 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 15:27:25 +0200 Subject: [Cython] CF based type inference In-Reply-To: References: <4FA91079.5090503@behnel.de> <4FAA3A1A.7070008@behnel.de> Message-ID: <4FAA70BD.9010102@behnel.de> Vitja Makarov, 09.05.2012 15:16: > 2012/5/9 Vitja Makarov: >>>> On 9 May 2012 13:39, Vitja Makarov wrote: >>>>> 2012/5/9 Vitja Makarov: >>>>>> 2012/5/9 Stefan Behnel: >>>>>>> def test(): >>>>>>> x = 1 # inferred as int >>>>>>> del x # error: Deletion of non-Python, non-C++ object >>>>>>> >>>>>>> http://trac.cython.org/cython_trac/ticket/768 >>>>>>> >>>>>>> It might be enough to infer "object" for names that are being del-ed for >>>>>>> now, and to fix "del" The Right Way when we split entries. >>>>>> >>>>>> Do you mean that `x` should be inferred as "python object" in your example? >>>>>> >>>>>> Yes, we may add workaround for del case. 
>>>>>> Del is represented now by NameDeletion with the same rhs and lhs. >>>>>> >>>>>> We can add method infer_type() to NameAssignment and use it instead of >>>>>> Node.infer_type() Yes, looks ok. >>>>> Here I've tried to fix it, now deletion always infers as python_object >>>>> >>>>> https://github.com/vitek/cython/commit/225c9c60bed6406db46e87da31596e053056f8b7 >>>>> >>>>> That may break C++ object deletion >> >> Yeah, this code doesn't work now: >> >> cdef extern from "foo.h": >> cdef cppclass Foo: >> Foo() >> >> def foo(): >> foo = new Foo() >> print typeof(foo) >> del foo >> >> And I'm not sure how to fix it. > > I've fixed cppclasses: > > https://github.com/vitek/cython/commit/f5acf44be0f647bdcbb5a23c8bfbceff48f4414e Sure, that makes sense. If the type cannot be del-ed, we'll get an error elsewhere - not a concern of type inference. > About memoryviews: > > from cython cimport typeof > > def foo(float[::1] a): > b = a > #del b > print typeof(b) > print typeof(a) > > In this example `b` is inferred as 'Python object' and not > `float[::1]`, is that correct? I think it currently is, but it may no longer be in the future. See the running ML thread about the future of the buffer syntax and the memoryview syntax. If we're up to changing this, it would be good to give it a suitable behaviour right for the next release, so that users don't start relying on the above. Stefan From stefan_ml at behnel.de Wed May 9 15:33:55 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 15:33:55 +0200 Subject: [Cython] CF based type inference In-Reply-To: References: <4FA91079.5090503@behnel.de> <4FAA3A1A.7070008@behnel.de> Message-ID: <4FAA7243.9000107@behnel.de> mark florisson, 09.05.2012 15:18: > On 9 May 2012 14:16, Vitja Makarov wrote: >> from cython cimport typeof >> >> def foo(float[::1] a): >> b = a >> #del b >> print typeof(b) >> print typeof(a) >> >> >> In this example `b` is inferred as 'Python object' and not >> `float[::1]`, is that correct? >> > That's the current behaviour, but it would be better if it inferred a > memoryview slice instead. +1 Stefan From stefan_ml at behnel.de Wed May 9 17:13:10 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 17:13:10 +0200 Subject: [Cython] CF based type inference In-Reply-To: <5ff2d356-9b49-4954-82df-cd972a403f8c@email.android.com> References: <4FA91079.5090503@behnel.de> <5ff2d356-9b49-4954-82df-cd972a403f8c@email.android.com> Message-ID: <4FAA8986.70409@behnel.de> Dag Sverre Seljebotn, 08.05.2012 18:52: > Vitja Makarov wrote: >> def partial_validity(): >> """ >> >>> partial_validity() >> ('str object', 'double', 'str object') >> """ >> a_1 = 1.0 >> b = a_1 + 2 # definitely double >> a_2 = 'test' >> c = a_2 + 'toast' # definitely str >> return typeof(a_2), typeof(b), typeof(c) >> >> And this should work better because it allows to infer a_1 as a double >> and a_2 as a string. > > +1 (as also Mark has hinted several times). I also happen to like that > typeof returns str rather than object... I don't think type inferred code > has to restrict itself to what you could dousing *only* declarations. > > To go out on a hyperbole: Reinventing compiler theory to make things > fit better with our current tree and the Pyrex legacy isn't sustainable > forever, at some point we should do things the standard way and > refactor some code if necesarry. That's how these things work, though. 
It's basically register allocation and variable renaming mapped to a code translator (rather than a compiler that emits assembly or byte code). Stefan From markflorisson88 at gmail.com Wed May 9 17:20:03 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 9 May 2012 16:20:03 +0100 Subject: [Cython] CF based type inference In-Reply-To: <4FAA8986.70409@behnel.de> References: <4FA91079.5090503@behnel.de> <5ff2d356-9b49-4954-82df-cd972a403f8c@email.android.com> <4FAA8986.70409@behnel.de> Message-ID: On 9 May 2012 16:13, Stefan Behnel wrote: > Dag Sverre Seljebotn, 08.05.2012 18:52: >> Vitja Makarov wrote: >>> def partial_validity(): >>> ? """ >>> ? >>> partial_validity() >>> ? ('str object', 'double', 'str object') >>> ? """ >>> ? a_1 = 1.0 >>> ? b = a_1 + 2 ? # definitely double >>> ? a_2 = 'test' >>> ? c = a_2 + 'toast' ?# definitely str >>> ? return typeof(a_2), typeof(b), typeof(c) >>> >>> And this should work better because it allows to infer a_1 as a double >>> and a_2 as a string. >> >> +1 (as also Mark has hinted several times). I also happen to like that >> typeof returns str rather than object... I don't think type inferred code >> has to restrict itself to what you could dousing *only* declarations. >> >> To go out on a hyperbole: Reinventing compiler theory to make things >> fit better with our current tree and the Pyrex legacy isn't sustainable >> forever, at some point we should do things the standard way and >> refactor some code if necesarry. > > That's how these things work, though. It's basically register allocation > and variable renaming mapped to a code translator (rather than a compiler > that emits assembly or byte code). > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel That's not what he was hinting at though. Many of these things we're doing are standard in compiler theory, and inventing our own ad-hoc ways and sloppy algorithms for things like control flow, type inference, variable renaming, bounds check optimizations, none checking optimizations, etc, isn't going to cut it. As we have already seen, standard ways to do control flow have worked out very great due to Vitja's work. From d.s.seljebotn at astro.uio.no Wed May 9 17:58:07 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 09 May 2012 17:58:07 +0200 Subject: [Cython] CF based type inference In-Reply-To: <4FAA8986.70409@behnel.de> References: <4FA91079.5090503@behnel.de> <5ff2d356-9b49-4954-82df-cd972a403f8c@email.android.com> <4FAA8986.70409@behnel.de> Message-ID: <4FAA940F.5070607@astro.uio.no> On 05/09/2012 05:13 PM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 08.05.2012 18:52: >> Vitja Makarov wrote: >>> def partial_validity(): >>> """ >>> >>> partial_validity() >>> ('str object', 'double', 'str object') >>> """ >>> a_1 = 1.0 >>> b = a_1 + 2 # definitely double >>> a_2 = 'test' >>> c = a_2 + 'toast' # definitely str >>> return typeof(a_2), typeof(b), typeof(c) >>> >>> And this should work better because it allows to infer a_1 as a double >>> and a_2 as a string. >> >> +1 (as also Mark has hinted several times). I also happen to like that >> typeof returns str rather than object... I don't think type inferred code >> has to restrict itself to what you could dousing *only* declarations. 
>>
>> To go out on a hyperbole: Reinventing compiler theory to make things
>> fit better with our current tree and the Pyrex legacy isn't sustainable
>> forever, at some point we should do things the standard way and
>> refactor some code if necessary.
>
> That's how these things work, though. It's basically register allocation
> and variable renaming mapped to a code translator (rather than a compiler
> that emits assembly or byte code).

Yes, to be crystal clear, I was actually hinting at your original proposal
here, and applauding Vitja's counter-proposal as a more standard way of
doing things.

But I regretted posting at all afterwards, I do so little coding on Cython
these days that I shouldn't interfere at this level. I'll try to leave such
rants to Mark in the future :-)

Dag

From vitja.makarov at gmail.com  Wed May  9 18:31:51 2012
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Wed, 9 May 2012 20:31:51 +0400
Subject: [Cython] Bug in print statement
Message-ID: 

Del statement inference enabled the pyregr.test_descr testcase and it
SIGSEGVs. Here is a minimal example:

import unittest
import sys

class Foo(unittest.TestCase):
    def test_file_fault(self):
        # Testing sys.stdout is changed in getattr...
        test_stdout = sys.stdout
        class StdoutGuard:
            def __getattr__(self, attr):
                test_stdout.write('%d\n' % sys.getrefcount(self))
                sys.stdout = test_stdout #sys.__stdout__
                test_stdout.write('%d\n' % sys.getrefcount(self))
                test_stdout.write('getattr: %r\n' % attr)
                test_stdout.flush()
                raise RuntimeError("Premature access to sys.stdout.%s" % attr)
        sys.stdout = StdoutGuard()
        try:
            print "Oops!"
        except RuntimeError:
            pass
        finally:
            sys.stdout = test_stdout

    def test_getattr_hooks(self):
        pass

from test import test_support
test_support.run_unittest(Foo)

It works in Python and SIGSEGVs in Cython.
It seems to me that the problem is that StdoutGuard() is still used when
its reference counter is zero, since the Python interpreter does
Py_XINCREF() on the file object and __Pyx_Print() doesn't.

-- 
vitja.

From stefan_ml at behnel.de  Wed May  9 18:44:11 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 09 May 2012 18:44:11 +0200
Subject: [Cython] Bug in print statement
In-Reply-To: 
References: 
Message-ID: <4FAA9EDB.4040902@behnel.de>

Vitja Makarov, 09.05.2012 18:31:
> Del statement inference enabled the pyregr.test_descr testcase and it
> SIGSEGVs. Here is a minimal example:
>
> import unittest
> import sys
>
> class Foo(unittest.TestCase):
>     def test_file_fault(self):
>         # Testing sys.stdout is changed in getattr...
>         test_stdout = sys.stdout
>         class StdoutGuard:
>             def __getattr__(self, attr):
>                 test_stdout.write('%d\n' % sys.getrefcount(self))
>                 sys.stdout = test_stdout #sys.__stdout__
>                 test_stdout.write('%d\n' % sys.getrefcount(self))
>                 test_stdout.write('getattr: %r\n' % attr)
>                 test_stdout.flush()
>                 raise RuntimeError("Premature access to sys.stdout.%s" % attr)
>         sys.stdout = StdoutGuard()
>         try:
>             print "Oops!"
>         except RuntimeError:
>             pass
>         finally:
>             sys.stdout = test_stdout
>
>     def test_getattr_hooks(self):
>         pass
>
> from test import test_support
> test_support.run_unittest(Foo)
>
> It works in Python and SIGSEGVs in Cython.
> It seems to me that the problem is that StdoutGuard() is still used when
> its reference counter is zero, since the Python interpreter does
> Py_XINCREF() on the file object and __Pyx_Print() doesn't.

Makes sense to change that, IMHO. An additional INCREF during something as
involved as a print() will not hurt anyone.
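To make that concrete, a minimal sketch of the INCREF fix - an assumption
about the shape of such a helper, not the actual __Pyx_Print()
implementation:

    static int print_one(PyObject *arg) {
        PyObject *f = PySys_GetObject("stdout");  /* borrowed reference */
        int res;
        if (!f) {
            PyErr_SetString(PyExc_RuntimeError, "lost sys.stdout");
            return -1;
        }
        /* own the file while writing: a __getattr__ hook may rebind
           sys.stdout and drop the last reference to the old object */
        Py_INCREF(f);
        res = PyFile_WriteObject(arg, f, Py_PRINT_RAW);
        Py_DECREF(f);
        return res;
    }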
IIRC, I had the same problem with PyPy - guess I should have fixed it back then instead of taking the lazy escape towards using the print() function. Stefan From vitja.makarov at gmail.com Wed May 9 18:57:14 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Wed, 9 May 2012 20:57:14 +0400 Subject: [Cython] Bug in print statement In-Reply-To: <4FAA9EDB.4040902@behnel.de> References: <4FAA9EDB.4040902@behnel.de> Message-ID: 2012/5/9 Stefan Behnel : > Vitja Makarov, 09.05.2012 18:31: >> Del statement inference enabled pyregr.test_descr testcase and it SIGSEGVs. >> Here is minimal example: >> >> import unittest >> import sys >> >> class Foo(unittest.TestCase): >> ? ? def test_file_fault(self): >> ? ? ? ? # Testing sys.stdout is changed in getattr... >> ? ? ? ? test_stdout = sys.stdout >> ? ? ? ? class StdoutGuard: >> ? ? ? ? ? ? def __getattr__(self, attr): >> ? ? ? ? ? ? ? ? test_stdout.write('%d\n' % sys.getrefcount(self)) >> ? ? ? ? ? ? ? ? sys.stdout = ?test_stdout #sys.__stdout__ >> ? ? ? ? ? ? ? ? test_stdout.write('%d\n' % sys.getrefcount(self)) >> ? ? ? ? ? ? ? ? test_stdout.write('getattr: %r\n' % attr) >> ? ? ? ? ? ? ? ? test_stdout.flush() >> ? ? ? ? ? ? ? ? raise RuntimeError("Premature access to sys.stdout.%s" % attr) >> ? ? ? ? sys.stdout = StdoutGuard() >> ? ? ? ? try: >> ? ? ? ? ? ? print "Oops!" >> ? ? ? ? except RuntimeError: >> ? ? ? ? ? ? pass >> ? ? ? ? finally: >> ? ? ? ? ? ? sys.stdout = test_stdout >> >> ? ? def test_getattr_hooks(self): >> ? ? ? ? pass >> >> from test import test_support >> test_support.run_unittest(Foo) >> >> It works in python and sigsegvs in cython. >> It seems to me that the problem is StdoutGuard() is still used when >> its reference counter is zero since Python interpreter does >> Py_XINCREF() for file object and __Pyx_Print() doesn't. > > Makes sense to change that, IMHO. An additional INCREF during something as > involved as a print() will not hurt anyone. > > IIRC, I had the same problem with PyPy - guess I should have fixed it back > then instead of taking the lazy escape towards using the print() function. > I've moved printing function to Utility/ and fixed refcount bug, if jenkins is ok I'm gonna push this commit to master https://github.com/vitek/cython/commit/83eceb31b4ed9afc0fd6d24c9eda5e52d9420535 -- vitja. From robertwb at gmail.com Wed May 9 20:15:07 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 9 May 2012 11:15:07 -0700 Subject: [Cython] CF based type inference In-Reply-To: <4FAA7243.9000107@behnel.de> References: <4FA91079.5090503@behnel.de> <4FAA3A1A.7070008@behnel.de> <4FAA7243.9000107@behnel.de> Message-ID: On Wed, May 9, 2012 at 6:33 AM, Stefan Behnel wrote: > mark florisson, 09.05.2012 15:18: >> On 9 May 2012 14:16, Vitja Makarov wrote: >>> from cython cimport typeof >>> >>> def foo(float[::1] a): >>> ? ?b = a >>> ? ?#del b >>> ? ?print typeof(b) >>> ? ?print typeof(a) >>> >>> >>> In this example `b` is inferred as 'Python object' and not >>> `float[::1]`, is that correct? >>> >> That's the current behaviour, but it would be better if it inferred a >> memoryview slice instead. > > +1 +1. This looks like it would break inference of extension classes as well. https://github.com/vitek/cython/commit/f5acf44be0f647bdcbb5a23c8bfbceff48f4414e#L0R336 could be changed to check if it's already a py_object_type (or memory view) as a quick fix, but it's not as pure as adding the constraints "can be del'ed" to the type inference engine. 
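As a toy illustration (made-up code, not Cython's actual inference engine), the extra constraint could feed into type unification roughly like this:

    # Hypothetical sketch: unify the types seen at each assignment,
    # then force 'object' if the variable can be del'ed but the
    # inferred type does not support deletion.
    DELETABLE = set(['object', 'float[::1]'])   # illustrative names only

    def infer_type(assignment_types, can_be_deleted):
        inferred = assignment_types[0]
        for t in assignment_types[1:]:
            if t != inferred:
                inferred = 'object'   # no common specialised type
        if can_be_deleted and inferred not in DELETABLE:
            return 'object'
        return inferred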
- Robert From vitja.makarov at gmail.com Wed May 9 20:21:48 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Wed, 9 May 2012 22:21:48 +0400 Subject: [Cython] CF based type inference In-Reply-To: References: <4FA91079.5090503@behnel.de> <4FAA3A1A.7070008@behnel.de> <4FAA7243.9000107@behnel.de> Message-ID: 2012/5/9 Robert Bradshaw : > On Wed, May 9, 2012 at 6:33 AM, Stefan Behnel wrote: >> mark florisson, 09.05.2012 15:18: >>> On 9 May 2012 14:16, Vitja Makarov wrote: >>>> from cython cimport typeof >>>> >>>> def foo(float[::1] a): >>>> ? ?b = a >>>> ? ?#del b >>>> ? ?print typeof(b) >>>> ? ?print typeof(a) >>>> >>>> >>>> In this example `b` is inferred as 'Python object' and not >>>> `float[::1]`, is that correct? >>>> >>> That's the current behaviour, but it would be better if it inferred a >>> memoryview slice instead. >> >> +1 > > +1. This looks like it would break inference of extension classes as well. > > https://github.com/vitek/cython/commit/f5acf44be0f647bdcbb5a23c8bfbceff48f4414e#L0R336 > > could be changed to check if it's already a py_object_type (or memory > view) as a quick fix, but it's not as pure as adding the constraints > "can be del'ed" to the type inference engine. > Yeah, right. It must be something like this: if not inferred_type.is_pyobject and inferred_type.can_coerce_to_pyobject(scope): -- vitja. From robertwb at gmail.com Wed May 9 20:35:00 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 9 May 2012 11:35:00 -0700 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: <4FA8EC03.4000201@behnel.de> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no> <4FA8EC03.4000201@behnel.de> Message-ID: On Tue, May 8, 2012 at 2:48 AM, Stefan Behnel wrote: > mark florisson, 08.05.2012 11:24: >>>>> Dag Sverre Seljebotn, 08.05.2012 09:57: >>>>>> ?1) We NEVER deprecate "np.ndarray[double]", we commit to keeping that in >>>>>> the language. It means exactly what you would like double[:] to mean, >>>>>> i.e. >>>>>> a variable that is memoryview when you need to and an object otherwise. >>>>>> When you use this type, you bear the consequences of early-binding things >>>>>> that could in theory be overridden. >>>>>> >>>>>> ?2) double[:] is for when you want to access data of *any* Python object >>>>>> in a generic way. Raw PEP 3118. In those situations, access to the >>>>>> underlying object is much less useful. >>>>>> >>>>>> ? 2a) Therefore we require that you do "mview.asobject()" manually; doing >>>>>> "mview.foo()" is a compile-time error >>> [...] >>> Character pointers coerce to strings. Hell, even structs coerce to and >>> from python dicts, so disallowing the same for memoryviews would just >>> be inconsistent and inconvenient. > > Two separate things to discuss here: the original exporter and a Python > level wrapper. > > As long as wrapping the memoryview in a new object is can easily be done by > users, I don't see a reason to provide compiler support for getting at the > exporter. After all, a user may have a memory view that is backed by a > NumPy array but wants to reinterpret it as a PIL image. Just because the > underlying object has a specific object type doesn't mean that's the one to > use for a given use case. 
If a user requires a specific object *instead* of > a bare memory view, we have the object type buffer syntax for that. On the other hand, if the object type buffer syntax to be deprecated and replaced by bare memory views, then a user-specified exporter is I think quite important so that, e.g. when slicing NumPy arrays one gets NumPy arrays back. Is slicing the only way in which to get new memoryviews from old? If this is the case, perhaps we could use a Python __getitem__ call with the appropriate slice to create a new underlying object from the original underlying object (only when needed of course). This is assuming that the underlying object supports it. > It's also not necessarily more efficient to access the underlying object > than to create a new one if the underlying exporter has to learn about the > mapped layout first. > > Regarding the coercion to Python, I do not see a problem with providing a > general Python view object for memory views that arbitrary Cython memory > views can coerce to. In fact, I consider that a useful feature. The builtin > memoryview type in Python (at least the one in CPython 3.3) should be quite > capable of providing this, although I don't mind what exactly this becomes. I'd rather not make things global, but for memory views that were created without an underlying object, having a good default (I'd rather not have a global registry) makes a lot of sense. - Robert From markflorisson88 at gmail.com Wed May 9 20:45:48 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 9 May 2012 19:45:48 +0100 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no> <4FA8EC03.4000201@behnel.de> Message-ID: On 9 May 2012 19:35, Robert Bradshaw wrote: > On Tue, May 8, 2012 at 2:48 AM, Stefan Behnel wrote: >> mark florisson, 08.05.2012 11:24: >>>>>> Dag Sverre Seljebotn, 08.05.2012 09:57: >>>>>>> ?1) We NEVER deprecate "np.ndarray[double]", we commit to keeping that in >>>>>>> the language. It means exactly what you would like double[:] to mean, >>>>>>> i.e. >>>>>>> a variable that is memoryview when you need to and an object otherwise. >>>>>>> When you use this type, you bear the consequences of early-binding things >>>>>>> that could in theory be overridden. >>>>>>> >>>>>>> ?2) double[:] is for when you want to access data of *any* Python object >>>>>>> in a generic way. Raw PEP 3118. In those situations, access to the >>>>>>> underlying object is much less useful. >>>>>>> >>>>>>> ? 2a) Therefore we require that you do "mview.asobject()" manually; doing >>>>>>> "mview.foo()" is a compile-time error >>>> [...] >>>> Character pointers coerce to strings. Hell, even structs coerce to and >>>> from python dicts, so disallowing the same for memoryviews would just >>>> be inconsistent and inconvenient. >> >> Two separate things to discuss here: the original exporter and a Python >> level wrapper. >> >> As long as wrapping the memoryview in a new object is can easily be done by >> users, I don't see a reason to provide compiler support for getting at the >> exporter. 
After all, a user may have a memory view that is backed by a >> NumPy array but wants to reinterpret it as a PIL image. Just because the >> underlying object has a specific object type doesn't mean that's the one to >> use for a given use case. If a user requires a specific object *instead* of >> a bare memory view, we have the object type buffer syntax for that. > > On the other hand, if the object type buffer syntax to be deprecated > and replaced by bare memory views, then a user-specified exporter is I > think quite important so that, e.g. when slicing NumPy arrays one gets > NumPy arrays back. > > Is slicing the only way in which to get new memoryviews from old? If > this is the case, perhaps we could use a Python __getitem__ call with > the appropriate slice to create a new underlying object from the > original underlying object (only when needed of course). This is > assuming that the underlying object supports it. You can also use newaxis indexing or transpose the view, but those are the only ways to change the view I think. I like the idea quite a bit, as the callback has no sane way of getting registered. For newaxes we could pass in None in the right places to __getitem__, as for transpose, the 'T' attribute works for numpy, I don't know about other exposers. >> It's also not necessarily more efficient to access the underlying object >> than to create a new one if the underlying exporter has to learn about the >> mapped layout first. >> >> Regarding the coercion to Python, I do not see a problem with providing a >> general Python view object for memory views that arbitrary Cython memory >> views can coerce to. In fact, I consider that a useful feature. The builtin >> memoryview type in Python (at least the one in CPython 3.3) should be quite >> capable of providing this, although I don't mind what exactly this becomes. > > I'd rather not make things global, but for memory views that were > created without an underlying object, having a good default (I'd > rather not have a global registry) makes a lot of sense. > > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Wed May 9 20:55:04 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 09 May 2012 20:55:04 +0200 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no> <4FA8EC03.4000201@behnel.de> Message-ID: <4FAABD88.9030103@behnel.de> mark florisson, 09.05.2012 20:45: > You can also use newaxis indexing or transpose the view What is "newaxis indexing"? Stefan From robertwb at gmail.com Wed May 9 20:56:45 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 9 May 2012 11:56:45 -0700 Subject: [Cython] buffer syntax vs. 
memory view syntax In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no> <4FA8E79A.4040402@astro.uio.no> <4FA8EBAE.3010106@astro.uio.no> Message-ID: On Tue, May 8, 2012 at 3:35 AM, mark florisson wrote: > On 8 May 2012 10:47, Dag Sverre Seljebotn wrote: >> >> After some thinking I believe I can see more clearly where Mark is coming >> from. To sum up, it's either >> >> A) Keep both np.ndarray[double] and double[:] around, with clearly defined >> and separate roles. np.ndarray[double] implementation is revamped to allow >> fast slicing etc., based on the double[:] implementation. >> >> B) Deprecate np.ndarray[double] sooner rather than later, but make double[:] >> have functionality that is *really* close to what np.ndarray[double] >> currently does. In most cases one should be able to basically replace >> np.ndarray[double] with double[:] and the code should continue to work just >> like before; difference is that if you pass in anything else than a NumPy >> array, it will likely fail with a runtime AttributeError at some point >> rather than fail a PyType_Check. > > That's a good summary. I have a big preference for B here, but I agree > that treating a typed memoryview as both a user object (possibly > converted through callback) and a typed memoryview "subclass" is quite > magicky. With the talk of overlay modules and go-style interface, being able to specify the type of an object as well as its bufferness could become more interesting than it even is now. The notion of supporting multiple interfaces, e.g. cdef np.ndarray & double[:] my_array could obviate the need for np.ndarray[double]. Until we support something like this, or decide to reject it, I think we need to keep the old-style syntax around. (np.ndarray[double] could even become this intersection type to gain all the new features before we decide on a appropriate syntax). > I wouldn't particularly mind something concise like 'm.obj'. > The AttributeError would be the case as usual, when a python object > doesn't have the right interface. Having to insert the .obj in there does make it more painful to convert existing Python code. - Robert From markflorisson88 at gmail.com Wed May 9 21:03:56 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 9 May 2012 20:03:56 +0100 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: <4FAABD88.9030103@behnel.de> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no> <4FA8EC03.4000201@behnel.de> <4FAABD88.9030103@behnel.de> Message-ID: On 9 May 2012 19:55, Stefan Behnel wrote: > mark florisson, 09.05.2012 20:45: >> You can also use newaxis indexing or transpose the view > > What is "newaxis indexing"? > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel It's when you introduce a new one-sized dimension. 
E.g. if you have a 1D array with shape (10,), and index it like myarray[None, :], you get a 2D array with shape (1, 10). There is a pending pull request for that (which should make it into 0.17). From markflorisson88 at gmail.com Wed May 9 21:08:07 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 9 May 2012 20:08:07 +0100 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no> <4FA8E79A.4040402@astro.uio.no> <4FA8EBAE.3010106@astro.uio.no> Message-ID: On 9 May 2012 19:56, Robert Bradshaw wrote: > On Tue, May 8, 2012 at 3:35 AM, mark florisson > wrote: >> On 8 May 2012 10:47, Dag Sverre Seljebotn wrote: >>> >>> After some thinking I believe I can see more clearly where Mark is coming >>> from. To sum up, it's either >>> >>> A) Keep both np.ndarray[double] and double[:] around, with clearly defined >>> and separate roles. np.ndarray[double] implementation is revamped to allow >>> fast slicing etc., based on the double[:] implementation. >>> >>> B) Deprecate np.ndarray[double] sooner rather than later, but make double[:] >>> have functionality that is *really* close to what np.ndarray[double] >>> currently does. In most cases one should be able to basically replace >>> np.ndarray[double] with double[:] and the code should continue to work just >>> like before; difference is that if you pass in anything else than a NumPy >>> array, it will likely fail with a runtime AttributeError at some point >>> rather than fail a PyType_Check. >> >> That's a good summary. I have a big preference for B here, but I agree >> that treating a typed memoryview as both a user object (possibly >> converted through callback) and a typed memoryview "subclass" is quite >> magicky. > > With the talk of overlay modules and go-style interface, being able to > specify the type of an object as well as its bufferness could become > more interesting than it even is now. The notion of supporting > multiple interfaces, e.g. > > cdef np.ndarray & double[:] my_array > > could obviate the need for np.ndarray[double]. Until we support > something like this, or decide to reject it, I think we need to keep > the old-style syntax around. (np.ndarray[double] could even become > this intersection type to gain all the new features before we decide > on a appropriate syntax). It's kind of interesting but also kind of a pain to declare everywhere like that. Buffer syntax should by no means be deprecated in the near future, but at some point it will be better to have one way to do things, whether slightly magicky or more convoluted or not. Also, as Dag mentioned, if we want fused extension types it makes more sense to remove buffer syntax to disambiguate this and avoid context-dependent special casing (e.g. np.ndarray and array.array). >> I wouldn't particularly mind something concise like 'm.obj'. >> The AttributeError would be the case as usual, when a python object >> doesn't have the right interface. > Having to insert the .obj in there does make it more painful to > convert existing Python code. Yes, hence my slight bias towards magicky. But I do fully agree with all opposing arguments that say "too much magic".
I just prefer to be pragmatic here :) > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Wed May 9 21:09:01 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 9 May 2012 20:09:01 +0100 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no> <4FA8E79A.4040402@astro.uio.no> <4FA8EBAE.3010106@astro.uio.no> Message-ID: On 9 May 2012 20:08, mark florisson wrote: > On 9 May 2012 19:56, Robert Bradshaw wrote: >> On Tue, May 8, 2012 at 3:35 AM, mark florisson >> wrote: >>> On 8 May 2012 10:47, Dag Sverre Seljebotn wrote: >>>> >>>> After some thinking I believe I can see more clearly where Mark is coming >>>> from. To sum up, it's either >>>> >>>> A) Keep both np.ndarray[double] and double[:] around, with clearly defined >>>> and separate roles. np.ndarray[double] implementation is revamped to allow >>>> fast slicing etc., based on the double[:] implementation. >>>> >>>> B) Deprecate np.ndarray[double] sooner rather than later, but make double[:] >>>> have functionality that is *really* close to what np.ndarray[double] >>>> currently does. In most cases one should be able to basically replace >>>> np.ndarray[double] with double[:] and the code should continue to work just >>>> like before; difference is that if you pass in anything else than a NumPy >>>> array, it will likely fail with a runtime AttributeError at some point >>>> rather than fail a PyType_Check. >>> >>> That's a good summary. I have a big preference for B here, but I agree >>> that treating a typed memoryview as both a user object (possibly >>> converted through callback) and a typed memoryview "subclass" is quite >>> magicky. >> >> With the talk of overlay modules and go-style interface, being able to >> specify the type of an object as well as its bufferness could become >> more interesting than it even is now. The notion of supporting >> multiple interfaces, e.g. >> >> cdef np.ndarray & double[:] my_array >> >> could obviate the need for np.ndarray[double]. Until we support >> something like this, or decide to reject it, I think we need to keep >> the old-style syntax around. (np.ndarray[double] could even become >> this intersection type to gain all the new features before we decide >> on a appropriate syntax). > > It's kind of interesting but also kind of a pain to declare everywhere > like that. Although I suppose a typedef could help. But then it's harder to see the dtype without lookup up the typedef declaration. Oh well :) > Buffer syntax should by no means deprecated in the near > future, but at some point it will be better to have one way to do > things, whether slightly magicky or more convoluted or not. Also, as > Dag mentioned, if we want fused extension types it makes more sense to > remove buffer syntax to disambiguate this and avoid context-dependent > special casing (e.g. np.ndarray and array.array). > >>> I wouldn't particularly mind something concise like 'm.obj'. 
>>> The AttributeError would be the case as usual, when a python object >>> doesn't have the right interface. >> >> Having to insert the .obj in there does make it more painful to >> convert existing Python code. > > Yes, hence my slight bias towards magicky. But I do fully agree with > all opposing arguments that say "too much magic". I just prefer to be > pragmatic here :) > >> - Robert >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel From robertwb at gmail.com Wed May 9 21:44:44 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 9 May 2012 12:44:44 -0700 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no> <4FA8E79A.4040402@astro.uio.no> <4FA8EBAE.3010106@astro.uio.no> Message-ID: On Wed, May 9, 2012 at 12:09 PM, mark florisson wrote: > On 9 May 2012 20:08, mark florisson wrote: >> On 9 May 2012 19:56, Robert Bradshaw wrote: >>> On Tue, May 8, 2012 at 3:35 AM, mark florisson >>> wrote: >>>> On 8 May 2012 10:47, Dag Sverre Seljebotn wrote: >>>>> >>>>> After some thinking I believe I can see more clearly where Mark is coming >>>>> from. To sum up, it's either >>>>> >>>>> A) Keep both np.ndarray[double] and double[:] around, with clearly defined >>>>> and separate roles. np.ndarray[double] implementation is revamped to allow >>>>> fast slicing etc., based on the double[:] implementation. >>>>> >>>>> B) Deprecate np.ndarray[double] sooner rather than later, but make double[:] >>>>> have functionality that is *really* close to what np.ndarray[double] >>>>> currently does. In most cases one should be able to basically replace >>>>> np.ndarray[double] with double[:] and the code should continue to work just >>>>> like before; difference is that if you pass in anything else than a NumPy >>>>> array, it will likely fail with a runtime AttributeError at some point >>>>> rather than fail a PyType_Check. >>>> >>>> That's a good summary. I have a big preference for B here, but I agree >>>> that treating a typed memoryview as both a user object (possibly >>>> converted through callback) and a typed memoryview "subclass" is quite >>>> magicky. >>> >>> With the talk of overlay modules and go-style interface, being able to >>> specify the type of an object as well as its bufferness could become >>> more interesting than it even is now. The notion of supporting >>> multiple interfaces, e.g. >>> >>> cdef np.ndarray & double[:] my_array >>> >>> could obviate the need for np.ndarray[double]. Until we support >>> something like this, or decide to reject it, I think we need to keep >>> the old-style syntax around. (np.ndarray[double] could even become >>> this intersection type to gain all the new features before we decide >>> on a appropriate syntax). >> >> It's kind of interesting but also kind of a pain to declare everywhere >> like that. > > Although I suppose a typedef could help. But then it's harder to see > the dtype without lookup up the typedef declaration. Oh well :) One would only use this syntax when one wanted to use features from both. >> Yes, hence my slight bias towards magicky. 
But I do fully agree with >> all opposing arguments that say "too much magic". I just prefer to be >> pragmatic here :) Same here. I think part of the magic feel is due to the ambiguity; a concrete and simple declaration of when it acts as an object and when it doesn't could help here. Auto-coercion is well engrained into the Cython language (and one of the big selling points) so I think that's OK. - Robert From stefan_ml at behnel.de Thu May 10 08:45:45 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 10 May 2012 08:45:45 +0200 Subject: [Cython] CF based type inference In-Reply-To: References: <4FA91079.5090503@behnel.de> Message-ID: <4FAB6419.3090303@behnel.de> Vitja Makarov, 08.05.2012 15:47: > 2012/5/8 Stefan Behnel: >> Vitja has rebased the type inference on the control flow, so I wonder if >> this will enable us to properly infer this: >> >> def partial_validity(): >> """ >> >>> partial_validity() >> ('Python object', 'double', 'str object') >> """ >> a = 1.0 >> b = a + 2 # definitely double >> a = 'test' >> c = a + 'toast' # definitely str >> return typeof(a), typeof(b), typeof(c) >> >> I think, what is mainly needed for this is that a NameNode with an >> undeclared type should not report its own entry as dependency but that of >> its own cf_assignments. Would this work? >> >> (Haven't got the time to try it out right now, so I'm dumping it here.) > > Yeah, that might work. The other way to go is to split entries: > > def partial_validity(): > """ > >>> partial_validity() > ('str object', 'double', 'str object') > """ > a_1 = 1.0 > b = a_1 + 2 # definitely double > a_2 = 'test' > c = a_2 + 'toast' # definitely str > return typeof(a_2), typeof(b), typeof(c) > > And this should work better because it allows to infer a_1 as a double > and a_2 as a string. How would type checks fit into this? Stupid example: def test(x): if isinstance(x, MyExtType): x.call_c_method() # type known, no None check needed else: x.call_py_method() # type unknown, may be None Would it work to consider a type checking branch an assignment to a new (and differently typed) entry? Stefan From vitja.makarov at gmail.com Thu May 10 09:27:58 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Thu, 10 May 2012 11:27:58 +0400 Subject: [Cython] CF based type inference In-Reply-To: <4FAB6419.3090303@behnel.de> References: <4FA91079.5090503@behnel.de> <4FAB6419.3090303@behnel.de> Message-ID: 2012/5/10 Stefan Behnel : > Vitja Makarov, 08.05.2012 15:47: >> 2012/5/8 Stefan Behnel: >>> Vitja has rebased the type inference on the control flow, so I wonder if >>> this will enable us to properly infer this: >>> >>> ?def partial_validity(): >>> ? ?""" >>> ? ?>>> partial_validity() >>> ? ?('Python object', 'double', 'str object') >>> ? ?""" >>> ? ?a = 1.0 >>> ? ?b = a + 2 ? # definitely double >>> ? ?a = 'test' >>> ? ?c = a + 'toast' ?# definitely str >>> ? ?return typeof(a), typeof(b), typeof(c) >>> >>> I think, what is mainly needed for this is that a NameNode with an >>> undeclared type should not report its own entry as dependency but that of >>> its own cf_assignments. Would this work? >>> >>> (Haven't got the time to try it out right now, so I'm dumping it here.) >> >> Yeah, that might work. The other way to go is to split entries: >> >> ?def partial_validity(): >> ? ?""" >> ? ?>>> partial_validity() >> ? ?('str object', 'double', 'str object') >> ? ?""" >> ? ?a_1 = 1.0 >> ? ?b = a_1 + 2 ? # definitely double >> ? ?a_2 = 'test' >> ? ?c = a_2 + 'toast' ?# definitely str >> ? 
?return typeof(a_2), typeof(b), typeof(c) >> >> And this should work better because it allows to infer a_1 as a double >> and a_2 as a string. > > How would type checks fit into this? Stupid example: > > ? def test(x): > ? ? ? if isinstance(x, MyExtType): > ? ? ? ? ? x.call_c_method() ? ?# type known, no None check needed > ? ? ? else: > ? ? ? ? ? x.call_py_method() ? # type unknown, may be None > > Would it work to consider a type checking branch an assignment to a new > (and differently typed) entry? > No, at least without special handler for this case. Anyway that's not that hard to implement isinstance() condition may mark x as being assigned to MyExtType, e.g.: if isinstance(x, MyExtType): x = x # Fake assignment x.call_c_method() -- vitja. From d.s.seljebotn at astro.uio.no Thu May 10 09:37:51 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 10 May 2012 09:37:51 +0200 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no> <4FA8E79A.4040402@astro.uio.no> <4FA8EBAE.3010106@astro.uio.no> Message-ID: <4FAB704F.5010801@astro.uio.no> On 05/09/2012 09:08 PM, mark florisson wrote: > On 9 May 2012 19:56, Robert Bradshaw wrote: >> On Tue, May 8, 2012 at 3:35 AM, mark florisson >> wrote: >>> On 8 May 2012 10:47, Dag Sverre Seljebotn wrote: >>>> >>>> After some thinking I believe I can see more clearly where Mark is coming >>>> from. To sum up, it's either >>>> >>>> A) Keep both np.ndarray[double] and double[:] around, with clearly defined >>>> and separate roles. np.ndarray[double] implementation is revamped to allow >>>> fast slicing etc., based on the double[:] implementation. >>>> >>>> B) Deprecate np.ndarray[double] sooner rather than later, but make double[:] >>>> have functionality that is *really* close to what np.ndarray[double] >>>> currently does. In most cases one should be able to basically replace >>>> np.ndarray[double] with double[:] and the code should continue to work just >>>> like before; difference is that if you pass in anything else than a NumPy >>>> array, it will likely fail with a runtime AttributeError at some point >>>> rather than fail a PyType_Check. >>> >>> That's a good summary. I have a big preference for B here, but I agree >>> that treating a typed memoryview as both a user object (possibly >>> converted through callback) and a typed memoryview "subclass" is quite >>> magicky. >> >> With the talk of overlay modules and go-style interface, being able to >> specify the type of an object as well as its bufferness could become >> more interesting than it even is now. The notion of supporting >> multiple interfaces, e.g. >> >> cdef np.ndarray& double[:] my_array >> >> could obviate the need for np.ndarray[double]. Until we support >> something like this, or decide to reject it, I think we need to keep >> the old-style syntax around. (np.ndarray[double] could even become >> this intersection type to gain all the new features before we decide >> on a appropriate syntax). > > It's kind of interesting but also kind of a pain to declare everywhere > like that. 
Buffer syntax should by no means be deprecated in the near > future, but at some point it will be better to have one way to do > things, whether slightly magicky or more convoluted or not. Also, as > Dag mentioned, if we want fused extension types it makes more sense to > remove buffer syntax to disambiguate this and avoid context-dependent > special casing (e.g. np.ndarray and array.array). I don't think it hurts to have two ways of doing things if they are sufficiently well-motivated, sufficiently well-defined, and sufficiently different from one another. The original reason I wanted double[:] was to stop tying ourselves to NumPy and not promise to be compatible, because of the polymorphic aspect of NumPy. I think in the future, the Python behaviour of, say, +, in np.ndarray is going to be different from what we have today. You'll have the + fetching data over the network in some cases, or treating NA in special ways (I think there might be over a thousand posts about NA on the NumPy list by now?). In short, lots of stuff can be going on that we can't emulate in Cython. OTOH, perhaps that doesn't matter -- we just raise an exception for the NumPy arrays that we can't deal with, and move on... >>>> I wouldn't particularly mind something concise like 'm.obj'. >>>> The AttributeError would be the case as usual, when a python object >>>> doesn't have the right interface. >>> >>> Having to insert the .obj in there does make it more painful to >>> convert existing Python code. >> >> Yes, hence my slight bias towards magicky. But I do fully agree with >> all opposing arguments that say "too much magic". I just prefer to be >> pragmatic here :) It's a very big decision. I think two or three alternatives are starting to crystallise; but to choose between them I think it calls for a CEP with code examples, and a request for comment on both cython-users and numpy-discussion. Until that happens, avoiding any magic seems like a conservative forward-compatible default. Dag From markflorisson88 at gmail.com Thu May 10 10:34:31 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 10 May 2012 09:34:31 +0100 Subject: [Cython] CF based type inference In-Reply-To: References: <4FA91079.5090503@behnel.de> <4FAB6419.3090303@behnel.de> Message-ID: On 10 May 2012 08:27, Vitja Makarov wrote: > 2012/5/10 Stefan Behnel : >> Vitja Makarov, 08.05.2012 15:47: >>> 2012/5/8 Stefan Behnel: >>>> Vitja has rebased the type inference on the control flow, so I wonder if >>>> this will enable us to properly infer this:
>>>>
>>>> def partial_validity():
>>>>     """
>>>>     >>> partial_validity()
>>>>     ('Python object', 'double', 'str object')
>>>>     """
>>>>     a = 1.0
>>>>     b = a + 2   # definitely double
>>>>     a = 'test'
>>>>     c = a + 'toast'  # definitely str
>>>>     return typeof(a), typeof(b), typeof(c)
>>>>
>>>> I think, what is mainly needed for this is that a NameNode with an >>>> undeclared type should not report its own entry as dependency but that of >>>> its own cf_assignments. Would this work? >>>> >>>> (Haven't got the time to try it out right now, so I'm dumping it here.) >>> >>> Yeah, that might work. The other way to go is to split entries:
>>>
>>> def partial_validity():
>>>     """
>>>     >>> partial_validity()
>>>     ('str object', 'double', 'str object')
>>>     """
>>>     a_1 = 1.0
>>>     b = a_1 + 2   # definitely double
>>>     a_2 = 'test'
>>>     c = a_2 + 'toast'  # definitely str
>>>     return typeof(a_2), typeof(b), typeof(c)
>>>
>>> And this should work better because it allows to infer a_1 as a double >>> and a_2 as a string. >> >> How would type checks fit into this? Stupid example:
>>
>> def test(x):
>>     if isinstance(x, MyExtType):
>>         x.call_c_method()    # type known, no None check needed
>>     else:
>>         x.call_py_method()   # type unknown, may be None
>>
>> Would it work to consider a type checking branch an assignment to a new >> (and differently typed) entry? >> > > No, at least not without a special handler for this case. > Anyway, that's not that hard to implement: an isinstance() condition may > mark x as being assigned to MyExtType, e.g.:
>
> if isinstance(x, MyExtType):
>     x = x  # Fake assignment
>     x.call_c_method()
>
That would be nice. It might also be useful to do branch pruning before that stage, which may avoid a merge after the branch leading to a different (unknown, i.e. object) type. That could be useful in the face of fused types, where people write generic code triggering only a certain branch depending on the specialization. Bit of a special case maybe :) > > -- > vitja. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Thu May 10 10:44:29 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 10 May 2012 09:44:29 +0100 Subject: [Cython] buffer syntax vs. memory view syntax In-Reply-To: <4FAB704F.5010801@astro.uio.no> References: <4FA7A618.4000503@astro.uio.no> <4FA7A6B2.5000801@astro.uio.no> <4FA7ADC0.40501@behnel.de> <4FA7B682.5050300@astro.uio.no> <4FA7C852.9020004@behnel.de> <4FA7D940.5030607@behnel.de> <4FA7F194.5080008@astro.uio.no> <95c0afc3-08f4-47d1-8649-7b80f931be54@email.android.com> <4FA8D1F8.5020109@astro.uio.no> <4FA8D6E9.9090004@behnel.de> <4FA8DB02.2020902@astro.uio.no> <4FA8E79A.4040402@astro.uio.no> <4FA8EBAE.3010106@astro.uio.no> <4FAB704F.5010801@astro.uio.no> Message-ID: On 10 May 2012 08:37, Dag Sverre Seljebotn wrote: > On 05/09/2012 09:08 PM, mark florisson wrote: >> On 9 May 2012 19:56, Robert Bradshaw wrote: >>> On Tue, May 8, 2012 at 3:35 AM, mark florisson >>> wrote: >>>> On 8 May 2012 10:47, Dag Sverre Seljebotn >>>> wrote: >>>>> >>>>> After some thinking I believe I can see more clearly where Mark is >>>>> coming >>>>> from. To sum up, it's either >>>>> >>>>> A) Keep both np.ndarray[double] and double[:] around, with clearly >>>>> defined >>>>> and separate roles. np.ndarray[double] implementation is revamped to >>>>> allow >>>>> fast slicing etc., based on the double[:] implementation. >>>>> >>>>> B) Deprecate np.ndarray[double] sooner rather than later, but make >>>>> double[:] >>>>> have functionality that is *really* close to what np.ndarray[double] >>>>> currently does. In most cases one should be able to basically replace >>>>> np.ndarray[double] with double[:] and the code should continue to work >>>>> just >>>>> like before; difference is that if you pass in anything else than a >>>>> NumPy >>>>> array, it will likely fail with a runtime AttributeError at some point >>>>> rather than fail a PyType_Check. >>>> >>>> >>>> That's a good summary. I have a big preference for B here, but I agree >>>> that treating a typed memoryview as both a user object (possibly >>>> converted through callback) and a typed memoryview "subclass" is quite >>>> magicky.
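To make the A/B trade-off concrete, here is a minimal sketch of how the two declaration styles differ in practice (illustrative Cython, assuming the usual NumPy cimport; not meant as a definitive design):

    import numpy as np
    cimport numpy as np

    def scale_buffer(np.ndarray[double] a, double f):
        # buffer syntax: 'a' is a NumPy object *and* a fast buffer,
        # so ndarray methods and operators still work on it
        return a * f

    def scale_memview(double[:] a, double f):
        # memoryview syntax: raw PEP 3118 access to *any* exporter;
        # only element and slice access, no ndarray API on 'a'
        cdef Py_ssize_t i
        for i in range(a.shape[0]):
            a[i] = a[i] * f
        return a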
>>> >>> With the talk of overlay modules and go-style interface, being able to >>> specify the type of an object as well as its bufferness could become >>> more interesting than it even is now. The notion of supporting >>> multiple interfaces, e.g. >>> >>> cdef np.ndarray & double[:] my_array >>> >>> could obviate the need for np.ndarray[double]. Until we support >>> something like this, or decide to reject it, I think we need to keep >>> the old-style syntax around. (np.ndarray[double] could even become >>> this intersection type to gain all the new features before we decide >>> on a appropriate syntax). >> >> It's kind of interesting but also kind of a pain to declare everywhere >> like that. Buffer syntax should by no means be deprecated in the near >> future, but at some point it will be better to have one way to do >> things, whether slightly magicky or more convoluted or not. Also, as >> Dag mentioned, if we want fused extension types it makes more sense to >> remove buffer syntax to disambiguate this and avoid context-dependent >> special casing (e.g. np.ndarray and array.array). > > I don't think it hurts to have two ways of doing things if they are > sufficiently well-motivated, sufficiently well-defined, and sufficiently > different from one another. > > The original reason I wanted double[:] was to stop tying ourselves to NumPy > and not promise to be compatible, because of the polymorphic aspect of > NumPy. I think in the future, the Python behaviour of, say, +, in np.ndarray > is going to be different from what we have today. You'll have the + fetching > data over the network in some cases, or treating NA in special ways (I think > there might be over a thousand posts about NA on the NumPy list by now?). In short, lots > of stuff can be going on that we can't emulate in Cython. > > OTOH, perhaps that doesn't matter -- we just raise an exception for the > NumPy arrays that we can't deal with, and move on... Basically, the only thing that both np.ndarray and memoryviews guarantee is that they operate through the buffer interface, and that they obtain this view at certain points (assignment). Hence, if you decide to resize your array, or swap your axes or whatever, then your object view may no longer be consistent with your buffer. When or if your buffer view changes isn't even defined, but is kind of dictated by the implementation. Hence, if memoryviews overload +, then that + will always be triggered on a typed view. I do suppose that if people rely on type inference getting the type right, things start to get messy. As for NA, maybe they will extend the buffer interface at some point, but on the other hand Python people may feel that it will be too specific of a use case (wild guess). Until then, keep your separate masks around :) Anyway, a valid point. It's hard to see where this is going and how future-proof it is. >>>> I wouldn't particularly mind something concise like 'm.obj'. >>>> The AttributeError would be the case as usual, when a python object >>>> doesn't have the right interface. >>> >>> Having to insert the .obj in there does make it more painful to >>> convert existing Python code. >> >> Yes, hence my slight bias towards magicky. But I do fully agree with >> all opposing arguments that say "too much magic". I just prefer to be >> pragmatic here :) > > > It's a very big decision.
I think two or three alternatives are starting to > crystallise; but to choose between them I think it calls for a CEP with code > examples, and a request for comment on both cython-users and > numpy-discussion. > > Until that happens, avoiding any magic seems like a conservative > forward-compatible default. > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From vitja.makarov at gmail.com Thu May 10 11:15:12 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Thu, 10 May 2012 13:15:12 +0400 Subject: [Cython] Bug in print statement In-Reply-To: References: <4FAA9EDB.4040902@behnel.de> Message-ID: 2012/5/9 Vitja Makarov : > 2012/5/9 Stefan Behnel : >> Vitja Makarov, 09.05.2012 18:31: >>> Del statement inference enabled pyregr.test_descr testcase and it SIGSEGVs. >>> Here is minimal example: >>> >>> import unittest >>> import sys >>> >>> class Foo(unittest.TestCase): >>> ? ? def test_file_fault(self): >>> ? ? ? ? # Testing sys.stdout is changed in getattr... >>> ? ? ? ? test_stdout = sys.stdout >>> ? ? ? ? class StdoutGuard: >>> ? ? ? ? ? ? def __getattr__(self, attr): >>> ? ? ? ? ? ? ? ? test_stdout.write('%d\n' % sys.getrefcount(self)) >>> ? ? ? ? ? ? ? ? sys.stdout = ?test_stdout #sys.__stdout__ >>> ? ? ? ? ? ? ? ? test_stdout.write('%d\n' % sys.getrefcount(self)) >>> ? ? ? ? ? ? ? ? test_stdout.write('getattr: %r\n' % attr) >>> ? ? ? ? ? ? ? ? test_stdout.flush() >>> ? ? ? ? ? ? ? ? raise RuntimeError("Premature access to sys.stdout.%s" % attr) >>> ? ? ? ? sys.stdout = StdoutGuard() >>> ? ? ? ? try: >>> ? ? ? ? ? ? print "Oops!" >>> ? ? ? ? except RuntimeError: >>> ? ? ? ? ? ? pass >>> ? ? ? ? finally: >>> ? ? ? ? ? ? sys.stdout = test_stdout >>> >>> ? ? def test_getattr_hooks(self): >>> ? ? ? ? pass >>> >>> from test import test_support >>> test_support.run_unittest(Foo) >>> >>> It works in python and sigsegvs in cython. >>> It seems to me that the problem is StdoutGuard() is still used when >>> its reference counter is zero since Python interpreter does >>> Py_XINCREF() for file object and __Pyx_Print() doesn't. >> >> Makes sense to change that, IMHO. An additional INCREF during something as >> involved as a print() will not hurt anyone. >> >> IIRC, I had the same problem with PyPy - guess I should have fixed it back >> then instead of taking the lazy escape towards using the print() function. >> > > I've moved printing function to Utility/ and fixed refcount bug, if > jenkins is ok I'm gonna push this commit to master > > https://github.com/vitek/cython/commit/83eceb31b4ed9afc0fd6d24c9eda5e52d9420535 > I've pushed fixes to master. -- vitja. From stefan_ml at behnel.de Thu May 10 11:25:14 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 10 May 2012 11:25:14 +0200 Subject: [Cython] pure mode quirk Message-ID: <4FAB897A.4020606@behnel.de> Hi, when declaring a C function in pure mode, you eventually end up with this: @cython.cfunc @cython.returns(cython.bint) @cython.locals(a=cython.int, b=cython.int, c=cython.int) def c_compare(a,b): c = 5 return a == b + c That is very verbose, making it hard to find the name of the actual function. It's also not very intuitive that @cython.locals() is the way to declare arguments. 
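For comparison, the same function in a normal .pyx file would presumably just be:

    cdef bint c_compare(int a, int b):
        cdef int c = 5
        return a == b + c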
I would find it more readable to support this: @cython.cfunc(cython.bint, a=cython.int, b=cython.int) @cython.locals(c=cython.int) def c_compare(a,b): c = 5 return a == b But the problem here is that it conflicts with @cython.cfunc def c_compare(a,b): c = 5 return a == b when executed from Shadow.py. How should the fake decorator know that it is being called with a type as first argument and not with the function it decorates? Legacy, legacy ... An alternative would be this: @cython.cfunc(a=cython.int, b=cython.int, _returns=cython.bint) @cython.locals(c=cython.int) def c_compare(a,b): c = 5 return a == b But that's not clearer than an explicit decorator for the return value. I'm somewhat concerned about the redundancy this introduces with @locals(), which could still be used to declare argument types (even conflicting ones). However, getting rid of the need for a separate @returns() seems worthwhile by itself, so this might provide a compromise: @cython.cfunc(returns=cython.bint) @cython.locals(a=cython.int, b=cython.int, c=cython.int) def c_compare(a,b): c = 5 return a == b + c This would work in Shadow.py because it's easy to distinguish between a positional argument (the decorated function) and a keyword argument ("returns"). It might lead to bugs in user code, though, if they forget to pass the return type as a keyword argument. Maybe just a minor concern, because the decorator doesn't read well without the keyword. What do you think? Is this worth doing something about at all? Stefan From vitja.makarov at gmail.com Thu May 10 11:34:08 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Thu, 10 May 2012 13:34:08 +0400 Subject: [Cython] CF based type inference In-Reply-To: References: <4FA91079.5090503@behnel.de> <4FAA3A1A.7070008@behnel.de> <4FAA7243.9000107@behnel.de> Message-ID: 2012/5/9 Robert Bradshaw : > On Wed, May 9, 2012 at 6:33 AM, Stefan Behnel wrote: >> mark florisson, 09.05.2012 15:18: >>> On 9 May 2012 14:16, Vitja Makarov wrote: >>>> from cython cimport typeof >>>> >>>> def foo(float[::1] a): >>>> ? ?b = a >>>> ? ?#del b >>>> ? ?print typeof(b) >>>> ? ?print typeof(a) >>>> >>>> >>>> In this example `b` is inferred as 'Python object' and not >>>> `float[::1]`, is that correct? >>>> >>> That's the current behaviour, but it would be better if it inferred a >>> memoryview slice instead. >> >> +1 > > +1. This looks like it would break inference of extension classes as well. > > https://github.com/vitek/cython/commit/f5acf44be0f647bdcbb5a23c8bfbceff48f4414e#L0R336 > > could be changed to check if it's already a py_object_type (or memory > view) as a quick fix, but it's not as pure as adding the constraints > "can be del'ed" to the type inference engine. > Here is fixed version: https://github.com/vitek/cython/commit/0f122b6dfb6d0c7932b08cc35cdcc90c3c30257b#L0R334 -- vitja. From markflorisson88 at gmail.com Thu May 10 11:41:16 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 10 May 2012 10:41:16 +0100 Subject: [Cython] pure mode quirk In-Reply-To: <4FAB897A.4020606@behnel.de> References: <4FAB897A.4020606@behnel.de> Message-ID: On 10 May 2012 10:25, Stefan Behnel wrote: > Hi, > > when declaring a C function in pure mode, you eventually end up with this: > > ? ?@cython.cfunc > ? ?@cython.returns(cython.bint) > ? ?@cython.locals(a=cython.int, b=cython.int, c=cython.int) > ? ?def c_compare(a,b): > ? ? ? ?c = 5 > ? ? ? ?return a == b + c > > That is very verbose, making it hard to find the name of the actual > function. 
It's also not very intuitive that @cython.locals() is the way to > declare arguments. > > I would find it more readable to support this: > > ? ?@cython.cfunc(cython.bint, a=cython.int, b=cython.int) > ? ?@cython.locals(c=cython.int) > ? ?def c_compare(a,b): > ? ? ? ?c = 5 > ? ? ? ?return a == b > > But the problem here is that it conflicts with > > ? ?@cython.cfunc > ? ?def c_compare(a,b): > ? ? ? ?c = 5 > ? ? ? ?return a == b > > when executed from Shadow.py. How should the fake decorator know that it is > being called with a type as first argument and not with the function it > decorates? Legacy, legacy ... I personally don't care much for pure mode, but it could just do an instance check for a function. You only accept real def functions anyway. > An alternative would be this: > > ? ?@cython.cfunc(a=cython.int, b=cython.int, _returns=cython.bint) > ? ?@cython.locals(c=cython.int) > ? ?def c_compare(a,b): > ? ? ? ?c = 5 > ? ? ? ?return a == b > > But that's not clearer than an explicit decorator for the return value. > > I'm somewhat concerned about the redundancy this introduces with @locals(), > which could still be used to declare argument types (even conflicting > ones). However, getting rid of the need for a separate @returns() seems > worthwhile by itself, so this might provide a compromise: > > ? ?@cython.cfunc(returns=cython.bint) > ? ?@cython.locals(a=cython.int, b=cython.int, c=cython.int) > ? ?def c_compare(a,b): > ? ? ? ?c = 5 > ? ? ? ?return a == b + c > > This would work in Shadow.py because it's easy to distinguish between a > positional argument (the decorated function) and a keyword argument > ("returns"). It might lead to bugs in user code, though, if they forget to > pass the return type as a keyword argument. Maybe just a minor concern, > because the decorator doesn't read well without the keyword. > > What do you think? Is this worth doing something about at all? > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Thu May 10 13:39:26 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 10 May 2012 13:39:26 +0200 Subject: [Cython] pure mode quirk In-Reply-To: References: <4FAB897A.4020606@behnel.de> Message-ID: <4FABA8EE.7090008@behnel.de> mark florisson, 10.05.2012 11:41: > On 10 May 2012 10:25, Stefan Behnel wrote: >> when declaring a C function in pure mode, you eventually end up with this: >> >> @cython.cfunc >> @cython.returns(cython.bint) >> @cython.locals(a=cython.int, b=cython.int, c=cython.int) >> def c_compare(a,b): >> c = 5 >> return a == b + c >> >> That is very verbose, making it hard to find the name of the actual >> function. It's also not very intuitive that @cython.locals() is the way to >> declare arguments. >> >> I would find it more readable to support this: >> >> @cython.cfunc(cython.bint, a=cython.int, b=cython.int) >> @cython.locals(c=cython.int) >> def c_compare(a,b): >> c = 5 >> return a == b >> >> But the problem here is that it conflicts with >> >> @cython.cfunc >> def c_compare(a,b): >> c = 5 >> return a == b >> >> when executed from Shadow.py. How should the fake decorator know that it is >> being called with a type as first argument and not with the function it >> decorates? Legacy, legacy ... > > I personally don't care much for pure mode, but it could just do an > instance check for a function. You only accept real def functions > anyway. Hmm, maybe, yes. 
IIRC, non-Cython decorators are otherwise forbidden on cdef functions (but also on cpdef functions?), so the case that another decorator replaces the function with something else in between isn't very likely to occur. In any case, the fix would be to change the decorator order to move the Cython decorators right at the function declaration. Not sure if that'd be completely obvious to everyone, but, as I said, not very likely to be a problem ... Stefan From robertwb at gmail.com Thu May 10 19:34:06 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 10 May 2012 10:34:06 -0700 Subject: [Cython] pure mode quirk In-Reply-To: <4FAB897A.4020606@behnel.de> References: <4FAB897A.4020606@behnel.de> Message-ID: On Thu, May 10, 2012 at 2:25 AM, Stefan Behnel wrote: > Hi, > > when declaring a C function in pure mode, you eventually end up with this: > > ? ?@cython.cfunc > ? ?@cython.returns(cython.bint) > ? ?@cython.locals(a=cython.int, b=cython.int, c=cython.int) > ? ?def c_compare(a,b): > ? ? ? ?c = 5 > ? ? ? ?return a == b + c > > That is very verbose, making it hard to find the name of the actual > function. It's also not very intuitive that @cython.locals() is the way to > declare arguments. > > I would find it more readable to support this: > > ? ?@cython.cfunc(cython.bint, a=cython.int, b=cython.int) > ? ?@cython.locals(c=cython.int) > ? ?def c_compare(a,b): > ? ? ? ?c = 5 > ? ? ? ?return a == b > > But the problem here is that it conflicts with > > ? ?@cython.cfunc > ? ?def c_compare(a,b): > ? ? ? ?c = 5 > ? ? ? ?return a == b > > when executed from Shadow.py. How should the fake decorator know that it is > being called with a type as first argument and not with the function it > decorates? Legacy, legacy ... > > An alternative would be this: > > ? ?@cython.cfunc(a=cython.int, b=cython.int, _returns=cython.bint) > ? ?@cython.locals(c=cython.int) > ? ?def c_compare(a,b): > ? ? ? ?c = 5 > ? ? ? ?return a == b > > But that's not clearer than an explicit decorator for the return value. > > I'm somewhat concerned about the redundancy this introduces with @locals(), > which could still be used to declare argument types (even conflicting > ones). However, getting rid of the need for a separate @returns() seems > worthwhile by itself, so this might provide a compromise: > > ? ?@cython.cfunc(returns=cython.bint) > ? ?@cython.locals(a=cython.int, b=cython.int, c=cython.int) > ? ?def c_compare(a,b): > ? ? ? ?c = 5 > ? ? ? ?return a == b + c > > This would work in Shadow.py because it's easy to distinguish between a > positional argument (the decorated function) and a keyword argument > ("returns"). It might lead to bugs in user code, though, if they forget to > pass the return type as a keyword argument. Maybe just a minor concern, > because the decorator doesn't read well without the keyword. > > What do you think? Is this worth doing something about at all? I didn't implement it this way originally because of the whole called/not ambiguity, but I didn't think about taking a keyword and using that to distinguish. (Testing the type of the input seemed to hackish...) I'm +1 for this, as well as accepting argument types in the cfunc decorator as well. There is a bit of overlap as "returns" has a special meaning (vs. an argument named returns) but I think that's OK, and cython.locals should still work. 
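A sketch of how the no-op decorator in Shadow.py might tell the two call forms apart (hypothetical code, not the current implementation):

    def cfunc(*args, **kwargs):
        if len(args) == 1 and not kwargs and callable(args[0]):
            # bare usage: @cython.cfunc - we received the function itself
            return args[0]
        # parametrised usage: @cython.cfunc(returns=..., a=..., b=...)
        # - ignore the annotations and hand back a pass-through decorator
        def decorator(func):
            return func
        return decorator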
- Robert From markflorisson88 at gmail.com Thu May 10 21:13:33 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 10 May 2012 20:13:33 +0100 Subject: [Cython] 0.16.1 In-Reply-To: <4FAA2170.4050808@behnel.de> References: <4FAA117F.1080402@behnel.de> <4FAA2170.4050808@behnel.de> Message-ID: On 9 May 2012 08:49, Stefan Behnel wrote: > Stefan Behnel, 09.05.2012 08:41: >> Robert Bradshaw, 09.05.2012 00:16: >>> If we're looking at doing 0.17 soon, let's just do that. >> I think it's close enough to be released. > > ... although one thing I just noticed is that the "numpy_memoryview" test > is still disabled because it led to crashes in recent Py3.2 releases (and > thus most likely also in the latest Py3k). Not sure if it still crashes, > but it should be checked before going for a release. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Hurgh. Disabling tests in bugs.txt is terrible; there should have been a comment in numpy_memoryview saying DISABLED, and the testcase function should have been a noop. Your commit e3838e42c4b6f67f180d06b8cd75566f3380ab95 broke how typedef types are compared, which makes the test get a temporary of the wrong type. Let me try reverting that commit -- what was it needed for? From stefan_ml at behnel.de Thu May 10 22:21:06 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 10 May 2012 22:21:06 +0200 Subject: [Cython] 0.16.1 In-Reply-To: References: <4FAA117F.1080402@behnel.de> <4FAA2170.4050808@behnel.de> Message-ID: <4FAC2332.2040001@behnel.de> mark florisson, 10.05.2012 21:13: > On 9 May 2012 08:49, Stefan Behnel wrote: >> Stefan Behnel, 09.05.2012 08:41: >>> Robert Bradshaw, 09.05.2012 00:16: >>>> If we're looking at doing 0.17 soon, let's just do that. >>> I think it's close enough to be released. >> >> ... although one thing I just noticed is that the "numpy_memoryview" test >> is still disabled because it led to crashes in recent Py3.2 releases (and >> thus most likely also in the latest Py3k). Not sure if it still crashes, >> but it should be checked before going for a release. > > Hurgh. Disabling tests in bugs.txt is terrible; there should have been > a comment in numpy_memoryview saying DISABLED, and the testcase > function should have been a noop. ... or have a release mode in the test runner that barks at disabled tests. > Your commit > e3838e42c4b6f67f180d06b8cd75566f3380ab95 broke how typedef types are > compared, which makes the test get a temporary of the wrong type. Let > me try reverting that commit -- what was it needed for? It was meant to fix the comparison of different char* ctypedefs. However, seeing it in retrospect now, it would definitely break user code to compare ctypedefs by their underlying base type, because it's common for users to be lax about ctypedefs, e.g. for integer types. I think a better (and substantially safer) way to do it would be to use the hash value of the underlying declared type, but to make the equals comparison based on the typedef-ed name. That, plus a special-case treatment for char* compatible types. Thanks for figuring out the problem.
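To make the idea concrete, a rough standalone sketch of that comparison scheme (the class below is a simplified stand-in for Cython's actual CTypedefType, and the special-casing for char* compatible types is left out):

    class CTypedefType(object):
        # simplified stand-in for illustration only
        def __init__(self, typedef_name, typedef_base_type):
            self.typedef_name = typedef_name
            self.typedef_base_type = typedef_base_type

        def __hash__(self):
            # hash via the underlying declared type, so a ctypedef and
            # its base type land in the same hash bucket
            return hash(self.typedef_base_type)

        def __eq__(self, other):
            # ...but compare by the typedef-ed name, so two distinct
            # ctypedefs of the same base type stay distinct
            if isinstance(other, CTypedefType):
                return self.typedef_name == other.typedef_name
            return NotImplemented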
Stefan From markflorisson88 at gmail.com Thu May 10 22:24:54 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 10 May 2012 21:24:54 +0100 Subject: [Cython] 0.16.1 In-Reply-To: <4FAC2332.2040001@behnel.de> References: <4FAA117F.1080402@behnel.de> <4FAA2170.4050808@behnel.de> <4FAC2332.2040001@behnel.de> Message-ID: On 10 May 2012 21:21, Stefan Behnel wrote: > mark florisson, 10.05.2012 21:13: >> On 9 May 2012 08:49, Stefan Behnel wrote: >>> Stefan Behnel, 09.05.2012 08:41: >>>> Robert Bradshaw, 09.05.2012 00:16: >>>>> If we're looking at doing 0.17 soon, let's just do that. >>>> I think it's close enough to be released. >>> >>> ... although one thing I just noticed is that the "numpy_memoryview" test >>> is still disabled because it led to crashes in recent Py3.2 releases (and >>> thus most likely also in the latest Py3k). Not sure if it still crashes, >>> but it should be checked before going for a release. >> >> Hurgh. Disabling tests in bugs.txt is terrible; there should have been >> a comment in numpy_memoryview saying DISABLED, and the testcase >> function should have been a noop. > > ... or have a release mode in the test runner that barks at disabled tests. > > >> Your commit >> e3838e42c4b6f67f180d06b8cd75566f3380ab95 broke how typedef types are >> compared, which makes the test get a temporary of the wrong type. Let >> me try reverting that commit -- what was it needed for? > > It was meant to fix the comparison of different char* ctypedefs. However, > seeing it in retrospect now, it would definitely break user code to compare > ctypedefs by their underlying base type, because it's common for users to be > lax about ctypedefs, e.g. for integer types. > > I think a better (and substantially safer) way to do it would be to use the > hash value of the underlying declared type, but to make the equals > comparison based on the typedef-ed name. That, plus a special-case > treatment for char* compatible types. Yeah, I was thinking the same thing. I pushed a reverted commit; if you want, you can try out that scheme and see if it works. > Thanks for figuring out the problem. No problem. We learned that disabled tests aren't very good for continuous integration now :) > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Fri May 11 08:38:10 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 May 2012 08:38:10 +0200 Subject: [Cython] weird declarations in fused types C code Message-ID: <4FACB3D2.3090606@behnel.de> Hi, while trying to replace the "import sys; if sys.version_info >= (3,0)" in the fused types dispatch code by the more straightforward "if PY_MAJOR_VERSION >= 3" (before I came to think that this particular case only guards useless code that does the wrong thing), I noticed that the code generates a declaration of PyErr_Clear() into the outside environment. When used in cdef classes, this leads to an external method being declared in the class, essentially like this: cdef class MyClass: cdef extern from *: void PyErr_Clear() Surprisingly enough, this actually works. Cython assigns the real C-API function pointer to it during type initialisation and even calls the function directly (instead of going through the vtab) when used. A rather curious feature that I would never have thought of. Anyway, this side effect is obviously a bug in the fused types dispatch, but I don't have a good idea on how to fix it.
I'm sure Mark put some thought into this while trying hard to make it work and just didn't notice the impact on type namespaces. I've put up a pull request to remove the Py3 specialisation code, but this is worth some more consideration. Stefan From markflorisson88 at gmail.com Fri May 11 12:44:55 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 11 May 2012 11:44:55 +0100 Subject: [Cython] weird declarations in fused types C code In-Reply-To: <4FACB3D2.3090606@behnel.de> References: <4FACB3D2.3090606@behnel.de> Message-ID: On 11 May 2012 07:38, Stefan Behnel wrote: > Hi, > > while trying to replace the "import sys; if sys.version_info >= (3,0)" in > the fused types dispatch code by the more straightforward "if > PY_MAJOR_VERSION >= 3" (before I came to think that this particular case > only guards useless code that does the wrong thing), Yes, you made that plenty clear, sorry for thinking in terms of Python code. For the record, it does do the right thing.
> >> I noticed that the >> code generates a declaration of PyErr_Clear() into the outside environment. >> When used in cdef classes, this leads to an external method being declared >> in the class, essentially like this: >> >> cdef class MyClass: >> cdef extern from *: >> void PyErr_Clear() >> >> Surprisingly enough, this actually works. Cython assigns the real C-API >> function pointer to it during type initialisation and even calls the >> function directly (instead of going through the vtab) when used. A rather >> curious feature that I would never have thought of. > > Yes, normally the parser catches that. > >> Anyway, this side effect is obviously a bug in the fused types dispatch, >> but I don't have a good idea on how to fix it. I'm sure Mark put some >> thought into this while trying hard to make it work and just didn't notice >> the impact on type namespaces. > > I am aware of this behaviour; the thing is that the dispatcher > function needs to be analyzed in the right context in order to > generate an appropriate function or method in case of a cdef class > (which are different from methods in normal classes even when > synthesized). I thought about splitting the declarations from the > actual function, and analyzing that in the module scope. Perhaps with > some name mangling this can avoid names being accidentally available > in user code. I don't recall if I have tried that already, but I'll > give it another try. > >> I've put up a pull request to remove the Py3 specialisation code, but this >> is worth some more consideration. > > Looks good to me, I'll merge it. > >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Fri May 11 12:54:50 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 11 May 2012 11:54:50 +0100 Subject: [Cython] weird declarations in fused types C code In-Reply-To: References: <4FACB3D2.3090606@behnel.de> Message-ID: On 11 May 2012 11:44, mark florisson wrote: > On 11 May 2012 07:38, Stefan Behnel wrote: >> Hi, >> >> while trying to replace the "import sys; if sys.version_info >= (3,0)" in >> the fused types dispatch code by the more straightforward "if >> PY_MAJOR_VERSION >= 3" (before I came to think that this particular case >> only guards useless code that does the wrong thing), >> >> Yes, you made that plenty clear, sorry for thinking in terms of Python >> code. For the record, it does do the right thing.
I'm sure Mark put some >>> thought into this while trying hard to make it work and just didn't notice >>> the impact on type namespaces. >> >> I am aware of this behaviour; the thing is that the dispatcher >> function needs to be analyzed in the right context in order to >> generate an appropriate function or method in case of a cdef class >> (which are different from methods in normal classes even when >> synthesized). I thought about splitting the declarations from the >> actual function, and analyzing that in the module scope. Perhaps with >> some name mangling this can avoid names being accidentally available >> in user code. I don't recall if I have tried that already, but I'll >> give it another try. > > Ah, I see I already split them; all that is needed is to put it in the > global scope now :) https://github.com/markflorisson88/cython/commit/3500fcd01ce6e68e76fcbabfe009eb273d7972fb >>> I've put up a pull request to remove the Py3 specialisation code, but this >>> is worth some more consideration. >> >> Looks good to me, I'll merge it. >> >>> Stefan >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Fri May 11 13:21:28 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 May 2012 13:21:28 +0200 Subject: [Cython] weird declarations in fused types C code In-Reply-To: References: <4FACB3D2.3090606@behnel.de> Message-ID: <4FACF638.6050207@behnel.de> mark florisson, 11.05.2012 13:00: > On 11 May 2012 11:54, mark florisson wrote: >> On 11 May 2012 11:44, mark florisson wrote: >>> On 11 May 2012 07:38, Stefan Behnel wrote: >>>> while trying to replace the "import sys; if sys.version_info >= (3,0)" in >>>> the fused types dispatch code by the more straightforward "if >>>> PY_MAJOR_VERSION >= 3" (before I came to think that this particular case >>>> only guards useless code that does the wrong thing), >>> >>> Yes, you made that plenty clear, sorry for thinking in terms of Python >>> code. For the record, it does do the right thing. >>> >>>> I noticed that the >>>> code generates a declaration of PyErr_Clear() into the outside environment. >>>> When used in cdef classes, this leads to an external method being declared >>>> in the class, essentially like this: >>>> >>>> cdef class MyClass: >>>> cdef extern from *: >>>> void PyErr_Clear() >>>> >>>> Surprisingly enough, this actually works. Cython assigns the real C-API >>>> function pointer to it during type initialisation and even calls the >>>> function directly (instead of going through the vtab) when used. A rather >>>> curious feature that I would never have thought of. >>> >>> Yes, normally the parser catches that. >>> >>>> Anyway, this side effect is obviously a bug in the fused types dispatch, >>>> but I don't have a good idea on how to fix it. I'm sure Mark put some >>>> thought into this while trying hard to make it work and just didn't notice >>>> the impact on type namespaces. >>> >>> I am aware of this behaviour; the thing is that the dispatcher >>> function needs to be analyzed in the right context in order to >>> generate an appropriate function or method in case of a cdef class >>> (which are different from methods in normal classes even when >>> synthesized). I thought about splitting the declarations from the >>> actual function, and analyzing that in the module scope. Perhaps with >>> some name mangling this can avoid names being accidentally available >>> in user code.
I don't recall if I have tried that already, but I'll >>> give it another try. >> >> Ah, I see I already split them; all that is needed is to put it in the >> global scope now :) > > https://github.com/markflorisson88/cython/commit/3500fcd01ce6e68e76fcbabfe009eb273d7972fb Ok, sure, works for me. Stefan From d.s.seljebotn at astro.uio.no Fri May 11 15:25:28 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 May 2012 15:25:28 +0200 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions Message-ID: <4FAD1348.5010608@astro.uio.no> This comes from a refactor of the work on CEP 1000: A PEP proposal, with a hack for use in current Python versions and in the case of PEP rejection, that allows 3rd party libraries to agree on extensions to PyTypeObject. http://wiki.cython.org/enhancements/cep1001 If this makes it as a PEP, I don't think we need to think about having CEP 1000 accepted as a PEP. Comments? Dag From robertwb at gmail.com Fri May 11 17:48:52 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 11 May 2012 08:48:52 -0700 Subject: [Cython] CF based type inference In-Reply-To: <4FAA0D0F.9090508@behnel.de> References: <4FA91079.5090503@behnel.de> <4FAA0D0F.9090508@behnel.de> Message-ID: On Tue, May 8, 2012 at 11:22 PM, Stefan Behnel wrote: > Robert Bradshaw, 09.05.2012 00:12: >> On Tue, May 8, 2012 at 6:47 AM, Vitja Makarov wrote: >>> 2012/5/8 Stefan Behnel: >>>> Vitja has rebased the type inference on the control flow, so I wonder if >>>> this will enable us to properly infer this: >>>> >>>> def partial_validity(): >>>> """ >>>> >>> partial_validity() >>>> ('Python object', 'double', 'str object') >>>> """ >>>> a = 1.0 >>>> b = a + 2 # definitely double >>>> a = 'test' >>>> c = a + 'toast' # definitely str >>>> return typeof(a), typeof(b), typeof(c) >>>> >>>> I think what is mainly needed for this is that a NameNode with an >>>> undeclared type should not report its own entry as dependency but that of >>>> its own cf_assignments. Would this work? >>>> >>>> (Haven't got the time to try it out right now, so I'm dumping it here.) >>>> >>> >>> Yeah, that might work. The other way to go is to split entries: >>> >>> def partial_validity(): >>> """ >>> >>> partial_validity() >>> ('str object', 'double', 'str object') >>> """ >>> a_1 = 1.0 >>> b = a_1 + 2 # definitely double >>> a_2 = 'test' >>> c = a_2 + 'toast' # definitely str >>> return typeof(a_2), typeof(b), typeof(c) >>> >>> And this should work better because it allows us to infer a_1 as a double >>> and a_2 as a string. >> >> This already works, right? > > It would work if it was implemented. *wink* Well, we don't infer str, but that's a separate issue from control flow.
- Robert From stefan_ml at behnel.de Fri May 11 17:56:04 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 May 2012 17:56:04 +0200 Subject: [Cython] CF based type inference In-Reply-To: References: <4FA91079.5090503@behnel.de> <4FAA0D0F.9090508@behnel.de> Message-ID: <4FAD3694.1020105@behnel.de> Robert Bradshaw, 11.05.2012 17:48: > On Tue, May 8, 2012 at 11:22 PM, Stefan Behnel wrote: >> Robert Bradshaw, 09.05.2012 00:12: >>> On Tue, May 8, 2012 at 6:47 AM, Vitja Makarov wrote: >>>> 2012/5/8 Stefan Behnel: >>>>> Vitja has rebased the type inference on the control flow, so I wonder if >>>>> this will enable us to properly infer this: >>>>> >>>>> def partial_validity(): >>>>> """ >>>>> >>> partial_validity() >>>>> ('Python object', 'double', 'str object') >>>>> """ >>>>> a = 1.0 >>>>> b = a + 2 # definitely double >>>>> a = 'test' >>>>> c = a + 'toast' # definitely str >>>>> return typeof(a), typeof(b), typeof(c) >>>>> >>>>> I think what is mainly needed for this is that a NameNode with an >>>>> undeclared type should not report its own entry as dependency but that of >>>>> its own cf_assignments. Would this work? >>>>> >>>>> (Haven't got the time to try it out right now, so I'm dumping it here.) >>>>> >>>> >>>> Yeah, that might work. The other way to go is to split entries: >>>> >>>> def partial_validity(): >>>> """ >>>> >>> partial_validity() >>>> ('str object', 'double', 'str object') >>>> """ >>>> a_1 = 1.0 >>>> b = a_1 + 2 # definitely double >>>> a_2 = 'test' >>>> c = a_2 + 'toast' # definitely str >>>> return typeof(a_2), typeof(b), typeof(c) >>>> >>>> And this should work better because it allows us to infer a_1 as a double >>>> and a_2 as a string. >>> >>> This already works, right? >> >> It would work if it was implemented. *wink* > > Well, we don't infer str Yes we do, there are even some optimisations for str. It's well defined for both Py2 and Py3, just not the same on both, so the final code to use for them is C compile-time dependent. I meant to say that entry splitting isn't implemented. Stefan From robertwb at gmail.com Fri May 11 17:58:34 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 11 May 2012 08:58:34 -0700 Subject: [Cython] CF based type inference In-Reply-To: <4FAD3694.1020105@behnel.de> References: <4FA91079.5090503@behnel.de> <4FAA0D0F.9090508@behnel.de> <4FAD3694.1020105@behnel.de> Message-ID: On Fri, May 11, 2012 at 8:56 AM, Stefan Behnel wrote: > Robert Bradshaw, 11.05.2012 17:48: >> On Tue, May 8, 2012 at 11:22 PM, Stefan Behnel wrote: >>> Robert Bradshaw, 09.05.2012 00:12: >>>> On Tue, May 8, 2012 at 6:47 AM, Vitja Makarov wrote: >>>>> 2012/5/8 Stefan Behnel: >>>>>> Vitja has rebased the type inference on the control flow, so I wonder if >>>>>> this will enable us to properly infer this: >>>>>> >>>>>> def partial_validity(): >>>>>> """ >>>>>> >>> partial_validity() >>>>>> ('Python object', 'double', 'str object') >>>>>> """ >>>>>> a = 1.0 >>>>>> b = a + 2 # definitely double >>>>>> a = 'test' >>>>>> c = a + 'toast' # definitely str >>>>>> return typeof(a), typeof(b), typeof(c) >>>>>> >>>>>> I think what is mainly needed for this is that a NameNode with an >>>>>> undeclared type should not report its own entry as dependency but that of >>>>>> its own cf_assignments. Would this work? >>>>>> >>>>>> (Haven't got the time to try it out right now, so I'm dumping it here.) >>>>>> >>>>> >>>>> Yeah, that might work. The other way to go is to split entries: >>>>> >>>>> def partial_validity(): >>>>> """ >>>>>
>>> partial_validity() >>>>> ('str object', 'double', 'str object') >>>>> """ >>>>> a_1 = 1.0 >>>>> b = a_1 + 2 # definitely double >>>>> a_2 = 'test' >>>>> c = a_2 + 'toast' # definitely str >>>>> return typeof(a_2), typeof(b), typeof(c) >>>>> >>>>> And this should work better because it allows us to infer a_1 as a double >>>>> and a_2 as a string. >>>> >>>> This already works, right? >>> >>> It would work if it was implemented. *wink* >> >> Well, we don't infer str > > Yes we do, there are even some optimisations for str. It's well defined for > both Py2 and Py3, just not the same on both, so the final code to use for > them is C compile-time dependent. > > I meant to say that entry splitting isn't implemented. Yeah, that isn't implemented yet. - Robert From stefan_ml at behnel.de Sat May 12 09:44:41 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 12 May 2012 09:44:41 +0200 Subject: [Cython] Cython project homepage update Message-ID: <4FAE14E9.3080902@behnel.de> Hi, I added a "power point" list section to the top of the Cython homepage that aims to push readers into the main selling points of Cython. Please take a look and tell me if I missed anything. http://cython.org/ While checking the links, I just noticed that the NumPy wiki section on extending and integrating NumPy points to Cython like this: """ Pyrex: Pyrex lets you write code that mixes Python and C data types any way you want, and compiles it into a C extension for Python. See also Cython. """ http://www.scipy.org/Topical_Software#head-7153b42ac4ea517c7d99ec4f4453555b2302a1f8 I don't have an account there, but it would be worth changing the order of the names. I don't think there are many people who use NumPy together with Pyrex these days. Stefan From njs at pobox.com Sat May 12 20:44:16 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 12 May 2012 19:44:16 +0100 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: <4FAD1348.5010608@astro.uio.no> References: <4FAD1348.5010608@astro.uio.no> Message-ID: On Fri, May 11, 2012 at 2:25 PM, Dag Sverre Seljebotn wrote: > This comes from a refactor of the work on CEP 1000: A PEP proposal, with a > hack for use in current Python versions and in the case of PEP rejection, > that allows 3rd party libraries to agree on extensions to PyTypeObject. > > http://wiki.cython.org/enhancements/cep1001 > > If this makes it as a PEP, I don't think we need to think about having CEP > 1000 accepted as a PEP. > > Comments? There should probably be some discussion of memory management for the tpe_data pointers. (I assume it's "guaranteed to be valid for as long as the associated PyTypeObject, and the PyTypeObject is responsible for making sure any necessary cleanup happens if it gets deallocated", but a note to this effect would be good.) What happens if I want to inherit from PyTypeObject (a "metaclass") and also implement this interface? Is it possible? What if I want to inherit from an existing subclass of PyTypeObject and add on this interface? I don't know enough gnarly details about how new-style classes are implemented to tell. Would it make sense to make this memory-layout-equivalent to a PyTypeObject subclass with extra fields?
-- Nathaniel From d.s.seljebotn at astro.uio.no Sun May 13 21:35:35 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 13 May 2012 21:35:35 +0200 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: References: <4FAD1348.5010608@astro.uio.no> Message-ID: <4FB00D07.7080105@astro.uio.no> On 05/12/2012 08:44 PM, Nathaniel Smith wrote: > On Fri, May 11, 2012 at 2:25 PM, Dag Sverre Seljebotn > wrote: >> This comes from a refactor of the work on CEP 1000: A PEP proposal, with a >> hack for use in current Python versions and in the case of PEP rejection, >> that allows 3rd party libraries to agree on extensions to PyTypeObject. >> >> http://wiki.cython.org/enhancements/cep1001 >> >> If this makes it as a PEP, I don't think we need to think about having CEP >> 1000 accepted as a PEP. >> >> Comments? > > There should probably be some discussion of memory management for the > tpe_data pointers. (I assume it's "guaranteed to be valid for as long > as the associated PyTypeObject, and the PyTypeObject is responsible > for making sure any necessary cleanup happens if it gets deallocated", > but a note to this effect would be good.) > > What happens if I want to inherit from PyTypeObject (a "metaclass") > and also implement this interface? Is it possible? What if I want to > inherit from an existing subclass of PyTypeObject and add on this > interface? I don't know enough gnarly details about how new-style > classes are implemented to tell. Would it make sense to make this > memory-layout-equivalent to a PyTypeObject subclass with extra fields? Hmm. You know what -- this whole thing could probably be a metaclass. Except I think a PyObject_TypeCheck on the type would be a bit more expensive than just checking a flag. I think I like having a flag better... The point of supporting objects with a metaclass is a good one. I don't know enough details either. I wonder if the ob_size field could save us; basically, access extra information at offset (ob_size != 0) ? ob_size : sizeof(PyTypeObject); I think that also flags that the type object is allocated on the heap? But at least it allows a way out if you want to use a metaclass (allocate it on the heap; or perhaps give it a very high refcount). But I didn't check this in detail yet. Dag From d.s.seljebotn at astro.uio.no Sun May 13 21:37:22 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 13 May 2012 21:37:22 +0200 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: <4FB00D07.7080105@astro.uio.no> References: <4FAD1348.5010608@astro.uio.no> <4FB00D07.7080105@astro.uio.no> Message-ID: <4FB00D72.7040509@astro.uio.no> On 05/13/2012 09:35 PM, Dag Sverre Seljebotn wrote: > On 05/12/2012 08:44 PM, Nathaniel Smith wrote: >> On Fri, May 11, 2012 at 2:25 PM, Dag Sverre Seljebotn >> wrote: >>> This comes from a refactor of the work on CEP 1000: A PEP proposal, >>> with a >>> hack for use in current Python versions and in the case of PEP >>> rejection, >>> that allows 3rd party libraries to agree on extensions to PyTypeObject. >>> >>> http://wiki.cython.org/enhancements/cep1001 >>> >>> If this makes it as a PEP, I don't think we need to think about >>> having CEP >>> 1000 accepted as a PEP. >>> >>> Comments? >> >> There should probably be some discussion of memory management for the >> tpe_data pointers.
(I assume it's "guaranteed to be valid for as long >> as the associated PyTypeObject, and the PyTypeObject is responsible >> for making sure any necessary cleanup happens if it gets deallocated", >> but a note to this effect would be good.) >> >> What happens if I want to inherit from PyTypeObject (a "metaclass") >> and also implement this interface? Is it possible? What if I want to >> inherit from an existing subclass of PyTypeObject and add on this >> interface? I don't know enough gnarly details about how new-style >> classes are implemented to tell. Would it make sense to make this >> memory-layout-equivalent to a PyTypeObject subclass with extra fields? > > Hmm. You know what -- this whole thing could probably be a metaclass. > Except I think a PyObject_TypeCheck on the type would be a bit more > expensive than just checking a flag. I think I like having a flag better... > > The point of supporting objects with a metaclass is a good one. I don't > know enough details either. I wonder if the ob_size field could save us; > basically, access extra information at offset > > (ob_size != 0) ? ob_size : sizeof(PyTypeObject); Ehrm, presumably the size information must include the size of the extra payload, so this computation must be a bit different... Anyway, thanks for the heads up, this seems to need a bit more work. Input from somebody more familiar with this corner of the CPython API very welcome. Dag > > I think that also flags that the type object is allocated on the heap? But > at least it allows a way out if you want to use a metaclass (allocate it > on the heap; or perhaps give it a very high refcount). > > But I didn't check this in detail yet. > > Dag From stefan_ml at behnel.de Mon May 14 13:34:41 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 14 May 2012 13:34:41 +0200 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: <4FB00D72.7040509@astro.uio.no> References: <4FAD1348.5010608@astro.uio.no> <4FB00D07.7080105@astro.uio.no> <4FB00D72.7040509@astro.uio.no> Message-ID: <4FB0EDD1.2010206@behnel.de> Dag Sverre Seljebotn, 13.05.2012 21:37: > Anyway, thanks for the heads up, this seems to need a bit more work. Input > from somebody more familiar with this corner of the CPython API very welcome. Wouldn't you consider python-dev an appropriate place to discuss this? Stefan From njs at pobox.com Mon May 14 14:29:06 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 14 May 2012 13:29:06 +0100 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: <4FB00D07.7080105@astro.uio.no> References: <4FAD1348.5010608@astro.uio.no> <4FB00D07.7080105@astro.uio.no> Message-ID: On Sun, May 13, 2012 at 8:35 PM, Dag Sverre Seljebotn wrote: > On 05/12/2012 08:44 PM, Nathaniel Smith wrote: >> On Fri, May 11, 2012 at 2:25 PM, Dag Sverre Seljebotn >> wrote: >>> This comes from a refactor of the work on CEP 1000: A PEP proposal, with >>> a >>> hack for use in current Python versions and in the case of PEP rejection, >>> that allows 3rd party libraries to agree on extensions to PyTypeObject. >>> >>> http://wiki.cython.org/enhancements/cep1001 >>> >>> If this makes it as a PEP, I don't think we need to think about having >>> CEP >>> 1000 accepted as a PEP. >>> >>> Comments? >> >> There should probably be some discussion of memory management for the >> tpe_data pointers.
(I assume it's "guaranteed to be valid for as long >> as the associated PyTypeObject, and the PyTypeObject is responsible >> for making sure any necessary cleanup happens if it gets deallocated", >> but a note to this effect would be good.) >> >> What happens if I want to inherit from PyTypeObject (a "metaclass") >> and also implement this interface? Is it possible? What if I want to >> inherit from an existing subclass of PyTypeObject and add on this >> interface? I don't know enough gnarly details about how new-style >> classes are implemented to tell. Would it make sense to make this >> memory-layout-equivalent to a PyTypeObject subclass with extra fields? > > Hmm. You know what -- this whole thing could probably be a metaclass. Well, yes, conceptually, that's exactly what it is -- the question is how and whether it relates to the Python metaclass machinery, since you are speed freaks :-). > Except > I think a PyObject_TypeCheck on the type would be a bit more expensive than > just checking a flag. I think I like having a flag better... A number of existing flags are actually used exactly to make type-checking faster for some key types (Py_TPFLAGS_INT_SUBCLASS, etc.). I guess doing it the same way would put the flag in obj->ob_type->ob_type->tp_flags, though, instead of obj->ob_type->tp_flags. - N From d.s.seljebotn at astro.uio.no Mon May 14 16:23:01 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 14 May 2012 16:23:01 +0200 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: <4FB0EDD1.2010206@behnel.de> References: <4FAD1348.5010608@astro.uio.no> <4FB00D07.7080105@astro.uio.no> <4FB00D72.7040509@astro.uio.no> <4FB0EDD1.2010206@behnel.de> Message-ID: <4FB11545.8070803@astro.uio.no> On 05/14/2012 01:34 PM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 13.05.2012 21:37: >> Anyway, thanks for the heads up, this seems to need a bit more work. Input >> from somebody more familiar with this corner of the CPython API very welcome. > > Wouldn't you consider python-dev an appropriate place to discuss this? Propose something for a PEP that's primarily useful to Cython without even understanding the full implications myself first? I'd rather try to not annoy people; I figured the time I have the CPython patches ready and tested is the time I ping python-dev... Dag From njs at pobox.com Mon May 14 19:05:55 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 14 May 2012 18:05:55 +0100 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: <4FB11545.8070803@astro.uio.no> References: <4FAD1348.5010608@astro.uio.no> <4FB00D07.7080105@astro.uio.no> <4FB00D72.7040509@astro.uio.no> <4FB0EDD1.2010206@behnel.de> <4FB11545.8070803@astro.uio.no> Message-ID: On Mon, May 14, 2012 at 3:23 PM, Dag Sverre Seljebotn wrote: > On 05/14/2012 01:34 PM, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 13.05.2012 21:37: >>> Anyway, thanks for the heads up, this seems to need a bit more work. >>> Input >>> from somebody more familiar with this corner of the CPython API very >>> welcome. >> >> Wouldn't you consider python-dev an appropriate place to discuss this? > > Propose something for a PEP that's primarily useful to Cython without even > understanding the full implications myself first? > > I'd rather try to not annoy people; I figured the time I have the CPython > patches ready and tested is the time I ping python-dev... If you want to eventually propose a PEP, you really really really should be talking to them before.
Otherwise you'll get everything worked out just the way you want and they'll be like "what is this? re-do it all totally differently". And they might be wrong, but then you have to reconstruct for them the whole debate and reasoning process and implicit assumptions that you're making and not realizing you need to articulate, so easier to just get all the interested people at the table to begin with. And they might be right, in which case you just wasted however much time digging yourself into a hole and reverse-engineering bits of CPython. Don't propose it as a PEP, just say "hey, we have this problem and these constraints, and we're thinking we could solve them by something like this; but of course that has these limitations, so I dunno. What do you think?" And expect to spend some time figuring out what your requirements actually are (even if you think you know already, see above about implicit assumptions). -- Nathaniel From stefan_ml at behnel.de Mon May 14 19:21:39 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 14 May 2012 19:21:39 +0200 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: References: <4FAD1348.5010608@astro.uio.no> <4FB00D07.7080105@astro.uio.no> <4FB00D72.7040509@astro.uio.no> <4FB0EDD1.2010206@behnel.de> <4FB11545.8070803@astro.uio.no> Message-ID: <4FB13F23.30007@behnel.de> Nathaniel Smith, 14.05.2012 19:05: > On Mon, May 14, 2012 at 3:23 PM, Dag Sverre Seljebotn wrote: >> On 05/14/2012 01:34 PM, Stefan Behnel wrote: >>> Dag Sverre Seljebotn, 13.05.2012 21:37: >>>> Anyway, thanks for the heads up, this seems to need a bit more work. >>>> Input >>>> from somebody more familiar with this corner of the CPython API very >>>> welcome. >>> >>> Wouldn't you consider python-dev an appropriate place to discuss this? >> >> Propose something for a PEP that's primarily useful to Cython without even >> understanding the full implications myself first? >> >> I'd rather try to not annoy people; I figured the time I have the CPython >> patches ready and tested is the time I ping python-dev... > > If you want to eventually propose a PEP, you really really really > should be talking to them before. Otherwise you'll get everything > worked out just the way you want and they'll be like "what is this? > re-do it all totally differently". And they might be wrong, but then > you have to reconstruct for them the whole debate and reasoning > process and implicit assumptions that you're making and not realizing > you need to articulate, so easier to just get all the interested > people at the table to begin with. And they might be right, in which > case you just wasted however much time digging yourself into a hole > and reverse-engineering bits of CPython. > > Don't propose it as a PEP, just say "hey, we have this problem and > these constraints, and we're thinking we could solve them by something > like this; but of course that has these limitations, so I dunno. What > do you think?" And expect to spend some time figuring out what your > requirements actually are (even if you think you know already, see > above about implicit assumptions). 
+1 Stefan From robertwb at gmail.com Mon May 14 20:01:36 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Mon, 14 May 2012 11:01:36 -0700 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: References: <4FAD1348.5010608@astro.uio.no> <4FB00D07.7080105@astro.uio.no> <4FB00D72.7040509@astro.uio.no> <4FB0EDD1.2010206@behnel.de> <4FB11545.8070803@astro.uio.no> Message-ID: On Mon, May 14, 2012 at 10:05 AM, Nathaniel Smith wrote: > On Mon, May 14, 2012 at 3:23 PM, Dag Sverre Seljebotn > wrote: >> On 05/14/2012 01:34 PM, Stefan Behnel wrote: >>> >>> Dag Sverre Seljebotn, 13.05.2012 21:37: >>>> >>>> Anyway, thanks for the heads up, this seems to need a bit more work. >>>> Input >>>> from somebody more familiar with this corner of the CPython API very >>>> welcome. >>> >>> >>> Wouldn't you consider python-dev an appropriate place to discuss this? >> >> >> Propose something for a PEP that's primarily useful to Cython without even >> understanding the full implications myself first? >> >> I'd rather try to not annoy people; I figured the time I have the CPython >> patches ready and tested is the time I ping python-dev... > > If you want to eventually propose a PEP, you really really really > should be talking to them before. Otherwise you'll get everything > worked out just the way you want and they'll be like "what is this? > re-do it all totally differently". And they might be wrong, but then > you have to reconstruct for them the whole debate and reasoning > process and implicit assumptions that you're making and not realizing > you need to articulate, so easier to just get all the interested > people at the table to begin with. And they might be right, in which > case you just wasted however much time digging yourself into a hole > and reverse-engineering bits of CPython. > > Don't propose it as a PEP, just say "hey, we have this problem and > these constraints, and we're thinking we could solve them by something > like this; but of course that has these limitations, so I dunno. What > do you think?" And expect to spend some time figuring out what your > requirements actually are (even if you think you know already, see > above about implicit assumptions). I personally think it's a great idea to bounce ideas around here first before going to python-dev, especially as a PEP wouldn't get in until 3.3 or 3.4 at best, and we want to do something with 2.4+ in the near term. That doesn't preclude presenting the problem and proposed solution on python-dev as well, but the purpose of this thread seems to be to think about it some, including how we're going to support things in the short term, not nail down an exact PEP for Python to accept at face value. I think we're at a point we can ping python-dev now though. To be more futureproof, we'd want an offset to PyExtendedTypeObject rather than assuming it exists at the end of PyTypeObject, but I don't see a good place to store this information, so assuming it's right there based on a bit in the flag seems a reasonable way forward. 
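To sketch what I mean on the consumer side (purely hypothetical code -- the flag bit, the PyExtendedTypeObject layout and the tpe_* names are placeholders for whatever the CEP/PEP would define, not an existing API):

    # speculative Cython sketch of checking the flag and scanning the
    # extension entries; every name below is hypothetical
    cdef extern from *:
        ctypedef struct PyTypeObject:
            unsigned long tp_flags

        ctypedef struct PyTypeObjectExtensionEntry:
            unsigned long tpe_id   # identifies one agreed-upon interface
            void *tpe_data         # interface data, owned by the type object

        ctypedef struct PyExtendedTypeObject:
            PyTypeObject tp_base          # the ordinary type object...
            size_t tpe_count              # ...with the entries right after it
            PyTypeObjectExtensionEntry *tpe_entries

    DEF PY_TPFLAGS_EXTENDED = 1 << 22    # the bit we hope to get reserved

    cdef void *lookup_interface(type cls, unsigned long interface_id):
        # cheap flag test first, then a linear scan of the entries
        cdef PyExtendedTypeObject *ext
        cdef size_t i
        if (<PyTypeObject*><void*>cls).tp_flags & PY_TPFLAGS_EXTENDED:
            ext = <PyExtendedTypeObject*><void*>cls
            for i in range(ext.tpe_count):
                if ext.tpe_entries[i].tpe_id == interface_id:
                    return ext.tpe_entries[i].tpe_data
        return NULL

The offset variant would replace the fixed tp_base prefix with a stored offset from the start of the type object, at the cost of finding a place to keep that offset.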
- Robert From d.s.seljebotn at astro.uio.no Wed May 16 13:20:52 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 16 May 2012 13:20:52 +0200 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: References: <4FAD1348.5010608@astro.uio.no> <4FB00D07.7080105@astro.uio.no> <4FB00D72.7040509@astro.uio.no> <4FB0EDD1.2010206@behnel.de> <4FB11545.8070803@astro.uio.no> Message-ID: <4FB38D94.9000200@astro.uio.no> On 05/14/2012 08:01 PM, Robert Bradshaw wrote: > On Mon, May 14, 2012 at 10:05 AM, Nathaniel Smith wrote: >> On Mon, May 14, 2012 at 3:23 PM, Dag Sverre Seljebotn >> wrote: >>> On 05/14/2012 01:34 PM, Stefan Behnel wrote: >>>> Dag Sverre Seljebotn, 13.05.2012 21:37: >>>>> Anyway, thanks for the heads up, this seems to need a bit more work. >>>>> Input >>>>> from somebody more familiar with this corner of the CPython API very >>>>> welcome. >>>> >>>> Wouldn't you consider python-dev an appropriate place to discuss this? >>> >>> Propose something for a PEP that's primarily useful to Cython without even >>> understanding the full implications myself first? >>> >>> I'd rather try to not annoy people; I figured the time I have the CPython >>> patches ready and tested is the time I ping python-dev... >> >> If you want to eventually propose a PEP, you really really really >> should be talking to them before. Otherwise you'll get everything >> worked out just the way you want and they'll be like "what is this? >> re-do it all totally differently". And they might be wrong, but then >> you have to reconstruct for them the whole debate and reasoning >> process and implicit assumptions that you're making and not realizing >> you need to articulate, so easier to just get all the interested >> people at the table to begin with. And they might be right, in which >> case you just wasted however much time digging yourself into a hole >> and reverse-engineering bits of CPython. >> >> Don't propose it as a PEP, just say "hey, we have this problem and >> these constraints, and we're thinking we could solve them by something >> like this; but of course that has these limitations, so I dunno. What >> do you think?" And expect to spend some time figuring out what your >> requirements actually are (even if you think you know already, see >> above about implicit assumptions). > > I personally think it's a great idea to bounce ideas around here first > before going to python-dev, especially as a PEP wouldn't get in until > 3.3 or 3.4 at best, and we want to do something with 2.4+ in the near > term. That doesn't preclude presenting the problem and proposed > solution on python-dev as well, but the purpose of this thread seems > to be to think about it some, including how we're going to support > things in the short term, not nail down an exact PEP for Python to > accept at face value. I think we're at a point we can ping python-dev > now though. > > To be more futureproof, we'd want an offset to PyExtendedTypeObject > rather than assuming it exists at the end of PyTypeObject, but I don't > see a good place to store this information, so assuming it's right > there based on a bit in the flag seems a reasonable way forward. So I posted on python-dev.
There's a lot of "You don't need to do this"; but here's an idea I got that's inspired by that discussion: We could use tp_getattr (and call it directly), but pass in an interned char* which Python code can never get hold of, and then that could return a void* (cast through a PyObject*, but it would not be one). Another alternative is to somehow handshake on a metaclass implementation; and different Cython modules/NumPy/SciPy etc. would inherit from it. But apart from that handshaking, and having to use a metaclass and make everything more complicated for C implementors of the spec, it gives you a more expensive check than just checking a flag. I like a flag bit much better. I still hope that somebody more understanding comes along, argues our case, and gets bit 22 reserved for our purpose :-) Dag From markflorisson88 at gmail.com Wed May 16 15:03:56 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 16 May 2012 14:03:56 +0100 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: <4FB38D94.9000200@astro.uio.no> References: <4FAD1348.5010608@astro.uio.no> <4FB00D07.7080105@astro.uio.no> <4FB00D72.7040509@astro.uio.no> <4FB0EDD1.2010206@behnel.de> <4FB11545.8070803@astro.uio.no> <4FB38D94.9000200@astro.uio.no> Message-ID: On 16 May 2012 12:20, Dag Sverre Seljebotn wrote: > On 05/14/2012 08:01 PM, Robert Bradshaw wrote: >> On Mon, May 14, 2012 at 10:05 AM, Nathaniel Smith wrote: >>> On Mon, May 14, 2012 at 3:23 PM, Dag Sverre Seljebotn >>> wrote: >>>> On 05/14/2012 01:34 PM, Stefan Behnel wrote: >>>>> Dag Sverre Seljebotn, 13.05.2012 21:37: >>>>>> Anyway, thanks for the heads up, this seems to need a bit more work. >>>>>> Input >>>>>> from somebody more familiar with this corner of the CPython API very >>>>>> welcome. >>>>> >>>>> Wouldn't you consider python-dev an appropriate place to discuss this? >>>> >>>> Propose something for a PEP that's primarily useful to Cython without >>>> even >>>> understanding the full implications myself first? >>>> >>>> I'd rather try to not annoy people; I figured the time I have the >>>> CPython >>>> patches ready and tested is the time I ping python-dev... >>> >>> If you want to eventually propose a PEP, you really really really >>> should be talking to them before. Otherwise you'll get everything >>> worked out just the way you want and they'll be like "what is this? >>> re-do it all totally differently". And they might be wrong, but then >>> you have to reconstruct for them the whole debate and reasoning >>> process and implicit assumptions that you're making and not realizing >>> you need to articulate, so easier to just get all the interested >>> people at the table to begin with. And they might be right, in which >>> case you just wasted however much time digging yourself into a hole >>> and reverse-engineering bits of CPython. >>> >>> Don't propose it as a PEP, just say "hey, we have this problem and >>> these constraints, and we're thinking we could solve them by something >>> like this; but of course that has these limitations, so I dunno. What >>> do you think?" And expect to spend some time figuring out what your >>> requirements actually are (even if you think you know already, see >>> above about implicit assumptions).
>> >> I personally think it's a great idea to bounce ideas around here first >> before going to python-dev, especially as a PEP wouldn't get in until >> 3.3 or 3.4 at best, and we want to do something with 2.4+ in the near >> term. That doesn't preclude presenting the problem and proposed >> solution on python-dev as well, but the purpose of this thread seems >> to be to think about it some, including how we're going to support >> things in the short term, not nail down an exact PEP for Python to >> accept at face value. I think we're at a point we can ping python-dev >> now though. >> >> To be more futureproof, we'd want an offset to PyExtendedTypeObject >> rather than assuming it exists at the end of PyTypeObject, but I don't >> see a good place to store this information, so assuming it's right >> there based on a bit in the flag seems a reasonable way forward. > > So I posted on python-dev. > > There's a lot of "You don't need to do this"; but here's an idea I got > that's inspired by that discussion: We could use tp_getattr (and call it > directly), but pass in an interned char* which Python code can never get > hold of, and then that could return a void* (cast through a PyObject*, but > it would not be one). Would we want to support monkey patching these interfaces? If so, this mechanism would be a bit harder than reallocing a pointer, although I guess a closure chain of tp_getattr functions would work :). But I think we want GIL-less access anyway, right? That means neither approach would work unsynchronized. > Another alternative is to somehow handshake on a metaclass implementation; > and different Cython modules/NumPy/SciPy etc. would inherit from it. But > apart from that handshaking, and having to use a metaclass and make > everything more complicated for C implementors of the spec, it gives you a > more expensive check than just checking a flag. I agree that the flag is much easier; if you have a metaclass, the question is again in which module to store it to get a cross-module working typecheck. On the other hand, if the header file provides an easy way to import the metaclass (monkeypatched on some module or living in its own module), and to allocate type instances given a (statically allocated) type, that would be more future-proof and elegant. I don't think it would be much slower; it's doing 'o->ob_type->tp_flags & MYFLAG' vs 'o->ob_type->ob_type == MYMETA'. I think for the bit flag the interface won't span subclasses, whereas the metaclass approach would allow subclassing but not subclassing of the metaclass itself (unless you instantiate the metaclass through itself, and check the interface against the metaclass, which only means the metaclass of the metaclass isn't subclassable :) (this would also mean a more expensive check)). I think if we implement CEP 1000, we will at that time have a generic way to optimize and hoist none checking/bounds checking etc., which will also allow us to optimize signature matching, which would mean the matching and unpacking isn't as performance critical. JIT compilers could take a similar approach. > I like a flag bit much better.
I still hope that somebody more understanding > comes along, argues our case, and gets bit 22 reserved for our purpose :-) > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Wed May 16 15:25:17 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 16 May 2012 14:25:17 +0100 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: References: <4FAD1348.5010608@astro.uio.no> <4FB00D07.7080105@astro.uio.no> <4FB00D72.7040509@astro.uio.no> <4FB0EDD1.2010206@behnel.de> <4FB11545.8070803@astro.uio.no> <4FB38D94.9000200@astro.uio.no> Message-ID: On 16 May 2012 14:03, mark florisson wrote: > On 16 May 2012 12:20, Dag Sverre Seljebotn wrote: >> On 05/14/2012 08:01 PM, Robert Bradshaw wrote: >>> On Mon, May 14, 2012 at 10:05 AM, Nathaniel Smith wrote: >>>> On Mon, May 14, 2012 at 3:23 PM, Dag Sverre Seljebotn >>>> wrote: >>>>> On 05/14/2012 01:34 PM, Stefan Behnel wrote: >>>>>> Dag Sverre Seljebotn, 13.05.2012 21:37: >>>>>>> Anyway, thanks for the heads up, this seems to need a bit more work. >>>>>>> Input >>>>>>> from somebody more familiar with this corner of the CPython API very >>>>>>> welcome. >>>>>> >>>>>> Wouldn't you consider python-dev an appropriate place to discuss this? >>>>> >>>>> Propose something for a PEP that's primarily useful to Cython without >>>>> even >>>>> understanding the full implications myself first? >>>>> >>>>> I'd rather try to not annoy people; I figured the time I have the >>>>> CPython >>>>> patches ready and tested is the time I ping python-dev... >>>> >>>> If you want to eventually propose a PEP, you really really really >>>> should be talking to them before. Otherwise you'll get everything >>>> worked out just the way you want and they'll be like "what is this? >>>> re-do it all totally differently". And they might be wrong, but then >>>> you have to reconstruct for them the whole debate and reasoning >>>> process and implicit assumptions that you're making and not realizing >>>> you need to articulate, so easier to just get all the interested >>>> people at the table to begin with. And they might be right, in which >>>> case you just wasted however much time digging yourself into a hole >>>> and reverse-engineering bits of CPython. >>>> >>>> Don't propose it as a PEP, just say "hey, we have this problem and >>>> these constraints, and we're thinking we could solve them by something >>>> like this; but of course that has these limitations, so I dunno. What >>>> do you think?" And expect to spend some time figuring out what your >>>> requirements actually are (even if you think you know already, see >>>> above about implicit assumptions). >>> >>> I personally think it's a great idea to bounce ideas around here first >>> before going to python-dev, especially as a PEP wouldn't get in until >>> 3.3 or 3.4 at best, and we want to do something with 2.4+ in the near >>> term. That doesn't preclude presenting the problem and proposed >>> solution on python-dev as well, but the purpose of this thread seems >>> to be to think about it some, including how we're going to support >>> things in the short term, not nail down an exact PEP for Python to >>> accept at face value. I think we're at a point we can ping python-dev >>> now though.
>>> >>> To be more futureproof, we'd want an offset to PyExtendedTypeObject >>> rather than assuming it exists at the end of PyTypeObject, but I don't >>> see a good place to store this information, so assuming it's right >>> there based on a bit in the flag seems a reasonable way forward. >> >> So I posted on python-dev. >> >> There's a lot of "You don't need to do this"; but here's an idea I got >> that's inspired by that discussion: We could use tp_getattr (and call it >> directly), but pass in an interned char* which Python code can never get >> hold of, and then that could return a void* (cast through a PyObject*, but >> it would not be one). > > Would we want to support monkey patching these interfaces? If so, this > mechanism would be a bit harder than reallocing a pointer, although I > guess a closure chain of tp_getattr functions would work :). But I > think we want GIL-less access anyway, right? That means neither > approach would work unsynchronized. > >> Another alternative is to somehow handshake on a metaclass implementation; >> and different Cython modules/NumPy/SciPy etc. would inherit from it. But >> apart from that handshaking, and having to use a metaclass and make >> everything more complicated for C implementors of the spec, it gives you a >> more expensive check than just checking a flag. > > I agree that the flag is much easier; if you have a metaclass, the > question is again in which module to store it to get a cross-module > working typecheck. On the other hand, if the header file provides an > easy way to import the metaclass (monkeypatched on some module or > living in its own module), and to allocate type instances given a > (statically allocated) type, that would be more future-proof and > elegant. I don't think it would be much slower; it's doing > 'o->ob_type->tp_flags & MYFLAG' vs 'o->ob_type->ob_type == MYMETA'. Sorry, I think you mentioned something related in the CEP; I couldn't find it in the enhancement list (I should have followed your originally posted link :). So that's a good point, there are several issues: - subclasses of the type exposing the interface (I don't think that can be handled through tp_flags) - in the case of a metaclass approach, subclassing the metaclass itself means type instances will no longer match the MYMETA pointer - exposing interfaces on metaclasses (I think this one is listed in the CEP) I suppose the last case could either be disallowed, or one could provide a custom tp_alloc and store (a pointer to) extra information ahead of the object, taking care to precede any GC information. This is pretty hacky though. > I think for the bit flag the interface won't span subclasses, whereas > the metaclass approach would allow subclassing but not subclassing of > the metaclass itself (unless you instantiate the metaclass through > itself, and check the interface against the metaclass, which only > means the metaclass of the metaclass isn't subclassable :) (this would > also mean a more expensive check)). > > I think if we implement CEP 1000, we will at that time have a generic > way to optimize and hoist none checking/bounds checking etc., which > will also allow us to optimize signature matching, which would mean > the matching and unpacking isn't as performance critical. JIT > compilers could take a similar approach. >> I like a flag bit much better.
I still hope that somebody more understanding >> comes along, argues our case, and gets bit 22 reserved for our purpose :-) >> >> Dag >> >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Wed May 16 15:26:38 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 16 May 2012 14:26:38 +0100 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: References: <4FAD1348.5010608@astro.uio.no> <4FB00D07.7080105@astro.uio.no> <4FB00D72.7040509@astro.uio.no> <4FB0EDD1.2010206@behnel.de> <4FB11545.8070803@astro.uio.no> <4FB38D94.9000200@astro.uio.no> Message-ID: On 16 May 2012 14:25, mark florisson wrote: > On 16 May 2012 14:03, mark florisson wrote: >> On 16 May 2012 12:20, Dag Sverre Seljebotn wrote: >>> On 05/14/2012 08:01 PM, Robert Bradshaw wrote: >>>> >>>> On Mon, May 14, 2012 at 10:05 AM, Nathaniel Smith ?wrote: >>>>> >>>>> On Mon, May 14, 2012 at 3:23 PM, Dag Sverre Seljebotn >>>>> ?wrote: >>>>>> >>>>>> On 05/14/2012 01:34 PM, Stefan Behnel wrote: >>>>>>> >>>>>>> >>>>>>> Dag Sverre Seljebotn, 13.05.2012 21:37: >>>>>>>> >>>>>>>> >>>>>>>> Anyway, thanks for the heads up, this seems to need a bit more work. >>>>>>>> Input >>>>>>>> from somebody more familiar with this corner of the CPython API very >>>>>>>> welcome. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Wouldn't you consider python-dev an appropriate place to discuss this? >>>>>> >>>>>> >>>>>> >>>>>> Propose something for a PEP that's primarily useful to Cython without >>>>>> even >>>>>> understanding the full implications myself first? >>>>>> >>>>>> I'd rather try to not annoy people; I figured the time I have the >>>>>> CPython >>>>>> patches ready and tested is the time I ping python-dev... >>>>> >>>>> >>>>> If you want to eventually propose a PEP, you really really really >>>>> should be talking to them before. Otherwise you'll get everything >>>>> worked out just the way you want and they'll be like "what is this? >>>>> re-do it all totally differently". And they might be wrong, but then >>>>> you have to reconstruct for them the whole debate and reasoning >>>>> process and implicit assumptions that you're making and not realizing >>>>> you need to articulate, so easier to just get all the interested >>>>> people at the table to begin with. And they might be right, in which >>>>> case you just wasted however much time digging yourself into a hole >>>>> and reverse-engineering bits of CPython. >>>>> >>>>> Don't propose it as a PEP, just say "hey, we have this problem and >>>>> these constraints, and we're thinking we could solve them by something >>>>> like this; but of course that has these limitations, so I dunno. What >>>>> do you think?" And expect to spend some time figuring out what your >>>>> requirements actually are (even if you think you know already, see >>>>> above about implicit assumptions). >>>> >>>> >>>> I personally think it's a great idea to bounce ideas around here first >>>> before going to python-dev, especially as a PEP wouldn't get in until >>>> 3.3 or 3.4 at best, and we want to do something with 2.4+ in the near >>>> term. That doesn't preclude presenting the problem and proposed >>>> solution on python-dev as well, but the purpose of this thread seems >>>> to be to think about it some, including how we're going to support >>>> things in the short term, not nail down an exact PEP for Python to >>>> accept at face value. 
I think we're at a point we can ping python-dev >>>> now though. >>>> >>>> To be more futureproof, we'd want an offset to PyExtendedTypeObject >>>> rather than assuming it exists at the end of PyTypeObject, but I don't >>>> see a good place to store this information, so assuming it's right >>>> there based on a bit in the flag seems a reasonable way forward. >>> >>> >>> So I posted on python-dev. >>> >>> There's a lot of "You don't need to do this"; but here's an idea I got >>> that's inspired by that discussion: We could use tp_getattr (and call it >>> directly), but pass in an interned char* which Python code can never get >>> hold of, and then that could return a void* (casted through a PyObject*, but >>> it would not be). >> >> Would we want to support monkey patching these interfaces? If so, this >> mechanism would be a bit harder than reallocing a pointer, although I >> guess a closure chain of tp_getattr functions would work :). But I >> think we want GIL-less access anyway right, which means neither >> approach would work unsynchronized. >> >>> Another alternative is to somehow handshake on a metaclass implementation; >>> and different Cython modules/NumPy/SciPy etc. would inherit from it. But >>> apart from that handshaking, and having to use a metaclass and make >>> everything more complicated for C implementors of the spec, it gives you a >>> more expensive check than just checking a flag. >> >> I agree that the flag is much easier, if you have a metaclass the >> question is again in which module to store it to get a cross-module >> working typecheck. On the other hand, if the header file provides an >> easy way to import the metaclass (monkeypatched on some module or >> living in its own module), and to allocate type instances given a >> (statically allocated) type, that would be more future-proof and >> elegant. I don't think it would be much slower, it's doing >> 'o->ob_type->tp_flags & MYFLAG' vs 'o->ob_type->ob_type == MYMETA'. > > Sorry, I think you mentioned something in related in the CEP, I > couldn't find it from the enhancement list (I should have followed > your originally posted link :). So that's a good point, there are > several issues: > > ? ?- subclasses of the type exposing the interface (I don't think > that can be handled through tp_flags) > ? ?- in the case of a metaclass approach, subclassing the metaclass > itself means type instances will no longer match the MYMETA pointer (Which can obviously be handled through a full typecheck, but that is more expensive) > ? ?- exposing interfaces on metaclasses (I think this one is listed in the CEP) > > I suppose the last case could either be disallowed, or one could > provide a custom tp_alloc and store (a pointer to) extra information > ahead of the object, taking care to precede any GC information. This > is pretty hacky though. > >> I think for the bit flag the interface won't span subclasses, whereas >> the metaclass approach would allow subclassing but not subclassing of >> the metaclass itself (unless you instantiate the metaclass through >> itself, and check the interface against the metaclass, which only >> means the metaclass of the metaclass isn't subclassable :) (this would >> also mean a more expensive check)). >> >> I think if we implement CEP 1000, we will at that time have a generic >> way to optimize and hoist none checking/bounds checking etc, which >> will also allow us to optimize signature matching, which would mean >> the matching and unpacking isn't as performance critical. 
JIT >> compilers could take a similar approach. >> >>> I like a flag bit much better. I still hope that somebody more understanding >>> comes along, argues our case, and gets bit 22 reserved for our purpose :-) >>> >>> Dag >>> >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Wed May 16 15:36:39 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 16 May 2012 14:36:39 +0100 Subject: [Cython] CEP 1001 - Custom PyTypeObject extensions In-Reply-To: References: <4FAD1348.5010608@astro.uio.no> <4FB00D07.7080105@astro.uio.no> <4FB00D72.7040509@astro.uio.no> <4FB0EDD1.2010206@behnel.de> <4FB11545.8070803@astro.uio.no> <4FB38D94.9000200@astro.uio.no> Message-ID: On 16 May 2012 14:25, mark florisson wrote: > On 16 May 2012 14:03, mark florisson wrote: >> On 16 May 2012 12:20, Dag Sverre Seljebotn wrote: >>> On 05/14/2012 08:01 PM, Robert Bradshaw wrote: >>>> >>>> On Mon, May 14, 2012 at 10:05 AM, Nathaniel Smith ?wrote: >>>>> >>>>> On Mon, May 14, 2012 at 3:23 PM, Dag Sverre Seljebotn >>>>> ?wrote: >>>>>> >>>>>> On 05/14/2012 01:34 PM, Stefan Behnel wrote: >>>>>>> >>>>>>> >>>>>>> Dag Sverre Seljebotn, 13.05.2012 21:37: >>>>>>>> >>>>>>>> >>>>>>>> Anyway, thanks for the heads up, this seems to need a bit more work. >>>>>>>> Input >>>>>>>> from somebody more familiar with this corner of the CPython API very >>>>>>>> welcome. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Wouldn't you consider python-dev an appropriate place to discuss this? >>>>>> >>>>>> >>>>>> >>>>>> Propose something for a PEP that's primarily useful to Cython without >>>>>> even >>>>>> understanding the full implications myself first? >>>>>> >>>>>> I'd rather try to not annoy people; I figured the time I have the >>>>>> CPython >>>>>> patches ready and tested is the time I ping python-dev... >>>>> >>>>> >>>>> If you want to eventually propose a PEP, you really really really >>>>> should be talking to them before. Otherwise you'll get everything >>>>> worked out just the way you want and they'll be like "what is this? >>>>> re-do it all totally differently". And they might be wrong, but then >>>>> you have to reconstruct for them the whole debate and reasoning >>>>> process and implicit assumptions that you're making and not realizing >>>>> you need to articulate, so easier to just get all the interested >>>>> people at the table to begin with. And they might be right, in which >>>>> case you just wasted however much time digging yourself into a hole >>>>> and reverse-engineering bits of CPython. >>>>> >>>>> Don't propose it as a PEP, just say "hey, we have this problem and >>>>> these constraints, and we're thinking we could solve them by something >>>>> like this; but of course that has these limitations, so I dunno. What >>>>> do you think?" And expect to spend some time figuring out what your >>>>> requirements actually are (even if you think you know already, see >>>>> above about implicit assumptions). >>>> >>>> >>>> I personally think it's a great idea to bounce ideas around here first >>>> before going to python-dev, especially as a PEP wouldn't get in until >>>> 3.3 or 3.4 at best, and we want to do something with 2.4+ in the near >>>> term. 
That doesn't preclude presenting the problem and proposed >>>> solution on python-dev as well, but the purpose of this thread seems >>>> to be to think about it some, including how we're going to support >>>> things in the short term, not nail down an exact PEP for Python to >>>> accept at face value. I think we're at a point we can ping python-dev >>>> now though. >>>> >>>> To be more futureproof, we'd want an offset to PyExtendedTypeObject >>>> rather than assuming it exists at the end of PyTypeObject, but I don't >>>> see a good place to store this information, so assuming it's right >>>> there based on a bit in the flag seems a reasonable way forward. >>> >>> >>> So I posted on python-dev. >>> >>> There's a lot of "You don't need to do this"; but here's an idea I got >>> that's inspired by that discussion: We could use tp_getattr (and call it >>> directly), but pass in an interned char* which Python code can never get >>> hold of, and then that could return a void* (casted through a PyObject*, but >>> it would not be). >> >> Would we want to support monkey patching these interfaces? If so, this >> mechanism would be a bit harder than reallocing a pointer, although I >> guess a closure chain of tp_getattr functions would work :). But I >> think we want GIL-less access anyway right, which means neither >> approach would work unsynchronized. >> >>> Another alternative is to somehow handshake on a metaclass implementation; >>> and different Cython modules/NumPy/SciPy etc. would inherit from it. But >>> apart from that handshaking, and having to use a metaclass and make >>> everything more complicated for C implementors of the spec, it gives you a >>> more expensive check than just checking a flag. >> >> I agree that the flag is much easier, if you have a metaclass the >> question is again in which module to store it to get a cross-module >> working typecheck. On the other hand, if the header file provides an >> easy way to import the metaclass (monkeypatched on some module or >> living in its own module), and to allocate type instances given a >> (statically allocated) type, that would be more future-proof and >> elegant. I don't think it would be much slower, it's doing >> 'o->ob_type->tp_flags & MYFLAG' vs 'o->ob_type->ob_type == MYMETA'. > > Sorry, I think you mentioned something in related in the CEP, I > couldn't find it from the enhancement list (I should have followed > your originally posted link :). So that's a good point, there are > several issues: > > ? ?- subclasses of the type exposing the interface (I don't think > that can be handled through tp_flags) > ? ?- in the case of a metaclass approach, subclassing the metaclass > itself means type instances will no longer match the MYMETA pointer > ? ?- exposing interfaces on metaclasses (I think this one is listed in the CEP) > > I suppose the last case could either be disallowed, or one could > provide a custom tp_alloc and store (a pointer to) extra information > ahead of the object, taking care to precede any GC information. This > is pretty hacky though. Hm, actually if you copy the type by value anyway, you can just modify the tp_basicsize of the copied value... Can you elaborate on the metaclass issue in the CEP? 
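For concreteness, the two checks being weighed in this sub-thread would look roughly like the following C sketch. All names here (MY_EXTENDED_TYPE_FLAG, MyMetaType) are placeholders; in particular, no tp_flags bit has actually been reserved for this:

    #include <Python.h>

    /* Hypothetical reserved bit and hypothetical shared metaclass. */
    #define MY_EXTENDED_TYPE_FLAG (1UL << 22)
    static PyTypeObject MyMetaType;

    /* flag variant: a single bit test on the object's type */
    static int has_interface_flag(PyObject *o)
    {
        return (Py_TYPE(o)->tp_flags & MY_EXTENDED_TYPE_FLAG) != 0;
    }

    /* metaclass variant: compare the type of the object's type */
    static int has_interface_meta(PyObject *o)
    {
        return Py_TYPE(Py_TYPE(o)) == &MyMetaType;
    }

Either test is only a couple of machine instructions; the real difference lies in how each behaves under subclassing, which is what the quoted passage below comes back to.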
>> I think for the bit flag the interface won't span subclasses, whereas >> the metaclass approach would allow subclassing but not subclassing of >> the metaclass itself (unless you instantiate the metaclass through >> itself, and check the interface against the metaclass, which only >> means the metaclass of the metaclass isn't subclassable :) (this would >> also mean a more expensive check)). >> >> I think if we implement CEP 1000, we will at that time have a generic >> way to optimize and hoist none checking/bounds checking etc, which >> will also allow us to optimize signature matching, which would mean >> the matching and unpacking isn't as performance critical. JIT >> compilers could take a similar approach. >> >>> I like a flag bit much better. I still hope that somebody more understanding >>> comes along, argues our case, and gets bit 22 reserved for our purpose :-) >>> >>> Dag >>> >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Wed May 16 18:34:39 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 16 May 2012 18:34:39 +0200 Subject: [Cython] [cython] Python array support (#113) In-Reply-To: References: Message-ID: <4FB3D71F.3080602@behnel.de> Andreas van Cranenburgh, 16.05.2012 18:15: > Any news on this? Let me know if there's anything I can do to help inclusion of this patch. Could someone please take over here? https://github.com/cython/cython/pull/113 I haven't merged this yet and won't have the time to do it soonish. What I'd like to see happen is to get the current header file replaced by utility code "somehow". Not sure how that "somehow" is going to work. Basically, if this can be solved, I'd love to have it in for 0.17. Otherwise, well, not ... Stefan From stefan_ml at behnel.de Wed May 16 21:15:58 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 16 May 2012 21:15:58 +0200 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: <4FB3F2F8.8060207@v.loewis.de> References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB3F2F8.8060207@v.loewis.de> Message-ID: <4FB3FCEE.5020405@behnel.de> "Martin v. L?wis", 16.05.2012 20:33: >> Does this use case make sense to everyone? >> >> The reason why we are discussing this on python-dev is that we are looking >> for a general way to expose these C level signatures within the Python >> ecosystem. And Dag's idea was to expose them as part of the type object, >> basically as an addition to the current Python level tp_call() slot. > > The use case makes sense, yet there is also a long-standing solution > already to expose APIs and function pointers: the capsule objects. > > If you want to avoid dictionary lookups on the server side, implement > tp_getattro, comparing addresses of interned strings. I think Martin has a point there. Why not just use a custom attribute on callables that hold a PyCapsule? Whenever we see inside of a Cython implemented function that an object variable that was retrieved from the outside, either as a function argument or as the result of a function call, is being called, we try to unpack a C function pointer from it on all assignments to the variable. If that works, we can scan for a suitable signature (either right away or lazily on first access) and cache that. On each subsequent call through that variable, the cached C function will be used. 
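As a sketch of what that unpacking step could look like, assuming the callable exposes a capsule under some agreed-upon attribute name (both the attribute name and the signature string below are invented for illustration, not an existing protocol):

    #include <Python.h>

    typedef double (*binop_t)(double, double);   /* example C signature */

    static binop_t unpack_native(PyObject *callable)
    {
        binop_t fptr = NULL;
        PyObject *capsule = PyObject_GetAttrString(callable, "_c_call_dd_d");
        if (capsule == NULL) {
            PyErr_Clear();              /* no native variant: use tp_call */
            return NULL;
        }
        if (PyCapsule_CheckExact(capsule)) {
            /* the capsule name doubles as a cheap signature check */
            fptr = (binop_t)PyCapsule_GetPointer(capsule, "dd->d");
            if (fptr == NULL)
                PyErr_Clear();          /* signature mismatch: fall back */
        }
        Py_DECREF(capsule);
        return fptr;
    }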
That means we'd replace Python variables that are being called by multiple local variables, one that holds the object and one for each C function with a different signature that it is being called with. We set the C function variables to NULL when the Python function variable is being assigned to. When the C function variable is NULL on call, we scan for a matching signature and assign it to the variable. When no matching signature can be found, we set it to (void*)-1. Additionally, we allow explicit user casts of Python objects to C function types, which would then try to unpack the C function, raising a TypeError on mismatch. Assignments to callable variables can be expected to occur much less frequently than calls to them, so this will give us a good trade-off in most cases. I don't see why this kind of caching would be any slower inside of loops than what we were discussing so far. Stefan From markflorisson88 at gmail.com Wed May 16 21:49:18 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 16 May 2012 20:49:18 +0100 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: <4FB3FCEE.5020405@behnel.de> References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de> Message-ID: On 16 May 2012 20:15, Stefan Behnel wrote: > "Martin v. L?wis", 16.05.2012 20:33: >>> Does this use case make sense to everyone? >>> >>> The reason why we are discussing this on python-dev is that we are looking >>> for a general way to expose these C level signatures within the Python >>> ecosystem. And Dag's idea was to expose them as part of the type object, >>> basically as an addition to the current Python level tp_call() slot. >> >> The use case makes sense, yet there is also a long-standing solution >> already to expose APIs and function pointers: the capsule objects. >> >> If you want to avoid dictionary lookups on the server side, implement >> tp_getattro, comparing addresses of interned strings. > > I think Martin has a point there. Why not just use a custom attribute on > callables that hold a PyCapsule? Whenever we see inside of a Cython > implemented function that an object variable that was retrieved from the > outside, either as a function argument or as the result of a function call, > is being called, we try to unpack a C function pointer from it on all > assignments to the variable. If that works, we can scan for a suitable > signature (either right away or lazily on first access) and cache that. On > each subsequent call through that variable, the cached C function will be used. > > That means we'd replace Python variables that are being called by multiple > local variables, one that holds the object and one for each C function with > a different signature that it is being called with. We set the C function > variables to NULL when the Python function variable is being assigned to. > When the C function variable is NULL on call, we scan for a matching > signature and assign it to the variable. ?When no matching signature can be > found, we set it to (void*)-1. > > Additionally, we allow explicit user casts of Python objects to C function > types, which would then try to unpack the C function, raising a TypeError > on mismatch. > > Assignments to callable variables can be expected to occur much less > frequently than calls to them, so this will give us a good trade-off in > most cases. 
I don't see why this kind of caching would be any slower inside > of loops than what we were discussing so far. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel This works really well for local variables, but for globals, def methods or callbacks as attributes, this won't work so well, as they may be rebound at any time outside of the module scope. I think in general Cython code could be easily sped up for most cases by provided a really fast dispatch mechanism here. From markflorisson88 at gmail.com Wed May 16 21:54:51 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 16 May 2012 20:54:51 +0100 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de> Message-ID: On 16 May 2012 20:49, mark florisson wrote: > On 16 May 2012 20:15, Stefan Behnel wrote: >> "Martin v. L?wis", 16.05.2012 20:33: >>>> Does this use case make sense to everyone? >>>> >>>> The reason why we are discussing this on python-dev is that we are looking >>>> for a general way to expose these C level signatures within the Python >>>> ecosystem. And Dag's idea was to expose them as part of the type object, >>>> basically as an addition to the current Python level tp_call() slot. >>> >>> The use case makes sense, yet there is also a long-standing solution >>> already to expose APIs and function pointers: the capsule objects. >>> >>> If you want to avoid dictionary lookups on the server side, implement >>> tp_getattro, comparing addresses of interned strings. >> >> I think Martin has a point there. Why not just use a custom attribute on >> callables that hold a PyCapsule? Whenever we see inside of a Cython >> implemented function that an object variable that was retrieved from the >> outside, either as a function argument or as the result of a function call, >> is being called, we try to unpack a C function pointer from it on all >> assignments to the variable. If that works, we can scan for a suitable >> signature (either right away or lazily on first access) and cache that. On >> each subsequent call through that variable, the cached C function will be used. >> >> That means we'd replace Python variables that are being called by multiple >> local variables, one that holds the object and one for each C function with >> a different signature that it is being called with. We set the C function >> variables to NULL when the Python function variable is being assigned to. >> When the C function variable is NULL on call, we scan for a matching >> signature and assign it to the variable. ?When no matching signature can be >> found, we set it to (void*)-1. >> >> Additionally, we allow explicit user casts of Python objects to C function >> types, which would then try to unpack the C function, raising a TypeError >> on mismatch. >> >> Assignments to callable variables can be expected to occur much less >> frequently than calls to them, so this will give us a good trade-off in >> most cases. I don't see why this kind of caching would be any slower inside >> of loops than what we were discussing so far. 
>> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > This works really well for local variables, but for globals, def > methods or callbacks as attributes, this won't work so well, as they > may be rebound at any time outside of the module scope. I think in > general Cython code could be easily sped up for most cases by provided > a really fast dispatch mechanism here. ... unless we implement the __nomonkey__ (forgot the original name) or final declaration (also allowed in pxd files to declare module attributes final), where you can declare module attributes or class attributes final. I don't recall the outcome of the discussion, but I suppose the advantage of the __nomonkey__ is that it works from Python code as well and you don't have to bother with boring pxds, whereas the advantage of final is that it can work for class attributes. From d.s.seljebotn at astro.uio.no Wed May 16 22:16:26 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 16 May 2012 22:16:26 +0200 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de> Message-ID: <169bd5091ff40cbd180cb9d807a9606f@ulrik.uio.no> On Wed, 16 May 2012 20:49:18 +0100, mark florisson wrote: > On 16 May 2012 20:15, Stefan Behnel wrote: >> "Martin v. Löwis", 16.05.2012 20:33: >>>> Does this use case make sense to everyone? >>>> >>>> The reason why we are discussing this on python-dev is that we are >>>> looking >>>> for a general way to expose these C level signatures within the >>>> Python >>>> ecosystem. And Dag's idea was to expose them as part of the type >>>> object, >>>> basically as an addition to the current Python level tp_call() >>>> slot. >>> >>> The use case makes sense, yet there is also a long-standing >>> solution >>> already to expose APIs and function pointers: the capsule objects. >>> >>> If you want to avoid dictionary lookups on the server side, >>> implement >>> tp_getattro, comparing addresses of interned strings. >> >> I think Martin has a point there. Why not just use a custom >> attribute on >> callables that hold a PyCapsule? Whenever we see inside of a Cython >> implemented function that an object variable that was retrieved from >> the >> outside, either as a function argument or as the result of a >> function call, >> is being called, we try to unpack a C function pointer from it on >> all >> assignments to the variable. If that works, we can scan for a >> suitable >> signature (either right away or lazily on first access) and cache >> that. On >> each subsequent call through that variable, the cached C function >> will be used. >> >> That means we'd replace Python variables that are being called by >> multiple >> local variables, one that holds the object and one for each C >> function with >> a different signature that it is being called with. We set the C >> function >> variables to NULL when the Python function variable is being >> assigned to. >> When the C function variable is NULL on call, we scan for a matching >> signature and assign it to the variable. When no matching signature >> can be >> found, we set it to (void*)-1.
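Spelled out in C, the cache protocol quoted above (NULL for "not scanned yet", (void*)-1 for "no native version") would behave roughly like this sketch, reusing the hypothetical unpack_native() helper from the earlier sketch:

    typedef double (*binop_t)(double, double);
    #define NO_NATIVE ((binop_t)-1)

    binop_t unpack_native(PyObject *callable);   /* from the earlier sketch */

    static PyObject *func;      /* the Python-level callable variable */
    static binop_t func_dd_d;   /* cache slot for the dd->d signature;
                                   reset to NULL whenever func is assigned */

    static double call_func(double a, double b)
    {
        PyObject *res;
        double r;

        if (func_dd_d == NULL) {             /* first call since rebinding */
            func_dd_d = unpack_native(func);
            if (func_dd_d == NULL)
                func_dd_d = NO_NATIVE;       /* remember the miss */
        }
        if (func_dd_d != NO_NATIVE)
            return func_dd_d(a, b);          /* fast path: direct C call */

        /* slow path: ordinary call through tp_call; real code would
           propagate errors instead of returning -1.0 */
        res = PyObject_CallFunction(func, "dd", a, b);
        r = res ? PyFloat_AsDouble(res) : -1.0;
        Py_XDECREF(res);
        return r;
    }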
>> >> Additionally, we allow explicit user casts of Python objects to C >> function >> types, which would then try to unpack the C function, raising a >> TypeError >> on mismatch. >> >> Assignments to callable variables can be expected to occur much less >> frequently than calls to them, so this will give us a good trade-off >> in >> most cases. I don't see why this kind of caching would be any slower >> inside >> of loops than what we were discussing so far. >> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > This works really well for local variables, but for globals, def > methods or callbacks as attributes, this won't work so well, as they > may be rebound at any time outside of the module scope. I think in
+1. The python-dev discussion is pretty focused on the world of a manually written C extension. But code generation is an entirely different matter. Python puts in place pretty efficient boundaries against full-program static analysis, so there's really not much we can do.

Here's some of my actual code I have for wrapping a C++ library:

cdef class CallbackEventReceiver(BasicEventReceiver):
    cdef object callback

    def __init__(self, callback):
        self.callback = callback

    cdef dispatch_event(self, ...):
        self.callback(...)

The idea is that you can subclass BasicEventReceiver in Cython for speed, but if you want to use a Python callable then this converter is used. This code is very performance critical. And, the *loop* in question sits deep inside a C++ library. Good luck pre-acquiring the function pointer of self.callback in any useful way. Even if it is not exported by the class, that could be overridden by a subclass. I stress the fact that this is real world code by yours truly (unfortunately not open source, it wraps a closed source library). Yes, you can tell users to be mindful of this and make as much as possible local variables, introduce final modifiers and __nomonkey__ and whatnot, but that's a large price to pay to avoid hacking tp_flags. Dag _______________________________________________ cython-devel mailing list cython-devel at python.org http://mail.python.org/mailman/listinfo/cython-devel From robertwb at gmail.com Wed May 16 22:25:44 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 16 May 2012 13:25:44 -0700 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: <4FB3FCEE.5020405@behnel.de> References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de> Message-ID: On Wed, May 16, 2012 at 12:15 PM, Stefan Behnel wrote: > "Martin v. Löwis", 16.05.2012 20:33: >>> Does this use case make sense to everyone? >>> >>> The reason why we are discussing this on python-dev is that we are looking >>> for a general way to expose these C level signatures within the Python >>> ecosystem. And Dag's idea was to expose them as part of the type object, >>> basically as an addition to the current Python level tp_call() slot. >> >> The use case makes sense, yet there is also a long-standing solution >> already to expose APIs and function pointers: the capsule objects. >> >> If you want to avoid dictionary lookups on the server side, implement >> tp_getattro, comparing addresses of interned strings. > > I think Martin has a point there. Why not just use a custom attribute on > callables that hold a PyCapsule?
Whenever we see inside of a Cython > implemented function that an object variable that was retrieved from the > outside, either as a function argument or as the result of a function call, > is being called, we try to unpack a C function pointer from it on all > assignments to the variable. If that works, we can scan for a suitable > signature (either right away or lazily on first access) and cache that. On > each subsequent call through that variable, the cached C function will be used. > > That means we'd replace Python variables that are being called by multiple > local variables, one that holds the object and one for each C function with > a different signature that it is being called with. We set the C function > variables to NULL when the Python function variable is being assigned to. > When the C function variable is NULL on call, we scan for a matching > signature and assign it to the variable. ?When no matching signature can be > found, we set it to (void*)-1. > > Additionally, we allow explicit user casts of Python objects to C function > types, which would then try to unpack the C function, raising a TypeError > on mismatch. > > Assignments to callable variables can be expected to occur much less > frequently than calls to them, so this will give us a good trade-off in > most cases. I don't see why this kind of caching would be any slower inside > of loops than what we were discussing so far. I like the idea, but that only helps if you're doing multiple calls (e.g. in a loop). Definitely worth implementing in my mind, but orthogonal to making the lookup itself as fast as possible. - Robert From markflorisson88 at gmail.com Wed May 16 22:45:42 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 16 May 2012 21:45:42 +0100 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: <169bd5091ff40cbd180cb9d807a9606f@ulrik.uio.no> References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de> <169bd5091ff40cbd180cb9d807a9606f@ulrik.uio.no> Message-ID: On 16 May 2012 21:16, Dag Sverre Seljebotn wrote: > On Wed, 16 May 2012 20:49:18 +0100, mark florisson > wrote: >> >> On 16 May 2012 20:15, Stefan Behnel wrote: >>> >>> "Martin v. L?wis", 16.05.2012 20:33: >>>>> >>>>> Does this use case make sense to everyone? >>>>> >>>>> The reason why we are discussing this on python-dev is that we are >>>>> looking >>>>> for a general way to expose these C level signatures within the Python >>>>> ecosystem. And Dag's idea was to expose them as part of the type >>>>> object, >>>>> basically as an addition to the current Python level tp_call() slot. >>>> >>>> >>>> The use case makes sense, yet there is also a long-standing solution >>>> already to expose APIs and function pointers: the capsule objects. >>>> >>>> If you want to avoid dictionary lookups on the server side, implement >>>> tp_getattro, comparing addresses of interned strings. >>> >>> >>> I think Martin has a point there. Why not just use a custom attribute on >>> callables that hold a PyCapsule? Whenever we see inside of a Cython >>> implemented function that an object variable that was retrieved from the >>> outside, either as a function argument or as the result of a function >>> call, >>> is being called, we try to unpack a C function pointer from it on all >>> assignments to the variable. 
If that works, we can scan for a suitable >>> signature (either right away or lazily on first access) and cache that. >>> On >>> each subsequent call through that variable, the cached C function will be >>> used. >>> >>> That means we'd replace Python variables that are being called by >>> multiple >>> local variables, one that holds the object and one for each C function >>> with >>> a different signature that it is being called with. We set the C function >>> variables to NULL when the Python function variable is being assigned to. >>> When the C function variable is NULL on call, we scan for a matching >>> signature and assign it to the variable. ?When no matching signature can >>> be >>> found, we set it to (void*)-1. >>> >>> Additionally, we allow explicit user casts of Python objects to C >>> function >>> types, which would then try to unpack the C function, raising a TypeError >>> on mismatch. >>> >>> Assignments to callable variables can be expected to occur much less >>> frequently than calls to them, so this will give us a good trade-off in >>> most cases. I don't see why this kind of caching would be any slower >>> inside >>> of loops than what we were discussing so far. >>> >>> Stefan >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> >> This works really well for local variables, but for globals, def >> methods or callbacks as attributes, this won't work so well, as they >> may be rebound at any time outside of the module scope. I think in > > > +1. The python-dev discussion is pretty focused on the world of a manually > written C extension. But code generation is an entirely different matter. > Python puts in place pretty efficient boundaries against full-program static > analysis, so there's really not much we can do. > > Here's some of my actual code I have for wrapping a C++ library: > > cdef class CallbackEventReceiver(BasicEventReceiver): > ? ?cdef object callback > > ? ?def __init__(self, callback): > ? ? ? ?self.callback = callback > > ? ?cdef dispatch_event(self, ...): > ? ? ? ?self.callback(...) > > The idea is that you can subclass BasicEventReceiver in Cython for speed, > but if you want to use a Python callable then this converter is used. > > This code is very performance critical. And, the *loop* in question sits > deep inside a C++ library. > > Good luck pre-acquiring the function pointer of self.callback in any useful > way. Even if it is not exported by the class, that could be overridden by a > subclass. I stress the fact that this is real world code by yours truly > (unfortunately not open source, it wraps a closed source library). > > Yes, you can tell users to be mindful of this and make as much as possible > local variables, introduce final modifiers and __nomonkey__ and whatnot, but > that's a large price to pay to avoid hacking tp_flags. > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Definitely. I personally prefer the metaclass approach, but it's an irrelevant detail. If we go the tp_flags route, would we copy all the interface information from the superclass into the subclass directly? I think in any case we need a wrapper around PyType_Ready to inherit the tp_flags bit (which would be automatic with a metaclass). 
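A wrapper of the kind Mark mentions could be as small as the following sketch, since PyType_Ready() will not copy an unrecognized tp_flags bit down from the base type (MY_EXTENDED_TYPE_FLAG is again a stand-in for whatever bit would actually be reserved):

    #include <Python.h>

    #define MY_EXTENDED_TYPE_FLAG (1UL << 22)   /* hypothetical bit */

    static int MyInterfaceType_Ready(PyTypeObject *type)
    {
        if (PyType_Ready(type) < 0)
            return -1;
        /* propagate the interface bit so it survives C-level subclassing */
        if (type->tp_base != NULL &&
            (type->tp_base->tp_flags & MY_EXTENDED_TYPE_FLAG))
            type->tp_flags |= MY_EXTENDED_TYPE_FLAG;
        return 0;
    }

Subclasses created from Python code would not pass through such a wrapper, which is where a metaclass (whose tp_new could do the same propagation) has the edge.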
From markflorisson88 at gmail.com Wed May 16 23:06:59 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 16 May 2012 22:06:59 +0100 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: <169bd5091ff40cbd180cb9d807a9606f@ulrik.uio.no> References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de> <169bd5091ff40cbd180cb9d807a9606f@ulrik.uio.no> Message-ID: On 16 May 2012 21:16, Dag Sverre Seljebotn wrote: > On Wed, 16 May 2012 20:49:18 +0100, mark florisson > wrote: >> >> On 16 May 2012 20:15, Stefan Behnel wrote: >>> >>> "Martin v. L?wis", 16.05.2012 20:33: >>>>> >>>>> Does this use case make sense to everyone? >>>>> >>>>> The reason why we are discussing this on python-dev is that we are >>>>> looking >>>>> for a general way to expose these C level signatures within the Python >>>>> ecosystem. And Dag's idea was to expose them as part of the type >>>>> object, >>>>> basically as an addition to the current Python level tp_call() slot. >>>> >>>> >>>> The use case makes sense, yet there is also a long-standing solution >>>> already to expose APIs and function pointers: the capsule objects. >>>> >>>> If you want to avoid dictionary lookups on the server side, implement >>>> tp_getattro, comparing addresses of interned strings. >>> >>> >>> I think Martin has a point there. Why not just use a custom attribute on >>> callables that hold a PyCapsule? Whenever we see inside of a Cython >>> implemented function that an object variable that was retrieved from the >>> outside, either as a function argument or as the result of a function >>> call, >>> is being called, we try to unpack a C function pointer from it on all >>> assignments to the variable. If that works, we can scan for a suitable >>> signature (either right away or lazily on first access) and cache that. >>> On >>> each subsequent call through that variable, the cached C function will be >>> used. >>> >>> That means we'd replace Python variables that are being called by >>> multiple >>> local variables, one that holds the object and one for each C function >>> with >>> a different signature that it is being called with. We set the C function >>> variables to NULL when the Python function variable is being assigned to. >>> When the C function variable is NULL on call, we scan for a matching >>> signature and assign it to the variable. ?When no matching signature can >>> be >>> found, we set it to (void*)-1. >>> >>> Additionally, we allow explicit user casts of Python objects to C >>> function >>> types, which would then try to unpack the C function, raising a TypeError >>> on mismatch. >>> >>> Assignments to callable variables can be expected to occur much less >>> frequently than calls to them, so this will give us a good trade-off in >>> most cases. I don't see why this kind of caching would be any slower >>> inside >>> of loops than what we were discussing so far. >>> >>> Stefan >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> >> This works really well for local variables, but for globals, def >> methods or callbacks as attributes, this won't work so well, as they >> may be rebound at any time outside of the module scope. I think in > > > +1. The python-dev discussion is pretty focused on the world of a manually > written C extension. 
But code generation is an entirely different matter. > Python puts in place pretty efficient boundaries against full-program static > analysis, so there's really not much we can do. > > Here's some of my actual code I have for wrapping a C++ library: >
> cdef class CallbackEventReceiver(BasicEventReceiver):
>     cdef object callback
>
>     def __init__(self, callback):
>         self.callback = callback
>
>     cdef dispatch_event(self, ...):
>         self.callback(...)
>
> The idea is that you can subclass BasicEventReceiver in Cython for speed, > but if you want to use a Python callable then this converter is used. > > This code is very performance critical. And, the *loop* in question sits > deep inside a C++ library. > > Good luck pre-acquiring the function pointer of self.callback in any useful > way. Even if it is not exported by the class, that could be overridden by a > subclass. I stress the fact that this is real world code by yours truly > (unfortunately not open source, it wraps a closed source library). > > Yes, you can tell users to be mindful of this and make as much as possible > local variables, introduce final modifiers and __nomonkey__ and whatnot, but > that's a large price to pay to avoid hacking tp_flags. > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I suppose for this case it might be faster to check if the world is sane (if the callback or function is still the object you expect it to be) on top of looking at whether the function pointer is unpacked. You don't really want to store that extra information in objects, but for global variables it might be worthwhile (unless you're doing import * :)). So we definitely always need a fast dispatcher, but we may do slightly better in some cases if we care to implement it. I bet no one will care about shaving off those last 2 nanoseconds though :) From stefan_ml at behnel.de Thu May 17 08:09:08 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 17 May 2012 08:09:08 +0200 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de> Message-ID: <4FB49604.8070000@behnel.de> mark florisson, 16.05.2012 21:49: > On 16 May 2012 20:15, Stefan Behnel wrote: >> "Martin v. Löwis", 16.05.2012 20:33: >>>> Does this use case make sense to everyone? >>>> >>>> The reason why we are discussing this on python-dev is that we are looking >>>> for a general way to expose these C level signatures within the Python >>>> ecosystem. And Dag's idea was to expose them as part of the type object, >>>> basically as an addition to the current Python level tp_call() slot. >>> >>> The use case makes sense, yet there is also a long-standing solution >>> already to expose APIs and function pointers: the capsule objects. >>> >>> If you want to avoid dictionary lookups on the server side, implement >>> tp_getattro, comparing addresses of interned strings. >> >> I think Martin has a point there. Why not just use a custom attribute on >> callables that hold a PyCapsule?
Whenever we see inside of a Cython >> implemented function that an object variable that was retrieved from the >> outside, either as a function argument or as the result of a function call, >> is being called, we try to unpack a C function pointer from it on all >> assignments to the variable. If that works, we can scan for a suitable >> signature (either right away or lazily on first access) and cache that. On >> each subsequent call through that variable, the cached C function will be used. >> >> That means we'd replace Python variables that are being called by multiple >> local variables, one that holds the object and one for each C function with >> a different signature that it is being called with. We set the C function >> variables to NULL when the Python function variable is being assigned to. >> When the C function variable is NULL on call, we scan for a matching >> signature and assign it to the variable. When no matching signature can be >> found, we set it to (void*)-1. >> >> Additionally, we allow explicit user casts of Python objects to C function >> types, which would then try to unpack the C function, raising a TypeError >> on mismatch. >> >> Assignments to callable variables can be expected to occur much less >> frequently than calls to them, so this will give us a good trade-off in >> most cases. I don't see why this kind of caching would be any slower inside >> of loops than what we were discussing so far. > > This works really well for local variables, but for globals, def > methods or callbacks as attributes, this won't work so well, as they > may be rebound at any time outside of the module scope. Only half true for globals, which can be declared "cdef object", e.g. for imported names. That would allow Cython to see all possible reassignments in a module, which would then apply the above scheme. I don't think def methods are a use case for this because you'd either cpdef them or even cdef them if you want speed. If you want them to be overridable, you'll have to live with the speed penalty that that implies. For object attributes, you have to pay the penalty of a lookup anyway, no way around that. We can't even cache anything here (e.g. with a borrowed reference) because the attribute may be rebound to another object that happens to live at the same address as the previous one. However, if you want speed, you'd do it as in CPython and assign the object to a local variable to pay the lookup of only once. Problem solved. > I think in > general Cython code could be easily sped up for most cases by provided > a really fast dispatch mechanism here. I feel inclined to doubt that by now. Stefan From d.s.seljebotn at astro.uio.no Thu May 17 09:12:17 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 17 May 2012 09:12:17 +0200 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: <4FB49604.8070000@behnel.de> References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de> <4FB49604.8070000@behnel.de> Message-ID: Stefan Behnel wrote: >mark florisson, 16.05.2012 21:49: >> On 16 May 2012 20:15, Stefan Behnel wrote: >>> "Martin v. L?wis", 16.05.2012 20:33: >>>>> Does this use case make sense to everyone? >>>>> >>>>> The reason why we are discussing this on python-dev is that we are >looking >>>>> for a general way to expose these C level signatures within the >Python >>>>> ecosystem. 
And Dag's idea was to expose them as part of the type >object, >>>>> basically as an addition to the current Python level tp_call() >slot. >>>> >>>> The use case makes sense, yet there is also a long-standing >solution >>>> already to expose APIs and function pointers: the capsule objects. >>>> >>>> If you want to avoid dictionary lookups on the server side, >implement >>>> tp_getattro, comparing addresses of interned strings. >>> >>> I think Martin has a point there. Why not just use a custom >attribute on >>> callables that hold a PyCapsule? Whenever we see inside of a Cython >>> implemented function that an object variable that was retrieved from >the >>> outside, either as a function argument or as the result of a >function call, >>> is being called, we try to unpack a C function pointer from it on >all >>> assignments to the variable. If that works, we can scan for a >suitable >>> signature (either right away or lazily on first access) and cache >that. On >>> each subsequent call through that variable, the cached C function >will be used. >>> >>> That means we'd replace Python variables that are being called by >multiple >>> local variables, one that holds the object and one for each C >function with >>> a different signature that it is being called with. We set the C >function >>> variables to NULL when the Python function variable is being >assigned to. >>> When the C function variable is NULL on call, we scan for a matching >>> signature and assign it to the variable. When no matching signature >can be >>> found, we set it to (void*)-1. >>> >>> Additionally, we allow explicit user casts of Python objects to C >function >>> types, which would then try to unpack the C function, raising a >TypeError >>> on mismatch. >>> >>> Assignments to callable variables can be expected to occur much less >>> frequently than calls to them, so this will give us a good trade-off >in >>> most cases. I don't see why this kind of caching would be any slower >inside >>> of loops than what we were discussing so far. >> >> This works really well for local variables, but for globals, def >> methods or callbacks as attributes, this won't work so well, as they >> may be rebound at any time outside of the module scope. > >Only half true for globals, which can be declared "cdef object", e.g. >for >imported names. That would allow Cython to see all possible >reassignments >in a module, which would then apply the above scheme. > >I don't think def methods are a use case for this because you'd either >cpdef them or even cdef them if you want speed. If you want them to be >overridable, you'll have to live with the speed penalty that that >implies. > >For object attributes, you have to pay the penalty of a lookup anyway, >no >way around that. We can't even cache anything here (e.g. with a >borrowed >reference) because the attribute may be rebound to another object that >happens to live at the same address as the previous one. However, if >you >want speed, you'd do it as in CPython and assign the object to a local >variable to pay the lookup of only once. Problem solved. 'Problem solved' by pushing the work over to the user? By that line of argument, why not just kill of Cython and require users to write C? Hyperbole aside; do you really believe it is worth dropping a relatively easy optimization just to make the C level code more to the taste of some python-dev posters? Dag > > >> I think in >> general Cython code could be easily sped up for most cases by >provided >> a really fast dispatch mechanism here. 
> >I feel inclined to doubt that by now. > >Stefan >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From stefan_ml at behnel.de Thu May 17 09:36:52 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 17 May 2012 09:36:52 +0200 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de> <4FB49604.8070000@behnel.de> Message-ID: <4FB4AA94.7060505@behnel.de> Dag Sverre Seljebotn, 17.05.2012 09:12: > Stefan Behnel wrote: >> mark florisson, 16.05.2012 21:49: >>> On 16 May 2012 20:15, Stefan Behnel wrote: >>>> Why not just use a custom attribute on callables that hold a >>>> PyCapsule? Whenever we see inside of a Cython implemented function >>>> that an object variable that was retrieved from the outside, >>>> either as a function argument or as the result of a function call, >>>> is being called, we try to unpack a C function pointer from it on >>>> all assignments to the variable. If that works, we can scan for a >>>> suitable signature (either right away or lazily on first access) >>>> and cache that. On each subsequent call through that variable, >>>> the cached C function will be used. >>>> >>>> That means we'd replace Python variables that are being called by >>>> multiple local variables, one that holds the object and one for each C >>>> function with a different signature that it is being called with. We >>>> set the C function variables to NULL when the Python function variable >>>> is being assigned to. >>>> When the C function variable is NULL on call, we scan for a matching >>>> signature and assign it to the variable. When no matching signature >>>> can be found, we set it to (void*)-1. >>>> >>>> Additionally, we allow explicit user casts of Python objects to C >>>> function types, which would then try to unpack the C function, raising >>>> a TypeError on mismatch. >>>> >>>> Assignments to callable variables can be expected to occur much less >>>> frequently than calls to them, so this will give us a good trade-off >>>> in most cases. I don't see why this kind of caching would be any slower >>>> inside of loops than what we were discussing so far. >>> >>> This works really well for local variables, but for globals, def >>> methods or callbacks as attributes, this won't work so well, as they >>> may be rebound at any time outside of the module scope. >> >> Only half true for globals, which can be declared "cdef object", e.g. >> for imported names. That would allow Cython to see all possible >> reassignments in a module, which would then apply the above scheme. >> >> I don't think def methods are a use case for this because you'd either >> cpdef them or even cdef them if you want speed. If you want them to be >> overridable, you'll have to live with the speed penalty that that >> implies. >> >> For object attributes, you have to pay the penalty of a lookup anyway, >> no way around that. We can't even cache anything here (e.g. with a >> borrowed reference) because the attribute may be rebound to another >> object that happens to live at the same address as the previous one. 
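The address-reuse problem mentioned at this point in the quote is why an identity-guarded cache has to own a reference to the object it guards; as a sketch (again using the hypothetical unpack_native() helper from earlier):

    typedef double (*binop_t)(double, double);
    #define NO_NATIVE ((binop_t)-1)

    binop_t unpack_native(PyObject *callable);   /* from the earlier sketch */

    /* Holding a strong reference keeps the cached object alive, so its
       address cannot be recycled for a different object behind our back. */
    static PyObject *cached_obj;   /* owned reference */
    static binop_t cached_fn;

    static binop_t guarded_lookup(PyObject *current)
    {
        if (current != cached_obj) {             /* rebound: rescan */
            Py_XDECREF(cached_obj);
            Py_INCREF(current);
            cached_obj = current;
            cached_fn = unpack_native(current);
            if (cached_fn == NULL)
                cached_fn = NO_NATIVE;
        }
        return cached_fn;
    }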
>> However, if you want speed, you'd do it as in CPython and assign the >> object to a local variable to pay the lookup of only once. Problem >> solved. > > 'Problem solved' by pushing the work over to the user? By that line > of argument, why not just kill of Cython and require users to write C? What part of the work does the above proposal push to the user? To make it explicit that an object attribute or a global variable is not expected to change during whatever a loop does? Well, yes. If the user knows that, a global cdef or an assignment to a local variable is the easiest, safest, fastest and most obvious way to tell Cython that it should take advantage of it. Why invent yet another declaration for this? > Hyperbole aside; do you really believe it is worth dropping a relatively > easy optimization just to make the C level code more to the taste of > some python-dev posters? I find the above much easier for all sides. It's easier to implement for us and others, it doesn't have any impact on CPython and I also find it easier to understand for users. Besides, I was only responding to Mark's remarks (pun not intended) about the few cases where this may not immediately yield the expected advantage. They are easy to fix, that's all I was saying. In most cases, this simple scheme will do the right thing without any user interaction, and it does not require any changes or future constraints on CPython. So, why not just implement this for now and *then* re-evaluate if we really need more, and if we can really do better? Stefan From markflorisson88 at gmail.com Thu May 17 11:15:15 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 17 May 2012 10:15:15 +0100 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: <4FB49604.8070000@behnel.de> References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de> <4FB49604.8070000@behnel.de> Message-ID: On 17 May 2012 07:09, Stefan Behnel wrote: > mark florisson, 16.05.2012 21:49: >> On 16 May 2012 20:15, Stefan Behnel wrote: >>> "Martin v. L?wis", 16.05.2012 20:33: >>>>> Does this use case make sense to everyone? >>>>> >>>>> The reason why we are discussing this on python-dev is that we are looking >>>>> for a general way to expose these C level signatures within the Python >>>>> ecosystem. And Dag's idea was to expose them as part of the type object, >>>>> basically as an addition to the current Python level tp_call() slot. >>>> >>>> The use case makes sense, yet there is also a long-standing solution >>>> already to expose APIs and function pointers: the capsule objects. >>>> >>>> If you want to avoid dictionary lookups on the server side, implement >>>> tp_getattro, comparing addresses of interned strings. >>> >>> I think Martin has a point there. Why not just use a custom attribute on >>> callables that hold a PyCapsule? Whenever we see inside of a Cython >>> implemented function that an object variable that was retrieved from the >>> outside, either as a function argument or as the result of a function call, >>> is being called, we try to unpack a C function pointer from it on all >>> assignments to the variable. If that works, we can scan for a suitable >>> signature (either right away or lazily on first access) and cache that. On >>> each subsequent call through that variable, the cached C function will be used. 
> Hyperbole aside; do you really believe it is worth dropping a relatively
> easy optimization just to make the C level code more to the taste of
> some python-dev posters?

I find the above much easier for all sides. It's easier to implement for us
and others, it doesn't have any impact on CPython and I also find it easier
to understand for users.

Besides, I was only responding to Mark's remarks (pun not intended) about
the few cases where this may not immediately yield the expected advantage.
They are easy to fix, that's all I was saying. In most cases, this simple
scheme will do the right thing without any user interaction, and it does
not require any changes or future constraints on CPython.

So, why not just implement this for now and *then* re-evaluate if we really
need more, and if we can really do better?

Stefan

From markflorisson88 at gmail.com  Thu May 17 11:15:15 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 17 May 2012 10:15:15 +0100
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: <4FB49604.8070000@behnel.de>
References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de>
 <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no>
 <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de>
 <4FB49604.8070000@behnel.de>
Message-ID:

On 17 May 2012 07:09, Stefan Behnel wrote:
> mark florisson, 16.05.2012 21:49:
>> On 16 May 2012 20:15, Stefan Behnel wrote:
>>> "Martin v. Löwis", 16.05.2012 20:33:
>>> [the capsule discussion and the full function-pointer caching
>>> proposal, quoted verbatim; snipped -- see the message above]
>>
>> This works really well for local variables, but for globals, def
>> methods or callbacks as attributes, this won't work so well, as they
>> may be rebound at any time outside of the module scope.
>
> Only half true for globals, which can be declared "cdef object", e.g. for
> imported names. That would allow Cython to see all possible reassignments
> in a module, which would then apply the above scheme.

I suppose by default they could be properties of a module subclass. That
would also allow faster lookup of globals visible from Python in Cython
space in the same module (but probably slower from outside).

> I don't think def methods are a use case for this because you'd either
> cpdef them or even cdef them if you want speed. If you want them to be
> overridable, you'll have to live with the speed penalty that that implies.

Which means you can no longer pass stuff around as a callback, but you need
to define an interface in Cython and have people pass around objects on
which you call methods. That is often less Pythonic, and it restricts
people to using Cython for all their code. What you want is something that
is fast when given a Cython callable, but which still works when I write my
stuff in Python. Having to inherit from some cdef class and override its
cpdef method just to pass a callback to other code from Python is a chore
and an unnecessary burden.

We need to stop sacrificing our design decisions for speed. Speed should be
obtained through clever compiler or interpreter design, not by telling
users to rewrite their code in a specific way that fits the current
incapabilities of the compiler.

> For object attributes, you have to pay the penalty of a lookup anyway, no
> way around that.

Not in a cdef class. But even in a cdef class any subclass method can
rebind your attribute at any time. We currently have the same problem with
memoryviews, which have to check whether they are initialized for every
access.

> We can't even cache anything here (e.g. with a borrowed reference)
> because the attribute may be rebound to another object that happens to
> live at the same address as the previous one. However, if you want speed,
> you'd do it as in CPython and assign the object to a local variable to
> pay the lookup cost only once. Problem solved.
>
>> I think in general Cython code could be easily sped up for most cases
>> by providing a really fast dispatch mechanism here.
>
> I feel inclined to doubt that by now.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From markflorisson88 at gmail.com  Thu May 17 11:26:41 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 17 May 2012 10:26:41 +0100
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: <4FB4AA94.7060505@behnel.de>
References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de>
 <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no>
 <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de>
 <4FB49604.8070000@behnel.de> <4FB4AA94.7060505@behnel.de>
Message-ID:

On 17 May 2012 08:36, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 17.05.2012 09:12:
>> [the preceding exchange, including the full caching proposal, quoted
>> verbatim once more; snipped]
>
> So, why not just implement this for now and *then* re-evaluate if we
> really need more, and if we can really do better?
>
> Stefan

Hm, I think we should implement fast dispatch first, and if an additional
optimization with hoisted function pointer unpacking leads to
non-negligible performance gains, we can just implement both. I don't think
python-dev cares much about C-level interfaces, and Martin is right that we
can just do the same thing through metaclasses, which would be portable
across versions and just as fast (probably :).

From stefan_ml at behnel.de  Thu May 17 12:03:34 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 17 May 2012 12:03:34 +0200
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To:
References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de>
 <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no>
 <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de>
 <4FB49604.8070000@behnel.de>
Message-ID: <4FB4CCF6.6020102@behnel.de>

mark florisson, 17.05.2012 11:15:
> On 17 May 2012 07:09, Stefan Behnel wrote:
>> mark florisson, 16.05.2012 21:49:
>>> On 16 May 2012 20:15, Stefan Behnel wrote:
>>>> "Martin v. Löwis", 16.05.2012 20:33:
>>>>>> Does this use case make sense to everyone?
>>>>>> [...]
>>>>>
>>>>> The use case makes sense, yet there is also a long-standing solution
>>>>> already to expose APIs and function pointers: the capsule objects.
>>>>>
>>>>> If you want to avoid dictionary lookups on the server side, implement
>>>>> tp_getattro, comparing addresses of interned strings.
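In C, that fast path would look roughly like this (a sketch only; the
attribute name and the helper are invented for illustration):

    /* interned once, e.g. at module init:
       interned_sig_str = PyString_InternFromString("_c_signature"); */
    static PyObject *interned_sig_str;

    static PyObject *
    mytype_getattro(PyObject *obj, PyObject *name)
    {
        /* CPython interns attribute names, so a pointer comparison
           replaces the dictionary lookup on the hot path */
        if (name == interned_sig_str)
            return get_signature_capsule(obj);   /* invented helper */
        return PyObject_GenericGetAttr(obj, name);
    }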
>>>> I think Martin has a point there. Why not just use a custom attribute
>>>> on callables that hold a PyCapsule?
>>>> [the rest of the caching proposal, quoted verbatim; snipped]
>>>
>>> This works really well for local variables, but for globals, def
>>> methods or callbacks as attributes, this won't work so well, as they
>>> may be rebound at any time outside of the module scope.
>>
>> Only half true for globals, which can be declared "cdef object", e.g. for
>> imported names. That would allow Cython to see all possible reassignments
>> in a module, which would then apply the above scheme.
>
> I suppose by default they could be properties of a module subclass.
> That would also allow faster lookup of globals visible from Python in
> Cython space in the same module (but probably slower from outside).

Yes, that's another way to do it and yet another nice feature (which we've
already been throwing into the discussions for years and years...)

>> I don't think def methods are a use case for this because you'd either
>> cpdef them or even cdef them if you want speed. If you want them to be
>> overridable, you'll have to live with the speed penalty that that
>> implies.
>
> Which means you can no longer pass stuff around as a callback

Callbacks are not a problem because they behave like any other object
that's being held in a variable (i.e. they are the normal case, not the
exception). I was referring to globally defined def functions which can be
reassigned in the module. That's a problem, but it's mostly the same as
with any global name.

>>> For object attributes, you have to pay the penalty of a lookup anyway,
>>> no way around that.
>>
>> Not in a cdef class.

Sure, also in cdef classes. Nothing keeps me from reassigning to the
attribute of a cdef class. The difference is only that the attribute lookup
is faster for them because it passes through a pointer indirection instead
of a dict lookup. Apart from that, it's entirely the same thing.

No, actually, it's even worse because you can't just hook into the dict (as
Vitja did recently) and check if it has changed.
You actually need to read the attribute again and then look up its C
functions again, because even if the object pointer is the same, it doesn't
mean that the object is the same, unless you keep an owned reference to it
(which you can't without keeping the object alive). So there is even less
of a chance for efficient caching.

Stefan

From stefan_ml at behnel.de  Thu May 17 12:14:02 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 17 May 2012 12:14:02 +0200
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To:
References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de>
 <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no>
 <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de>
 <4FB49604.8070000@behnel.de> <4FB4AA94.7060505@behnel.de>
Message-ID: <4FB4CF6A.5040102@behnel.de>

mark florisson, 17.05.2012 11:26:
> On 17 May 2012 08:36, Stefan Behnel wrote:
>> [the full exchange about the caching proposal, quoted verbatim once
>> more; snipped]
>
> Hm, I think we should implement fast dispatch first
Sure, the one builds on the other. The question is only how you'd get at
the pointer to the signatures. I say, a PyCapsule in an attribute will do
in most cases.
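Concretely, I'm thinking of something as simple as this (a sketch; the
attribute name and the "d(dd)" signature encoding are made up, and whether
a Cython def function accepts arbitrary attributes today is its own
question):

    from cpython.pycapsule cimport PyCapsule_New

    cdef double c_add(double a, double b):
        return a + b

    def add(a, b):
        return a + b

    # attach the C-level entry point to the Python callable
    add.__c_signatures__ = PyCapsule_New(<void *> c_add, "d(dd)", NULL)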
So, basically, I suggest to implement a fast, cached dispatch on top of a
simple function object attribute first. Then we test and benchmark that,
then we can decide what else we need. Once this infrastructure is
implemented, adapting it to other ways of finding the signature dispatcher
for a given function object will be trivial. And having it available will
allow us to state exactly what the performance advantage of each such
approach is and to make a case why (or why not) we need to change something
outside of Cython in order to get it.

Stefan

From markflorisson88 at gmail.com  Thu May 17 13:30:06 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 17 May 2012 12:30:06 +0100
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: <4FB4CCF6.6020102@behnel.de>
References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de>
 <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no>
 <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de>
 <4FB49604.8070000@behnel.de> <4FB4CCF6.6020102@behnel.de>
Message-ID:

On 17 May 2012 11:03, Stefan Behnel wrote:
> mark florisson, 17.05.2012 11:15:
>> [...]
>
> Callbacks are not a problem because they behave like any other object
> that's being held in a variable (i.e. they are the normal case, not the
> exception). I was referring to globally defined def functions which can
> be reassigned in the module. That's a problem, but it's mostly the same
> as with any global name.

Oh, I see what you were referring to now.
I think Vitja already implemented something like the inline def calls,
although I'm not sure what the status of that is.

>>> For object attributes, you have to pay the penalty of a lookup anyway,
>>> no way around that.
>>
>> Not in a cdef class.
>
> Sure, also in cdef classes. Nothing keeps me from reassigning to the
> attribute of a cdef class. The difference is only that the attribute
> lookup is faster for them because it passes through a pointer indirection
> instead of a dict lookup. Apart from that, it's entirely the same thing.

I guess we're talking about different things again, I thought you meant
dict lookups. Basically, with the default of none-checking disabled, a cdef
class attribute lookup is just a struct attribute reference.

Anyway, my point was that caching pointers is a good idea, but it only
works in limited cases, and we shouldn't limit the programming model of
users to enable fast calls. But if everyone agrees we need fast
dispatching, it seems we're already on the same page :)

> [...]
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From markflorisson88 at gmail.com  Thu May 17 13:34:36 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Thu, 17 May 2012 12:34:36 +0100
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: <4FB4CF6A.5040102@behnel.de>
References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de>
 <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no>
 <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de>
 <4FB49604.8070000@behnel.de> <4FB4AA94.7060505@behnel.de>
 <4FB4CF6A.5040102@behnel.de>
Message-ID:

On 17 May 2012 11:14, Stefan Behnel wrote:
> mark florisson, 17.05.2012 11:26:
>> [the earlier messages, quoted verbatim once more; snipped]
>>
>> Hm, I think we should implement fast dispatch first
>
> Sure, the one builds on the other. The question is only how you'd get at
> the pointer to the signatures. I say, a PyCapsule in an attribute will do
> in most cases.
>
> So, basically, I suggest to implement a fast, cached dispatch on top of a
> simple function object attribute first. Then we test and benchmark that,
> then we can decide what else we need. Once this infrastructure is
> implemented, adapting it to other ways of finding the signature
> dispatcher for a given function object will be trivial. And having it
> available will allow us to state exactly what the performance advantage
> of each such approach is and to make a case why (or why not) we need to
> change something outside of Cython in order to get it.

I guess that's a good idea, although I would suggest typechecking for
CythonFunction and giving it a pointer to a list of signatures (or maybe
make it variable sized and put the signatures directly into the function).
If it works well we can plunge ahead and generalize it for arbitrary types.
(I think in any case we only needed to change things outside of Cython to
standardize stuff across projects, not because we technically need it.)
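Roughly like this, I mean (purely illustrative; the names and the "d(dd)"
encoding are invented for the example):

    # a NULL-terminated signature table carried by the function object
    ctypedef struct CySignature:
        char *encoded      # e.g. "d(dd)" for double (double, double)
        void *func_ptr     # the C entry point implementing that signature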
From stefan_ml at behnel.de  Thu May 17 14:58:43 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 17 May 2012 14:58:43 +0200
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To:
References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de>
 <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no>
 <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de>
 <4FB49604.8070000@behnel.de> <4FB4AA94.7060505@behnel.de>
 <4FB4CF6A.5040102@behnel.de>
Message-ID: <4FB4F603.4040501@behnel.de>

mark florisson, 17.05.2012 13:34:
> On 17 May 2012 11:14, Stefan Behnel wrote:
>> [the preceding messages, quoted verbatim once more; snipped]
>
> I guess that's a good idea, although I would suggest typechecking for
> CythonFunction and giving it a pointer to a list of signatures (or
> maybe make it variable sized and put the signatures directly into the
> function).

Either of the two will do, I think. There will be a slight performance
difference when the CyFunction comes from another module, but that would
only apply to the lookup.
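That is, the dispatch would first take a cheap exact-type fast path and
only then fall back to the generic attribute (a sketch; 'CythonFunction',
'scan_signatures' and the attribute name are all placeholders):

    # fast path: exact type check, then direct access to the table
    if type(f) is CythonFunction:
        sig = scan_signatures(f, "d(dd)")
    else:
        # generic fallback: look for a capsule on an agreed attribute
        capsule = getattr(f, "__c_signatures__", None)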
I think it's a good idea to start by only supporting CyFunction to get it
working, then add another fallback for a function attribute, then take a
look at other things.

> If it works well we can plunge ahead and generalize it for arbitrary
> types. (I think in any case we only needed to change things outside of
> Cython to standardize stuff across projects, not because we technically
> need it).

Absolutely.

Stefan
From d.s.seljebotn at astro.uio.no  Thu May 17 19:55:21 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Thu, 17 May 2012 19:55:21 +0200
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: <4FB4F603.4040501@behnel.de>
References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de>
 <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no>
 <4FB3F2F8.8060207@v.loewis.de> <4FB3FCEE.5020405@behnel.de>
 <4FB49604.8070000@behnel.de> <4FB4AA94.7060505@behnel.de>
 <4FB4CF6A.5040102@behnel.de> <4FB4F603.4040501@behnel.de>
Message-ID: <4FB53B89.4040501@astro.uio.no>

I don't know where to put this, so I'll put it up top: I myself think this
talk about implementing caching went a bit overboard. Here's a performance
ladder for you:

Alternative A) Focus on fast lookup; go from a 100 ns function call to a
5 ns function call.

Alternative B) Focus on caching a 20 ns lookup; go from 100 ns overhead to
a 2 ns function call (*any* function call through a pointer has a tiny bit
of overhead according to my benchmark).

Alternative C) Early binding at compile time ("cdef extern from ...");
essentially no overhead (if from the C standard library, the compiler
might even inline it).
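(Alternative C, for reference, is just today's early binding, e.g.:)

    # the callee is resolved at compile time; no lookup happens at all
    cdef extern from "math.h":
        double sin(double x)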
Now, is there a need for B)? If you're not happy with a 5 ns function call,
won't you always then go to alternative C)? Who's going to say "I can't
live with a 5 ns penalty, but 2 ns is OK, thank you so much for
implementing function pointer caching"?

To me it seems much simpler to just focus on a fast lookup that will
automatically be fast everywhere, than to keep chasing different cases that
must be cached because the lookup is slow. What we're trying to kill is
that 100-200 ns Python call. I simply don't think caching function pointers
will *ever* be needed, if the lookup can be done fast. And that seems like
a much simpler design. (Or, perhaps if we actually JIT-compile the call
site. But that's a bridge we cross when we get there...)

More follows:

On 05/17/2012 02:58 PM, Stefan Behnel wrote:
> [...]
> I think it's a good idea to start by only supporting CyFunction to get it
> working, then add another fallback for a function attribute, then take a
> look at other things.

The whole reason I brought this up is because Numba and SciPy are going to
need this to talk between themselves. And Travis has a hand in both of
those. One semi-standard will happen there one way or the other; I just
wanted it to be something that is really beneficial to Cython as well, and
allows Cython to be a player here. Obviously, a CyFunction is not what the
manually written C code in SciPy is going to play along with. And if that
happens anyway, why care about CyFunction?

Dag

From d.s.seljebotn at astro.uio.no  Fri May 18 10:30:14 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Fri, 18 May 2012 10:30:14 +0200
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To:
References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de>
 <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no>
 <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz>
 <4FB55E24.3090006@astro.uio.no>
Message-ID: <4FB60896.4030702@astro.uio.no>

On 05/18/2012 12:57 AM, Nick Coghlan wrote:
> I think the main things we'd be looking for would be:
> - a clear explanation of why a new metaclass is considered too complex a
> solution
> - what the implications are for classes that have nothing to do with the
> SciPy/NumPy ecosystem
> - how subclassing would behave (both at the class and metaclass level)
>
> Yes, defining a new metaclass for fast signature exchange has its
> challenges - but it means that *our* concerns about maintaining
> consistent behaviour in the default object model and avoiding adverse
> effects on code that doesn't need the new behaviour are addressed
> automatically.
>
> Also, I'd consider a functioning reference implementation using a custom
> metaclass a requirement before we considered modifying type anyway, so I
> think that's the best thing to pursue next rather than a PEP. It also
> has the virtue of letting you choose which Python versions to target and
> iterating at a faster rate than CPython.

This seems right on target. I could make a utility code C header for such
a metaclass, and then the different libraries can all include it and
handshake on which implementation becomes the real one through sys.modules
during module initialization. That way an eventual PEP will only be a
natural incremental step to make things more polished, whether that
happens by making such a metaclass part of the standard library or by
extending PyTypeObject.

Thanks,
Dag

From robertwb at gmail.com  Sun May 20 10:07:42 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sun, 20 May 2012 01:07:42 -0700
Subject: [Cython] [cython] Python array support (#113)
In-Reply-To: <4FB3D71F.3080602@behnel.de>
References: <4FB3D71F.3080602@behnel.de>
Message-ID:

On Wed, May 16, 2012 at 9:34 AM, Stefan Behnel wrote:
> Andreas van Cranenburgh, 16.05.2012 18:15:
>> Any news on this? Let me know if there's anything I can do to help
>> inclusion of this patch.
>
> Could someone please take over here?
>
> https://github.com/cython/cython/pull/113
>
> I haven't merged this yet and won't have the time to do it soonish. What
> I'd like to see happen is to get the current header file replaced by
> utility code "somehow". Not sure how that "somehow" is going to work.
>
> Basically, if this can be solved, I'd love to have it in for 0.17.
> Otherwise, well, not ...

I've merged this and fixed it up somewhat. I didn't realize anonymous
union members are a GNU (and now C11) extension, which this uses, so
there's that caveat.
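(The construct in question is roughly this, for anyone unfamiliar --
illustrative field names only:)

    /* an anonymous union member: accepted by GNU C and C11,
       but not by strict C89/C99 compilers */
    typedef struct {
        Py_ssize_t length;
        union {
            double *as_doubles;
            long   *as_longs;
        };              /* no member name: fields are accessed directly
                           as obj.as_doubles / obj.as_longs */
    } arraydata;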
I think it's good enough (and useful enough) to go in; we can continue to
improve things from here.

- Robert

From markflorisson88 at gmail.com  Sun May 20 16:03:09 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 20 May 2012 15:03:09 +0100
Subject: [Cython] gsoc: array expressions
Message-ID:

Hey,

For my gsoc we already have some simple initial ideas, i.e. elementwise
vector expressions (a + b with a and b arrays with arbitrary rank); I
don't think these need any discussion. However, there are a lot of things
that haven't been formally discussed on the mailing list, so here goes.

Frédéric, I am CCing you since you expressed interest on the numpy mailing
list, and I think your insights as a Theano developer can be very helpful
in this discussion.

User Interface
===========
Besides simple array expressions for dense arrays I would like a mechanism
for "custom ufuncs", although to a different extent to what Numpy or Numba
provide. There are several ways in which we could want them, e.g. as typed
(cdef, or external C) functions, as lambdas or Python functions in the
same module, or as general objects (e.g. functions Cython doesn't know
about). To achieve maximum efficiency it will likely be good to allow
sharing these functions in .pxd files. We have 'cdef inline' functions,
but I would prefer annotated def functions where the parameters are
specialized on demand, e.g.

    @elemental
    def add(a, b):
        # elemental functions can have any number of arguments and
        # operate on any compatible dtype
        return a + b

When calling cdef functions or elemental functions with memoryview
arguments, the arguments perform a (broadcasted) elementwise operation.
Alternatively, we can have a parallel.elementwise function which maps the
function elementwise, which would also work for object callables. I prefer
the former, since I think it will read much easier.
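For concreteness, a call would then look something like this (hypothetical
semantics; none of this exists yet):

    # 'add' is the @elemental function above
    def combine(double[:, :] x, double[:, :] y):
        cdef double[:, :] out = add(x, y)   # broadcasted, elementwise
        return out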
Optionally, we could store a pickled graph representation (or even compiled theano function?), and provide it as an optional specialization at runtime (but mapping back correctly to memoryviews where needed, etc). As Numba matures, a numba runtime specialization could optionally be provided. As for the implementation of the C specializations, I currently think we should implement our own, since theano heavily uses the numpy C API, and since its easier to have an optional theano runtime specialization anyway. I propose the following specializations, to be selected at runtime - vectorized contiguous, collapsed and aligned - this function can be called by a strided, inner dimension contiguous specialization - tiled (ndenumerate/nditerate) - tiled vectorized - plain C loops With 'aligned' it is not meant that the data itself should be aligned, but that they are aligned at the same (unaligned) offset. A runtime compiler could probably do much better here and allow for shuffling in the vectorized code for a minimal subset of the operands. Maybe it would be useful to provide a vectorized version where each operand is shuffled and the shuffle arguments are created up front? That might still be faster than non-vectorized... Anyway, the most important part would be tiling and proper memory access patterns. Which specialization is picked depends on a) which flags were passed to Cython, b) the runtime memory layout and c) what macros were defined when the Cython module was compiled. Criticism and constructive discussion welcome :) Cheers, Mark (heading out for lunch now) From markflorisson88 at gmail.com Sun May 20 16:10:53 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 20 May 2012 15:10:53 +0100 Subject: [Cython] gsoc: array expressions In-Reply-To: References: Message-ID: On 20 May 2012 15:03, mark florisson wrote: > Hey, > > For my gsoc we already have some simple initial ideas, i.e. > elementwise vector expressions (a + b with a and b arrays with > arbitrary rank), I don't think these need any discussion. However, > there are a lot of things that haven't been formally discussed on the > mailing list, so here goes. > > Fr?d?ric, I am CCing you since you expressed interest on the numpy > mailing list, and I think your insights as a Theano developer can be > very helpful in this discussion. > > User Interface > =========== > Besides simple array expressions for dense arrays I would like a > mechanism for "custom ufuncs", although to a different extent to what > Numpy or Numba provide. There are several ways in which we could want > them, e.g. as typed functions (cdef, or external C) functions, as > lambas or Python functions in the same module, or as general objects > (e.g. functions Cython doesn't know about). > To achieve maximum efficiency it will likely be good to allow sharing > these functions in .pxd files. We have 'cdef inline' functions, but I > would prefer annotated def functions where the parameters are > specialized on demand, e.g. > > @elemental > def add(a, b): # elemental functions can have any number of arguments > and operate on any compatible dtype > ? ?return a + b > > When calling cdef functions or elemental functions with memoryview > arguments, the arguments perform a (broadcasted) elementwise > operation. Alternatively, we can have a parallel.elementwise function > which maps the function elementwise, which would also work for object > callables. I prefer the former, since I think it will read much > easier. 
> > Secondly, we can have a reduce function (and maybe a scan function), > that reduce (respectively scan) in a specified axis or number of axes. > E.g. > > ? ?parallel.reduce(add, a, b, axis=(0, 2)) > > where the default for axis is "all axes". As for the default value, > this could be perhaps optionally provided to the elemental decorator. > Otherwise, the reducer will have to get the default values from each > dimension that is reduced in, and then skip those values when > reducing. (Of course, the reducer function must be associate and > commutative). Also, a lambda could be passed in instead of an > elementwise or typed cdef function. > > Finally, we would have a parallel.nditer/ndenumerate/nditerate > function, which would iterate over N memoryviews, and provide a > sensible memory access pattern (like numpy.nditer). I'm not sure if it > should provide only the indices, or also the values. e.g. an inplace > elementwise add would read as follows: > > ? ?for i, j, k in parallel.nditerate(A, B): > ? ? ? ?A[i, j, k] += B[i, j, k] > > Implementation > =========== > Fr?d?ric, feel free to correct me at any point here :) > > As for the implementation, I think it will be a good idea to at least > reuse (optionally through command line flags) Theano's optimization > pipeline. I think it would be reasonably easy to build a Theano > expression graph (after fusing the expressions in Cython first), run > the Theano optimizations on that and map back to a Cython AST. > Optionally, we could store a pickled graph representation (or even > compiled theano function?), and provide it as an optional > specialization at runtime (but mapping back correctly to memoryviews > where needed, etc). As Numba matures, a numba runtime specialization > could optionally be provided. > > As for the implementation of the C specializations, I currently think > we should implement our own, since theano heavily uses the numpy C > API, and since its easier to have an optional theano runtime > specialization anyway. I propose the following specializations, to be > selected at runtime > > ? ?- vectorized contiguous, collapsed and aligned > ? ? ? ?- this function can be called by a strided, inner dimension > contiguous specialization > ? ?- tiled (ndenumerate/nditerate) > ? ?- tiled vectorized > ? ?- plain C loops > > With 'aligned' it is not meant that the data itself should be aligned, > but that they are aligned at the same (unaligned) offset. > A runtime compiler could probably do much better here and allow for > shuffling in the vectorized code for a minimal subset of the operands. > Maybe it would be useful to provide a vectorized version where each > operand is shuffled and the shuffle arguments are created up front? > That might still be faster than non-vectorized... Anyway, the most > important part would be tiling and proper memory access patterns. > > Which specialization is picked depends on a) which flags were passed > to Cython, b) the runtime memory layout and c) what macros were > defined when the Cython module was compiled. > > Criticism and constructive discussion welcome :) > > Cheers, > > Mark > (heading out for lunch now) This does not address code reuse yet, I believe Theano does not implement explicit vectorization or tiling for the CPU, correct? From that perspective, it makes more sense to implement this in Theano and reuse the code generation part (although it seems Theano is an even a larger beast than Cython). 
I sincerely believe that the larger part here would be code generation, so if another code backend is targeted (e.g. Numba), it may make more sense for such a project to reuse Theano's optimization pipeline, but not the code generation backend. From pav at iki.fi Sun May 20 22:59:58 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 20 May 2012 22:59:58 +0200 Subject: [Cython] N-d arrays, without a Python object Message-ID: Hi, # OK, but slow cdef double[:,:] a = np.zeros((10, 10), float) # OK, fast (no Python object) cdef double[10] a # OK, but slow, makes Python calls (-> cython.view array) cdef double[10*10] a_ cdef double[:,:] a = (a_) # not allowed cdef double[10,10] a Small N-d work arrays are quite often needed in numerical code, and I'm not aware of a way for conveniently getting them in Cython. Maybe the recently added improved memoryviews could allow for Python-less N-dim arrays? This may be reinveinting a certain language, but such a feature would find immediate practical use. -- Pauli Virtanen From markflorisson88 at gmail.com Sun May 20 23:19:49 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 20 May 2012 22:19:49 +0100 Subject: [Cython] N-d arrays, without a Python object In-Reply-To: References: Message-ID: On 20 May 2012 21:59, Pauli Virtanen wrote: > Hi, > > > ? ? ? ?# OK, but slow > ? ? ? ?cdef double[:,:] a = np.zeros((10, 10), float) > > ? ? ? ?# OK, fast (no Python object) > ? ? ? ?cdef double[10] a > > ? ? ? ?# OK, but slow, makes Python calls (-> cython.view array) > ? ? ? ?cdef double[10*10] a_ > ? ? ? ?cdef double[:,:] a = (a_) > > ? ? ? ?# not allowed > ? ? ? ?cdef double[10,10] a > > Small N-d work arrays are quite often needed in numerical code, and I'm > not aware of a way for conveniently getting them in Cython. > > Maybe the recently added improved memoryviews could allow for > Python-less N-dim arrays? This may be reinveinting a certain language, > but such a feature would find immediate practical use. > > -- > Pauli Virtanen > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Hey Pauli, Thanks for the feedback, that's actually really something I wanted as well, along with variable sized C arrays while we're at it. We think we can definitely make this work, probably with a syntax like 'cdef double[:10, :10] myview' for memoryviews. I'm not sure when I'll have the time to implement this, as I'm first going to focus on the gsoc, so I can't promise anything for 0.17. Cheers, Mark From d.s.seljebotn at astro.uio.no Mon May 21 12:34:40 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 21 May 2012 12:34:40 +0200 Subject: [Cython] gsoc: array expressions In-Reply-To: References: Message-ID: <4FBA1A40.2060202@astro.uio.no> On 05/20/2012 04:03 PM, mark florisson wrote: > Hey, > > For my gsoc we already have some simple initial ideas, i.e. > elementwise vector expressions (a + b with a and b arrays with > arbitrary rank), I don't think these need any discussion. However, > there are a lot of things that haven't been formally discussed on the > mailing list, so here goes. > > Fr?d?ric, I am CCing you since you expressed interest on the numpy > mailing list, and I think your insights as a Theano developer can be > very helpful in this discussion. 
From d.s.seljebotn at astro.uio.no Mon May 21 12:34:40 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Mon, 21 May 2012 12:34:40 +0200
Subject: [Cython] gsoc: array expressions
In-Reply-To:
References:
Message-ID: <4FBA1A40.2060202@astro.uio.no>

On 05/20/2012 04:03 PM, mark florisson wrote:
> Hey,
>
> For my gsoc we already have some simple initial ideas, i.e.
> elementwise vector expressions (a + b with a and b arrays with
> arbitrary rank), I don't think these need any discussion. However,
> there are a lot of things that haven't been formally discussed on the
> mailing list, so here goes.
>
> Frédéric, I am CCing you since you expressed interest on the numpy
> mailing list, and I think your insights as a Theano developer can be
> very helpful in this discussion.
>
> User Interface
> ===========
> Besides simple array expressions for dense arrays I would like a
> mechanism for "custom ufuncs", although to a different extent to what
> Numpy or Numba provide. There are several ways in which we could want
> them, e.g. as typed (cdef, or external C) functions, as
> lambdas or Python functions in the same module, or as general objects
> (e.g. functions Cython doesn't know about).
> To achieve maximum efficiency it will likely be good to allow sharing
> these functions in .pxd files. We have 'cdef inline' functions, but I
> would prefer annotated def functions where the parameters are
> specialized on demand, e.g.
>
> @elemental
> def add(a, b):  # elemental functions can have any number of arguments
>                 # and operate on any compatible dtype
>     return a + b
>
> When calling cdef functions or elemental functions with memoryview
> arguments, the arguments perform a (broadcasted) elementwise
> operation. Alternatively, we can have a parallel.elementwise function
> which maps the function elementwise, which would also work for object
> callables. I prefer the former, since I think it will read much
> easier.
>
> Secondly, we can have a reduce function (and maybe a scan function),
> that reduces (respectively scans) in a specified axis or number of axes.
[...]
> (Of course, the reducer function must be associative and
> commutative). Also, a lambda could be passed in instead of an

Only associative, right?

Sounds good to me.

> elementwise or typed cdef function.
>
> Finally, we would have a parallel.nditer/ndenumerate/nditerate
> function, which would iterate over N memoryviews, and provide a
> sensible memory access pattern (like numpy.nditer). I'm not sure if it
> should provide only the indices, or also the values. E.g. an in-place
> elementwise add would read as follows:
>
>     for i, j, k in parallel.nditerate(A, B):
>         A[i, j, k] += B[i, j, k]

I think this sounds good; I guess I don't see a particular reason for
"ndenumerate", I think code like the above is clearer.

It's perhaps worth at least thinking about how to support "for idx in
...", "A[idx[2], Ellipsis] = ...", i.e. an arbitrary number of
dimensions. Not in the first iteration though.

Putting it in "parallel" is nice because prange already has
out-of-order semantics.... But of course, there are performance
benefits even within a single thread because of the out-of-order
aspect. This should at least be a big NOTE box in the documentation.

> Implementation
> ===========
> Frédéric, feel free to correct me at any point here :)
>
> As for the implementation, I think it will be a good idea to at least
> reuse (optionally through command line flags) Theano's optimization
> pipeline. I think it would be reasonably easy to build a Theano
> expression graph (after fusing the expressions in Cython first), run
> the Theano optimizations on that and map back to a Cython AST.
> Optionally, we could store a pickled graph representation (or even a
> compiled theano function?), and provide it as an optional
> specialization at runtime (but mapping back correctly to memoryviews
> where needed, etc). As Numba matures, a numba runtime specialization
> could optionally be provided.

Can you enlighten us a bit about what Theano's optimizations involve?
You mention doing the iteration specializations yourself below, and
also the tiling...

Is it just "scalar" optimizations of the form "x**3 -> x * x * x" and
numeric stabilization like "log(1 + x) -> log1p(x)" that would be
provided by Theano?

If so, such optimizations should be done also for our scalar
computations, not just vector, right?

Or does Theano deal with the memory access patterns?

> As for the implementation of the C specializations, I currently think
> we should implement our own [...]
> With 'aligned' it is not meant that the data itself should be aligned,
> but that the operands are aligned at the same (unaligned) offset.

Sounds good.

About implementing this in Cython, would you simply turn

a[...] = b + c

into

for i, j, k in parallel.nditerate(a, b, c):
    a[i, j, k] = b[i, j, k] + c[i, j, k]

and then focus on optimizing nditerate? That seemed like the logical
path to me, but perhaps that doesn't play nicely with using Theano to
optimize the expression? (Again I need a clearer picture of what this
involves.)

(Yes, AMD and I guess probably Intel too seem to have moved towards
making unaligned MOV as fast as aligned MOV, so no need to worry about
that.)

> A runtime compiler could probably do much better here and allow for
> shuffling in the vectorized code for a minimal subset of the operands.
[...]

Dag
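For readers following the @elemental idea quoted above, a rough
pure-Python stand-in for the intended call behaviour, leaning on
NumPy's broadcasting (elemental itself is proposed, not an existing
Cython API):

    import numpy as np

    def elemental(func):
        # Stand-in only: NumPy operators already broadcast elementwise,
        # so the decorator can pass the function through unchanged.
        # Cython's version would instead compile specialized C loops
        # per argument dtype.
        return func

    @elemental
    def add(a, b):
        return a + b

    a = np.arange(12.0).reshape(3, 4)
    b = np.arange(4.0)          # broadcast against the last axis of a
    print(add(a, b).shape)      # -> (3, 4)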
From markflorisson88 at gmail.com Mon May 21 12:56:10 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 21 May 2012 11:56:10 +0100
Subject: [Cython] gsoc: array expressions
In-Reply-To: <4FBA1A40.2060202@astro.uio.no>
References: <4FBA1A40.2060202@astro.uio.no>
Message-ID:

On 21 May 2012 11:34, Dag Sverre Seljebotn wrote:
> On 05/20/2012 04:03 PM, mark florisson wrote:
[...]
>> (Of course, the reducer function must be associative and
>> commutative).
>
> Only associative, right?
>
> Sounds good to me.

Ah, I guess, because we can reduce thread-local results manually in a
specified (elementwise) order (I was thinking of generating
OpenMP-annotated loops, that can be enabled/disabled at the C level,
with an 'if' clause with a sensible lower bound of iterations
required).
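The kind of OpenMP-backed reduction Mark describes is already
expressible with Cython's prange, which also shows why associativity is
the real requirement; a minimal sketch:

    from cython.parallel import prange

    def total(double[:] a):
        cdef double s = 0
        cdef Py_ssize_t i
        # Cython treats the in-place += as a reduction: each thread
        # accumulates a private partial sum, and the partials are
        # combined when the loop ends, in no guaranteed order. That
        # reassociation is only valid because '+' is associative
        # (up to floating-point rounding).
        for i in prange(a.shape[0], nogil=True):
            s += a[i]
        return s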
> It's perhaps worth at least thinking about how to support "for idx in
> ...", "A[idx[2], Ellipsis] = ...", i.e. an arbitrary number of
> dimensions. Not in the first iteration though.

Yeah, definitely.

[...]
> Can you enlighten us a bit about what Theano's optimizations involve?
> You mention doing the iteration specializations yourself below, and
> also the tiling...
>
> Is it just "scalar" optimizations of the form "x**3 -> x * x * x" and
> numeric stabilization like "log(1 + x) -> log1p(x)" that would be
> provided by Theano?

Yes, it does those kinds of things, and it also eliminates common
subexpressions, and it transforms certain expressions to BLAS/LAPACK
functionality. I'm not sure we want that specifically. I'm thinking it
might be more fruitful to start off with a theano-only specialization,
and implement low-level code generation in Theano, and use that from
Cython by either directly dumping in the code, or deferring that to
Theano. At this point I'm not entirely sure.

> If so, such optimizations should be done also for our scalar
> computations, not just vector, right?
>
> Or does Theano deal with the memory access patterns?

I think it does so for the CUDA backend, but not for the C++ backend.
I think we need to discuss this stuff on the theano mailing list.

[...]

From markflorisson88 at gmail.com Mon May 21 13:08:33 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 21 May 2012 12:08:33 +0100
Subject: [Cython] gsoc: array expressions
In-Reply-To: <4FBA1A40.2060202@astro.uio.no>
References: <4FBA1A40.2060202@astro.uio.no>
Message-ID:

On 21 May 2012 11:34, Dag Sverre Seljebotn wrote:
> On 05/20/2012 04:03 PM, mark florisson wrote:
[...]
> If so, such optimizations should be done also for our scalar
> computations, not just vector, right?

As for this, I don't think CSE is important for scalar computations,
since if they are objects, you have to go through a Python layer since
you have no idea what the code does, and if they are C objects, the C
compiler will readily optimize that out. You want this for vector
expressions, since the computations may be expensive and the C compiler
may not optimize them. E.g. consider the silly example of
v1 * sum(A) + v2 * sum(A). It's more convenient to write than to
introduce a new variable manually.
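Spelled out with NumPy arrays standing in for the memoryviews (all
names here are illustrative), the hand-hoisted form Mark means is:

    import numpy as np

    A = np.random.rand(200, 200)
    v1 = np.random.rand(200)
    v2 = np.random.rand(200)

    # As written: sum(A) is evaluated twice unless something
    # eliminates the common subexpression.
    r_naive = v1 * A.sum() + v2 * A.sum()

    # With a manual temporary (what CSE would produce): one evaluation.
    s = A.sum()
    r_cse = v1 * s + v2 * s

    assert np.allclose(r_naive, r_cse)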
> About implementing this in Cython, would you simply turn
>
> a[...] = b + c
>
> into
>
> for i, j, k in parallel.nditerate(a, b, c):
>     a[i, j, k] = b[i, j, k] + c[i, j, k]
>
> and then focus on optimizing nditerate? That seemed like the logical
> path to me, but perhaps that doesn't play nicely with using Theano to
> optimize the expression? (Again I need a clearer picture of what this
> involves.)

I don't think so, since I think we want to try explicit vectorization.
I think the iteration and tiling mechanism etc. will be shared by both,
but we wouldn't provide that direct mapping (I think). Although maybe
we could insert a VectorAssignment with VectorScalars... let's see; in
any case both will be optimized in the same way, except that with
vector expressions you know you're always getting the best the compiler
can do.

[...]

From markflorisson88 at gmail.com Mon May 21 13:11:49 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 21 May 2012 12:11:49 +0100
Subject: [Cython] gsoc: array expressions
In-Reply-To:
References: <4FBA1A40.2060202@astro.uio.no>
Message-ID:

On 21 May 2012 12:08, mark florisson wrote:
> On 21 May 2012 11:34, Dag Sverre Seljebotn wrote:
>> On 05/20/2012 04:03 PM, mark florisson wrote:
[...]
> I don't think so, since I think we want to try explicit vectorization.
> I think the iteration and tiling mechanism etc. will be shared by
> both, but we wouldn't provide that direct mapping (I think). Although
> maybe we could insert a VectorAssignment with VectorScalars... let's
> see; in any case both will be optimized in the same way, except that
> with vector expressions you know you're always getting the best the
> compiler can do.

(Where of course the nditerate loop striding pattern would be patched
accordingly.)

[...]

From d.s.seljebotn at astro.uio.no Mon May 21 13:14:26 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Mon, 21 May 2012 13:14:26 +0200
Subject: [Cython] gsoc: array expressions
In-Reply-To:
References: <4FBA1A40.2060202@astro.uio.no>
Message-ID: <4FBA2392.2010105@astro.uio.no>

On 05/21/2012 12:56 PM, mark florisson wrote:
> On 21 May 2012 11:34, Dag Sverre Seljebotn wrote:
>> On 05/20/2012 04:03 PM, mark florisson wrote:
[...]
>> Can you enlighten us a bit about what Theano's optimizations
>> involve? You mention doing the iteration specializations yourself
>> below, and also the tiling...
>>
>> Is it just "scalar" optimizations of the form "x**3 -> x * x * x"
>> and numeric stabilization like "log(1 + x) -> log1p(x)" that would
>> be provided by Theano?
>
> Yes, it does those kinds of things, and it also eliminates common
> subexpressions, and it transforms certain expressions to BLAS/LAPACK
> functionality. I'm not sure we want that specifically. I'm thinking
> it might be more fruitful to start off with a theano-only
> specialization, and implement low-level code generation in Theano,
> and use that from Cython by either directly dumping in the code, or
> deferring that to Theano. At this point I'm not entirely sure.

Still, if this is all Theano provides, I question structuring the
project around reusing Theano. It's the sort of thing that is
nice-to-have but not fundamental (like memory access patterns).

Put another way, it sounds like Theano could easily be made an optional
dependency currently.

Another question is of course whether it is better to work on Theano to
implement tiling etc. for the CPU (and even compile all the
specializations and select between them).

You could perhaps even have Theano use PEP 3118 rather than NumPy too.

I guess I should subscribe to the Theano list.

Dag

From markflorisson88 at gmail.com Mon May 21 13:21:33 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 21 May 2012 12:21:33 +0100
Subject: [Cython] gsoc: array expressions
In-Reply-To: <4FBA2392.2010105@astro.uio.no>
References: <4FBA1A40.2060202@astro.uio.no> <4FBA2392.2010105@astro.uio.no>
Message-ID:

On 21 May 2012 12:14, Dag Sverre Seljebotn wrote:
> On 05/21/2012 12:56 PM, mark florisson wrote:
[...]
> Still, if this is all Theano provides, I question structuring the
> project around reusing Theano. It's the sort of thing that is
> nice-to-have but not fundamental (like memory access patterns).
>
> Put another way, it sounds like Theano could easily be made an
> optional dependency currently.
>
> Another question is of course whether it is better to work on Theano
> to implement tiling etc. for the CPU (and even compile all the
> specializations and select between them).

Indeed, my initial idea was to make it optional, but since Theano would
also give execution on the GPU as well as generate BLAS calls where
appropriate, it makes sense to push for making it optimal for the CPU
as well, memory-access-wise and vectorization-wise.

> You could perhaps even have Theano use PEP 3118 rather than NumPy too.
>
> I guess I should subscribe to the Theano list.

Yeah, I've been digging a bit through the code, I'm not sure yet how
much work that would require, but I think it is non-trivial.

From robertwb at gmail.com Tue May 22 08:11:55 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Mon, 21 May 2012 23:11:55 -0700
Subject: [Cython] gsoc: array expressions
In-Reply-To: <4FBA1A40.2060202@astro.uio.no>
References: <4FBA1A40.2060202@astro.uio.no>
Message-ID:

On Mon, May 21, 2012 at 3:34 AM, Dag Sverre Seljebotn wrote:
> On 05/20/2012 04:03 PM, mark florisson wrote:
[...]
>> Finally, we would have a parallel.nditer/ndenumerate/nditerate
>> function, which would iterate over N memoryviews, and provide a
>> sensible memory access pattern (like numpy.nditer). I'm not sure if
>> it should provide only the indices, or also the values. E.g. an
>> in-place elementwise add would read as follows:
>>
>>     for i, j, k in parallel.nditerate(A, B):
>>         A[i, j, k] += B[i, j, k]
>
> I think this sounds good; I guess I don't see a particular reason for
> "ndenumerate", I think code like the above is clearer.

I'm assuming the index computations would not be re-done in this case
(i.e. there's more magic going on here than looks like at first
glance)? Otherwise there is an advantage to ndenumerate.

- Robert

From d.s.seljebotn at astro.uio.no Tue May 22 08:36:25 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Tue, 22 May 2012 08:36:25 +0200
Subject: [Cython] gsoc: array expressions
In-Reply-To:
References: <4FBA1A40.2060202@astro.uio.no>
Message-ID: <4FBB33E9.1090707@astro.uio.no>

On 05/22/2012 08:11 AM, Robert Bradshaw wrote:
> On Mon, May 21, 2012 at 3:34 AM, Dag Sverre Seljebotn wrote:
>> On 05/20/2012 04:03 PM, mark florisson wrote:
[...]
>> I think this sounds good; I guess I don't see a particular reason
>> for "ndenumerate", I think code like the above is clearer.
>
> I'm assuming the index computations would not be re-done in this case
> (i.e. there's more magic going on here than looks like at first
> glance)? Otherwise there is an advantage to ndenumerate.

Ideally, there is a lot more magic going on, though I don't know how
far Mark wants to go.

Imagine "nditerate(A, A.T)"; in that case it would have to make many
small tiles so that for each tile being processed, A has a tile in
cache and A.T has another tile in cache (so that one doesn't waste
cache line transfers).

So those array lookups would potentially look up in different memory
buffers, with the strides known at compile time.

Which begs the question: What about this body?

if i < 100:
    continue
else:
    A[i, j, k] += B[i - 100, j, k]

I guess just fall back to a non-tiled version? One could of course do
some shifting of which tiles of B to grab etc., but there's a limit to
how smart one should try to be; one could emit a warning and say that
one should slice and dice the memoryviews into shape before they are
passed to nditerate.

Dag
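A sketch of the tiled access pattern being described, written as plain
Cython for the out = A + A.T case; the tile size and names are
illustrative, and this is hand-written code, not what the compiler
would emit:

    cimport cython

    DEF TILE = 32   # tile edge; two TILE x TILE double tiles fit in L1

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def add_transpose(double[:, :] A, double[:, :] out):
        cdef Py_ssize_t n = A.shape[0]
        cdef Py_ssize_t bi, bj, i, j, imax, jmax
        for bi in range(0, n, TILE):
            imax = bi + TILE
            if imax > n: imax = n
            for bj in range(0, n, TILE):
                jmax = bj + TILE
                if jmax > n: jmax = n
                # Within one tile, A is walked along rows and A.T
                # (i.e. A[j, i]) along columns, but both footprints
                # stay inside a TILE x TILE block, so each fetched
                # cache line is fully reused before eviction.
                for i in range(bi, imax):
                    for j in range(bj, jmax):
                        out[i, j] = A[i, j] + A[j, i]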
From robertwb at gmail.com Tue May 22 08:48:23 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Mon, 21 May 2012 23:48:23 -0700
Subject: [Cython] gsoc: array expressions
In-Reply-To: <4FBB33E9.1090707@astro.uio.no>
References: <4FBA1A40.2060202@astro.uio.no> <4FBB33E9.1090707@astro.uio.no>
Message-ID:

On Mon, May 21, 2012 at 11:36 PM, Dag Sverre Seljebotn wrote:
> On 05/22/2012 08:11 AM, Robert Bradshaw wrote:
[...]
> Ideally, there is a lot more magic going on, though I don't know how
> far Mark wants to go.
>
> Imagine "nditerate(A, A.T)"; in that case it would have to make many
> small tiles so that for each tile being processed, A has a tile in
> cache and A.T has another tile in cache (so that one doesn't waste
> cache line transfers).
>
> So those array lookups would potentially look up in different memory
> buffers, with the strides known at compile time.

Yes, being clever about the order in which to iterate over the indices
is the hard problem to solve here. I was thinking more in terms of the
inner loop iterating over the innermost dimension only to do the
indexing (retrieval and assignment), similar to how the generic NumPy
iterator works.

> Which begs the question: What about this body?
>
> if i < 100:
>     continue
> else:
>     A[i, j, k] += B[i - 100, j, k]
>
> I guess just fall back to a non-tiled version? One could of course do
> some shifting of which tiles of B to grab etc., but there's a limit to
> how smart one should try to be; one could emit a warning and say that
> one should slice and dice the memoryviews into shape before they are
> passed to nditerate.

Linear transformations of the index variables could probably be
handled, but that's certainly not v1 (and not too difficult for the
user to express manually).

- Robert
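The NumPy iterator behaviour Robert refers to can be seen directly:
with the external_loop flag, np.nditer hands out whole innermost runs,
so the stride arithmetic is amortized over each run rather than redone
per element (standard numpy.nditer usage, available since NumPy 1.6):

    import numpy as np

    a = np.arange(12.0).reshape(3, 4)
    b = np.ones((3, 4))

    # x and y arrive as 1-D chunks along the best axis.
    it = np.nditer([a, b], flags=['external_loop'],
                   op_flags=[['readwrite'], ['readonly']])
    for x, y in it:
        x[...] += y     # in-place add, one chunk at a time

    print(a[0])         # -> [ 1.  2.  3.  4.]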
We have 'cdef inline' functions, but I >>>>> would prefer annotated def functions where the parameters are >>>>> specialized on demand, e.g. >>>>> >>>>> @elemental >>>>> def add(a, b): # elemental functions can have any number of arguments >>>>> and operate on any compatible dtype >>>>> return a + b >>>>> >>>>> When calling cdef functions or elemental functions with memoryview >>>>> arguments, the arguments perform a (broadcasted) elementwise >>>>> operation. Alternatively, we can have a parallel.elementwise function >>>>> which maps the function elementwise, which would also work for object >>>>> callables. I prefer the former, since I think it will read much >>>>> easier. >>>>> >>>>> Secondly, we can have a reduce function (and maybe a scan function), >>>>> that reduce (respectively scan) in a specified axis or number of axes. >>>>> E.g. >>>>> >>>>> parallel.reduce(add, a, b, axis=(0, 2)) >>>>> >>>>> where the default for axis is "all axes". As for the default value, >>>>> this could be perhaps optionally provided to the elemental decorator. >>>>> Otherwise, the reducer will have to get the default values from each >>>>> dimension that is reduced in, and then skip those values when >>>>> reducing. (Of course, the reducer function must be associate and >>>>> commutative). Also, a lambda could be passed in instead of an >>>> >>>> >>>> >>>> Only associative, right? >>>> >>>> Sounds good to me. >>>> >>>> >>>>> elementwise or typed cdef function. >>>>> >>>>> Finally, we would have a parallel.nditer/ndenumerate/nditerate >>>>> function, which would iterate over N memoryviews, and provide a >>>>> sensible memory access pattern (like numpy.nditer). I'm not sure if it >>>>> should provide only the indices, or also the values. e.g. an inplace >>>>> elementwise add would read as follows: >>>>> >>>>> for i, j, k in parallel.nditerate(A, B): >>>>> A[i, j, k] += B[i, j, k] >>>> >>>> >>>> >>>> >>>> I think this sounds good; I guess don't see a particular reason for >>>> "ndenumerate", I think code like the above is clearer. >>> >>> >>> I'm assuming the index computations would not be re-done in this case >>> (i.e. there's more magic going on here than looks like at first >>> glance)? Otherwise there is an advantage to ndenumerate. >> >> >> Ideally, there is a lot more magic going on, though I don't know how far >> Mark wants to go. >> >> Imagine "nditerate(A, A.T)", in that case it would have to make many small >> tiles so that for each tile being processed, A has a tile in cache and A.T >> has another tile in cache (so that one doesn't waste cache line transfers). >> >> So those array lookups would potentially look up in different memory >> buffers, with the strides known at compile time. > > Yes, being clever about the order in which to iterate over the indices > is the hard problem to solve here. I was thinking more in terms of the > inner loop iterating over the innermost dimension only to do the > indexing (retrieval and assignment), similar to how the generic NumPy > iterator works. The point isn't only being clever about the *order*...you need "copy-in, copy-out". The point is that the NumPy iterator is not good enough (for out-of-cache situations). Since you grab a cache line (64 bytes) each time from main memory, a plain NumPy broadcasted iterator throws away a lot of memory for "A + A.T", since for ndim>1 there's NO iteration order which isn't bad (for instance, you could iterate in the order of A, and the result would be that for each element of A.T you fetch there is 64 bytes transferred). 
So the solution is to copy A.T block-wise to a temporary scratch space in cache so that you use all the elements in the cache line before throwing it out of cache. In C, I've seen a simple blocking transpose operation be over four times faster than the brute-force transpose for this reason. Dag > >> Which begs the question: What about this body? >> >> if i< 100: >> continue >> else: >> A[i, j, k] += B[i - 100, j, k] >> >> I guess just fall back to a non-tiled version? One could of course do some >> shifting of which tiles of B to grab etc., but there's a limit to how smart >> one should try to be; one could emit a warning and say that one should slice >> and dice the memoryviews into shape before they are passed to nditerate. > > Linear transformations of the index variables could probably be > handled, but that's certainly not v1 (and not too difficult for the > user to express manually). > > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Tue May 22 08:57:51 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 22 May 2012 08:57:51 +0200 Subject: [Cython] gsoc: array expressions In-Reply-To: <4FBB38C0.4030005@astro.uio.no> References: <4FBA1A40.2060202@astro.uio.no> <4FBB33E9.1090707@astro.uio.no> <4FBB38C0.4030005@astro.uio.no> Message-ID: <4FBB38EF.9010104@astro.uio.no> On 05/22/2012 08:57 AM, Dag Sverre Seljebotn wrote: > On 05/22/2012 08:48 AM, Robert Bradshaw wrote: >> On Mon, May 21, 2012 at 11:36 PM, Dag Sverre Seljebotn >> wrote: >>> On 05/22/2012 08:11 AM, Robert Bradshaw wrote: >>>> >>>> On Mon, May 21, 2012 at 3:34 AM, Dag Sverre Seljebotn >>>> wrote: >>>>> >>>>> On 05/20/2012 04:03 PM, mark florisson wrote: >>>>>> >>>>>> >>>>>> Hey, >>>>>> >>>>>> For my gsoc we already have some simple initial ideas, i.e. >>>>>> elementwise vector expressions (a + b with a and b arrays with >>>>>> arbitrary rank), I don't think these need any discussion. However, >>>>>> there are a lot of things that haven't been formally discussed on the >>>>>> mailing list, so here goes. >>>>>> >>>>>> Fr?d?ric, I am CCing you since you expressed interest on the numpy >>>>>> mailing list, and I think your insights as a Theano developer can be >>>>>> very helpful in this discussion. >>>>>> >>>>>> User Interface >>>>>> =========== >>>>>> Besides simple array expressions for dense arrays I would like a >>>>>> mechanism for "custom ufuncs", although to a different extent to what >>>>>> Numpy or Numba provide. There are several ways in which we could want >>>>>> them, e.g. as typed functions (cdef, or external C) functions, as >>>>>> lambas or Python functions in the same module, or as general objects >>>>>> (e.g. functions Cython doesn't know about). >>>>>> To achieve maximum efficiency it will likely be good to allow sharing >>>>>> these functions in .pxd files. We have 'cdef inline' functions, but I >>>>>> would prefer annotated def functions where the parameters are >>>>>> specialized on demand, e.g. >>>>>> >>>>>> @elemental >>>>>> def add(a, b): # elemental functions can have any number of arguments >>>>>> and operate on any compatible dtype >>>>>> return a + b >>>>>> >>>>>> When calling cdef functions or elemental functions with memoryview >>>>>> arguments, the arguments perform a (broadcasted) elementwise >>>>>> operation. 
Alternatively, we can have a parallel.elementwise function >>>>>> which maps the function elementwise, which would also work for object >>>>>> callables. I prefer the former, since I think it will read much >>>>>> easier. >>>>>> >>>>>> Secondly, we can have a reduce function (and maybe a scan function), >>>>>> that reduce (respectively scan) in a specified axis or number of >>>>>> axes. >>>>>> E.g. >>>>>> >>>>>> parallel.reduce(add, a, b, axis=(0, 2)) >>>>>> >>>>>> where the default for axis is "all axes". As for the default value, >>>>>> this could be perhaps optionally provided to the elemental decorator. >>>>>> Otherwise, the reducer will have to get the default values from each >>>>>> dimension that is reduced in, and then skip those values when >>>>>> reducing. (Of course, the reducer function must be associate and >>>>>> commutative). Also, a lambda could be passed in instead of an >>>>> >>>>> >>>>> >>>>> Only associative, right? >>>>> >>>>> Sounds good to me. >>>>> >>>>> >>>>>> elementwise or typed cdef function. >>>>>> >>>>>> Finally, we would have a parallel.nditer/ndenumerate/nditerate >>>>>> function, which would iterate over N memoryviews, and provide a >>>>>> sensible memory access pattern (like numpy.nditer). I'm not sure >>>>>> if it >>>>>> should provide only the indices, or also the values. e.g. an inplace >>>>>> elementwise add would read as follows: >>>>>> >>>>>> for i, j, k in parallel.nditerate(A, B): >>>>>> A[i, j, k] += B[i, j, k] >>>>> >>>>> >>>>> >>>>> >>>>> I think this sounds good; I guess don't see a particular reason for >>>>> "ndenumerate", I think code like the above is clearer. >>>> >>>> >>>> I'm assuming the index computations would not be re-done in this case >>>> (i.e. there's more magic going on here than looks like at first >>>> glance)? Otherwise there is an advantage to ndenumerate. >>> >>> >>> Ideally, there is a lot more magic going on, though I don't know how far >>> Mark wants to go. >>> >>> Imagine "nditerate(A, A.T)", in that case it would have to make many >>> small >>> tiles so that for each tile being processed, A has a tile in cache >>> and A.T >>> has another tile in cache (so that one doesn't waste cache line >>> transfers). >>> >>> So those array lookups would potentially look up in different memory >>> buffers, with the strides known at compile time. >> >> Yes, being clever about the order in which to iterate over the indices >> is the hard problem to solve here. I was thinking more in terms of the >> inner loop iterating over the innermost dimension only to do the >> indexing (retrieval and assignment), similar to how the generic NumPy >> iterator works. > > The point isn't only being clever about the *order*...you need "copy-in, > copy-out". > > The point is that the NumPy iterator is not good enough (for > out-of-cache situations). Since you grab a cache line (64 bytes) each > time from main memory, a plain NumPy broadcasted iterator throws away a > lot of memory for "A + A.T", since for ndim>1 there's NO iteration order > which isn't bad (for instance, you could iterate in the order of A, and > the result would be that for each element of A.T you fetch there is 64 > bytes transferred). I meant, "throws away a lot of memory *bandwidth*". Dag > > So the solution is to copy A.T block-wise to a temporary scratch space > in cache so that you use all the elements in the cache line before > throwing it out of cache. 
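(To spell out the blocked pattern being described, here is a minimal C sketch; the tile size and the function name are illustrative assumptions, tuned in practice to the cache size:)

#include <stddef.h>

#define TILE 32  /* assumption: an input tile plus an output tile fit in L1 */

/* Transpose an n-by-m row-major matrix tile by tile, so that both the
   reads from `in` and the writes to `out` use every element of each
   cache line they pull in before it is evicted. */
static void blocked_transpose(const double *in, double *out,
                              size_t n, size_t m)
{
    size_t i, j, ii, jj, imax, jmax;
    for (i = 0; i < n; i += TILE) {
        for (j = 0; j < m; j += TILE) {
            imax = (i + TILE < n) ? i + TILE : n;
            jmax = (j + TILE < m) ? j + TILE : m;
            for (ii = i; ii < imax; ii++)
                for (jj = j; jj < jmax; jj++)
                    out[jj * n + ii] = in[ii * m + jj];
        }
    }
}

The same copy-in, copy-out pattern generalizes to expressions like A + A.T: bring a tile of the badly-strided operand into scratch space, operate on the tile, and write it out.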
> > In C, I've seen a simple blocking transpose operation be over four times > faster than the brute-force transpose for this reason. > > Dag > >> >>> Which begs the question: What about this body? >>> >>> if i< 100: >>> continue >>> else: >>> A[i, j, k] += B[i - 100, j, k] >>> >>> I guess just fall back to a non-tiled version? One could of course do >>> some >>> shifting of which tiles of B to grab etc., but there's a limit to how >>> smart >>> one should try to be; one could emit a warning and say that one >>> should slice >>> and dice the memoryviews into shape before they are passed to nditerate. >> >> Linear transformations of the index variables could probably be >> handled, but that's certainly not v1 (and not too difficult for the >> user to express manually). >> >> - Robert >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > From robertwb at gmail.com Tue May 22 09:06:15 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Tue, 22 May 2012 00:06:15 -0700 Subject: [Cython] gsoc: array expressions In-Reply-To: <4FBB38C0.4030005@astro.uio.no> References: <4FBA1A40.2060202@astro.uio.no> <4FBB33E9.1090707@astro.uio.no> <4FBB38C0.4030005@astro.uio.no> Message-ID: On Mon, May 21, 2012 at 11:57 PM, Dag Sverre Seljebotn wrote: > On 05/22/2012 08:48 AM, Robert Bradshaw wrote: >> >> On Mon, May 21, 2012 at 11:36 PM, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 05/22/2012 08:11 AM, Robert Bradshaw wrote: >>>> >>>> >>>> On Mon, May 21, 2012 at 3:34 AM, Dag Sverre Seljebotn >>>> ? ?wrote: >>>>> >>>>> >>>>> On 05/20/2012 04:03 PM, mark florisson wrote: >>>>>> >>>>>> >>>>>> >>>>>> Hey, >>>>>> >>>>>> For my gsoc we already have some simple initial ideas, i.e. >>>>>> elementwise vector expressions (a + b with a and b arrays with >>>>>> arbitrary rank), I don't think these need any discussion. However, >>>>>> there are a lot of things that haven't been formally discussed on the >>>>>> mailing list, so here goes. >>>>>> >>>>>> Fr?d?ric, I am CCing you since you expressed interest on the numpy >>>>>> mailing list, and I think your insights as a Theano developer can be >>>>>> very helpful in this discussion. >>>>>> >>>>>> User Interface >>>>>> =========== >>>>>> Besides simple array expressions for dense arrays I would like a >>>>>> mechanism for "custom ufuncs", although to a different extent to what >>>>>> Numpy or Numba provide. There are several ways in which we could want >>>>>> them, e.g. as typed functions (cdef, or external C) functions, as >>>>>> lambas or Python functions in the same module, or as general objects >>>>>> (e.g. functions Cython doesn't know about). >>>>>> To achieve maximum efficiency it will likely be good to allow sharing >>>>>> these functions in .pxd files. We have 'cdef inline' functions, but I >>>>>> would prefer annotated def functions where the parameters are >>>>>> specialized on demand, e.g. >>>>>> >>>>>> @elemental >>>>>> def add(a, b): # elemental functions can have any number of arguments >>>>>> and operate on any compatible dtype >>>>>> ? ? return a + b >>>>>> >>>>>> When calling cdef functions or elemental functions with memoryview >>>>>> arguments, the arguments perform a (broadcasted) elementwise >>>>>> operation. Alternatively, we can have a parallel.elementwise function >>>>>> which maps the function elementwise, which would also work for object >>>>>> callables. I prefer the former, since I think it will read much >>>>>> easier. 
>>>>>> >>>>>> Secondly, we can have a reduce function (and maybe a scan function), >>>>>> that reduce (respectively scan) in a specified axis or number of axes. >>>>>> E.g. >>>>>> >>>>>> ? ? parallel.reduce(add, a, b, axis=(0, 2)) >>>>>> >>>>>> where the default for axis is "all axes". As for the default value, >>>>>> this could be perhaps optionally provided to the elemental decorator. >>>>>> Otherwise, the reducer will have to get the default values from each >>>>>> dimension that is reduced in, and then skip those values when >>>>>> reducing. (Of course, the reducer function must be associate and >>>>>> commutative). Also, a lambda could be passed in instead of an >>>>> >>>>> >>>>> >>>>> >>>>> Only associative, right? >>>>> >>>>> Sounds good to me. >>>>> >>>>> >>>>>> elementwise or typed cdef function. >>>>>> >>>>>> Finally, we would have a parallel.nditer/ndenumerate/nditerate >>>>>> function, which would iterate over N memoryviews, and provide a >>>>>> sensible memory access pattern (like numpy.nditer). I'm not sure if it >>>>>> should provide only the indices, or also the values. e.g. an inplace >>>>>> elementwise add would read as follows: >>>>>> >>>>>> ? ? for i, j, k in parallel.nditerate(A, B): >>>>>> ? ? ? ? A[i, j, k] += B[i, j, k] >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> I think this sounds good; I guess don't see a particular reason for >>>>> "ndenumerate", I think code like the above is clearer. >>>> >>>> >>>> >>>> I'm assuming the index computations would not be re-done in this case >>>> (i.e. there's more magic going on here than looks like at first >>>> glance)? Otherwise there is an advantage to ndenumerate. >>> >>> >>> >>> Ideally, there is a lot more magic going on, though I don't know how far >>> Mark wants to go. >>> >>> Imagine "nditerate(A, A.T)", in that case it would have to make many >>> small >>> tiles so that for each tile being processed, A has a tile in cache and >>> A.T >>> has another tile in cache (so that one doesn't waste cache line >>> transfers). >>> >>> So those array lookups would potentially look up in different memory >>> buffers, with the strides known at compile time. >> >> >> Yes, being clever about the order in which to iterate over the indices >> is the hard problem to solve here. I was thinking more in terms of the >> inner loop iterating over the innermost dimension only to do the >> indexing (retrieval and assignment), similar to how the generic NumPy >> iterator works. > > > The point isn't only being clever about the *order*...you need "copy-in, > copy-out". > > The point is that the NumPy iterator is not good enough (for out-of-cache > situations). Since you grab a cache line (64 bytes) each time from main > memory, a plain NumPy broadcasted iterator throws away a lot of memory for > "A + A.T", since for ndim>1 there's NO iteration order which isn't bad (for > instance, you could iterate in the order of A, and the result would be that > for each element of A.T you fetch there is 64 bytes transferred). > > So the solution is to copy A.T block-wise to a temporary scratch space in > cache so that you use all the elements in the cache line before throwing it > out of cache. > > In C, I've seen a simple blocking transpose operation be over four times > faster than the brute-force transpose for this reason. Yes, I understand this. Truly element-wise arithmetic with arrays of the same memory layout (or even size) is not that uncommon though, and should be optimized for as well. 
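For that same-layout case the whole loop nest can collapse into one flat run over the data; a rough Cython-level sketch of what such a specialization could look like (illustrative, not actual generated code):

cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def add_inplace_contig(double[::1] a, double[::1] b):
    # Identical C-contiguous layout: no per-dimension index arithmetic,
    # and a loop simple enough for the C compiler to vectorize.
    cdef Py_ssize_t i
    for i in range(a.shape[0]):
        a[i] += b[i]

An N-dimensional operand pair with identical strides can be reinterpreted as 1-D up front and dispatched to a loop like this.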
Fortunately, I feel pretty comfortable sitting back and watching 'cause you've both thought about these issues far more than I and I don't see you both getting it wrong :). - Robert From d.s.seljebotn at astro.uio.no Tue May 22 09:13:17 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 22 May 2012 09:13:17 +0200 Subject: [Cython] gsoc: array expressions In-Reply-To: References: <4FBA1A40.2060202@astro.uio.no> <4FBB33E9.1090707@astro.uio.no> <4FBB38C0.4030005@astro.uio.no> Message-ID: <4FBB3C8D.9080702@astro.uio.no> On 05/22/2012 09:06 AM, Robert Bradshaw wrote: > On Mon, May 21, 2012 at 11:57 PM, Dag Sverre Seljebotn > wrote: >> On 05/22/2012 08:48 AM, Robert Bradshaw wrote: >>> >>> On Mon, May 21, 2012 at 11:36 PM, Dag Sverre Seljebotn >>> wrote: >>>> >>>> On 05/22/2012 08:11 AM, Robert Bradshaw wrote: >>>>> >>>>> >>>>> On Mon, May 21, 2012 at 3:34 AM, Dag Sverre Seljebotn >>>>> wrote: >>>>>> >>>>>> >>>>>> On 05/20/2012 04:03 PM, mark florisson wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> Hey, >>>>>>> >>>>>>> For my gsoc we already have some simple initial ideas, i.e. >>>>>>> elementwise vector expressions (a + b with a and b arrays with >>>>>>> arbitrary rank), I don't think these need any discussion. However, >>>>>>> there are a lot of things that haven't been formally discussed on the >>>>>>> mailing list, so here goes. >>>>>>> >>>>>>> Fr?d?ric, I am CCing you since you expressed interest on the numpy >>>>>>> mailing list, and I think your insights as a Theano developer can be >>>>>>> very helpful in this discussion. >>>>>>> >>>>>>> User Interface >>>>>>> =========== >>>>>>> Besides simple array expressions for dense arrays I would like a >>>>>>> mechanism for "custom ufuncs", although to a different extent to what >>>>>>> Numpy or Numba provide. There are several ways in which we could want >>>>>>> them, e.g. as typed functions (cdef, or external C) functions, as >>>>>>> lambas or Python functions in the same module, or as general objects >>>>>>> (e.g. functions Cython doesn't know about). >>>>>>> To achieve maximum efficiency it will likely be good to allow sharing >>>>>>> these functions in .pxd files. We have 'cdef inline' functions, but I >>>>>>> would prefer annotated def functions where the parameters are >>>>>>> specialized on demand, e.g. >>>>>>> >>>>>>> @elemental >>>>>>> def add(a, b): # elemental functions can have any number of arguments >>>>>>> and operate on any compatible dtype >>>>>>> return a + b >>>>>>> >>>>>>> When calling cdef functions or elemental functions with memoryview >>>>>>> arguments, the arguments perform a (broadcasted) elementwise >>>>>>> operation. Alternatively, we can have a parallel.elementwise function >>>>>>> which maps the function elementwise, which would also work for object >>>>>>> callables. I prefer the former, since I think it will read much >>>>>>> easier. >>>>>>> >>>>>>> Secondly, we can have a reduce function (and maybe a scan function), >>>>>>> that reduce (respectively scan) in a specified axis or number of axes. >>>>>>> E.g. >>>>>>> >>>>>>> parallel.reduce(add, a, b, axis=(0, 2)) >>>>>>> >>>>>>> where the default for axis is "all axes". As for the default value, >>>>>>> this could be perhaps optionally provided to the elemental decorator. >>>>>>> Otherwise, the reducer will have to get the default values from each >>>>>>> dimension that is reduced in, and then skip those values when >>>>>>> reducing. (Of course, the reducer function must be associate and >>>>>>> commutative). 
Also, a lambda could be passed in instead of an >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Only associative, right? >>>>>> >>>>>> Sounds good to me. >>>>>> >>>>>> >>>>>>> elementwise or typed cdef function. >>>>>>> >>>>>>> Finally, we would have a parallel.nditer/ndenumerate/nditerate >>>>>>> function, which would iterate over N memoryviews, and provide a >>>>>>> sensible memory access pattern (like numpy.nditer). I'm not sure if it >>>>>>> should provide only the indices, or also the values. e.g. an inplace >>>>>>> elementwise add would read as follows: >>>>>>> >>>>>>> for i, j, k in parallel.nditerate(A, B): >>>>>>> A[i, j, k] += B[i, j, k] >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> I think this sounds good; I guess don't see a particular reason for >>>>>> "ndenumerate", I think code like the above is clearer. >>>>> >>>>> >>>>> >>>>> I'm assuming the index computations would not be re-done in this case >>>>> (i.e. there's more magic going on here than looks like at first >>>>> glance)? Otherwise there is an advantage to ndenumerate. >>>> >>>> >>>> >>>> Ideally, there is a lot more magic going on, though I don't know how far >>>> Mark wants to go. >>>> >>>> Imagine "nditerate(A, A.T)", in that case it would have to make many >>>> small >>>> tiles so that for each tile being processed, A has a tile in cache and >>>> A.T >>>> has another tile in cache (so that one doesn't waste cache line >>>> transfers). >>>> >>>> So those array lookups would potentially look up in different memory >>>> buffers, with the strides known at compile time. >>> >>> >>> Yes, being clever about the order in which to iterate over the indices >>> is the hard problem to solve here. I was thinking more in terms of the >>> inner loop iterating over the innermost dimension only to do the >>> indexing (retrieval and assignment), similar to how the generic NumPy >>> iterator works. >> >> >> The point isn't only being clever about the *order*...you need "copy-in, >> copy-out". >> >> The point is that the NumPy iterator is not good enough (for out-of-cache >> situations). Since you grab a cache line (64 bytes) each time from main >> memory, a plain NumPy broadcasted iterator throws away a lot of memory for >> "A + A.T", since for ndim>1 there's NO iteration order which isn't bad (for >> instance, you could iterate in the order of A, and the result would be that >> for each element of A.T you fetch there is 64 bytes transferred). >> >> So the solution is to copy A.T block-wise to a temporary scratch space in >> cache so that you use all the elements in the cache line before throwing it >> out of cache. >> >> In C, I've seen a simple blocking transpose operation be over four times >> faster than the brute-force transpose for this reason. > > Yes, I understand this. Truly element-wise arithmetic with arrays of > the same memory layout (or even size) is not that uncommon though, and > should be optimized for as well. Fortunately, I feel pretty > comfortable sitting back and watching 'cause you've both thought about > these issues far more than I and I don't see you both getting it wrong > :). 
Sorry for being an such an annoying know-it-all, it just seemed from your comment like you didn't know :-) Dag From markflorisson88 at gmail.com Tue May 22 15:08:07 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 22 May 2012 14:08:07 +0100 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: On 6 May 2012 15:28, mark florisson wrote: > Hey, > > I think we already have quite a bit of functionality (nearly) ready, > after merging some pending pull requests maybe it will be a good time > for a 0.17 release? I think it would be good to also document to what > extent pypy support works, what works and what doesn't. Stefan, since > you added a large majority of the features, would you want to be the > release manager? > > In summary, the following pull requests should likely go in > ? ?- array.array support (unless further discussion prevents that) > ? ?- fused types runtime buffer dispatch > ? ?- newaxis > ? ?- more? > > The memoryview documentation should also be reworked a bit. Matthew, > are you still willing to have a go at that? Otherwise I can clean up > the mess first, some things are no longer true and simply outdated, > and then have a second opinion. > > Mark I think we have enough stuff in to go for a 0.17 release, I have a few more fixes and a refactoring that I'll finish tonight that might be useful to get in as well. Currently Jenkins is yellow though, as the reduce_pickle test fails in Python 3. From markflorisson88 at gmail.com Tue May 22 15:16:44 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 22 May 2012 14:16:44 +0100 Subject: [Cython] gsoc: array expressions In-Reply-To: References: <4FBA1A40.2060202@astro.uio.no> <4FBB33E9.1090707@astro.uio.no> Message-ID: On 22 May 2012 07:48, Robert Bradshaw wrote: > On Mon, May 21, 2012 at 11:36 PM, Dag Sverre Seljebotn > wrote: >> On 05/22/2012 08:11 AM, Robert Bradshaw wrote: >>> >>> On Mon, May 21, 2012 at 3:34 AM, Dag Sverre Seljebotn >>> ?wrote: >>>> >>>> On 05/20/2012 04:03 PM, mark florisson wrote: >>>>> >>>>> >>>>> Hey, >>>>> >>>>> For my gsoc we already have some simple initial ideas, i.e. >>>>> elementwise vector expressions (a + b with a and b arrays with >>>>> arbitrary rank), I don't think these need any discussion. However, >>>>> there are a lot of things that haven't been formally discussed on the >>>>> mailing list, so here goes. >>>>> >>>>> Fr?d?ric, I am CCing you since you expressed interest on the numpy >>>>> mailing list, and I think your insights as a Theano developer can be >>>>> very helpful in this discussion. >>>>> >>>>> User Interface >>>>> =========== >>>>> Besides simple array expressions for dense arrays I would like a >>>>> mechanism for "custom ufuncs", although to a different extent to what >>>>> Numpy or Numba provide. There are several ways in which we could want >>>>> them, e.g. as typed functions (cdef, or external C) functions, as >>>>> lambas or Python functions in the same module, or as general objects >>>>> (e.g. functions Cython doesn't know about). >>>>> To achieve maximum efficiency it will likely be good to allow sharing >>>>> these functions in .pxd files. We have 'cdef inline' functions, but I >>>>> would prefer annotated def functions where the parameters are >>>>> specialized on demand, e.g. >>>>> >>>>> @elemental >>>>> def add(a, b): # elemental functions can have any number of arguments >>>>> and operate on any compatible dtype >>>>> ? ? 
return a + b >>>>> >>>>> When calling cdef functions or elemental functions with memoryview >>>>> arguments, the arguments perform a (broadcasted) elementwise >>>>> operation. Alternatively, we can have a parallel.elementwise function >>>>> which maps the function elementwise, which would also work for object >>>>> callables. I prefer the former, since I think it will read much >>>>> easier. >>>>> >>>>> Secondly, we can have a reduce function (and maybe a scan function), >>>>> that reduce (respectively scan) in a specified axis or number of axes. >>>>> E.g. >>>>> >>>>> ? ? parallel.reduce(add, a, b, axis=(0, 2)) >>>>> >>>>> where the default for axis is "all axes". As for the default value, >>>>> this could be perhaps optionally provided to the elemental decorator. >>>>> Otherwise, the reducer will have to get the default values from each >>>>> dimension that is reduced in, and then skip those values when >>>>> reducing. (Of course, the reducer function must be associate and >>>>> commutative). Also, a lambda could be passed in instead of an >>>> >>>> >>>> >>>> Only associative, right? >>>> >>>> Sounds good to me. >>>> >>>> >>>>> elementwise or typed cdef function. >>>>> >>>>> Finally, we would have a parallel.nditer/ndenumerate/nditerate >>>>> function, which would iterate over N memoryviews, and provide a >>>>> sensible memory access pattern (like numpy.nditer). I'm not sure if it >>>>> should provide only the indices, or also the values. e.g. an inplace >>>>> elementwise add would read as follows: >>>>> >>>>> ? ? for i, j, k in parallel.nditerate(A, B): >>>>> ? ? ? ? A[i, j, k] += B[i, j, k] >>>> >>>> >>>> >>>> >>>> I think this sounds good; I guess don't see a particular reason for >>>> "ndenumerate", I think code like the above is clearer. >>> >>> >>> I'm assuming the index computations would not be re-done in this case >>> (i.e. there's more magic going on here than looks like at first >>> glance)? Otherwise there is an advantage to ndenumerate. >> >> >> Ideally, there is a lot more magic going on, though I don't know how far >> Mark wants to go. >> >> Imagine "nditerate(A, A.T)", in that case it would have to make many small >> tiles so that for each tile being processed, A has a tile in cache and A.T >> has another tile in cache (so that one doesn't waste cache line transfers). >> >> So those array lookups would potentially look up in different memory >> buffers, with the strides known at compile time. > > Yes, being clever about the order in which to iterate over the indices > is the hard problem to solve here. I was thinking more in terms of the > inner loop iterating over the innermost dimension only to do the > indexing (retrieval and assignment), similar to how the generic NumPy > iterator works. That's a valid point, but my experience has been that any worthy C compiler will do common subexpression elimination for the outer dimensions and not recompute the offset every time. It actually generated marginally faster code for scalar assignment than a "cascaded pointer assignment", i.e. faster than p0 = data; for (...) { p1 = p0 + i * strides[0] for (...) { p2 = p1 + j * strides[1] ... } } (haven't tried manual strength reduction there though). >> Which begs the question: What about this body? >> >> if i < 100: >> ? ?continue >> else: >> ? ?A[i, j, k] += B[i - 100, j, k] >> >> I guess just fall back to a non-tiled version? 
One could of course do some >> shifting of which tiles of B to grab etc., but there's a limit to how smart >> one should try to be; one could emit a warning and say that one should slice >> and dice the memoryviews into shape before they are passed to nditerate. > Linear transformations of the index variables could probably be > handled, but that's certainly not v1 (and not too difficult for the > user to express manually). > > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel
From robertwb at gmail.com Wed May 23 13:31:36 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 23 May 2012 04:31:36 -0700 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: On Tue, May 22, 2012 at 6:08 AM, mark florisson wrote: > On 6 May 2012 15:28, mark florisson wrote: >> Hey, >> >> I think we already have quite a bit of functionality (nearly) ready, >> after merging some pending pull requests maybe it will be a good time >> for a 0.17 release? I think it would be good to also document to what >> extent pypy support works, what works and what doesn't. Stefan, since >> you added a large majority of the features, would you want to be the >> release manager? >> >> In summary, the following pull requests should likely go in >> - array.array support (unless further discussion prevents that) >> - fused types runtime buffer dispatch >> - newaxis >> - more? >> >> The memoryview documentation should also be reworked a bit. Matthew, >> are you still willing to have a go at that? Otherwise I can clean up >> the mess first, some things are no longer true and simply outdated, >> and then have a second opinion. >> >> Mark > > I think we have enough stuff in to go for a 0.17 release, I have a few > more fixes and a refactoring that I'll finish tonight that might be > useful to get in as well. Currently Jenkins is yellow though, as the > reduce_pickle test fails in Python 3. I pushed a fix to the pickle tests. I've got some minor cythonize optimizations I'd like to get in for Sage as well. I'll push when I confirm they don't break anything on jenkins. - Robert
From dewachter.jonathan at gmail.com Wed May 23 18:05:04 2012 From: dewachter.jonathan at gmail.com (Jonathan De Wachter) Date: Wed, 23 May 2012 18:05:04 +0200 Subject: [Cython] bug when using the built-in fused type cython.numeric Message-ID: This is probably a known bug, since the minimal code that reproduces it can be anything using the built-in type cython.numeric: cimport cython cdef class Foo: cdef cython.numeric bar Cython version: 0.16 (last release). I didn't find this bug in the bug trackers, which is why I'm mailing you. By the way, this is the first time I'm using a mailing list, so if I'm doing this wrong, please let me know.
From markflorisson88 at gmail.com Wed May 23 19:03:01 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 23 May 2012 18:03:01 +0100 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: On 23 May 2012 12:31, Robert Bradshaw wrote: > On Tue, May 22, 2012 at 6:08 AM, mark florisson > wrote: >> On 6 May 2012 15:28, mark florisson wrote: >>> Hey, >>> >>> I think we already have quite a bit of functionality (nearly) ready, >>> after merging some pending pull requests maybe it will be a good time >>> for a 0.17 release?
I think it would be good to also document to what >>> extent pypy support works, what works and what doesn't. Stefan, since >>> you added a large majority of the features, would you want to be the >>> release manager? >>> >>> In summary, the following pull requests should likely go in >>> - array.array support (unless further discussion prevents that) >>> - fused types runtime buffer dispatch >>> - newaxis >>> - more? >>> >>> The memoryview documentation should also be reworked a bit. Matthew, >>> are you still willing to have a go at that? Otherwise I can clean up >>> the mess first, some things are no longer true and simply outdated, >>> and then have a second opinion. >>> >>> Mark >> >> I think we have enough stuff in to go for a 0.17 release, I have a few >> more fixes and a refactoring that I'll finish tonight that might be >> useful to get in as well. Currently Jenkins is yellow though, as the >> reduce_pickle test fails in Python 3. > > I pushed a fix to the pickle tests. I've got some minor cythonize > optimizations I'd like to get in for > Sage as well. I'll push when I confirm they don't break anything on > jenkins. > > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Great, thanks for fixing it! Ok, let's wait for those things, and we still also need to fix the memoryview documentation, and then we're good to go I think :).
From markflorisson88 at gmail.com Wed May 23 19:01:03 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 23 May 2012 18:01:03 +0100 Subject: [Cython] bug when using the built-in fused type cython.numeric In-Reply-To: References: Message-ID: On 23 May 2012 17:05, Jonathan De Wachter wrote: > This is probably a known bug, since the minimal code that reproduces it > can be anything using the built-in type cython.numeric: > > cimport cython > > cdef class Foo: > cdef cython.numeric bar > > Cython version: 0.16 (last release). > > I didn't find this bug in the bug trackers, which is why I'm mailing you. By > the way, this is the first time I'm using a mailing list, so if I'm doing this > wrong, please let me know. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Thanks for the report, I pushed two fixes here: https://github.com/markflorisson88/cython (cb6a62628f4d89096a8b0cdcc4ad66990141f927 and 0b7c152fda53d6664e355bcc74712639c7d9ff5a, for future reference). We were aware of the first problem, but there was a second problem where complex numbers don't have all their utility code declared properly.
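For context on the construct in that report: fused types such as cython.numeric specialize per use, which works in a function signature where each call supplies a concrete type, but a class attribute has nothing to drive the specialization, so the snippet above cannot be specialized as written. A minimal sketch of the supported pattern (illustrative, not taken from the pushed fixes):

cimport cython

def twice(cython.numeric x):
    # One specialization is compiled per concrete numeric type
    # (int, float, double, complex, ...), chosen at call time.
    return 2 * x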
From matthew.brett at gmail.com Wed May 23 21:49:18 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 23 May 2012 15:49:18 -0400 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: Hi, For the promised memoryview doc edits: Sorry - I'm in Cuba - not much internet. I will push something for review by Friday, but please go ahead without me if that's not fast enough. Sorry to be the blocker, Best, Matthew On 5/23/12, mark florisson wrote: > On 23 May 2012 12:31, Robert Bradshaw wrote: >> On Tue, May 22, 2012 at 6:08 AM, mark florisson >> wrote: >>> On 6 May 2012 15:28, mark florisson wrote: >>>> Hey, >>>> >>>> I think we already have quite a bit of functionality (nearly) ready, >>>> after merging some pending pull requests maybe it will be a good time >>>> for a 0.17 release? I think it would be good to also document to what >>>> extent pypy support works, what works and what doesn't. Stefan, since >>>> you added a large majority of the features, would you want to be the >>>> release manager? >>>> >>>> In summary, the following pull requests should likely go in >>>> - array.array support (unless further discussion prevents that) >>>> - fused types runtime buffer dispatch >>>> - newaxis >>>> - more? >>>> >>>> The memoryview documentation should also be reworked a bit. Matthew, >>>> are you still willing to have a go at that? Otherwise I can clean up >>>> the mess first, some things are no longer true and simply outdated, >>>> and then have a second opinion. >>>> >>>> Mark >>> >>> I think we have enough stuff in to go for a 0.17 release, I have a few >>> more fixes and a refactoring that I'll finish tonight that might be >>> useful to get in as well. Currently Jenkins is yellow though, as the >>> reduce_pickle test fails in Python 3. >> >> I pushed a fix to the pickle tests. I've got some minor cythonize >> optimizations I'd like to get in for >> Sage as well. I'll push when I confirm they don't break anything on >> jenkins. >> >> - Robert >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Great, thanks for fixing it! Ok, let's wait for those things, and we > still also need to fix the memoryview documentation, and then we're > good to go I think :).
From robertwb at gmail.com Wed May 23 22:55:48 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 23 May 2012 13:55:48 -0700 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: Pushed my change. On Wed, May 23, 2012 at 10:03 AM, mark florisson wrote: > On 23 May 2012 12:31, Robert Bradshaw wrote: >> On Tue, May 22, 2012 at 6:08 AM, mark florisson >> wrote: >>> On 6 May 2012 15:28, mark florisson wrote: >>>> Hey, >>>> >>>> I think we already have quite a bit of functionality (nearly) ready, >>>> after merging some pending pull requests maybe it will be a good time >>>> for a 0.17 release? I think it would be good to also document to what >>>> extent pypy support works, what works and what doesn't. Stefan, since >>>> you added a large majority of the features, would you want to be the >>>> release manager? >>>> >>>> In summary, the following pull requests should likely go in >>>> - array.array support (unless further discussion prevents that) >>>> - fused types runtime buffer dispatch >>>> - newaxis >>>> - more? >>>> >>>> The memoryview documentation should also be reworked a bit. Matthew, >>>> are you still willing to have a go at that? Otherwise I can clean up >>>> the mess first, some things are no longer true and simply outdated, >>>> and then have a second opinion. >>>> >>>> Mark >>> >>> I think we have enough stuff in to go for a 0.17 release, I have a few >>> more fixes and a refactoring that I'll finish tonight that might be >>> useful to get in as well. Currently Jenkins is yellow though, as the >>> reduce_pickle test fails in Python 3. >> >> I pushed a fix to the pickle tests. I've got some minor cythonize >> optimizations I'd like to get in for >> Sage as well. I'll push when I confirm they don't break anything on >> jenkins. >> >> - Robert >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Great, thanks for fixing it! Ok, let's wait for those things, and we > still also need to fix the memoryview documentation, and then we're > good to go I think :).
> _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Wed May 23 23:35:04 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 23 May 2012 22:35:04 +0100 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: Hey Matthew, Seriously, no problem, we're still getting stuff in there, no need to hurry. If you're in Cuba it sounds like you have better stuff to do than improve Cython's documentation :) Not to discourage contributors, but I would really enjoy Cuba here :). I'm working on my gsoc now, so making some time free for me is no problem here. So I propose I just clean up my mess, post back a link to the documentation, and then anyone who is interested can comment on improvements. Mark On 23 May 2012 20:49, Matthew Brett wrote: > Hi, > > For the promised memoryview doc edits: > > Sorry - I'm in Cuba - not much internet. ?I will push something for > review by Friday, but please go ahead without me if that's not fast > enough. ?Sorry to be the blocker, > > Best, > > Matthew > > > On 5/23/12, mark florisson wrote: >> On 23 May 2012 12:31, Robert Bradshaw wrote: >>> On Tue, May 22, 2012 at 6:08 AM, mark florisson >>> wrote: >>>> On 6 May 2012 15:28, mark florisson wrote: >>>>> Hey, >>>>> >>>>> I think we already have quite a bit of functionality (nearly) ready, >>>>> after merging some pending pull requests maybe it will be a good time >>>>> for a 0.17 release? I think it would be good to also document to what >>>>> extent pypy support works, what works and what doesn't. Stefan, since >>>>> you added a large majority of the features, would you want to be the >>>>> release manager? >>>>> >>>>> In summary, the following pull requests should likely go in >>>>> ? ?- array.array support (unless further discussion prevents that) >>>>> ? ?- fused types runtime buffer dispatch >>>>> ? ?- newaxis >>>>> ? ?- more? >>>>> >>>>> The memoryview documentation should also be reworked a bit. Matthew, >>>>> are you still willing to have a go at that? Otherwise I can clean up >>>>> the mess first, some things are no longer true and simply outdated, >>>>> and then have a second opinion. >>>>> >>>>> Mark >>>> >>>> I think we have enough stuff in to go for a 0.17 release, I have a few >>>> more fixes and a refactoring that I'll finish tonight that might be >>>> useful to get in as well. Currently Jenkins is yellow though, as the >>>> reduce_pickle test fails in Python 3. >>> >>> I pushed a fix to the pickle tests. I've got some minor cythonize >>> optimizations I'd like to get in for >>> Sage as well. I'll push when I confirm thy don't break anything on >>> jenkins. >>> >>> - Robert >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> Great, thanks for fixing it! Ok, let's wait for those things, and we >> still also need to fix the memoryview documentation, and then we're >> good to go I think :). 
>> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel >> > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From robertwb at gmail.com Thu May 24 07:46:20 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 23 May 2012 22:46:20 -0700 Subject: [Cython] gsoc: array expressions In-Reply-To: References: <4FBA1A40.2060202@astro.uio.no> <4FBB33E9.1090707@astro.uio.no> Message-ID: On Tue, May 22, 2012 at 6:16 AM, mark florisson wrote: > On 22 May 2012 07:48, Robert Bradshaw wrote: >> On Mon, May 21, 2012 at 11:36 PM, Dag Sverre Seljebotn >> wrote: >>> On 05/22/2012 08:11 AM, Robert Bradshaw wrote: >>>> >>>> On Mon, May 21, 2012 at 3:34 AM, Dag Sverre Seljebotn >>>> ?wrote: >>>>> >>>>> On 05/20/2012 04:03 PM, mark florisson wrote: >>>>>> >>>>>> >>>>>> Hey, >>>>>> >>>>>> For my gsoc we already have some simple initial ideas, i.e. >>>>>> elementwise vector expressions (a + b with a and b arrays with >>>>>> arbitrary rank), I don't think these need any discussion. However, >>>>>> there are a lot of things that haven't been formally discussed on the >>>>>> mailing list, so here goes. >>>>>> >>>>>> Fr?d?ric, I am CCing you since you expressed interest on the numpy >>>>>> mailing list, and I think your insights as a Theano developer can be >>>>>> very helpful in this discussion. >>>>>> >>>>>> User Interface >>>>>> =========== >>>>>> Besides simple array expressions for dense arrays I would like a >>>>>> mechanism for "custom ufuncs", although to a different extent to what >>>>>> Numpy or Numba provide. There are several ways in which we could want >>>>>> them, e.g. as typed functions (cdef, or external C) functions, as >>>>>> lambas or Python functions in the same module, or as general objects >>>>>> (e.g. functions Cython doesn't know about). >>>>>> To achieve maximum efficiency it will likely be good to allow sharing >>>>>> these functions in .pxd files. We have 'cdef inline' functions, but I >>>>>> would prefer annotated def functions where the parameters are >>>>>> specialized on demand, e.g. >>>>>> >>>>>> @elemental >>>>>> def add(a, b): # elemental functions can have any number of arguments >>>>>> and operate on any compatible dtype >>>>>> ? ? return a + b >>>>>> >>>>>> When calling cdef functions or elemental functions with memoryview >>>>>> arguments, the arguments perform a (broadcasted) elementwise >>>>>> operation. Alternatively, we can have a parallel.elementwise function >>>>>> which maps the function elementwise, which would also work for object >>>>>> callables. I prefer the former, since I think it will read much >>>>>> easier. >>>>>> >>>>>> Secondly, we can have a reduce function (and maybe a scan function), >>>>>> that reduce (respectively scan) in a specified axis or number of axes. >>>>>> E.g. >>>>>> >>>>>> ? ? parallel.reduce(add, a, b, axis=(0, 2)) >>>>>> >>>>>> where the default for axis is "all axes". As for the default value, >>>>>> this could be perhaps optionally provided to the elemental decorator. >>>>>> Otherwise, the reducer will have to get the default values from each >>>>>> dimension that is reduced in, and then skip those values when >>>>>> reducing. (Of course, the reducer function must be associate and >>>>>> commutative). Also, a lambda could be passed in instead of an >>>>> >>>>> >>>>> >>>>> Only associative, right? 
>>>>> >>>>> Sounds good to me. >>>>> >>>>> >>>>>> elementwise or typed cdef function. >>>>>> >>>>>> Finally, we would have a parallel.nditer/ndenumerate/nditerate >>>>>> function, which would iterate over N memoryviews, and provide a >>>>>> sensible memory access pattern (like numpy.nditer). I'm not sure if it >>>>>> should provide only the indices, or also the values. e.g. an inplace >>>>>> elementwise add would read as follows: >>>>>> >>>>>> ? ? for i, j, k in parallel.nditerate(A, B): >>>>>> ? ? ? ? A[i, j, k] += B[i, j, k] >>>>> >>>>> >>>>> >>>>> >>>>> I think this sounds good; I guess don't see a particular reason for >>>>> "ndenumerate", I think code like the above is clearer. >>>> >>>> >>>> I'm assuming the index computations would not be re-done in this case >>>> (i.e. there's more magic going on here than looks like at first >>>> glance)? Otherwise there is an advantage to ndenumerate. >>> >>> >>> Ideally, there is a lot more magic going on, though I don't know how far >>> Mark wants to go. >>> >>> Imagine "nditerate(A, A.T)", in that case it would have to make many small >>> tiles so that for each tile being processed, A has a tile in cache and A.T >>> has another tile in cache (so that one doesn't waste cache line transfers). >>> >>> So those array lookups would potentially look up in different memory >>> buffers, with the strides known at compile time. >> >> Yes, being clever about the order in which to iterate over the indices >> is the hard problem to solve here. I was thinking more in terms of the >> inner loop iterating over the innermost dimension only to do the >> indexing (retrieval and assignment), similar to how the generic NumPy >> iterator works. > > That's a valid point, but my experience has been that any worthy C > compiler will do common subexpression elimination for the outer > dimensions and not recompute the offset every time. It actually > generated marginally faster code for scalar assignment than a > "cascaded pointer assignment", i.e. faster than > > p0 = data; > for (...) { > ? ?p1 = p0 + i * strides[0] > ? ?for (...) { > ? ? ? ?p2 = p1 + j * strides[1] > ? ? ? ?... > ? ?} > } > > (haven't tried manual strength reduction there though). That's a good point, though "for(p2=p1; p2 < precomputed; p2 += stride1) {...}" is probably a better manual reduction. I concede that compilers are really smart about this kind of stuff these days though (though they might not be able to infer that, for example, strides doesn't change). - Robert From matthew.brett at gmail.com Fri May 25 00:01:10 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 24 May 2012 22:01:10 +0000 Subject: [Cython] 0.17 In-Reply-To: References: Message-ID: Hey, On Wed, May 23, 2012 at 9:35 PM, mark florisson wrote: > Hey Matthew, > > Seriously, no problem, we're still getting stuff in there, no need to > hurry. If you're in Cuba it sounds like you have better stuff to do > than improve Cython's documentation :) Not to discourage contributors, > but I would really enjoy Cuba here :). I'm working on my gsoc now, so > making some time free for me is no problem here. > > So I propose I just clean up my mess, post back a link to the > documentation, and then anyone who is interested can comment on > improvements. Thank you - well - why don't I aim for a push tomorrow, and if that get's derailed by Cuba-related things, then, I bow to your kind waiver, with thanks. See you, Matthew From valmynd at gmail.com Fri May 25 12:53:08 2012 From: valmynd at gmail.com (c.) 
Date: Fri, 25 May 2012 12:53:08 +0200 Subject: [Cython] Metaclasses to generate cdef'ed Classes In-Reply-To: References: Message-ID: <20120525125308.73eeee82f1defa910f7ef7a9@gmail.com> I make excessive use of metaclasses, and I would like to be able to use them with classes that use the cclass annotation. I don't think it is impossible to do, although of course I am aware that there are many issues with a higher priority right now. Here are some ideas of how this could be accomplished: - evaluate cdef classes which use a metaclass before any other classes - create temporary pxd and pyx files for any cdef class that uses a cdef metaclass - re-evaluate those temporary classes whenever anything changes in the original Benefits: - Once the extension module is built, there would be no need to process the creation/manipulation via metaclasses, thus reducing startup time for the application - Have all the benefits of cdef classes, like a lower memory footprint, when there might be hundreds of such classes with lots of data in them - Have all the benefits of Python being both very dynamic, flexible and simple to use What do you think? By the way, thank you all for this great project. C.Wilhelm trac ticket: http://trac.cython.org/cython_trac/ticket/777
From nouiz at nouiz.org Fri May 25 22:57:24 2012 From: nouiz at nouiz.org (Frédéric Bastien) Date: Fri, 25 May 2012 16:57:24 -0400 Subject: [Cython] gsoc: array expressions In-Reply-To: References: <4FBA1A40.2060202@astro.uio.no> <4FBA2392.2010105@astro.uio.no> Message-ID: I just resent this email, as the first copy was rejected by the mailing list, so I have now subscribed to it. Hi, Sorry for the delay, I had some schedule changes. Thanks for adding me. Should I subscribe to cython-dev? How much email is there daily? I couldn't find this in the archives. Feel free to add me in CC again whenever you think it is appropriate. I'll reply here to all the emails at the same time. Do you prefer that I reply to each email individually if this happens again? I'll try to reply faster next time. - About pickling Theano: we currently can't pickle Theano functions. It could be made to work in some cases, but not all, as there are hardware-dependent optimizations in a Theano function. Currently that is mostly CPU vs GPU operations. So if we stay on the CPU we could do some pickling, but we would need to make sure that the Python modules holding the compiled C code are still there when we unpickle, or recompile them. - I think it makes sense to build a Theano graph from the Cython AST, optimize it, and rebuild a Cython AST from the optimized graph. This would allow reusing Theano's optimizations. - It could also make sense to do the code generation in Theano and reuse it in Cython, but this would make the Theano dependency much stronger. I'm not sure you want this. - Another point not raised: Theano needs to know at compile time the dtype, the number of dimensions, and which dimensions are broadcastable for each variable. I think the last one could cause problems, but if you use specialization for the dtype, the same can be done for the broadcastability of the dimensions. - The compyte (GPU ndarray) project does collapsing of dimensions (see the sketch after this email). This is an important optimization on the GPU, as doing the index computation in parallel is costlier. I think on the CPU we could probably collapse just the inner dimensions to make it faster. - Theano doesn't generate intrinsics or assembly, but we assume that g++ will generate vectorized operations for simple loops. Recent versions of gcc/g++ do this. - Our generated code for element-wise operations takes some care about the memory access pattern: we swap dimensions to iterate over the dimension with the smallest strides, but we don't go further. - What do you mean by CSE? Constant optimization? Fred
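The dimension collapsing mentioned above can be sketched in a few lines; for C-ordered data, two adjacent dimensions merge whenever the outer stride equals the inner stride times the inner extent (a sketch under that assumption, not compyte's actual code):

def collapse_dims(shape, strides):
    # Merge adjacent dimensions that are contiguous with respect to each
    # other, so iteration needs fewer, longer loops and less index math.
    new_shape, new_strides = [shape[0]], [strides[0]]
    for extent, stride in zip(shape[1:], strides[1:]):
        if new_strides[-1] == stride * extent:
            new_shape[-1] *= extent  # outer dim steps exactly over inner
            new_strides[-1] = stride
        else:
            new_shape.append(extent)
            new_strides.append(stride)
    return new_shape, new_strides

# A C-contiguous (4, 5) array of doubles collapses to one flat dimension:
assert collapse_dims([4, 5], [40, 8]) == ([20], [8])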
- Our generated code for element-wise operations takes some care with the
memory access pattern: we swap dimensions to iterate on the dimensions
with the smallest strides. But we don't go further.

- What do you mean by CSE? Constant optimization?

Fred

From d.s.seljebotn at astro.uio.no Sun May 27 23:24:44 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sun, 27 May 2012 23:24:44 +0200
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: <4FB60896.4030702@astro.uio.no>
References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz> <4FB55E24.3090006@astro.uio.no> <4FB60896.4030702@astro.uio.no>
Message-ID: <4FC29B9C.1000308@astro.uio.no>

On 05/18/2012 10:30 AM, Dag Sverre Seljebotn wrote:
> On 05/18/2012 12:57 AM, Nick Coghlan wrote:
>> I think the main things we'd be looking for would be:
>> - a clear explanation of why a new metaclass is considered too complex a
>> solution
>> - what the implications are for classes that have nothing to do with the
>> SciPy/NumPy ecosystem
>> - how subclassing would behave (both at the class and metaclass level)
>>
>> Yes, defining a new metaclass for fast signature exchange has its
>> challenges - but it means that *our* concerns about maintaining
>> consistent behaviour in the default object model and avoiding adverse
>> effects on code that doesn't need the new behaviour are addressed
>> automatically.
>>
>> Also, I'd consider a functioning reference implementation using a custom
>> metaclass a requirement before we considered modifying type anyway, so I
>> think that's the best thing to pursue next rather than a PEP. It also
>> has the virtue of letting you choose which Python versions to target and
>> iterating at a faster rate than CPython.
>
> This seems right on target. I could make a utility code C header for
> such a metaclass, and then the different libraries can all include it
> and handshake on which implementation becomes the real one through
> sys.modules during module initialization. That way an eventual PEP will
> only be a natural incremental step to make things more polished, whether
> that happens by making such a metaclass part of the standard library or
> by extending PyTypeObject.

So I finally got around to implementing this:

https://github.com/dagss/pyextensibletype

Documentation now in a draft in the NumFOCUS SEP repo, which I believe is
a better place to store cross-project standards like this. (The NumPy
docstring standard will be SEP 100.)

https://github.com/numfocus/sep/blob/master/sep200.rst

Summary:

- No common runtime dependency

- 1 ns overhead per lookup (that's for the custom slot *alone*, no
fast-callable signature matching or similar)

- Slight annoyance: Types that want to use the metaclass must be a
PyHeapExtensibleType, to make the binary layout work with how CPython
makes subclasses from Python scripts

My conclusion: I think the metaclass approach should work really well.
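For those who don't want to dig into the repo, the layout is roughly the
following (simplified sketch; take the exact field and helper names with
a grain of salt and treat the SEP draft as authoritative):

    #include <Python.h>

    /* true if tp's metaclass is the extensible metaclass; definition elided */
    static int PyCustomSlots_Check(PyTypeObject *tp);

    typedef struct {
        unsigned long id;  /* globally coordinated slot id */
        void *data;        /* payload, e.g. a pointer to a function table */
    } PyCustomSlot;

    typedef struct {
        PyHeapTypeObject etp_heaptype;   /* binary-compatible with heap types */
        Py_ssize_t etp_count;            /* number of custom slots */
        PyCustomSlot *etp_custom_slots;  /* usually a static array */
    } PyHeapExtensibleTypeObject;

    /* The "1 ns" lookup: a linear scan over a tiny per-type table. */
    static void *
    PyCustomSlots_Find(PyTypeObject *tp, unsigned long id)
    {
        if (PyCustomSlots_Check(tp)) {
            PyHeapExtensibleTypeObject *etp =
                (PyHeapExtensibleTypeObject *)tp;
            Py_ssize_t i;
            for (i = 0; i < etp->etp_count; i++) {
                if (etp->etp_custom_slots[i].id == id)
                    return etp->etp_custom_slots[i].data;
            }
        }
        return NULL;
    }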
Dag

From njs at pobox.com Mon May 28 00:12:32 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 27 May 2012 23:12:32 +0100
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: <4FC29B9C.1000308@astro.uio.no>
References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz> <4FB55E24.3090006@astro.uio.no> <4FB60896.4030702@astro.uio.no> <4FC29B9C.1000308@astro.uio.no>
Message-ID:

On Sun, May 27, 2012 at 10:24 PM, Dag Sverre Seljebotn wrote:
> So I finally got around to implementing this:
>
> https://github.com/dagss/pyextensibletype
> [...]
> My conclusion: I think the metaclass approach should work really well.

Few quick comments on skimming the code:

The complicated nested #ifdef for __builtin_expect could be simplified to
  #if defined(__GNUC__) && (__GNUC__ > 2 || __GNUC_MINOR__ > 95)

PyCustomSlots_Check should be called PyCustomSlots_CheckExact, surely?
And given that, how can this code work if someone does subclass this
metaclass?

Stealing a flag bit (but now to indicate this metaclass) would allow
us to make a real PyCustomSlots_Check function that was still fast.
It would also mean that different implementations didn't have to
rendezvous on a single PyExtensibleType_Type, so long as they all used
the same flag bit. That would let us skip monkeying around with
sys.modules.

Speaking of which, surely we should not be using sys.modules for this?
Stashing it in sys itself or something would make more sense, if we're
going to do it at all.

- N

From markflorisson88 at gmail.com Mon May 28 10:54:38 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 28 May 2012 09:54:38 +0100
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz> <4FB55E24.3090006@astro.uio.no> <4FB60896.4030702@astro.uio.no> <4FC29B9C.1000308@astro.uio.no>
Message-ID:

On 27 May 2012 23:12, Nathaniel Smith wrote: [...]
> Few quick comments on skimming the code:
>
> The complicated nested #ifdef for __builtin_expect could be simplified to
>   #if defined(__GNUC__) && (__GNUC__ > 2 || __GNUC_MINOR__ > 95)
>
> PyCustomSlots_Check should be called PyCustomSlots_CheckExact, surely?
> And given that, how can this code work if someone does subclass this
> metaclass?

I think we should provide a wrapper for PyType_Ready, which just copies
the pointer to the table and the count directly into the subclass. If a
user then wishes to add stuff, the user can allocate a new memory region
dynamically, memcpy the base class' stuff in there, and append some
entries.

> Stealing a flag bit (but now to indicate this metaclass) would allow
> us to make a real PyCustomSlots_Check function that was still fast. It
> would also mean that different implementations didn't have to
> rendezvous on a single PyExtensibleType_Type, so long as they all used
> the same flag bit. That would let us skip monkeying around with
> sys.modules.
>
> Speaking of which, surely we should not be using sys.modules for this?
> Stashing it in sys itself or something would make more sense, if we're
> going to do it at all.

I think a module makes sense, if mangled appropriately. A module really
means shared state (even if the only state is the functions and classes).
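To sketch the PyType_Ready wrapper idea above (hypothetical code, none of
these helper names exist in the repo; the memcpy-and-append path is the
part a user would write):

    /* Hypothetical wrapper: inherit the base type's slot table as-is. */
    static int
    PyExtensibleType_ReadySubclass(PyHeapExtensibleTypeObject *sub,
                                   PyHeapExtensibleTypeObject *base)
    {
        sub->etp_count = base->etp_count;
        sub->etp_custom_slots = base->etp_custom_slots;
        return PyType_Ready((PyTypeObject *)sub);
    }

    /* User code that wants to extend rather than just inherit: */
    PyCustomSlot *table = PyMem_Malloc(
        (base->etp_count + n_extra) * sizeof(PyCustomSlot));
    if (table == NULL) { /* handle allocation failure */ }
    memcpy(table, base->etp_custom_slots,
           base->etp_count * sizeof(PyCustomSlot));
    /* ... append n_extra entries after the copied ones ... */
    sub->etp_custom_slots = table;
    sub->etp_count = base->etp_count + n_extra;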
From markflorisson88 at gmail.com Mon May 28 11:13:30 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 28 May 2012 10:13:30 +0100
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz> <4FB55E24.3090006@astro.uio.no> <4FB60896.4030702@astro.uio.no> <4FC29B9C.1000308@astro.uio.no>
Message-ID:

On 28 May 2012 09:54, mark florisson wrote:
> [...]
> I think we should provide a wrapper for PyType_Ready, which just copies
> the pointer to the table and the count directly into the subclass. If a
> user then wishes to add stuff, the user can allocate a new memory region
> dynamically, memcpy the base class' stuff in there, and append some
> entries.

Maybe we should also allow each custom type to set a deallocator, since
they are then heap types which can go out of scope. The metaclass can
then call this deallocator to deallocate the table.
From njs at pobox.com Mon May 28 12:41:37 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 28 May 2012 11:41:37 +0100
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz> <4FB55E24.3090006@astro.uio.no> <4FB60896.4030702@astro.uio.no> <4FC29B9C.1000308@astro.uio.no>
Message-ID:

On Mon, May 28, 2012 at 10:13 AM, mark florisson wrote: [...]
> Maybe we should also allow each custom type to set a deallocator, since
> they are then heap types which can go out of scope. The metaclass can
> then call this deallocator to deallocate the table.

Custom types are plain old Python objects, they can use tp_dealloc.

- N

From markflorisson88 at gmail.com Mon May 28 12:55:55 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 28 May 2012 11:55:55 +0100
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz> <4FB55E24.3090006@astro.uio.no> <4FB60896.4030702@astro.uio.no> <4FC29B9C.1000308@astro.uio.no>
Message-ID:

On 28 May 2012 11:41, Nathaniel Smith wrote: [...]
>> Maybe we should also allow each custom type to set a deallocator, since
>> they are then heap types which can go out of scope. The metaclass can
>> then call this deallocator to deallocate the table.
>
> Custom types are plain old Python objects, they can use tp_dealloc.

If I set etp_custom_slots to something allocated on the heap, then the
(shared) metaclass would have to deallocate it. The tp_dealloc of the
type itself would be called for its instances (which can be used to
deallocate dynamically allocated memory in the objects if you use a
custom slot "pointer offset").
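In other words, something like this (hypothetical -- the repo's metaclass
has no such hook today, and the ownership flag is made up):

    /* tp_dealloc of the *metaclass*: runs when a type object dies. */
    static void
    extensibletype_dealloc(PyObject *self)
    {
        PyHeapExtensibleTypeObject *etp = (PyHeapExtensibleTypeObject *)self;
        if (etp->etp_slots_on_heap)       /* made-up ownership flag */
            PyMem_Free(etp->etp_custom_slots);
        PyType_Type.tp_dealloc(self);     /* chain to plain 'type' */
    }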
From njs at pobox.com Mon May 28 13:01:25 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 28 May 2012 12:01:25 +0100
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz> <4FB55E24.3090006@astro.uio.no> <4FB60896.4030702@astro.uio.no> <4FC29B9C.1000308@astro.uio.no>
Message-ID:

On Mon, May 28, 2012 at 11:55 AM, mark florisson wrote: [...]
> If I set etp_custom_slots to something allocated on the heap, then the
> (shared) metaclass would have to deallocate it. The tp_dealloc of the
> type itself would be called for its instances (which can be used to
> deallocate dynamically allocated memory in the objects if you use a
> custom slot "pointer offset").

Oh, I see. Right, the natural way to handle this would be to have each
user define their own metaclass with the behavior they want. Another
argument for supporting multiple metaclasses simultaneously I guess...

- N

From markflorisson88 at gmail.com Mon May 28 13:09:22 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 28 May 2012 12:09:22 +0100
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz> <4FB55E24.3090006@astro.uio.no> <4FB60896.4030702@astro.uio.no> <4FC29B9C.1000308@astro.uio.no>
Message-ID:

On 28 May 2012 12:01, Nathaniel Smith wrote: [...]
> Oh, I see. Right, the natural way to handle this would be to have each
> user define their own metaclass with the behavior they want. Another
> argument for supporting multiple metaclasses simultaneously I guess...

That bludgeons your constant time type check. It's easier to just reserve
an extra slot for a deallocator pointer :) It would probably be set to
NULL in the common case anyway, since you allocate your slots statically.
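Concretely, I mean something like this (illustrative only -- the extra
field is not in the repo's struct, and the name is made up):

    typedef struct {
        PyHeapTypeObject etp_heaptype;
        Py_ssize_t etp_count;
        PyCustomSlot *etp_custom_slots;
        /* Made-up extra slot: frees etp_custom_slots when the type dies;
           NULL for the common, statically allocated case. */
        void (*etp_slots_dealloc)(PyCustomSlot *slots);
    } PyHeapExtensibleTypeObject;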
From njs at pobox.com Mon May 28 13:24:08 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 28 May 2012 12:24:08 +0100
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: References: <4FB35ACA.7090908@astro.uio.no> <4FB366F3.7010208@v.loewis.de> <4FB3784C.9020906@v.loewis.de> <4FB385F3.7070209@astro.uio.no> <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz> <4FB55E24.3090006@astro.uio.no> <4FB60896.4030702@astro.uio.no> <4FC29B9C.1000308@astro.uio.no>
Message-ID:

On Mon, May 28, 2012 at 12:09 PM, mark florisson wrote: [...]
> That bludgeons your constant time type check.

Not if you steal a flag, like the interpreter already does with
Py_TPFLAGS_INT_SUBCLASS, Py_TPFLAGS_STRING_SUBCLASS, etc. I was referring
to that argument I made earlier :-)

> It's easier to just reserve an extra slot for a deallocator pointer :)
> It would probably be set to NULL in the common case anyway, since you
> allocate your slots statically.

-N

From markflorisson88 at gmail.com Mon May 28 14:49:36 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 28 May 2012 13:49:36 +0100
Subject: [Cython] gsoc: array expressions
In-Reply-To: References: <4FBA1A40.2060202@astro.uio.no> <4FBA2392.2010105@astro.uio.no>
Message-ID:

On 25 May 2012 21:53, Frédéric Bastien wrote:
> Hi,
>
> Sorry for the delay, I had some schedule change.
>
> Thanks for adding me. Should I subscribe to cython-dev? How much daily
> email is there? I couldn't tell from the archives. Feel free to add me
> in CC again when you think it is appropriate.

There is usually not so much traffic on cython-dev, unless something
comes up that is debated to the death :)

> I'll reply here to all emails at the same time.
> Do you prefer that I reply to each email individually if this happens
> again? I'll try to reply faster next time.

No worries, either way works fine, don't worry too much about protocol
(the only thing to note is that we do bottom posting).

> - About pickling Theano: we currently can't pickle Theano functions. It
> could be made to work in some cases, but not all, as there are
> hardware-dependent optimizations in the Theano function. Currently that
> is mostly CPU vs GPU operations. So if we stay on the CPU we could do
> some pickling, but we should make sure that the C code compiled into
> Python modules is still there when we unpickle, or recompile it.
>
> - I think it makes sense to make a Theano graph from the Cython AST,
> optimize it and rebuild a Cython AST from the optimized graph. This
> would allow using Theano's optimizations.

OK, the important thing is that the graph can be pickled; it should be
pretty straightforward to generate code to build the function again from
the loaded graph.

> - It also makes sense to do the code generation in Theano and reuse it
> in Cython. But this would make the Theano dependency much stronger. I'm
> not sure you want this.
>
> - Another point not raised: Theano needs to know at compile time the
> dtype, the number of dimensions and which dimensions are broadcastable
> for each variable. I think that the last one could cause problems, but
> if you use specialization for the dtype, the same can be done for the
> broadcastability of the dimensions.

Hm, that would lead to kind of an explosion of combinations. I think we
could specialize only on no broadcasting at all (except for operands with
lesser dimensionality).

> - The compyte (GPU ndarray) project does collapsing of dimensions. This
> is an important optimization on the GPU, as doing the index computation
> in parallel is costlier. I think on the CPU we could probably collapse
> just the inner dimensions to make it faster.
>
> - Theano doesn't generate intrinsics or assembly, but we assume that g++
> will generate vectorized operations for simple loops. Recent versions of
> gcc/g++ do this.

Right, the aim is definitely to specialize for contiguous arrays, where
you collapse everything. Specializing statically for anything more would
be unfeasible, and better handled by a runtime compiler I think. For the
C backend, I'll start by generating simple C loops and see if the
compilers vectorize that already.

> - Our generated code for element-wise operations takes some care with
> the memory access pattern: we swap dimensions to iterate on the
> dimensions with the smallest strides. But we don't go further.
>
> - What do you mean by CSE? Constant optimization?

Yes, common subexpression elimination and also hoisting of unchanging
expressions outside the loop.

> Fred

I started a new project, https://github.com/markflorisson88/minivect ,
which currently features a simple C code generator. The specializer and
astbuilder do most of the work of creating the right AST, so the code
generator only has to implement code generation functions for simple
expressions. Depending on how it progresses I will look at incorporating
Theano's optimizations into it and having Theano use it as a C backend
for compatible expressions.
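For a flavor of what the C backend aims to emit for the fully contiguous
specialization, think of something like this (hand-written illustration,
not actual minivect output):

    /* a[i] = b[i] + c[i] with all operands C-contiguous: every dimension
       is collapsed into one flat loop, which gcc/g++ can vectorize. */
    static void
    elementwise_add_contig(double *restrict a, const double *restrict b,
                           const double *restrict c, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    }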
From markflorisson88 at gmail.com Mon May 28 14:52:35 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 28 May 2012 13:52:35 +0100
Subject: [Cython] gsoc: array expressions
In-Reply-To: References: <4FBA1A40.2060202@astro.uio.no> <4FBA2392.2010105@astro.uio.no>
Message-ID:

On 28 May 2012 13:49, mark florisson wrote: [...]
> I started a new project, https://github.com/markflorisson88/minivect ,
> which currently features a simple C code generator. [...]

I forgot to mention, it's still pretty basic, but it works for simple
arithmetic expressions with non-overlapping (shifted) memory from Cython:
https://github.com/markflorisson88/cython/commit/2c316abdbc1228597bbdf480f737a59213ee9532#L4R1

From markflorisson88 at gmail.com Mon May 28 14:54:33 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 28 May 2012 13:54:33 +0100
Subject: [Cython] gsoc: array expressions
In-Reply-To: References: <4FBA1A40.2060202@astro.uio.no> <4FBA2392.2010105@astro.uio.no>
Message-ID:

On 28 May 2012 13:52, mark florisson wrote: [...]
> I forgot to mention, it's still pretty basic, but it works for simple
> arithmetic expressions with non-overlapping (shifted) memory from Cython:
> https://github.com/markflorisson88/cython/commit/2c316abdbc1228597bbdf480f737a59213ee9532#L4R1
From d.s.seljebotn at astro.uio.no Mon May 28 17:31:14 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 28 May 2012 17:31:14 +0200 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: References: <4FB35ACA.7090908@astro.uio.no> <4FB385F3.7070209@astro.uio.no> <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz> <4FB55E24.3090006@astro.uio.no> <4FB60896.4030702@astro.uio.no> <4FC29B9C.1000308@astro.uio.no> Message-ID: <4FC39A42.4040602@astro.uio.no> On 05/28/2012 01:24 PM, Nathaniel Smith wrote: > On Mon, May 28, 2012 at 12:09 PM, mark florisson > wrote: >> On 28 May 2012 12:01, Nathaniel Smith wrote: >>> On Mon, May 28, 2012 at 11:55 AM, mark florisson >>> wrote: >>>> On 28 May 2012 11:41, Nathaniel Smith wrote: >>>>> On Mon, May 28, 2012 at 10:13 AM, mark florisson >>>>> wrote: >>>>>> On 28 May 2012 09:54, mark florisson wrote: >>>>>>> On 27 May 2012 23:12, Nathaniel Smith wrote: >>>>>>>> On Sun, May 27, 2012 at 10:24 PM, Dag Sverre Seljebotn >>>>>>>> wrote: >>>>>>>>> On 05/18/2012 10:30 AM, Dag Sverre Seljebotn wrote: >>>>>>>>>> >>>>>>>>>> On 05/18/2012 12:57 AM, Nick Coghlan wrote: >>>>>>>>>>> >>>>>>>>>>> I think the main things we'd be looking for would be: >>>>>>>>>>> - a clear explanation of why a new metaclass is considered too complex a >>>>>>>>>>> solution >>>>>>>>>>> - what the implications are for classes that have nothing to do with the >>>>>>>>>>> SciPy/NumPy ecosystem >>>>>>>>>>> - how subclassing would behave (both at the class and metaclass level) >>>>>>>>>>> >>>>>>>>>>> Yes, defining a new metaclass for fast signature exchange has its >>>>>>>>>>> challenges - but it means that *our* concerns about maintaining >>>>>>>>>>> consistent behaviour in the default object model and avoiding adverse >>>>>>>>>>> effects on code that doesn't need the new behaviour are addressed >>>>>>>>>>> automatically. >>>>>>>>>>> >>>>>>>>>>> Also, I'd consider a functioning reference implementation using a custom >>>>>>>>>>> metaclass a requirement before we considered modifying type anyway, so I >>>>>>>>>>> think that's the best thing to pursue next rather than a PEP. It also >>>>>>>>>>> has the virtue of letting you choose which Python versions to target and >>>>>>>>>>> iterating at a faster rate than CPython. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> This seems right on target. I could make a utility code C header for >>>>>>>>>> such a metaclass, and then the different libraries can all include it >>>>>>>>>> and handshake on which implementation becomes the real one through >>>>>>>>>> sys.modules during module initialization. That way an eventual PEP will >>>>>>>>>> only be a natural incremental step to make things more polished, whether >>>>>>>>>> that happens by making such a metaclass part of the standard library or >>>>>>>>>> by extending PyTypeObject. >>>>>>>>> >>>>>>>>> >>>>>>>>> So I finally got around to implementing this: >>>>>>>>> >>>>>>>>> https://github.com/dagss/pyextensibletype >>>>>>>>> >>>>>>>>> Documentation now in a draft in the NumFOCUS SEP repo, which I believe is a >>>>>>>>> better place to store cross-project standards like this. (The NumPy >>>>>>>>> docstring standard will be SEP 100). 
>>>>>>>>>
>>>>>>>>> https://github.com/numfocus/sep/blob/master/sep200.rst
>>>>>>>>>
>>>>>>>>> Summary:
>>>>>>>>>
>>>>>>>>> - No common runtime dependency
>>>>>>>>>
>>>>>>>>> - 1 ns overhead per lookup (that's for the custom slot *alone*, no
>>>>>>>>> fast-callable signature matching or similar)
>>>>>>>>>
>>>>>>>>> - Slight annoyance: Types that want to use the metaclass must be a
>>>>>>>>> PyHeapExtensibleType, to make the binary layout work with how CPython
>>>>>>>>> makes subclasses from Python scripts
>>>>>>>>>
>>>>>>>>> My conclusion: I think the metaclass approach should work really well.
>>>>>>>>
>>>>>>>> Few quick comments on skimming the code:
>>>>>>>>
>>>>>>>> The complicated nested #ifdef for __builtin_expect could be simplified to
>>>>>>>> #if defined(__GNUC__) && (__GNUC__ > 2 || __GNUC_MINOR__ > 95)
>>>>>>>>
>>>>>>>> PyCustomSlots_Check should be called PyCustomSlots_CheckExact, surely?
>>>>>>>> And given that, how can this code work if someone does subclass this
>>>>>>>> metaclass?
>>>>>>>
>>>>>>> I think we should provide a wrapper for PyType_Ready, which just
>>>>>>> copies the pointer to the table and the count directly into the
>>>>>>> subclass. If a user then wishes to add stuff, the user can allocate a
>>>>>>> new memory region dynamically, memcpy the base class' stuff in there,
>>>>>>> and append some entries.
>>>>>>
>>>>>> Maybe we should also allow each custom type to set a deallocator,
>>>>>> since they are then heap types which can go out of scope. The
>>>>>> metaclass can then call this deallocator to deallocate the table.
>>>>>
>>>>> Custom types are plain old Python objects, they can use tp_dealloc.
>>>>>
>>>> If I set etp_custom_slots to something allocated on the heap, then the
>>>> (shared) metaclass would have to deallocate it. The tp_dealloc of the
>>>> type itself would be called for its instances (which can be used to
>>>> deallocate dynamically allocated memory in the objects if you use a
>>>> custom slot "pointer offset").
>>>
>>> Oh, I see. Right, the natural way to handle this would be have each
>>> user define their own metaclass with the behavior they want. Another
>>> argument for supporting multiple metaclasses simultaneously I guess...
>>>
>>> - N
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>> That bludgeons your constant time type check.
>
> Not if you steal a flag, like the interpreter already does with
> Py_TPFLAGS_INT_SUBCLASS, Py_TPFLAGS_STRING_SUBCLASS, etc. I was
> referring to that argument I made earlier :-)
>
>> It's easier to just
>> reserve an extra slot for a deallocator pointer :) It would probably
>> be set to NULL in the common case anyway, since you allocate your
>> slots statically.

Subclassing: Note that even if all types have to have a PyHeapTypeObject
structure, they are still statically allocated! So for statically
created subclasses (which should be the majority of the cases), there's
not going to be any deallocator...

I agree that there should be a PyExtensibleType_Ready. To keep
allocating statically I propose that the subclass should leave some room
open for slots from the superclass:

PyCustomSlot subclass_custom_slots[10] = {
    {SLOT_C, foo}, {SLOT_D, BAR}, {0,0}, ...
}

Then, fill in etp_count=2, etp_custom_slots=subclass_custom_slots, and
then call PyExtensibleType_Ready(&Subclass_Type, 10); i.e., the number
of total elements in etp_custom_slots is passed in.
One should always leave more room than one thinks one needs if the
superclass is from another library...

Then, inheritance happens according to the following rules:

 - Slots are inherited from the superclass
 - Slots in the subclass with the same ID overwrite the superclass slot
 - Slots from the superclass are put before slots from the subclass
 - An exception is raised if the number of final slots is larger than
   the limit passed in to PyExtensibleType_Ready.

(Whenever this is not sufficient, you can always manually munge the
table after PyExtensibleType_Ready.)

Question: How to deal with possible flag bits in the ID?

Three approaches:

 a) Forget about the flags-in-ID idea; if you want flags, stick them in
    the data

 b) Embed a separate variable for flags in every PyCustomSlot

 c) Standardize on a *hard* requirement on the bottom 8 bits being
    flags while the top 24 bits indicate incompatible slots; so for the
    purposes of inheritance, 0x12345601 would overwrite 0x12345600.

To me, b) is OK, but the 32 bit ID space is already so ridiculously huge
that c) is a "why not"? -1 on a), it'd be rather tedious if the payload
is an offset to the PyObject*.

Subclassing in heap-allocated types (subclasses Python side): It'd
certainly be nice to completely ignore this for now and require making a
sub-metaclass to support this (e.g., have tp_new parse some
__customslots__ attribute in the class dict).

Hijacking a TP_FLAG: We could make the branch for a direct hit on
metaclass comparison likely(), so that the branch checking tp_base on
the metaclass is unlikely(), which with branch prediction I think makes
it very likely that there's no penalty for allowing sub-metaclasses
(when you don't use them -- when you do, there's a slight penalty).

But here's another great argument in favour of a TP_FLAG bit: Consumers
would then not need to import the metaclass or contain its definition
(which is really only around in case the user imports the consumer
before the provider...). This would make the header file the consumers
need to bundle much lighter. So I think I'm +1.

At any rate, I would like the metaclass rendezvous to keep happening
just because it's less confusing if "extensibletype is extensibletype"
in general.

Anyway, the metaclass checking is a nice fallback if CPython uses all
their flag bits.
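To make the inheritance rules concrete, here is a rough sketch of what
the merge step inside PyExtensibleType_Ready could look like -- the
names, the exact PyCustomSlot layout, and the error strategy are all
provisional, and the flags-in-ID question above is ignored:

#include <string.h>

typedef struct {
    unsigned long id;   /* or a fixed 32-bit type, to be decided */
    void *data;
} PyCustomSlot;

/* Merge superclass slots into the statically allocated subclass table.
   "capacity" is the total size of sub_slots, as passed to
   PyExtensibleType_Ready; returns the new slot count, or -1 if the
   table is too small (the caller would raise an exception). */
static int
merge_custom_slots(PyCustomSlot *sub_slots, int sub_count,
                   const PyCustomSlot *super_slots, int super_count,
                   int capacity)
{
    PyCustomSlot merged[64];  /* scratch; sketch assumes capacity <= 64 */
    int n = 0, i, j;

    if (capacity > 64)
        return -1;

    /* Slots from the superclass are put first... */
    for (i = 0; i < super_count; i++) {
        if (n == capacity) return -1;
        merged[n++] = super_slots[i];
    }
    /* ...then the subclass slots; a matching ID overwrites the
       inherited entry in place, anything else is appended. */
    for (i = 0; i < sub_count; i++) {
        for (j = 0; j < super_count; j++) {
            if (merged[j].id == sub_slots[i].id) {
                merged[j] = sub_slots[i];
                break;
            }
        }
        if (j == super_count) {
            if (n == capacity) return -1;
            merged[n++] = sub_slots[i];
        }
    }
    memcpy(sub_slots, merged, n * sizeof(PyCustomSlot));
    return n;
}

With approach c), the ID comparison above would be done under the ~0xFF
mask rather than by exact equality.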
Dag From d.s.seljebotn at astro.uio.no Mon May 28 17:59:43 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 28 May 2012 17:59:43 +0200 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: <4FC39A42.4040602@astro.uio.no> References: <4FB35ACA.7090908@astro.uio.no> <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz> <4FB55E24.3090006@astro.uio.no> <4FB60896.4030702@astro.uio.no> <4FC29B9C.1000308@astro.uio.no> <4FC39A42.4040602@astro.uio.no> Message-ID: <90d9723f-7003-48f7-b959-955c22850d2d@email.android.com> Dag Sverre Seljebotn wrote: >On 05/28/2012 01:24 PM, Nathaniel Smith wrote: >> On Mon, May 28, 2012 at 12:09 PM, mark florisson >> wrote: >>> On 28 May 2012 12:01, Nathaniel Smith wrote: >>>> On Mon, May 28, 2012 at 11:55 AM, mark florisson >>>> wrote: >>>>> On 28 May 2012 11:41, Nathaniel Smith wrote: >>>>>> On Mon, May 28, 2012 at 10:13 AM, mark florisson >>>>>> wrote: >>>>>>> On 28 May 2012 09:54, mark florisson >wrote: >>>>>>>> On 27 May 2012 23:12, Nathaniel Smith wrote: >>>>>>>>> On Sun, May 27, 2012 at 10:24 PM, Dag Sverre Seljebotn >>>>>>>>> wrote: >>>>>>>>>> On 05/18/2012 10:30 AM, Dag Sverre Seljebotn wrote: >>>>>>>>>>> >>>>>>>>>>> On 05/18/2012 12:57 AM, Nick Coghlan wrote: >>>>>>>>>>>> >>>>>>>>>>>> I think the main things we'd be looking for would be: >>>>>>>>>>>> - a clear explanation of why a new metaclass is considered >too complex a >>>>>>>>>>>> solution >>>>>>>>>>>> - what the implications are for classes that have nothing >to do with the >>>>>>>>>>>> SciPy/NumPy ecosystem >>>>>>>>>>>> - how subclassing would behave (both at the class and >metaclass level) >>>>>>>>>>>> >>>>>>>>>>>> Yes, defining a new metaclass for fast signature exchange >has its >>>>>>>>>>>> challenges - but it means that *our* concerns about >maintaining >>>>>>>>>>>> consistent behaviour in the default object model and >avoiding adverse >>>>>>>>>>>> effects on code that doesn't need the new behaviour are >addressed >>>>>>>>>>>> automatically. >>>>>>>>>>>> >>>>>>>>>>>> Also, I'd consider a functioning reference implementation >using a custom >>>>>>>>>>>> metaclass a requirement before we considered modifying type >anyway, so I >>>>>>>>>>>> think that's the best thing to pursue next rather than a >PEP. It also >>>>>>>>>>>> has the virtue of letting you choose which Python versions >to target and >>>>>>>>>>>> iterating at a faster rate than CPython. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> This seems right on target. I could make a utility code C >header for >>>>>>>>>>> such a metaclass, and then the different libraries can all >include it >>>>>>>>>>> and handshake on which implementation becomes the real one >through >>>>>>>>>>> sys.modules during module initialization. That way an >eventual PEP will >>>>>>>>>>> only be a natural incremental step to make things more >polished, whether >>>>>>>>>>> that happens by making such a metaclass part of the standard >library or >>>>>>>>>>> by extending PyTypeObject. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> So I finally got around to implementing this: >>>>>>>>>> >>>>>>>>>> https://github.com/dagss/pyextensibletype >>>>>>>>>> >>>>>>>>>> Documentation now in a draft in the NumFOCUS SEP repo, which >I believe is a >>>>>>>>>> better place to store cross-project standards like this. (The >NumPy >>>>>>>>>> docstring standard will be SEP 100). 
>>>>>>>>>> >>>>>>>>>> https://github.com/numfocus/sep/blob/master/sep200.rst >>>>>>>>>> >>>>>>>>>> Summary: >>>>>>>>>> >>>>>>>>>> - No common runtime dependency >>>>>>>>>> >>>>>>>>>> - 1 ns overhead per lookup (that's for the custom slot >*alone*, no >>>>>>>>>> fast-callable signature matching or similar) >>>>>>>>>> >>>>>>>>>> - Slight annoyance: Types that want to use the metaclass >must be a >>>>>>>>>> PyHeapExtensibleType, to make the binary layout work with how >CPython makes >>>>>>>>>> subclasses from Python scripts >>>>>>>>>> >>>>>>>>>> My conclusion: I think the metaclass approach should work >really well. >>>>>>>>> >>>>>>>>> Few quick comments on skimming the code: >>>>>>>>> >>>>>>>>> The complicated nested #ifdef for __builtin_expect could be >simplified to >>>>>>>>> #if defined(__GNUC__)&& (__GNUC__> 2 || __GNUC_MINOR__> >95) >>>>>>>>> >>>>>>>>> PyCustomSlots_Check should be called PyCustomSlots_CheckExact, >surely? >>>>>>>>> And given that, how can this code work if someone does >subclass this >>>>>>>>> metaclass? >>>>>>>> >>>>>>>> I think we should provide a wrapper for PyType_Ready, which >just >>>>>>>> copies the pointer to the table and the count directly into the >>>>>>>> subclass. If a user then wishes to add stuff, the user can >allocate a >>>>>>>> new memory region dynamically, memcpy the base class' stuff in >there, >>>>>>>> and append some entries. >>>>>>> >>>>>>> Maybe we should also allow each custom type to set a >deallocator, >>>>>>> since they are then heap types which can go out of scope. The >>>>>>> metaclass can then call this deallocator to deallocate the >table. >>>>>> >>>>>> Custom types are plain old Python objects, they can use >tp_dealloc. >>>>>> >>>>> If I set etp_custom_slots to something allocated on the heap, then >the >>>>> (shared) metaclass would have to deallocate it. The tp_dealloc of >the >>>>> type itself would be called for its instances (which can be used >to >>>>> deallocate dynamically allocated memory in the objects if you use >a >>>>> custom slot "pointer offset"). >>>> >>>> Oh, I see. Right, the natural way to handle this would be have each >>>> user define their own metaclass with the behavior they want. >Another >>>> argument for supporting multiple metaclasses simultaneously I >guess... >>>> >>>> - N >>>> _______________________________________________ >>>> cython-devel mailing list >>>> cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>> >>> That bludgeons your constant time type check. >> >> Not if you steal a flag, like the interpreter already does with >> Py_TPFLAGS_INT_SUBCLASS, Py_TPFLAGS_STRING_SUBCLASS, etc. I was >> referring to that argument I made earlier :-) >> >>> It's easier to just >>> reserve an extra slot for a deallocator pointer :) It would probably >>> be set to NULL in the common case anyway, since you allocate your >>> slots statically. > >Subclassing: Note that even if all types has to have a PyHeapTypeObject > >structure, they are still statically allocated! So for statically >created subclasses (which should be the majority of the cases), there's > >not going to be any deallocator... > >I agree that there should be a PyExtensibleType_Ready. To keep >allocating statically I propose that the subclass should leave some >room >open for slots from the superclass: > >PyCustomSlot subclass_custom_slots[10] = { > {SLOT_C, foo}, {SLOT_D, BAR}, {0,0}, ... 
>} > >Then, fill in etp_count=2, etp_custom_slots=subclass_custom_slots, and >then call PyExtensibleType_Ready(&Subclass_Type, 10); i.e., the number >of total elements in etp_custom_slots is passed in. > >One should always leave more room than one thinks one needs if the >superclass is from another library... > >Then, inheritance happens according to the following rules: > > - Slots are inherited from superclass > - Slots in subclass with same ID overwrites superclass > - Slots from superclass are put before slots from subclass > - Exception raised if the number of final slots is larger than the >limit passed in to PyExtensibleType_Ready. > >(Whenever this is not sufficient, you can always manually munge the >table after PyExtensibleType_Ready.) > >Question: How to deal with possible flag bits in the ID? > >Three approaches: > >a) Forget about the flags-in-ID idea; if you want flags, stick them in >the data > > b) Embed a seperate variable for flags in every PyCustomSlot > > c) Standardize on a *hard* requirement on the bottom 8 bits being >flags while the top 24 bits indicate incompatible slots; so for the >purposes of inheritance, 0x12345601 would overwrite 0x12345600. > >To me, b) is OK, but the 32 bit ID space is already so ridiculously >huge >that c) is a "why not"? -1 on a), it'd be rather tedious if the payload > I guess the sane thing to do is make the custom slot (id, flags, data); and have id and flags be 32 bits on all platforms. Otherwise 32 bits are wasted to padding on 64 bit platforms anyway. Is there a type one can safely use everywhere to get a 32 bit unsigned int? Does MSVC support stdint.h? Dag >is an offset to the PyObject*. > >Subclassing in heap-allocated types (subclasses Python side): It'd >certainly be nice to completely ignore this for now and require making >a >sub-metaclass to support this (e.g., have tp_new parse some >__customslots__ attribute in the class dict). > >Hijacking a TP_FLAG: We could make the branch for a direct hit on >metaclass comparison likely(), so that the branch checking tp_base on >the metaclass unlikely(), which with branch prediction I think makes it > >very likely that there's no penalty for allowing sub-metaclasses (when >you don't use them -- when you do, there's a slight penalty). > >But here's another great argument in favour of a TP_FLAG bit: Consumers > >would then not need to import the metaclass or contain its definition >(which is really only around in case the user imports the consumer >before the provider...). This would make the header file the consumers >need to bundle much lighter. So I think I'm +1. > >At any rate, I would like the metaclass rendezvous to keep happening >just because it's less confusing if "extensibletype is extensibletype" >in general. > >Anyway, the metaclass checking is a nice fallback if CPython uses all >their flag bits. > >Dag >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. 
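(On the 32-bit question above: MSVC only started shipping stdint.h with
Visual Studio 2010, so older MSVC versions need a fallback; something
like the sketch below should work everywhere. The typedef name is just
for illustration.)

#if defined(_MSC_VER) && _MSC_VER < 1600
    /* Older MSVC has no stdint.h; __int32 is an MSVC built-in type */
    typedef unsigned __int32 customslot_id_t;
#else
    #include <stdint.h>
    typedef uint32_t customslot_id_t;
#endif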
From d.s.seljebotn at astro.uio.no Mon May 28 21:11:24 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Mon, 28 May 2012 21:11:24 +0200
Subject: [Cython] NumFOCUS and continuous integration
In-Reply-To: <84F88118-1C9D-4C4B-B644-36D68BF2B68A@gmail.com>
References: <84F88118-1C9D-4C4B-B644-36D68BF2B68A@gmail.com>
Message-ID: <4FC3CDDC.9010209@astro.uio.no>

There's NumFOCUS money for CI -- also for the work needed to configure
and run it, etc. It would not be only for Cython, but for open source
scientific Python (Cython, NumPy, SciPy etc.). So if anybody would like
to work up to 50% time on that (or perhaps just wants to spend some
weeks setting up a CI system that can be shared by said projects), then
please get in touch with Travis and NumFOCUS (see thread below).

Some context: There's been heavy discussion on the NumFOCUS list on
identifying a "core subset" of packages (Cython+NumPy+SciPy+Pandas+...)
that could be tested and promoted together [1]. I started a discussion
[2] to try to divert the question of CI-ing that core subset from the
usual way ("release managing a software distribution") to instead be
about giving those resources to the open source projects involved, so
that there's no duplicated CI work.

Of course, Cython is in better standing than most already on the CI
front thanks to Stefan and the Sage project, but the fact that the work
is paid for may perhaps make it more appealing.

Full threads:

[1] https://groups.google.com/forum/?fromgroups#!topic/numfocus/MnRzBhmqXqk
[2] https://groups.google.com/forum/?fromgroups#!topic/numfocus/I_kmL4FUGaY

Dag

-------- Original Message --------
Subject: Re: Continuous integration
Date: Mon, 28 May 2012 13:47:52 -0500
From: Travis Oliphant
Reply-To: numfocus at googlegroups.com
To: numfocus at googlegroups.com

On May 28, 2012, at 1:27 PM, Dag Sverre Seljebotn wrote:

> On 05/28/2012 05:55 PM, David wrote:
>>
>> On Sunday, May 27, 2012 3:05:06 AM UTC+9, Dag Sverre Seljebotn wrote:
>>
>> That Other Thread contained some references to CI. So I'm mainly
>> wondering what the current NumFOCUS plans for supporting CI efforts
>> are, if any? Was there a mention of money being available for somebody
>> to work on that?
>>
>> I think of Cython as one vertex in a CI graph:
>>
>> a) Upstream, Cython depends on various versions of NumPy and Python as
>> part of its CI
>>
>> b) Downstream, we rely on building and testing Sage as an extended
>> regression test suite. It's not just that our test suite doesn't have
>> 100% coverage, but also sometimes that we intentionally break backwards
>> compatibility, and fixing up Sage gives a nice indication of the
>> consequences of a change. (I imagine NumPy is in a very similar
>> situation with lots of libraries depending on it and potential for very
>> subtle breakage of backwards compatibility.)
>>
>> Ideally, we'd have LOTS of libraries using Cython in our CI -- mpi4py,
>> Pandas, scikits-learn... BUT, we're running into problems as it is with
>> the CI server hardware keeping up (if there was infinite CI there'd
>> probably be a lot more tests set up).
>>
>> Since most scientific-python libraries have both dependencies and
>> dependees, it seems like there should be some benefit to having the
>> same CI system test all of them. That could conserve both hardware use
>> and administration overhead.
>> And, which I believe is very important, make it easier for small
>> projects like scikits-sparse to participate in automatic CI by simply
>> participating in an environment maintained by the larger projects like
>> Cython and NumPy.
>>
>> I think there's two ways of spinning this:
>>
>> 1) Build "Sciome" in a CI and approach it by "release managing" Sciome
>>
>> 2) Focus on providing "infrastructure for library developers", where
>> there's a relatively big CI graph where each project has a node. I.e.
>> something like "ShiningPanda for scientific Python", but with a
>> critical difference being that each project can use the build artifacts
>> of others; Cython would flip a switch and have all projects depending
>> on Cython being rebuilt to try the new Cython version. This seems
>> complicated, but certainly Jenkins for one seems to support such setups
>> already so it's mainly about hardware and administration...
>>
>> I know I get a lot more excited by 2) than 1), even if it's perhaps
>> mainly the spin put on it rather than a technical difference.
>>
>>
>> I know I prefer 2 as well. I know of at least one attempt of doing
>> something like this (linux-only, though): the build service from open
>> suse (http://www.open-build-service.org/). 4 years ago, it could
>> already do useful bits that are not trivial in any CI I am aware of:
>>
>> - dependencies between packages are known: if you update the build
>> service package numpy, and the build is successful, it will
>> automatically rebuild all the dependent packages
>> - it handles completely isolated build environments through vms
>> - it produces rpm, debian, etc.. that can be easily installed in the
>> supported distributions.
>>
>> My ideal setup would be something like this, but working on windows
>> and mac (and not developed in perl), and with easier set up a la Travis
>> CI. I don't know yet the best way to go there: is it based on existing
>> infrastructure (jenkins, travis, something else), building our own, or
>> a hybrid between the two?
>
> What I imagined was really something very simple (and I put it in too
> convoluted a way). Just have NumFOCUS cash out :-) for one or more
> servers, then give everybody who wants shell access to a shared CI
> instance (running Jenkins or whatever).

This sounds like a great plan. NumFOCUS can provide for this. We will
just need a budget and board approval (but I suspect the board will be
happy to approve this kind of plan).

So, who wants to spec out the machine? There is some money. The problem
is actually a people problem. Who is going to be available to do the
work? So far, I have not been able to find anyone experienced who is
willing to work on this stuff at 50% time. If you are willing, let me
know in a private email or an email to admin at numfocus.org

Best,

-Travis

From robertwb at gmail.com Tue May 29 23:26:35 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Tue, 29 May 2012 14:26:35 -0700
Subject: [Cython] gsoc: array expressions
In-Reply-To:
References: <4FBA1A40.2060202@astro.uio.no> <4FBA2392.2010105@astro.uio.no>
Message-ID:

On Mon, May 28, 2012 at 5:54 AM, mark florisson wrote:
> On 28 May 2012 13:52, mark florisson wrote:
>> On 28 May 2012 13:49, mark florisson wrote:
>>> On 25 May 2012 21:53, Frédéric Bastien wrote:
>>>> Hi,
>>>>
>>>> Sorry for the delay, I had some schedule change.
>>>>
>>>> thanks for adding me. Should I subscribe to cython-dev? How much email
>>>> is there daily? I didn't find this in the archives.
>>>> Feel free to add me in CC again when you think it is appropriate.
>>>
>>> There is usually not so much traffic on cython-dev, unless something
>>> comes up that is debated to the death :)
>>>
>>>> I'll reply here to all emails at the same time. Do you prefer that I
>>>> reply to each email individually if this happens again? I'll try to
>>>> reply faster next time.
>>>
>>> No worries, either way works fine, don't worry too much about protocol
>>> (the only thing to note is that we do bottom posting).
>>>
>>>> - About pickling Theano, we currently can't pickle Theano functions.
>>>> It could be made to work in some cases, but not for all cases as there
>>>> is hardware dependent optimization in the Theano function. Currently
>>>> it is mostly CPU vs GPU operation. So if we stay on the CPU, we could
>>>> do some pickling, but we should make sure that the compiled C code in
>>>> the Python modules is still there when we unpickle, or recompile it.
>>>>
>>>> - I think it makes sense to make a Theano graph from the Cython AST,
>>>> optimize it and rebuild a Cython AST from the optimized graph. This
>>>> would allow using Theano optimizations.
>>>
>>> Ok, the important thing is that the graph can be pickled, it should be
>>> pretty straightforward to generate code to build the function again
>>> from the loaded graph.
>>>
>>>> - It also makes sense to do the code generation in Theano and reuse it
>>>> in Cython. But this would make the Theano dependency much stronger.
>>>> I'm not sure you want this.
>>>>
>>>> - Another point not raised: Theano needs to know at compile time the
>>>> dtype, number of dimensions and which dimensions are broadcastable for
>>>> each variable. I think that the last one could cause problems, but if
>>>> you use specialization for the dtype, the same can be done for the
>>>> broadcastability of a dimension.
>>>
>>> Hm, that would lead to kind of an explosion of combinations. I think
>>> we could specialize only on no broadcasting at all (except for
>>> operands with lesser dimensionality).
>>>
>>>> - The compyte (gpu nd array) project does collapsing of dimensions.
>>>> This is an important optimization on the GPU as doing the index
>>>> computation in parallel is costlier. I think on the CPU we could
>>>> probably do collapsing just of the inner dimensions to make it faster.
>>>>
>>>> - Theano doesn't generate intrinsics or assembly, but we assume that
>>>> g++ will generate vectorized operations for simple loops. Recent
>>>> versions of gcc/g++ do this.
>>>
>>> Right, the aim is definitely to specialize for contiguous arrays,
>>> where you collapse everything. Specializing statically for anything
>>> more would be unfeasible, and better handled by a runtime compiler I
>>> think. For the C backend, I'll start by generating simple C loops and
>>> see if the compilers vectorize that already.
>>>
>>>> - Our generated code for element-wise operations takes a little care
>>>> about the memory access pattern. We swap dimensions to iterate on the
>>>> dimensions with the smallest strides. But we don't go further.
>>>>
>>>> - What do you mean by CSE? Constant optimization?
>>>
>>> Yes, common subexpression elimination and also hoisting of unchanging
>>> expressions outside the loop.
>>>
>>>> Fred
>>>
>>> I started a new project, https://github.com/markflorisson88/minivect ,
>>> which currently features a simple C code generator.
>>> The specializer and astbuilder do most of the work of creating the
>>> right AST, so the code generator only has to implement code generation
>>> functions for simple expressions. Depending on how it progresses I will
>>> look at incorporating Theano's optimizations into it and having Theano
>>> use it as a C backend for compatible expressions.
>>
>> I forgot to mention, it's still pretty basic, but it works for simple
>> arithmetic expressions with non-overlapping (shifted) memory from
>> Cython: https://github.com/markflorisson88/cython/commit/2c316abdbc1228597bbdf480f737a59213ee9532#L4R1
>
> So basically, this project is to be used as a git submodule in Cython,
> and to be shipped directly in the source distribution. Is there any
> objection to that?

I'm not sure this is the best long-term solution (the alternative
would be making it part of Cython or adding a dependency) but I think
that's fine for now. I'm assuming there that the end user doesn't
explicitly reference it, right? It's just an optimization if present.

- Robert

From markflorisson88 at gmail.com Wed May 30 00:22:01 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 29 May 2012 23:22:01 +0100
Subject: [Cython] gsoc: array expressions
In-Reply-To:
References: <4FBA1A40.2060202@astro.uio.no> <4FBA2392.2010105@astro.uio.no>
Message-ID:

On 29 May 2012 22:26, Robert Bradshaw wrote:
> On Mon, May 28, 2012 at 5:54 AM, mark florisson wrote:
>> On 28 May 2012 13:52, mark florisson wrote:
>>> On 28 May 2012 13:49, mark florisson wrote:
>>>> On 25 May 2012 21:53, Frédéric Bastien wrote:
>>>>> Hi,
>>>>>
>>>>> Sorry for the delay, I had some schedule change.
>>>>>
>>>>> thanks for adding me. Should I subscribe to cython-dev? How much email
>>>>> is there daily? I didn't find this in the archives. Feel free to add
>>>>> me in CC again when you think it is appropriate.
>>>>
>>>> There is usually not so much traffic on cython-dev, unless something
>>>> comes up that is debated to the death :)
>>>>
>>>>> I'll reply here to all emails at the same time. Do you prefer that I
>>>>> reply to each email individually if this happens again? I'll try to
>>>>> reply faster next time.
>>>>
>>>> No worries, either way works fine, don't worry too much about protocol
>>>> (the only thing to note is that we do bottom posting).
>>>>
>>>>> - About pickling Theano, we currently can't pickle Theano functions.
>>>>> It could be made to work in some cases, but not for all cases as there
>>>>> is hardware dependent optimization in the Theano function. Currently
>>>>> it is mostly CPU vs GPU operation. So if we stay on the CPU, we could
>>>>> do some pickling, but we should make sure that the compiled C code in
>>>>> the Python modules is still there when we unpickle, or recompile it.
>>>>>
>>>>> - I think it makes sense to make a Theano graph from the Cython AST,
>>>>> optimize it and rebuild a Cython AST from the optimized graph. This
>>>>> would allow using Theano optimizations.
>>>>
>>>> Ok, the important thing is that the graph can be pickled, it should be
>>>> pretty straightforward to generate code to build the function again
>>>> from the loaded graph.
>>>>
>>>>> - It also makes sense to do the code generation in Theano and reuse
>>>>> it in Cython. But this would make the Theano dependency much stronger.
>>>>> I'm not sure you want this.
>>>>>
>>>>> - Another point not raised: Theano needs to know at compile time the
>>>>> dtype, number of dimensions and which dimensions are broadcastable for
>>>>> each variable.
>>>>> I think that the last one could cause problems, but if you use
>>>>> specialization for the dtype, the same can be done for the
>>>>> broadcastability of a dimension.
>>>>
>>>> Hm, that would lead to kind of an explosion of combinations. I think
>>>> we could specialize only on no broadcasting at all (except for
>>>> operands with lesser dimensionality).
>>>>
>>>>> - The compyte (gpu nd array) project does collapsing of dimensions.
>>>>> This is an important optimization on the GPU as doing the index
>>>>> computation in parallel is costlier. I think on the CPU we could
>>>>> probably do collapsing just of the inner dimensions to make it faster.
>>>>>
>>>>> - Theano doesn't generate intrinsics or assembly, but we assume that
>>>>> g++ will generate vectorized operations for simple loops. Recent
>>>>> versions of gcc/g++ do this.
>>>>
>>>> Right, the aim is definitely to specialize for contiguous arrays,
>>>> where you collapse everything. Specializing statically for anything
>>>> more would be unfeasible, and better handled by a runtime compiler I
>>>> think. For the C backend, I'll start by generating simple C loops and
>>>> see if the compilers vectorize that already.
>>>>
>>>>> - Our generated code for element-wise operations takes a little care
>>>>> about the memory access pattern. We swap dimensions to iterate on the
>>>>> dimensions with the smallest strides. But we don't go further.
>>>>>
>>>>> - What do you mean by CSE? Constant optimization?
>>>>
>>>> Yes, common subexpression elimination and also hoisting of unchanging
>>>> expressions outside the loop.
>>>>
>>>>> Fred
>>>>
>>>> I started a new project, https://github.com/markflorisson88/minivect ,
>>>> which currently features a simple C code generator. The specializer
>>>> and astbuilder do most of the work of creating the right AST, so the
>>>> code generator only has to implement code generation functions for
>>>> simple expressions. Depending on how it progresses I will look at
>>>> incorporating Theano's optimizations into it and having Theano use it
>>>> as a C backend for compatible expressions.
>>>
>>> I forgot to mention, it's still pretty basic, but it works for simple
>>> arithmetic expressions with non-overlapping (shifted) memory from
>>> Cython: https://github.com/markflorisson88/cython/commit/2c316abdbc1228597bbdf480f737a59213ee9532#L4R1
>>
>> So basically, this project is to be used as a git submodule in Cython,
>> and to be shipped directly in the source distribution. Is there any
>> objection to that?
>
> I'm not sure this is the best long-term solution (the alternative
> would be making it part of Cython or adding a dependency) but I think
> that's fine for now. I'm assuming there that the end user doesn't
> explicitly reference it, right? It's just an optimization if present.
>
> - Robert
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

The only gotcha is that when you check out Cython from github you will
need to update the submodule. It's not really an optimization, it's an
implementation: the Cython AST needs to be mapped onto the new AST, and
the C code is then generated from that. I'm currently working on
interweaving this AST with Cython's AST, to support operations on
objects and complex numbers, as well as provide Cython semantics for
division and such (complex numbers are working now).
If it's all working, it might be possible to create an LLVM backend
with reasonable ease as well for our vector expressions, to provide
optimal just-in-time specializations.

From markflorisson88 at gmail.com Wed May 30 00:23:10 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 29 May 2012 23:23:10 +0100
Subject: [Cython] gsoc: array expressions
In-Reply-To:
References: <4FBA1A40.2060202@astro.uio.no> <4FBA2392.2010105@astro.uio.no>
Message-ID:

On 29 May 2012 23:22, mark florisson wrote:
> On 29 May 2012 22:26, Robert Bradshaw wrote:
>> On Mon, May 28, 2012 at 5:54 AM, mark florisson wrote:
>>> On 28 May 2012 13:52, mark florisson wrote:
>>>> On 28 May 2012 13:49, mark florisson wrote:
>>>>> On 25 May 2012 21:53, Frédéric Bastien wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Sorry for the delay, I had some schedule change.
>>>>>>
>>>>>> thanks for adding me. Should I subscribe to cython-dev? How much
>>>>>> email is there daily? I didn't find this in the archives. Feel free
>>>>>> to add me in CC again when you think it is appropriate.
>>>>>
>>>>> There is usually not so much traffic on cython-dev, unless something
>>>>> comes up that is debated to the death :)
>>>>>
>>>>>> I'll reply here to all emails at the same time. Do you prefer that I
>>>>>> reply to each email individually if this happens again? I'll try to
>>>>>> reply faster next time.
>>>>>
>>>>> No worries, either way works fine, don't worry too much about
>>>>> protocol (the only thing to note is that we do bottom posting).
>>>>>
>>>>>> - About pickling Theano, we currently can't pickle Theano functions.
>>>>>> It could be made to work in some cases, but not for all cases as
>>>>>> there is hardware dependent optimization in the Theano function.
>>>>>> Currently it is mostly CPU vs GPU operation. So if we stay on the
>>>>>> CPU, we could do some pickling, but we should make sure that the
>>>>>> compiled C code in the Python modules is still there when we
>>>>>> unpickle, or recompile it.
>>>>>>
>>>>>> - I think it makes sense to make a Theano graph from the Cython AST,
>>>>>> optimize it and rebuild a Cython AST from the optimized graph. This
>>>>>> would allow using Theano optimizations.
>>>>>
>>>>> Ok, the important thing is that the graph can be pickled, it should
>>>>> be pretty straightforward to generate code to build the function
>>>>> again from the loaded graph.
>>>>>
>>>>>> - It also makes sense to do the code generation in Theano and reuse
>>>>>> it in Cython. But this would make the Theano dependency much
>>>>>> stronger. I'm not sure you want this.
>>>>>>
>>>>>> - Another point not raised: Theano needs to know at compile time the
>>>>>> dtype, number of dimensions and which dimensions are broadcastable
>>>>>> for each variable. I think that the last one could cause problems,
>>>>>> but if you use specialization for the dtype, the same can be done
>>>>>> for the broadcastability of a dimension.
>>>>>
>>>>> Hm, that would lead to kind of an explosion of combinations. I think
>>>>> we could specialize only on no broadcasting at all (except for
>>>>> operands with lesser dimensionality).
>>>>>
>>>>>> - The compyte (gpu nd array) project does collapsing of dimensions.
>>>>>> This is an important optimization on the GPU as doing the index
>>>>>> computation in parallel is costlier. I think on the CPU we could
>>>>>> probably do collapsing just of the inner dimensions to make it
>>>>>> faster.
>>>>>>
>>>>>> - Theano doesn't generate intrinsics or assembly, but we assume that
>>>>>> g++ will generate vectorized operations for simple loops. Recent
>>>>>> versions of gcc/g++ do this.
>>>>>
>>>>> Right, the aim is definitely to specialize for contiguous arrays,
>>>>> where you collapse everything. Specializing statically for anything
>>>>> more would be unfeasible, and better handled by a runtime compiler I
>>>>> think. For the C backend, I'll start by generating simple C loops and
>>>>> see if the compilers vectorize that already.
>>>>>
>>>>>> - Our generated code for element-wise operations takes a little care
>>>>>> about the memory access pattern. We swap dimensions to iterate on
>>>>>> the dimensions with the smallest strides. But we don't go further.
>>>>>>
>>>>>> - What do you mean by CSE? Constant optimization?
>>>>>
>>>>> Yes, common subexpression elimination and also hoisting of unchanging
>>>>> expressions outside the loop.
>>>>>
>>>>>> Fred
>>>>>
>>>>> I started a new project, https://github.com/markflorisson88/minivect ,
>>>>> which currently features a simple C code generator. The specializer
>>>>> and astbuilder do most of the work of creating the right AST, so the
>>>>> code generator only has to implement code generation functions for
>>>>> simple expressions. Depending on how it progresses I will look at
>>>>> incorporating Theano's optimizations into it and having Theano use it
>>>>> as a C backend for compatible expressions.
>>>>
>>>> I forgot to mention, it's still pretty basic, but it works for simple
>>>> arithmetic expressions with non-overlapping (shifted) memory from
>>>> Cython: https://github.com/markflorisson88/cython/commit/2c316abdbc1228597bbdf480f737a59213ee9532#L4R1
>>>
>>> So basically, this project is to be used as a git submodule in Cython,
>>> and to be shipped directly in the source distribution. Is there any
>>> objection to that?
>>
>> I'm not sure this is the best long-term solution (the alternative
>> would be making it part of Cython or adding a dependency) but I think
>> that's fine for now. I'm assuming there that the end user doesn't
>> explicitly reference it, right? It's just an optimization if present.
>>
>> - Robert
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>
> The only gotcha is that when you check out Cython from github you will
> need to update the submodule. It's not really an optimization, it's an
> implementation: the Cython AST needs to be mapped onto the new AST, and
> the C code is then generated from that. I'm currently working on
> interweaving this AST with Cython's AST, to support operations on
> objects and complex numbers, as well as provide Cython semantics for
> division and such (complex numbers are working now).
>
> If it's all working, it might be possible to create an LLVM backend
> with reasonable ease as well for our vector expressions, to provide
> optimal just-in-time specializations.

(Eventually the project itself might support such things, but for now
this is easier, and it may be useful in other situations as well).
From robertwb at gmail.com Wed May 30 00:29:07 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Tue, 29 May 2012 15:29:07 -0700 Subject: [Cython] gsoc: array expressions In-Reply-To: References: <4FBA1A40.2060202@astro.uio.no> <4FBA2392.2010105@astro.uio.no> Message-ID: On Tue, May 29, 2012 at 3:22 PM, mark florisson wrote: >>>>> I started a new project, https://github.com/markflorisson88/minivect , >>>>> which currently features a simple C code generator. The specializer >>>>> and astbuilder do most of the work of creating the right AST, so the >>>>> code generator only has to implement code generation functions for >>>>> simple expressions. Depending on how it progresses I will look at >>>>> incorporating Theano's optimizations into it and having Theano use it >>>>> as a C backend for compatible expressions. >>>> >>>> I forgot to mention, it's still pretty basic, but it works for simple >>>> arithmetic expressions with non-overlapping (shifted) memory from >>>> Cython: https://github.com/markflorisson88/cython/commit/2c316abdbc1228597bbdf480f737a59213ee9532#L4R1 >>> >>> So basically, this project is to be used as a git submodule in Cython, >>> and to be shipped directly in the source distribution. Is there any >>> objection to that? >> >> I'm not sure this is the best long-term solution (the alternative >> would be making it part of Cython or adding a dependency) but I think >> that's fine for now. I'm assuming there that the end user doesn't >> explicitly reference it, right? It's just an optimization if present. >> >> - Robert > > The only gotcha is that when you checkout Cython from github you will > need to update the submodule. Manually, right? How badly do things go wrong if you forget to? > It's not really an optimization, it's an > implementation, that is the Cython AST needs to be mapped onto the new > AST, and then it generates the C code from that. I meant in the sense that the user never refers to this code explicitly, so we have the flexibility of merging it in/splitting it off/moving it around internally without breaking users, right? - Robert From markflorisson88 at gmail.com Wed May 30 00:32:15 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 29 May 2012 23:32:15 +0100 Subject: [Cython] gsoc: array expressions In-Reply-To: References: <4FBA1A40.2060202@astro.uio.no> <4FBA2392.2010105@astro.uio.no> Message-ID: On 29 May 2012 23:29, Robert Bradshaw wrote: > On Tue, May 29, 2012 at 3:22 PM, mark florisson > wrote: >>>>>> I started a new project, https://github.com/markflorisson88/minivect , >>>>>> which currently features a simple C code generator. The specializer >>>>>> and astbuilder do most of the work of creating the right AST, so the >>>>>> code generator only has to implement code generation functions for >>>>>> simple expressions. Depending on how it progresses I will look at >>>>>> incorporating Theano's optimizations into it and having Theano use it >>>>>> as a C backend for compatible expressions. >>>>> >>>>> I forgot to mention, it's still pretty basic, but it works for simple >>>>> arithmetic expressions with non-overlapping (shifted) memory from >>>>> Cython: https://github.com/markflorisson88/cython/commit/2c316abdbc1228597bbdf480f737a59213ee9532#L4R1 >>>> >>>> So basically, this project is to be used as a git submodule in Cython, >>>> and to be shipped directly in the source distribution. Is there any >>>> objection to that? 
>>> >>> I'm not sure this is the best long-term solution (the alternative >>> would be making it part of Cython or adding a dependency) but I think >>> that's fine for now. I'm assuming there that the end user doesn't >>> explicitly reference it, right? It's just an optimization if present. >>> >>> - Robert >> >> The only gotcha is that when you checkout Cython from github you will >> need to update the submodule. > > Manually, right? How badly do things go wrong if you forget to? Unfortunately, yes, I don't think there's an automatic way with git. The compiler could display an error message, like "did you forget to do ...". >> It's not really an optimization, it's an >> implementation, that is the Cython AST needs to be mapped onto the new >> AST, and then it generates the C code from that. > > I meant in the sense that the user never refers to this code > explicitly, so we have the flexibility of merging it in/splitting it > off/moving it around internally without breaking users, right? Definitely, we can do that if we wish. I don't know how easy merging in new code is if we also modify the Cython version though. > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From robertwb at gmail.com Wed May 30 00:43:07 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Tue, 29 May 2012 15:43:07 -0700 Subject: [Cython] gsoc: array expressions In-Reply-To: References: <4FBA1A40.2060202@astro.uio.no> <4FBA2392.2010105@astro.uio.no> Message-ID: On Tue, May 29, 2012 at 3:32 PM, mark florisson wrote: > On 29 May 2012 23:29, Robert Bradshaw wrote: >> On Tue, May 29, 2012 at 3:22 PM, mark florisson >> wrote: >>>>>>> I started a new project, https://github.com/markflorisson88/minivect , >>>>>>> which currently features a simple C code generator. The specializer >>>>>>> and astbuilder do most of the work of creating the right AST, so the >>>>>>> code generator only has to implement code generation functions for >>>>>>> simple expressions. Depending on how it progresses I will look at >>>>>>> incorporating Theano's optimizations into it and having Theano use it >>>>>>> as a C backend for compatible expressions. >>>>>> >>>>>> I forgot to mention, it's still pretty basic, but it works for simple >>>>>> arithmetic expressions with non-overlapping (shifted) memory from >>>>>> Cython: https://github.com/markflorisson88/cython/commit/2c316abdbc1228597bbdf480f737a59213ee9532#L4R1 >>>>> >>>>> So basically, this project is to be used as a git submodule in Cython, >>>>> and to be shipped directly in the source distribution. Is there any >>>>> objection to that? >>>> >>>> I'm not sure this is the best long-term solution (the alternative >>>> would be making it part of Cython or adding a dependency) but I think >>>> that's fine for now. I'm assuming there that the end user doesn't >>>> explicitly reference it, right? It's just an optimization if present. >>>> >>>> - Robert >>> >>> The only gotcha is that when you checkout Cython from github you will >>> need to update the submodule. >> >> Manually, right? How badly do things go wrong if you forget to? > > Unfortunately, yes, I don't think there's an automatic way with git. > The compiler could display an error message, like "did you forget to > do ...". OK. I think we can live with this at this stage of development. 
- Robert

From nouiz at nouiz.org Wed May 30 17:27:27 2012
From: nouiz at nouiz.org (Frédéric Bastien)
Date: Wed, 30 May 2012 11:27:27 -0400
Subject: [Cython] gsoc: array expressions
In-Reply-To:
References: <4FBA1A40.2060202@astro.uio.no> <4FBA2392.2010105@astro.uio.no>
Message-ID:

On Mon, May 28, 2012 at 8:49 AM, mark florisson wrote:
> On 25 May 2012 21:53, Frédéric Bastien wrote:
>> - About pickling Theano, we currently can't pickle Theano functions. It
>> could be made to work in some cases, but not for all cases as there is
>> hardware dependent optimization in the Theano function. Currently it
>> is mostly CPU vs GPU operation. So if we stay on the CPU, we could do
>> some pickling, but we should make sure that the compiled C code in the
>> Python modules is still there when we unpickle, or recompile it.
>>
>> - I think it makes sense to make a Theano graph from the Cython AST,
>> optimize it and rebuild a Cython AST from the optimized graph. This
>> would allow using Theano optimizations.
>
> Ok, the important thing is that the graph can be pickled, it should be
> pretty straightforward to generate code to build the function again
> from the loaded graph.

We can pickle a graph that is not yet compiled, so no problem here.

>> - It also makes sense to do the code generation in Theano and reuse it
>> in Cython. But this would make the Theano dependency much stronger.
>> I'm not sure you want this.
>>
>> - Another point not raised: Theano needs to know at compile time the
>> dtype, number of dimensions and which dimensions are broadcastable for
>> each variable.
>
> Hm, that would lead to kind of an explosion of combinations. I think
> we could specialize only on no broadcasting at all (except for
> operands with lesser dimensionality).

I expect that in normal user scripts they won't use all the
combinations :) So I won't worry about it at first. If there is a need,
we could parametrise Theano ops (especially the Elemwise op) so that
when a dimension is marked as not broadcastable, it also works when it
is broadcast. In the case of Elemwise, it is probably just the
error-checking code that will need to change.

>> - The compyte (gpu nd array) project does collapsing of dimensions.
>> This is an important optimization on the GPU as doing the index
>> computation in parallel is costlier. I think on the CPU we could
>> probably do collapsing just of the inner dimensions to make it faster.
>>
>> - Theano doesn't generate intrinsics or assembly, but we assume that
>> g++ will generate vectorized operations for simple loops. Recent
>> versions of gcc/g++ do this.
>
> Right, the aim is definitely to specialize for contiguous arrays,
> where you collapse everything. Specializing statically for anything
> more would be unfeasible, and better handled by a runtime compiler I
> think. For the C backend, I'll start by generating simple C loops and
> see if the compilers vectorize that already.

I was under the impression you were doing run-time code generation; I
mixed up the ongoing projects. But collapsing the inner dimensions could
still be useful: if you don't write all the loops explicitly, you will
call a function or make a loop over the number of dimensions, and
collapsing reduces that looping. If the inner dimension is small (e.g. a
matrix of shape (10000, 3)) this can be useful. But that is less
important than the default contiguous case.
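To make the collapsing idea concrete, here is a rough C sketch (purely
illustrative; it assumes, for brevity, that all operands share the same
layout and that strides are counted in elements, and the code Theano or
minivect actually generates will look different):

/* Fully contiguous case: collapse (n0, n1) into one flat loop,
   which is the easiest shape for the compiler to vectorize. */
void add2d_contig(double *c, const double *a, const double *b,
                  long n0, long n1)
{
    long i, n = n0 * n1;          /* collapse (n0, n1) -> (n,) */
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Only the inner dimensions are contiguous: keep a strided outer
   loop and collapse the contiguous inner dimensions. */
void add3d_inner_contig(double *c, const double *a, const double *b,
                        long n0, long stride0, long n1, long n2)
{
    long i, j, inner = n1 * n2;   /* collapse the inner dimensions */
    for (i = 0; i < n0; i++) {
        double *ci = c + i * stride0;
        const double *ai = a + i * stride0;
        const double *bi = b + i * stride0;
        for (j = 0; j < inner; j++)
            ci[j] = ai[j] + bi[j];
    }
}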
>> - Our generated code for element-wise operations takes a little care
>> about the memory access pattern. We swap dimensions to iterate on the
>> dimensions with the smallest strides. But we don't go further.
>>
>> - What do you mean by CSE? Constant optimization?
>
> Yes, common subexpression elimination and also hoisting of unchanging
> expressions outside the loop.

Theano does CSE in the merge optimization. As for lifting expressions
outside of loops, we do it for Theano Scan (our loop), but those are not
normal loops. It is much better to use tensor expressions than scan if
possible.

> I started a new project, https://github.com/markflorisson88/minivect ,
> which currently features a simple C code generator. The specializer
> and astbuilder do most of the work of creating the right AST, so the
> code generator only has to implement code generation functions for
> simple expressions. Depending on how it progresses I will look at
> incorporating Theano's optimizations into it and having Theano use it
> as a C backend for compatible expressions.

Great, when you think it is a good time for me to look at it, tell me.
Does it mimic Cython internals? If so, is there documentation about it
that I could look at?

Fred

From robertwb at gmail.com Wed May 30 20:49:40 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Wed, 30 May 2012 11:49:40 -0700
Subject: [Cython] [cython-users] Re: How to use Cython to wrap a C++ template class for use in Python?
In-Reply-To:
References: <29917398.731.1321192554566.JavaMail.geo-discussion-forums@prlm15> <255edfdb-78a0-4b79-8421-a8e03fdc97d4@googlegroups.com>
Message-ID:

On Tue, May 29, 2012 at 5:42 PM, Paul Leopardi wrote:
>
> On Wednesday, 30 May 2012 02:40:47 UTC+10, Chris Barker wrote:
>>
>> Well, the third option is to use your own home-made templates to
>> auto-generate cython code for each type. That's actually not as
>> painful as it sounds -- there are lots of templating systems for
>> python -- Cheetah, for instance, is designed for "any" use, not just
>> html (though I've never used it for Cython).
>>
>> http://www.cheetahtemplate.org/
>>
>> You might want to look at the "bottleneck" project -- they did
>> something like this -- not for calling C++ templates, but the
>> principle is the same.
>>
>> http://berkeleyanalytics.com/bottleneck/
>>
>
> Chris, thanks for your suggestions. I will take a look at Cheetah, but
> I'm afraid it would add extra complication to an already very complicated
> build process for PyClical. What I would like is a way for Cython to make
> instantiations of my C++ template classes visible to Python. Cython
> supports C++ templates, and Cython now has fused types, but as far as I
> can tell, these two template concepts within Cython do not work with each
> other in any meaningful, documented way, let alone support what I am
> proposing. Maybe I am just too late to propose another Google Summer of
> Code project, or perhaps the whole idea is all too hard? Surely there must
> be a wider use-case for this idea than just me and my one library? Should
> I repost to the core developers list?

I think there are several issues with why Cython does not (yet?) have
these capabilities, primarily:

(1) Metaprogramming done Right can be very nice, but done wrong is disastrous,
(2) The AST of Cython is really not that nice to work with, and
(3) If it's just about specifying types, template preprocessing and,
eventually, using JITs may be sufficient, more flexible, and certainly
better than a half-baked solution.
The thread you linked to also has some good discussion.

We actually had a lot of discussion about this issue at the Cython
Days workshop last year, and never hit upon a metaprogramming
framework that seemed Right. Consensus was that until a clean proposal
was put forward, we would focus on making it easy to use your
templating engine of choice and implement fused types, which would
cover 85% or more of the need for metaprogramming (especially tight
loops over numeric types, as opposed to more generic cases where
Python objects and vtables are often good enough).

The fact that fused types don't work with C++ specializations is
certainly a bug, though this still limits us to a fixed number of
instantiations (and separate Python classes) in Python space due to
the nature of C++ templates.

- Robert

From markflorisson88 at gmail.com Wed May 30 21:07:27 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Wed, 30 May 2012 20:07:27 +0100
Subject: [Cython] [cython-users] Re: How to use Cython to wrap a C++ template class for use in Python?
In-Reply-To:
References: <29917398.731.1321192554566.JavaMail.geo-discussion-forums@prlm15> <255edfdb-78a0-4b79-8421-a8e03fdc97d4@googlegroups.com>
Message-ID:

On 30 May 2012 19:49, Robert Bradshaw wrote:
> On Tue, May 29, 2012 at 5:42 PM, Paul Leopardi wrote:
>>
>> On Wednesday, 30 May 2012 02:40:47 UTC+10, Chris Barker wrote:
>>>
>>> Well, the third option is to use your own home-made templates to
>>> auto-generate cython code for each type. That's actually not as
>>> painful as it sounds -- there are lots of templating systems for
>>> python -- Cheetah, for instance, is designed for "any" use, not just
>>> html (though I've never used it for Cython).
>>>
>>> http://www.cheetahtemplate.org/
>>>
>>> You might want to look at the "bottleneck" project -- they did
>>> something like this -- not for calling C++ templates, but the
>>> principle is the same.
>>>
>>> http://berkeleyanalytics.com/bottleneck/
>>>
>>
>> Chris, thanks for your suggestions. I will take a look at Cheetah, but
>> I'm afraid it would add extra complication to an already very complicated
>> build process for PyClical. What I would like is a way for Cython to make
>> instantiations of my C++ template classes visible to Python. Cython
>> supports C++ templates, and Cython now has fused types, but as far as I
>> can tell, these two template concepts within Cython do not work with each
>> other in any meaningful, documented way, let alone support what I am
>> proposing. Maybe I am just too late to propose another Google Summer of
>> Code project, or perhaps the whole idea is all too hard? Surely there
>> must be a wider use-case for this idea than just me and my one library?
>> Should I repost to the core developers list?
>
> I think there are several issues with why Cython does not (yet?) have
> these capabilities, primarily:
>
> (1) Metaprogramming done Right can be very nice, but done wrong is disastrous,
> (2) The AST of Cython is really not that nice to work with, and
> (3) If it's just about specifying types, template preprocessing and,
> eventually, using JITs may be sufficient, more flexible, and certainly
> better than a half-baked solution.
>
> The thread you linked to also has some good discussion.
>
> We actually had a lot of discussion about this issue at the Cython
> Consensus was that until a clean proposal was put forward, we would focus on making it easy to use your templating engine of choice and on implementing fused types, which would cover 85% or more of the need for metaprogramming (especially tight loops over numeric types, as opposed to more generic cases where Python objects and vtables are often good enough).

Definitely, fused types are far from ideal, even if cdef class fused attributes were supported.

> The fact that fused types don't work with C++ specializations is certainly a bug, though this still limits us to a fixed number of instantiations (and separate Python classes) in Python space due to the nature of C++ templates.
>
> - Robert

If there really is a bug, then please report the use case (anyone who can find one). C++ and fused types *do* work together (unless there really is a bug), but the thing that is not supported (for anything) is fused types for cdef class attributes. One entirely terrible but working way to get around this from Python space is the metaclass hack I posted earlier.

From dave.hirschfeld at gmail.com Thu May 31 12:09:18 2012
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Thu, 31 May 2012 10:09:18 +0000 (UTC)
Subject: [Cython] vcvarsall.bat error on win32 with mingw when using cython_inline
Message-ID:

Hi Cython devs,
I got the "unable to find vcvarsall.bat" error when using cython_inline when trying to compile with mingw. Using Cython normally (creating a setup.py file) worked fine, however.

I was able to fix the problem for me by changing inline.py to parse the config files before building the extension. I've created a pull request with my changes:

https://github.com/cython/cython/pull/129

Apologies if I haven't used the appropriate workflow - I'm fairly new to git and this is my first pull request!

HTH,
Dave

From d.s.seljebotn at astro.uio.no Thu May 31 16:04:12 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Thu, 31 May 2012 16:04:12 +0200
Subject: [Cython] SEP 201 draft: Native callable objects
Message-ID: <4FC77A5C.50009@astro.uio.no>

[Discussion on numfocus at googlegroups.com please]

I've uploaded a draft-state SEP 201 (previously CEP 1000):

https://github.com/numfocus/sep/blob/master/sep201.rst

"""
Many callable objects are simply wrappers around native code. This holds for any Cython function, f2py functions, manually written CPython extensions, Numba, etc.

Obviously, when native code calls other native code, it would be nice to skip the significant cost of boxing and unboxing all the arguments.
"""

The thread about this on the Cython list is almost endless:

http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443

There was a long discussion on the key-comparison vs. interned-string approach. I've written both up in SEP 201 since it was the major point of contention. There were some benchmarks starting here:

http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443

And why provide a table and not a get_function_pointer starting here:

http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443

For those who followed that and don't want to read the entire spec, the aspect of flags is new. How do we avoid duplicating entries/checking against two signatures for cases like a GIL-holding caller wanting to call a nogil function? My take: for key-comparison you can compare under a mask; for interned strings we should have an additional flags field.
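To make that comparison concrete, here is a rough C sketch of the dispatch check under each approach; the struct layouts, flag bits, and names are illustrative assumptions, not definitions taken from the SEP text:

    #include <stddef.h>
    #include <stdint.h>

    /* Approach 1: the signature is encoded into a fixed-size key. Assumed
     * here: the low 8 bits carry flags (e.g. "needs the GIL"), so lookups
     * compare under a mask instead of storing duplicate entries. */
    typedef struct { uint64_t key; void *funcptr; } key_entry_t;

    #define KEY_FLAG_MASK ((uint64_t)0xff)

    static void *lookup_by_key(key_entry_t *tab, int n, uint64_t want)
    {
        int i;
        for (i = 0; i < n; i++)
            if ((tab[i].key & ~KEY_FLAG_MASK) == (want & ~KEY_FLAG_MASK))
                return tab[i].funcptr;
        return NULL;
    }

    /* Approach 2: the signature is an interned string; equal signatures
     * share one pointer, so matching is a pointer compare, and capability
     * bits such as "callable without the GIL" sit in a separate field. */
    typedef struct { const char *sig; uint64_t flags; void *funcptr; } interned_entry_t;

    #define ENTRY_NOGIL ((uint64_t)0x1)

    static void *lookup_by_sig(interned_entry_t *tab, int n,
                               const char *want, uint64_t required)
    {
        int i;
        for (i = 0; i < n; i++)
            if (tab[i].sig == want && (tab[i].flags & required) == required)
                return tab[i].funcptr;
        return NULL;
    }

A caller that holds the GIL would pass required = 0 and so match both GIL-requiring and nogil entries, while a nogil caller would pass ENTRY_NOGIL; that is exactly the deduplication the flags are meant to buy.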
The situation is a bit awkward: The Cython list consensus (well, me and Robert Bradshaw) decided on what is "Approach 1" (key-comparison) in SEP 201. I pushed for that. Still, now that a month has passed, I just think key-comparison is too ugly, and that the interning mechanism shouldn't be *that* hard to code up, probably 500 lines of C code if one just requires the GIL in a first iteration, and that keeping the spec simpler is more important. So I'm tentatively proposing Approach 2. Dag From d.s.seljebotn at astro.uio.no Thu May 31 16:20:18 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 31 May 2012 16:20:18 +0200 Subject: [Cython] [Python-Dev] C-level duck typing In-Reply-To: <90d9723f-7003-48f7-b959-955c22850d2d@email.android.com> References: <4FB35ACA.7090908@astro.uio.no> <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz> <4FB55E24.3090006@astro.uio.no> <4FB60896.4030702@astro.uio.no> <4FC29B9C.1000308@astro.uio.no> <4FC39A42.4040602@astro.uio.no> <90d9723f-7003-48f7-b959-955c22850d2d@email.android.com> Message-ID: <4FC77E22.3030201@astro.uio.no> On 05/28/2012 05:59 PM, Dag Sverre Seljebotn wrote: > > > Dag Sverre Seljebotn wrote: > >> On 05/28/2012 01:24 PM, Nathaniel Smith wrote: >>> On Mon, May 28, 2012 at 12:09 PM, mark florisson >>> wrote: >>>> On 28 May 2012 12:01, Nathaniel Smith wrote: >>>>> On Mon, May 28, 2012 at 11:55 AM, mark florisson >>>>> wrote: >>>>>> On 28 May 2012 11:41, Nathaniel Smith wrote: >>>>>>> On Mon, May 28, 2012 at 10:13 AM, mark florisson >>>>>>> wrote: >>>>>>>> On 28 May 2012 09:54, mark florisson >> wrote: >>>>>>>>> On 27 May 2012 23:12, Nathaniel Smith wrote: >>>>>>>>>> On Sun, May 27, 2012 at 10:24 PM, Dag Sverre Seljebotn >>>>>>>>>> wrote: >>>>>>>>>>> On 05/18/2012 10:30 AM, Dag Sverre Seljebotn wrote: >>>>>>>>>>>> >>>>>>>>>>>> On 05/18/2012 12:57 AM, Nick Coghlan wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I think the main things we'd be looking for would be: >>>>>>>>>>>>> - a clear explanation of why a new metaclass is considered >> too complex a >>>>>>>>>>>>> solution >>>>>>>>>>>>> - what the implications are for classes that have nothing >> to do with the >>>>>>>>>>>>> SciPy/NumPy ecosystem >>>>>>>>>>>>> - how subclassing would behave (both at the class and >> metaclass level) >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, defining a new metaclass for fast signature exchange >> has its >>>>>>>>>>>>> challenges - but it means that *our* concerns about >> maintaining >>>>>>>>>>>>> consistent behaviour in the default object model and >> avoiding adverse >>>>>>>>>>>>> effects on code that doesn't need the new behaviour are >> addressed >>>>>>>>>>>>> automatically. >>>>>>>>>>>>> >>>>>>>>>>>>> Also, I'd consider a functioning reference implementation >> using a custom >>>>>>>>>>>>> metaclass a requirement before we considered modifying type >> anyway, so I >>>>>>>>>>>>> think that's the best thing to pursue next rather than a >> PEP. It also >>>>>>>>>>>>> has the virtue of letting you choose which Python versions >> to target and >>>>>>>>>>>>> iterating at a faster rate than CPython. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> This seems right on target. I could make a utility code C >> header for >>>>>>>>>>>> such a metaclass, and then the different libraries can all >> include it >>>>>>>>>>>> and handshake on which implementation becomes the real one >> through >>>>>>>>>>>> sys.modules during module initialization. 
That way an >> eventual PEP will >>>>>>>>>>>> only be a natural incremental step to make things more >> polished, whether >>>>>>>>>>>> that happens by making such a metaclass part of the standard >> library or >>>>>>>>>>>> by extending PyTypeObject. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> So I finally got around to implementing this: >>>>>>>>>>> >>>>>>>>>>> https://github.com/dagss/pyextensibletype >>>>>>>>>>> >>>>>>>>>>> Documentation now in a draft in the NumFOCUS SEP repo, which >> I believe is a >>>>>>>>>>> better place to store cross-project standards like this. (The >> NumPy >>>>>>>>>>> docstring standard will be SEP 100). >>>>>>>>>>> >>>>>>>>>>> https://github.com/numfocus/sep/blob/master/sep200.rst >>>>>>>>>>> >>>>>>>>>>> Summary: >>>>>>>>>>> >>>>>>>>>>> - No common runtime dependency >>>>>>>>>>> >>>>>>>>>>> - 1 ns overhead per lookup (that's for the custom slot >> *alone*, no >>>>>>>>>>> fast-callable signature matching or similar) >>>>>>>>>>> >>>>>>>>>>> - Slight annoyance: Types that want to use the metaclass >> must be a >>>>>>>>>>> PyHeapExtensibleType, to make the binary layout work with how >> CPython makes >>>>>>>>>>> subclasses from Python scripts >>>>>>>>>>> >>>>>>>>>>> My conclusion: I think the metaclass approach should work >> really well. >>>>>>>>>> >>>>>>>>>> Few quick comments on skimming the code: >>>>>>>>>> >>>>>>>>>> The complicated nested #ifdef for __builtin_expect could be >> simplified to >>>>>>>>>> #if defined(__GNUC__)&& (__GNUC__> 2 || __GNUC_MINOR__> >> 95) >>>>>>>>>> >>>>>>>>>> PyCustomSlots_Check should be called PyCustomSlots_CheckExact, >> surely? >>>>>>>>>> And given that, how can this code work if someone does >> subclass this >>>>>>>>>> metaclass? >>>>>>>>> >>>>>>>>> I think we should provide a wrapper for PyType_Ready, which >> just >>>>>>>>> copies the pointer to the table and the count directly into the >>>>>>>>> subclass. If a user then wishes to add stuff, the user can >> allocate a >>>>>>>>> new memory region dynamically, memcpy the base class' stuff in >> there, >>>>>>>>> and append some entries. >>>>>>>> >>>>>>>> Maybe we should also allow each custom type to set a >> deallocator, >>>>>>>> since they are then heap types which can go out of scope. The >>>>>>>> metaclass can then call this deallocator to deallocate the >> table. >>>>>>> >>>>>>> Custom types are plain old Python objects, they can use >> tp_dealloc. >>>>>>> >>>>>> If I set etp_custom_slots to something allocated on the heap, then >> the >>>>>> (shared) metaclass would have to deallocate it. The tp_dealloc of >> the >>>>>> type itself would be called for its instances (which can be used >> to >>>>>> deallocate dynamically allocated memory in the objects if you use >> a >>>>>> custom slot "pointer offset"). >>>>> >>>>> Oh, I see. Right, the natural way to handle this would be have each >>>>> user define their own metaclass with the behavior they want. >> Another >>>>> argument for supporting multiple metaclasses simultaneously I >> guess... >>>>> >>>>> - N >>>>> _______________________________________________ >>>>> cython-devel mailing list >>>>> cython-devel at python.org >>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>> >>>> That bludgeons your constant time type check. >>> >>> Not if you steal a flag, like the interpreter already does with >>> Py_TPFLAGS_INT_SUBCLASS, Py_TPFLAGS_STRING_SUBCLASS, etc. 
I was >>> referring to that argument I made earlier :-) >>> >>>> It's easier to just >>>> reserve an extra slot for a deallocator pointer :) It would probably >>>> be set to NULL in the common case anyway, since you allocate your >>>> slots statically. >> >> Subclassing: Note that even if all types has to have a PyHeapTypeObject >> >> structure, they are still statically allocated! So for statically >> created subclasses (which should be the majority of the cases), there's >> >> not going to be any deallocator... >> >> I agree that there should be a PyExtensibleType_Ready. To keep >> allocating statically I propose that the subclass should leave some >> room >> open for slots from the superclass: >> >> PyCustomSlot subclass_custom_slots[10] = { >> {SLOT_C, foo}, {SLOT_D, BAR}, {0,0}, ... >> } >> >> Then, fill in etp_count=2, etp_custom_slots=subclass_custom_slots, and >> then call PyExtensibleType_Ready(&Subclass_Type, 10); i.e., the number >> of total elements in etp_custom_slots is passed in. >> >> One should always leave more room than one thinks one needs if the >> superclass is from another library... >> >> Then, inheritance happens according to the following rules: >> >> - Slots are inherited from superclass >> - Slots in subclass with same ID overwrites superclass >> - Slots from superclass are put before slots from subclass >> - Exception raised if the number of final slots is larger than the >> limit passed in to PyExtensibleType_Ready. >> >> (Whenever this is not sufficient, you can always manually munge the >> table after PyExtensibleType_Ready.) >> >> Question: How to deal with possible flag bits in the ID? >> >> Three approaches: >> >> a) Forget about the flags-in-ID idea; if you want flags, stick them in >> the data >> >> b) Embed a seperate variable for flags in every PyCustomSlot >> >> c) Standardize on a *hard* requirement on the bottom 8 bits being >> flags while the top 24 bits indicate incompatible slots; so for the >> purposes of inheritance, 0x12345601 would overwrite 0x12345600. >> >> To me, b) is OK, but the 32 bit ID space is already so ridiculously >> huge >> that c) is a "why not"? -1 on a), it'd be rather tedious if the payload >> > > > I guess the sane thing to do is make the custom slot (id, flags, data); and have id and flags be 32 bits on all platforms. Otherwise 32 bits are wasted to padding on 64 bit platforms anyway. > SEP updated (to what I hope is the final form): https://groups.google.com/forum/?fromgroups#!topic/numfocus/-XWwLMVgXBQ https://github.com/numfocus/sep/blob/master/sep200.rst https://github.com/dagss/pyextensibletype Changes: - Remove the flags concept; option a) above - Use the tp_flags bit. (I benchmarked walking the type hierarchy, and it doesn't cost if you don't take the branch, but I'm much happier for clients to avoid having to rendezvous on the metaclass, in particular if this is used in the NumPy API). - All manually allocated IDs have the least significant bit set, so that one can also use 2-byte aligned pointers as IDs (e.g., objects representing interfaces or interned strings can be used as slot IDs). 
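For readers who don't want to open the repo, the lookup this final form implies is roughly the following C sketch. The etp_count/etp_custom_slots names follow the thread above, but the stolen tp_flags bit value and the exact layout here are my assumptions, not necessarily what pyextensibletype ships:

    #include <Python.h>
    #include <stdint.h>

    /* Assumed value for the stolen tp_flags bit discussed above. */
    #define MY_TPFLAGS_EXTENSIBLE (1UL << 22)

    typedef struct {
        uintptr_t id;   /* low bit set => manually allocated ID; low bit
                           clear => a 2-byte aligned pointer used as ID */
        void *data;
    } PyCustomSlot;

    typedef struct {
        PyHeapTypeObject base;          /* binary layout matches heap types */
        Py_ssize_t etp_count;
        PyCustomSlot *etp_custom_slots;
    } PyHeapExtensibleTypeObject;

    static void *find_custom_slot(PyTypeObject *tp, uintptr_t slot_id)
    {
        Py_ssize_t i;
        PyHeapExtensibleTypeObject *etp;
        /* The cheap branch: bailing out when the bit is unset is what
           makes the lookup essentially free for ordinary types. */
        if (!(tp->tp_flags & MY_TPFLAGS_EXTENSIBLE))
            return NULL;
        etp = (PyHeapExtensibleTypeObject *)tp;
        for (i = 0; i < etp->etp_count; i++)
            if (etp->etp_custom_slots[i].id == slot_id)
                return etp->etp_custom_slots[i].data;
        return NULL;
    }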
Dag

From robertwb at gmail.com Thu May 31 20:17:34 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 31 May 2012 11:17:34 -0700
Subject: [Cython] vcvarsall.bat error on win32 with mingw when using cython_inline
In-Reply-To: References: Message-ID:

On Thu, May 31, 2012 at 3:09 AM, Dave Hirschfeld wrote:
> Hi Cython devs,
> I got the "unable to find vcvarsall.bat" error when using cython_inline when trying to compile with mingw. Using Cython normally (creating a setup.py file) worked fine, however.
>
> I was able to fix the problem for me by changing inline.py to parse the config files before building the extension. I've created a pull request with my changes:
>
> https://github.com/cython/cython/pull/129
>
> Apologies if I haven't used the appropriate workflow - I'm fairly new to git and this is my first pull request!

Thanks! Yes, this is the way to do it.

- Robert

From robertwb at gmail.com Thu May 31 20:22:18 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 31 May 2012 11:22:18 -0700
Subject: [Cython] [Python-Dev] C-level duck typing
In-Reply-To: <4FC77E22.3030201@astro.uio.no> References: <4FB35ACA.7090908@astro.uio.no> <4FB44065.4010306@canterbury.ac.nz> <4FB469B1.3020804@canterbury.ac.nz> <4FB55E24.3090006@astro.uio.no> <4FB60896.4030702@astro.uio.no> <4FC29B9C.1000308@astro.uio.no> <4FC39A42.4040602@astro.uio.no> <90d9723f-7003-48f7-b959-955c22850d2d@email.android.com> <4FC77E22.3030201@astro.uio.no> Message-ID:

On Thu, May 31, 2012 at 7:20 AM, Dag Sverre Seljebotn wrote:
>
> SEP updated (to what I hope is the final form):
>
> https://groups.google.com/forum/?fromgroups#!topic/numfocus/-XWwLMVgXBQ
> https://github.com/numfocus/sep/blob/master/sep200.rst
> https://github.com/dagss/pyextensibletype

Very nice!

> Changes:
>
> - Remove the flags concept; option a) above
>
> - Use the tp_flags bit. (I benchmarked walking the type hierarchy, and it doesn't cost if you don't take the branch, but I'm much happier for clients to avoid having to rendezvous on the metaclass, in particular if this is used in the NumPy API).
>
> - All manually allocated IDs have the least significant bit set, so that one can also use 2-byte aligned pointers as IDs (e.g., objects representing interfaces or interned strings can be used as slot IDs).

From robertwb at gmail.com Thu May 31 20:50:05 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 31 May 2012 11:50:05 -0700
Subject: [Cython] SEP 201 draft: Native callable objects
In-Reply-To: <4FC77A5C.50009@astro.uio.no> References: <4FC77A5C.50009@astro.uio.no> Message-ID:

On Thu, May 31, 2012 at 7:04 AM, Dag Sverre Seljebotn wrote:
> [Discussion on numfocus at googlegroups.com please]
>
> I've uploaded a draft-state SEP 201 (previously CEP 1000):
>
> https://github.com/numfocus/sep/blob/master/sep201.rst
>
> """
> Many callable objects are simply wrappers around native code. This holds for any Cython function, f2py functions, manually written CPython extensions, Numba, etc.
>
> Obviously, when native code calls other native code, it would be nice to skip the significant cost of boxing and unboxing all the arguments.
> """
>
> The thread about this on the Cython list is almost endless:
>
> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>
> There was a long discussion on the key-comparison vs. interned-string approach. I've written both up in SEP 201 since it was the major point of contention.
> There were some benchmarks starting here:
>
> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>
> And why provide a table and not a get_function_pointer starting here:
>
> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>
> For those who followed that and don't want to read the entire spec, the aspect of flags is new. How do we avoid duplicating entries/checking against two signatures for cases like a GIL-holding caller wanting to call a nogil function? My take: for key-comparison you can compare under a mask; for interned strings we should have an additional flags field.
>
> The situation is a bit awkward: The Cython list consensus (well, me and Robert Bradshaw) decided on what is "Approach 1" (key-comparison) in SEP 201. I pushed for that.
>
> Still, now that a month has passed, I just think key-comparison is too ugly, and that the interning mechanism shouldn't be *that* hard to code up, probably 500 lines of C code if one just requires the GIL in a first iteration, and that keeping the spec simpler is more important.
>
> So I'm tentatively proposing Approach 2.

I'm still not convinced that a hybrid approach, where signatures below some cutoff are compiled down to keys, isn't worthwhile. This gets around variable-length keys (both the complexity and possible runtime costs for long keys) and allows simple libraries to produce and consume fast callables without participating in the interning mechanism.

It's unclear how to rendezvous on a common interning interface without the GIL/Python, so perhaps requiring the GIL to use it is not too onerous. An alternative is to acquire the GIL in the first/reference implementation (which could allow the interning function pointers to be cached by an external GIL-oblivious JIT, for example). Presumably some other locking mechanism would be required if the GIL is not used, so the overhead would likely not be that great.

- Robert

From robertwb at gmail.com Thu May 31 20:57:38 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 31 May 2012 11:57:38 -0700
Subject: [Cython] SEP 201 draft: Native callable objects
In-Reply-To: References: <4FC77A5C.50009@astro.uio.no> Message-ID:

On this note, a global string interning mechanism is likely to be of interest beyond just native callable objects, so it could be worth separating out into a separate spec.

On Thu, May 31, 2012 at 11:50 AM, Robert Bradshaw wrote:
> On Thu, May 31, 2012 at 7:04 AM, Dag Sverre Seljebotn wrote:
>> [Discussion on numfocus at googlegroups.com please]
>>
>> I've uploaded a draft-state SEP 201 (previously CEP 1000):
>>
>> https://github.com/numfocus/sep/blob/master/sep201.rst
>>
>> """
>> Many callable objects are simply wrappers around native code. This holds for any Cython function, f2py functions, manually written CPython extensions, Numba, etc.
>>
>> Obviously, when native code calls other native code, it would be nice to skip the significant cost of boxing and unboxing all the arguments.
>> """
>>
>> The thread about this on the Cython list is almost endless:
>>
>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>>
>> There was a long discussion on the key-comparison vs. interned-string approach. I've written both up in SEP 201 since it was the major point of contention.
>> There were some benchmarks starting here:
>>
>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>>
>> And why provide a table and not a get_function_pointer starting here:
>>
>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>>
>> For those who followed that and don't want to read the entire spec, the aspect of flags is new. How do we avoid duplicating entries/checking against two signatures for cases like a GIL-holding caller wanting to call a nogil function? My take: for key-comparison you can compare under a mask; for interned strings we should have an additional flags field.
>>
>> The situation is a bit awkward: The Cython list consensus (well, me and Robert Bradshaw) decided on what is "Approach 1" (key-comparison) in SEP 201. I pushed for that.
>>
>> Still, now that a month has passed, I just think key-comparison is too ugly, and that the interning mechanism shouldn't be *that* hard to code up, probably 500 lines of C code if one just requires the GIL in a first iteration, and that keeping the spec simpler is more important.
>>
>> So I'm tentatively proposing Approach 2.
>
> I'm still not convinced that a hybrid approach, where signatures below some cutoff are compiled down to keys, isn't worthwhile. This gets around variable-length keys (both the complexity and possible runtime costs for long keys) and allows simple libraries to produce and consume fast callables without participating in the interning mechanism.
>
> It's unclear how to rendezvous on a common interning interface without the GIL/Python, so perhaps requiring the GIL to use it is not too onerous. An alternative is to acquire the GIL in the first/reference implementation (which could allow the interning function pointers to be cached by an external GIL-oblivious JIT, for example). Presumably some other locking mechanism would be required if the GIL is not used, so the overhead would likely not be that great.
>
> - Robert

From d.s.seljebotn at astro.uio.no Thu May 31 21:29:40 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Thu, 31 May 2012 21:29:40 +0200
Subject: [Cython] SEP 201 draft: Native callable objects
In-Reply-To: References: <4FC77A5C.50009@astro.uio.no> Message-ID: <4FC7C6A4.3060404@astro.uio.no>

On 05/31/2012 08:50 PM, Robert Bradshaw wrote:
> On Thu, May 31, 2012 at 7:04 AM, Dag Sverre Seljebotn wrote:
>> [Discussion on numfocus at googlegroups.com please]
>>
>> I've uploaded a draft-state SEP 201 (previously CEP 1000):
>>
>> https://github.com/numfocus/sep/blob/master/sep201.rst
>>
>> """
>> Many callable objects are simply wrappers around native code. This holds for any Cython function, f2py functions, manually written CPython extensions, Numba, etc.
>>
>> Obviously, when native code calls other native code, it would be nice to skip the significant cost of boxing and unboxing all the arguments.
>> """
>>
>> The thread about this on the Cython list is almost endless:
>>
>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>>
>> There was a long discussion on the key-comparison vs. interned-string approach. I've written both up in SEP 201 since it was the major point of contention.
>> There were some benchmarks starting here:
>>
>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>>
>> And why provide a table and not a get_function_pointer starting here:
>>
>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>>
>> For those who followed that and don't want to read the entire spec, the aspect of flags is new. How do we avoid duplicating entries/checking against two signatures for cases like a GIL-holding caller wanting to call a nogil function? My take: for key-comparison you can compare under a mask; for interned strings we should have an additional flags field.
>>
>> The situation is a bit awkward: The Cython list consensus (well, me and Robert Bradshaw) decided on what is "Approach 1" (key-comparison) in SEP 201. I pushed for that.
>>
>> Still, now that a month has passed, I just think key-comparison is too ugly, and that the interning mechanism shouldn't be *that* hard to code up, probably 500 lines of C code if one just requires the GIL in a first iteration, and that keeping the spec simpler is more important.
>>
>> So I'm tentatively proposing Approach 2.
>
> I'm still not convinced that a hybrid approach, where signatures below some cutoff are compiled down to keys, isn't worthwhile. This gets around variable-length keys (both the complexity and possible runtime costs for long keys) and allows simple libraries to produce and consume fast callables without participating in the interning mechanism.

I still think this gives us the "worst of both worlds", all the disadvantages and none of the advantages.

How many simple libraries are there really? Cython on one end, the magnificently complicated NumPy ufuncs on the other? Thinking big, perhaps PyPy and Julia? Cython, PyPy, and Julia would all have to deal with long signatures anyway, and NumPy ufuncs are already complicated, so even more low-level stuff wouldn't hurt.

> It's unclear how to rendezvous on a common interning interface without the GIL/Python, so perhaps requiring the GIL to use it is not too onerous. An alternative is to acquire the GIL in the first/reference implementation (which could allow the interning function pointers to be cached by an external GIL-oblivious JIT, for example). Presumably some other locking mechanism would be required if the GIL is not used, so the overhead would likely not be that great.

Yes. I guess a goal could be to make sure there's no ABI breakage if/when the GIL requirement is lifted.

Since modules can already have a reference to the interner by the time the first module interfacing with a GIL-less world is imported, this is non-trivial, but "every problem can be solved with another level of indirection", and particularly this one.

Good idea on separating out interning as a separate spec; that's definitely useful for interfaces etc. as well down the line. I can get to work on a string interning spec and implementation as SEP 202 in spare minutes over the next month or so, but I won't bother unless SEP 201 will use interning. My role in that depends on Travis' timeline as well, as my ETA is so unpredictable.
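As a rough sanity check on that estimate, a first-iteration interning routine that requires the GIL can lean entirely on one Python dict. This is an illustration of the idea only (Python 3 C-API naming), not the eventual SEP 202 interface:

    #include <Python.h>

    static PyObject *sig_table = NULL;   /* bytes object -> itself */

    /* Return a canonical pointer for the signature string `sig`, so that
       two equal signatures always yield the same pointer and matching
       becomes a pointer compare. Caller must hold the GIL. */
    static const char *intern_signature(const char *sig)
    {
        PyObject *key, *canonical;
        if (sig_table == NULL && (sig_table = PyDict_New()) == NULL)
            return NULL;
        key = PyBytes_FromString(sig);
        if (key == NULL)
            return NULL;
        canonical = PyDict_GetItem(sig_table, key);   /* borrowed ref */
        if (canonical == NULL) {
            if (PyDict_SetItem(sig_table, key, key) < 0) {
                Py_DECREF(key);
                return NULL;
            }
            canonical = key;   /* now kept alive by the table itself */
        }
        Py_DECREF(key);
        return PyBytes_AS_STRING(canonical);
    }

Because the table is never torn down, the returned pointers stay valid for the life of the process; lifting the GIL requirement later then becomes a locking problem around the same rendezvous table rather than an ABI change.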
Dag

From d.s.seljebotn at astro.uio.no Thu May 31 22:06:33 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Thu, 31 May 2012 22:06:33 +0200
Subject: [Cython] Fwd: Re: SEP 201 draft: Native callable objects
In-Reply-To: <4FC7CF2A.2080809@astro.uio.no> References: <4FC7CF2A.2080809@astro.uio.no> Message-ID: <4FC7CF49.8040905@astro.uio.no>

Forgot to CC this list...

-------- Original Message --------
Subject: Re: [Cython] SEP 201 draft: Native callable objects
Date: Thu, 31 May 2012 22:06:02 +0200
From: Dag Sverre Seljebotn
Reply-To: numfocus at googlegroups.com
To: numfocus at googlegroups.com

On 05/31/2012 09:29 PM, Dag Sverre Seljebotn wrote:
> On 05/31/2012 08:50 PM, Robert Bradshaw wrote:
>> On Thu, May 31, 2012 at 7:04 AM, Dag Sverre Seljebotn wrote:
>>> [Discussion on numfocus at googlegroups.com please]
>>>
>>> I've uploaded a draft-state SEP 201 (previously CEP 1000):
>>>
>>> https://github.com/numfocus/sep/blob/master/sep201.rst
>>>
>>> """
>>> Many callable objects are simply wrappers around native code. This holds for any Cython function, f2py functions, manually written CPython extensions, Numba, etc.
>>>
>>> Obviously, when native code calls other native code, it would be nice to skip the significant cost of boxing and unboxing all the arguments.
>>> """
>>>
>>> The thread about this on the Cython list is almost endless:
>>>
>>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>>>
>>> There was a long discussion on the key-comparison vs. interned-string approach. I've written both up in SEP 201 since it was the major point of contention. There were some benchmarks starting here:
>>>
>>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>>>
>>> And why provide a table and not a get_function_pointer starting here:
>>>
>>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>>>
>>> For those who followed that and don't want to read the entire spec, the aspect of flags is new. How do we avoid duplicating entries/checking against two signatures for cases like a GIL-holding caller wanting to call a nogil function? My take: for key-comparison you can compare under a mask; for interned strings we should have an additional flags field.
>>>
>>> The situation is a bit awkward: The Cython list consensus (well, me and Robert Bradshaw) decided on what is "Approach 1" (key-comparison) in SEP 201. I pushed for that.
>>>
>>> Still, now that a month has passed, I just think key-comparison is too ugly, and that the interning mechanism shouldn't be *that* hard to code up, probably 500 lines of C code if one just requires the GIL in a first iteration, and that keeping the spec simpler is more important.
>>>
>>> So I'm tentatively proposing Approach 2.
>>
>> I'm still not convinced that a hybrid approach, where signatures below some cutoff are compiled down to keys, isn't worthwhile. This gets around variable-length keys (both the complexity and possible runtime costs for long keys) and allows simple libraries to produce and consume fast callables without participating in the interning mechanism.
>
> I still think this gives us the "worst of both worlds", all the disadvantages and none of the advantages.
So this could just work: typedef struct { union { char *interned_sigptr; char short_sig[8]; } uintptr_t flags; void *funcptr; }; And then flags contains whether the signature is short. I think compiling down a signature that doesn't end with 0x000 is rather complicated even if there's no Huffman. The point is to be able to hand off a char* to a signature parsing routine easily, with no decompilation. Using a flag avoids that but requires a couple more instructions. Pro: Get somewhere without actually implementing interning (that's like a 3-hour job if you require the GIL, a little more to make sure it's forward-compatible) Cons: Is more complicated. Couple of extra assembly instructions but they probably don't matter. Dag > > How many simple libraries are there really? Cython on one end, the > magnificently complicated NumPy ufuncs on the other? Thinking big, > perhaps PyPy and Julia? Cython, PyPy, Julia would all have to deal with > long signatures anyway. And NumPy ufuncs are already complicated so even > more low-level stuff wouldn't hurt. > >> It's unclear how to rendezvous on a common interning interface without >> the GIL/Python, so perhaps requiring the GIL to use it not to onerous. >> An alternative is to acquire the GIL in the first/reference >> implementation (which could allow the interning function pointers to >> be cached by an external GIL-oblivions JIT for example). Presumably >> some other locking mechanism would be required if the GIL is not used, >> so the overhead would likely not be that great. > > Yes. I guess a goal could be to make sure there's no ABI breakage > if/when the GIL requirement is lifted. > > Since modules can already have a reference to the interner by the time > the first module interfacing with a GIL-less world is imported, this is > non-trivial, but "every problem can be solved with another level of > indirection", and particularly this one. > > Good idea on separating out interning as a separate spec; that's > definitely useful for interfaces etc. as well down the line. I can get > to work on a string interning spec and implementation as SEP 202 in > spare minutes over the next month or so, but I won't bother unless SEP > 201 will uses interning. My role in that depends on Travis' timeline as > well, as my ETA is so unpredictable. > > Dag From robertwb at gmail.com Thu May 31 22:13:05 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 31 May 2012 13:13:05 -0700 Subject: [Cython] SEP 201 draft: Native callable objects In-Reply-To: <4FC7C6A4.3060404@astro.uio.no> References: <4FC77A5C.50009@astro.uio.no> <4FC7C6A4.3060404@astro.uio.no> Message-ID: On Thu, May 31, 2012 at 12:29 PM, Dag Sverre Seljebotn wrote: > On 05/31/2012 08:50 PM, Robert Bradshaw wrote: >> >> On Thu, May 31, 2012 at 7:04 AM, Dag Sverre Seljebotn >> ?wrote: >>> >>> [Discussion on numfocus at googlegroups.com please] >>> >>> I've uploaded a draft-state SEP 201 (previously CEP 1000): >>> >>> https://github.com/numfocus/sep/blob/master/sep201.rst >>> >>> """ >>> Many callable objects are simply wrappers around native code. This holds >>> for >>> any Cython function, f2py functions, manually written CPython extensions, >>> Numba, etc. >>> >>> Obviously, when native code calls other native code, it would be nice to >>> skip the significant cost of boxing and unboxing all the arguments. 
>>> """ >>> >>> >>> The thread about this on the Cython list is almost endless: >>> >>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443 >>> >>> There was a long discussion on the key-comparison vs. interned-string >>> approach. I've written both up in SEP 201 since it was the major point of >>> contention. There was some benchmarks starting here: >>> >>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443 >>> >>> And why provide a table and not a get_function_pointer starting here: >>> >>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443 >>> >>> For those who followed that and don't want to read the entire spec, the >>> aspect of flags is new. How do we avoid to duplicate entries/check >>> against >>> two signatures for cases like a GIL-holding caller wanting to call a >>> nogil >>> function? My take: For key-comparison you can compare under a mask, for >>> interned-string we should have additional flags field. >>> >>> The situation is a bit awkward: The Cython list consensus (well, me and >>> Robert Bradshaw) decided on what is "Approach 1" (key-comparison) in SEP >>> 201. I pushed for that. >>> >>> Still, now that a month has passed, I just think key-comparison is too >>> ugly, >>> and that the interning mechanism shouldn't be *that* hard to code up, >>> probably 500 lines of C code if one just requires the GIL in a first >>> iteration, and that keeping the spec simpler is more important. >>> >>> So I'm tentatively proposing Approach 2. >> >> >> I'm still not convinced that a hybrid approach, where signatures below >> some cutoff are compiled down to keys, is not a worthwhile approach. >> This gets around variable-length keys (both the complexity and >> possible runtime costs for long keys) and allows simple libraries to >> produce and consume fast callables without participating in the >> interning mechanism. > > I still think this gives us the "worst of both worlds", all the > disadvantages and none of the advantages. It avoids the one of the primary disadvantage of keys, namely the variable length complexity. > How many simple libraries are there really? Cython on one end, the > magnificently complicated NumPy ufuncs on the other? Thinking big, perhaps > PyPy and Julia? Cython, PyPy, Julia would all have to deal with long > signatures anyway. And NumPy ufuncs are already complicated so even more > low-level stuff wouldn't hurt. I was thinking of, for example, a differential equation solver written in C, C++, or Fortran that could take a PyNativeCallableTable* directly, primarily avoiding welding this spec to Python. >> It's unclear how to rendezvous on a common interning interface without >> the GIL/Python, so perhaps requiring the GIL to use it not to onerous. >> An alternative is to acquire the GIL in the first/reference >> implementation (which could allow the interning function pointers to >> be cached by an external GIL-oblivions JIT for example). Presumably >> some other locking mechanism would be required if the GIL is not used, >> so the overhead would likely not be that great. > > > Yes. I guess a goal could be to make sure there's no ABI breakage if/when > the GIL requirement is lifted. > > Since modules can already have a reference to the interner by the time the > first module interfacing with a GIL-less world is imported, this is > non-trivial, but "every problem can be solved with another level of > indirection", and particularly this one. 
> Good idea on separating out interning as a separate spec; that's definitely useful for interfaces etc. as well down the line. I can get to work on a string interning spec and implementation as SEP 202 in spare minutes over the next month or so, but I won't bother unless SEP 201 will use interning. My role in that depends on Travis' timeline as well, as my ETA is so unpredictable.
>
> Dag
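To make the consumer side of all this concrete, here is a sketch of the matching loop a caller could run against the entry layout from Dag's "Wait" message above; the struct name, flag names, and the 8-byte cutoff are illustrative assumptions, not part of the SEP:

    #include <stdint.h>
    #include <string.h>

    typedef struct {
        union {
            char *interned_sigptr;   /* valid when ENTRY_SHORT_SIG is unset */
            char short_sig[8];       /* valid when ENTRY_SHORT_SIG is set */
        } sig;
        uintptr_t flags;
        void *funcptr;
    } entry_t;                        /* name for illustration only */

    #define ENTRY_SHORT_SIG 0x1

    static void *match_signature(entry_t *tab, int n,
                                 const char *wanted_interned)
    {
        int i;
        for (i = 0; i < n; i++) {
            if (tab[i].flags & ENTRY_SHORT_SIG) {
                /* short signatures are stored inline; compare the bytes */
                if (strncmp(tab[i].sig.short_sig, wanted_interned, 8) == 0)
                    return tab[i].funcptr;
            } else if (tab[i].sig.interned_sigptr == wanted_interned) {
                /* interned strings: one pointer compare and we are done */
                return tab[i].funcptr;
            }
        }
        return NULL;
    }

Long signatures cost one pointer compare thanks to interning, while signatures that fit in 8 bytes skip interning entirely, which is the "simple producer" case Robert was after.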