From robertwb at math.washington.edu Wed Feb 1 19:50:45 2012 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 1 Feb 2012 10:50:45 -0800 Subject: [Cython] [cython-users] Re: How to find out where an AttributeError is ignored In-Reply-To: References: <2bdc0373-c865-4c88-9764-b520e7dcf707@t16g2000vba.googlegroups.com>

<0c7296f3-085d-4edd-8aaa-4062bb75d175@h6g2000yqk.googlegroups.com>

<4F22D7A2.1050806@behnel.de> <4F230312.9050506@astro.uio.no> <4F23109E.3030203@behnel.de>

Message-ID: On Tue, Jan 31, 2012 at 8:30 AM, mark florisson wrote: > On 31 January 2012 02:12, Robert Bradshaw wrote: >> On Fri, Jan 27, 2012 at 1:01 PM, Stefan Behnel wrote: >>> Dag Sverre Seljebotn, 27.01.2012 21:03: >>>> On 01/27/2012 05:58 PM, Stefan Behnel wrote: >>>>> mark florisson, 27.01.2012 17:30: >>>>>> On 27 January 2012 16:22, mark florisson ?wrote: >>>>>>> On 27 January 2012 15:47, Simon King ?wrote: >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I am still *very* frustrated about the fact that Cython does not tell >>>>>>>> where the error occurs. Since about one week, I am adding lots and >>>>>>>> lots of lines into Sage that write a log into some file, so that I get >>>>>>>> at least some idea where the error occurs. But still: Even these >>>>>>>> extensive logs do not provide a hint on what exactly is happening. >>>>>>>> >>>>>>>> How can I patch Cython such that some more information on the location >>>>>>>> of the error is printed? I unpacked Sage's Cython spkg, and did "grep - >>>>>>>> R ignored .", but the code lines containing the word "ignored" did not >>>>>>>> seem to be the lines that are responsible for printing the warning >>>>>>>> message >>>>>>>> ? ?Exception AttributeError: 'PolynomialRing_field_with_category' >>>>>>>> object has no attribute '_modulus' in ?ignored >>>>>>>> >>>>>>>> Can you point me to the file in Sage's Cython spkg which is >>>>>>>> responsible for printing the warning? >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Simon >>>>>>> >>>>>>> These messages are written by PyErr_WriteUnraisable, which is a >>>>>>> CPython C API function that writes unraisable exceptions. There are >>>>>>> typically two reasons for unraisable exceptions: >>>>>>> >>>>>>> ? ? 1) as Robert mentioned, a function that does not allow propagation >>>>>>> of exceptions, e.g. >>>>>>> >>>>>>> ? ? ? ? cdef int func(): >>>>>>> ? ? ? ? ? ? raise Exception >>>>>>> >>>>>>> ? ? ? ? Here there is no way to propagate the raised exception, so >>>>>>> instead one should write something like >>>>>>> >>>>>>> ? ? ? ? ? ? cdef int func() except -1: ... >>>>>>> >>>>>>> ? ? ? ? Alternatively one may use 'except *' in case there is no error >>>>>>> indicator and Cython should always check, or "except ? -1" which means >>>>>>> "-1 may or may not indicate an error". >>>>>>> >>>>>>> ? ? 2) in deallocators or finalizers (e.g. __dealloc__ or __del__) >>>>>>> >>>>>>> For functions the right thing is to add an except clause, for >>>>>>> finalizers and destructors one could use the traceback module, e.g. >>>>>>> >>>>>>> ? ? try: >>>>>>> ? ? ? ? ... >>>>>>> ? ? except: >>>>>>> ? ? ? ? traceback.print_exc() >>>>>>> >>>>>>> If this all still doesn't help, try setting a (deferred) breakpoint on >>>>>>> __Pyx_WriteUnraisable or PyErr_WriteUnraisable. >>>>>> >>>>>> Actually, I don't see why the default is to write unraisable >>>>>> exceptions. Instead Cython could detect that exceptions may propagate >>>>>> and have callers do the check (i.e. make it implicitly "except *"). >>>> >>>> As for speed, there's optimizations on this, e.g., "except? 32434623" if >>>> the return type is int, "except? 0xfffff..." if the return type is a pointer. >>>> >>>> And for floating point, we could make our own NaN -- that's obscure enough >>>> that it could probably be made "except cython.cython_exception_nan" by >>>> default, not "except? cython.cython_exception_nan". >>> >>> The problem with that is that we can't be sure that Cython will be the only >>> caller. So exceptions may still not propagate in cases, and users will have >>> to know about these "obscure" values and that they must deal with them >>> manually then. >>> >>> You could add that we'd just have to disable this when user code takes a >>> pointer from a function, but then, how many rules are there that users will >>> have to learn and remember after such a change? And what's that for a >>> language that changes the calling semantics of a function because way down >>> in the code someone happens to take a pointer to it? >>> >>> >>>>>> Was this not implemented because Cython only knows whether functions >>>>>> may propagate exceptions at code generation time by looking at the >>>>>> presence of an error label? >>>>>> Maybe it could keep code insertion points around for every call to >>>>>> such a potential function and if the function uses the error label >>>>>> have the caller perform the check? Although I do forsee problems for >>>>>> external such functions... maybe Cython could have it's own >>>>>> threadstate regardless of the GIL which would indicate whether an >>>>>> error has occurred? e.g. CyErr_Occurred()? >>>>> >>>>> Yep, those are the kind of reasons why writing unraisable exceptions is the >>>>> default. >>>> >>>> Still, >>> >>> I wasn't really advocating this behaviour, just indicating that it's hard >>> to do "better", because this "better" isn't all that clear. It's also not >>> "better" for all code, which means that we get from one trade-off to >>> another, while breaking existing code at the same time. Not exactly >>> paradise on either side of the tunnel. >> >> I still feel like we're stuck in the wrong default. I'd rather require >> more work to interact with C libraries than require more work to >> convert innocent-looking Python to Cython. >> >>> One example that keeps popping up in my mind is callback functions that >>> cannot propagate errors, at least not the CPython way. I have a couple of >>> those in lxml, even some returning void. So I wrap their code in a bare >>> try-except and when an exception strikes, I set a C error flag to tell the >>> C library that something went wrong and return normally. No Python code >>> outside of the try block. But Cython still generates code for unraisable >>> errors. Why? Because the internal code that handles the bare except clause >>> may fail and raise an exception. How about that? >>> >>> >>>> the need to explicitly declare "except *" keeps coming up again and >>>> again, and is really a blemish on the usability of Cython. When teaching >>>> people Cython, then it's really irritating to have to follow "all you need >>>> to do is add some 'cdef' and some types" with "and then you need to >>>> remember to say "except *", or you're in deep trouble". Cython sort of >>>> looks very elegant until that point... >>> >>> I know what this feels like. The problem is that these things *are* complex. >> >> Yes. We've been wrestling with this issue almost since Cython's inception... >> >> I like Mark's two-function idea, with the caveat that f(bad_argument) >> now behaves quite differently than (&f)[0](bad_argument) for even more >> obscure reasons. But it may be the way to go. >> >> The other option is to embed the error behavior into the signature and >> require casts to explicitly go from one to the other. This would >> probably require a notation for never raising an exception (e.g. >> "except -"). Cdef public or api functions could require an except >> declaration (positive or negative), ordinary cdef functions would be >> "except *" by default, and cdef extern functions would be "except -" >> by default. > > Only except * and except ? have ever made some sense to me. Except + > is the most mysterious syntax ever, imho it should have been 'except > cpperror' or something. And when you try to search for "except +" or > "except *" etc on docs.cython.org it doesn't find anything, which > makes it hard for people reading the code and unfamiliar with the > syntax to figure out what it means. In general I also think decorators > would have been clearer when defining such functions. Let's please not > introduce more weird syntax. "except IDENTIFIER" already has a meaning, and it's nice to have as few keywords as possible (though I agree for searching). The problem with decorators is that they don't lend themselves being declarable part of a type declaration. What would the syntax be for a function pointer (or extern function) that does propagate Python exceptions? What about one that throws a C++ exception? > In any event I don't see why we'd want 'except -', as we're trying to > get rid of the except clause. Is it possible to get rid of it entirely? I was thinking we could provide more natural defaults, but the user might still need to declare when things are different. > So you can still get your old behaviour > for function pointers by not using the except clause and having it > write unraisable exceptions in the function, but in Cython space you'd > simply get better semantics (that is, propagating exceptions). I agree with Dag that this is very similar to the GIL issue, and should probably be tackled similarly. - Robert From dtcaciuc at gmail.com Thu Feb 2 01:53:02 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Wed, 1 Feb 2012 16:53:02 -0800 Subject: [Cython] distutils extension pxd problem Message-ID: Hey everyone, I bumped into an issue where my .pyx file doesn't see its matching .pxd file. Here's a build test to show the problem If I change my target package from `b.a` to just `a`, it works as expected. Running `cython src/a.pyx` works as expected as well, but not the Extension. ---- PYTHON setup.py build_ext --inplace PYTHON -c "from b import a" ######## setup.py ######## from distutils.core import setup from distutils.extension import Extension from Cython.Distutils import build_ext ext_modules = [ Extension("b.a", ["src/a.pyx"]) ] setup( cmdclass = {'build_ext': build_ext}, ext_modules = ext_modules ) ######## b/__init__.py ######## ######## src/a.pxd ######## cdef class X: cdef object foo ######## src/a.pyx ######## cdef class X: def __cinit__(self): self.foo = 1 x = X() ---- Traceback (most recent call last): File "", line 1, in File "a.pyx", line 7, in init b.a (src/a.c:793) File "a.pyx", line 5, in b.a.X.__cinit__ (src/a.c:488) AttributeError: 'b.a.X' object has no attribute 'foo' ---- Any idea what's going on here? Thanks, Dimitri. From stefan_ml at behnel.de Thu Feb 2 08:11:16 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 02 Feb 2012 08:11:16 +0100 Subject: [Cython] distutils extension pxd problem In-Reply-To: References: Message-ID: <4F2A3714.7000704@behnel.de> Dimitri Tcaciuc, 02.02.2012 01:53: > I bumped into an issue where my .pyx file doesn't see its matching > .pxd file. Here's a build test to show the problem If I change my > target package from `b.a` to just `a`, it works as expected. Running > `cython src/a.pyx` works as expected as well, but not the Extension. > > ---- > > PYTHON setup.py build_ext --inplace > PYTHON -c "from b import a" > > ######## setup.py ######## > > from distutils.core import setup > from distutils.extension import Extension > from Cython.Distutils import build_ext > > ext_modules = [ > Extension("b.a", ["src/a.pyx"]) > ] > > setup( > cmdclass = {'build_ext': build_ext}, > ext_modules = ext_modules > ) > > ######## b/__init__.py ######## > > ######## src/a.pxd ######## > > cdef class X: > cdef object foo > > ######## src/a.pyx ######## > > cdef class X: > > def __cinit__(self): > self.foo = 1 > > x = X() > > ---- > > Traceback (most recent call last): > File "", line 1, in > File "a.pyx", line 7, in init b.a (src/a.c:793) > File "a.pyx", line 5, in b.a.X.__cinit__ (src/a.c:488) > AttributeError: 'b.a.X' object has no attribute 'foo' > > ---- Any reason you cannot rename "src" to "b"? Because that would fix your problem. Cython basically uses the same algorithm for finding package content as Python, i.e. it will look inside the package "b" when looking for "b.a". And "a.pxd" is not in "b" in your setup. Stefan From dtcaciuc at gmail.com Thu Feb 2 08:12:23 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Wed, 1 Feb 2012 23:12:23 -0800 Subject: [Cython] distutils extension pxd problem In-Reply-To: References: Message-ID: Ok, so I narrowed the problem down to https://github.com/cython/cython/blob/master/Cython/Compiler/Main.py#L223. At this point, it looks like if target extension name is `x.y.z`, the pxd must either be called `x.y.z.pxd` and be located in project root (I believe this is Pyrex convention?) or be in the exact x/y/z.pxd directory structure, and each of the parents have to be a package (ie. contain __init__.[py,pyx,pyd]). Again, this looks like a problem only if module name is nested. If this is along the right lines, I'd be happy to make some clarifications to http://docs.cython.org/src/userguide/sharing_declarations.html#search-paths-for-definition-files. It looks like there's a conflict between the Extension name parameter (which also says where the module gets installed in package tree) and the name of the actual .so file, which sometimes one needs to customize (eg. I need to compile x/y/z.pyx to x/y/_z.so since I'd like to have z.py with some extra bits in them. In this case `cythonize` seems to ignore the extension name, goes ahead and names the output `z.so`) I recon you guys probably had plenty of discussions on this topic. Is there a general direction where you're taking the whole include/pxd discovery system or is it staying where it is right now? Dimitri. On Wed, Feb 1, 2012 at 4:53 PM, Dimitri Tcaciuc wrote: > Hey everyone, > > I bumped into an issue where my .pyx file doesn't see its matching > .pxd file. Here's a build test to show the problem If I change my > target package from `b.a` to just `a`, it works as expected. Running > `cython src/a.pyx` works as expected as well, but not the Extension. > > ---- > > PYTHON setup.py build_ext --inplace > PYTHON -c "from b import a" > > ######## setup.py ######## > > from distutils.core import setup > from distutils.extension import Extension > from Cython.Distutils import build_ext > > ext_modules = [ > ? ?Extension("b.a", ["src/a.pyx"]) > ] > > setup( > ? ?cmdclass = {'build_ext': build_ext}, > ? ?ext_modules = ext_modules > ) > > ######## b/__init__.py ######## > > ######## src/a.pxd ######## > > cdef class X: > ? ?cdef object foo > > ######## src/a.pyx ######## > > cdef class X: > > ? ?def __cinit__(self): > ? ? ? ?self.foo = 1 > > x = X() > > ---- > > Traceback (most recent call last): > ?File "", line 1, in > ?File "a.pyx", line 7, in init b.a (src/a.c:793) > ?File "a.pyx", line 5, in b.a.X.__cinit__ (src/a.c:488) > AttributeError: 'b.a.X' object has no attribute 'foo' > > ---- > > Any idea what's going on here? > > Thanks, > > > Dimitri. From dtcaciuc at gmail.com Thu Feb 2 08:24:43 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Wed, 1 Feb 2012 23:24:43 -0800 Subject: [Cython] distutils extension pxd problem Message-ID: > Date: Thu, 02 Feb 2012 08:11:16 +0100 > From: Stefan Behnel > To: Core developer mailing list of the Cython compiler > ? ? ? ? > Subject: Re: [Cython] distutils extension pxd problem > Message-ID: <4F2A3714.7000704 at behnel.de> > Content-Type: text/plain; charset=UTF-8 > > Dimitri Tcaciuc, 02.02.2012 01:53: >> I bumped into an issue where my .pyx file doesn't see its matching >> .pxd file. Here's a build test to show the problem If I change my >> target package from `b.a` to just `a`, it works as expected. Running >> `cython src/a.pyx` works as expected as well, but not the Extension. >> >> ---- >> >> PYTHON setup.py build_ext --inplace >> PYTHON -c "from b import a" >> >> ######## setup.py ######## >> >> from distutils.core import setup >> from distutils.extension import Extension >> from Cython.Distutils import build_ext >> >> ext_modules = [ >> ? ? Extension("b.a", ["src/a.pyx"]) >> ] >> >> setup( >> ? ? cmdclass = {'build_ext': build_ext}, >> ? ? ext_modules = ext_modules >> ) >> >> ######## b/__init__.py ######## >> >> ######## src/a.pxd ######## >> >> cdef class X: >> ? ? cdef object foo >> >> ######## src/a.pyx ######## >> >> cdef class X: >> >> ? ? def __cinit__(self): >> ? ? ? ? self.foo = 1 >> >> x = X() >> >> ---- >> >> Traceback (most recent call last): >> ? File "", line 1, in >> ? File "a.pyx", line 7, in init b.a (src/a.c:793) >> ? File "a.pyx", line 5, in b.a.X.__cinit__ (src/a.c:488) >> AttributeError: 'b.a.X' object has no attribute 'foo' >> >> ---- > > Any reason you cannot rename "src" to "b"? Because that would fix your > problem. Cython basically uses the same algorithm for finding package > content as Python, i.e. it will look inside the package "b" when looking > for "b.a". And "a.pxd" is not in "b" in your setup. > > Stefan > This certainly looks like the cleanest solution now. I have some mixed C++ bits in src/ so I thought to keep compiled/interpreted bits separate, if possible. Dimitri. From stefan_ml at behnel.de Thu Feb 2 09:21:11 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 02 Feb 2012 09:21:11 +0100 Subject: [Cython] Cython's view on a common benchmark suite (was: Re: [Speed] Buildbot Status) In-Reply-To: References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com>

Message-ID: <4F2A4777.2080705@behnel.de> Brett Cannon, 01.02.2012 18:25: > to prevent this from either ending up in a dead-end because of this, we > need to first decide where the canonical set of Python VM benchmarks are > going to live. I say hg.python.org/benchmarks for two reasons. One is that > Antoine has already done work there to port some of the benchmarks so there > is at least some there that are ready to be run under Python 3 (and the > tooling is in place to create separate Python 2 and Python 3 benchmark > suites). Two, this can be a test of having the various VM contributors work > out of hg.python.org if we are ever going to break the stdlib out for > shared development. At worst we can simply take the changes made at > pypy/benchmarks that apply to just the unladen benchmarks that exists, and > at best merge the two sets (manually) into one benchmark suite so PyPy > doesn't lose anything for Python 2 measurements that they have written and > CPython doesn't lose any of its Python 3 benchmarks that it has created. > > How does that sound? +1 FWIW, Cython currently uses both benchmark suites, that of PyPy (in Py2.7) and that of hg.python.org (in Py2.7 and 3.3), but without codespeed integration and also without a dedicated server for benchmark runs. So the results are unfortunately not accurate enough to spot minor changes even over time. https://sage.math.washington.edu:8091/hudson/view/bench/ We would like to join in on speed.python.org, once it's clear how the benchmarks will be run and how the data uploads work and all that. It already proved a bit tricky to get Cython integrated with the benchmark runner on our side, and I'm planning to rewrite that integration at some point, but it should already be doable to get "something" to work now. I should also note that we don't currently support the whole benchmark suite, so there must be a way to record individual benchmark results even in the face of failures in other benchmarks. Basically, speed.python.org would be useless for us if a failure in a single benchmark left us without any performance data at all, because it will still take us some time to get to 100% compliance and we would like to know if anything on that road has a performance impact. Currently, we apply a short patch that adds a try-except to the benchmark runner's main loop before starting the measurements, because otherwise it would just bail out completely on a single failure. Oh, and we also patch the benchmarks to remove references to __file__ because of CPython issue 13429, although we may be able to work around that at some point, specifically when doing on-the-fly compilation during imports. http://bugs.python.org/issue13429 Also note that benchmarks that only test C implemented stdlib modules (re, pickle, json) are useless for Cython because they would only end up timing the exact same code as for plain CPython. Another test that is useless for us is the "mako" benchmark, because most of what it does is to run generated code. There is currently no way for Cython to hook into that, so we're out of the game here. We also don't care about program startup tests, obviously, because we know that Cython's compiler overhead plus an optimising gcc run will render them meaningless anyway. I like the fact that there's still an old hg_startup timing result lingering around from the time before I disabled that test, telling us that Cython runs it 99.68% slower than CPython. Got to beat that. 8-) Stefan From d.s.seljebotn at astro.uio.no Thu Feb 2 13:19:13 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 02 Feb 2012 13:19:13 +0100 Subject: [Cython] memoryview slices can't be None? Message-ID: <4F2A7F41.6010303@astro.uio.no> I just realized that cdef int[:] a = None raises an exception; even though I'd argue that 'a' is of the "reference" kind of type where Cython usually allow None (i.e., "cdef MyClass b = None" is allowed even if type(None) is NoneType). Is this a bug or not, and is it possible to do something about it? Dag Sverre From markflorisson88 at gmail.com Thu Feb 2 22:10:49 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 2 Feb 2012 21:10:49 +0000 Subject: [Cython] [cython-users] Re: How to find out where an AttributeError is ignored In-Reply-To: References: <2bdc0373-c865-4c88-9764-b520e7dcf707@t16g2000vba.googlegroups.com>

<0c7296f3-085d-4edd-8aaa-4062bb75d175@h6g2000yqk.googlegroups.com>

<4F22D7A2.1050806@behnel.de> <4F230312.9050506@astro.uio.no> <4F23109E.3030203@behnel.de>

Message-ID: On 1 February 2012 18:50, Robert Bradshaw wrote: > On Tue, Jan 31, 2012 at 8:30 AM, mark florisson > wrote: >> On 31 January 2012 02:12, Robert Bradshaw wrote: >>> On Fri, Jan 27, 2012 at 1:01 PM, Stefan Behnel wrote: >>>> Dag Sverre Seljebotn, 27.01.2012 21:03: >>>>> On 01/27/2012 05:58 PM, Stefan Behnel wrote: >>>>>> mark florisson, 27.01.2012 17:30: >>>>>>> On 27 January 2012 16:22, mark florisson ?wrote: >>>>>>>> On 27 January 2012 15:47, Simon King ?wrote: >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I am still *very* frustrated about the fact that Cython does not tell >>>>>>>>> where the error occurs. Since about one week, I am adding lots and >>>>>>>>> lots of lines into Sage that write a log into some file, so that I get >>>>>>>>> at least some idea where the error occurs. But still: Even these >>>>>>>>> extensive logs do not provide a hint on what exactly is happening. >>>>>>>>> >>>>>>>>> How can I patch Cython such that some more information on the location >>>>>>>>> of the error is printed? I unpacked Sage's Cython spkg, and did "grep - >>>>>>>>> R ignored .", but the code lines containing the word "ignored" did not >>>>>>>>> seem to be the lines that are responsible for printing the warning >>>>>>>>> message >>>>>>>>> ? ?Exception AttributeError: 'PolynomialRing_field_with_category' >>>>>>>>> object has no attribute '_modulus' in ?ignored >>>>>>>>> >>>>>>>>> Can you point me to the file in Sage's Cython spkg which is >>>>>>>>> responsible for printing the warning? >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Simon >>>>>>>> >>>>>>>> These messages are written by PyErr_WriteUnraisable, which is a >>>>>>>> CPython C API function that writes unraisable exceptions. There are >>>>>>>> typically two reasons for unraisable exceptions: >>>>>>>> >>>>>>>> ? ? 1) as Robert mentioned, a function that does not allow propagation >>>>>>>> of exceptions, e.g. >>>>>>>> >>>>>>>> ? ? ? ? cdef int func(): >>>>>>>> ? ? ? ? ? ? raise Exception >>>>>>>> >>>>>>>> ? ? ? ? Here there is no way to propagate the raised exception, so >>>>>>>> instead one should write something like >>>>>>>> >>>>>>>> ? ? ? ? ? ? cdef int func() except -1: ... >>>>>>>> >>>>>>>> ? ? ? ? Alternatively one may use 'except *' in case there is no error >>>>>>>> indicator and Cython should always check, or "except ? -1" which means >>>>>>>> "-1 may or may not indicate an error". >>>>>>>> >>>>>>>> ? ? 2) in deallocators or finalizers (e.g. __dealloc__ or __del__) >>>>>>>> >>>>>>>> For functions the right thing is to add an except clause, for >>>>>>>> finalizers and destructors one could use the traceback module, e.g. >>>>>>>> >>>>>>>> ? ? try: >>>>>>>> ? ? ? ? ... >>>>>>>> ? ? except: >>>>>>>> ? ? ? ? traceback.print_exc() >>>>>>>> >>>>>>>> If this all still doesn't help, try setting a (deferred) breakpoint on >>>>>>>> __Pyx_WriteUnraisable or PyErr_WriteUnraisable. >>>>>>> >>>>>>> Actually, I don't see why the default is to write unraisable >>>>>>> exceptions. Instead Cython could detect that exceptions may propagate >>>>>>> and have callers do the check (i.e. make it implicitly "except *"). >>>>> >>>>> As for speed, there's optimizations on this, e.g., "except? 32434623" if >>>>> the return type is int, "except? 0xfffff..." if the return type is a pointer. >>>>> >>>>> And for floating point, we could make our own NaN -- that's obscure enough >>>>> that it could probably be made "except cython.cython_exception_nan" by >>>>> default, not "except? cython.cython_exception_nan". >>>> >>>> The problem with that is that we can't be sure that Cython will be the only >>>> caller. So exceptions may still not propagate in cases, and users will have >>>> to know about these "obscure" values and that they must deal with them >>>> manually then. >>>> >>>> You could add that we'd just have to disable this when user code takes a >>>> pointer from a function, but then, how many rules are there that users will >>>> have to learn and remember after such a change? And what's that for a >>>> language that changes the calling semantics of a function because way down >>>> in the code someone happens to take a pointer to it? >>>> >>>> >>>>>>> Was this not implemented because Cython only knows whether functions >>>>>>> may propagate exceptions at code generation time by looking at the >>>>>>> presence of an error label? >>>>>>> Maybe it could keep code insertion points around for every call to >>>>>>> such a potential function and if the function uses the error label >>>>>>> have the caller perform the check? Although I do forsee problems for >>>>>>> external such functions... maybe Cython could have it's own >>>>>>> threadstate regardless of the GIL which would indicate whether an >>>>>>> error has occurred? e.g. CyErr_Occurred()? >>>>>> >>>>>> Yep, those are the kind of reasons why writing unraisable exceptions is the >>>>>> default. >>>>> >>>>> Still, >>>> >>>> I wasn't really advocating this behaviour, just indicating that it's hard >>>> to do "better", because this "better" isn't all that clear. It's also not >>>> "better" for all code, which means that we get from one trade-off to >>>> another, while breaking existing code at the same time. Not exactly >>>> paradise on either side of the tunnel. >>> >>> I still feel like we're stuck in the wrong default. I'd rather require >>> more work to interact with C libraries than require more work to >>> convert innocent-looking Python to Cython. >>> >>>> One example that keeps popping up in my mind is callback functions that >>>> cannot propagate errors, at least not the CPython way. I have a couple of >>>> those in lxml, even some returning void. So I wrap their code in a bare >>>> try-except and when an exception strikes, I set a C error flag to tell the >>>> C library that something went wrong and return normally. No Python code >>>> outside of the try block. But Cython still generates code for unraisable >>>> errors. Why? Because the internal code that handles the bare except clause >>>> may fail and raise an exception. How about that? >>>> >>>> >>>>> the need to explicitly declare "except *" keeps coming up again and >>>>> again, and is really a blemish on the usability of Cython. When teaching >>>>> people Cython, then it's really irritating to have to follow "all you need >>>>> to do is add some 'cdef' and some types" with "and then you need to >>>>> remember to say "except *", or you're in deep trouble". Cython sort of >>>>> looks very elegant until that point... >>>> >>>> I know what this feels like. The problem is that these things *are* complex. >>> >>> Yes. We've been wrestling with this issue almost since Cython's inception... >>> >>> I like Mark's two-function idea, with the caveat that f(bad_argument) >>> now behaves quite differently than (&f)[0](bad_argument) for even more >>> obscure reasons. But it may be the way to go. >>> >>> The other option is to embed the error behavior into the signature and >>> require casts to explicitly go from one to the other. This would >>> probably require a notation for never raising an exception (e.g. >>> "except -"). Cdef public or api functions could require an except >>> declaration (positive or negative), ordinary cdef functions would be >>> "except *" by default, and cdef extern functions would be "except -" >>> by default. >> >> Only except * and except ? have ever made some sense to me. Except + >> is the most mysterious syntax ever, imho it should have been 'except >> cpperror' or something. And when you try to search for "except +" or >> "except *" etc on docs.cython.org it doesn't find anything, which >> makes it hard for people reading the code and unfamiliar with the >> syntax to figure out what it means. In general I also think decorators >> would have been clearer when defining such functions. Let's please not >> introduce more weird syntax. > > "except IDENTIFIER" already has a meaning, and it's nice to have as > few keywords as possible (though I agree for searching). The problem > with decorators is that they don't lend themselves being declarable > part of a type declaration. What would the syntax be for a function > pointer (or extern function) that does propagate Python exceptions? > What about one that throws a C++ exception? Right, that's why I said 'when defining the function' :) The except clause would still be used for declarations (and may also be used in definitions, if wanted). But it's best to provide the best possible defaults, in which case you only need except in rare cases. >> In any event I don't see why we'd want 'except -', as we're trying to >> get rid of the except clause. > > Is it possible to get rid of it entirely? I was thinking we could > provide more natural defaults, but the user might still need to > declare when things are different. > >> So you can still get your old behaviour >> for function pointers by not using the except clause and having it >> write unraisable exceptions in the function, but in Cython space you'd >> simply get better semantics (that is, propagating exceptions). > > I agree with Dag that this is very similar to the GIL issue, and > should probably be tackled similarly. > > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Thu Feb 2 22:16:29 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 2 Feb 2012 21:16:29 +0000 Subject: [Cython] memoryview slices can't be None? In-Reply-To: <4F2A7F41.6010303@astro.uio.no> References: <4F2A7F41.6010303@astro.uio.no> Message-ID: On 2 February 2012 12:19, Dag Sverre Seljebotn wrote: > I just realized that > > cdef int[:] a = None > > raises an exception; even though I'd argue that 'a' is of the "reference" > kind of type where Cython usually allow None (i.e., "cdef MyClass b = None" > is allowed even if type(None) is NoneType). Is this a bug or not, and is it > possible to do something about it? > > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Yeah I disabled that quite early. It was supposed to be working but gave a lot of trouble in cases (segfaults, mainly). At the time I was trying to get rid of all the segfaults and get the basic functionality working, so I disabled it. Personally, I have never liked how things can be None unchecked. I personally prefer to write cdef foo(obj=None): cdef int[:] a if obj is None: obj = ... a = obj Often you forget to write 'not None' when declaring the parameter (and apparently that it only allowed for 'def' functions). As such, I never bothered to re-enable it. However, it does support control flow with uninitialized slices, and will raise an error if it is uninitialized. Do we want this behaviour (e.g. for consistency)? From d.s.seljebotn at astro.uio.no Thu Feb 2 22:38:15 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 02 Feb 2012 22:38:15 +0100 Subject: [Cython] memoryview slices can't be None? In-Reply-To: References: <4F2A7F41.6010303@astro.uio.no> Message-ID: <4F2B0247.8040008@astro.uio.no> On 02/02/2012 10:16 PM, mark florisson wrote: > On 2 February 2012 12:19, Dag Sverre Seljebotn > wrote: >> I just realized that >> >> cdef int[:] a = None >> >> raises an exception; even though I'd argue that 'a' is of the "reference" >> kind of type where Cython usually allow None (i.e., "cdef MyClass b = None" >> is allowed even if type(None) is NoneType). Is this a bug or not, and is it >> possible to do something about it? >> >> Dag Sverre >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Yeah I disabled that quite early. It was supposed to be working but > gave a lot of trouble in cases (segfaults, mainly). At the time I was > trying to get rid of all the segfaults and get the basic functionality > working, so I disabled it. Personally, I have never liked how things Well, you can segfault quite easily with cdef MyClass a = None print a.field so it doesn't make sense to slices different from cdef classes IMO. > can be None unchecked. I personally prefer to write > > cdef foo(obj=None): > cdef int[:] a > if obj is None: > obj = ... > a = obj > > Often you forget to write 'not None' when declaring the parameter (and > apparently that it only allowed for 'def' functions). > > As such, I never bothered to re-enable it. However, it does support > control flow with uninitialized slices, and will raise an error if it > is uninitialized. Do we want this behaviour (e.g. for consistency)? When in doubt, go for consistency. So +1 for that reason. I do believe that setting stuff to None is rather vital in Python. What I typically do is more like this: def f(double[:] input, double[:] out=None): if out is None: out = np.empty_like(input) ... Having to use another variable name is a bit of a pain. (Come on -- do you use "a" in real code? What do you actually call "the other obj"? I sometimes end up with "out_" and so on, but it creates smelly code quite quickly.) It's easy to segfault with cdef classes anyway, so decent nonechecking should be implemented at some point, and then memoryviews would use the same mechanisms. Java has decent null-checking... Dag Sverre From markflorisson88 at gmail.com Fri Feb 3 00:09:09 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 2 Feb 2012 23:09:09 +0000 Subject: [Cython] memoryview slices can't be None? In-Reply-To: <4F2B0247.8040008@astro.uio.no> References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> Message-ID: On 2 February 2012 21:38, Dag Sverre Seljebotn wrote: > On 02/02/2012 10:16 PM, mark florisson wrote: >> >> On 2 February 2012 12:19, Dag Sverre Seljebotn >> ?wrote: >>> >>> I just realized that >>> >>> cdef int[:] a = None >>> >>> raises an exception; even though I'd argue that 'a' is of the "reference" >>> kind of type where Cython usually allow None (i.e., "cdef MyClass b = >>> None" >>> is allowed even if type(None) is NoneType). Is this a bug or not, and is >>> it >>> possible to do something about it? >>> >>> Dag Sverre >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> >> Yeah I disabled that quite early. It was supposed to be working but >> gave a lot of trouble in cases (segfaults, mainly). At the time I was >> trying to get rid of all the segfaults and get the basic functionality >> working, so I disabled it. Personally, I have never liked how things > > > Well, you can segfault quite easily with > > cdef MyClass a = None > print a.field > > so it doesn't make sense to slices different from cdef classes IMO. > > >> can be None unchecked. I personally prefer to write >> >> cdef foo(obj=None): >> ? ? cdef int[:] a >> ? ? if obj is None: >> ? ? ? ? obj = ... >> ? ? a = obj >> >> Often you forget to write 'not None' when declaring the parameter (and >> apparently that it only allowed for 'def' functions). >> >> As such, I never bothered to re-enable it. However, it does support >> control flow with uninitialized slices, and will raise an error if it >> is uninitialized. Do we want this behaviour (e.g. for consistency)? > > > When in doubt, go for consistency. So +1 for that reason. I do believe that > setting stuff to None is rather vital in Python. > > What I typically do is more like this: > > def f(double[:] input, double[:] out=None): > ? ?if out is None: > ? ? ? ?out = np.empty_like(input) > ? ?... > > Having to use another variable name is a bit of a pain. (Come on -- do you > use "a" in real code? What do you actually call "the other obj"? I sometimes > end up with "out_" and so on, but it creates smelly code quite quickly.) No, it was just a contrived example. > It's easy to segfault with cdef classes anyway, so decent nonechecking > should be implemented at some point, and then memoryviews would use the same > mechanisms. Java has decent null-checking... > The problem with none checking is that it has to occur at every point. With initialized slices the control flow knows when the slices are initialized, or when they might not be (and it can raise a compile-time or runtime error, instead of a segfault if you're lucky). I'm fine with implementing the behaviour, I just always left it at the bottom of my todo list. > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Fri Feb 3 18:53:02 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 03 Feb 2012 18:53:02 +0100 Subject: [Cython] memoryview slices can't be None? In-Reply-To: References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> Message-ID: <4F2C1EFE.4050904@astro.uio.no> On 02/03/2012 12:09 AM, mark florisson wrote: > On 2 February 2012 21:38, Dag Sverre Seljebotn > wrote: >> On 02/02/2012 10:16 PM, mark florisson wrote: >>> >>> On 2 February 2012 12:19, Dag Sverre Seljebotn >>> wrote: >>>> >>>> I just realized that >>>> >>>> cdef int[:] a = None >>>> >>>> raises an exception; even though I'd argue that 'a' is of the "reference" >>>> kind of type where Cython usually allow None (i.e., "cdef MyClass b = >>>> None" >>>> is allowed even if type(None) is NoneType). Is this a bug or not, and is >>>> it >>>> possible to do something about it? >>>> >>>> Dag Sverre >>>> _______________________________________________ >>>> cython-devel mailing list >>>> cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>> >>> >>> Yeah I disabled that quite early. It was supposed to be working but >>> gave a lot of trouble in cases (segfaults, mainly). At the time I was >>> trying to get rid of all the segfaults and get the basic functionality >>> working, so I disabled it. Personally, I have never liked how things >> >> >> Well, you can segfault quite easily with >> >> cdef MyClass a = None >> print a.field >> >> so it doesn't make sense to slices different from cdef classes IMO. >> >> >>> can be None unchecked. I personally prefer to write >>> >>> cdef foo(obj=None): >>> cdef int[:] a >>> if obj is None: >>> obj = ... >>> a = obj >>> >>> Often you forget to write 'not None' when declaring the parameter (and >>> apparently that it only allowed for 'def' functions). >>> >>> As such, I never bothered to re-enable it. However, it does support >>> control flow with uninitialized slices, and will raise an error if it >>> is uninitialized. Do we want this behaviour (e.g. for consistency)? >> >> >> When in doubt, go for consistency. So +1 for that reason. I do believe that >> setting stuff to None is rather vital in Python. >> >> What I typically do is more like this: >> >> def f(double[:] input, double[:] out=None): >> if out is None: >> out = np.empty_like(input) >> ... >> >> Having to use another variable name is a bit of a pain. (Come on -- do you >> use "a" in real code? What do you actually call "the other obj"? I sometimes >> end up with "out_" and so on, but it creates smelly code quite quickly.) > > No, it was just a contrived example. > >> It's easy to segfault with cdef classes anyway, so decent nonechecking >> should be implemented at some point, and then memoryviews would use the same >> mechanisms. Java has decent null-checking... >> > > The problem with none checking is that it has to occur at every point. Well, using control flow analysis etc. it doesn't really. E.g., for i in range(a.shape[0]): print i a[i] *= 3 can be unrolled and none-checks inserted as print 0 if a is None: raise .... a[0] *= 3 for i in range(1, a.shape[0]): print i a[i] *= 3 # no need for none-check It's very similar to what you'd want to do to pull boundschecking out of the loop... > With initialized slices the control flow knows when the slices are > initialized, or when they might not be (and it can raise a > compile-time or runtime error, instead of a segfault if you're lucky). > I'm fine with implementing the behaviour, I just always left it at the > bottom of my todo list. Wasn't saying you should do it, just checking. I'm still not sure about this. I think what I'd really like is a) Stop cdef classes from being None as well b) Sort-of deprecate cdef in favor of cast/assertion type statements that help the type inferences: def f(arr): if arr is None: arr = ... arr = int[:](arr) # equivalent to "cdef int[:] arr = arr", but # acts as statement, with a specific point # for the none-check ... or even: def f(arr): if arr is None: return 'foo' else: arr = int[:](arr) # takes effect *here*, does none-check ... # arr still typed as int[:] here If we can make this work well enough with control flow analysis I'd never cdef declare local vars again :-) Dag From d.s.seljebotn at astro.uio.no Fri Feb 3 18:59:25 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 03 Feb 2012 18:59:25 +0100 Subject: [Cython] memoryview slices can't be None? In-Reply-To: <4F2C1EFE.4050904@astro.uio.no> References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> <4F2C1EFE.4050904@astro.uio.no> Message-ID: <4F2C207D.3050100@astro.uio.no> On 02/03/2012 06:53 PM, Dag Sverre Seljebotn wrote: > On 02/03/2012 12:09 AM, mark florisson wrote: >> On 2 February 2012 21:38, Dag Sverre Seljebotn >> wrote: >>> On 02/02/2012 10:16 PM, mark florisson wrote: >>>> >>>> On 2 February 2012 12:19, Dag Sverre Seljebotn >>>> wrote: >>>>> >>>>> I just realized that >>>>> >>>>> cdef int[:] a = None >>>>> >>>>> raises an exception; even though I'd argue that 'a' is of the >>>>> "reference" >>>>> kind of type where Cython usually allow None (i.e., "cdef MyClass b = >>>>> None" >>>>> is allowed even if type(None) is NoneType). Is this a bug or not, >>>>> and is >>>>> it >>>>> possible to do something about it? >>>>> >>>>> Dag Sverre >>>>> _______________________________________________ >>>>> cython-devel mailing list >>>>> cython-devel at python.org >>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>> >>>> >>>> Yeah I disabled that quite early. It was supposed to be working but >>>> gave a lot of trouble in cases (segfaults, mainly). At the time I was >>>> trying to get rid of all the segfaults and get the basic functionality >>>> working, so I disabled it. Personally, I have never liked how things >>> >>> >>> Well, you can segfault quite easily with >>> >>> cdef MyClass a = None >>> print a.field >>> >>> so it doesn't make sense to slices different from cdef classes IMO. >>> >>> >>>> can be None unchecked. I personally prefer to write >>>> >>>> cdef foo(obj=None): >>>> cdef int[:] a >>>> if obj is None: >>>> obj = ... >>>> a = obj >>>> >>>> Often you forget to write 'not None' when declaring the parameter (and >>>> apparently that it only allowed for 'def' functions). >>>> >>>> As such, I never bothered to re-enable it. However, it does support >>>> control flow with uninitialized slices, and will raise an error if it >>>> is uninitialized. Do we want this behaviour (e.g. for consistency)? >>> >>> >>> When in doubt, go for consistency. So +1 for that reason. I do >>> believe that >>> setting stuff to None is rather vital in Python. >>> >>> What I typically do is more like this: >>> >>> def f(double[:] input, double[:] out=None): >>> if out is None: >>> out = np.empty_like(input) >>> ... >>> >>> Having to use another variable name is a bit of a pain. (Come on -- >>> do you >>> use "a" in real code? What do you actually call "the other obj"? I >>> sometimes >>> end up with "out_" and so on, but it creates smelly code quite quickly.) >> >> No, it was just a contrived example. >> >>> It's easy to segfault with cdef classes anyway, so decent nonechecking >>> should be implemented at some point, and then memoryviews would use >>> the same >>> mechanisms. Java has decent null-checking... >>> >> >> The problem with none checking is that it has to occur at every point. > > Well, using control flow analysis etc. it doesn't really. E.g., > > for i in range(a.shape[0]): > print i > a[i] *= 3 > > can be unrolled and none-checks inserted as > > print 0 > if a is None: raise .... > a[0] *= 3 > for i in range(1, a.shape[0]): > print i > a[i] *= 3 # no need for none-check > > It's very similar to what you'd want to do to pull boundschecking out of > the loop... > >> With initialized slices the control flow knows when the slices are >> initialized, or when they might not be (and it can raise a >> compile-time or runtime error, instead of a segfault if you're lucky). >> I'm fine with implementing the behaviour, I just always left it at the >> bottom of my todo list. > > Wasn't saying you should do it, just checking. > > I'm still not sure about this. I think what I'd really like is > > a) Stop cdef classes from being None as well I'm sorry, this was out of place and hardly constructive. I beg you all to ignore the above sentence. The below proposal still works for slices only though, which we can still define the semantics for. I rather like it. Is type-inference + control flow analysis powerful enough to pull this off with some minor changes? > > b) Sort-of deprecate cdef in favor of cast/assertion type statements > that help the type inferences: > > def f(arr): > if arr is None: > arr = ... > arr = int[:](arr) # equivalent to "cdef int[:] arr = arr", but > # acts as statement, with a specific point > # for the none-check > ... > > or even: > > def f(arr): > if arr is None: > return 'foo' > else: > arr = int[:](arr) # takes effect *here*, does none-check > ... > # arr still typed as int[:] here > > If we can make this work well enough with control flow analysis I'd > never cdef declare local vars again :-) Dag From markflorisson88 at gmail.com Fri Feb 3 19:06:14 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 3 Feb 2012 18:06:14 +0000 Subject: [Cython] memoryview slices can't be None? In-Reply-To: <4F2C1EFE.4050904@astro.uio.no> References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> <4F2C1EFE.4050904@astro.uio.no> Message-ID: On 3 February 2012 17:53, Dag Sverre Seljebotn wrote: > On 02/03/2012 12:09 AM, mark florisson wrote: >> >> On 2 February 2012 21:38, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 02/02/2012 10:16 PM, mark florisson wrote: >>>> >>>> >>>> On 2 February 2012 12:19, Dag Sverre Seljebotn >>>> ? ?wrote: >>>>> >>>>> >>>>> I just realized that >>>>> >>>>> cdef int[:] a = None >>>>> >>>>> raises an exception; even though I'd argue that 'a' is of the >>>>> "reference" >>>>> kind of type where Cython usually allow None (i.e., "cdef MyClass b = >>>>> None" >>>>> is allowed even if type(None) is NoneType). Is this a bug or not, and >>>>> is >>>>> it >>>>> possible to do something about it? >>>>> >>>>> Dag Sverre >>>>> _______________________________________________ >>>>> cython-devel mailing list >>>>> cython-devel at python.org >>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>> >>>> >>>> >>>> Yeah I disabled that quite early. It was supposed to be working but >>>> gave a lot of trouble in cases (segfaults, mainly). At the time I was >>>> trying to get rid of all the segfaults and get the basic functionality >>>> working, so I disabled it. Personally, I have never liked how things >>> >>> >>> >>> Well, you can segfault quite easily with >>> >>> cdef MyClass a = None >>> print a.field >>> >>> so it doesn't make sense to slices different from cdef classes IMO. >>> >>> >>>> can be None unchecked. I personally prefer to write >>>> >>>> cdef foo(obj=None): >>>> ? ? cdef int[:] a >>>> ? ? if obj is None: >>>> ? ? ? ? obj = ... >>>> ? ? a = obj >>>> >>>> Often you forget to write 'not None' when declaring the parameter (and >>>> apparently that it only allowed for 'def' functions). >>>> >>>> As such, I never bothered to re-enable it. However, it does support >>>> control flow with uninitialized slices, and will raise an error if it >>>> is uninitialized. Do we want this behaviour (e.g. for consistency)? >>> >>> >>> >>> When in doubt, go for consistency. So +1 for that reason. I do believe >>> that >>> setting stuff to None is rather vital in Python. >>> >>> What I typically do is more like this: >>> >>> def f(double[:] input, double[:] out=None): >>> ? ?if out is None: >>> ? ? ? ?out = np.empty_like(input) >>> ? ?... >>> >>> Having to use another variable name is a bit of a pain. (Come on -- do >>> you >>> use "a" in real code? What do you actually call "the other obj"? I >>> sometimes >>> end up with "out_" and so on, but it creates smelly code quite quickly.) >> >> >> No, it was just a contrived example. >> >>> It's easy to segfault with cdef classes anyway, so decent nonechecking >>> should be implemented at some point, and then memoryviews would use the >>> same >>> mechanisms. Java has decent null-checking... >>> >> >> The problem with none checking is that it has to occur at every point. > > > Well, using control flow analysis etc. it doesn't really. E.g., > > for i in range(a.shape[0]): > ? ?print i > ? ?a[i] *= 3 > > can be unrolled and none-checks inserted as > > print 0 > if a is None: raise .... > a[0] *= 3 > for i in range(1, a.shape[0]): > ? ?print i > ? ?a[i] *= 3 # no need for none-check > > It's very similar to what you'd want to do to pull boundschecking out of the > loop... > Oh, definitely. Both optimizations may not always be possible to do, though. The optimization (for boundschecking) is easier for prange() than range(), as you can immediately raise an exception as the exceptional condition may be issued at any iteration. What do you do with bounds checking when some accesses are in-bound, and some are out-of-bound? Do you immediately raise the exception? Are we fine with aborting (like Fortran compilers do when you ask them for bounds checking)? And how do you detect that the code doesn't already raise an exception or break out of the loop itself to prevent the out-of-bound access? (Unless no exceptions are propagating and no break/return is used, but exceptions are so very common). >> With initialized slices the control flow knows when the slices are >> initialized, or when they might not be (and it can raise a >> compile-time or runtime error, instead of a segfault if you're lucky). >> I'm fine with implementing the behaviour, I just always left it at the >> bottom of my todo list. > > > Wasn't saying you should do it, just checking. > > I'm still not sure about this. I think what I'd really like is > > ?a) Stop cdef classes from being None as well > > ?b) Sort-of deprecate cdef in favor of cast/assertion type statements that > help the type inferences: > > def f(arr): > ? ?if arr is None: > ? ? ? ?arr = ... > ? ?arr = int[:](arr) # equivalent to "cdef int[:] arr = arr", but > ? ? ? ? ? ? ? ? ? ? ?# acts as statement, with a specific point > ? ? ? ? ? ? ? ? ? ? ?# for the none-check > ? ?... > > or even: > > def f(arr): > ? ?if arr is None: > ? ? ? ?return 'foo' > ? ?else: > ? ? ? ?arr = int[:](arr) # takes effect *here*, does none-check > ? ? ? ?... > ? ?# arr still typed as int[:] here > > If we can make this work well enough with control flow analysis I'd never > cdef declare local vars again :-) Hm, what about the following? def f(arr): if arr is None: return 'foo' cdef int[:] arr # arr may not be None > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Fri Feb 3 19:07:09 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 3 Feb 2012 18:07:09 +0000 Subject: [Cython] memoryview slices can't be None? In-Reply-To: References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> <4F2C1EFE.4050904@astro.uio.no> Message-ID: On 3 February 2012 18:06, mark florisson wrote: > On 3 February 2012 17:53, Dag Sverre Seljebotn > wrote: >> On 02/03/2012 12:09 AM, mark florisson wrote: >>> >>> On 2 February 2012 21:38, Dag Sverre Seljebotn >>> ?wrote: >>>> >>>> On 02/02/2012 10:16 PM, mark florisson wrote: >>>>> >>>>> >>>>> On 2 February 2012 12:19, Dag Sverre Seljebotn >>>>> ? ?wrote: >>>>>> >>>>>> >>>>>> I just realized that >>>>>> >>>>>> cdef int[:] a = None >>>>>> >>>>>> raises an exception; even though I'd argue that 'a' is of the >>>>>> "reference" >>>>>> kind of type where Cython usually allow None (i.e., "cdef MyClass b = >>>>>> None" >>>>>> is allowed even if type(None) is NoneType). Is this a bug or not, and >>>>>> is >>>>>> it >>>>>> possible to do something about it? >>>>>> >>>>>> Dag Sverre >>>>>> _______________________________________________ >>>>>> cython-devel mailing list >>>>>> cython-devel at python.org >>>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>>> >>>>> >>>>> >>>>> Yeah I disabled that quite early. It was supposed to be working but >>>>> gave a lot of trouble in cases (segfaults, mainly). At the time I was >>>>> trying to get rid of all the segfaults and get the basic functionality >>>>> working, so I disabled it. Personally, I have never liked how things >>>> >>>> >>>> >>>> Well, you can segfault quite easily with >>>> >>>> cdef MyClass a = None >>>> print a.field >>>> >>>> so it doesn't make sense to slices different from cdef classes IMO. >>>> >>>> >>>>> can be None unchecked. I personally prefer to write >>>>> >>>>> cdef foo(obj=None): >>>>> ? ? cdef int[:] a >>>>> ? ? if obj is None: >>>>> ? ? ? ? obj = ... >>>>> ? ? a = obj >>>>> >>>>> Often you forget to write 'not None' when declaring the parameter (and >>>>> apparently that it only allowed for 'def' functions). >>>>> >>>>> As such, I never bothered to re-enable it. However, it does support >>>>> control flow with uninitialized slices, and will raise an error if it >>>>> is uninitialized. Do we want this behaviour (e.g. for consistency)? >>>> >>>> >>>> >>>> When in doubt, go for consistency. So +1 for that reason. I do believe >>>> that >>>> setting stuff to None is rather vital in Python. >>>> >>>> What I typically do is more like this: >>>> >>>> def f(double[:] input, double[:] out=None): >>>> ? ?if out is None: >>>> ? ? ? ?out = np.empty_like(input) >>>> ? ?... >>>> >>>> Having to use another variable name is a bit of a pain. (Come on -- do >>>> you >>>> use "a" in real code? What do you actually call "the other obj"? I >>>> sometimes >>>> end up with "out_" and so on, but it creates smelly code quite quickly.) >>> >>> >>> No, it was just a contrived example. >>> >>>> It's easy to segfault with cdef classes anyway, so decent nonechecking >>>> should be implemented at some point, and then memoryviews would use the >>>> same >>>> mechanisms. Java has decent null-checking... >>>> >>> >>> The problem with none checking is that it has to occur at every point. >> >> >> Well, using control flow analysis etc. it doesn't really. E.g., >> >> for i in range(a.shape[0]): >> ? ?print i >> ? ?a[i] *= 3 >> >> can be unrolled and none-checks inserted as >> >> print 0 >> if a is None: raise .... >> a[0] *= 3 >> for i in range(1, a.shape[0]): >> ? ?print i >> ? ?a[i] *= 3 # no need for none-check >> >> It's very similar to what you'd want to do to pull boundschecking out of the >> loop... >> > > Oh, definitely. Both optimizations may not always be possible to do, > though. The optimization (for boundschecking) is easier for prange() > than range(), as you can immediately raise an exception as the > exceptional condition may be issued at any iteration. ?What do you do > with bounds checking when some accesses are in-bound, and some are > out-of-bound? Do you immediately raise the exception? Are we fine with > aborting (like Fortran compilers do when you ask them for bounds > checking)? And how do you detect that the code doesn't already raise > an exception or break out of the loop itself to prevent the > out-of-bound access? (Unless no exceptions are propagating and no > break/return is used, but exceptions are so very common). > >>> With initialized slices the control flow knows when the slices are >>> initialized, or when they might not be (and it can raise a >>> compile-time or runtime error, instead of a segfault if you're lucky). >>> I'm fine with implementing the behaviour, I just always left it at the >>> bottom of my todo list. >> >> >> Wasn't saying you should do it, just checking. >> >> I'm still not sure about this. I think what I'd really like is >> >> ?a) Stop cdef classes from being None as well >> >> ?b) Sort-of deprecate cdef in favor of cast/assertion type statements that >> help the type inferences: >> >> def f(arr): >> ? ?if arr is None: >> ? ? ? ?arr = ... >> ? ?arr = int[:](arr) # equivalent to "cdef int[:] arr = arr", but >> ? ? ? ? ? ? ? ? ? ? ?# acts as statement, with a specific point >> ? ? ? ? ? ? ? ? ? ? ?# for the none-check >> ? ?... >> >> or even: >> >> def f(arr): >> ? ?if arr is None: >> ? ? ? ?return 'foo' >> ? ?else: >> ? ? ? ?arr = int[:](arr) # takes effect *here*, does none-check >> ? ? ? ?... >> ? ?# arr still typed as int[:] here >> >> If we can make this work well enough with control flow analysis I'd never >> cdef declare local vars again :-) > > Hm, what about the following? > > def f(arr): > ? ?if arr is None: > ? ? ? ?return 'foo' > > ? ?cdef int[:] arr # arr may not be None The above would work in general, until the declaration is lexically encountered, the object is typed as object. >> Dag >> >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Fri Feb 3 19:07:41 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 3 Feb 2012 18:07:41 +0000 Subject: [Cython] memoryview slices can't be None? In-Reply-To: References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> <4F2C1EFE.4050904@astro.uio.no> Message-ID: On 3 February 2012 18:07, mark florisson wrote: > On 3 February 2012 18:06, mark florisson wrote: >> On 3 February 2012 17:53, Dag Sverre Seljebotn >> wrote: >>> On 02/03/2012 12:09 AM, mark florisson wrote: >>>> >>>> On 2 February 2012 21:38, Dag Sverre Seljebotn >>>> ?wrote: >>>>> >>>>> On 02/02/2012 10:16 PM, mark florisson wrote: >>>>>> >>>>>> >>>>>> On 2 February 2012 12:19, Dag Sverre Seljebotn >>>>>> ? ?wrote: >>>>>>> >>>>>>> >>>>>>> I just realized that >>>>>>> >>>>>>> cdef int[:] a = None >>>>>>> >>>>>>> raises an exception; even though I'd argue that 'a' is of the >>>>>>> "reference" >>>>>>> kind of type where Cython usually allow None (i.e., "cdef MyClass b = >>>>>>> None" >>>>>>> is allowed even if type(None) is NoneType). Is this a bug or not, and >>>>>>> is >>>>>>> it >>>>>>> possible to do something about it? >>>>>>> >>>>>>> Dag Sverre >>>>>>> _______________________________________________ >>>>>>> cython-devel mailing list >>>>>>> cython-devel at python.org >>>>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>>>> >>>>>> >>>>>> >>>>>> Yeah I disabled that quite early. It was supposed to be working but >>>>>> gave a lot of trouble in cases (segfaults, mainly). At the time I was >>>>>> trying to get rid of all the segfaults and get the basic functionality >>>>>> working, so I disabled it. Personally, I have never liked how things >>>>> >>>>> >>>>> >>>>> Well, you can segfault quite easily with >>>>> >>>>> cdef MyClass a = None >>>>> print a.field >>>>> >>>>> so it doesn't make sense to slices different from cdef classes IMO. >>>>> >>>>> >>>>>> can be None unchecked. I personally prefer to write >>>>>> >>>>>> cdef foo(obj=None): >>>>>> ? ? cdef int[:] a >>>>>> ? ? if obj is None: >>>>>> ? ? ? ? obj = ... >>>>>> ? ? a = obj >>>>>> >>>>>> Often you forget to write 'not None' when declaring the parameter (and >>>>>> apparently that it only allowed for 'def' functions). >>>>>> >>>>>> As such, I never bothered to re-enable it. However, it does support >>>>>> control flow with uninitialized slices, and will raise an error if it >>>>>> is uninitialized. Do we want this behaviour (e.g. for consistency)? >>>>> >>>>> >>>>> >>>>> When in doubt, go for consistency. So +1 for that reason. I do believe >>>>> that >>>>> setting stuff to None is rather vital in Python. >>>>> >>>>> What I typically do is more like this: >>>>> >>>>> def f(double[:] input, double[:] out=None): >>>>> ? ?if out is None: >>>>> ? ? ? ?out = np.empty_like(input) >>>>> ? ?... >>>>> >>>>> Having to use another variable name is a bit of a pain. (Come on -- do >>>>> you >>>>> use "a" in real code? What do you actually call "the other obj"? I >>>>> sometimes >>>>> end up with "out_" and so on, but it creates smelly code quite quickly.) >>>> >>>> >>>> No, it was just a contrived example. >>>> >>>>> It's easy to segfault with cdef classes anyway, so decent nonechecking >>>>> should be implemented at some point, and then memoryviews would use the >>>>> same >>>>> mechanisms. Java has decent null-checking... >>>>> >>>> >>>> The problem with none checking is that it has to occur at every point. >>> >>> >>> Well, using control flow analysis etc. it doesn't really. E.g., >>> >>> for i in range(a.shape[0]): >>> ? ?print i >>> ? ?a[i] *= 3 >>> >>> can be unrolled and none-checks inserted as >>> >>> print 0 >>> if a is None: raise .... >>> a[0] *= 3 >>> for i in range(1, a.shape[0]): >>> ? ?print i >>> ? ?a[i] *= 3 # no need for none-check >>> >>> It's very similar to what you'd want to do to pull boundschecking out of the >>> loop... >>> >> >> Oh, definitely. Both optimizations may not always be possible to do, >> though. The optimization (for boundschecking) is easier for prange() >> than range(), as you can immediately raise an exception as the >> exceptional condition may be issued at any iteration. ?What do you do >> with bounds checking when some accesses are in-bound, and some are >> out-of-bound? Do you immediately raise the exception? Are we fine with >> aborting (like Fortran compilers do when you ask them for bounds >> checking)? And how do you detect that the code doesn't already raise >> an exception or break out of the loop itself to prevent the >> out-of-bound access? (Unless no exceptions are propagating and no >> break/return is used, but exceptions are so very common). >> >>>> With initialized slices the control flow knows when the slices are >>>> initialized, or when they might not be (and it can raise a >>>> compile-time or runtime error, instead of a segfault if you're lucky). >>>> I'm fine with implementing the behaviour, I just always left it at the >>>> bottom of my todo list. >>> >>> >>> Wasn't saying you should do it, just checking. >>> >>> I'm still not sure about this. I think what I'd really like is >>> >>> ?a) Stop cdef classes from being None as well >>> >>> ?b) Sort-of deprecate cdef in favor of cast/assertion type statements that >>> help the type inferences: >>> >>> def f(arr): >>> ? ?if arr is None: >>> ? ? ? ?arr = ... >>> ? ?arr = int[:](arr) # equivalent to "cdef int[:] arr = arr", but >>> ? ? ? ? ? ? ? ? ? ? ?# acts as statement, with a specific point >>> ? ? ? ? ? ? ? ? ? ? ?# for the none-check >>> ? ?... >>> >>> or even: >>> >>> def f(arr): >>> ? ?if arr is None: >>> ? ? ? ?return 'foo' >>> ? ?else: >>> ? ? ? ?arr = int[:](arr) # takes effect *here*, does none-check >>> ? ? ? ?... >>> ? ?# arr still typed as int[:] here >>> >>> If we can make this work well enough with control flow analysis I'd never >>> cdef declare local vars again :-) >> >> Hm, what about the following? >> >> def f(arr): >> ? ?if arr is None: >> ? ? ? ?return 'foo' >> >> ? ?cdef int[:] arr # arr may not be None > > The above would work in general, until the declaration is lexically > encountered, the object is typed as object. Hurr, the *variable* is typed as object :) >>> Dag >>> >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Fri Feb 3 19:15:07 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 03 Feb 2012 19:15:07 +0100 Subject: [Cython] memoryview slices can't be None? In-Reply-To: References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> <4F2C1EFE.4050904@astro.uio.no> Message-ID: <4F2C242B.9010309@astro.uio.no> On 02/03/2012 07:07 PM, mark florisson wrote: > On 3 February 2012 18:06, mark florisson wrote: >> On 3 February 2012 17:53, Dag Sverre Seljebotn >> wrote: >>> On 02/03/2012 12:09 AM, mark florisson wrote: >>>> >>>> On 2 February 2012 21:38, Dag Sverre Seljebotn >>>> wrote: >>>>> >>>>> On 02/02/2012 10:16 PM, mark florisson wrote: >>>>>> >>>>>> >>>>>> On 2 February 2012 12:19, Dag Sverre Seljebotn >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> I just realized that >>>>>>> >>>>>>> cdef int[:] a = None >>>>>>> >>>>>>> raises an exception; even though I'd argue that 'a' is of the >>>>>>> "reference" >>>>>>> kind of type where Cython usually allow None (i.e., "cdef MyClass b = >>>>>>> None" >>>>>>> is allowed even if type(None) is NoneType). Is this a bug or not, and >>>>>>> is >>>>>>> it >>>>>>> possible to do something about it? >>>>>>> >>>>>>> Dag Sverre >>>>>>> _______________________________________________ >>>>>>> cython-devel mailing list >>>>>>> cython-devel at python.org >>>>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>>>> >>>>>> >>>>>> >>>>>> Yeah I disabled that quite early. It was supposed to be working but >>>>>> gave a lot of trouble in cases (segfaults, mainly). At the time I was >>>>>> trying to get rid of all the segfaults and get the basic functionality >>>>>> working, so I disabled it. Personally, I have never liked how things >>>>> >>>>> >>>>> >>>>> Well, you can segfault quite easily with >>>>> >>>>> cdef MyClass a = None >>>>> print a.field >>>>> >>>>> so it doesn't make sense to slices different from cdef classes IMO. >>>>> >>>>> >>>>>> can be None unchecked. I personally prefer to write >>>>>> >>>>>> cdef foo(obj=None): >>>>>> cdef int[:] a >>>>>> if obj is None: >>>>>> obj = ... >>>>>> a = obj >>>>>> >>>>>> Often you forget to write 'not None' when declaring the parameter (and >>>>>> apparently that it only allowed for 'def' functions). >>>>>> >>>>>> As such, I never bothered to re-enable it. However, it does support >>>>>> control flow with uninitialized slices, and will raise an error if it >>>>>> is uninitialized. Do we want this behaviour (e.g. for consistency)? >>>>> >>>>> >>>>> >>>>> When in doubt, go for consistency. So +1 for that reason. I do believe >>>>> that >>>>> setting stuff to None is rather vital in Python. >>>>> >>>>> What I typically do is more like this: >>>>> >>>>> def f(double[:] input, double[:] out=None): >>>>> if out is None: >>>>> out = np.empty_like(input) >>>>> ... >>>>> >>>>> Having to use another variable name is a bit of a pain. (Come on -- do >>>>> you >>>>> use "a" in real code? What do you actually call "the other obj"? I >>>>> sometimes >>>>> end up with "out_" and so on, but it creates smelly code quite quickly.) >>>> >>>> >>>> No, it was just a contrived example. >>>> >>>>> It's easy to segfault with cdef classes anyway, so decent nonechecking >>>>> should be implemented at some point, and then memoryviews would use the >>>>> same >>>>> mechanisms. Java has decent null-checking... >>>>> >>>> >>>> The problem with none checking is that it has to occur at every point. >>> >>> >>> Well, using control flow analysis etc. it doesn't really. E.g., >>> >>> for i in range(a.shape[0]): >>> print i >>> a[i] *= 3 >>> >>> can be unrolled and none-checks inserted as >>> >>> print 0 >>> if a is None: raise .... >>> a[0] *= 3 >>> for i in range(1, a.shape[0]): >>> print i >>> a[i] *= 3 # no need for none-check >>> >>> It's very similar to what you'd want to do to pull boundschecking out of the >>> loop... >>> >> >> Oh, definitely. Both optimizations may not always be possible to do, >> though. The optimization (for boundschecking) is easier for prange() >> than range(), as you can immediately raise an exception as the >> exceptional condition may be issued at any iteration. What do you do >> with bounds checking when some accesses are in-bound, and some are >> out-of-bound? Do you immediately raise the exception? Are we fine with >> aborting (like Fortran compilers do when you ask them for bounds >> checking)? And how do you detect that the code doesn't already raise >> an exception or break out of the loop itself to prevent the >> out-of-bound access? (Unless no exceptions are propagating and no >> break/return is used, but exceptions are so very common). >> >>>> With initialized slices the control flow knows when the slices are >>>> initialized, or when they might not be (and it can raise a >>>> compile-time or runtime error, instead of a segfault if you're lucky). >>>> I'm fine with implementing the behaviour, I just always left it at the >>>> bottom of my todo list. >>> >>> >>> Wasn't saying you should do it, just checking. >>> >>> I'm still not sure about this. I think what I'd really like is >>> >>> a) Stop cdef classes from being None as well >>> >>> b) Sort-of deprecate cdef in favor of cast/assertion type statements that >>> help the type inferences: >>> >>> def f(arr): >>> if arr is None: >>> arr = ... >>> arr = int[:](arr) # equivalent to "cdef int[:] arr = arr", but >>> # acts as statement, with a specific point >>> # for the none-check >>> ... >>> >>> or even: >>> >>> def f(arr): >>> if arr is None: >>> return 'foo' >>> else: >>> arr = int[:](arr) # takes effect *here*, does none-check >>> ... >>> # arr still typed as int[:] here >>> >>> If we can make this work well enough with control flow analysis I'd never >>> cdef declare local vars again :-) >> >> Hm, what about the following? >> >> def f(arr): >> if arr is None: >> return 'foo' >> >> cdef int[:] arr # arr may not be None > > The above would work in general, until the declaration is lexically > encountered, the object is typed as object. This was actually going to be my first proposal :-) That would finally define how "cdef" inside of if-statements etc. behave too (simply use control flow analysis and treat it like a statement). But I like int[:] as a way of making it pure Python syntax compatible as well. Perhaps the two are orthogonal -- a) make variable declaration a statement, b) make cython.int[:](x) do, essentially, a cdef declaration, for Python compatability. Dag From markflorisson88 at gmail.com Fri Feb 3 19:26:48 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 3 Feb 2012 18:26:48 +0000 Subject: [Cython] memoryview slices can't be None? In-Reply-To: <4F2C242B.9010309@astro.uio.no> References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> <4F2C1EFE.4050904@astro.uio.no> <4F2C242B.9010309@astro.uio.no> Message-ID: On 3 February 2012 18:15, Dag Sverre Seljebotn wrote: > On 02/03/2012 07:07 PM, mark florisson wrote: >> >> On 3 February 2012 18:06, mark florisson >> ?wrote: >>> >>> On 3 February 2012 17:53, Dag Sverre Seljebotn >>> ?wrote: >>>> >>>> On 02/03/2012 12:09 AM, mark florisson wrote: >>>>> >>>>> >>>>> On 2 February 2012 21:38, Dag Sverre Seljebotn >>>>> ? ?wrote: >>>>>> >>>>>> >>>>>> On 02/02/2012 10:16 PM, mark florisson wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 2 February 2012 12:19, Dag Sverre Seljebotn >>>>>>> ? ? ?wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I just realized that >>>>>>>> >>>>>>>> cdef int[:] a = None >>>>>>>> >>>>>>>> raises an exception; even though I'd argue that 'a' is of the >>>>>>>> "reference" >>>>>>>> kind of type where Cython usually allow None (i.e., "cdef MyClass b >>>>>>>> = >>>>>>>> None" >>>>>>>> is allowed even if type(None) is NoneType). Is this a bug or not, >>>>>>>> and >>>>>>>> is >>>>>>>> it >>>>>>>> possible to do something about it? >>>>>>>> >>>>>>>> Dag Sverre >>>>>>>> _______________________________________________ >>>>>>>> cython-devel mailing list >>>>>>>> cython-devel at python.org >>>>>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Yeah I disabled that quite early. It was supposed to be working but >>>>>>> gave a lot of trouble in cases (segfaults, mainly). At the time I was >>>>>>> trying to get rid of all the segfaults and get the basic >>>>>>> functionality >>>>>>> working, so I disabled it. Personally, I have never liked how things >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Well, you can segfault quite easily with >>>>>> >>>>>> cdef MyClass a = None >>>>>> print a.field >>>>>> >>>>>> so it doesn't make sense to slices different from cdef classes IMO. >>>>>> >>>>>> >>>>>>> can be None unchecked. I personally prefer to write >>>>>>> >>>>>>> cdef foo(obj=None): >>>>>>> ? ? cdef int[:] a >>>>>>> ? ? if obj is None: >>>>>>> ? ? ? ? obj = ... >>>>>>> ? ? a = obj >>>>>>> >>>>>>> Often you forget to write 'not None' when declaring the parameter >>>>>>> (and >>>>>>> apparently that it only allowed for 'def' functions). >>>>>>> >>>>>>> As such, I never bothered to re-enable it. However, it does support >>>>>>> control flow with uninitialized slices, and will raise an error if it >>>>>>> is uninitialized. Do we want this behaviour (e.g. for consistency)? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> When in doubt, go for consistency. So +1 for that reason. I do believe >>>>>> that >>>>>> setting stuff to None is rather vital in Python. >>>>>> >>>>>> What I typically do is more like this: >>>>>> >>>>>> def f(double[:] input, double[:] out=None): >>>>>> ? ?if out is None: >>>>>> ? ? ? ?out = np.empty_like(input) >>>>>> ? ?... >>>>>> >>>>>> Having to use another variable name is a bit of a pain. (Come on -- do >>>>>> you >>>>>> use "a" in real code? What do you actually call "the other obj"? I >>>>>> sometimes >>>>>> end up with "out_" and so on, but it creates smelly code quite >>>>>> quickly.) >>>>> >>>>> >>>>> >>>>> No, it was just a contrived example. >>>>> >>>>>> It's easy to segfault with cdef classes anyway, so decent nonechecking >>>>>> should be implemented at some point, and then memoryviews would use >>>>>> the >>>>>> same >>>>>> mechanisms. Java has decent null-checking... >>>>>> >>>>> >>>>> The problem with none checking is that it has to occur at every point. >>>> >>>> >>>> >>>> Well, using control flow analysis etc. it doesn't really. E.g., >>>> >>>> for i in range(a.shape[0]): >>>> ? ?print i >>>> ? ?a[i] *= 3 >>>> >>>> can be unrolled and none-checks inserted as >>>> >>>> print 0 >>>> if a is None: raise .... >>>> a[0] *= 3 >>>> for i in range(1, a.shape[0]): >>>> ? ?print i >>>> ? ?a[i] *= 3 # no need for none-check >>>> >>>> It's very similar to what you'd want to do to pull boundschecking out of >>>> the >>>> loop... >>>> >>> >>> Oh, definitely. Both optimizations may not always be possible to do, >>> though. The optimization (for boundschecking) is easier for prange() >>> than range(), as you can immediately raise an exception as the >>> exceptional condition may be issued at any iteration. ?What do you do >>> with bounds checking when some accesses are in-bound, and some are >>> out-of-bound? Do you immediately raise the exception? Are we fine with >>> aborting (like Fortran compilers do when you ask them for bounds >>> checking)? And how do you detect that the code doesn't already raise >>> an exception or break out of the loop itself to prevent the >>> out-of-bound access? (Unless no exceptions are propagating and no >>> break/return is used, but exceptions are so very common). >>> >>>>> With initialized slices the control flow knows when the slices are >>>>> initialized, or when they might not be (and it can raise a >>>>> compile-time or runtime error, instead of a segfault if you're lucky). >>>>> I'm fine with implementing the behaviour, I just always left it at the >>>>> bottom of my todo list. >>>> >>>> >>>> >>>> Wasn't saying you should do it, just checking. >>>> >>>> I'm still not sure about this. I think what I'd really like is >>>> >>>> ?a) Stop cdef classes from being None as well >>>> >>>> ?b) Sort-of deprecate cdef in favor of cast/assertion type statements >>>> that >>>> help the type inferences: >>>> >>>> def f(arr): >>>> ? ?if arr is None: >>>> ? ? ? ?arr = ... >>>> ? ?arr = int[:](arr) # equivalent to "cdef int[:] arr = arr", but >>>> ? ? ? ? ? ? ? ? ? ? ?# acts as statement, with a specific point >>>> ? ? ? ? ? ? ? ? ? ? ?# for the none-check >>>> ? ?... >>>> >>>> or even: >>>> >>>> def f(arr): >>>> ? ?if arr is None: >>>> ? ? ? ?return 'foo' >>>> ? ?else: >>>> ? ? ? ?arr = int[:](arr) # takes effect *here*, does none-check >>>> ? ? ? ?... >>>> ? ?# arr still typed as int[:] here >>>> >>>> If we can make this work well enough with control flow analysis I'd >>>> never >>>> cdef declare local vars again :-) >>> >>> >>> Hm, what about the following? >>> >>> def f(arr): >>> ? ?if arr is None: >>> ? ? ? ?return 'foo' >>> >>> ? ?cdef int[:] arr # arr may not be None >> >> >> The above would work in general, until the declaration is lexically >> encountered, the object is typed as object. > > > This was actually going to be my first proposal :-) That would finally > define how "cdef" inside of if-statements etc. behave too (simply use > control flow analysis and treat it like a statement). Block-local declarations are definitely something we want, although I think it would require some more (non-trivial) changes to the compiler. Maybe the cleanup code from functions, as well as the temp handling etc could be re-factored to a BlockNode, that all block nodes could subclass. They'd have to instantiate new symbol table environments as well. I'm not yet entirely sure what else would be involved in the implementation of that. > But I like int[:] as a way of making it pure Python syntax compatible as > well. Perhaps the two are orthogonal -- a) make variable declaration a > statement, b) make cython.int[:](x) do, essentially, a cdef declaration, for > Python compatability. > Don't we have cython.declare() for that? e.g. arr = cython.declare(cython.int[:]) That would also be treated as a statement like normal declarations (if and when implemented). > Dag > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From vitja.makarov at gmail.com Fri Feb 3 21:27:21 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sat, 4 Feb 2012 00:27:21 +0400 Subject: [Cython] Bug in Cython producing incorrect C code In-Reply-To: References: <1327405070.15017.140661027320813@webmail.messagingengine.com> <4F1FB21A.9080407@behnel.de> <4F21A112.30803@behnel.de> <4F21A92F.6050401@behnel.de> Message-ID: 2012/1/26 mark florisson : > On 26 January 2012 19:27, Stefan Behnel wrote: >> mark florisson, 26.01.2012 20:15: >>> On 26 January 2012 18:53, Stefan Behnel wrote: >>>> mark florisson, 26.01.2012 16:20: >>>>> I think this problem can trivially be solved by creating a ProxyNode >>>>> that should never be replaced by any transform, but it's argument may >>>>> be replaced. So you wrap self.rhs in a ProxyNode and use that to >>>>> create your CloneNodes. >>>> >>>> I can't see what a ProxyNode would do that a CloneNode shouldn't do anyway. >>> >>> It wouldn't be a replacement, merely an addition (an extra indirection). >> >> What I was trying to say was that a ProxyNode would always be required by a >> CloneNode, but I don't see where a ProxyNode would be needed outside of a >> CloneNode. So it seems rather redundant and I don't know if we need a >> separate node for it. > > Yes it would be needed only for that, but I think the only real > alternative is to not use CloneNode at all, i.e. make the > transformation Dag mentioned, where you create new rhs (NameNode?) > references to the temporary result. > Now it seems to be the only case when we got problem like this. It means that clones may be safely created at very late stage. So transforming CascadeAssignment into SingleAssignments doesn't solve generic problem. I tried to implement conditional inlining the same problem may happen there (ConditionalCallNode owns arguments and replaces SimpleCallNode's args with clones). Splitting analyse_expressions() would help. On the other hand moving this optimization after OptimizeBuiltinCalls() would help too. -- vitja. From markflorisson88 at gmail.com Fri Feb 3 22:59:03 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 3 Feb 2012 21:59:03 +0000 Subject: [Cython] memoryview slices can't be None? In-Reply-To: References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> <4F2C1EFE.4050904@astro.uio.no> Message-ID: On 3 February 2012 18:06, mark florisson wrote: > On 3 February 2012 17:53, Dag Sverre Seljebotn > wrote: >> On 02/03/2012 12:09 AM, mark florisson wrote: >>> >>> On 2 February 2012 21:38, Dag Sverre Seljebotn >>> ?wrote: >>>> >>>> On 02/02/2012 10:16 PM, mark florisson wrote: >>>>> >>>>> >>>>> On 2 February 2012 12:19, Dag Sverre Seljebotn >>>>> ? ?wrote: >>>>>> >>>>>> >>>>>> I just realized that >>>>>> >>>>>> cdef int[:] a = None >>>>>> >>>>>> raises an exception; even though I'd argue that 'a' is of the >>>>>> "reference" >>>>>> kind of type where Cython usually allow None (i.e., "cdef MyClass b = >>>>>> None" >>>>>> is allowed even if type(None) is NoneType). Is this a bug or not, and >>>>>> is >>>>>> it >>>>>> possible to do something about it? >>>>>> >>>>>> Dag Sverre >>>>>> _______________________________________________ >>>>>> cython-devel mailing list >>>>>> cython-devel at python.org >>>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>>> >>>>> >>>>> >>>>> Yeah I disabled that quite early. It was supposed to be working but >>>>> gave a lot of trouble in cases (segfaults, mainly). At the time I was >>>>> trying to get rid of all the segfaults and get the basic functionality >>>>> working, so I disabled it. Personally, I have never liked how things >>>> >>>> >>>> >>>> Well, you can segfault quite easily with >>>> >>>> cdef MyClass a = None >>>> print a.field >>>> >>>> so it doesn't make sense to slices different from cdef classes IMO. >>>> >>>> >>>>> can be None unchecked. I personally prefer to write >>>>> >>>>> cdef foo(obj=None): >>>>> ? ? cdef int[:] a >>>>> ? ? if obj is None: >>>>> ? ? ? ? obj = ... >>>>> ? ? a = obj >>>>> >>>>> Often you forget to write 'not None' when declaring the parameter (and >>>>> apparently that it only allowed for 'def' functions). >>>>> >>>>> As such, I never bothered to re-enable it. However, it does support >>>>> control flow with uninitialized slices, and will raise an error if it >>>>> is uninitialized. Do we want this behaviour (e.g. for consistency)? >>>> >>>> >>>> >>>> When in doubt, go for consistency. So +1 for that reason. I do believe >>>> that >>>> setting stuff to None is rather vital in Python. >>>> >>>> What I typically do is more like this: >>>> >>>> def f(double[:] input, double[:] out=None): >>>> ? ?if out is None: >>>> ? ? ? ?out = np.empty_like(input) >>>> ? ?... >>>> >>>> Having to use another variable name is a bit of a pain. (Come on -- do >>>> you >>>> use "a" in real code? What do you actually call "the other obj"? I >>>> sometimes >>>> end up with "out_" and so on, but it creates smelly code quite quickly.) >>> >>> >>> No, it was just a contrived example. >>> >>>> It's easy to segfault with cdef classes anyway, so decent nonechecking >>>> should be implemented at some point, and then memoryviews would use the >>>> same >>>> mechanisms. Java has decent null-checking... >>>> >>> >>> The problem with none checking is that it has to occur at every point. >> >> >> Well, using control flow analysis etc. it doesn't really. E.g., >> >> for i in range(a.shape[0]): >> ? ?print i >> ? ?a[i] *= 3 >> >> can be unrolled and none-checks inserted as >> >> print 0 >> if a is None: raise .... >> a[0] *= 3 >> for i in range(1, a.shape[0]): >> ? ?print i >> ? ?a[i] *= 3 # no need for none-check >> >> It's very similar to what you'd want to do to pull boundschecking out of the >> loop... >> > > Oh, definitely. Both optimizations may not always be possible to do, > though. The optimization (for boundschecking) is easier for prange() > than range(), as you can immediately raise an exception as the > exceptional condition may be issued at any iteration. ?What do you do > with bounds checking when some accesses are in-bound, and some are > out-of-bound? Do you immediately raise the exception? Are we fine with > aborting (like Fortran compilers do when you ask them for bounds > checking)? And how do you detect that the code doesn't already raise > an exception or break out of the loop itself to prevent the > out-of-bound access? (Unless no exceptions are propagating and no > break/return is used, but exceptions are so very common). I enabled bound checking in nogil contexts: https://github.com/markflorisson88/cython/commit/73c6b0ea8e7e1c243e87b3966ade834b02664a4f . It's not optimized yet, but at least it doesn't force users to use boundscheck(False), it just hints that it would be faster to disable the bounds checking. When we actually start optimizing these things (e.g. moving it outside loops etc), it might also be useful to consider inlining functions at the Cython level (otherwise optimizations cannot escape the function). >>> With initialized slices the control flow knows when the slices are >>> initialized, or when they might not be (and it can raise a >>> compile-time or runtime error, instead of a segfault if you're lucky). >>> I'm fine with implementing the behaviour, I just always left it at the >>> bottom of my todo list. >> >> >> Wasn't saying you should do it, just checking. >> >> I'm still not sure about this. I think what I'd really like is >> >> ?a) Stop cdef classes from being None as well >> >> ?b) Sort-of deprecate cdef in favor of cast/assertion type statements that >> help the type inferences: >> >> def f(arr): >> ? ?if arr is None: >> ? ? ? ?arr = ... >> ? ?arr = int[:](arr) # equivalent to "cdef int[:] arr = arr", but >> ? ? ? ? ? ? ? ? ? ? ?# acts as statement, with a specific point >> ? ? ? ? ? ? ? ? ? ? ?# for the none-check >> ? ?... >> >> or even: >> >> def f(arr): >> ? ?if arr is None: >> ? ? ? ?return 'foo' >> ? ?else: >> ? ? ? ?arr = int[:](arr) # takes effect *here*, does none-check >> ? ? ? ?... >> ? ?# arr still typed as int[:] here >> >> If we can make this work well enough with control flow analysis I'd never >> cdef declare local vars again :-) > > Hm, what about the following? > > def f(arr): > ? ?if arr is None: > ? ? ? ?return 'foo' > > ? ?cdef int[:] arr # arr may not be None > >> Dag >> >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel From vitja.makarov at gmail.com Sat Feb 4 19:49:26 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sat, 4 Feb 2012 22:49:26 +0400 Subject: [Cython] 0.16 release In-Reply-To: References: <4F1FEEEE.2060605@behnel.de>

<4F2083B0.9020209@creativetrax.com> Message-ID: 2012/1/31 Robert Bradshaw : > On Sat, Jan 28, 2012 at 8:05 AM, Vitja Makarov wrote: >> 2012/1/26 Jason Grout : >>> On 1/25/12 11:39 AM, Robert Bradshaw wrote: >>>> >>>> install >>>> >>>> https://sage.math.washington.edu:8091/hudson/view/ext-libs/job/sage-build/lastSuccessfulBuild/artifact/cython-devel.spkg >>>> by downloading it and running "sage -i cython-devel.spkg" >>> >>> >>> >>> In fact, you could just do >>> >>> sage -i >>> https://sage.math.washington.edu:8091/hudson/view/ext-libs/job/sage-build/lastSuccessfulBuild/artifact/cython-devel.spkg >>> >>> and Sage will (at least, should) download it for you, so that's even one >>> less step! >>> >>> Jason >>> >> >> Thanks for detailed instruction! I've successfully built it. >> >> "sage -t -gdb ./...." doesn't work, is that a bug? >> >> vitja at mchome:~/Downloads/sage-4.8$ ./sage ?-t -gdb >> devel/sage/sage/combinat/sf/macdonald.py >> sage -t -gdb "devel/sage/sage/combinat/sf/macdonald.py" >> ******************************************************************************** >> Type r at the (gdb) prompt to run the doctests. >> Type bt if there is a crash to see a traceback. >> ******************************************************************************** >> gdb --args python /home/vitja/.sage//tmp/macdonald_6182.py >> starting cmd gdb --args python /home/vitja/.sage//tmp/macdonald_6182.py >> ImportError: No module named site >> ? ? ? ? [0.2 s] >> >> ---------------------------------------------------------------------- >> The following tests failed: >> >> >> ? ? ? ?sage -t -gdb "devel/sage/sage/combinat/sf/macdonald.py"release >> Total time for all tests: 0.2 seconds > > Yes, that's a bug. > >> I've found another way to run tests (using sage -sh and then direct >> python ~/.sage/tmp/...py) >> >> So I found one of the problems. Here is minimal cython example: >> >> def foo(values): >> ? ?return (0,)*len(values) >> foo([1,2,3]) >> >> len(values) somehow is passed as an integer to PyObject_Multiply() > > Yeah, that's a bug too :). I've fixed tuple mult_factor bug here: https://github.com/cython/cython/commit/2d4b85dbcef885fbdaf6a3b2daef7a017184a56f No more segfaults in sage-tests but still 7 errors. -- vitja. From d.s.seljebotn at astro.uio.no Sat Feb 4 20:39:29 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 04 Feb 2012 20:39:29 +0100 Subject: [Cython] memoryview slices can't be None? In-Reply-To: References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> <4F2C1EFE.4050904@astro.uio.no> <4F2C242B.9010309@astro.uio.no> Message-ID: <4F2D8971.1000102@astro.uio.no> On 02/03/2012 07:26 PM, mark florisson wrote: > On 3 February 2012 18:15, Dag Sverre Seljebotn > wrote: >> On 02/03/2012 07:07 PM, mark florisson wrote: >>> >>> On 3 February 2012 18:06, mark florisson >>> wrote: >>>> >>>> On 3 February 2012 17:53, Dag Sverre Seljebotn >>>> wrote: >>>>> >>>>> On 02/03/2012 12:09 AM, mark florisson wrote: >>>>>> >>>>>> >>>>>> On 2 February 2012 21:38, Dag Sverre Seljebotn >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On 02/02/2012 10:16 PM, mark florisson wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 2 February 2012 12:19, Dag Sverre Seljebotn >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I just realized that >>>>>>>>> >>>>>>>>> cdef int[:] a = None >>>>>>>>> >>>>>>>>> raises an exception; even though I'd argue that 'a' is of the >>>>>>>>> "reference" >>>>>>>>> kind of type where Cython usually allow None (i.e., "cdef MyClass b >>>>>>>>> = >>>>>>>>> None" >>>>>>>>> is allowed even if type(None) is NoneType). Is this a bug or not, >>>>>>>>> and >>>>>>>>> is >>>>>>>>> it >>>>>>>>> possible to do something about it? >>>>>>>>> >>>>>>>>> Dag Sverre >>>>>>>>> _______________________________________________ >>>>>>>>> cython-devel mailing list >>>>>>>>> cython-devel at python.org >>>>>>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Yeah I disabled that quite early. It was supposed to be working but >>>>>>>> gave a lot of trouble in cases (segfaults, mainly). At the time I was >>>>>>>> trying to get rid of all the segfaults and get the basic >>>>>>>> functionality >>>>>>>> working, so I disabled it. Personally, I have never liked how things >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Well, you can segfault quite easily with >>>>>>> >>>>>>> cdef MyClass a = None >>>>>>> print a.field >>>>>>> >>>>>>> so it doesn't make sense to slices different from cdef classes IMO. >>>>>>> >>>>>>> >>>>>>>> can be None unchecked. I personally prefer to write >>>>>>>> >>>>>>>> cdef foo(obj=None): >>>>>>>> cdef int[:] a >>>>>>>> if obj is None: >>>>>>>> obj = ... >>>>>>>> a = obj >>>>>>>> >>>>>>>> Often you forget to write 'not None' when declaring the parameter >>>>>>>> (and >>>>>>>> apparently that it only allowed for 'def' functions). >>>>>>>> >>>>>>>> As such, I never bothered to re-enable it. However, it does support >>>>>>>> control flow with uninitialized slices, and will raise an error if it >>>>>>>> is uninitialized. Do we want this behaviour (e.g. for consistency)? >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> When in doubt, go for consistency. So +1 for that reason. I do believe >>>>>>> that >>>>>>> setting stuff to None is rather vital in Python. >>>>>>> >>>>>>> What I typically do is more like this: >>>>>>> >>>>>>> def f(double[:] input, double[:] out=None): >>>>>>> if out is None: >>>>>>> out = np.empty_like(input) >>>>>>> ... >>>>>>> >>>>>>> Having to use another variable name is a bit of a pain. (Come on -- do >>>>>>> you >>>>>>> use "a" in real code? What do you actually call "the other obj"? I >>>>>>> sometimes >>>>>>> end up with "out_" and so on, but it creates smelly code quite >>>>>>> quickly.) >>>>>> >>>>>> >>>>>> >>>>>> No, it was just a contrived example. >>>>>> >>>>>>> It's easy to segfault with cdef classes anyway, so decent nonechecking >>>>>>> should be implemented at some point, and then memoryviews would use >>>>>>> the >>>>>>> same >>>>>>> mechanisms. Java has decent null-checking... >>>>>>> >>>>>> >>>>>> The problem with none checking is that it has to occur at every point. >>>>> >>>>> >>>>> >>>>> Well, using control flow analysis etc. it doesn't really. E.g., >>>>> >>>>> for i in range(a.shape[0]): >>>>> print i >>>>> a[i] *= 3 >>>>> >>>>> can be unrolled and none-checks inserted as >>>>> >>>>> print 0 >>>>> if a is None: raise .... >>>>> a[0] *= 3 >>>>> for i in range(1, a.shape[0]): >>>>> print i >>>>> a[i] *= 3 # no need for none-check >>>>> >>>>> It's very similar to what you'd want to do to pull boundschecking out of >>>>> the >>>>> loop... >>>>> >>>> >>>> Oh, definitely. Both optimizations may not always be possible to do, >>>> though. The optimization (for boundschecking) is easier for prange() >>>> than range(), as you can immediately raise an exception as the >>>> exceptional condition may be issued at any iteration. What do you do >>>> with bounds checking when some accesses are in-bound, and some are >>>> out-of-bound? Do you immediately raise the exception? Are we fine with >>>> aborting (like Fortran compilers do when you ask them for bounds >>>> checking)? And how do you detect that the code doesn't already raise >>>> an exception or break out of the loop itself to prevent the >>>> out-of-bound access? (Unless no exceptions are propagating and no >>>> break/return is used, but exceptions are so very common). >>>> >>>>>> With initialized slices the control flow knows when the slices are >>>>>> initialized, or when they might not be (and it can raise a >>>>>> compile-time or runtime error, instead of a segfault if you're lucky). >>>>>> I'm fine with implementing the behaviour, I just always left it at the >>>>>> bottom of my todo list. >>>>> >>>>> >>>>> >>>>> Wasn't saying you should do it, just checking. >>>>> >>>>> I'm still not sure about this. I think what I'd really like is >>>>> >>>>> a) Stop cdef classes from being None as well >>>>> >>>>> b) Sort-of deprecate cdef in favor of cast/assertion type statements >>>>> that >>>>> help the type inferences: >>>>> >>>>> def f(arr): >>>>> if arr is None: >>>>> arr = ... >>>>> arr = int[:](arr) # equivalent to "cdef int[:] arr = arr", but >>>>> # acts as statement, with a specific point >>>>> # for the none-check >>>>> ... >>>>> >>>>> or even: >>>>> >>>>> def f(arr): >>>>> if arr is None: >>>>> return 'foo' >>>>> else: >>>>> arr = int[:](arr) # takes effect *here*, does none-check >>>>> ... >>>>> # arr still typed as int[:] here >>>>> >>>>> If we can make this work well enough with control flow analysis I'd >>>>> never >>>>> cdef declare local vars again :-) >>>> >>>> >>>> Hm, what about the following? >>>> >>>> def f(arr): >>>> if arr is None: >>>> return 'foo' >>>> >>>> cdef int[:] arr # arr may not be None >>> >>> >>> The above would work in general, until the declaration is lexically >>> encountered, the object is typed as object. >> >> >> This was actually going to be my first proposal :-) That would finally >> define how "cdef" inside of if-statements etc. behave too (simply use >> control flow analysis and treat it like a statement). > > Block-local declarations are definitely something we want, although I > think it would require some more (non-trivial) changes to the > compiler. Note that my proposal was actually not about block-local declarations. Block-local: { int x = 4; } /* x not available here */ My idea was much more like hints to control flow analysis. That is, I wanted to have this raise an error: x = 'adf' if foo(): cdef int x = y print x # type of x not known This is OK: if foo(): cdef int x = y else: cdef int x = 4 print x # ok, type the same anyway -- so type "escapes" block And I would allow cdef str x = y if foo: cdef int x = int(x) return g(x) # x must be int print x # x must be str at this point The reason for this madness is simply that control statements do NOT create blocks in Python, and making it so in Cython is just confusing. It would bring too much of C into the language for my taste. I think that in my Cython-utopia, Symtab.py is only responsible for resolving the scope of *names*, and types of things are not bound to blocks, just to the state at control flow points. Of course, implementing this would be a nightmare. > Maybe the cleanup code from functions, as well as the temp handling > etc could be re-factored to a BlockNode, that all block nodes could > subclass. They'd have to instantiate new symbol table environments as > well. I'm not yet entirely sure what else would be involved in the > implementation of that. > >> But I like int[:] as a way of making it pure Python syntax compatible as >> well. Perhaps the two are orthogonal -- a) make variable declaration a >> statement, b) make cython.int[:](x) do, essentially, a cdef declaration, for >> Python compatability. >> > > Don't we have cython.declare() for that? e.g. > > arr = cython.declare(cython.int[:]) > > That would also be treated as a statement like normal declarations (if > and when implemented). This was what I said, but it wasn't what I meant. Sorry. I'll try to explain better: 1) There's no way to have the above actually do the right thing in Python. With "arr = cython.int[:](arr)" one could actually return a NumPy or NumPy-like array that works in Python (since "arr" might not have the "shape" attribute before the conversion, all we know is that it exports the buffer interface...). 2) I don't like the fact that we overload the assignment operator to acquire a view. "cdef np.ndarray[int] x = y" is fine since if you do "x.someattr" then a NumPy subclass could provide someattr and it works fine. Acquiring a view is just something different. 3) Hence I guess I like "arr = int[:](arr)" better both for Cython and Python; at least if "arr" is always type-inferred to be int[:], even if arr was an "object" further up in the code (really, if you do "x = f(x)" at the top-level of the function, then x can just take the identity of another variable from that point on -- I don't know if the current control flow analysis and type inferences does this though?) Dag Sverre From stefan_ml at behnel.de Sat Feb 4 21:32:45 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 04 Feb 2012 21:32:45 +0100 Subject: [Cython] 0.16 release In-Reply-To: References: <4F1FEEEE.2060605@behnel.de>

<4F2083B0.9020209@creativetrax.com>

Message-ID: <4F2D95ED.4050908@behnel.de> Vitja Makarov, 04.02.2012 19:49: >> On Sat, Jan 28, 2012 at 8:05 AM, Vitja Makarov wrote: >>> So I found one of the problems. Here is minimal cython example: >>> >>> def foo(values): >>> return (0,)*len(values) >>> foo([1,2,3]) >>> >>> len(values) somehow is passed as an integer to PyObject_Multiply() > > I've fixed tuple mult_factor bug here: > > https://github.com/cython/cython/commit/2d4b85dbcef885fbdaf6a3b2daef7a017184a56f I didn't have any time to look into this, but your fix seems right. Thanks! Stefan From robertwb at math.washington.edu Sun Feb 5 11:31:58 2012 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Sun, 5 Feb 2012 02:31:58 -0800 Subject: [Cython] 0.16 release In-Reply-To: References: <4F1FEEEE.2060605@behnel.de>

<4F2083B0.9020209@creativetrax.com>

Message-ID: On Sat, Feb 4, 2012 at 10:49 AM, Vitja Makarov wrote: > 2012/1/31 Robert Bradshaw : >> On Sat, Jan 28, 2012 at 8:05 AM, Vitja Makarov wrote: >>> 2012/1/26 Jason Grout : >>>> On 1/25/12 11:39 AM, Robert Bradshaw wrote: >>>>> >>>>> install >>>>> >>>>> https://sage.math.washington.edu:8091/hudson/view/ext-libs/job/sage-build/lastSuccessfulBuild/artifact/cython-devel.spkg >>>>> by downloading it and running "sage -i cython-devel.spkg" >>>> >>>> >>>> >>>> In fact, you could just do >>>> >>>> sage -i >>>> https://sage.math.washington.edu:8091/hudson/view/ext-libs/job/sage-build/lastSuccessfulBuild/artifact/cython-devel.spkg >>>> >>>> and Sage will (at least, should) download it for you, so that's even one >>>> less step! >>>> >>>> Jason >>>> >>> >>> Thanks for detailed instruction! I've successfully built it. >>> >>> "sage -t -gdb ./...." doesn't work, is that a bug? >>> >>> vitja at mchome:~/Downloads/sage-4.8$ ./sage ?-t -gdb >>> devel/sage/sage/combinat/sf/macdonald.py >>> sage -t -gdb "devel/sage/sage/combinat/sf/macdonald.py" >>> ******************************************************************************** >>> Type r at the (gdb) prompt to run the doctests. >>> Type bt if there is a crash to see a traceback. >>> ******************************************************************************** >>> gdb --args python /home/vitja/.sage//tmp/macdonald_6182.py >>> starting cmd gdb --args python /home/vitja/.sage//tmp/macdonald_6182.py >>> ImportError: No module named site >>> ? ? ? ? [0.2 s] >>> >>> ---------------------------------------------------------------------- >>> The following tests failed: >>> >>> >>> ? ? ? ?sage -t -gdb "devel/sage/sage/combinat/sf/macdonald.py"release >>> Total time for all tests: 0.2 seconds >> >> Yes, that's a bug. >> >>> I've found another way to run tests (using sage -sh and then direct >>> python ~/.sage/tmp/...py) >>> >>> So I found one of the problems. Here is minimal cython example: >>> >>> def foo(values): >>> ? ?return (0,)*len(values) >>> foo([1,2,3]) >>> >>> len(values) somehow is passed as an integer to PyObject_Multiply() >> >> Yeah, that's a bug too :). > > I've fixed tuple mult_factor bug here: > > https://github.com/cython/cython/commit/2d4b85dbcef885fbdaf6a3b2daef7a017184a56f > > No more segfaults in sage-tests but still 7 errors. > Thanks! I've looked into the other errors and I think it boils down to our use of --disable-function-redefinition being incompatible with how decorators work. Of course that was just a hack, so I've fixed sage to build/startup without using that flag, but there's some strangeness with import order now that I haven't had time to resolve yet. From markflorisson88 at gmail.com Sun Feb 5 22:56:18 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 5 Feb 2012 21:56:18 +0000 Subject: [Cython] memoryview slices can't be None? In-Reply-To: <4F2D8971.1000102@astro.uio.no> References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> <4F2C1EFE.4050904@astro.uio.no> <4F2C242B.9010309@astro.uio.no> <4F2D8971.1000102@astro.uio.no> Message-ID: On 4 February 2012 19:39, Dag Sverre Seljebotn wrote: > On 02/03/2012 07:26 PM, mark florisson wrote: >> >> On 3 February 2012 18:15, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 02/03/2012 07:07 PM, mark florisson wrote: >>>> >>>> >>>> On 3 February 2012 18:06, mark florisson >>>> ?wrote: >>>>> >>>>> >>>>> On 3 February 2012 17:53, Dag Sverre Seljebotn >>>>> ? ?wrote: >>>>>> >>>>>> >>>>>> On 02/03/2012 12:09 AM, mark florisson wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 2 February 2012 21:38, Dag Sverre Seljebotn >>>>>>> ? ? ?wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 02/02/2012 10:16 PM, mark florisson wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 2 February 2012 12:19, Dag Sverre Seljebotn >>>>>>>>> ? ? ? ?wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I just realized that >>>>>>>>>> >>>>>>>>>> cdef int[:] a = None >>>>>>>>>> >>>>>>>>>> raises an exception; even though I'd argue that 'a' is of the >>>>>>>>>> "reference" >>>>>>>>>> kind of type where Cython usually allow None (i.e., "cdef MyClass >>>>>>>>>> b >>>>>>>>>> = >>>>>>>>>> None" >>>>>>>>>> is allowed even if type(None) is NoneType). Is this a bug or not, >>>>>>>>>> and >>>>>>>>>> is >>>>>>>>>> it >>>>>>>>>> possible to do something about it? >>>>>>>>>> >>>>>>>>>> Dag Sverre >>>>>>>>>> _______________________________________________ >>>>>>>>>> cython-devel mailing list >>>>>>>>>> cython-devel at python.org >>>>>>>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Yeah I disabled that quite early. It was supposed to be working but >>>>>>>>> gave a lot of trouble in cases (segfaults, mainly). At the time I >>>>>>>>> was >>>>>>>>> trying to get rid of all the segfaults and get the basic >>>>>>>>> functionality >>>>>>>>> working, so I disabled it. Personally, I have never liked how >>>>>>>>> things >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Well, you can segfault quite easily with >>>>>>>> >>>>>>>> cdef MyClass a = None >>>>>>>> print a.field >>>>>>>> >>>>>>>> so it doesn't make sense to slices different from cdef classes IMO. >>>>>>>> >>>>>>>> >>>>>>>>> can be None unchecked. I personally prefer to write >>>>>>>>> >>>>>>>>> cdef foo(obj=None): >>>>>>>>> ? ? cdef int[:] a >>>>>>>>> ? ? if obj is None: >>>>>>>>> ? ? ? ? obj = ... >>>>>>>>> ? ? a = obj >>>>>>>>> >>>>>>>>> Often you forget to write 'not None' when declaring the parameter >>>>>>>>> (and >>>>>>>>> apparently that it only allowed for 'def' functions). >>>>>>>>> >>>>>>>>> As such, I never bothered to re-enable it. However, it does support >>>>>>>>> control flow with uninitialized slices, and will raise an error if >>>>>>>>> it >>>>>>>>> is uninitialized. Do we want this behaviour (e.g. for consistency)? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> When in doubt, go for consistency. So +1 for that reason. I do >>>>>>>> believe >>>>>>>> that >>>>>>>> setting stuff to None is rather vital in Python. >>>>>>>> >>>>>>>> What I typically do is more like this: >>>>>>>> >>>>>>>> def f(double[:] input, double[:] out=None): >>>>>>>> ? ?if out is None: >>>>>>>> ? ? ? ?out = np.empty_like(input) >>>>>>>> ? ?... >>>>>>>> >>>>>>>> Having to use another variable name is a bit of a pain. (Come on -- >>>>>>>> do >>>>>>>> you >>>>>>>> use "a" in real code? What do you actually call "the other obj"? I >>>>>>>> sometimes >>>>>>>> end up with "out_" and so on, but it creates smelly code quite >>>>>>>> quickly.) >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> No, it was just a contrived example. >>>>>>> >>>>>>>> It's easy to segfault with cdef classes anyway, so decent >>>>>>>> nonechecking >>>>>>>> should be implemented at some point, and then memoryviews would use >>>>>>>> the >>>>>>>> same >>>>>>>> mechanisms. Java has decent null-checking... >>>>>>>> >>>>>>> >>>>>>> The problem with none checking is that it has to occur at every >>>>>>> point. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Well, using control flow analysis etc. it doesn't really. E.g., >>>>>> >>>>>> for i in range(a.shape[0]): >>>>>> ? ?print i >>>>>> ? ?a[i] *= 3 >>>>>> >>>>>> can be unrolled and none-checks inserted as >>>>>> >>>>>> print 0 >>>>>> if a is None: raise .... >>>>>> a[0] *= 3 >>>>>> for i in range(1, a.shape[0]): >>>>>> ? ?print i >>>>>> ? ?a[i] *= 3 # no need for none-check >>>>>> >>>>>> It's very similar to what you'd want to do to pull boundschecking out >>>>>> of >>>>>> the >>>>>> loop... >>>>>> >>>>> >>>>> Oh, definitely. Both optimizations may not always be possible to do, >>>>> though. The optimization (for boundschecking) is easier for prange() >>>>> than range(), as you can immediately raise an exception as the >>>>> exceptional condition may be issued at any iteration. ?What do you do >>>>> with bounds checking when some accesses are in-bound, and some are >>>>> out-of-bound? Do you immediately raise the exception? Are we fine with >>>>> aborting (like Fortran compilers do when you ask them for bounds >>>>> checking)? And how do you detect that the code doesn't already raise >>>>> an exception or break out of the loop itself to prevent the >>>>> out-of-bound access? (Unless no exceptions are propagating and no >>>>> break/return is used, but exceptions are so very common). >>>>> >>>>>>> With initialized slices the control flow knows when the slices are >>>>>>> initialized, or when they might not be (and it can raise a >>>>>>> compile-time or runtime error, instead of a segfault if you're >>>>>>> lucky). >>>>>>> I'm fine with implementing the behaviour, I just always left it at >>>>>>> the >>>>>>> bottom of my todo list. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Wasn't saying you should do it, just checking. >>>>>> >>>>>> I'm still not sure about this. I think what I'd really like is >>>>>> >>>>>> ?a) Stop cdef classes from being None as well >>>>>> >>>>>> ?b) Sort-of deprecate cdef in favor of cast/assertion type statements >>>>>> that >>>>>> help the type inferences: >>>>>> >>>>>> def f(arr): >>>>>> ? ?if arr is None: >>>>>> ? ? ? ?arr = ... >>>>>> ? ?arr = int[:](arr) # equivalent to "cdef int[:] arr = arr", but >>>>>> ? ? ? ? ? ? ? ? ? ? ?# acts as statement, with a specific point >>>>>> ? ? ? ? ? ? ? ? ? ? ?# for the none-check >>>>>> ? ?... >>>>>> >>>>>> or even: >>>>>> >>>>>> def f(arr): >>>>>> ? ?if arr is None: >>>>>> ? ? ? ?return 'foo' >>>>>> ? ?else: >>>>>> ? ? ? ?arr = int[:](arr) # takes effect *here*, does none-check >>>>>> ? ? ? ?... >>>>>> ? ?# arr still typed as int[:] here >>>>>> >>>>>> If we can make this work well enough with control flow analysis I'd >>>>>> never >>>>>> cdef declare local vars again :-) >>>>> >>>>> >>>>> >>>>> Hm, what about the following? >>>>> >>>>> def f(arr): >>>>> ? ?if arr is None: >>>>> ? ? ? ?return 'foo' >>>>> >>>>> ? ?cdef int[:] arr # arr may not be None >>>> >>>> >>>> >>>> The above would work in general, until the declaration is lexically >>>> encountered, the object is typed as object. >>> >>> >>> >>> This was actually going to be my first proposal :-) That would finally >>> define how "cdef" inside of if-statements etc. behave too (simply use >>> control flow analysis and treat it like a statement). >> >> >> Block-local declarations are definitely something we want, although I >> think it would require some more (non-trivial) changes to the >> compiler. > > > Note that my proposal was actually not about block-local declarations. > > Block-local: > > { > ? int x = 4; > } > /* x not available here */ > > My idea was much more like hints to control flow analysis. That is, I wanted > to have this raise an error: > > x = 'adf' > if foo(): > ? ?cdef int x = y > print x # type of x not known > > This is OK: > > if foo(): > ? ?cdef int x = y > else: > ? ?cdef int x = 4 > print x # ok, type the same anyway -- so type "escapes" block Seeing that it doesn't work that way in any language with block scopes, I find that pretty surprising behaviour. Why would you not simply mandate that the user declares 'x' outside of the blocks? > And I would allow > > cdef str x = y > if foo: > ? ?cdef int x = int(x) > ? ?return g(x) # x must be int > print x # x must be str at this point > > > The reason for this madness is simply that control statements do NOT create > blocks in Python, and making it so in Cython is just confusing. It would > bring too much of C into the language for my taste. And yet it can be very useful and intuitive in several contexts, just not for objects (which aren't typed anyway!). Block-local declarations are useful when a variable is only used in the block and it can be useful to make variables private in the cython.parallel context ("assignment makes private" is really not as intuitive). It's not a very important feature though, and it's indeed more a thing from static languages than Python. > I think that in my Cython-utopia, Symtab.py is only responsible for > resolving the scope of *names*, and types of things are not bound to blocks, > just to the state at control flow points. > > Of course, implementing this would be a nightmare. > > >> Maybe the cleanup code from functions, as well as the temp handling >> etc could be re-factored to a BlockNode, that all block nodes could >> subclass. They'd have to instantiate new symbol table environments as >> well. I'm not yet entirely sure what else would be involved in the >> implementation of that. >> >>> But I like int[:] as a way of making it pure Python syntax compatible as >>> well. Perhaps the two are orthogonal -- a) make variable declaration a >>> statement, b) make cython.int[:](x) do, essentially, a cdef declaration, >>> for >>> Python compatability. >>> >> >> Don't we have cython.declare() for that? e.g. >> >> ? ? arr = cython.declare(cython.int[:]) >> >> That would also be treated as a statement like normal declarations (if >> and when implemented). > > > This was what I said, but it wasn't what I meant. Sorry. I'll try to explain > better: > > 1) ?There's no way to have the above actually do the right thing in Python. > With "arr = cython.int[:](arr)" one could actually return a NumPy or > NumPy-like array that works in Python (since "arr" might not have the > "shape" attribute before the conversion, all we know is that it exports the > buffer interface...). Right, but the same thing goes for other types as well. E.g. I can type something int with cython.declare() and then use strings instead. > 2) I don't like the fact that we overload the assignment operator to acquire > a view. "cdef np.ndarray[int] x = y" is fine since if you do "x.someattr" > then a NumPy subclass could provide someattr and it works fine. Acquiring a > view is just something different. Yeah it's kind of overloaded, but in a good way :) It's the language that does the overloading, which means it's not very surprising. And the memoryview slices coerce to numpy-like (although somewhat incapable) objects and support some of their attributes. I like the simplicity of assignment here, you don't really care that it takes a view, you just want to access and operate on the data. What do you think of allowing the user to register a conversion-to-object function? And perhaps the default should be that if a view was never sliced, it just returns the original object (although that might mean you get back objects with incompatible interfaces...). > 3) Hence I guess I like "arr = int[:](arr)" better both for Cython and > Python; at least if "arr" is always type-inferred to be int[:], even if arr > was an "object" further up in the code (really, if you do "x = f(x)" at the > top-level of the function, then x can just take the identity of another > variable from that point on -- I don't know if the current control flow > analysis and type inferences does this though?) > > > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Sun Feb 5 22:57:51 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 5 Feb 2012 21:57:51 +0000 Subject: [Cython] OpenCL support Message-ID: Hey, I created a CEP for opencl support: http://wiki.cython.org/enhancements/opencl What do you think? Mark From markflorisson88 at gmail.com Sun Feb 5 23:03:36 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 5 Feb 2012 22:03:36 +0000 Subject: [Cython] memoryview slices can't be None? In-Reply-To: <4F2B0247.8040008@astro.uio.no> References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> Message-ID: On 2 February 2012 21:38, Dag Sverre Seljebotn wrote: > On 02/02/2012 10:16 PM, mark florisson wrote: >> >> On 2 February 2012 12:19, Dag Sverre Seljebotn >> ?wrote: >>> >>> I just realized that >>> >>> cdef int[:] a = None >>> >>> raises an exception; even though I'd argue that 'a' is of the "reference" >>> kind of type where Cython usually allow None (i.e., "cdef MyClass b = >>> None" >>> is allowed even if type(None) is NoneType). Is this a bug or not, and is >>> it >>> possible to do something about it? >>> >>> Dag Sverre >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> >> Yeah I disabled that quite early. It was supposed to be working but >> gave a lot of trouble in cases (segfaults, mainly). At the time I was >> trying to get rid of all the segfaults and get the basic functionality >> working, so I disabled it. Personally, I have never liked how things > > > Well, you can segfault quite easily with > > cdef MyClass a = None > print a.field > > so it doesn't make sense to slices different from cdef classes IMO. > > >> can be None unchecked. I personally prefer to write >> >> cdef foo(obj=None): >> ? ? cdef int[:] a >> ? ? if obj is None: >> ? ? ? ? obj = ... >> ? ? a = obj >> >> Often you forget to write 'not None' when declaring the parameter (and >> apparently that it only allowed for 'def' functions). >> >> As such, I never bothered to re-enable it. However, it does support >> control flow with uninitialized slices, and will raise an error if it >> is uninitialized. Do we want this behaviour (e.g. for consistency)? > > > When in doubt, go for consistency. So +1 for that reason. I do believe that > setting stuff to None is rather vital in Python. Yeah I think we should go back to this discussion :) Checking for None and allowing slices to be None is simply very convenient, and doesn't involve any drastic changes. I was never really against it, I just never got around to implementing it. > What I typically do is more like this: > > def f(double[:] input, double[:] out=None): > ? ?if out is None: > ? ? ? ?out = np.empty_like(input) > ? ?... > > Having to use another variable name is a bit of a pain. (Come on -- do you > use "a" in real code? What do you actually call "the other obj"? I sometimes > end up with "out_" and so on, but it creates smelly code quite quickly.) > > It's easy to segfault with cdef classes anyway, so decent nonechecking > should be implemented at some point, and then memoryviews would use the same > mechanisms. Java has decent null-checking... > > > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From dtcaciuc at gmail.com Sun Feb 5 23:39:25 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Sun, 5 Feb 2012 14:39:25 -0800 Subject: [Cython] OpenCL support In-Reply-To: References: Message-ID: Mark, Couple of thoughts based on some experience with OpenCL... 1. This may be going outside the proposed purpose, but some algorithms such as molecular simulations can benefit from a fairly large amount of constant data loaded at the beginning of the program and persisted in between invocations of a function. If I understand the proposal, entire program would need to be within one `with` block, which would certainly be limiting to the architecture. Eg. ? ? # run.py ? ? from cython_module import Evaluator ? ? # Arrays are loaded into device memory here ? ? x = Evaluator(params...) ? ? for i in range(N): ? ? ? ? # Calculations are performed with ? ? ? ? # mostly data in the device memory ? ? ? ? data_i = x.step() ? ? ? ? ... 2. AFAIK, given a device, OpenCL basically takes it over (which would be eg. 8 cores on 2 CPU x 4 cores machine), so I'm not sure how `num_cores` parameter would work here. There's the fission extension that allows you to selectively run on a portion of the device, but the idea is that you're still dedicating entire device to your process, but merely giving more organization to your processing tasks, where you have to specify the core numbers you want to use. I may very well be wrong here, bashing is welcome :) 3. Does it make sense to make OpenCL more explicit? Heuristics and automatic switching between, say, CPU and GPU is great for eg. Sage users, but maybe not so much if you know exactly what you're doing with your machine resources. E.g just having a library with thin cython-adapted wrappers would be awesome. I imagine this can be augmented by arrays having a knowledge of device-side/client-side (which would go towards addressing the issue 1. above) Cheers, Dimitri. On Sun, Feb 5, 2012 at 1:57 PM, mark florisson wrote: > Hey, > > I created a CEP for opencl support: http://wiki.cython.org/enhancements/opencl > What do you think? > > Mark > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Mon Feb 6 00:12:47 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 5 Feb 2012 23:12:47 +0000 Subject: [Cython] OpenCL support In-Reply-To: References:

Message-ID: On 5 February 2012 22:39, Dimitri Tcaciuc wrote: > Mark, > > Couple of thoughts based on some experience with OpenCL... > > 1. This may be going outside the proposed purpose, but some algorithms > such as molecular simulations can benefit from a fairly large amount > of constant data loaded at the beginning of the program and persisted > in between invocations of a function. If I understand the proposal, > entire program would need to be within one `with` block, which would > certainly be limiting to the architecture. Eg. > > ? ? # run.py > ? ? from cython_module import Evaluator > > ? ? # Arrays are loaded into device memory here > ? ? x = Evaluator(params...) > ? ? for i in range(N): > ? ? ? ? # Calculations are performed with > ? ? ? ? # mostly data in the device memory > ? ? ? ? data_i = x.step() > ? ? ? ? ... The point of the proposal is that the slices will actually stay on the GPU as long as possible, until they absolutely need to be copied back (e.g. when you go back to NumPy-land). You can do anything you want in-between (outside any parallel section), e.g. call other functions that run on the CPU, call python functions, whatever. When you continue processing data that is still on the GPU, it will simply continue from there. But point taken, the compiler could think "oh but this is not too much work, and I have more data in main memory than on the GPU, so let me use the CPU and copy that constant data back". So perhaps the pinning should not just work for main memory, stuff could also be pinned on the GPU. Then if there is a "pinning conflict" Cython would raise an exception. > 2. AFAIK, given a device, OpenCL basically takes it over (which would > be eg. 8 cores on 2 CPU x 4 cores machine), so I'm not sure how > `num_cores` parameter would work here. There's the fission extension > that allows you to selectively run on a portion of the device, but the > idea is that you're still dedicating entire device to your process, > but merely giving more organization to your processing tasks, where > you have to specify the core numbers you want to use. I may very well > be wrong here, bashing is welcome :) Oh, yes. I think the num_threads clause could simply be ignored in that context, it's only supposed to be an upper limit. Scheduling hints like chunksize could also be ignored :) > 3. Does it make sense to make OpenCL more explicit? Heuristics and > automatic switching between, say, CPU and GPU is great for eg. Sage > users, but maybe not so much if you know exactly what you're doing > with your machine resources. E.g just having a library with thin > cython-adapted wrappers would be awesome. I imagine this can be > augmented by arrays having a knowledge of device-side/client-side > (which would go towards addressing the issue 1. above) Hm, there are several advantages to supporting this in the language. One is that you can support parallel sections, and that your code can transparently execute in parallel on whatever device the compiler and runtime think will be best. Excluding the cython.parallel stuff I don't think there is enough room for a library, you might as well use pyopencl directly in that case right? Not OpenCL perse, but part of that will also solve the numpy-temporary problem, which we have numexpr for. But it would be more convenient to express oneself natively in the programming language of choice (Cython :). > Cheers, > > > Dimitri. > > On Sun, Feb 5, 2012 at 1:57 PM, mark florisson > wrote: >> Hey, >> >> I created a CEP for opencl support: http://wiki.cython.org/enhancements/opencl >> What do you think? >> >> Mark >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Mon Feb 6 08:22:26 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 06 Feb 2012 08:22:26 +0100 Subject: [Cython] OpenCL support In-Reply-To: References:

Message-ID: <4F2F7FB2.5020604@behnel.de> mark florisson, 06.02.2012 00:12: > On 5 February 2012 22:39, Dimitri Tcaciuc wrote: >> 3. Does it make sense to make OpenCL more explicit? Heuristics and >> automatic switching between, say, CPU and GPU is great for eg. Sage >> users, but maybe not so much if you know exactly what you're doing >> with your machine resources. E.g just having a library with thin >> cython-adapted wrappers would be awesome. I imagine this can be >> augmented by arrays having a knowledge of device-side/client-side >> (which would go towards addressing the issue 1. above) > > Hm, there are several advantages to supporting this in the language. ... and there's always the obvious disadvantage of making the language too complex and magic to learn and understand. Worth balancing. Stefan From markflorisson88 at gmail.com Mon Feb 6 11:21:53 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 6 Feb 2012 10:21:53 +0000 Subject: [Cython] OpenCL support In-Reply-To: <4F2F7FB2.5020604@behnel.de> References:

<4F2F7FB2.5020604@behnel.de> Message-ID: On 6 February 2012 07:22, Stefan Behnel wrote: > mark florisson, 06.02.2012 00:12: >> On 5 February 2012 22:39, Dimitri Tcaciuc wrote: >>> 3. Does it make sense to make OpenCL more explicit? Heuristics and >>> automatic switching between, say, CPU and GPU is great for eg. Sage >>> users, but maybe not so much if you know exactly what you're doing >>> with your machine resources. E.g just having a library with thin >>> cython-adapted wrappers would be awesome. I imagine this can be >>> augmented by arrays having a knowledge of device-side/client-side >>> (which would go towards addressing the issue 1. above) >> >> Hm, there are several advantages to supporting this in the language. > > ... and there's always the obvious disadvantage of making the language too > complex and magic to learn and understand. Worth balancing. Definitely. This would however introduce very minor changes to the language (no new syntax at least, just a few memoryview methods), but more major changes to the compiler. The support would mostly be transparent. Clyther (http://srossross.github.com/Clyther/) is a related project, which does a similar thing by compiling python (bytecode) to opencl. What I want for Cython is something even more transparent, the user wouldn't perhaps even know opencl was involved, and the compiler has more control over how data is handled. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From dtcaciuc at gmail.com Mon Feb 6 18:23:19 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Mon, 6 Feb 2012 09:23:19 -0800 Subject: [Cython] OpenCL support In-Reply-To: References:

<4F2F7FB2.5020604@behnel.de> Message-ID: On Mon, Feb 6, 2012 at 2:21 AM, mark florisson wrote: > On 6 February 2012 07:22, Stefan Behnel wrote: >> mark florisson, 06.02.2012 00:12: >>> On 5 February 2012 22:39, Dimitri Tcaciuc wrote: >>>> 3. Does it make sense to make OpenCL more explicit? Heuristics and >>>> automatic switching between, say, CPU and GPU is great for eg. Sage >>>> users, but maybe not so much if you know exactly what you're doing >>>> with your machine resources. E.g just having a library with thin >>>> cython-adapted wrappers would be awesome. I imagine this can be >>>> augmented by arrays having a knowledge of device-side/client-side >>>> (which would go towards addressing the issue 1. above) >>> >>> Hm, there are several advantages to supporting this in the language. >> >> ... and there's always the obvious disadvantage of making the language too >> complex and magic to learn and understand. Worth balancing. > > Definitely. This would however introduce very minor changes to the > language (no new syntax at least, just a few memoryview methods), but > more major changes to the compiler. The support would mostly be > transparent. > Clyther (http://srossross.github.com/Clyther/) is a related project, > which does a similar thing by compiling python (bytecode) to opencl. > What I want for Cython is something even more transparent, the user > wouldn't perhaps even know opencl was involved, and the compiler has > more control over how data is handled. What I'm absolutely certain of is that sort of complete transparency will eventually start getting edge cases and from there on additional development and design will have to be made, so I it's better to plan not-as-transparent elements and user-side control right form the start. I think another reason I would go for a less automatic solution is because I imagine the alternative would inevitably complicate Cython internals. I think keeping that as simple as possible is huge advantage in the long run, which is arguably as important as reducing amount of code a language user has to write (hence me initially suggesting a more library-like Cython integration, although pyopencl did work quite well already :). Dimitri. > >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From sturla at molden.no Tue Feb 7 14:52:05 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 07 Feb 2012 14:52:05 +0100 Subject: [Cython] OpenCL support In-Reply-To: References:

Message-ID: <4F312C85.7050805@molden.no> On 05.02.2012 23:39, Dimitri Tcaciuc wrote: > 3. Does it make sense to make OpenCL more explicit? No, it takes the usefuness of OpenCL away, which is that kernels are text strings and compiled at run-time. > Heuristics and > automatic switching between, say, CPU and GPU is great for eg. Sage > users, but maybe not so much if you know exactly what you're doing > with your machine resources. E.g just having a library with thin > cython-adapted wrappers would be awesome. I imagine this can be > augmented by arrays having a knowledge of device-side/client-side > (which would go towards addressing the issue 1. above) Just use PyOpenCL and manipulate kernels as text. Python is excellent for that - Cython is not needed. If you think using Cython instead of Python (PyOpenCL and NumPy) will be important, you don't have a CPU bound problem that warrants the use of OpenCL. Sturla From dtcaciuc at gmail.com Tue Feb 7 18:22:59 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Tue, 7 Feb 2012 09:22:59 -0800 Subject: [Cython] OpenCL support In-Reply-To: <4F312C85.7050805@molden.no> References:

<4F312C85.7050805@molden.no> Message-ID: On Tue, Feb 7, 2012 at 5:52 AM, Sturla Molden wrote: > On 05.02.2012 23:39, Dimitri Tcaciuc wrote: > >> 3. Does it make sense to make OpenCL more explicit? > > > No, it takes the usefuness of OpenCL away, which is that kernels are text > strings and compiled at run-time. I'm not sure I understand you, maybe you could elaborate on that? By "explicit" I merely meant that the user will explicitly specify that they're working on OpenCL-enabled array or certain bit of Cython code will get compiled into OpenCL program etc. > >> Heuristics and >> automatic switching between, say, CPU and GPU is great for eg. Sage >> users, but maybe not so much if you know exactly what you're doing >> with your machine resources. E.g just having a library with thin >> cython-adapted wrappers would be awesome. I imagine this can be >> augmented by arrays having a knowledge of device-side/client-side >> (which would go towards addressing the issue 1. above) > > > Just use PyOpenCL and manipulate kernels as text. Python is excellent for > that - Cython is not needed. If you think using Cython instead of Python > (PyOpenCL and NumPy) will be important, you don't have a CPU bound problem > that warrants the use of OpenCL. Again, not sure what you mean here. As I mentioned in the thread, PyOpenCL worked quite fine, however if Cython is getting OpenCL support, I'd much rather use that than keeping a dependency on another library. > Sturla > > > > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Tue Feb 7 18:58:20 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 7 Feb 2012 17:58:20 +0000 Subject: [Cython] OpenCL support In-Reply-To: <4F312C85.7050805@molden.no> References:

<4F312C85.7050805@molden.no> Message-ID: On 7 February 2012 13:52, Sturla Molden wrote: > On 05.02.2012 23:39, Dimitri Tcaciuc wrote: > >> 3. Does it make sense to make OpenCL more explicit? > > > No, it takes the usefuness of OpenCL away, which is that kernels are text > strings and compiled at run-time. > I don't know why you think that is necessary. Obviously Cython's translated opencl would also be compiled at runtime (or loaded from a cache). If you mean you can't do string interpolation, I don't see why you would need that. >> Heuristics and >> automatic switching between, say, CPU and GPU is great for eg. Sage >> users, but maybe not so much if you know exactly what you're doing >> with your machine resources. E.g just having a library with thin >> cython-adapted wrappers would be awesome. I imagine this can be >> augmented by arrays having a knowledge of device-side/client-side >> (which would go towards addressing the issue 1. above) > > > Just use PyOpenCL and manipulate kernels as text. Python is excellent for > that - Cython is not needed. If you think using Cython instead of Python > (PyOpenCL and NumPy) will be important, you don't have a CPU bound problem > that warrants the use of OpenCL. > > Sturla > That is not very constructive input. If you use PyOpenCL you have to basically rethink and rewrite your kernels just for OpenCL. That is far from trivial and there is not much pyopencl does to keep you away from the pain of OpenCL and general GPU computing. There are existing approaches (compiler directives for C or Fortran) to do similar things, and an OpenMP (sub-)committee is working on adding/defining such features to the standard. Why? Because GPU programming is still a major pain in the ass. And although automatic translation will probably not yield the best performance for your particular hardware as a handwritten version, it will have saved you hours of coding and possibly rewriting. > > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From sturla at molden.no Tue Feb 7 18:58:41 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 07 Feb 2012 18:58:41 +0100 Subject: [Cython] OpenCL support In-Reply-To: References:

<4F312C85.7050805@molden.no> Message-ID: <4F316651.1040504@molden.no> On 07.02.2012 18:22, Dimitri Tcaciuc wrote: > I'm not sure I understand you, maybe you could elaborate on that? OpenCL code is a text string that is compiled when the program runs. So it can be generated from run-time data. Think of it like dynamic HTML. > Again, not sure what you mean here. As I mentioned in the thread, > PyOpenCL worked quite fine, however if Cython is getting OpenCL > support, I'd much rather use that than keeping a dependency on another > library. You can use PyOpenCL or OpenCL C or C++ headers with Cython. The latter you just use as you would with any other C or C++ library. You don't need to change the compiler to use a library: It seems like you think OpenCL is compiled from code when you build the program. It is actually compiled from text strings when you run the program. It is meaningless to ask if Cython supports OpenCL because Cython supports any C library. Sturla From markflorisson88 at gmail.com Tue Feb 7 19:01:41 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 7 Feb 2012 18:01:41 +0000 Subject: [Cython] OpenCL support In-Reply-To: References:

<4F312C85.7050805@molden.no> Message-ID: On 7 February 2012 17:22, Dimitri Tcaciuc wrote: > On Tue, Feb 7, 2012 at 5:52 AM, Sturla Molden wrote: >> On 05.02.2012 23:39, Dimitri Tcaciuc wrote: >> >>> 3. Does it make sense to make OpenCL more explicit? >> >> >> No, it takes the usefuness of OpenCL away, which is that kernels are text >> strings and compiled at run-time. > > I'm not sure I understand you, maybe you could elaborate on that? By > "explicit" I merely meant that the user will explicitly specify that > they're working on OpenCL-enabled array or certain bit of Cython code > will get compiled into OpenCL program etc. I gave that some thought as well, like 'cdef double[::view.gpu, :] myarray', which would mean that the data is on the gpu. Generally though, I think it kind of defeats the purpose. E.g. if you have small arrays you probably don't want anything to be on the gpu, whereas if you have larger ones and sufficient computation operating on them, it might be worthwhile. The point is, as a user you don't care, you want your runtime to make a sensible decision. If you don't want anything to do with OpenCL, you can disable it, or if you want to only ever stay on the CPU, you could "pin" it there. >> >>> Heuristics and >>> automatic switching between, say, CPU and GPU is great for eg. Sage >>> users, but maybe not so much if you know exactly what you're doing >>> with your machine resources. E.g just having a library with thin >>> cython-adapted wrappers would be awesome. I imagine this can be >>> augmented by arrays having a knowledge of device-side/client-side >>> (which would go towards addressing the issue 1. above) >> >> >> Just use PyOpenCL and manipulate kernels as text. Python is excellent for >> that - Cython is not needed. If you think using Cython instead of Python >> (PyOpenCL and NumPy) will be important, you don't have a CPU bound problem >> that warrants the use of OpenCL. > > Again, not sure what you mean here. As I mentioned in the thread, > PyOpenCL worked quite fine, however if Cython is getting OpenCL > support, I'd much rather use that than keeping a dependency on another > library. > >> Sturla >> >> >> >> >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Tue Feb 7 19:03:02 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 7 Feb 2012 18:03:02 +0000 Subject: [Cython] OpenCL support In-Reply-To: References:

<4F312C85.7050805@molden.no>

Message-ID: On 7 February 2012 18:01, mark florisson wrote: > On 7 February 2012 17:22, Dimitri Tcaciuc wrote: >> On Tue, Feb 7, 2012 at 5:52 AM, Sturla Molden wrote: >>> On 05.02.2012 23:39, Dimitri Tcaciuc wrote: >>> >>>> 3. Does it make sense to make OpenCL more explicit? >>> >>> >>> No, it takes the usefuness of OpenCL away, which is that kernels are text >>> strings and compiled at run-time. >> >> I'm not sure I understand you, maybe you could elaborate on that? By >> "explicit" I merely meant that the user will explicitly specify that >> they're working on OpenCL-enabled array or certain bit of Cython code >> will get compiled into OpenCL program etc. > > I gave that some thought as well, like 'cdef double[::view.gpu, :] > myarray', which would mean that the data is on the gpu. Generally > though, I think it kind of defeats the purpose. E.g. if you have small > arrays you probably don't want anything to be on the gpu, whereas if > you have larger ones and sufficient computation operating on them, it > might be worthwhile. The point is, as a user you don't care, you want > your runtime to make a sensible decision. If you don't want anything > to do with OpenCL, you can disable it, or if you want to only ever > stay on the CPU, you could "pin" it there. As for code regions, only operations on memoryview slices (most notably vector operations) and prange sections would be compiled (and only if possible at all). Maybe normal loops could be compiled as well, but it's best to start with prange only. >>> >>>> Heuristics and >>>> automatic switching between, say, CPU and GPU is great for eg. Sage >>>> users, but maybe not so much if you know exactly what you're doing >>>> with your machine resources. E.g just having a library with thin >>>> cython-adapted wrappers would be awesome. I imagine this can be >>>> augmented by arrays having a knowledge of device-side/client-side >>>> (which would go towards addressing the issue 1. above) >>> >>> >>> Just use PyOpenCL and manipulate kernels as text. Python is excellent for >>> that - Cython is not needed. If you think using Cython instead of Python >>> (PyOpenCL and NumPy) will be important, you don't have a CPU bound problem >>> that warrants the use of OpenCL. >> >> Again, not sure what you mean here. As I mentioned in the thread, >> PyOpenCL worked quite fine, however if Cython is getting OpenCL >> support, I'd much rather use that than keeping a dependency on another >> library. >> >>> Sturla >>> >>> >>> >>> >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Tue Feb 7 19:05:04 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 7 Feb 2012 18:05:04 +0000 Subject: [Cython] OpenCL support In-Reply-To: <4F316651.1040504@molden.no> References:

<4F312C85.7050805@molden.no> <4F316651.1040504@molden.no> Message-ID: On 7 February 2012 17:58, Sturla Molden wrote: > On 07.02.2012 18:22, Dimitri Tcaciuc wrote: > >> I'm not sure I understand you, maybe you could elaborate on that? > > > OpenCL code is a text string that is compiled when the program runs. So it > can be generated from run-time data. Think of it like dynamic HTML. > > >> Again, not sure what you mean here. As I mentioned in the thread, >> PyOpenCL worked quite fine, however if Cython is getting OpenCL >> support, I'd much rather use that than keeping a dependency on another >> library. > > > You can use PyOpenCL or OpenCL C or C++ headers with Cython. The latter you > just use as you would with any other C or C++ library. You don't need to > change the compiler to use a library: It seems like you think OpenCL is > compiled from code when you build the program. It is actually compiled from > text strings when you run the program. It is meaningless to ask if Cython > supports OpenCL because Cython supports any C library. > Sturla, in general we appreciate your input, you usually have useful things to say. But I really don't believe you have read the CEP, so please do, and then comment on what is proposed there if you want. Here is the link: http://wiki.cython.org/enhancements/opencl > Sturla > > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Tue Feb 7 23:21:38 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 7 Feb 2012 22:21:38 +0000 Subject: [Cython] memoryview slices can't be None? In-Reply-To: References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> Message-ID: On 5 February 2012 22:03, mark florisson wrote: > On 2 February 2012 21:38, Dag Sverre Seljebotn > wrote: >> On 02/02/2012 10:16 PM, mark florisson wrote: >>> >>> On 2 February 2012 12:19, Dag Sverre Seljebotn >>> ?wrote: >>>> >>>> I just realized that >>>> >>>> cdef int[:] a = None >>>> >>>> raises an exception; even though I'd argue that 'a' is of the "reference" >>>> kind of type where Cython usually allow None (i.e., "cdef MyClass b = >>>> None" >>>> is allowed even if type(None) is NoneType). Is this a bug or not, and is >>>> it >>>> possible to do something about it? >>>> >>>> Dag Sverre >>>> _______________________________________________ >>>> cython-devel mailing list >>>> cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>> >>> >>> Yeah I disabled that quite early. It was supposed to be working but >>> gave a lot of trouble in cases (segfaults, mainly). At the time I was >>> trying to get rid of all the segfaults and get the basic functionality >>> working, so I disabled it. Personally, I have never liked how things >> >> >> Well, you can segfault quite easily with >> >> cdef MyClass a = None >> print a.field >> >> so it doesn't make sense to slices different from cdef classes IMO. >> >> >>> can be None unchecked. I personally prefer to write >>> >>> cdef foo(obj=None): >>> ? ? cdef int[:] a >>> ? ? if obj is None: >>> ? ? ? ? obj = ... >>> ? ? a = obj >>> >>> Often you forget to write 'not None' when declaring the parameter (and >>> apparently that it only allowed for 'def' functions). >>> >>> As such, I never bothered to re-enable it. However, it does support >>> control flow with uninitialized slices, and will raise an error if it >>> is uninitialized. Do we want this behaviour (e.g. for consistency)? >> >> >> When in doubt, go for consistency. So +1 for that reason. I do believe that >> setting stuff to None is rather vital in Python. > > Yeah I think we should go back to this discussion :) Checking for None > and allowing slices to be None is simply ?very convenient, and doesn't > involve any drastic changes. I was never really against it, I just > never got around to implementing it. We should now be able to use None memoryview slices: https://github.com/markflorisson88/cython/commit/a24495ac1348926af5e085334c4e6a960e723f87 They also coerce back to None when coercing to object. >> What I typically do is more like this: >> >> def f(double[:] input, double[:] out=None): >> ? ?if out is None: >> ? ? ? ?out = np.empty_like(input) >> ? ?... >> >> Having to use another variable name is a bit of a pain. (Come on -- do you >> use "a" in real code? What do you actually call "the other obj"? I sometimes >> end up with "out_" and so on, but it creates smelly code quite quickly.) >> >> It's easy to segfault with cdef classes anyway, so decent nonechecking >> should be implemented at some point, and then memoryviews would use the same >> mechanisms. Java has decent null-checking... >> >> >> Dag Sverre >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel From robertwb at math.washington.edu Wed Feb 8 09:22:34 2012 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 8 Feb 2012 00:22:34 -0800 Subject: [Cython] memoryview slices can't be None? In-Reply-To: <4F2D8971.1000102@astro.uio.no> References: <4F2A7F41.6010303@astro.uio.no> <4F2B0247.8040008@astro.uio.no> <4F2C1EFE.4050904@astro.uio.no> <4F2C242B.9010309@astro.uio.no> <4F2D8971.1000102@astro.uio.no> Message-ID: On Sat, Feb 4, 2012 at 11:39 AM, Dag Sverre Seljebotn wrote: >> >> Block-local declarations are definitely something we want, although I >> think it would require some more (non-trivial) changes to the >> compiler. > > > Note that my proposal was actually not about block-local declarations. > > Block-local: > > { > ? int x = 4; > } > /* x not available here */ > > My idea was much more like hints to control flow analysis. That is, I wanted > to have this raise an error: > > x = 'adf' > if foo(): > ? ?cdef int x = y > print x # type of x not known > > This is OK: > > if foo(): > ? ?cdef int x = y > else: > ? ?cdef int x = 4 > print x # ok, type the same anyway -- so type "escapes" block > > And I would allow > > cdef str x = y > if foo: > ? ?cdef int x = int(x) > ? ?return g(x) # x must be int > print x # x must be str at this point > > > The reason for this madness is simply that control statements do NOT create > blocks in Python, and making it so in Cython is just confusing. It would > bring too much of C into the language for my taste. I think the above examples (especially the last one) are a bit confusing as well. Introducing the notion of (implicit) block scoping is not very Pythonic. We would need something to be able to support local cdef classes, but I think a with statement is more appropriate for that as there's a notion of doing non-trivial work when exiting the block. > I think that in my Cython-utopia, Symtab.py is only responsible for > resolving the scope of *names*, and types of things are not bound to blocks, > just to the state at control flow points. > > Of course, implementing this would be a nightmare. > > >> Maybe the cleanup code from functions, as well as the temp handling >> etc could be re-factored to a BlockNode, that all block nodes could >> subclass. They'd have to instantiate new symbol table environments as >> well. I'm not yet entirely sure what else would be involved in the >> implementation of that. >> >>> But I like int[:] as a way of making it pure Python syntax compatible as >>> well. Perhaps the two are orthogonal -- a) make variable declaration a >>> statement, b) make cython.int[:](x) do, essentially, a cdef declaration, >>> for >>> Python compatability. >>> >> >> Don't we have cython.declare() for that? e.g. >> >> ? ? arr = cython.declare(cython.int[:]) >> >> That would also be treated as a statement like normal declarations (if >> and when implemented). > > > This was what I said, but it wasn't what I meant. Sorry. I'll try to explain > better: > > 1) ?There's no way to have the above actually do the right thing in Python. > With "arr = cython.int[:](arr)" one could actually return a NumPy or > NumPy-like array that works in Python (since "arr" might not have the > "shape" attribute before the conversion, all we know is that it exports the > buffer interface...). > > 2) I don't like the fact that we overload the assignment operator to acquire > a view. "cdef np.ndarray[int] x = y" is fine since if you do "x.someattr" > then a NumPy subclass could provide someattr and it works fine. Acquiring a > view is just something different. > > 3) Hence I guess I like "arr = int[:](arr)" better both for Cython and > Python; at least if "arr" is always type-inferred to be int[:], even if arr > was an "object" further up in the code (really, if you do "x = f(x)" at the > top-level of the function, then x can just take the identity of another > variable from that point on -- I don't know if the current control flow > analysis and type inferences does this though?) > > > Dag Sverre > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From robertwb at math.washington.edu Wed Feb 8 09:53:12 2012 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 8 Feb 2012 00:53:12 -0800 Subject: [Cython] OpenCL support In-Reply-To: <4F316651.1040504@molden.no> References:

<4F312C85.7050805@molden.no> <4F316651.1040504@molden.no> Message-ID: On Tue, Feb 7, 2012 at 9:58 AM, Sturla Molden wrote: > On 07.02.2012 18:22, Dimitri Tcaciuc wrote: > >> I'm not sure I understand you, maybe you could elaborate on that? > > > OpenCL code is a text string that is compiled when the program runs. So it > can be generated from run-time data. Think of it like dynamic HTML. > > >> Again, not sure what you mean here. As I mentioned in the thread, >> PyOpenCL worked quite fine, however if Cython is getting OpenCL >> support, I'd much rather use that than keeping a dependency on another >> library. > > > You can use PyOpenCL or OpenCL C or C++ headers with Cython. The latter you > just use as you would with any other C or C++ library. You don't need to > change the compiler to use a library: It seems like you think OpenCL is > compiled from code when you build the program. It is actually compiled from > text strings when you run the program. It is meaningless to ask if Cython > supports OpenCL because Cython supports any C library. I view this more as a proposal to have an OpenCL backend for prange loops and other vectorized operations. The advantage of integrating OpenCL into Cython is that one can write a single implementation of your algorithm (using traditional for...(p)range loops) and have it use the GPU in the background transparently (without having to manually learn and call the library yourself). This is analogous to the compiler/runtime system deciding to use sse instructions for a portion of your code because it thinks it will be faster. I really like the idea of decoupling the logic of the algorithm from the SIMD implementation (which is one of the reasons that prange, and in part OpenMP, works so well) but I think this is best done at the language level in our case. Whether OpenCL is mature enough/the abstractions are clean enough/the heuristics can be good enough to pull this off is another question, but it'd be great if it can be done (ideally with minimal impact to the language and isolated changes to the internals). - Robert From d.s.seljebotn at astro.uio.no Wed Feb 8 15:46:23 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 08 Feb 2012 15:46:23 +0100 Subject: [Cython] OpenCL support In-Reply-To: References: Message-ID: <4F328ABF.4010107@astro.uio.no> On 02/05/2012 10:57 PM, mark florisson wrote: > Hey, > > I created a CEP for opencl support: http://wiki.cython.org/enhancements/opencl > What do you think? To start with my own conclusion on this, my feel is that it is too little gain, at least for a GPU solution. There's already Theano for trivial SIMD-stuff and PyOpenCL for the getting-hands-dirty stuff. (Of course, this CEP would be more convenient to use than Theano if one is already using Cython.) But that's just my feeling, and I'm not the one potentially signing up to do the work, so whether it is "worth it" is really not my decision, the weighing is done with your weights, not mine. Given an implementation, I definitely support the inclusion in Cython for these kind of features (FWIW). First, CPU: OpenCL is probably a very good way of portably making use of SSE/AVX etc. But to really get a payoff then I would think that the real value would be in *not* using OpenCL vector types, just many threads, so that the OpenCL driver does the dirty work of mapping each thread to each slot in the CPU registers? I'd think the gain in using OpenCL is to emit scalar code and leave the dirty work to OpenCL. If one does the hard part and mapped variables to vectors and memory accesses to shuffles, one might as well go the whole length and emit SSE/AVX rather than OpenCL to avoid the startup overhead. I don't really know how good the Intel and AMD CPU drivers are w.r.t. this -- I have seen the Intel driver emit "vectorizing" and "could not vectorize", but didn't explore the circumstances. Then, on to GPU: It is not a generic-purpose solution, you still need to bring in pyopencl for lots of cases, and so the question is how many cases it fits with and if it is enough to grow a userbase around it. And, importantly, how much performance is sacrificed for the resulting user-friendlyness. 50% performance hit is usually OK, 95% maybe not. And a 95% hit is not unimaginable if the memory movement is done in a bad way for some code? I think the fundamental problem is one of programming paradigms. Fortran, C++, Cython are all sequential in nature; even with OpenMP it is like you have a modest bit of parallelism tacked on to speed up a sequential-looking program. With "massively parallel" solutions such as CUDA and OpenCL, and also MPI in fact, the fundamental assumption that you have thousands or hundreds of thousands of threads. And that just changes how you need to think about writing code, which would tend to show up at a syntax level. So, at least if you want good performance, you need to change your way of thinking enough that a new syntax (loosely cooperating threads rather than parallel-for-loop or SIMD instruction) is actually an advantage, as it keeps you reminded of how the hardware works. So I think the most important thing to do (if you bother) is: Gather a set of real worl(-ish) CUDA or OpenCL programs, port them to Cython + this CEP (without a working Cython implementation for it), and see how that goes. That's really the only way to evaluate it. Some experiences from the single instance GPU code I've written: - For starters I had to give up OpenCL and use CUDA to use all the 48 KB available shared memory on Nvidia compute-capability-2.0 (perhaps I just didn't find the OpenCL option for that). And increasing from 16 to 48 KB allowed a fundamentally faster and qualitatively different algorithm to be used. But OpenCL vs. CUDA is kind of beside the point here.... - When mucking about with various "obvious" ports of sequential code to GPU code, I got performance in the range of 5 to 20 GFLOP/s (out of 490 GFLOP/s or so theoretical; NVidia Tesla M2050). When really understanding the hardware, and making good use of the 48 KB of thread-shared memory, I achieved 209 GFLOP/s, without really doing any microoptimization. I don't think the CEP includes any features for intra-thread communication, so that's off the table. (My code is here: https://github.com/wavemoth/wavemoth/blob/cuda/wavemoth/cuda/legendre_transform.cu.in Though it's badly documented and rush-for-deadline-quality; I plan to polish it up and publish it when I get time in autumn). I guess I mention this as the kind of computation your CEP definitely does NOT cover. That's probably OK, but one should figure out specifically how many usecases it does cover (in particular with no control over thread blocks and intra-block communication). Is the CEP a 80%-solution, or a 10%-solution? Dag Sverre From dtcaciuc at gmail.com Wed Feb 8 18:35:52 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Wed, 8 Feb 2012 09:35:52 -0800 Subject: [Cython] OpenCL support In-Reply-To: <4F328ABF.4010107@astro.uio.no> References: <4F328ABF.4010107@astro.uio.no> Message-ID: On Wed, Feb 8, 2012 at 6:46 AM, Dag Sverre Seljebotn wrote: > On 02/05/2012 10:57 PM, mark florisson wrote: > > I don't really know how good the Intel and AMD CPU drivers are w.r.t. this > -- I have seen the Intel driver emit "vectorizing" and "could not > vectorize", but didn't explore the circumstances. For our project, we've tried both Intel and AMD (previously ATI) backends. The AMD experience somewhat mirrors what this developer described (http://www.msoos.org/2012/01/amds-opencl-heaven-and-hell/), although not as bad in terms of silent failures (or maybe I just havent caught any!). Intel backend was great and clearly better in terms of performance, sometimes by about 20-30%. However, when ran on older AMD-based machine as opposed to Intel one, the resulting kernel simply segfaulted without any warning about an unsupported architecture (I think its because it didn't have SSE3 support). > > Dag Sverre > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I know Intel is working with LLVM/Clang folks to introduce their vectorization additions, at least to some degree, and LLVM seems to be consistently improving in this regard (eg http://blog.llvm.org/2011/12/llvm-31-vector-changes.html). I suppose if Cython emitted vectorization-friendly numerical loops, then appropriate C/C++ compiler should take care of this automatically, if used. Intel C++ can already do certain stuff like that (see http://software.intel.com/en-us/articles/a-guide-to-auto-vectorization-with-intel-c-compilers/), and GCC as well AFAIK. Dimitri. From markflorisson88 at gmail.com Wed Feb 8 23:11:43 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 8 Feb 2012 22:11:43 +0000 Subject: [Cython] OpenCL support In-Reply-To: <4F328ABF.4010107@astro.uio.no> References: <4F328ABF.4010107@astro.uio.no> Message-ID: On 8 February 2012 14:46, Dag Sverre Seljebotn wrote: > On 02/05/2012 10:57 PM, mark florisson wrote: >> >> Hey, >> >> I created a CEP for opencl support: >> http://wiki.cython.org/enhancements/opencl >> What do you think? > > > To start with my own conclusion on this, my feel is that it is too little > gain, at least for a GPU solution. There's already Theano for trivial > SIMD-stuff and PyOpenCL for the getting-hands-dirty stuff. (Of course, this > CEP would be more convenient to use than Theano if one is already using > Cython.) Yes, vector operations and elemental or reduction functions operator on vectors (which is what we can use Theano for, right?) don't quite merit the use of OpenCL. However, the upside is that OpenCL allows easier vectorization and multi-threading. We can appease to auto-vectorizing compilers, but e.g. using OpenMP for multithreading will still segfault the program if used outside the main thread with gcc's implementation. I believe intel allows you to use it in any thread. (Of course, keeping a thread pool around and managing it manually isn't too hard, but...) > But that's just my feeling, and I'm not the one potentially signing up to do > the work, so whether it is "worth it" is really not my decision, the > weighing is done with your weights, not mine. Given an implementation, I > definitely support the inclusion in Cython for these kind of features > (FWIW). > > First, CPU: > > OpenCL is probably a very good way of portably making use of SSE/AVX etc. > But to really get a payoff then I would think that the real value would be > in *not* using OpenCL vector types, just many threads, so that the OpenCL > driver does the dirty work of mapping each thread to each slot in the CPU > registers? I'd think the gain in using OpenCL is to emit scalar code and > leave the dirty work to OpenCL. If one does the hard part and mapped > variables to vectors and memory accesses to shuffles, one might as well go > the whole length and emit SSE/AVX rather than OpenCL to avoid the startup > overhead. > > I don't really know how good the Intel and AMD CPU drivers are w.r.t. this > -- I have seen the Intel driver emit "vectorizing" and "could not > vectorize", but didn't explore the circumstances. > I initially thought the same thing, single kernel invocations should be trivially auto-vectorizable one would think. At least with Apple OpenCL I am getting better performance with vector types though on the CPU (up to 35%). I would personally consider emitting vector data types bonus points. But I don't quite agree that emitting SSE or AVX directly would be almost as easy in that case. You'd still have to detect at runtime which instruction set is supported and generate SSE, SSE2, (SSE4?) and AVX. And that's not even all of them :) The OpenCL drivers just hide that pain. With handwritten code you might be coding for a specific architecture and might be fine with only SSE2, but as a compiler we can't really make that same decision. > Then, on to GPU: > > It is not a generic-purpose solution, you still need to bring in pyopencl > for lots of cases, and so the question is how many cases it fits with and if > it is enough to grow a userbase around it. And, importantly, how much > performance is sacrificed for the resulting user-friendlyness. 50% > performance hit is usually OK, 95% maybe not. And a 95% hit is not > unimaginable if the memory movement is done in a bad way for some code? Yes, I don't expect this to change a lot suddenly. In the long term I think the implementation could be sufficiently good to support at least most codes. And the user still has full control over data movement, if wanted (the pinning thing, which isn't mentioned in the CEP). > I think the fundamental problem is one of programming paradigms. Fortran, > C++, Cython are all sequential in nature; even with OpenMP it is like you > have a modest bit of parallelism tacked on to speed up a sequential-looking > program. With "massively parallel" solutions such as CUDA and OpenCL, and > also MPI in fact, the fundamental assumption that you have thousands or > hundreds of thousands of threads. And that just changes how you need to > think about writing code, which would tend to show up at a syntax level. So, > at least if you want good performance, you need to change your way of > thinking enough that a new syntax (loosely cooperating threads rather than > parallel-for-loop or SIMD instruction) is actually an advantage, as it keeps > you reminded of how the hardware works. > > So I think the most important thing to do (if you bother) is: Gather a set > of real worl(-ish) CUDA or OpenCL programs, port them to Cython + this CEP > (without a working Cython implementation for it), and see how that goes. > That's really the only way to evaluate it. I've been wanting to do that for a long time now, also to evaluate the capabilities of cython.parallel as it stands now. It's a really good idea, I'll try to port some codes, and not just the trivial ones like Jacobi's method :). > Some experiences from the single instance GPU code I've written: > > ?- For starters I had to give up OpenCL and use CUDA to use all the 48 KB > available shared memory on Nvidia compute-capability-2.0 (perhaps I just > didn't find the OpenCL option for that). And increasing from 16 to 48 KB > allowed a fundamentally faster and qualitatively different algorithm to be > used. But OpenCL vs. CUDA is kind of beside the point here.... > > ?- When mucking about with various "obvious" ports of sequential code to GPU > code, I got performance in the range of 5 to 20 GFLOP/s (out of 490 GFLOP/s > or so theoretical; NVidia Tesla M2050). When really understanding the > hardware, and making good use of the 48 KB of thread-shared memory, I > achieved 209 GFLOP/s, without really doing any microoptimization. I don't > think the CEP includes any features for intra-thread communication, so > that's off the table. The CEP doesn't mention barriers (discussed earlier), but they should be supported, and __local memory (that's "shared memory" in CUDA terms right?) could be utilized using a more explicit scheme (or implicitly if the compiler is smart). The only issue with barriers is that with OpenCL you have multiple levels of synchronization, but barriers only work within the work group / thread block, whereas with openmp it works simply for all your threads. I think a global barrier would have to mean kernel termination + start of a new one, which could be hard to support depending on where it is placed in the code... > (My code is here: > > https://github.com/wavemoth/wavemoth/blob/cuda/wavemoth/cuda/legendre_transform.cu.in > > Though it's badly documented and rush-for-deadline-quality; I plan to polish > it up and publish it when I get time in autumn). > > I guess I mention this as the kind of computation your CEP definitely does > NOT cover. That's probably OK, but one should figure out specifically how > many usecases it does cover (in particular with no control over thread > blocks and intra-block communication). Is the CEP a 80%-solution, or a > 10%-solution? I haven't looked too carefully at the code, but a large portion is dedicated to a reduction right? What I don't see is how your reduction spans across multiple work-groups / thread blocks? Because __syncthreads should only sync stuff within a single block. The CEP didn't mention reductions, but they should be supported (I'm thinking multi-stage or sequential within the workgroup (whatever works better), followed by another kernel invocation if the result is needed). As mentioned earlier in a different thread (on parallelism I think), reduction arrays (i.e. memoryviews or C arrays) as well as generally private arrays should be supported. An issue with that is that you can't really dedicate an array to each work item / thread (too much memory would be consumed). Again, declarations within blocks would solve many problems: cdef float[n] shared_by_work_group with parallel(): cdef float[n] local_to_work_group for i in prange(...): cdef float[n] local_to_work_item For arrays, the reductions could be somewhat more explicit, where there is an explicit 'my_memoryview += my_local_scratch_data'. That should probably only be allowed for memory local to the work group. Anyway, I'll try porting some numerical codes to this scheme over the coming weeks and see what is missing and how it can be solved. I still believe it can all be made to work quite properly, without adjusting the language to fit the hardware model. The prange (and OpenMP) model look like sequential code, but they tell the compiler a lot, namely that each iteration is independent and could therefore be scheduled as a separate thread. > Dag Sverre > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Wed Feb 8 23:13:45 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 8 Feb 2012 22:13:45 +0000 Subject: [Cython] OpenCL support In-Reply-To: References: <4F328ABF.4010107@astro.uio.no> Message-ID: On 8 February 2012 17:35, Dimitri Tcaciuc wrote: > On Wed, Feb 8, 2012 at 6:46 AM, Dag Sverre Seljebotn > wrote: >> On 02/05/2012 10:57 PM, mark florisson wrote: >> >> I don't really know how good the Intel and AMD CPU drivers are w.r.t. this >> -- I have seen the Intel driver emit "vectorizing" and "could not >> vectorize", but didn't explore the circumstances. > > For our project, we've tried both Intel and AMD (previously ATI) > backends. The AMD experience somewhat mirrors what this developer > described (http://www.msoos.org/2012/01/amds-opencl-heaven-and-hell/), > although not as bad in terms of silent failures (or maybe I just > havent caught any!). > > Intel backend was great and clearly better in terms of performance, > sometimes by about 20-30%. However, when ran on older AMD-based > machine as opposed to Intel one, the resulting kernel simply > segfaulted without any warning about an unsupported architecture (I > think its because it didn't have SSE3 support). > >> >> Dag Sverre >> >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > > I know Intel is working with LLVM/Clang folks to introduce their > vectorization additions, at least to some degree, and LLVM seems to be > consistently improving in this regard (eg > http://blog.llvm.org/2011/12/llvm-31-vector-changes.html). I suppose > if Cython emitted vectorization-friendly numerical loops, then > appropriate C/C++ compiler should take care of this automatically, if > used. Intel C++ can already do certain stuff like that (see > http://software.intel.com/en-us/articles/a-guide-to-auto-vectorization-with-intel-c-compilers/), > and GCC as well AFAIK. Indeed, native C (hopefully auto-vectorized whenever possible) is what we also hope to use (depending on heuristics). But what it doesn't give you is multithreading for the CPU (and e.g. the grand central dispatch on OS X). > Dimitri. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Thu Feb 9 00:15:43 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 09 Feb 2012 00:15:43 +0100 Subject: [Cython] OpenCL support In-Reply-To: References: <4F328ABF.4010107@astro.uio.no> Message-ID: <4F33021F.7010903@astro.uio.no> On 02/08/2012 11:11 PM, mark florisson wrote: > On 8 February 2012 14:46, Dag Sverre Seljebotn > wrote: >> On 02/05/2012 10:57 PM, mark florisson wrote: >>> >>> Hey, >>> >>> I created a CEP for opencl support: >>> http://wiki.cython.org/enhancements/opencl >>> What do you think? >> >> >> To start with my own conclusion on this, my feel is that it is too little >> gain, at least for a GPU solution. There's already Theano for trivial >> SIMD-stuff and PyOpenCL for the getting-hands-dirty stuff. (Of course, this >> CEP would be more convenient to use than Theano if one is already using >> Cython.) > > Yes, vector operations and elemental or reduction functions operator > on vectors (which is what we can use Theano for, right?) don't quite > merit the use of OpenCL. However, the upside is that OpenCL allows > easier vectorization and multi-threading. We can appease to > auto-vectorizing compilers, but e.g. using OpenMP for multithreading > will still segfault the program if used outside the main thread with > gcc's implementation. I believe intel allows you to use it in any > thread. (Of course, keeping a thread pool around and managing it > manually isn't too hard, but...) > >> But that's just my feeling, and I'm not the one potentially signing up to do >> the work, so whether it is "worth it" is really not my decision, the >> weighing is done with your weights, not mine. Given an implementation, I >> definitely support the inclusion in Cython for these kind of features >> (FWIW). >> >> First, CPU: >> >> OpenCL is probably a very good way of portably making use of SSE/AVX etc. >> But to really get a payoff then I would think that the real value would be >> in *not* using OpenCL vector types, just many threads, so that the OpenCL >> driver does the dirty work of mapping each thread to each slot in the CPU >> registers? I'd think the gain in using OpenCL is to emit scalar code and >> leave the dirty work to OpenCL. If one does the hard part and mapped >> variables to vectors and memory accesses to shuffles, one might as well go >> the whole length and emit SSE/AVX rather than OpenCL to avoid the startup >> overhead. >> >> I don't really know how good the Intel and AMD CPU drivers are w.r.t. this >> -- I have seen the Intel driver emit "vectorizing" and "could not >> vectorize", but didn't explore the circumstances. >> > > I initially thought the same thing, single kernel invocations should > be trivially auto-vectorizable one would think. At least with Apple > OpenCL I am getting better performance with vector types though on the > CPU (up to 35%). I would personally consider emitting vector data > types bonus points. > > But I don't quite agree that emitting SSE or AVX directly would be > almost as easy in that case. You'd still have to detect at runtime > which instruction set is supported and generate SSE, SSE2, (SSE4?) and > AVX. And that's not even all of them :) The OpenCL drivers just hide > that pain. With handwritten code you might be coding for a specific > architecture and might be fine with only SSE2, but as a compiler we > can't really make that same decision. You make good points. > >> Then, on to GPU: >> >> It is not a generic-purpose solution, you still need to bring in pyopencl >> for lots of cases, and so the question is how many cases it fits with and if >> it is enough to grow a userbase around it. And, importantly, how much >> performance is sacrificed for the resulting user-friendlyness. 50% >> performance hit is usually OK, 95% maybe not. And a 95% hit is not >> unimaginable if the memory movement is done in a bad way for some code? > > Yes, I don't expect this to change a lot suddenly. In the long term I > think the implementation could be sufficiently good to support at > least most codes. And the user still has full control over data > movement, if wanted (the pinning thing, which isn't mentioned in the > CEP). > >> I think the fundamental problem is one of programming paradigms. Fortran, >> C++, Cython are all sequential in nature; even with OpenMP it is like you >> have a modest bit of parallelism tacked on to speed up a sequential-looking >> program. With "massively parallel" solutions such as CUDA and OpenCL, and >> also MPI in fact, the fundamental assumption that you have thousands or >> hundreds of thousands of threads. And that just changes how you need to >> think about writing code, which would tend to show up at a syntax level. So, >> at least if you want good performance, you need to change your way of >> thinking enough that a new syntax (loosely cooperating threads rather than >> parallel-for-loop or SIMD instruction) is actually an advantage, as it keeps >> you reminded of how the hardware works. >> >> So I think the most important thing to do (if you bother) is: Gather a set >> of real worl(-ish) CUDA or OpenCL programs, port them to Cython + this CEP >> (without a working Cython implementation for it), and see how that goes. >> That's really the only way to evaluate it. > > I've been wanting to do that for a long time now, also to evaluate the > capabilities of cython.parallel as it stands now. It's a really good > idea, I'll try to port some codes, and not just the trivial ones like > Jacobi's method :). > >> Some experiences from the single instance GPU code I've written: >> >> - For starters I had to give up OpenCL and use CUDA to use all the 48 KB >> available shared memory on Nvidia compute-capability-2.0 (perhaps I just >> didn't find the OpenCL option for that). And increasing from 16 to 48 KB >> allowed a fundamentally faster and qualitatively different algorithm to be >> used. But OpenCL vs. CUDA is kind of beside the point here.... >> >> - When mucking about with various "obvious" ports of sequential code to GPU >> code, I got performance in the range of 5 to 20 GFLOP/s (out of 490 GFLOP/s >> or so theoretical; NVidia Tesla M2050). When really understanding the >> hardware, and making good use of the 48 KB of thread-shared memory, I >> achieved 209 GFLOP/s, without really doing any microoptimization. I don't >> think the CEP includes any features for intra-thread communication, so >> that's off the table. > > The CEP doesn't mention barriers (discussed earlier), but they should > be supported, and __local memory (that's "shared memory" in CUDA terms > right?) could be utilized using a more explicit scheme (or implicitly > if the compiler is smart). The only issue with barriers is that with > OpenCL you have multiple levels of synchronization, but barriers only > work within the work group / thread block, whereas with openmp it > works simply for all your threads. I think a global barrier would have > to mean kernel termination + start of a new one, which could be hard > to support depending on where it is placed in the code... > >> (My code is here: >> >> https://github.com/wavemoth/wavemoth/blob/cuda/wavemoth/cuda/legendre_transform.cu.in >> >> Though it's badly documented and rush-for-deadline-quality; I plan to polish >> it up and publish it when I get time in autumn). >> >> I guess I mention this as the kind of computation your CEP definitely does >> NOT cover. That's probably OK, but one should figure out specifically how >> many usecases it does cover (in particular with no control over thread >> blocks and intra-block communication). Is the CEP a 80%-solution, or a >> 10%-solution? > > I haven't looked too carefully at the code, but a large portion is > dedicated to a reduction right? What I don't see is how your reduction > spans across multiple work-groups / thread blocks? Because > __syncthreads should only sync stuff within a single block. The CEP There's no need to reduce across thread blocks because (conveniently enough) there's 8000 independent computations to be performed with different parameters. I simply used one thread block for each problem. It's basically a matrix-vector product where the matrix must be generated on the fly columnwise (one entry can be generated from the preceding two in the same column), but the summation is row-wise. And turns out that getting inter-thread sum-reduction to work well was harder than I expected; a 32-by-32 matrix (needed since warps are 32 threads) is too big to fit in memory, but tree-reduction makes a lot of the threads in a warp do nothing. So I ended up with a hybrid approach; there's visual demo from page 49 onwards here: http://folk.uio.no/dagss/talk-gpusht.pdf Getting back to Cython, I'll admit that this form of inter-thread reduction is quite generic, and that my specific problem could be solved by basically coding a set of inter-thread reduction algorithms suitable for different hardware into Cython. > didn't mention reductions, but they should be supported (I'm thinking > multi-stage or sequential within the workgroup (whatever works > better), followed by another kernel invocation if the result is > needed). Multiple kernel invocations for global barriers appear to be pretty standard, and it's why OpenCL support queueing tasks with dependencies etc. > As mentioned earlier in a different thread (on parallelism I think), > reduction arrays (i.e. memoryviews or C arrays) as well as generally > private arrays should be supported. An issue with that is that you > can't really dedicate an array to each work item / thread (too much > memory would be consumed). > > Again, declarations within blocks would solve many problems: > > cdef float[n] shared_by_work_group > with parallel(): > cdef float[n] local_to_work_group > for i in prange(...): > cdef float[n] local_to_work_item > > For arrays, the reductions could be somewhat more explicit, where > there is an explicit 'my_memoryview += my_local_scratch_data'. That > should probably only be allowed for memory local to the work group. > > Anyway, I'll try porting some numerical codes to this scheme over the > coming weeks and see what is missing and how it can be solved. I still > believe it can all be made to work quite properly, without adjusting > the language to fit the hardware model. The prange (and OpenMP) model > look like sequential code, but they tell the compiler a lot, namely > that each iteration is independent and could therefore be scheduled as > a separate thread. Again, good points. Dag From d.s.seljebotn at astro.uio.no Thu Feb 9 00:28:29 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 09 Feb 2012 00:28:29 +0100 Subject: [Cython] OpenCL support In-Reply-To: <4F33021F.7010903@astro.uio.no> References: <4F328ABF.4010107@astro.uio.no> <4F33021F.7010903@astro.uio.no> Message-ID: <4F33051D.5050309@astro.uio.no> On 02/09/2012 12:15 AM, Dag Sverre Seljebotn wrote: > On 02/08/2012 11:11 PM, mark florisson wrote: >> On 8 February 2012 14:46, Dag Sverre Seljebotn >> wrote: >>> On 02/05/2012 10:57 PM, mark florisson wrote: >>>> >>>> Hey, >>>> >>>> I created a CEP for opencl support: >>>> http://wiki.cython.org/enhancements/opencl >>>> What do you think? >>> >>> >>> To start with my own conclusion on this, my feel is that it is too >>> little >>> gain, at least for a GPU solution. There's already Theano for trivial >>> SIMD-stuff and PyOpenCL for the getting-hands-dirty stuff. (Of >>> course, this >>> CEP would be more convenient to use than Theano if one is already using >>> Cython.) >> >> Yes, vector operations and elemental or reduction functions operator >> on vectors (which is what we can use Theano for, right?) don't quite >> merit the use of OpenCL. However, the upside is that OpenCL allows >> easier vectorization and multi-threading. We can appease to >> auto-vectorizing compilers, but e.g. using OpenMP for multithreading >> will still segfault the program if used outside the main thread with >> gcc's implementation. I believe intel allows you to use it in any >> thread. (Of course, keeping a thread pool around and managing it >> manually isn't too hard, but...) >> >>> But that's just my feeling, and I'm not the one potentially signing >>> up to do >>> the work, so whether it is "worth it" is really not my decision, the >>> weighing is done with your weights, not mine. Given an implementation, I >>> definitely support the inclusion in Cython for these kind of features >>> (FWIW). >>> >>> First, CPU: >>> >>> OpenCL is probably a very good way of portably making use of SSE/AVX >>> etc. >>> But to really get a payoff then I would think that the real value >>> would be >>> in *not* using OpenCL vector types, just many threads, so that the >>> OpenCL >>> driver does the dirty work of mapping each thread to each slot in the >>> CPU >>> registers? I'd think the gain in using OpenCL is to emit scalar code and >>> leave the dirty work to OpenCL. If one does the hard part and mapped >>> variables to vectors and memory accesses to shuffles, one might as >>> well go >>> the whole length and emit SSE/AVX rather than OpenCL to avoid the >>> startup >>> overhead. >>> >>> I don't really know how good the Intel and AMD CPU drivers are w.r.t. >>> this >>> -- I have seen the Intel driver emit "vectorizing" and "could not >>> vectorize", but didn't explore the circumstances. >>> >> >> I initially thought the same thing, single kernel invocations should >> be trivially auto-vectorizable one would think. At least with Apple >> OpenCL I am getting better performance with vector types though on the >> CPU (up to 35%). I would personally consider emitting vector data >> types bonus points. >> >> But I don't quite agree that emitting SSE or AVX directly would be >> almost as easy in that case. You'd still have to detect at runtime >> which instruction set is supported and generate SSE, SSE2, (SSE4?) and >> AVX. And that's not even all of them :) The OpenCL drivers just hide >> that pain. With handwritten code you might be coding for a specific >> architecture and might be fine with only SSE2, but as a compiler we >> can't really make that same decision. > > You make good points. > >> >>> Then, on to GPU: >>> >>> It is not a generic-purpose solution, you still need to bring in >>> pyopencl >>> for lots of cases, and so the question is how many cases it fits with >>> and if >>> it is enough to grow a userbase around it. And, importantly, how much >>> performance is sacrificed for the resulting user-friendlyness. 50% >>> performance hit is usually OK, 95% maybe not. And a 95% hit is not >>> unimaginable if the memory movement is done in a bad way for some code? >> >> Yes, I don't expect this to change a lot suddenly. In the long term I >> think the implementation could be sufficiently good to support at >> least most codes. And the user still has full control over data >> movement, if wanted (the pinning thing, which isn't mentioned in the >> CEP). >> >>> I think the fundamental problem is one of programming paradigms. >>> Fortran, >>> C++, Cython are all sequential in nature; even with OpenMP it is like >>> you >>> have a modest bit of parallelism tacked on to speed up a >>> sequential-looking >>> program. With "massively parallel" solutions such as CUDA and OpenCL, >>> and >>> also MPI in fact, the fundamental assumption that you have thousands or >>> hundreds of thousands of threads. And that just changes how you need to >>> think about writing code, which would tend to show up at a syntax >>> level. So, >>> at least if you want good performance, you need to change your way of >>> thinking enough that a new syntax (loosely cooperating threads rather >>> than >>> parallel-for-loop or SIMD instruction) is actually an advantage, as >>> it keeps >>> you reminded of how the hardware works. >>> >>> So I think the most important thing to do (if you bother) is: Gather >>> a set >>> of real worl(-ish) CUDA or OpenCL programs, port them to Cython + >>> this CEP >>> (without a working Cython implementation for it), and see how that goes. >>> That's really the only way to evaluate it. >> >> I've been wanting to do that for a long time now, also to evaluate the >> capabilities of cython.parallel as it stands now. It's a really good >> idea, I'll try to port some codes, and not just the trivial ones like >> Jacobi's method :). >> >>> Some experiences from the single instance GPU code I've written: >>> >>> - For starters I had to give up OpenCL and use CUDA to use all the 48 KB >>> available shared memory on Nvidia compute-capability-2.0 (perhaps I just >>> didn't find the OpenCL option for that). And increasing from 16 to 48 KB >>> allowed a fundamentally faster and qualitatively different algorithm >>> to be >>> used. But OpenCL vs. CUDA is kind of beside the point here.... >>> >>> - When mucking about with various "obvious" ports of sequential code >>> to GPU >>> code, I got performance in the range of 5 to 20 GFLOP/s (out of 490 >>> GFLOP/s >>> or so theoretical; NVidia Tesla M2050). When really understanding the >>> hardware, and making good use of the 48 KB of thread-shared memory, I >>> achieved 209 GFLOP/s, without really doing any microoptimization. I >>> don't >>> think the CEP includes any features for intra-thread communication, so >>> that's off the table. >> >> The CEP doesn't mention barriers (discussed earlier), but they should >> be supported, and __local memory (that's "shared memory" in CUDA terms >> right?) could be utilized using a more explicit scheme (or implicitly >> if the compiler is smart). The only issue with barriers is that with >> OpenCL you have multiple levels of synchronization, but barriers only >> work within the work group / thread block, whereas with openmp it >> works simply for all your threads. I think a global barrier would have >> to mean kernel termination + start of a new one, which could be hard >> to support depending on where it is placed in the code... >> >>> (My code is here: >>> >>> https://github.com/wavemoth/wavemoth/blob/cuda/wavemoth/cuda/legendre_transform.cu.in >>> >>> >>> Though it's badly documented and rush-for-deadline-quality; I plan to >>> polish >>> it up and publish it when I get time in autumn). >>> >>> I guess I mention this as the kind of computation your CEP definitely >>> does >>> NOT cover. That's probably OK, but one should figure out specifically >>> how >>> many usecases it does cover (in particular with no control over thread >>> blocks and intra-block communication). Is the CEP a 80%-solution, or a >>> 10%-solution? >> >> I haven't looked too carefully at the code, but a large portion is >> dedicated to a reduction right? What I don't see is how your reduction >> spans across multiple work-groups / thread blocks? Because >> __syncthreads should only sync stuff within a single block. The CEP Most of the time there's actually no explicit synchronization, but the code relies on all threads of a warp being on the same instruction in the scheduler. __synchtreads is then only used at the end of the reduction when all within-warp additions have been done. Calling __syncthreads at each step of the algorithm would have totally killed performance. Dag Sverre > > There's no need to reduce across thread blocks because (conveniently > enough) there's 8000 independent computations to be performed with > different parameters. I simply used one thread block for each problem. > > It's basically a matrix-vector product where the matrix must be > generated on the fly columnwise (one entry can be generated from the > preceding two in the same column), but the summation is row-wise. > > And turns out that getting inter-thread sum-reduction to work well was > harder than I expected; a 32-by-32 matrix (needed since warps are 32 > threads) is too big to fit in memory, but tree-reduction makes a lot of > the threads in a warp do nothing. So I ended up with a hybrid approach; > there's visual demo from page 49 onwards here: > > http://folk.uio.no/dagss/talk-gpusht.pdf > > Getting back to Cython, I'll admit that this form of inter-thread > reduction is quite generic, and that my specific problem could be solved > by basically coding a set of inter-thread reduction algorithms suitable > for different hardware into Cython. > >> didn't mention reductions, but they should be supported (I'm thinking >> multi-stage or sequential within the workgroup (whatever works >> better), followed by another kernel invocation if the result is >> needed). > > Multiple kernel invocations for global barriers appear to be pretty > standard, and it's why OpenCL support queueing tasks with dependencies etc. > >> As mentioned earlier in a different thread (on parallelism I think), >> reduction arrays (i.e. memoryviews or C arrays) as well as generally >> private arrays should be supported. An issue with that is that you >> can't really dedicate an array to each work item / thread (too much >> memory would be consumed). >> >> Again, declarations within blocks would solve many problems: >> >> cdef float[n] shared_by_work_group >> with parallel(): >> cdef float[n] local_to_work_group >> for i in prange(...): >> cdef float[n] local_to_work_item >> >> For arrays, the reductions could be somewhat more explicit, where >> there is an explicit 'my_memoryview += my_local_scratch_data'. That >> should probably only be allowed for memory local to the work group. >> >> Anyway, I'll try porting some numerical codes to this scheme over the >> coming weeks and see what is missing and how it can be solved. I still >> believe it can all be made to work quite properly, without adjusting >> the language to fit the hardware model. The prange (and OpenMP) model >> look like sequential code, but they tell the compiler a lot, namely >> that each iteration is independent and could therefore be scheduled as >> a separate thread. > > Again, good points. > > Dag > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Thu Feb 9 13:52:07 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 9 Feb 2012 12:52:07 +0000 Subject: [Cython] OpenCL support In-Reply-To: <4F33051D.5050309@astro.uio.no> References: <4F328ABF.4010107@astro.uio.no> <4F33021F.7010903@astro.uio.no> <4F33051D.5050309@astro.uio.no> Message-ID: On 8 February 2012 23:28, Dag Sverre Seljebotn wrote: > On 02/09/2012 12:15 AM, Dag Sverre Seljebotn wrote: >> >> On 02/08/2012 11:11 PM, mark florisson wrote: >>> >>> On 8 February 2012 14:46, Dag Sverre Seljebotn >>> wrote: >>>> >>>> On 02/05/2012 10:57 PM, mark florisson wrote: >>>>> >>>>> >>>>> Hey, >>>>> >>>>> I created a CEP for opencl support: >>>>> http://wiki.cython.org/enhancements/opencl >>>>> What do you think? >>>> >>>> >>>> >>>> To start with my own conclusion on this, my feel is that it is too >>>> little >>>> gain, at least for a GPU solution. There's already Theano for trivial >>>> SIMD-stuff and PyOpenCL for the getting-hands-dirty stuff. (Of >>>> course, this >>>> CEP would be more convenient to use than Theano if one is already using >>>> Cython.) >>> >>> >>> Yes, vector operations and elemental or reduction functions operator >>> on vectors (which is what we can use Theano for, right?) don't quite >>> merit the use of OpenCL. However, the upside is that OpenCL allows >>> easier vectorization and multi-threading. We can appease to >>> auto-vectorizing compilers, but e.g. using OpenMP for multithreading >>> will still segfault the program if used outside the main thread with >>> gcc's implementation. I believe intel allows you to use it in any >>> thread. (Of course, keeping a thread pool around and managing it >>> manually isn't too hard, but...) >>> >>>> But that's just my feeling, and I'm not the one potentially signing >>>> up to do >>>> the work, so whether it is "worth it" is really not my decision, the >>>> weighing is done with your weights, not mine. Given an implementation, I >>>> definitely support the inclusion in Cython for these kind of features >>>> (FWIW). >>>> >>>> First, CPU: >>>> >>>> OpenCL is probably a very good way of portably making use of SSE/AVX >>>> etc. >>>> But to really get a payoff then I would think that the real value >>>> would be >>>> in *not* using OpenCL vector types, just many threads, so that the >>>> OpenCL >>>> driver does the dirty work of mapping each thread to each slot in the >>>> CPU >>>> registers? I'd think the gain in using OpenCL is to emit scalar code and >>>> leave the dirty work to OpenCL. If one does the hard part and mapped >>>> variables to vectors and memory accesses to shuffles, one might as >>>> well go >>>> the whole length and emit SSE/AVX rather than OpenCL to avoid the >>>> startup >>>> overhead. >>>> >>>> I don't really know how good the Intel and AMD CPU drivers are w.r.t. >>>> this >>>> -- I have seen the Intel driver emit "vectorizing" and "could not >>>> vectorize", but didn't explore the circumstances. >>>> >>> >>> I initially thought the same thing, single kernel invocations should >>> be trivially auto-vectorizable one would think. At least with Apple >>> OpenCL I am getting better performance with vector types though on the >>> CPU (up to 35%). I would personally consider emitting vector data >>> types bonus points. >>> >>> But I don't quite agree that emitting SSE or AVX directly would be >>> almost as easy in that case. You'd still have to detect at runtime >>> which instruction set is supported and generate SSE, SSE2, (SSE4?) and >>> AVX. And that's not even all of them :) The OpenCL drivers just hide >>> that pain. With handwritten code you might be coding for a specific >>> architecture and might be fine with only SSE2, but as a compiler we >>> can't really make that same decision. >> >> >> You make good points. >> >>> >>>> Then, on to GPU: >>>> >>>> It is not a generic-purpose solution, you still need to bring in >>>> pyopencl >>>> for lots of cases, and so the question is how many cases it fits with >>>> and if >>>> it is enough to grow a userbase around it. And, importantly, how much >>>> performance is sacrificed for the resulting user-friendlyness. 50% >>>> performance hit is usually OK, 95% maybe not. And a 95% hit is not >>>> unimaginable if the memory movement is done in a bad way for some code? >>> >>> >>> Yes, I don't expect this to change a lot suddenly. In the long term I >>> think the implementation could be sufficiently good to support at >>> least most codes. And the user still has full control over data >>> movement, if wanted (the pinning thing, which isn't mentioned in the >>> CEP). >>> >>>> I think the fundamental problem is one of programming paradigms. >>>> Fortran, >>>> C++, Cython are all sequential in nature; even with OpenMP it is like >>>> you >>>> have a modest bit of parallelism tacked on to speed up a >>>> sequential-looking >>>> program. With "massively parallel" solutions such as CUDA and OpenCL, >>>> and >>>> also MPI in fact, the fundamental assumption that you have thousands or >>>> hundreds of thousands of threads. And that just changes how you need to >>>> think about writing code, which would tend to show up at a syntax >>>> level. So, >>>> at least if you want good performance, you need to change your way of >>>> thinking enough that a new syntax (loosely cooperating threads rather >>>> than >>>> parallel-for-loop or SIMD instruction) is actually an advantage, as >>>> it keeps >>>> you reminded of how the hardware works. >>>> >>>> So I think the most important thing to do (if you bother) is: Gather >>>> a set >>>> of real worl(-ish) CUDA or OpenCL programs, port them to Cython + >>>> this CEP >>>> (without a working Cython implementation for it), and see how that goes. >>>> That's really the only way to evaluate it. >>> >>> >>> I've been wanting to do that for a long time now, also to evaluate the >>> capabilities of cython.parallel as it stands now. It's a really good >>> idea, I'll try to port some codes, and not just the trivial ones like >>> Jacobi's method :). >>> >>>> Some experiences from the single instance GPU code I've written: >>>> >>>> - For starters I had to give up OpenCL and use CUDA to use all the 48 KB >>>> available shared memory on Nvidia compute-capability-2.0 (perhaps I just >>>> didn't find the OpenCL option for that). And increasing from 16 to 48 KB >>>> allowed a fundamentally faster and qualitatively different algorithm >>>> to be >>>> used. But OpenCL vs. CUDA is kind of beside the point here.... >>>> >>>> - When mucking about with various "obvious" ports of sequential code >>>> to GPU >>>> code, I got performance in the range of 5 to 20 GFLOP/s (out of 490 >>>> GFLOP/s >>>> or so theoretical; NVidia Tesla M2050). When really understanding the >>>> hardware, and making good use of the 48 KB of thread-shared memory, I >>>> achieved 209 GFLOP/s, without really doing any microoptimization. I >>>> don't >>>> think the CEP includes any features for intra-thread communication, so >>>> that's off the table. >>> >>> >>> The CEP doesn't mention barriers (discussed earlier), but they should >>> be supported, and __local memory (that's "shared memory" in CUDA terms >>> right?) could be utilized using a more explicit scheme (or implicitly >>> if the compiler is smart). The only issue with barriers is that with >>> OpenCL you have multiple levels of synchronization, but barriers only >>> work within the work group / thread block, whereas with openmp it >>> works simply for all your threads. I think a global barrier would have >>> to mean kernel termination + start of a new one, which could be hard >>> to support depending on where it is placed in the code... >>> >>>> (My code is here: >>>> >>>> >>>> https://github.com/wavemoth/wavemoth/blob/cuda/wavemoth/cuda/legendre_transform.cu.in >>>> >>>> >>>> Though it's badly documented and rush-for-deadline-quality; I plan to >>>> polish >>>> it up and publish it when I get time in autumn). >>>> >>>> I guess I mention this as the kind of computation your CEP definitely >>>> does >>>> NOT cover. That's probably OK, but one should figure out specifically >>>> how >>>> many usecases it does cover (in particular with no control over thread >>>> blocks and intra-block communication). Is the CEP a 80%-solution, or a >>>> 10%-solution? >>> >>> >>> I haven't looked too carefully at the code, but a large portion is >>> dedicated to a reduction right? What I don't see is how your reduction >>> spans across multiple work-groups / thread blocks? Because >>> __syncthreads should only sync stuff within a single block. The CEP > > > Most of the time there's actually no explicit synchronization, but the code > relies on all threads of a warp being on the same instruction in the > scheduler. __synchtreads is then only used at the end of the reduction when > all within-warp additions have been done. Calling __syncthreads at each step > of the algorithm would have totally killed performance. > > Dag Sverre > Ah, clever. I don't think there's any way to figure out the warp size with OpenCL, but maybe if the user specifies it in some way similar optimizations can be made. >> >> There's no need to reduce across thread blocks because (conveniently >> enough) there's 8000 independent computations to be performed with >> different parameters. I simply used one thread block for each problem. >> >> It's basically a matrix-vector product where the matrix must be >> generated on the fly columnwise (one entry can be generated from the >> preceding two in the same column), but the summation is row-wise. >> >> And turns out that getting inter-thread sum-reduction to work well was >> harder than I expected; a 32-by-32 matrix (needed since warps are 32 >> threads) is too big to fit in memory, but tree-reduction makes a lot of >> the threads in a warp do nothing. So I ended up with a hybrid approach; >> there's visual demo from page 49 onwards here: >> >> http://folk.uio.no/dagss/talk-gpusht.pdf >> >> Getting back to Cython, I'll admit that this form of inter-thread >> reduction is quite generic, and that my specific problem could be solved >> by basically coding a set of inter-thread reduction algorithms suitable >> for different hardware into Cython. >> >>> didn't mention reductions, but they should be supported (I'm thinking >>> multi-stage or sequential within the workgroup (whatever works >>> better), followed by another kernel invocation if the result is >>> needed). >> >> >> Multiple kernel invocations for global barriers appear to be pretty >> standard, and it's why OpenCL support queueing tasks with dependencies >> etc. >> >>> As mentioned earlier in a different thread (on parallelism I think), >>> reduction arrays (i.e. memoryviews or C arrays) as well as generally >>> private arrays should be supported. An issue with that is that you >>> can't really dedicate an array to each work item / thread (too much >>> memory would be consumed). >>> >>> Again, declarations within blocks would solve many problems: >>> >>> cdef float[n] shared_by_work_group >>> with parallel(): >>> cdef float[n] local_to_work_group >>> for i in prange(...): >>> cdef float[n] local_to_work_item >>> >>> For arrays, the reductions could be somewhat more explicit, where >>> there is an explicit 'my_memoryview += my_local_scratch_data'. That >>> should probably only be allowed for memory local to the work group. >>> >>> Anyway, I'll try porting some numerical codes to this scheme over the >>> coming weeks and see what is missing and how it can be solved. I still >>> believe it can all be made to work quite properly, without adjusting >>> the language to fit the hardware model. The prange (and OpenMP) model >>> look like sequential code, but they tell the compiler a lot, namely >>> that each iteration is independent and could therefore be scheduled as >>> a separate thread. >> >> >> Again, good points. >> >> Dag >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From robertwb at math.washington.edu Sat Feb 11 20:52:28 2012 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Sat, 11 Feb 2012 11:52:28 -0800 Subject: [Cython] 0.16 release In-Reply-To: References: <4F1FEEEE.2060605@behnel.de>

<4F2083B0.9020209@creativetrax.com>

Message-ID: All of Sage passes except for one test: sage -t devel/sage/sage/misc/sageinspect.py ********************************************************************** File "/levi/scratch/robertwb/hudson/sage-4.8/devel/sage-main/sage/misc/sageinspect.py", line 970: sage: sage_getargspec(bernstein_polynomial_factory_ratlist.coeffs_bitsize) Expected: ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None) Got: ArgSpec(args=['self'], varargs=None, keywords=None, defaults=()) ********************************************************************** File "/levi/scratch/robertwb/hudson/sage-4.8/devel/sage-main/sage/misc/sageinspect.py", line 973: sage: sage_getargspec(BooleanMonomialMonoid.gen) Expected: ArgSpec(args=['self', 'i'], varargs=None, keywords=None, defaults=(0,)) Got: ArgSpec(args=['self', 'i'], varargs=None, keywords=None, defaults=()) ********************************************************************** 1 items had failures: 2 of 31 in __main__.example_21 ***Test Failed*** 2 failures. Any ideas why this would have changed? On Sun, Feb 5, 2012 at 2:31 AM, Robert Bradshaw wrote: > On Sat, Feb 4, 2012 at 10:49 AM, Vitja Makarov wrote: >> 2012/1/31 Robert Bradshaw : >>> On Sat, Jan 28, 2012 at 8:05 AM, Vitja Makarov wrote: >>>> 2012/1/26 Jason Grout : >>>>> On 1/25/12 11:39 AM, Robert Bradshaw wrote: >>>>>> >>>>>> install >>>>>> >>>>>> https://sage.math.washington.edu:8091/hudson/view/ext-libs/job/sage-build/lastSuccessfulBuild/artifact/cython-devel.spkg >>>>>> by downloading it and running "sage -i cython-devel.spkg" >>>>> >>>>> >>>>> >>>>> In fact, you could just do >>>>> >>>>> sage -i >>>>> https://sage.math.washington.edu:8091/hudson/view/ext-libs/job/sage-build/lastSuccessfulBuild/artifact/cython-devel.spkg >>>>> >>>>> and Sage will (at least, should) download it for you, so that's even one >>>>> less step! >>>>> >>>>> Jason >>>>> >>>> >>>> Thanks for detailed instruction! I've successfully built it. >>>> >>>> "sage -t -gdb ./...." doesn't work, is that a bug? >>>> >>>> vitja at mchome:~/Downloads/sage-4.8$ ./sage ?-t -gdb >>>> devel/sage/sage/combinat/sf/macdonald.py >>>> sage -t -gdb "devel/sage/sage/combinat/sf/macdonald.py" >>>> ******************************************************************************** >>>> Type r at the (gdb) prompt to run the doctests. >>>> Type bt if there is a crash to see a traceback. >>>> ******************************************************************************** >>>> gdb --args python /home/vitja/.sage//tmp/macdonald_6182.py >>>> starting cmd gdb --args python /home/vitja/.sage//tmp/macdonald_6182.py >>>> ImportError: No module named site >>>> ? ? ? ? [0.2 s] >>>> >>>> ---------------------------------------------------------------------- >>>> The following tests failed: >>>> >>>> >>>> ? ? ? ?sage -t -gdb "devel/sage/sage/combinat/sf/macdonald.py"release >>>> Total time for all tests: 0.2 seconds >>> >>> Yes, that's a bug. >>> >>>> I've found another way to run tests (using sage -sh and then direct >>>> python ~/.sage/tmp/...py) >>>> >>>> So I found one of the problems. Here is minimal cython example: >>>> >>>> def foo(values): >>>> ? ?return (0,)*len(values) >>>> foo([1,2,3]) >>>> >>>> len(values) somehow is passed as an integer to PyObject_Multiply() >>> >>> Yeah, that's a bug too :). >> >> I've fixed tuple mult_factor bug here: >> >> https://github.com/cython/cython/commit/2d4b85dbcef885fbdaf6a3b2daef7a017184a56f >> >> No more segfaults in sage-tests but still 7 errors. >> > > Thanks! I've looked into the other errors and I think it boils down to > our use of --disable-function-redefinition being incompatible with how > decorators work. Of course that was just a hack, so I've fixed sage to > build/startup without using that flag, but there's some strangeness > with import order now that I haven't had time to resolve yet. From vitja.makarov at gmail.com Sun Feb 12 07:45:46 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sun, 12 Feb 2012 10:45:46 +0400 Subject: [Cython] 0.16 release In-Reply-To: References: <4F1FEEEE.2060605@behnel.de>

<4F2083B0.9020209@creativetrax.com>

Message-ID: 2012/2/11 Robert Bradshaw : > All of Sage passes except for one test: > > sage -t ?devel/sage/sage/misc/sageinspect.py > ********************************************************************** > File "/levi/scratch/robertwb/hudson/sage-4.8/devel/sage-main/sage/misc/sageinspect.py", > line 970: > ? ?sage: sage_getargspec(bernstein_polynomial_factory_ratlist.coeffs_bitsize) > Expected: > ? ?ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None) > Got: > ? ?ArgSpec(args=['self'], varargs=None, keywords=None, defaults=()) > ********************************************************************** > File "/levi/scratch/robertwb/hudson/sage-4.8/devel/sage-main/sage/misc/sageinspect.py", > line 973: > ? ?sage: sage_getargspec(BooleanMonomialMonoid.gen) > Expected: > ? ?ArgSpec(args=['self', 'i'], varargs=None, keywords=None, defaults=(0,)) > Got: > ? ?ArgSpec(args=['self', 'i'], varargs=None, keywords=None, defaults=()) > ********************************************************************** > 1 items had failures: > ? 2 of ?31 in __main__.example_21 > ***Test Failed*** 2 failures. > > Any ideas why this would have changed? > CyFunction now provides its own code object. So inspect.getargs() is called instead of inspect.ArgSpec(*_sage_getargspec_cython(sage_getsource(obj))). It seems like func.func_defaults should be implemented. -- vitja. From vitja.makarov at gmail.com Sun Feb 12 15:06:52 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sun, 12 Feb 2012 18:06:52 +0400 Subject: [Cython] Bug in Cython producing incorrect C code In-Reply-To: References: <1327405070.15017.140661027320813@webmail.messagingengine.com> <4F1FB21A.9080407@behnel.de> <4F21A112.30803@behnel.de> <4F21A92F.6050401@behnel.de> Message-ID: 2012/2/4 Vitja Makarov : > 2012/1/26 mark florisson : >> On 26 January 2012 19:27, Stefan Behnel wrote: >>> mark florisson, 26.01.2012 20:15: >>>> On 26 January 2012 18:53, Stefan Behnel wrote: >>>>> mark florisson, 26.01.2012 16:20: >>>>>> I think this problem can trivially be solved by creating a ProxyNode >>>>>> that should never be replaced by any transform, but it's argument may >>>>>> be replaced. So you wrap self.rhs in a ProxyNode and use that to >>>>>> create your CloneNodes. >>>>> >>>>> I can't see what a ProxyNode would do that a CloneNode shouldn't do anyway. >>>> >>>> It wouldn't be a replacement, merely an addition (an extra indirection). >>> >>> What I was trying to say was that a ProxyNode would always be required by a >>> CloneNode, but I don't see where a ProxyNode would be needed outside of a >>> CloneNode. So it seems rather redundant and I don't know if we need a >>> separate node for it. >> >> Yes it would be needed only for that, but I think the only real >> alternative is to not use CloneNode at all, i.e. make the >> transformation Dag mentioned, where you create new rhs (NameNode?) >> references to the temporary result. >> > > Now it seems to be the only case when we got problem like this. It > means that clones may be safely created at very late stage. > So transforming CascadeAssignment into SingleAssignments doesn't solve > generic problem. > > I tried to implement conditional inlining the same problem may happen > there (ConditionalCallNode owns arguments and replaces > SimpleCallNode's args with clones). Splitting analyse_expressions() > would help. On the other hand moving this optimization after > OptimizeBuiltinCalls() would help too. > I tried to introduce finalize_expressions() here: https://github.com/vitek/cython/tree/_finalize_expressions I moved arg_tuple creation logic from SimpleCallNode's analyse_types() to finalize_expressions() so few tests are broken now. Now inlining is done right before AnalyseExpressions before arg_tuple is created (before pyobject coercion nodes are created). It must be run after expression analysis. So I'm completely sure that analyse_types() must be split. -- vitja. From markflorisson88 at gmail.com Sun Feb 12 15:35:01 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 12 Feb 2012 14:35:01 +0000 Subject: [Cython] Bug in Cython producing incorrect C code In-Reply-To: References: <1327405070.15017.140661027320813@webmail.messagingengine.com> <4F1FB21A.9080407@behnel.de> <4F21A112.30803@behnel.de> <4F21A92F.6050401@behnel.de>

Message-ID: On 12 February 2012 14:06, Vitja Makarov wrote: > 2012/2/4 Vitja Makarov : >> 2012/1/26 mark florisson : >>> On 26 January 2012 19:27, Stefan Behnel wrote: >>>> mark florisson, 26.01.2012 20:15: >>>>> On 26 January 2012 18:53, Stefan Behnel wrote: >>>>>> mark florisson, 26.01.2012 16:20: >>>>>>> I think this problem can trivially be solved by creating a ProxyNode >>>>>>> that should never be replaced by any transform, but it's argument may >>>>>>> be replaced. So you wrap self.rhs in a ProxyNode and use that to >>>>>>> create your CloneNodes. >>>>>> >>>>>> I can't see what a ProxyNode would do that a CloneNode shouldn't do anyway. >>>>> >>>>> It wouldn't be a replacement, merely an addition (an extra indirection). >>>> >>>> What I was trying to say was that a ProxyNode would always be required by a >>>> CloneNode, but I don't see where a ProxyNode would be needed outside of a >>>> CloneNode. So it seems rather redundant and I don't know if we need a >>>> separate node for it. >>> >>> Yes it would be needed only for that, but I think the only real >>> alternative is to not use CloneNode at all, i.e. make the >>> transformation Dag mentioned, where you create new rhs (NameNode?) >>> references to the temporary result. >>> >> >> Now it seems to be the only case when we got problem like this. It >> means that clones may be safely created at very late stage. >> So transforming CascadeAssignment into SingleAssignments doesn't solve >> generic problem. >> >> I tried to implement conditional inlining the same problem may happen >> there (ConditionalCallNode owns arguments and replaces >> SimpleCallNode's args with clones). Splitting analyse_expressions() >> would help. On the other hand moving this optimization after >> OptimizeBuiltinCalls() would help too. >> > > I tried to introduce finalize_expressions() here: > > https://github.com/vitek/cython/tree/_finalize_expressions > > I moved arg_tuple creation logic from SimpleCallNode's analyse_types() > to finalize_expressions() so few tests are broken now. > > Now inlining is done right before AnalyseExpressions before arg_tuple > is created (before pyobject coercion nodes are created). It must be > run after expression analysis. So I'm completely sure that > analyse_types() must be split. Ah, I didn't realize you were working on that, I fixed the cascaded assignment bug and pushed it to master a while ago. Anyway, if this fixes the problem for cascaded assignment, feel free to do a hard reset to remove those commits (probably want to keep the added tests though). > -- > vitja. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From vitja.makarov at gmail.com Sun Feb 12 16:15:23 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sun, 12 Feb 2012 19:15:23 +0400 Subject: [Cython] Bug in Cython producing incorrect C code In-Reply-To: References: <1327405070.15017.140661027320813@webmail.messagingengine.com> <4F1FB21A.9080407@behnel.de> <4F21A112.30803@behnel.de> <4F21A92F.6050401@behnel.de>

Message-ID: 2012/2/12 mark florisson : > On 12 February 2012 14:06, Vitja Makarov wrote: >> 2012/2/4 Vitja Makarov : >>> 2012/1/26 mark florisson : >>>> On 26 January 2012 19:27, Stefan Behnel wrote: >>>>> mark florisson, 26.01.2012 20:15: >>>>>> On 26 January 2012 18:53, Stefan Behnel wrote: >>>>>>> mark florisson, 26.01.2012 16:20: >>>>>>>> I think this problem can trivially be solved by creating a ProxyNode >>>>>>>> that should never be replaced by any transform, but it's argument may >>>>>>>> be replaced. So you wrap self.rhs in a ProxyNode and use that to >>>>>>>> create your CloneNodes. >>>>>>> >>>>>>> I can't see what a ProxyNode would do that a CloneNode shouldn't do anyway. >>>>>> >>>>>> It wouldn't be a replacement, merely an addition (an extra indirection). >>>>> >>>>> What I was trying to say was that a ProxyNode would always be required by a >>>>> CloneNode, but I don't see where a ProxyNode would be needed outside of a >>>>> CloneNode. So it seems rather redundant and I don't know if we need a >>>>> separate node for it. >>>> >>>> Yes it would be needed only for that, but I think the only real >>>> alternative is to not use CloneNode at all, i.e. make the >>>> transformation Dag mentioned, where you create new rhs (NameNode?) >>>> references to the temporary result. >>>> >>> >>> Now it seems to be the only case when we got problem like this. It >>> means that clones may be safely created at very late stage. >>> So transforming CascadeAssignment into SingleAssignments doesn't solve >>> generic problem. >>> >>> I tried to implement conditional inlining the same problem may happen >>> there (ConditionalCallNode owns arguments and replaces >>> SimpleCallNode's args with clones). Splitting analyse_expressions() >>> would help. On the other hand moving this optimization after >>> OptimizeBuiltinCalls() would help too. >>> >> >> I tried to introduce finalize_expressions() here: >> >> https://github.com/vitek/cython/tree/_finalize_expressions >> >> I moved arg_tuple creation logic from SimpleCallNode's analyse_types() >> to finalize_expressions() so few tests are broken now. >> >> Now inlining is done right before AnalyseExpressions before arg_tuple >> is created (before pyobject coercion nodes are created). It must be >> run after expression analysis. So I'm completely sure that >> analyse_types() must be split. > > Ah, I didn't realize you were working on that, I fixed the cascaded > assignment bug and pushed it to master a while ago. Anyway, if this > fixes the problem for cascaded assignment, feel free to do a hard > reset to remove those commits (probably want to keep the added tests > though). > Nice, I'm ok with your fix! Btw finalize_expressions() isn't done yet. It's a major change so it's better to wait until next release. -- vitja. From vitja.makarov at gmail.com Sun Feb 12 21:53:35 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Mon, 13 Feb 2012 00:53:35 +0400 Subject: [Cython] 0.16 release In-Reply-To: References: <4F1FEEEE.2060605@behnel.de>

<4F2083B0.9020209@creativetrax.com>

Message-ID: 2012/2/12 Vitja Makarov : > 2012/2/11 Robert Bradshaw : >> All of Sage passes except for one test: >> >> sage -t ?devel/sage/sage/misc/sageinspect.py >> ********************************************************************** >> File "/levi/scratch/robertwb/hudson/sage-4.8/devel/sage-main/sage/misc/sageinspect.py", >> line 970: >> ? ?sage: sage_getargspec(bernstein_polynomial_factory_ratlist.coeffs_bitsize) >> Expected: >> ? ?ArgSpec(args=['self'], varargs=None, keywords=None, defaults=None) >> Got: >> ? ?ArgSpec(args=['self'], varargs=None, keywords=None, defaults=()) >> ********************************************************************** >> File "/levi/scratch/robertwb/hudson/sage-4.8/devel/sage-main/sage/misc/sageinspect.py", >> line 973: >> ? ?sage: sage_getargspec(BooleanMonomialMonoid.gen) >> Expected: >> ? ?ArgSpec(args=['self', 'i'], varargs=None, keywords=None, defaults=(0,)) >> Got: >> ? ?ArgSpec(args=['self', 'i'], varargs=None, keywords=None, defaults=()) >> ********************************************************************** >> 1 items had failures: >> ? 2 of ?31 in __main__.example_21 >> ***Test Failed*** 2 failures. >> >> Any ideas why this would have changed? >> > > CyFunction now provides its own code object. So inspect.getargs() is > called instead of > inspect.ArgSpec(*_sage_getargspec_cython(sage_getsource(obj))). It > seems like func.func_defaults should be implemented. > > I've created a pull request: https://github.com/cython/cython/pull/88 -- vitja. From robertwb at math.washington.edu Tue Feb 14 08:07:11 2012 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Mon, 13 Feb 2012 23:07:11 -0800 Subject: [Cython] 0.16 release In-Reply-To: References: <4F1FEEEE.2060605@behnel.de>