From robertwb at gmail.com Thu Oct 3 05:38:33 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Wed, 2 Oct 2013 20:38:33 -0700
Subject: [Cython] Declaration syntax change
Message-ID:

In thinking about 1.0, this would be a chance to make
backwards-incompatible changes. One thing that has always bothered me
is the C-declarator decoration for specifying complex types.
Specifically,

    cdef int *a, b, c, *d[3]

is IMHO quite ugly but also adds a lot of complexity to the parser.
What if instead we required

    cdef int* a
    cdef int b, c
    cdef int[3]* d

In this case our grammar could be almost identical to the Python
grammar, with the addition of a "type declaration" that could occur in
certain situations plus a new type token.

Does anyone see any issues with this? We could have a period where we
warned if declarators were used, or simply disallow them outright
before changing the meaning. It'd be easy to create a tool one could
run on your sources that would "expand" such declarations as well.

- Robert

From gu.haijie at gmail.com Wed Oct 2 18:36:01 2013
From: gu.haijie at gmail.com (Haijie Gu)
Date: Wed, 2 Oct 2013 09:36:01 -0700
Subject: [Cython] Memory leak when using Typed Memory View and np array of objects
Message-ID:

Hi,

I'm new to Cython's typed memory views, and found some cases where a
function that uses a typed memory view leaks memory.

The leak happens when I pass a numpy array of objects, where each
object itself is a numpy array. (You can get this 'weird' object by
constructing a pandas Series from a list of numpy arrays.)

Please see the following code snippet or use the attached code to
reproduce the case.

I appreciate any help and suggestions in advance! (I also posted on the
cython-users google group. Apologies for the redundancy.)

# BEGIN CONTENT OF test.pyx

# this does not leak
cpdef int do_nothing(arr):
    return 0

# this does leak
cpdef int do_nothing_typed(double[:] arr):
    return 0

# this does leak
cpdef int do_nothing_but_copy(arr):
    cdef double[:] _arr = arr
    return 0

# END CONTENT OF test.pyx

# BEGIN CONTENT OF runtest.py
# ... omit all the imports here

def gc_obj_hist():
    """
    Returns a sorted map from type to the counts of in memory objects
    with the type
    """
    hst = defaultdict(lambda: 0)
    for v in gc.get_objects():
        hst[type(v)] += 1
    l = sorted(hst.iteritems(), key=operator.itemgetter(1), reverse=True)
    return l

# NOT LEAK
def test1(n=10000):
    s = pd.Series([np.random.randn(10) for i in range(n)])
    for i in range(n):
        do_nothing(s[i])
    print "Top 5 object types after test 1: " + str(gc_obj_hist()[:5])

# LEAK
def test2(n=10000):
    s = pd.Series([np.random.randn(10) for i in range(n)])
    for i in range(n):
        do_nothing_typed(s[i])
    print "Top 5 object types after test 2: " + str(gc_obj_hist()[:5])

# LEAK
def test3(n=10000):
    s = pd.Series([np.random.randn(10) for i in range(n)])
    for i in range(n):
        do_nothing_but_copy(s[i])
    print "Top 5 object types after test 3: " + str(gc_obj_hist()[:5])

# NOT LEAK
def test4(n=10000):
    s = pd.Series([np.random.randn(10) for i in range(n)])
    for i in range(n):
        do_nothing_but_copy(np.array(s[i]))
    print "Top 5 object types after test 4: " + str(gc_obj_hist()[:5])

if __name__ == "__main__":
    n = 100000
    test1(n)
    test2(n)
    test3(n)
    test4(n)

# END CONTENT OF runtest.py

Thanks,
-jay
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leaktest.tar.gz
Type: application/x-gzip
Size: 893 bytes
Desc: not available
URL:

From stefan_ml at behnel.de Thu Oct 3 13:34:21 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 03 Oct 2013 13:34:21 +0200
Subject: [Cython] Declaration syntax change
In-Reply-To:
References:
Message-ID: <524D563D.1020501@behnel.de>

Robert Bradshaw, 03.10.2013 05:38:
> In thinking about 1.0, this would be a chance to make
> backwards-incompatible changes. One thing that has always bothered me
> is the C-declarator decoration for specifying complex types.
> Specifically,
>
>     cdef int *a, b, c, *d[3]
>
> is IMHO quite ugly but also adds a lot of complexity to the parser.
> What if instead we required
>
>     cdef int* a
>     cdef int b, c
>     cdef int[3]* d

I think we've tossed this around long enough to consider it set.

> In this case our grammar could be almost identical to the Python
> grammar, with the addition of a "type declaration" that could occur in
> certain situations plus a new type token.

I don't expect the parser/grammar simplifications to be all that large.
There are still cdef functions, C function pointers and typed function
argument lists, for example, which will stay as they are and involve
quite a bit of parsing complexity all by themselves. Only type
declaration statements would benefit.

I guess we should give it a try in a branch to see what the code gain
is. There may still be a couple of nasty little details.

> We could have a period where we warned if declarators were used, or
> simply disallow them outright before changing the meaning.

If that means that people get urged into manually changing their code to
have one variable declaration per line in order to be future proof, then
that's asking a bit too much, I think. Better do a clean cut and provide
a tool that migrates the code to the next Cython major release.
We've always encouraged people not to depend on Cython in released
sources, so the situation is very different from the Py2/3 change where
people necessarily needed to support older versions.

> It'd be easy to create a tool one could run on your sources that would
> "expand" such declarations as well.

Yes, we should definitely provide something here. I'm pretty sure this
change is going to require modifications to almost all Cython code out
there. (Then again, if we actually break all that code anyway, maybe
there are other things to change as well? I'm not aware of any...)

Stefan

From greg.ewing at canterbury.ac.nz Thu Oct 3 14:10:24 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 04 Oct 2013 01:10:24 +1300
Subject: [Cython] Declaration syntax change
In-Reply-To:
References:
Message-ID: <524D5EB0.2070502@canterbury.ac.nz>

Robert Bradshaw wrote:
>     cdef int *a, b, c, *d[3]
>
> is IMHO quite ugly but also adds a lot of complexity to the parser.
> What if instead we required
>
>     cdef int* a
>     cdef int b, c
>     cdef int[3]* d

What would be the benefit of this? You're proposing to change
from something identical to C declaration syntax, which is
second nature for a great many people, to something that
looks deceptively like C syntax but isn't.

I can't see that causing anything other than a massive
amount of confusion, anguish and hair-tearing.

--
Greg

From stefan_ml at behnel.de Thu Oct 3 14:23:25 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 03 Oct 2013 14:23:25 +0200
Subject: [Cython] Declaration syntax change
In-Reply-To: <524D5EB0.2070502@canterbury.ac.nz>
References: <524D5EB0.2070502@canterbury.ac.nz>
Message-ID: <524D61BD.30000@behnel.de>

Greg Ewing, 03.10.2013 14:10:
> Robert Bradshaw wrote:
>>     cdef int *a, b, c, *d[3]
>>
>> is IMHO quite ugly but also adds a lot of complexity to the parser.
>> What if instead we required
>>
>>     cdef int* a
>>     cdef int b, c
>>     cdef int[3]* d

The last line looks ambiguous, BTW, hadn't even noticed it before. Is
that an array of int pointers or a pointer to an array (pointer)? We
should make sure the way this is declared is really obvious and not
unexpected to C users.

> What would be the benefit of this? You're proposing to change
> from something identical to C declaration syntax, which is
> second nature for a great many people, to something that
> looks deceptively like C syntax but isn't.

The reasoning is that the C syntax is error prone and less readable than
it could be, because you have to spot stars in the right places of a
potentially long list of variable names to know if something is a value
or a pointer. If there was only one type declaration, right after the
cdef, it would be much clearer. It would just say: "this is a list of
variables declared as int*", not mixing any further types into it.

Also, C is only second nature to some people. A great many people
actually use Cython specifically to *avoid* having to write C.

Stefan

From njs at pobox.com Thu Oct 3 14:35:11 2013
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 3 Oct 2013 13:35:11 +0100
Subject: [Cython] Declaration syntax change
In-Reply-To: <524D61BD.30000@behnel.de>
References: <524D5EB0.2070502@canterbury.ac.nz> <524D61BD.30000@behnel.de>
Message-ID:

On Thu, Oct 3, 2013 at 1:23 PM, Stefan Behnel wrote:
> Greg Ewing, 03.10.2013 14:10:
>> Robert Bradshaw wrote:
>>>     cdef int *a, b, c, *d[3]
>>>
>>> is IMHO quite ugly but also adds a lot of complexity to the parser.
>>> What if instead we required
>>>
>>>     cdef int* a
>>>     cdef int b, c
>>>     cdef int[3]* d
>
> The last line looks ambiguous, BTW, hadn't even noticed it before. Is that
> an array of int pointers or a pointer to an array (pointer)? We should make
> sure the way this is declared is really obvious and not unexpected to C users.
>
>
>> What would be the benefit of this?
>> You're proposing to change
>> from something identical to C declaration syntax, which is
>> second nature for a great many people, to something that
>> looks deceptively like C syntax but isn't.
>
> The reasoning is that the C syntax is error prone and less readable than it
> could be, because you have to spot stars in the right places of a
> potentially long list of variable names to know if something is a value or
> a pointer. If there was only one type declaration, right after the cdef, it
> would be much clearer. It would just say: "this is a list of variables
> declared as int*", not mixing any further types into it.
>
> Also, C is only second nature to some people. A great many people actually
> use Cython specifically to *avoid* having to write C.

The two halves of this email seem to sort of contradict each other,
don't you think? At least the C syntax has the advantage that it's
well-defined and many people *do* know it (and if they don't then
there are bazillions of references around, plus you can just copy it
out of header files if you're wrapping a C library), whereas as noted
above, in fact there are *no* people who know how to look at int[3]*
and be confident about what it means, even you...?

(I'm not against improving on C in general, but I'm far from convinced
that there's any syntax for encoding C types that's sufficiently
better than what C does to be worth the switching costs.)

If what really bothers you is having objects of different types
declared within the same statement then you could just litigate *that*
out of existence directly... not convinced this would be worthwhile
(though I do tend to use that style myself already), but it seems more
viable than trying to reinvent C's type syntax.
-n

From stefan_ml at behnel.de Thu Oct 3 16:00:23 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 03 Oct 2013 16:00:23 +0200
Subject: [Cython] Declaration syntax change
In-Reply-To:
References: <524D5EB0.2070502@canterbury.ac.nz> <524D61BD.30000@behnel.de>
Message-ID: <524D7877.4070205@behnel.de>

Nathaniel Smith, 03.10.2013 14:35:
> On Thu, Oct 3, 2013 at 1:23 PM, Stefan Behnel wrote:
>> Greg Ewing, 03.10.2013 14:10:
>>> Robert Bradshaw wrote:
>>>>     cdef int *a, b, c, *d[3]
>>>>
>>>> is IMHO quite ugly but also adds a lot of complexity to the parser.
>>>> What if instead we required
>>>>
>>>>     cdef int* a
>>>>     cdef int b, c
>>>>     cdef int[3]* d
>>
>> The last line looks ambiguous, BTW, hadn't even noticed it before. Is that
>> an array of int pointers or a pointer to an array (pointer)? We should make
>> sure the way this is declared is really obvious and not unexpected to C users.
> [...]
> The two halves of this email seem to sort of contradict each other,
> don't you think? At least the C syntax has the advantage that it's
> well-defined and many people *do* know it (and if they don't then
> there are bazillions of references around, plus you can just copy it
> out of header files if you're wrapping a C library), whereas as noted
> above, in fact there are *no* people who know how to look at int[3]*
> and be confident about what it means, even you...?

Well, it's still better than looking at "*d[3]", now, isn't it? Maybe
I'm just confused (by both, actually) because I'm not really breathing C.

If the following is what it's supposed to mean, then I find it quite
straightforward, and more obvious than the C spelling:

    cdef int* a      # pointer to int
    cdef int[3]* b   # pointer to 3-item int array
    cdef int*[3] c   # 3-item array of pointers to int

I agree that the argument of "copying it out of a header file" has a
certain value, although pointer declarations are not exactly the most
common thing to find in header files.
We shouldn't make it *harder* to copy things from header files, though,
because it's already a drawback of Cython that you have to do that at
all.

Stefan

From njs at pobox.com Thu Oct 3 16:13:38 2013
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 3 Oct 2013 15:13:38 +0100
Subject: [Cython] Declaration syntax change
In-Reply-To: <524D7877.4070205@behnel.de>
References: <524D5EB0.2070502@canterbury.ac.nz> <524D61BD.30000@behnel.de> <524D7877.4070205@behnel.de>
Message-ID:

On Thu, Oct 3, 2013 at 3:00 PM, Stefan Behnel wrote:
> Nathaniel Smith, 03.10.2013 14:35:
>> On Thu, Oct 3, 2013 at 1:23 PM, Stefan Behnel wrote:
>>> Greg Ewing, 03.10.2013 14:10:
>>>> Robert Bradshaw wrote:
>>>>>     cdef int *a, b, c, *d[3]
>>>>>
>>>>> is IMHO quite ugly but also adds a lot of complexity to the parser.
>>>>> What if instead we required
>>>>>
>>>>>     cdef int* a
>>>>>     cdef int b, c
>>>>>     cdef int[3]* d
>>>
>>> The last line looks ambiguous, BTW, hadn't even noticed it before. Is that
>>> an array of int pointers or a pointer to an array (pointer)? We should make
>>> sure the way this is declared is really obvious and not unexpected to C users.
>> [...]
>> The two halves of this email seem to sort of contradict each other,
>> don't you think? At least the C syntax has the advantage that it's
>> well-defined and many people *do* know it (and if they don't then
>> there are bazillions of references around, plus you can just copy it
>> out of header files if you're wrapping a C library), whereas as noted
>> above, in fact there are *no* people who know how to look at int[3]*
>> and be confident about what it means, even you...?
>
> Well, it's still better than looking at "*d[3]", now, isn't it? Maybe I'm
> just confused (by both, actually) because I'm not really breathing C.

Yeah, personally in either case I'd have to look it up (and it's
simply impossible that you're going to make it as easy to lookup this
funky Cython-specific syntax as it is to look up standard C syntax).
But also, the reason I don't know the C version already is that I've
probably never seen such a declaration in real life, which makes it
hard to see why this is a really pressing problem. I don't really come
to Cython because I want idiosyncratic tweaks to things C already does
perfectly well, you know? I come to Cython because I want a nice way
to get Python and C to talk to each other, so sticking to familiar
Python and C things is rather nice...

-n

From robertwb at gmail.com Thu Oct 3 18:03:48 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 3 Oct 2013 09:03:48 -0700
Subject: [Cython] Declaration syntax change
In-Reply-To:
References: <524D5EB0.2070502@canterbury.ac.nz> <524D61BD.30000@behnel.de> <524D7877.4070205@behnel.de>
Message-ID:

On Thu, Oct 3, 2013 at 7:13 AM, Nathaniel Smith wrote:
> On Thu, Oct 3, 2013 at 3:00 PM, Stefan Behnel wrote:
>> Nathaniel Smith, 03.10.2013 14:35:
>>> On Thu, Oct 3, 2013 at 1:23 PM, Stefan Behnel wrote:
>>>> Greg Ewing, 03.10.2013 14:10:
>>>>> Robert Bradshaw wrote:
>>>>>>     cdef int *a, b, c, *d[3]
>>>>>>
>>>>>> is IMHO quite ugly but also adds a lot of complexity to the parser.
>>>>>> What if instead we required
>>>>>>
>>>>>>     cdef int* a
>>>>>>     cdef int b, c
>>>>>>     cdef int[3]* d
>>>>
>>>> The last line looks ambiguous, BTW, hadn't even noticed it before. Is that
>>>> an array of int pointers or a pointer to an array (pointer)? We should make
>>>> sure the way this is declared is really obvious and not unexpected to C users.
>>> [...]
>>> The two halves of this email seem to sort of contradict each other,
>>> don't you think? At least the C syntax has the advantage that it's
>>> well-defined and many people *do* know it (and if they don't then
>>> there are bazillions of references around, plus you can just copy it
>>> out of header files if you're wrapping a C library), whereas as noted
>>> above, in fact there are *no* people who know how to look at int[3]*
>>> and be confident about what it means, even you...?
>>
>> Well, it's still better than looking at "*d[3]", now, isn't it? Maybe I'm
>> just confused (by both, actually) because I'm not really breathing C.
>
> Yeah, personally in either case I'd have to look it up (and it's
> simply impossible that you're going to make it as easy to lookup this
> funky Cython-specific syntax as it is to look up standard C syntax).
> But also, the reason I don't know the C version already is that I've
> probably never seen such a declaration in real life, which makes it
> hard to see why this is a really pressing problem.

Cause or effect :).

> I don't really come
> to Cython because I want idiosyncratic tweaks to things C already does
> perfectly well, you know?

I wouldn't classify this as something that "C already does perfectly
well." The fact that people commonly write

    int* ptr;

rather than

    int *ptr;

means that it's parsed differently in people's heads than the grammar,
and though it's hard to miss given the context of this thread I've
seen people gloss right over things like

    char* a, b;
    a = malloc(n);
    b = malloc(n);
    strcpy(b, a);

which, yes, is perfectly valid C (though any sane compiler will throw
out a warning). It should also be noted that an increasing percentage
of Cython users don't know C at all.

The rule would be very simple--type decorations would be left
associative, so an int*[5]** would be a pointer to a pointer to an
array of pointers to ints. Now, you're right that this doesn't come up
often, which is why it'll be easy to change, but it does complicate
the compiler (and hypothetical grammar). Ideally people shouldn't be
copying C headers in the long run, they should be parsed automatically
or by wrappers like xdress.

This would affect all variable declarations. E.g.

    cdef int *foo(int* a[3], int (*b)[3]):
        ...

would become

    cdef int* foo(int*[3] a, int[3]* b):
        ...

or

    cdef int* foo(a : int*[3], b : int[3]*):
        ...

or even

    cdef foo(a : int*[3], b : int[3]*) -> int*:
        ...

(those last two being rather hypothetical) and

    ctypedef double point[3]

would become

    ctypedef double[3] point

Function pointers are a bit harder, but

    cdef double (*f)(int, long)

could become

    cdef double (*)(int, long) f

unless we wanted to introduce something totally new like

    cdef (int, long) -> double f

following Python return value annotation syntax (which is new, but
quite transparent in meaning).

- Robert

From cb at mit.edu Thu Oct 3 18:30:54 2013
From: cb at mit.edu (Chuck Blake)
Date: Thu, 3 Oct 2013 12:30:54 -0400
Subject: [Cython] Declaration syntax change
In-Reply-To:
References:
Message-ID: <20131003163054.GA67599@pdos.lcs.mit.edu>

Greg Ewing wrote:
>What would be the benefit of this? You're proposing to change
>from something identical to C declaration syntax, which is
>second nature for a great many people, to something that
>looks deceptively like C syntax but isn't.
>
>I can't see that causing anything other than a massive
>amount of confusion, anguish and hair-tearing.

I would echo this, actually. While the C declarator syntax takes
a while to get used to, there is actually a simple rationale for
why it is the way it is: just one set of operator associativity
and precedences. The way [], *, and function call ()s bind do
not vary by context. One doesn't have to remember multiple sets
of rules "tailored" for various contexts like declaration/typedef
definition or casting vs operator application.

I believe this is the "fact underlying" the reactions various folk
here are having of "initially seemingly clearer, but oh wait..what
about corner case?" There is a simplicity embedded in the seemingly
backwards "operators you need to apply to get the base type" rule.

The manual translation of C headers/decls to Cython is another good
point. I think that it's a big departure from the perhaps hard to
articulate "sweet spot" Cython/Pyrex are trying to hit straddling
the C & Python worlds.


Stefan Behnel wrote:
>Also, C is only second nature to some people.
>A great many people actually
>use Cython specifically to *avoid* having to write C.

This is a fair point, but Pyrex/Cython have a decade plus long
history here of being very close to C in this specific way,
which is kind of a central thing..almost like a different way
to spell the same type grammar. It's easy after a short while
to visually see the translation. With whole new syntax and maybe
parens in new, shinier places, I doubt seeing the mapping would be
so easy for "hard cases". And easy cases are easy.


On a separate release-number-tying note, I also feel like the
Cython 1.0 target (or even Pyrex 1.0) was always more about
compatibility with "whatever Python you throw at the Cython
compiler" than about finality to all the various extended
Cython syntax areas. There are quite a few beyond-Py syntax
areas at this point.

So, *if* the conclusion is that there is a group will and solid
rationale to let Cython allow a user to use an *alternate* type
declaration sub-language/warnings/a wrapper/macro syntax, then I
don't see any 1.0-release relevance. Cython grows just yet another
way to specify the same things at whatever rev is easy.

The 1.0 only matters if it's a strict backward incompatible proposal,
but I sense a bit of discord on this response chain about that.
Maybe changing "cdef" to be "cydef" would be a visual flag that an
entirely different syntax is afoot? But then..sooo many ways to
specify types.

If you really want to be *that* different - arguably almost as
different as simply requiring Python3 and annotations rather than
having "cdef" at all, then why not go all the way and full on
require Py3? Then a user confronts all the Py3 core syntax change
and Cy2.0 (or maybe Cy1.5 as 3.0/2 or whatever) at the same time.
And, why, the changes are even related to only-in-Py3 things to boot.

Just my two cents, anyway.
cb

From njs at pobox.com Thu Oct 3 19:21:04 2013
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 3 Oct 2013 18:21:04 +0100
Subject: [Cython] Declaration syntax change
In-Reply-To:
References: <524D5EB0.2070502@canterbury.ac.nz> <524D61BD.30000@behnel.de> <524D7877.4070205@behnel.de>
Message-ID:

On Thu, Oct 3, 2013 at 5:03 PM, Robert Bradshaw wrote:
> On Thu, Oct 3, 2013 at 7:13 AM, Nathaniel Smith wrote:
>> On Thu, Oct 3, 2013 at 3:00 PM, Stefan Behnel wrote:
>>> Nathaniel Smith, 03.10.2013 14:35:
>>>> On Thu, Oct 3, 2013 at 1:23 PM, Stefan Behnel wrote:
>>>>> Greg Ewing, 03.10.2013 14:10:
>>>>>> Robert Bradshaw wrote:
>>>>>>>     cdef int *a, b, c, *d[3]
>>>>>>>
>>>>>>> is IMHO quite ugly but also adds a lot of complexity to the parser.
>>>>>>> What if instead we required
>>>>>>>
>>>>>>>     cdef int* a
>>>>>>>     cdef int b, c
>>>>>>>     cdef int[3]* d
>>>>>
>>>>> The last line looks ambiguous, BTW, hadn't even noticed it before. Is that
>>>>> an array of int pointers or a pointer to an array (pointer)? We should make
>>>>> sure the way this is declared is really obvious and not unexpected to C users.
>>>> [...]
>>>> The two halves of this email seem to sort of contradict each other,
>>>> don't you think? At least the C syntax has the advantage that it's
>>>> well-defined and many people *do* know it (and if they don't then
>>>> there are bazillions of references around, plus you can just copy it
>>>> out of header files if you're wrapping a C library), whereas as noted
>>>> above, in fact there are *no* people who know how to look at int[3]*
>>>> and be confident about what it means, even you...?
>>>
>>> Well, it's still better than looking at "*d[3]", now, isn't it? Maybe I'm
>>> just confused (by both, actually) because I'm not really breathing C.
>>
>> Yeah, personally in either case I'd have to look it up (and it's
>> simply impossible that you're going to make it as easy to lookup this
>> funky Cython-specific syntax as it is to look up standard C syntax).
>> But also, the reason I don't know the C version already is that I've
>> probably never seen such a declaration in real life, which makes it
>> hard to see why this is a really pressing problem.
>
> Cause or effect :).
>
>> I don't really come
>> to Cython because I want idiosyncratic tweaks to things C already does
>> perfectly well, you know?
>
> I wouldn't classify this as something that "C already does perfectly
> well." The fact that people commonly write
>
>     int* ptr;
>
> rather than
>
>     int *ptr;
>
> means that it's parsed differently in people's heads than the grammar,
> and though it's hard to miss given the context of this thread I've
> seen people gloss right over things like
>
>     char* a, b;
>     a = malloc(n);
>     b = malloc(n);
>     strcpy(b, a);
>
> which, yes, is perfectly valid C (though any sane compiler will throw
> out a warning). It should also be noted that an increasing percentage
> of Cython users don't know C at all.

So, like I said upthread, if this is the problem you're really trying
to solve, just tackle it directly by making 'char *a, b' an error of
some kind.

> The rule would be very simple--type decorations would be left
> associative, so an int*[5]** would be a pointer to a pointer to an
> array of pointers to ints.

That's a simple rule, but not necessarily a memorable one -- it's
exactly the opposite of how both English and C work. English: you'd
say "array of pointers" but write pointer-array *[]. C: *const * is a
const-pointer-to-pointer, but here it would be a
pointer-to-const-pointer.

In real life of course I'd parenthesize an expression like this to
make it unambiguous even for readers who don't remember the
precedence/associativity rules, but I guess in this notation you can't
parenthesize things, readers will have to memorize/look-up the
associativity regardless.

> Now, you're right that this doesn't come up
> often, which is why it'll be easy to change, but it does complicate
> the compiler (and hypothetical grammar).
> Ideally people shouldn't be
> copying C headers in the long run, they should be parsed automatically
> or by wrappers like xdress.

But we don't live in the long run. Maybe that will happen eventually,
maybe it won't -- it's hard to make predictions, especially about the
future...

-n

From robertwb at gmail.com Thu Oct 3 20:32:43 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 3 Oct 2013 11:32:43 -0700
Subject: [Cython] Declaration syntax change
In-Reply-To:
References: <524D5EB0.2070502@canterbury.ac.nz> <524D61BD.30000@behnel.de> <524D7877.4070205@behnel.de>
Message-ID:

On Thu, Oct 3, 2013 at 10:21 AM, Nathaniel Smith wrote:
> On Thu, Oct 3, 2013 at 5:03 PM, Robert Bradshaw wrote:
>> On Thu, Oct 3, 2013 at 7:13 AM, Nathaniel Smith wrote:
>>> On Thu, Oct 3, 2013 at 3:00 PM, Stefan Behnel wrote:
>>>> Nathaniel Smith, 03.10.2013 14:35:
>>>>> On Thu, Oct 3, 2013 at 1:23 PM, Stefan Behnel wrote:
>>>>>> Greg Ewing, 03.10.2013 14:10:
>>>>>>> Robert Bradshaw wrote:
>>>>>>>>     cdef int *a, b, c, *d[3]
>>>>>>>>
>>>>>>>> is IMHO quite ugly but also adds a lot of complexity to the parser.
>>>>>>>> What if instead we required
>>>>>>>>
>>>>>>>>     cdef int* a
>>>>>>>>     cdef int b, c
>>>>>>>>     cdef int[3]* d
>>>>>>
>>>>>> The last line looks ambiguous, BTW, hadn't even noticed it before. Is that
>>>>>> an array of int pointers or a pointer to an array (pointer)? We should make
>>>>>> sure the way this is declared is really obvious and not unexpected to C users.
>>>>> [...]
>>>>> The two halves of this email seem to sort of contradict each other,
>>>>> don't you think? At least the C syntax has the advantage that it's
>>>>> well-defined and many people *do* know it (and if they don't then
>>>>> there are bazillions of references around, plus you can just copy it
>>>>> out of header files if you're wrapping a C library), whereas as noted
>>>>> above, in fact there are *no* people who know how to look at int[3]*
>>>>> and be confident about what it means, even you...?
>>>>
>>>> Well, it's still better than looking at "*d[3]", now, isn't it? Maybe I'm
>>>> just confused (by both, actually) because I'm not really breathing C.
>>>
>>> Yeah, personally in either case I'd have to look it up (and it's
>>> simply impossible that you're going to make it as easy to lookup this
>>> funky Cython-specific syntax as it is to look up standard C syntax).
>>> But also, the reason I don't know the C version already is that I've
>>> probably never seen such a declaration in real life, which makes it
>>> hard to see why this is a really pressing problem.
>>
>> Cause or effect :).
>>
>>> I don't really come
>>> to Cython because I want idiosyncratic tweaks to things C already does
>>> perfectly well, you know?
>>
>> I wouldn't classify this as something that "C already does perfectly
>> well." The fact that people commonly write
>>
>>     int* ptr;
>>
>> rather than
>>
>>     int *ptr;
>>
>> means that it's parsed differently in people's heads than the grammar,
>> and though it's hard to miss given the context of this thread I've
>> seen people gloss right over things like
>>
>>     char* a, b;
>>     a = malloc(n);
>>     b = malloc(n);
>>     strcpy(b, a);
>>
>> which, yes, is perfectly valid C (though any sane compiler will throw
>> out a warning). It should also be noted that an increasing percentage
>> of Cython users don't know C at all.
>
> So, like I said upthread, if this is the problem you're really trying
> to solve, just tackle it directly by making 'char *a, b' an error of
> some kind.
>
>> The rule would be very simple--type decorations would be left
>> associative, so an int*[5]** would be a pointer to a pointer to an
>> array of pointers to ints.
>
> That's a simple rule, but not necessarily a memorable one -- it's
> exactly the opposite of how both English and C work. English: you'd
> say "array of pointers" but write pointer-array *[]. C: *const * is a
> const-pointer-to-pointer, but here it would be a
> pointer-to-const-pointer.

Actually, this is like English. "Int pointer" or "int array pointer"
where the preceding words describe the kind of pointer you have. (It
can be inverted of course, key board = board of keys, stack pointer =
pointer to the stack, etc.)

Const is a pain, but I'd say it binds tighter than anything else and
cannot be mixed with the other declarators to follow C conventions.
E.g. "const int*" points to a const int, and "const (int*)" points to
an int but can't be changed.

> In real life of course I'd parenthesize an expression like this to
> make it unambiguous even for readers who don't remember the
> precedence/associativity rules, but I guess in this notation you can't
> parenthesize things, readers will have to memorize/look-up the
> associativity regardless.

Yes, writing (((int*)[5])*)* will be supported, just as one can write
(((a - b) - c) - d) instead of a - b - c - d.

>> Now, you're right that this doesn't come up
>> often, which is why it'll be easy to change, but it does complicate
>> the compiler (and hypothetical grammar). Ideally people shouldn't be
>> copying C headers in the long run, they should be parsed automatically
>> or by wrappers like xdress.
>
> But we don't live in the long run. Maybe that will happen eventually,
> maybe it won't -- it's hard to make predictions, especially about the
> future...

Maybe this'll be additional motivation :). But as has been mentioned
they don't come up often in real life, but do have an oversized
presence in the compiler/grammar. Whatever script is provided to clean
up existing .pxd files could be used to transform copy-pasted header
declarations.

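For illustration only, here is a minimal sketch of what such an "expand"
script might do for a single declaration. It is not an existing Cython
tool: the helper names (tokenize, parse_declarator, expand) are
hypothetical, and it assumes the left-associative rule described in this
thread. It handles only pointers, sized arrays and parenthesized
declarators, not function pointers or const.

```python
import re

# One token per match: '*', '(', ')', an array suffix like '[3]', or a name.
TOKEN = re.compile(r"\s*(\*|\(|\)|\[\d*\]|[A-Za-z_]\w*)")


def tokenize(text):
    """Split a single C declarator such as '*d[3]' into tokens."""
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("cannot parse %r" % text[pos:])
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_declarator(tokens):
    """Consume one declarator; return (name, postfix decorations).

    The decorations are listed in the order they are appended to the
    base type under the (assumed) left-associative rule.
    """
    stars = 0
    while tokens and tokens[0] == "*":
        tokens.pop(0)
        stars += 1
    if not tokens:
        raise ValueError("missing variable name")
    if tokens[0] == "(":
        tokens.pop(0)
        name, inner = parse_declarator(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("unbalanced parentheses")
    elif tokens[0][0].isalpha() or tokens[0][0] == "_":
        name, inner = tokens.pop(0), []
    else:
        raise ValueError("unexpected token %r" % tokens[0])
    suffixes = []
    while tokens and tokens[0].startswith("["):
        suffixes.append(tokens.pop(0))
    # In C, array suffixes bind tighter than leading stars, and whatever
    # sits inside parentheses is applied last. Suffixes are reversed
    # because C lists the outermost array dimension first.
    return name, ["*"] * stars + suffixes[::-1] + inner


def expand(base, declarators):
    """Rewrite 'cdef <base> <declarators>' as one statement per type."""
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for declarator in declarators.split(","):
        tokens = tokenize(declarator)
        name, decorations = parse_declarator(tokens)
        if tokens:
            raise ValueError("trailing tokens in %r" % declarator)
        groups.setdefault(base + "".join(decorations), []).append(name)
    return ["cdef %s %s" % (ctype, ", ".join(names))
            for ctype, names in groups.items()]


for line in expand("int", "*a, b, c, *d[3]"):
    print(line)
```

Under C's binding rules, *d[3] declares an array of three pointers, so
this sketch emits "cdef int*[3] d" for the last variable; "int[3]*" is
what it produces for the parenthesized form (*b)[3], matching the
foo(int*[3] a, int[3]* b) example given earlier in the thread.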
- Robert From robertwb at gmail.com Thu Oct 3 20:48:42 2013 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 3 Oct 2013 11:48:42 -0700 Subject: [Cython] Declaration syntax change In-Reply-To: <20131003163054.GA67599@pdos.lcs.mit.edu> References: <20131003163054.GA67599@pdos.lcs.mit.edu> Message-ID: On Thu, Oct 3, 2013 at 9:30 AM, Chuck Blake wrote: > Greg Ewing wrote: >>What would be the benefit of this? You're proposing to change >>from something identical to C declaration syntax, which is >>second nature for a great many people, to something that >>looks deceptively like C syntax but isn't. >> >>I can't see that causing anything other than a massive >>amount of confusion, anguish and hair-tearing. > > I would echo this, actually. The big win is a simplification of the grammar (and its diff to the Python grammar), both for humans and for the computer. Just being able to write an actual grammar for Cython would be nice--there's too much special casing/logic in the parser right now. > While the C declarator syntax takes > a while to get used to, there is actually a simple rationale for > why it is the way it is: just one set of operator associativity > and precedences. The way [], *, and function call ()s bind do > not vary by context. One doesn't have to remember multiple sets > of rules "tailored" for various contexts like declaration/typedef > definition or casting vs operator application. I don't think this proposal would require multiple sets of rules for different contexts. > I believe this is the "fact underlying" the reactions various folk > here are having of "initially seemingly clearer, but oh wait..what > about corner case?" There is a simplicity embedded in the seemingly > backwards "operators you need to apply to get the base type" rule. While I understand the reasoning and construction of C type declarators, I don't think it makes for clear code. 
Note that Cython/Pyrex does not have the dereference operator, which is by far the most common declarator annotation, so this makes even less sense than in C. Of course trying to find/flesh out any forgotten corner cases is a big point of this thread. Function pointers still need to be resolved. > The manual translation of C headers/decls to Cython is another good > point. I think that it's a big departure from the perhaps hard to > articulate "sweet spot" Cython/Pyrex are trying to hit straddling > the C & Python worlds. > > > Stefan Behnel wrote: >>Also, C is only second nature to some people. A great many people actually >>use Cython specifically to *avoid* having to write C. > > This is a fair point, but Pyrex/Cython have a decade plus long > history here of being very close to C in this specific way, > which is kind of a central thing..almost like a different way > to spell the same type grammar. It's easy after a short while > to visually see the translation. With whole new syntax and maybe > parens in new, shinier places, I doubt seeing the mapping would be > so easy for "hard cases". And easy cases are easy. > > > On a separate release-number-tying note, I also feel like the > Cython 1.0 target (or even Pyrex 1.0) was always more about > compatibility with "whatever Python you throw at the Cython > compiler" than about finality to all the various extended > Cython syntax areas. There are quite a few beyond-Py syntax > areas at this point. Yes, it still is. But it would be nice to get any such major changes out of the way ahead of time. > So, *if* the conclusion is that there is a group will and solid > rationale to let Cython allow a user to use an *alternate* type > declaration sub-language/warnings/a wrapper/macro syntax, then I > don't see any 1.0-release relevance. Cython grows just yet another > way to specify the same things at whatever rev is easy.
The 1.0 > only matters if it's a strict backward incompatible proposal, > but I sense a bit of discord on this response chain about that. > Maybe changing "cdef" to be "cydef" would be a visual flag that > an entirely different syntax is afoot? But then..sooo many ways > to specify types. That doesn't help with the goal of simplifying the language. > If you really want to be *that* different - arguably almost as > much as different as simply requiring Python3 and annotations > rather than having "cdef" at all, then why not go all the way > and full on require Py3? Then a user confronts all the Py3 core > syntax change and Cy2.0 (or maybe Cy1.5 as 3.0/2 or whatever) > at the same time. And, why, the changes are even related to > only-in-Py3 things to boot. Just my two cents, anyway. Python2 isn't dead yet. Also, I think changes are more easily done one-at-a-time, and this would be an easier transition as it's a purely syntactic (not semantic) change that could be done safely automatically. In fact, if we disallow "cdef int* a, b" I think there's syntax that's interpreted differently. - Robert From cb at mit.edu Fri Oct 4 00:06:23 2013 From: cb at mit.edu (Chuck Blake) Date: Thu, 3 Oct 2013 18:06:23 -0400 Subject: [Cython] Declaration syntax change In-Reply-To: References: <20131003163054.GA67599@pdos.lcs.mit.edu> Message-ID: <20131003220623.GA460@pdos.lcs.mit.edu> Robert Bradshaw [robertwb at gmail.com] wrote: >Yes, it still is. But it would be nice to get any such major changes >out of the way ahead of time. I think the "compiling Python" goal is so orthogonal to the goal of how "beyond Python syntax work" or simplicity of the parsing impl that "getting it out of the way" doesn't feel right. It's not like SciPy and a lot of other things don't have dependencies on lots of Cython code. >I don't think this proposal would require multiple sets of rules for >different contexts. Yeah. Probably.
I just was explaining what I thought the mechanism behind people's reactions. They're used to something..wrinkles and all, potential for abuse and all, and honestly in terms of the bugs like that char *a,b malloc example -- visual bug finding and all. Not always, but a lot of the times there is just "confusion jello". Squeeze one place and it just oozes around. Also, Cython hasn't been so much about creating new Python syntax or new C syntax, and I think that makes it more approachable. Part of the question afoot, at least in the thread, was whether the learning curve for the new "less unclear code" is "worth the shift". Whatever common cases you make easy, nothing will ever hope to be as "search the internet"-supported as the C way. Part of debugging is a habit, a list in your head, and this proposes a new such list implicitly. So, just to confirm, I think what you propose would mean that the ubiquitous C string list is written sometimes as char *argv[] will have to change in every declared function signature that Cython sees? >That doesn't help with the goal of simplifying the language. Sure it doesn't. I said so, too. :) I'm not strongly advocating that. I don't mind how it is now. I prefer no change, but I understand this is not about pleasing me, specifically. So, I make arguments. ;-) >Python2 isn't dead yet. Also, I think changes are more easily done Your idea to simplify the language is a half measure. If simplicity of the compiler/syntax and Py similarity is what is *really* important then why not hatch a full measure Py3 plan as a longer-term goal? How long-term? Why, as long-term a goal as Py2 being mostly irrelevant, whenever that happens. I don't get the need for it to be a pre-1.0 goal. If there's a practical Py2 compatibility requirement, then why not a practical Cy <= 0.2 requirement, too? Just a leeetle more of the former than the latter and so on. All this just feels like a hair splitting judgement call. 
That kind of judgement just generally feels more appropriate for a post-1.0 mode of onward and upward language evolution to me not some last few releases move. Why not finish (mostly) all that you started to do with semantics and compiling and harden that as a release first? Then evolve something as basic as how to declare types. If Greg is as against the idea as his initial reaction, it'd also be a big divergence from Pyrex that is more basic than than just a cpdef or None-Arg or etc. kind of way since it impacts all type annotation. Type annotation (or inference) is the heart of .pyx over .py anyway. Not that there aren't a lot of other syntax variations, but in a lot of cases you can kind of code to the least common denominator if you care, like C++ compatible C. This would break that for any types beyond the most simple. Well, maybe least common Pyrex/Cy is more broken that I know anyway - I haven't tried to do it in a while. Cheers, cb From robertwb at gmail.com Fri Oct 4 02:31:38 2013 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 3 Oct 2013 17:31:38 -0700 Subject: [Cython] Declaration syntax change In-Reply-To: <20131003220623.GA460@pdos.lcs.mit.edu> References: <20131003163054.GA67599@pdos.lcs.mit.edu> <20131003220623.GA460@pdos.lcs.mit.edu> Message-ID: On Thu, Oct 3, 2013 at 3:06 PM, Chuck Blake wrote: > Robert Bradshaw [robertwb at gmail.com] wrote: >>Yes, it still is. But it would be nice to get any such major changes >>out of the way ahead of time. > > I think the "compiling Python" goal is so orthogonal to the goal of > how "beyond Python syntax work" or simplicity of the parsing impl > that "getting it out of the way" doesn't feel right. It's not like > SciPy and a lot of other things don't have dependencies on lots of > Cython code. What I don't want to do is release 1.0, then in 1.1 have a major, backwards incompatible change. 
In particular, though the blocker for 1.0 is compatibility, I think it is a time to think about what we want to stick with going forward, and it would also be nice if we could provide backwards compatibility through the entire 1.0 series. (Yes, the next release after 1.0 could be 2.0, but I'd rather avoid that too :) >>I don't think this proposal would require multiple sets of rules for >>different contexts. > > Yeah. Probably. I just was explaining what I thought the mechanism > behind people's reactions. They're used to something..wrinkles and > all, potential for abuse and all, and honestly in terms of the bugs > like that char *a,b malloc example -- visual bug finding and all. > Not always, but a lot of the times there is just "confusion jello". > Squeeze one place and it just oozes around. Also, Cython hasn't been > so much about creating new Python syntax or new C syntax, and I think > that makes it more approachable. > > Part of the question afoot, at least in the thread, was whether the > learning curve for the new "less unclear code" is "worth the shift". > Whatever common cases you make easy, nothing will ever hope to be as > "search the internet"-supported as the C way. Part of debugging is a > habit, a list in your head, and this proposes a new such list implicitly. > > So, just to confirm, I think what you propose would mean that the > ubiquitous C string list is written sometimes as char *argv[] will > have to change in every declared function signature that Cython sees? Yes, though I rarely see that anywhere but main methods (that don't occur in Cython). Certainly a survey would be made of how much code out there in the wild would have to change (implicitly favoring open-source projects) which could make the whole thing not worth it. >>That doesn't help with the goal of simplifying the language. > > Sure it doesn't. I said so, too. :) I'm not strongly advocating that. > I don't mind how it is now. 
I prefer no change, but I understand this > is not about pleasing me, specifically. So, I make arguments. ;-) > > >>Python2 isn't dead yet. Also, I think changes are more easily done > > Your idea to simplify the language is a half measure. If simplicity > of the compiler/syntax and Py similarity is what is *really* important > then why not hatch a full measure Py3 plan as a longer-term goal? I'm not sure what you mean here, Py3 is supported, and there is a -3 mode for interpreting ambiguous syntax the 3 way rather than the 2. > How long-term? Why, as long-term a goal as Py2 being mostly irrelevant, > whenever that happens. I don't get the need for it to be a pre-1.0 > goal. > > If there's a practical Py2 compatibility requirement, then why not a > practical Cy <= 0.2 requirement, too? Just a leeetle more of the former > than the latter and so on. All this just feels like a hair splitting > judgement call. That kind of judgement just generally feels more > appropriate for a post-1.0 mode of onward and upward language evolution > to me not some last few releases move. Why not finish (mostly) all that > you started to do with semantics and compiling and harden that as a > release first? Then evolve something as basic as how to declare types. > > If Greg is as against the idea as his initial reaction, it'd also be > a big divergence from Pyrex that is more basic than than just a cpdef > or None-Arg or etc. kind of way since it impacts all type annotation. > Type annotation (or inference) is the heart of .pyx over .py anyway. > Not that there aren't a lot of other syntax variations, but in a lot > of cases you can kind of code to the least common denominator if you > care, like C++ compatible C. This would break that for any types > beyond the most simple. Well, maybe least common Pyrex/Cy is more > broken that I know anyway - I haven't tried to do it in a while. Thanks for your thoughts, there are clearly some good reasons to not make this kind of change as well. 
- Robert From yury at shurup.com Wed Oct 9 15:01:20 2013 From: yury at shurup.com (Yury V. Zaytsev) Date: Wed, 09 Oct 2013 15:01:20 +0200 Subject: [Cython] State of PyPy compatibility wrt. arrays, incl. NumPy Message-ID: <1381323680.10218.24.camel@newpride> Hi, I've been playing with my Cython-generated extension on PyPy, trying to load it through CPyExt, and, surprisingly, after a few fixes to PyPy, it generally seems to work. However, it looks like the situation with whatever kind of arrays is actually pretty gloomy at the moment. First, PyPy supports Python array.arrays, but doesn't replicate Python internals, so the extension even fails to load, since in PyPy tp_basicsize == 16 and not 56. Not sure what to make of this one... for my needs, that would be enough. I guess PyPy developers have to be approached to see if it's practical to expose CPython structure, or agree on a different API? Second, they say that the new-style buffer access is broken and it's not clear if they will support it at all & when this is going to happen. Third, apparently, there is already some limited support for NumPy C-API in NumPyPy, and it seems that there is interest in improving this support to make it usable: http://docs.scipy.org/doc/numpy/reference/c-api.array.html Finally, GetItem on NumPyPy arrays, which will be very slow, but at least should work, rather than not, is also broken. I think this will be easiest of all to fix. So my question is, has anyone been playing with PyPy recently? Is there anyone interested in making Cython-generated code to work nicely with CPyExt? What are the plans w.r.t. the points I mentioned? I've noticed that PyPy jobs in Jenkins are disabled... not sure whether this really means anything or not. Thanks! -- Sincerely yours, Yury V. Zaytsev From stefan_ml at behnel.de Thu Oct 10 08:49:42 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 10 Oct 2013 08:49:42 +0200 Subject: [Cython] State of PyPy compatibility wrt. arrays, incl. 
NumPy In-Reply-To: <1381323680.10218.24.camel@newpride> References: <1381323680.10218.24.camel@newpride> Message-ID: <52564E06.2070203@behnel.de> Hi, Yury V. Zaytsev, 09.10.2013 15:01: > I've been playing with my Cython-generated extension on PyPy, trying to > load it through CPyExt, and, surprisingly, after a few fixes to PyPy, it > generally seems to work. Yes, we invested quite a bit of work into that already. Generally speaking, my drive for supporting PyPy has substantially slowed down during the last year or so. The interest on the PyPy side in doing even the most obvious optimisations was close to non-existent, which made it essentially uninteresting to keep working on it on our side. > However, it looks like the situation with whatever kind of arrays is > actually pretty gloomy at the moment. > > First, PyPy supports Python array.arrays, but doesn't replicate Python > internals, so the extension even fails to load, since in PyPy > tp_basicsize == 16 and not 56. > > Not sure what to make of this one... for my needs, that would be enough. > I guess PyPy developers have to be approached to see if it's practical > to expose CPython structure, or agree on a different API? The C-internals of the array.array module are private even in CPython, so I'm not surprised that PyPy doesn't expose them. The normal buffer interface would generally be enough, though. > Second, they say that the new-style buffer access is broken and it's not > clear if they will support it at all & when this is going to happen. What do you mean by "broken"? Currently broken in PyPy? Would you have a more complete quote? I couldn't find a discussion on the pypy mailing list about this on a quick look. > Third, apparently, there is already some limited support for NumPy C-API > in NumPyPy, and it seems that there is interest in improving this > support to make it usable: > > http://docs.scipy.org/doc/numpy/reference/c-api.array.html Interesting.
Cython has legacy support for older NumPy versions through their C-API instead of the buffer interface. We could enable that for PyPy. Might need some minor adaptations, but might still be the easiest way to get it working. > Finally, GetItem on NumPyPy arrays, which will be very slow, but at > least should work, rather than not, is also broken. I think this will be > easiest of all to fix. How is it broken? Where should it be fixed? > So my question is, has anyone been playing with PyPy recently? Is there > anyone interested in making Cython-generated code to work nicely with > CPyExt? What are the plans w.r.t. the points I mentioned? No plans, but if it's easy to fix, or if someone wants to invest the time to work on it, it should be done, obviously. > I've noticed that PyPy jobs in Jenkins are disabled... not sure whether > this really means anything or not. It means that we don't currently have a working installation of a recent PyPy on our build server. The Linux builds they provide depend on fairly recent system libraries and I consider building PyPy ourselves way too much overhead. Also, running the tests took ages because CPyExt is so slow, so the whole setup became useless after a while. If you have a server lying around somewhere that would allow you to run nightly tests of Cython against nightly builds of PyPy, I certainly wouldn't mind if you set it up for it. Stefan From cournape at gmail.com Fri Oct 11 11:15:58 2013 From: cournape at gmail.com (David Cournapeau) Date: Fri, 11 Oct 2013 10:15:58 +0100 Subject: [Cython] Memory views: dereferencing pointer does break strict-aliasing rules In-Reply-To: References: <1372766081.2659.14.camel@newpride> Message-ID: On Tue, Jul 2, 2013 at 5:07 PM, Robert Bradshaw wrote: > On Tue, Jul 2, 2013 at 4:54 AM, Yury V. 
Zaytsev wrote: > > Hi, > > > > The simplest possible program using memory views compiles with a large > > number of warnings for me, even for a rather outdated version of gcc: > > > > def hello(int [:] a): > > print(a, "world") > > > > If I translate it with the latest released version of Cython like this: > > > > cython cpp.pyx > > cython --cplus cpp.pyx > > > > and compile like this: > > > > gcc -O3 -march=native -Wall -fPIC > -I/opt/ActivePython-2.7/include/python2.7 -c ./cpp.c -o cpp.o > > g++ -O3 -march=native -Wall -fPIC > -I/opt/ActivePython-2.7/include/python2.7 -c ./cpp.cpp -o cpp.o > > > > I get lots of warnings (see attached). > > > > It doesn't seem to be related to C++ as such, but rather it seems that > > the memory views code indeed somehow violates strict-aliasing rules. > > > > I'm not sure of how severe it is, but the documentation seems to suggest > > that this might even lead to incorrect results. > > > > Can this possibly be fixed in Cython and how important is that? Shall I > > create a bug report on the Trac? Is my only resort to test whether the > > compiler supports -fno-strict-aliasing and use that? > > You should compile with -fno-strict-aliasing--if you were using > distutils rather than gcc directly it should add the all necessary > flags for you. > > Aliasing different pointer types is necessary for Cython--it's how it > implements inheritance (in plain C, a PyObject* could be a pointer to > a list or dict or your own cdef class--pointer aliasing right there. > Also with memory views (and numpy arrays), the underlying data is > allocated as a char* and interpreted as a float* or int* or according > to the metadata in the array. > I think this is a relatively serious issue. Using pointer aliasing is not valid C, and we can usually get the same feature with union (which is usually supported). 
See http://stackoverflow.com/questions/11639947/is-type-punning-through-a-union-unspecified-in-c99-and-has-it-become-specified I have been bitten by this recently: how difficult of a change would it be ? David > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertwb at gmail.com Fri Oct 11 19:10:41 2013 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 11 Oct 2013 10:10:41 -0700 Subject: [Cython] Memory views: dereferencing pointer does break strict-aliasing rules In-Reply-To: References: <1372766081.2659.14.camel@newpride> Message-ID: On Fri, Oct 11, 2013 at 2:15 AM, David Cournapeau wrote: > > On Tue, Jul 2, 2013 at 5:07 PM, Robert Bradshaw wrote: >> >> On Tue, Jul 2, 2013 at 4:54 AM, Yury V. Zaytsev wrote: >> > Hi, >> > >> > The simplest possible program using memory views compiles with a large >> > number of warnings for me, even for a rather outdated version of gcc: >> > >> > def hello(int [:] a): >> > print(a, "world") >> > >> > If I translate it with the latest released version of Cython like this: >> > >> > cython cpp.pyx >> > cython --cplus cpp.pyx >> > >> > and compile like this: >> > >> > gcc -O3 -march=native -Wall -fPIC >> > -I/opt/ActivePython-2.7/include/python2.7 -c ./cpp.c -o cpp.o >> > g++ -O3 -march=native -Wall -fPIC >> > -I/opt/ActivePython-2.7/include/python2.7 -c ./cpp.cpp -o cpp.o >> > >> > I get lots of warnings (see attached). >> > >> > It doesn't seem to be related to C++ as such, but rather it seems that >> > the memory views code indeed somehow violates strict-aliasing rules. >> > >> > I'm not sure of how severe it is, but the documentation seems to suggest >> > that this might even lead to incorrect results. >> > >> > Can this possibly be fixed in Cython and how important is that? 
Shall I >> > create a bug report on the Trac? Is my only resort to test whether the >> > compiler supports -fno-strict-aliasing and use that? >> >> You should compile with -fno-strict-aliasing--if you were using >> distutils rather than gcc directly it should add the all necessary >> flags for you. >> >> Aliasing different pointer types is necessary for Cython--it's how it >> implements inheritance (in plain C, a PyObject* could be a pointer to >> a list or dict or your own cdef class--pointer aliasing right there. >> Also with memory views (and numpy arrays), the underlying data is >> allocated as a char* and interpreted as a float* or int* or according >> to the metadata in the array. > > > I think this is a relatively serious issue. Using pointer aliasing is not > valid C, and we can usually get the same feature with union (which is > usually supported). See > http://stackoverflow.com/questions/11639947/is-type-punning-through-a-union-unspecified-in-c99-and-has-it-become-specified > > I have been bitten by this recently: how difficult of a change would it be ? I don't think this change is even possible. NumPy has exactly the same issue, as will any library trying to store vectors of arbitrary data types (whose layout may even be defined at runtime) in C. The Python buffer API doesn't provide a way around it. Also, Python and Cython in general breaks pointer aliasing as objects are simultaneously generic PyObject* and PyListObject*, PyDictObject*, etc. As the set of possible types is large and open, unions won't work. This is how object oriented programming (with subclassing) is done in C. 
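The point above about array data being allocated as raw bytes and then interpreted according to the array's metadata can be seen from the Python side as well. The following is a minimal sketch, assuming Python 3.3 or later (where `memoryview.cast` is available); it is not the code Cython or NumPy generate, and the variable names are made up for the illustration. It only shows one untyped byte buffer being viewed through different item formats.

```python
import struct

DOUBLE_SIZE = struct.calcsize("d")   # native size of a C double
INT_SIZE = struct.calcsize("i")      # native size of a C int

# Untyped storage: just bytes, like memory handed out as a char*.
raw = bytearray(4 * DOUBLE_SIZE)
as_bytes = memoryview(raw)

# The same memory, interpreted as doubles because the format says so.
as_doubles = as_bytes.cast("d")
as_doubles[1] = 1.5

# And again as native ints; no copy is made in either case.
as_ints = as_bytes.cast("i")

if __name__ == "__main__":
    print(as_doubles.format, len(as_doubles))
    print(as_ints.format, len(as_ints))
    print(bytes(raw[DOUBLE_SIZE:2 * DOUBLE_SIZE]) == struct.pack("d", 1.5))
```

The element type here lives in the view's format string rather than in the storage itself, which is why a fixed C union cannot enumerate every possible interpretation ahead of time.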
- Robert From stefan_ml at behnel.de Fri Oct 11 19:46:47 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 Oct 2013 19:46:47 +0200 Subject: [Cython] Memory views: dereferencing pointer does break strict-aliasing rules In-Reply-To: References: <1372766081.2659.14.camel@newpride> Message-ID: <52583987.2070606@behnel.de> Robert Bradshaw, 11.10.2013 19:10: > Python and Cython in > general breaks pointer aliasing as objects are simultaneously generic > PyObject* and PyListObject*, PyDictObject*, etc. As the set of > possible types is large and open, unions won't work. This is how > object oriented programming (with subclassing) is done in C. This has been fixed in CPython 3.x. Stefan From robertwb at gmail.com Fri Oct 11 19:59:23 2013 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 11 Oct 2013 10:59:23 -0700 Subject: [Cython] Memory views: dereferencing pointer does break strict-aliasing rules In-Reply-To: <52583987.2070606@behnel.de> References: <1372766081.2659.14.camel@newpride> <52583987.2070606@behnel.de> Message-ID: On Fri, Oct 11, 2013 at 10:46 AM, Stefan Behnel wrote: > Robert Bradshaw, 11.10.2013 19:10: >> Python and Cython in >> general breaks pointer aliasing as objects are simultaneously generic >> PyObject* and PyListObject*, PyDictObject*, etc. As the set of >> possible types is large and open, unions won't work. This is how >> object oriented programming (with subclassing) is done in C. > > This has been fixed in CPython 3.x. Ah, I wasn't even aware of that. For the curious: http://www.python.org/dev/peps/pep-3123/ Basically, there is an exception for casting a struct pointer to a pointer of its first member. Looking into this more, there's also an exception for char* (aliasing is explicitly allowed), so I take back what I said about this not being possible (due to the looseness of "strict" aliasing in some cases) but I don't know how easy it'd be. 
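The layout that PEP 3123 relies on, a base struct embedded as the first member of each object struct, can be modelled with `ctypes`. This is only a sketch of the memory layout, assuming hypothetical struct names (`ObjectHead`, `ListLike`) and simplified fields; it says nothing about what a C compiler's alias analysis will do, and the real CPython structs have different members.

```python
import ctypes


class ObjectHead(ctypes.Structure):
    """Stand-in for a common object header (hypothetical fields)."""
    _fields_ = [("refcount", ctypes.c_ssize_t),
                ("type_tag", ctypes.c_int)]


class ListLike(ctypes.Structure):
    """A 'subclass': the header is embedded as the very first member."""
    _fields_ = [("ob_base", ObjectHead),
                ("length", ctypes.c_ssize_t)]


lst = ListLike(ObjectHead(1, 42), 3)

# Reinterpret a pointer to the whole struct as a pointer to its first member.
head_ptr = ctypes.cast(ctypes.pointer(lst), ctypes.POINTER(ObjectHead))
head_ptr.contents.refcount += 1      # writes through to lst.ob_base

if __name__ == "__main__":
    print(ListLike.ob_base.offset)   # the header sits at the start
    print(lst.ob_base.refcount, head_ptr.contents.type_tag)
```

Because the header is at offset 0, a pointer to the outer struct and a pointer to its first member refer to the same address, which is the case the exception mentioned above covers.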
- Robert From stefan_ml at behnel.de Fri Oct 11 20:42:48 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 11 Oct 2013 20:42:48 +0200 Subject: [Cython] preparing 0.19.2 for this weekend Message-ID: <525846A8.7050606@behnel.de> Hi, I've started preparing a bug-fix release on the 0.19.x branch. If you have any further changes that should go in, please speak up now. I'd like to get it out by this Sunday at the latest (before PyCon-DE starts on Monday). Stefan From yury at shurup.com Fri Oct 11 20:59:57 2013 From: yury at shurup.com (Yury V. Zaytsev) Date: Fri, 11 Oct 2013 20:59:57 +0200 Subject: [Cython] preparing 0.19.2 for this weekend In-Reply-To: <525846A8.7050606@behnel.de> References: <525846A8.7050606@behnel.de> Message-ID: <1381517997.2794.71.camel@newpride> On Fri, 2013-10-11 at 20:42 +0200, Stefan Behnel wrote: > > I've started preparing a bug-fix release on the 0.19.x branch. If you > have any further changes that should go in, please speak up now. I'd > like to get it out by this Sunday at the latest (before PyCon-DE > starts on Monday). Multiple fixes to array.extend() https://github.com/cython/cython/pull/258 Not sure it qualifies, but to me it looks rather safe to be included and Travis didn't complain about anything. -- Sincerely yours, Yury V. Zaytsev From stefan_ml at behnel.de Sat Oct 12 06:51:24 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 12 Oct 2013 06:51:24 +0200 Subject: [Cython] preparing 0.19.2 for this weekend In-Reply-To: <1381517997.2794.71.camel@newpride> References: <525846A8.7050606@behnel.de> <1381517997.2794.71.camel@newpride> Message-ID: <5258D54C.40609@behnel.de> Yury V. Zaytsev, 11.10.2013 20:59: > Multiple fixes to array.extend() > > https://github.com/cython/cython/pull/258 Hmm, yes, that code looks broken. Especially for the exception handling, there's even more to fix there. > Not sure it qualifies, but to me it looks rather safe to be included and > Travis didn't complain about anything. 
Well, it's clearly not working, so better fix it. Thanks! Stefan From stefan_ml at behnel.de Sun Oct 13 20:00:21 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 13 Oct 2013 20:00:21 +0200 Subject: [Cython] Cython 0.19.2 released Message-ID: <525ADFB5.6080500@behnel.de> Hi all, I released Cython 0.19.2 today as a bug-fix release for the 0.19 series. You can get it from here: http://cython.org/release/Cython-0.19.2.tar.gz http://cython.org/release/Cython-0.19.2.zip Or from PyPI: https://pypi.python.org/pypi/Cython/0.19.2 Release notes: https://github.com/cython/cython/blob/c2ddf294e025646132f7c29fae609c2f3c010e78/CHANGES.rst Documentation: http://docs.cython.org/ Have fun, Stefan From yury at shurup.com Mon Oct 14 21:52:16 2013 From: yury at shurup.com (Yury V. Zaytsev) Date: Mon, 14 Oct 2013 21:52:16 +0200 Subject: [Cython] State of PyPy compatibility wrt. arrays, incl. NumPy In-Reply-To: <52564E06.2070203@behnel.de> References: <1381323680.10218.24.camel@newpride> <52564E06.2070203@behnel.de> Message-ID: <1381780336.2655.46.camel@newpride> Hi Stefan, On Thu, 2013-10-10 at 08:49 +0200, Stefan Behnel wrote: > > Yury V. Zaytsev, 09.10.2013 15:01: > > I've been playing with my Cython-generated extension on PyPy, trying to > > load it through CPyExt, and, surprisingly, after a few fixes to PyPy, it > > generally seems to work. > > Yes, we invested quite a bit of work into that already. Generally > speaking, my drive for supporting PyPy has substantially slowed down > during the last year or so. The interest on PyPy side in doing even > the most obvious optimisations was close to non-existant, which made > it essentially uninteresting to keep working on it on our side. I obviously don't have the full picture, but when I was hanging around #pypy while trying to get my Cython-generated extension to work, everybody seemed to be extremely helpful, so I've got the opposite impression. 
I can only speculate that maybe these fixes from their side required them to expose implementation details that are bound to change, or maybe they were afraid that performance fixes will encourage people to blindly offload stuff to Cython, instead of fixing their Python code or PyPy JIT, or using cffi for extremely high performance... Nevertheless, I think that there is a very valid use for Cython in conjunction with PyPy, and good compatibility and acceptable performance will certainly benefit both projects. For instance, the project that I'm working on right now involves automatic Python bindings generation with Cython, and 99% of the work actually happens inside the C++ code that I'm wrapping, so I don't really care about the performance of the bindings all that much. What I do care about however is that I don't have to write crazy cffi code (wrapping templated C++!) by hand, and with Cython my extension is now only a hundred lines of code and works perfectly across the whole zoo of CPythons... Having it work with PyPy would be a serious benefit for us, because then we could write most of the complex processing code in pure Python and still use the same bindings as if we were using CPython. On top of that, it would be truly awesome if we could write callbacks in Python and have them run at speed with PyPy, but I guess that's going to be difficult to make fast, if these callbacks will be invoked a lot. > The C-internals of the array.array module are private even in CPython, so > I'm not surprised that PyPy doesn't expose them. The normal buffer > interface would generally be enough, though. Hmmm... then, maybe I don't understand how array.array module is handled in Cython at the moment. From what I could figure, Python 3 supports the new-style buffer interface to array.array (which is unavailable in PyPy, see below), but Cython also has some code to handle older versions of Python. That's what I was referring to with "internals".
It could be that I'm completely confused. Never mind. What matters is that Cython-generated code using array.array just doesn't work with PyPy CPyExt at the moment... > > Second, they say that the new-style buffer access is broken and it's not > > clear if they will support it at all & when this is going to happen. > > What do you mean by "broken"? Currently broken in PyPy? I guess it primarily means that it's broken in PyPy right now, but they also claim that this interface is problematic for some reason and therefore also 'broken' in terms of design. > Would you have a more complete quote? I couldn't find a discussion on > the pypy mailing list about this on a quick look. Sorry, I don't have a complete quote. This information was conveyed to me by fijal on IRC. When I asked for clarification, he said that it's not that they are going to prevent anyone from implementing it, but right now it doesn't work, and they don't have motivation to fix it because of its design-wise 'brokenness'. I guess he can easily explain to you what's so 'broken' about it, but it doesn't make sense for me to serve as an intermediary between you and them, because I understand the subject poorly and will only confuse everything even further... > > Third, apparently, there is already some limited support for NumPy C-API > > in NumPyPy, and it seems that there is interest in improving this > > support to make it usable: > > > > http://docs.scipy.org/doc/numpy/reference/c-api.array.html > > Interesting. Cython has legacy support for older NumPy versions through > their C-API instead of the buffer interface. We could enable that for PyPy. > Might need some minor adaptations, but might still be the easiest way to > get it working. I think so too! > > Finally, GetItem on NumPyPy arrays, which will be very slow, but at > > least should work, rather than not, is also broken. I think this will be > > easiest of all to fix. > > How is it broken? Where should it be fixed?
It's broken in that it segfaults PyPy. I've created a reproducer, it turned out to be a PyPy problem, and they are working on a fix: https://bugs.pypy.org/issue1621 I guess it will hit the PyPy main branch sometime soon. There is nothing to be done about it in Cython. I just didn't get that far at the time of writing... > It means that we don't currently have a working installation of a recent > PyPy on our build server. The Linux builds they provide depend on fairly > recent system libraries and I consider building PyPy ourselves way too much > overhead. Also, running the tests took ages because CPyExt is so slow, so > the whole setup became useless after a while. I see the problem. Yes, I also can't use their nightlies for the same reason. I figured out how to translate it myself though, so that's what I've been using so far. > If you have a server lying around somewhere that would allow you to run > nightly tests of Cython against nightly builds of PyPy, I certainly > wouldn't mind if you set it up for it. Yes, I have a build server which hopefully has recent enough libraries that I could use stock PyPy nightlies on it without retranslating them myself. I can set up a build that tests Cython against the latest PyPy nightly, but this will take some effort. Of course, it would be very nice to know that this effort will lead to the improvement of Cython, and not just go to the trash bin :-) In any case, would it be possible for me to at least get your build scripts? I don't seem to be able to access them on Jenkins without an account. To start off with something, I did a small test run on my laptop of Cython 0.19-438-ge36fc99 on a manually translated PyPy with the GetItem fix: http://vps.zaytsev.net/~zaytsev/pypy/cython-pypy-2013-10-10.log I hope that this can give you an overview of what's going on with CPyExt at the moment... Is that useful? If not, how can I make it more useful? -- Sincerely yours, Yury V.
Zaytsev From arfrever.fta at gmail.com Tue Oct 15 22:21:01 2013 From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis) Date: Tue, 15 Oct 2013 22:21:01 +0200 Subject: [Cython] Cython 0.19.2 released In-Reply-To: <525ADFB5.6080500@behnel.de> References: <525ADFB5.6080500@behnel.de> Message-ID: <201310152221.02150.Arfrever.FTA@gmail.com> Cython 0.19.2 fails 2 tests with all versions of Python (2.6, 2.7, 3.1, 3.2, 3.3), probably due to new version of NumPy. I use NumPy 1.8.0 rc2. Results with Python 2.7: ====================================================================== FAIL: test_one_sized (line 29) (relaxed_strides.__test__) Doctest: relaxed_strides.__test__.test_one_sized (line 29) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python2.7/doctest.py", line 2201, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for relaxed_strides.__test__.test_one_sized (line 29) File "/tmp/Cython-0.19.2/tests-2.7/memoryview/c/relaxed_strides/relaxed_strides.so", line unknown line number, in test_one_sized (line 29) ---------------------------------------------------------------------- File "/tmp/Cython-0.19.2/tests-2.7/memoryview/c/relaxed_strides/relaxed_strides.so", line ?, in relaxed_strides.__test__.test_one_sized (line 29) Failed example: test_one_sized(a)[0] Exception raised: Traceback (most recent call last): File "/usr/lib64/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in test_one_sized(a)[0] File "relaxed_strides.pyx", line 38, in relaxed_strides.test_one_sized (relaxed_strides.c:1387) File "stringsource", line 619, in View.MemoryView.memoryview_cwrapper (relaxed_strides.c:7145) File "stringsource", line 327, in View.MemoryView.memoryview.__cinit__ (relaxed_strides.c:3560) ValueError: ndarray is not C-contiguous 
====================================================================== FAIL: test_one_sized (line 29) (relaxed_strides.__test__) Doctest: relaxed_strides.__test__.test_one_sized (line 29) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python2.7/doctest.py", line 2201, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for relaxed_strides.__test__.test_one_sized (line 29) File "/tmp/Cython-0.19.2/tests-2.7/memoryview/cpp/relaxed_strides/relaxed_strides.so", line unknown line number, in test_one_sized (line 29) ---------------------------------------------------------------------- File "/tmp/Cython-0.19.2/tests-2.7/memoryview/cpp/relaxed_strides/relaxed_strides.so", line ?, in relaxed_strides.__test__.test_one_sized (line 29) Failed example: test_one_sized(a)[0] Exception raised: Traceback (most recent call last): File "/usr/lib64/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in test_one_sized(a)[0] File "relaxed_strides.pyx", line 38, in relaxed_strides.test_one_sized (relaxed_strides.cpp:1387) File "stringsource", line 619, in View.MemoryView.memoryview_cwrapper (relaxed_strides.cpp:7145) File "stringsource", line 327, in View.MemoryView.memoryview.__cinit__ (relaxed_strides.cpp:3560) ValueError: ndarray is not C-contiguous ---------------------------------------------------------------------- Ran 7903 tests in 3249.768s FAILED (failures=2) ALL DONE -- Arfrever Frehtes Taifersar Arahesis
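A note on the failing test above. Its name, relaxed_strides / test_one_sized, suggests that it passes an array with a dimension of extent 1, where the stride of that dimension never affects which bytes are addressed. Under a "relaxed" contiguity rule such strides are ignored; under a strict rule they must match the packed layout. The pure-Python sketch below illustrates the difference. The function name, its signature, and the `relaxed` flag are hypothetical and chosen for illustration only; this is not the code that NumPy or Cython actually runs.

```python
def is_c_contiguous(shape, strides, itemsize, relaxed=True):
    """Return True if (shape, strides) describes a packed, row-major layout.

    Hypothetical helper for illustration. Strides are given in bytes.
    With relaxed=True, the stride of any dimension of extent 1 is ignored,
    because that stride is never used to step between elements.
    """
    if 0 in shape:
        # This sketch simply treats empty arrays as contiguous.
        return True

    expected = itemsize  # stride the innermost dimension must have
    # Walk from the last (fastest-varying) dimension to the first.
    for extent, stride in zip(reversed(shape), reversed(strides)):
        if relaxed and extent == 1:
            continue  # stride is irrelevant for a one-sized dimension
        if stride != expected:
            return False
        expected *= extent
    return True


# A packed 3x4 array of 8-byte items passes under either rule.
packed = is_c_contiguous((3, 4), (32, 8), 8, relaxed=False)

# A 1x4 array whose first stride is unusual: same memory, different verdict.
one_sized_relaxed = is_c_contiguous((1, 4), (999, 8), 8, relaxed=True)
one_sized_strict = is_c_contiguous((1, 4), (999, 8), 8, relaxed=False)
```

With the strict rule, the (1, 4) array is rejected even though its four elements are packed in memory. That would be consistent with the "ndarray is not C-contiguous" error in the log, although the log alone does not confirm the cause.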
From cgohlke at uci.edu Tue Oct 15 22:28:21 2013 From: cgohlke at uci.edu (Christoph Gohlke) Date: Tue, 15 Oct 2013 13:28:21 -0700 Subject: [Cython] Cython 0.19.2 released In-Reply-To: <201310152221.02150.Arfrever.FTA@gmail.com> References: <525ADFB5.6080500@behnel.de> <201310152221.02150.Arfrever.FTA@gmail.com> Message-ID: <525DA565.4070501@uci.edu> On 10/15/2013 1:21 PM, Arfrever Frehtes Taifersar Arahesis wrote: > Cython 0.19.2 fails 2 tests with all versions of Python (2.6, 2.7, 3.1, 3.2, 3.3), > probably due to new version of NumPy. > I use NumPy 1.8.0 rc2. > These failures were previously reported and discussed on the numpy list Christoph From robertwb at gmail.com Thu Oct 17 08:29:28 2013 From: robertwb at gmail.com (Robert Bradshaw) Date: Wed, 16 Oct 2013 23:29:28 -0700 Subject: [Cython] Cython wiki Message-ID:
I've cleaned out the spam and moved the wiki to https://github.com/cython/cython/wiki Don't edit it just yet (as I have at least one more round of force-pushing conversion from MoinMoin files), but please let me know if something doesn't look right or it's missing pages you expect to see. (Yes, I know some pages are failing to render; still looking into that, but if you can see what the error is please let me know.) - Robert From stefan_ml at behnel.de Thu Oct 31 18:25:46 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 31 Oct 2013 18:25:46 +0100 Subject: [Cython] "Stack" checker for undefined behaviour in C code Message-ID: <5272929A.7000302@behnel.de> Hi, I just came across this paper: http://pdos.csail.mit.edu/~xi/papers/stack-sosp13.pdf They describe an analysis tool that checks C code for bugs that rely on undefined behaviour and are thus at the mercy of compiler assumptions and "optimisations" to do the right thing or not. They made it available on GitHub: https://github.com/xiw/stack/ If anyone wants to take the time to set it up for checking some Cython-generated code, I'd be interested to see if it finds something. Stefan