[Cython] New function (pointer) syntax.

Robert Bradshaw robertwb at gmail.com
Sun Nov 9 03:08:06 CET 2014


On Sat, Nov 8, 2014 at 3:19 PM, C Blake <cblake at pdos.csail.mit.edu> wrote:
>>But I admit it's hard to come up with an objective measure for how
>>good a syntax is...if it's natural to you then that's great.
>
> I think those queries you mention will mostly be biased by the squeakier
> wheels being more beginning people and that's not a very good argument
> or metric.  I agree an objective measure of "goodness" or "understanding"
> is hard, but I happen to run Gentoo and keep my sources around.  So, I
> did a quick grep over .c and .h files in 600 packages on my system..
> pretty diverse: no one style guide or style or maintainer..Not even any
> very common domains..utilities, libraries, all sorts of stuff.
>
> $ grep '[a-zA-Z0-9_][a-zA-Z0-9_]\*\*[^ ]' `find -type f -name '*.[ch]'` |
>     grep -v '/\*\*' | grep -v '\*\*/' | wc -l
> 3468
>
> $ grep '[a-zA-Z0-9_][a-zA-Z0-9_]  *\*\*[^ ]' `find -type f -name '*.[ch]'` |
>     grep -v '/\*\*' | grep -v '\*\*/' | wc -l
> 68900
>
> In other words, over 95% of the instances spaced the '**' as if they knew
> it bound to the token on its right.  ('**' is easier than '*' since the
> latter could be multiplies but '**' almost never is).
>
> Yes, greps are way approximate.  Yes, some real parser would be better,
> but that took me only a few minutes.  I visually inspected what they
> were catching by |less instead of |wc and both cases seemed to mostly be
> catching decl/type uses as intended..less than a few percent error on
> that.  If anything, the most glaring question was that 3468 "type**" cases
> were highly polluted with near 50% questionable "no whitespace at all"
> instances like (char**)a.  Maintainers who know better might accept
> patches and lazily not fix confusing formatting.  So, in a couple ways
> that 5% confused is an upper bound in this corpus (under a spacing = no
> confusion hypothesis).  And, sure, confused people might format
> non-confusingly.  And maybe '**' itself is slightly selecting for
> less confused people.

Yep, most C code, especially code good enough to be shipped in a
standard Linux distribution, is written by people who know C well.

> Even so, 95..97.5% to me == "essentially no one" to you by some, let's
> say "not totally bonkers" measure suggests that we are just thinking of
> highly different populations of people.  Even if you think my methods way
> hokey, it's probably at least suggestive "essentially no one" is a far
> bigger set than you thought before.  So, I agree/disagree with some other
> things you said.  Initializers are an (awfully convenient) aberration,
> but your example of teaching is just an example of bad teaching - so what?
> A list of vars of one type is easily achieved even thinking as you want
> with a typedef, and having both declarators and typedefs gets you
> everything you want.  Still, disagreements aside, I give up trying to
> convince you of anything beyond that you *just might* have a very
> skewed perception of how confused about declarators people coming to
> Cython from a C/C++ background, or people who write C/C++ in general, are.
> But it seems in this arc anyway you aren't trying to target them or
> C code integration coherence or such...Period!
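
For readers outside group (1), the point about declarator lists and typedefs in the quoted text can be sketched in C roughly as follows. The alias `cstr`, the macro `IS_CHAR_PTR`, and both function names are hypothetical, introduced only for this illustration; the block assumes a C11 compiler for `_Generic`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical alias, named only for this illustration. */
typedef char *cstr;

/* Type probe: yields 1 only when x has type char *, else 0. */
#define IS_CHAR_PTR(x) _Generic((x), char *: 1, default: 0)

/* In a bare declarator list the '*' belongs to 'a' alone,
 * whatever the spacing suggests. */
int pointers_in_plain_list(void)
{
    char* a = NULL, b = 'x';   /* a is char *, b is plain char */
    return IS_CHAR_PTR(a) + IS_CHAR_PTR(b);
}

/* With a typedef, every name in the list gets the full pointer type. */
int pointers_in_typedef_list(void)
{
    cstr c = NULL, d = NULL;   /* c and d are both char * */
    return IS_CHAR_PTR(c) + IS_CHAR_PTR(d);
}

/* Compile-time check that the alias really names char *. */
static_assert(IS_CHAR_PTR((cstr)0), "cstr names the type char *");
```

Calling the two functions shows the difference: the plain list declares one pointer, the typedef list declares two.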

To understand where I'm coming from, let's divide the population of
Cython users into three groups.

(1) People who know C well. They don't have to be absolute gurus, but
they're used to C declarators and can write complicated function
pointers in their sleep.
(2) People who know a little C. Maybe with the help of google they
could parse a function pointer, but it certainly wouldn't come
naturally--often all they want to do is wrap some code.
(3) People who just want their Python to be fast, and don't care about C at all.

Using C-style declarators primarily helps group (1), and is (IMHO) an
unnecessary hurdle for group (3). Especially as passing functions
around is much more common in Python. Over the years, I've seen a
shift in Cython users away from (1) and towards (3). That's not to say
I don't care about group (1), I do, but I also have much higher
confidence that they'll be able to pick up a new syntax easily.

This is not too dissimilar from supporting memory views rather than
forcing users to deal with pointers directly.

> As per...
>
>>I'm hoping we can avoid it 100% :-) for anyone who doesn't have to
>>actually interact with C.
>
> So, you're leaning hard on the Cython as a Python compiler direction.
> I think Cython in general should probably either be A) as C-like as
> possible or B) as Python-like as possible.  Given your (I still think
> misguided) hatred of C function pointer syntax/scenario A), there's
> your probable answer - be as Py-like as possible.  Given that, for
> just function types, that seems to mean either:
>     A) the "lambda type1, type2: type3" proposal,
>     B) what mypy does which is roughly Function[ [type1, type2], type3 ],
> or possibly C) what Numba does if that really catches on.
> or maybe the (type, type) -> rtype though that seems unpopular here,
> but almost surely not that "char*(..)" thing.

Actually type(type, type) is a type in C, just not one that can be
passed by value, hence the need for function *pointers*. As function
pointers are callable, and functions coerce to pointers, the
distinction is not very important for the user.
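
A minimal C sketch of that distinction, assuming a C11 compiler; the names `binop`, `add`, `mul`, `apply`, and `apply_explicit` are hypothetical and chosen only for illustration:

```c
#include <assert.h>

/* A function *type*, int(int, int). It cannot be passed by value. */
typedef int binop(int, int);

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

/* A parameter written with function type is adjusted to a pointer,
 * so this declaration and the definition below name the same function. */
int apply(binop f, int x, int y);
int apply(binop *f, int x, int y) { return f(x, y); }

/* The same parameter spelled with the classic declarator syntax. */
int apply_explicit(int (*f)(int, int), int x, int y) { return (*f)(x, y); }

/* Compile-time check: a pointer to add has the pointer-to-binop type. */
static_assert(_Generic(&add, binop *: 1, default: 0),
              "&add has type binop *");
```

Both `apply(add, 2, 3)` and `apply(&mul, 2, 3)` compile, because a function name used as a value converts to a pointer, and a pointer can be called with or without an explicit `*`.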

> In a few years the mypy approach may well be a PEP approved lint/typing
> approach and people coming from Python will at least already have maybe
> seen it.  In dozens of emails 2..3 months ago Guido was really strongly
> promoting mypy, but I think it is in some kind of a-PEP-needs-to-be-
> written limbo.

On that note, this is turning out to be controversial enough I should
write a CEP, if no one beats me to it...
https://github.com/cython/cython/wiki/enhancements

> Here is a link to the relevant sub-part for those who
> haven't looked at it:
>
>     http://www.mypy-lang.org/tutorial.html#callables
>
> I actually like A) better, but not so much better it should override
> what the parent to one of the two Cython syntax communities goes with.
> A) is really easy to describe - "just take the function value structure
> but use types instead of variables/expression value".
>
> There are some other styles like pytypedecl or obiwan and such that
> might also be worth looking into before you decide.  I haven't looked
> at them, but thought I should mention them.

Good point, I'll look at those too.

- Robert

