[Cython] Fwd: Question about how best require compiler options for C sources

Erik Bray erik.m.bray at gmail.com
Mon Apr 11 09:44:53 EDT 2016


On Mon, Apr 11, 2016 at 2:51 PM, Nathaniel Smith <njs at vorpus.org> wrote:
> On Apr 11, 2016 04:18, "Erik Bray" <erik.m.bray at gmail.com> wrote:
>>
>> On Fri, Apr 8, 2016 at 5:49 PM, Nathaniel Smith <njs at vorpus.org> wrote:
>> > Can you give a tiny concrete example? My questions are basic enough that
>> > I
>> > feel like I'm missing something fundamental :-)
>>
>> Yes, I think you might be missing something, but I'm not sure exactly
>> where.  In the issue I provided a tiny example which you can see here:
>>
>> https://gist.github.com/embray/12a67edb82b213217e31f408007898e6
>>
>> The C code generated by this example currently does not compile on
>> Windows, because of how Cython uses DL_IMPORT incorrectly.  Regardless
>> of what it *does* do, Cython should not be using the DL_IMPORT macro
>> at all actually since that no longer even exists in Python.
>
> Sure.
>
> For that example, the correct thing to do is to *not* export the function.

So as I wrote in my previous message, my example was incomplete.  But
indeed if the intent was *only* to share the declaration between TUs
of the same library then I agree the *most* correct thing would be to
not use dllexport. Unfortunately there isn't currently a way in Cython
to make this distinction.

> Backing up, to make sure we're on the same page:

Yep, I agree with your analysis that follows, and it's helpful to have
all the cases laid out in one place, so thanks for that!

There are three levels of
> symbol visibility in C: file-internal, shared library internal (different .c
> files that make up the library can see it, but users of the shared library
> can't see it), and shared library exported (everyone can see it; can also
> carry other consequences, e.g. on Linux then internal calls will become
> noticeably slower, and it becomes possible for weird symbol interposition
> issues to occur). So the rule of thumb is to make everything as private as
> you can get away with.
>
> Making this more interesting:
> - vanilla C only includes 2 ways to mark symbol visibility, which is not
> enough to make a 3-way distinction. Hence the need for an extra attribute
> thingummy.
> - everyone agrees that symbols marked 'static' should be file-internal, but
> different platforms disagree about what should happen if the extra attribute
> thingummy is missing.
>
> So on Windows, the convention is:
> 'static' -> file internal
> no marking -> shared library internal
> 'dllexport' -> public
>
> And on Linux it's:
> 'static' -> file internal
> 'visibility (hidden)' -> shared library internal
> no marking -> public
>
> It's generally agreed that Linux got this wrong and that you should always
> use '-fvisibility=hidden' to switch it to the windows style, but cython
> doesn't control compiler options and thus should probably generate code that
> works correctly regardless of such compiler settings. Fortunately, Linux
> does provide some markings to explicitly make things public: you can mark a
> symbol 'visibility (default)' (which means public), or you can use the
> dllexport syntax, just like Windows, because gcc is helpful like that.
>
> OTOH, windows is annoying because of this dllimport thing that started this
> whole thread: on other systems just marking symbols as extern is enough to
> handle both shared-object-internal and shared-library-exported symbols, and
> the linker will sort it out. On Windows, you have to explicitly distinguish
> between these. (And annoyingly, if you accidentally leave out the dllimport
> making on functions then it will use some fallback hack that works but
> silently degrades performance; on other symbols it just doesn't work, and
> ditto for if you use it when it isn't needed.)

Yep. Agreed with all of the above.

> So final conclusion: for non-static symbols, cython should first decide
> whether they are supposed to be shared-library-internal or actually exported
> from the shared library.
>
> For shared-library-internal symbols: their definition should be marked
> 'visibility(hidden)' on Linux, and unmarked on Windows. This is easy using
> some preprocessor gunk. (Or maybe simplest is: marked that everywhere except
> if using msvc, because I think everyone else will understand 'visibility
> (hidden)' even if it's a no op.) Their declaration in the header file should
> just be 'extern' everywhere.
>
> For shared-library-exported symbols: I am dubious about whether cython
> should even support these at all. But if it does, then the definitions
> should be marked 'dllexport' (no macro trickery needed, because everyone
> understands this syntax), and their declaration in the header file needs
> some extra hack that is the subject of this thread.

I understand your doubts here, but for the purpose of discussion let's
just assume that it should be supported? :)

And yes, the issue that the header file needs some hacky
context-dependent declarations.  A possible alternative, which in fact
is exactly the work-around I'm using right now, would be for Cython to
generate two headers.  In the case of my example they would be "foo.h"
and "foo_internal.h".  They would be almost exactly the same except
that the former declares the function dllimport and the latter
declares the function dllexport, and only the _internal.h should be
used to share a declaration between TUs.  I'm currently doing this by
writing the "foo_internal.h" manually.

> Now, back to your example: Here the caller and callee are both compiled into
> the same shared library, so you don't want dllexport/dllimport at all, you
> just want a shared-library-internal symbol, which as we see is much easier.

Well, again, in this case I do want dllexport/dllimport but I should
have been more clear about that.  But supposing I didn't want it, then
this is true.

> NumPy also ran into some problems with this in our experiments with using
> cython internally. Our temporary solution was to use the preprocessor to
> monkeypatch DL_IMPORT into expanding to the appropriate
> shared-library-internal thing :-).

Ah, how did you manage that?  Do you have to make sure to do it before
Python.h is ever included?

>> > My first question is why you even need this, since AFAIK there are no
>> > cases
>> > where it is correct to have a cython module dllexporting symbols that
>> > appear
>> > in header files. This is what cimport is for, right?
>>
>> I don't think this has anything to do with cimport.  Could you explain
>> what you mean?
>
> We only need to solve the dllimport issue if we want to support
> shared-library-exported symbols from cython extensions. This is only useful
> if you have different extensions that are directly linking to each other
> using the platform linker. But this simply doesn't work in almost any cases,
> because platform linkers have all kinds of quirks that don't play well with
> python packages -- to start with, your runtime linker probably does not
> follow Python's rules for which directories to search to find a shared
> library... To solve these problems, the Cython devs invented the cimport
> mechanism, which is basically a portable, python-centric way of doing
> shared-library-exports while avoiding all the problems caused by using the
> platform linker. So my question was, what's your use case that's better
> served by linking directly against an extension module and using the
> platform's shared-library-export functionality, instead of cython's?

Well, yes, but you can't use cimport from C code :)  As for the use
case I'll have to get back to you on that.  I'm trying to help the
Sage project fix some issues related to that and my understanding is
that it is needed for C code to be able to link directly against code
that's compiled into an extension module.  I could be wrong about that
in which case we're free to ignore that case (unless someone does come
up with a use case).  So yes, I had better double-check on that :)

Still, I'm not necessarily talking about linking Cython modules
together, which is why I don't think cimport really comes into it.

>> > My second question is why you would want to do this via the command
>> > line,
>> > when compiling the dll means that you are compiling some
>> > cython-generated
>> > .c, which means that you can put the #define directly in the source
>> > code,
>> > no?
>>
>> Not necessarily. As you can see in my example the file fooutil.c is
>> hand-written C code that was not generated by Cython, but which uses a
>> function in the Cython-generated code.  It includes "foo.h".  In
>> principle you're right--the hand-written C code could set the proper
>> #defines but it would have to do so *before* including "foo.h".  It's
>> not very obvious that this would be needed.  Most other code I've seen
>> that addresses this issue--including cpython itself--do so by passing
>> an appropriate define to the compiler via the command-line, and that
>> seems the clearest to me.
>
> I see, right. I guess my suggestion would be that if a symbol really does
> need to be marked for shared-library-export *and simultaneously* used by
> different files within the same shared library -- which is the only case
> where this arises -- then possibly the simplest and most robust thing is to
> set up the header file so that external users just do #include "foo.h", and
> the internal users do
>
>   #define FOO_INTERNAL
>   #include "foo.h"

Okay, this is similar to my above suggestion of using separate
headers.  I for one prefer the separate headers better, as I think
it's easy to forget to set a #define like that--and to anyone not
working on Windows it's not at all clear why that would be needed.  In
fact on Linux it will "just work" without the "#define FOO_INTERNAL"
so without regular testing on Windows it will be too easy to forget.

I'm not sure if having a separate _internal.h header is any better.
But it *might* be--in particular if it's generated by Cython then it
would force the developer to ask what the difference is between
"foo.h" and "foo_internal.h". And they can both contain comments
explaining when to use them.

> I think the obvious thing is that cython should provide a natural way to
> mark a function as being shared-library-internal; that covers 99% of real
> use cases *and* just works without any further macros needed. Probably the
> existing "public" annotation should be changed to mean this.

Probably, yeah.  Ignoring the DL_IMPORT stuff the other effect of
marking a symbol "public" in Cython is to add the appropriate
__PYX_EXTERN_C.  And if that were *all* it did then its behavior would
be consistent between platforms :)

> (Obviously it
> wasn't quite fully thought through in the first place and has few if any
> users, since it got this very wrong without anyone noticing. So fixing it
> seems viable to me.)

+1

> And then *maybe* there should also be a way to make a symbol
> shared-library-exported, if that really is useful, but as a non-default
> experts-only kind of thing, and as such it would be OK to require these rare
> expert users to be a bit more careful about how they #include the resulting
> header within their own project.

Okay. My belief is that there is a case for this, but I should
substantiate it better.  Would you be amenable to the generation of a
"<modulename>_internal.h"?  The more I think about it the more I'm
convinced this would be the simplest way to handle this, and would
simplify matters by not requiring Cython to impose any compiler flags
(making my original quesition moot).

Agreed that it could be non-default.

Thanks again,

Erik


More information about the cython-devel mailing list