[Cython] array expressions

Mon Aug 27 13:06:55 CEST 2012

Dag Sverre Seljebotn, 27.08.2012 11:55:
> On 08/27/2012 11:53 AM, Dag Sverre Seljebotn wrote:
>> On 08/24/2012 08:40 PM, mark florisson wrote:
>>> Here is a pull request for element-wise array expressions for Cython:
>>> https://github.com/cython/cython/pull/144
>>> It includes the IndexNode refactoring branch as well.
>>>
>>> This has been the work this last summer for the gsoc, with great
>>> supervision from Dag, who helped steer the project in a great
>>> direction to make it reusable (it's partially included in Numba and
>>> will likely be in Theano in the future, hopefully others as well). I
>>> also wrote a thesis for my master's, which can be found here
>>> https://github.com/markflorisson88/minivect/tree/master/thesis, which
>>> can shed some light on some parts of the design and performance aspects.
>>> Performance graphs can also be found here:
>>> https://github.com/markflorisson88/minivect/tree/master/bench/graphs
>>>
>>> So anyway, how would you prefer dealing with the minivect submodule?
>>> We could include it verbatim, with any modifications made to minivect
>>> directly, since we'd have separate git histories. We could
>>> alternatively make it an optional submodule which is only required
>>> when actually using array expressions. I like the latter, but anything
>>> is fine with me really.
>>
>> I think I support using a git submodule for now. Not sure about making
>> it optional (which I assume would make array expression testcases not
>> run if minivect is not present, so that there are no test failures); we
>> want to make sure we are forced to do releases and testing right and
>> include it, and if it is only required to compile code that uses
>> memoryview expressions users could get confused about there being two
>> Cython "editions" around.
>>
>> Since "git submodule" links to a specific revision, there are no
>> stability concerns over verbatim inclusion.
>>
>> How hg-git deals with submodules is worth consideration too though.
> 
> Another option you didn't mention is to push the responsibility of getting
> minivect over to end-users; Cython simply tries to do "import minivect".
> This does have versioning issues though since git will likely not be the
> tool used to fetch the revision.
> 
> A lot more pain for those who use array expressions, but a little less
> pain for the rest. So it depends on how you weigh the user groups.
> 
> Realistically, we'd want to depend on LLVM as well down the road for
> minivect stuff (at least if you want optimal performance), so perhaps
> opening the can-of-external-dependency-worms should just be done sooner
> rather than later.

My experience with lxml tells me that it's often better to keep things
separate but integrated, instead of shipping them in a big box. As long as
it doesn't hurt too much to have separate tools, we should keep it that
way. Those who prefer everything in a big box can use distributions like
Sage, or use apt.

As for versioning, you can set dependency version ranges in distutils (and
friends) which are honoured by install tools like pip. That keeps the
installation fully automatic (and definitely not "a lot more pain"). That
being said, the best way to handle this is to build a well-defined
interface between the two components and to keep that alive for a while.
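
To illustrate the version-range idea, here is a minimal sketch. It
assumes a setuptools-style "install_requires" entry (one of the
distutils "friends"), a hypothetical minivect release range of
>=0.1,<0.2, and that the module exposes a "__version__" attribute.
None of these are confirmed in this thread. The second half shows the
matching runtime check for the plain "import minivect" approach.

```python
import importlib

# Hypothetical bounds; the thread does not name real minivect versions.
MIN_VERSION = (0, 1)
MAX_VERSION = (0, 2)  # exclusive upper bound

# Keyword arguments one might pass to setuptools.setup(). Install tools
# such as pip honour the range given in install_requires.
SETUP_KWARGS = {
    "name": "example-package",  # placeholder name
    "install_requires": ["minivect>=0.1,<0.2"],  # assumed distribution name
}


def parse_version(text):
    """Turn '0.1.3' into (0, 1, 3); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def version_in_range(text, minimum=MIN_VERSION, maximum=MAX_VERSION):
    """True if minimum <= version < maximum."""
    return minimum <= parse_version(text) < maximum


def load_optional(module_name, minimum=MIN_VERSION, maximum=MAX_VERSION):
    """Return the module if it imports and its version is in range, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Assumption: the dependency publishes its version as __version__.
    version = getattr(module, "__version__", None)
    if version is None or not version_in_range(version, minimum, maximum):
        return None
    return module
```

A caller would do "minivect = load_optional('minivect')" and report a
clear error only when array expressions are actually used and the
result is None.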

For Jenkins, we'd set up separate jobs that build the dependencies and then
install them from there before running the integration tests. We could even
have dedicated integration test jobs that only run the tests that involve
the dependency (and potentially more than one version of the dependency).
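
As a rough sketch of what "only run the tests that involve the
dependency" could look like on the test side, the following uses the
standard unittest skip decorator. The test class names are made up for
illustration and are not taken from the Cython test suite. A dedicated
integration job would install minivect first, so the guarded tests
run. Everywhere else they are reported as skipped rather than failed.

```python
import importlib.util
import unittest


def has_module(name):
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


HAVE_MINIVECT = has_module("minivect")


class CoreTests(unittest.TestCase):
    """Tests that never need the optional dependency."""

    def test_always_runs(self):
        self.assertEqual(1 + 1, 2)


@unittest.skipUnless(HAVE_MINIVECT, "minivect is not installed")
class ArrayExpressionTests(unittest.TestCase):
    """Hypothetical integration tests that need minivect."""

    def test_import(self):
        import minivect
        self.assertIsNotNone(minivect)


def run_suite():
    """Run both test classes and return the unittest result object."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(CoreTests))
    suite.addTests(loader.loadTestsFromTestCase(ArrayExpressionTests))
    return unittest.TextTestRunner(verbosity=0).run(suite)
```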

Stefan