[Neuroimaging] [DIPY] Setting up a platform for offline end-to-end quality assurance for DIPY

Thu Mar 3 13:35:16 EST 2016

Also, if you want, you can use Docker on CircleCI; I use it in NeuroVault:
https://circleci.com/gh/NeuroVault/NeuroVault

Best,
Chris

On Thu, Mar 3, 2016 at 10:33 AM, Chris Filo Gorgolewski <
krzysztof.gorgolewski at gmail.com> wrote:

> Have a look at what we are doing for Nipype on CircleCI (on the free open
> source tier):
>
> https://github.com/nipy/nipype/blob/master/circle.yml
> https://circleci.com/gh/nipy/nipype
>
> All of the workflows we run for tests take over 3h to finish. A similar
> setup is implemented in the nilearn project.
>
> Best,
> Chris
>
> On Thu, Mar 3, 2016 at 10:29 AM, Ariel Rokem <arokem at gmail.com> wrote:
>
>> Hi Eleftherios,
>>
>> I have resources to run this kind of thing on AWS, or some other cloud
>> provider. I see many advantages to doing this on the cloud and using
>> something like docker for deployment (e.g., portability and reproducibility
>> in other people's hands, as well as relatively easy scaling in ours). Data
>> can then also consistently be pulled from the HCP S3 buckets (see for
>> example the beginning of the notebook here:
>> https://github.com/arokem/end-to-end/blob/master/end-to-end.ipynb). Once
>> we have automated all that, it will also be relatively easy to transfer
>> these ideas to the other use-cases you mentioned.
>>
>> But we'd need to do some math to see how much this would actually cost.
>> Do you have a sense of the requirements? For example, how often would you
>> want to run the pipeline? Every time a PR happens? That's happening quite
>> often these days ;-) I don't believe we need a really large machine to run
>> persistently. We might want a small machine running persistently,
>> monitoring github for us, and then waking up the big beast when there's a
>> lot of work to do. That might reduce costs.
>>
>> Cheers,
>>
>> Ariel
>>
>> On Thu, Mar 3, 2016 at 8:24 AM, Eleftherios Garyfallidis <
>> garyfallidis at gmail.com> wrote:
>>
>>> Dear Matthew, Maxime, Ariel and all,
>>>
>>> Mr. Dumont and I have started creating some workflows which can be run
>>> from the command line. These are made to work with large real datasets.
>>>
>>> I think it would be great if we could use a different type of testing
>>> from what we are using right now. Most of the testing we use is actually
>>> fast testing of functions and we should definitely continue having that.
>>>
>>> But I think we also need end-to-end offline testing where we actually
>>> test with big whole-brain datasets and then collect some automatic
>>> quality assurance reports. That way we catch most unexpected issues.
>>>
>>> Now, the problem with having such a platform is that it needs computing
>>> power and some disk space. It may need a decent computer to run for 24
>>> hours, for example, and let's say around 100 GBytes of free disk space.
>>> Then it will also need to send some automated reports saying whether all
>>> is good or not.
>>>
>>> Ariel has suggested using the cloud and Docker, but I am afraid that it
>>> will be too expensive for our pockets right now unless someone can
>>> donate to the project.
>>>
>>> An alternative idea would be to go gradually and set up one of the
>>> computers in Sherbrooke or in Berkeley or in Seattle to do such a job. I
>>> think this QA should run once or twice a week rather than every day.
>>>
>>> Now there are other platforms that need to run relatively frequently.
>>> One is the examples for the documentation and then there is Omar's
>>> validation framework which actually needs a large cluster. We can deal with
>>> those at a later stage.
>>>
>>> The easiest way forward with the workflows that I see right now is that
>>> Mr. Dumont adds a script in dipy/tools that will run all the workflows, as
>>> we do with make_examples.py, which runs all the examples. We first try this
>>> platform in Sherbrooke and then we need to figure out a way to send
>>> automated reports to the core developers or to the Berkeley builders and so
>>> on. Maybe sending a PDF or HTML of the output screenshots would also be a
>>> good idea.
>>>
>>> Let me know what you think.
>>>
>>> Cheers,
>>> Eleftherios
>>>
>>> _______________________________________________
>>> Neuroimaging mailing list
>>> Neuroimaging at python.org
>>> https://mail.python.org/mailman/listinfo/neuroimaging
>>>
>>>
>>
>