[Neuroimaging] parallel computation of bundle_distances_mam/mdf ?

Ariel Rokem arokem at gmail.com
Wed Dec 14 13:44:47 EST 2016


Hi Paolo,

I can partially reproduce your findings (also on a Mac, OS X 10.12.1 in my
case):

On Wed, Dec 14, 2016 at 9:21 AM, Paolo Avesani <avesani at fbk.eu> wrote:

> Just for reference I tried two ways on Mac OS 10.11.5:
> 1. compile with clang
> 2.  compile with gcc provided by anaconda
> In both cases compilation failed.
>
> 1. compile with clang
> ----------------------------
> $ python setup.py build_ext --inplace
> Compiling test.pyx because it changed.
> Cythonizing test.pyx
> running build_ext
> building 'test' extension
> creating build
> creating build/temp.macosx-10.6-x86_64-2.7
> gcc -fno-strict-aliasing -I/Users/paolo/Software/miniconda/include -arch
> x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
> -I/Users/paolo/Software/miniconda/include/python2.7 -c test.c -o
> build/temp.macosx-10.6-x86_64-2.7/test.o -fopenmp
> clang: error: unsupported option '-fopenmp'
> error: command 'gcc' failed with exit status 1
>
>
I get the "-fopenmp" error, but compilation then proceeds without a
problem. I don't think that OpenMP ends up being used (does anyone know how
I could check for sure?), but that's fine for my needs. As Stephan has
reported, you can get an OpenMP-enabled gcc from Homebrew, but I don't use
that, because when I do any real data analysis, I do it on AWS Ubuntu
machines anyway.

I am not sure why your clang exits with status 1, but I don't get that on my
machine. What do you see when you run `gcc --version`?


> 2.  compile with gcc provided by anaconda
> ---------------------------------------------------------
> $ python setup.py build_ext --inplace
> Compiling test.pyx because it changed.
> Cythonizing test.pyx
> running build_ext
> building 'test' extension
> gcc -fno-strict-aliasing -I/Users/paolo/Software/miniconda/include -arch
> x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
> -I/Users/paolo/Software/miniconda/include/python2.7 -c test.c -o
> build/temp.macosx-10.6-x86_64-2.7/test.o -fopenmp
> In file included from /Users/paolo/Software/miniconda/lib/gcc/x86_64-apple-darwin11.4.2/4.8.5/include-fixed/syslimits.h:7:0,
>                  from /Users/paolo/Software/miniconda/lib/gcc/x86_64-apple-darwin11.4.2/4.8.5/include-fixed/limits.h:34,
>                  from /Users/paolo/Software/miniconda/include/python2.7/Python.h:19,
>                  from test.c:16:
> /Users/paolo/Software/miniconda/lib/gcc/x86_64-apple-darwin11.4.2/4.8.5/include-fixed/limits.h:168:61: fatal error: limits.h:
> No such file or directory
>  #include_next <limits.h>  /* recurse down to the real one */
> compilation terminated.
> error: command 'gcc' failed with exit status 1
>
>
Yep. I see a similar error with Anaconda gcc as well.

Cheers,

Ariel


>
>
> On Wed, Dec 14, 2016 at 4:51 PM, Eleftherios Garyfallidis <
> elef at indiana.edu> wrote:
>
>>
>> Hi Emanuele,
>>
>> My understanding is that OpenMP was only temporarily unavailable when
>> clang replaced gcc on OSX.
>>
>> So, I would suggest going ahead with OpenMP. Any current installation
>> issues on OSX are only temporary.
>> OpenMP gives us a lot of capability to play with shared memory, and it is
>> a standard that will be around for a very long time. Also, the great
>> integration with Cython makes the algorithms really easy to read.
>> So, especially for this project, my recommendation is to use OpenMP rather
>> than multiprocessing. All the way! :)
>>
>> I am CC'ing Stephan, who wrote the instructions for OSX. I am sure he can
>> help you with this. I would also suggest checking whether Xcode provides
>> any new GUIs for enabling OpenMP. I remember there was something for that.
>>
>> Laterz!
>> Eleftherios
>>
>>
>>
>>
>> On Wed, Dec 14, 2016 at 6:29 AM Emanuele Olivetti <olivetti at fbk.eu>
>> wrote:
>>
>>> Hi Eleftherios,
>>>
>>> Thank you for pointing me to the MDF example. From what I see the Cython
>>> syntax is not complex, which is good.
>>>
>>> My only concern is the availability of OpenMP on the systems where DiPy
>>> is used. On a reasonably recent GNU/Linux machine it seems straightforward
>>> to get libgomp and the proper version of gcc. On other systems - say OSX -
>>> the situation is less clear to me. According to what I read here
>>>   http://nipy.org/dipy/installation.html#openmp-with-osx
>>> the OSX installation steps are not meant for standard end users. Are
>>> those instructions up to date?
>>> As a test of that, we've just tried to skip the steps described above
>>> and instead install gcc with conda on OSX ("conda install gcc"). In the
>>> process, conda installed the recent gcc-4.8 with libgomp, which seems good
>>> news. Unfortunately, when we tried to compile a simple example of Cython
>>> code using parallelization (see below), the process failed (fatal error:
>>> limits.h: No such file or directory)...
>>>
>>> For the reasons above, I am wondering whether the very simple solution
>>> of using the "multiprocessing" module, available in the standard Python
>>> library, may be an acceptable first step towards the more efficient
>>> multithreading of Cython/libgomp. With "multiprocessing" there is no extra
>>> dependency on libgomp, a recent gcc, or anything else. Moreover,
>>> multiprocessing does not require Cython code, because it works on plain
>>> Python too.
>>>
>>> Best,
>>>
>>> Emanuele
>>>
>>> ---- test.pyx ----
>>> from cython import parallel
>>> from libc.stdio cimport printf
>>>
>>> def test_func():
>>>     cdef int thread_id = -1
>>>     with nogil, parallel.parallel(num_threads=10):
>>>         thread_id = parallel.threadid()
>>>         printf("Thread ID: %d\n", thread_id)
>>> -----
>>>
>>> ----- setup.py -----
>>> from distutils.core import setup, Extension
>>> from Cython.Build import cythonize
>>>
>>> extensions = [Extension(
>>>                 "test",
>>>                 sources=["test.pyx"],
>>>                 extra_compile_args=["-fopenmp"],
>>>                 extra_link_args=["-fopenmp"]
>>>             )]
>>>
>>> setup(
>>>     ext_modules = cythonize(extensions)
>>> )
>>> ----
>>> python setup.py build_ext --inplace
>>>
>>> On Tue, Dec 13, 2016 at 11:17 PM, Eleftherios Garyfallidis <
>>> elef at indiana.edu> wrote:
>>>
>>> Hi Emanuele,
>>>
>>> Here is an example of how we calculated the distance matrix in parallel
>>> (for the MDF) using OpenMP
>>> https://github.com/nipy/dipy/blob/master/dipy/align/bundlemin.pyx
>>>
>>> You can just add another function that does the same using mam. It
>>> should be really easy to implement, as we have already done it for the
>>> MDF to speed up the SLR.
>>>
>>> Then we need to update the bundle_distances* functions to use the
>>> parallel versions.
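[Editor's note: an illustrative sketch only of what such a prange-based kernel might look like, not the real bundlemin.pyx code; `pair_distance` is a hypothetical nogil cdef helper standing in for the actual per-pair mam/mdf computation.]

```cython
# Sketch: fill the distance matrix row-parallel with OpenMP via prange.
# pair_distance would be a "cdef double ...(...) nogil" helper.
cimport cython
from cython.parallel import prange

@cython.boundscheck(False)
@cython.wraparound(False)
def distance_matrix_parallel(double[:, :, ::1] tracks1,
                             double[:, :, ::1] tracks2,
                             double[:, ::1] out):
    cdef Py_ssize_t i, j
    # prange releases the GIL and distributes rows across OpenMP threads.
    for i in prange(tracks1.shape[0], nogil=True):
        for j in range(tracks2.shape[0]):
            out[i, j] = pair_distance(tracks1[i], tracks2[j])
```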
>>>
>>> I'll be happy to help you with this. Let's try to schedule some time to
>>> look at this together.
>>>
>>> Best regards,
>>> Eleftherios
>>>
>>>
>>> On Mon, Dec 12, 2016 at 11:16 AM Emanuele Olivetti <olivetti at fbk.eu>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I usually compute the distance matrix between two lists of streamlines
>>> using bundle_distances_mam() or bundle_distances_mdf(). When the lists are
>>> large, it is convenient to exploit the multiple cores of the CPU,
>>> because such computation is intrinsically (embarrassingly) parallel. At the
>>> moment I'm doing it through the multiprocessing or joblib modules,
>>> because I cannot find a way to do it directly from DiPy, at least
>>> according to what I see in dipy/tracking/distances.pyx. But consider that
>>> I am not proficient in cython.parallel.
>>>
>>> Is there a preferable way to perform such parallel computation? I plan
>>> to prepare a pull request in future and I'd like to be on the right track.
>>>
>>> Best,
>>>
>>> Emanuele
>>>
>>> _______________________________________________
>>> Neuroimaging mailing list
>>> Neuroimaging at python.org
>>> https://mail.python.org/mailman/listinfo/neuroimaging
>>>
>>>
>>
>
>
> --
> -------------------------------------------------------
> Paolo Avesani
> Fondazione Bruno Kessler
> via Sommarive 18,
> 38050 Povo (TN) - I
> phone:   +39 0461 314336
> fax:        +39 0461 302040
> email:     avesani at fbk.eu
> web:       avesani.fbk.eu
>
>
>
>

