[Neuroimaging] [EXTERNAL] Re: Re: CZI grant - what would you like to see in Nibabel?

Thu Jul 30 17:32:44 EDT 2020

Thanks!

From: Neuroimaging [mailto:neuroimaging-bounces+reid.robert=mayo.edu at python.org] On Behalf Of Emanuele Olivetti
Sent: Thursday, July 30, 2020 3:56 PM
To: Neuroimaging analysis in Python
Subject: [EXTERNAL] Re: [Neuroimaging] Re: CZI grant - what would you like to see in Nibabel?

On Thu, Jul 30, 2020 at 6:55 PM Reid, Robert I. (Rob) via Neuroimaging <neuroimaging at python.org> wrote:
[...]

I got a 404 error for Emanuele’s load_trk.py URL. Maybe things were rearranged?

My mistake, that was a private repo with some code in progress. I've made the relevant (and self-contained) file available here:
  https://github.com/emanuele/load_trk.git

Could near-numpy performance without resampling be achieved by storing the tractogram as a dict of numpy arrays keyed by the number of points in their streamlines?
e.g. {20: <numpy array of all the streamlines with 20 points>, 21: <numpy array of all the streamlines with 21 points>, …}
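A minimal sketch of the dict-of-arrays layout suggested above, assuming each streamline is an (n, 3) numpy array. `group_by_length` is a hypothetical helper, not part of nibabel. It also records each streamline's original index, since grouping by length would otherwise lose the "position in the array = ID" property.

```python
from collections import defaultdict

import numpy as np


def group_by_length(streamlines):
    """Group (n, 3) streamlines into {n: (k, n, 3) float32 array}.

    Also returns {n: original indices}, so each streamline keeps its ID.
    """
    buckets = defaultdict(list)
    ids = defaultdict(list)
    for i, s in enumerate(streamlines):
        n = len(s)
        buckets[n].append(s)
        ids[n].append(i)
    grouped = {n: np.stack(b).astype(np.float32) for n, b in buckets.items()}
    positions = {n: np.array(v) for n, v in ids.items()}
    return grouped, positions


# Usage with a toy tractogram of 5 random streamlines.
rng = np.random.default_rng(0)
streamlines = [rng.random((n, 3)) for n in (20, 21, 20, 22, 21)]
grouped, positions = group_by_length(streamlines)
print({n: a.shape for n, a in grouped.items()})
# {20: (2, 20, 3), 21: (2, 21, 3), 22: (1, 22, 3)}
print(positions[21])  # [1 4]
```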

Good question and interesting suggestion.

Nevertheless, in most of our work we need an easy way to access the coordinates of streamlines and a unique ID for each streamline. Over years of experiments and coding, we almost always ended up with one of these two simple data structures:
1) For a tractogram (T) whose streamlines have varying numbers of points: a numpy.array of M elements (e.g. M = 10 million) with dtype=object, where each element is a streamline, i.e. an n x 3 matrix whose number of points n may differ from streamline to streamline.
2) For a tractogram (T) whose streamlines all have the same number of points, e.g. 16: a numpy array of shape M x 16 x 3, typically with dtype=numpy.float32 to save space (remember the 10M streamlines? That is about 2 GB).
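For concreteness, the two layouts can be built as follows. This is an illustrative sketch with random coordinates; the lengths and M are made up. It uses dtype=object rather than np.object, since that alias was deprecated in NumPy 1.20 and later removed. The last lines check the memory figure quoted for 10 million 16-point float32 streamlines: 10^7 x 16 x 3 x 4 bytes = 1.92 GB.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Varying number of points: a 1-D object array, one (n, 3) array per element.
lengths = [12, 30, 7, 25]
streamlines = [rng.random((n, 3), dtype=np.float32) for n in lengths]
T_mixed = np.empty(len(streamlines), dtype=object)
for i, s in enumerate(streamlines):  # fill element by element so NumPy never
    T_mixed[i] = s                   # tries to stack them into an N-D array

# 2) Same number of points (e.g. 16): one contiguous (M, 16, 3) float32 array.
M, n_points = 1000, 16
T_fixed = rng.random((M, n_points, 3), dtype=np.float32)

# Memory of layout 2) for M = 10 million streamlines, computed without allocating.
bytes_10M = 10_000_000 * n_points * 3 * np.dtype(np.float32).itemsize
print(bytes_10M / 1e9, "GB")  # 1.92 GB
```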

Why numpy.array? Mainly because of its very convenient indexing. First, each streamline has a unique ID (in the example above, an int between 0 and 9999999), which is its position in the array. So if, for example, we write a nearest-neighbour algorithm and compute the 100 nearest neighbours of the streamline with ID=123456, we just need a list or numpy.array (neighbours) of 100 integers to store the result. To retrieve those neighbouring streamlines, it's just "T[neighbours]", which is also very fast.
Moreover, many algorithms in data analysis / machine learning operate directly on numpy.arrays.
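A small sketch of the indexing pattern described above, assuming the fixed-length (M, 16, 3) layout. The brute-force search and the distance (mean Euclidean distance between corresponding points, ignoring streamline orientation) are illustrative choices, not the algorithm used in our work; `nearest_neighbours` is a hypothetical helper. The last line shows the reshape to a 2-D (samples, features) matrix that most machine-learning tools expect.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n_points = 5000, 16
T = rng.random((M, n_points, 3), dtype=np.float32)  # fixed-length tractogram


def nearest_neighbours(T, query_id, k=100):
    """Brute-force IDs of the k streamlines closest to T[query_id].

    Distance: mean Euclidean distance between corresponding points.
    """
    d = np.linalg.norm(T - T[query_id], axis=2).mean(axis=1)  # shape (M,)
    d[query_id] = np.inf                  # exclude the query streamline itself
    return np.argpartition(d, k)[:k]      # the k smallest distances, unordered


neighbours = nearest_neighbours(T, query_id=1234, k=100)  # 100 integer IDs
neighbour_streamlines = T[neighbours]                     # shape (100, 16, 3)

# (M, 16, 3) -> (M, 48); for a C-contiguous array this is a view, not a copy.
X = T.reshape(M, -1)
```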

A side note: in the repo above, I've just added a small test. Loading 4 million [*] streamlines with varying numbers of points takes 1 minute with our load_streamlines() and 6 seconds with numpy.load() - not far from what I reported in my previous message, where all streamlines had the same number of points.
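For reference, an object array like the one in 1) can be round-tripped with numpy.save() and numpy.load() as sketched below, with random data. This is not the test script in the repository. Because the elements are arbitrary Python objects, NumPy writes them with pickle, and since NumPy 1.16.3 numpy.load() refuses pickled data unless allow_pickle=True is passed (so only load such files from trusted sources).

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# A toy tractogram with varying numbers of points, in the object-array layout.
lengths = rng.integers(10, 100, size=1000)
T = np.empty(len(lengths), dtype=object)
for i, n in enumerate(lengths):
    T[i] = rng.random((n, 3), dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), "tractogram.npy")
np.save(path, T)  # object arrays are serialized via pickle

# Loading pickled object arrays must be enabled explicitly.
T_loaded = np.load(path, allow_pickle=True)
```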

Best,

Emanuele

[*]: In this case, the problem is the excessive RAM usage of numpy.save() - that's why I could only test 4 million on my laptop.
