[Neuroimaging] Planning for data formats - upcoming journal club

Tue Nov 9 13:08:31 EST 2021

I think it might be useful to separate two aspects:

- data and metadata schema:
  - what data types to store (volume, surface, connectomes, ...)

  - what metadata should accompany the data

  - data scope for an individual "file": there is a spectrum from
    an individual volume (e.g. a single echo) to an entire data acquisition
    session (which is what NWB aims for)

- data container
  - that is where HDF5/zarr etc. would come in
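To make the schema/container split concrete, here is a minimal Python sketch. It assumes a hypothetical SCHEMA table; the data types and required keys are illustrative only and are not taken from BIDS or any existing spec. The check looks only at a metadata dict, so it does not care which container eventually holds the data.

```python
# Hypothetical schema: for each data type, the required metadata keys and their types.
SCHEMA = {
    "volume": {"Shape": list, "VoxelSize": list, "Affine": list},
    "surface": {"NumVertices": int, "NumFaces": int},
    "connectome": {"NumNodes": int, "Parcellation": str},
}


def validate(data_type, metadata):
    """Return a list of problems; an empty list means the metadata fits the schema."""
    if data_type not in SCHEMA:
        return [f"unknown data type: {data_type!r}"]
    problems = []
    for key, expected in SCHEMA[data_type].items():
        if key not in metadata:
            problems.append(f"missing {key}")
        elif not isinstance(metadata[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    return problems


print(validate("surface", {"NumVertices": 10242}))  # reports the missing NumFaces key
```

Because validate() never touches the array bytes, the same check could run on metadata read from a JSON sidecar, from HDF5 attributes, or from zarr attributes.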

Indeed, the current (and prospective future) landscape of neuroimaging,
past and present formats, and use cases (which data to store, and how
data and metadata are accessed and modified) should guide the
design.

Another point to keep in mind (speaking with my DataLad and DANDI
archive hats on) while deciding on a "container": with a "rudimentary"
.nii + .json in BIDS we kind of reached a nice trade-off, being able
to adjust/fix/expand metadata without causing changes to large files.
Monolithic HDF5 and the like make such modifications notably
difficult. On the other end of the spectrum, zarr et al. explode the
number of files to store, and thus are likely to hit inode limits on
many systems quite quickly.
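Both sides of this trade-off can be sketched in a few lines of Python. The file names are hypothetical placeholders, and count_chunk_files assumes a directory-style store that writes every chunk of an N-d array as its own file (ignoring the small metadata files).

```python
import json
import math
import os
import tempfile


def update_sidecar(json_path, **changes):
    """Fix or extend metadata by rewriting only the small JSON sidecar file."""
    with open(json_path) as f:
        metadata = json.load(f)
    metadata.update(changes)
    with open(json_path, "w") as f:
        json.dump(metadata, f, indent=2, sort_keys=True)
    return metadata


def count_chunk_files(shape, chunks):
    """Number of files if each chunk of an N-d array is stored as its own file."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "bold.nii")  # stand-in for a large image file
        json_path = os.path.join(tmp, "bold.json")
        with open(data_path, "wb") as f:
            f.write(bytes(1_000_000))  # 1 MB of zeros
        with open(json_path, "w") as f:
            json.dump({"RepetitionTime": 2.0}, f)
        mtime = os.stat(data_path).st_mtime_ns
        update_sidecar(json_path, TaskName="rest")
        assert os.stat(data_path).st_mtime_ns == mtime  # image file untouched
    print(count_chunk_files((96, 96, 60, 1000), (32, 32, 32, 1)))  # 18000
```

For the example shape (a 96x96x60 volume with 1000 time points, 32x32x32 spatial chunks, one chunk per time point) that is 3 x 3 x 2 x 1000 = 18,000 files for a single run; 100 subjects with 4 runs each would already be 7.2 million files.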

On Tue, 09 Nov 2021, Satrajit Ghosh wrote:

>    thank you matthew. happy to use any doc.
>    in the meantime a quick few pointers here: zarr, xarray, fsspec, intake
>    (all of these on the python side). let's also keep in mind compression and
>    chunking, two things the asdf model explicitly decided not to address. and
>    hdf5 now has serializable remote access on s3 stores both natively in the
>    library and through fsspec. 
>    cheers,

>    satra
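The chunking-plus-compression point can be illustrated with the standard library alone. This is a sketch, not how zarr or HDF5 implement it: zlib stands in for whatever codec a real format would use, and the data is a flat byte string. Compressing each chunk independently lets a reader (local, or remote via byte-range requests) decompress only the chunks that overlap the requested range.

```python
import zlib


def compress_chunks(data, chunk_size, level=6):
    """Split raw bytes into fixed-size chunks and compress each chunk independently."""
    return [zlib.compress(data[i:i + chunk_size], level)
            for i in range(0, len(data), chunk_size)]


def read_range(chunks, chunk_size, start, stop):
    """Return original bytes [start, stop), decompressing only overlapping chunks."""
    if stop <= start:
        return b""
    first = start // chunk_size
    last = min((stop - 1) // chunk_size, len(chunks) - 1)
    # Only chunks first..last are decompressed; the rest stay compressed.
    joined = b"".join(zlib.decompress(chunks[i]) for i in range(first, last + 1))
    offset = start - first * chunk_size
    return joined[offset:offset + (stop - start)]


data = bytes(range(256)) * 40  # 10,240 bytes of toy data
chunks = compress_chunks(data, 1000)
print(len(chunks), read_range(chunks, 1000, 1500, 1505))
```

A format that compresses the whole array as one stream gives up this property: reading any slice means decompressing everything before it.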
>    On Tue, Nov 9, 2021 at 11:21 AM Matthew Brett <matthew.brett at gmail.com>
>    wrote:

>      Hi,

>      On Tue, Nov 9, 2021 at 4:12 PM Satrajit Ghosh <satra at mit.edu> wrote:

>      > hi matthew,

>      > that paper is a great place to start. is there a document that you
>      have where we can add thoughts/pointers to things that have been
>      developed or reevaluated between 2015 and now ?

>      I haven't got that far yet.  Anderson - I wonder whether your doc is
>      a good place for that?  Or do we need another one, less focused on HDF5?

>      > quick clarification: is the scope of the discussion limited to certain
>      types of data (e.g., MRI, transforms - i think this was in the czi
>      proposal) or broadly speaking all things neuroimaging (e.g.
>      MEG/EEG/Microscopy/Genetics/Surfaces) or even more general (e.g.
>      nd-arrays, trees, graphs )?

>      Certainly surfaces - these are on the CZI proposal - but I was - at
>      the moment - thinking of MRI / CT / PET / transforms.  I haven't
>      checked out the BIDS spec in detail - but I had naively imagined
>      something that would be compatible with that, and cover the same sort
>      of ground - if that proves interesting and necessary.

>      For example - I could imagine a version of the nice YAML / binary ASDF
>      format for neuroimaging data, that would have the advantage of being
>      easily human-readable, being a single file to make copying and sharing
>      easier, and allowing formal validation against a JSON schema.  But
>      that's really way ahead of where I am now, personally - I've got lots
>      of reading and listening to do.
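As a rough illustration of the single-file, human-readable-header idea, here is a hypothetical layout. It is not ASDF, which defines its own YAML header and block structure. The file holds a magic line, the header length, a JSON header (for practical purposes JSON is also accepted by YAML 1.2 parsers) recording metadata and block offsets, and then the raw binary blocks. MAGIC and both function names are made up for this sketch.

```python
import json
import os
import tempfile

MAGIC = b"NEUROIMG-SKETCH 1\n"  # hypothetical first line identifying this layout


def write_single_file(path, metadata, blocks):
    """Write a human-readable JSON header, then raw binary blocks, into one file."""
    names = sorted(blocks)
    layout, offset = {}, 0
    for name in names:
        layout[name] = {"offset": offset, "nbytes": len(blocks[name])}
        offset += len(blocks[name])
    header = json.dumps({"metadata": metadata, "blocks": layout}, indent=2).encode("utf-8")
    with open(path, "wb") as f:
        f.write(MAGIC)
        f.write(b"%d\n" % len(header))  # header length as ASCII text
        f.write(header)
        f.write(b"\n")
        for name in names:
            f.write(blocks[name])


def read_single_file(path):
    """Parse the header, then read each binary block at its recorded offset."""
    with open(path, "rb") as f:
        if f.readline() != MAGIC:
            raise ValueError("not a file in this sketch's layout")
        header_len = int(f.readline().decode("ascii"))
        header = json.loads(f.read(header_len).decode("utf-8"))
        f.read(1)  # skip the newline after the header
        base = f.tell()
        blocks = {}
        for name, info in header["blocks"].items():
            f.seek(base + info["offset"])
            blocks[name] = f.read(info["nbytes"])
    return header["metadata"], blocks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.img")
        write_single_file(path, {"RepetitionTime": 2.0},
                          {"data": bytes(16), "affine": b"\x01" * 8})
        print(read_single_file(path))
```

Growing the header shifts every binary block after it, so editing metadata in such a file means rewriting the whole file. That is the same monolithic-file cost raised above for HDF5.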

>      Cheers,

>      Matthew
>      _______________________________________________
>      Neuroimaging mailing list
>      Neuroimaging at python.org
>      https://mail.python.org/mailman/listinfo/neuroimaging

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
WWW:   http://www.linkedin.com/in/yarik