[Numpy-discussion] proposal: smaller representation of string arrays

Robert Kern robert.kern at gmail.com
Mon Apr 24 22:23:55 EDT 2017


On Mon, Apr 24, 2017 at 7:07 PM, Nathaniel Smith <njs at pobox.com> wrote:

> That said, AFAICT what people actually want in most use cases is support
> for arrays that can hold variable-length strings, and the only place where
> the current approach is *optimal* is when we need mmap compatibility with
> legacy formats that use fixed-width-nul-padded fields (at which point it's
> super convenient). It's not even possible to *represent* all Python strings
> or bytestrings in current numpy unicode or string arrays (Python
> strings/bytestrings can have trailing nuls). So if we're talking about
> tweaks to the current system it probably makes sense to focus on this use
> case specifically.
>
> From context I'm assuming FITS files use fixed-width-nul-padding for
> strings? Is that right? I know HDF5 doesn't.

Yes, HDF5 does. Or at least, fixed-width nul-padded strings are supported
in addition to the variable-length ones.

https://support.hdfgroup.org/HDF5/doc/Advanced/UsingUnicode/index.html
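
To make the two points in the quoted text concrete, here is a minimal sketch
(not from the original message) of how numpy's fixed-width bytes dtype
behaves. The buffer contents and variable names are made up for
illustration. It shows that a nul-padded legacy buffer can be viewed
directly, and that a bytestring with trailing nuls does not round-trip
through element access even though the underlying buffer still holds it.

```python
import numpy as np

# Hypothetical legacy record buffer: two 5-byte fields, each NUL-padded.
raw = b"abc\x00\x00de\x00\x00\x00"

# View the buffer as fixed-width byte strings without copying.
fields = np.frombuffer(raw, dtype="S5")
values = fields.tolist()          # padding is dropped on conversion

# A Python bytestring that legitimately ends in NUL bytes.
original = b"ab\x00\x00"
stored = np.array([original], dtype="S4")

round_tripped = stored[0]         # trailing NULs are treated as padding
buffer_bytes = stored.tobytes()   # the raw storage still has all 4 bytes

print(values)
print(round_tripped, round_tripped == original)
print(buffer_bytes == original)
```

The same stripping applies to the unicode (`U`) dtype, which is why the
fixed-width layout is convenient for nul-padded file formats but cannot
distinguish padding from data that really ends in nul.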

--
Robert Kern