[Numpy-discussion] using loadtxt to load a text file in to a numpy array

Chris Barker chris.barker at noaa.gov
Fri Jan 17 15:30:06 EST 2014


On Fri, Jan 17, 2014 at 5:18 AM, Freddie Witherden <freddie at witherden.org> wrote:

> In terms of HDF5 it is interesting to look at how h5py -- which has to
>  go between NumPy types and HDF5 conventions -- handles the problem as
> described here:
>
>   http://www.h5py.org/docs/topics/strings.html


from that:
"""All strings in HDF5 hold encoded text.

You can’t store arbitrary binary data in HDF5 strings.
"""

This is actually the same as a py3 string (though the mechanism may be
completely different), and it points at the problem with numpy's 'S': is it
text or bytes? Given the name and history, it should be text, but apparently
people have been using it for bytes, so we have to keep that meaning/use case.
But I suggest that, like Python3, we officially declare that you should not
consider it text, and not do any implicit conversions.

Which means we could use a one-byte-per-character text dtype.
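
To make the text-or-bytes question concrete, here is a minimal sketch of how
'S' behaves on Python 3, as far as I understand it. The array contents are
made up for illustration; the point is only that elements come back as bytes
and that getting text takes an explicit decode.

```python
import numpy as np

# An 'S' array stores raw bytes; on Python 3 its elements are bytes objects.
a = np.array([b"abc", b"de"], dtype="S3")
first = a[0]
print(type(first))        # a NumPy scalar type that subclasses bytes

# Nothing is converted implicitly: text requires an explicit decode
# with a named encoding (ASCII is an assumption made for this example).
text = first.decode("ascii")
print(text)

# Each element occupies a fixed number of bytes, here 3.
print(a.dtype.itemsize)
```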

"""At the high-level interface, h5py exposes three kinds of strings. Each
maps to a specific type within Python (but see str_py3 below):

Fixed-length ASCII (NumPy S type)
....
"""
This is wrong, or misguided, or maybe only a little confusing -- 'S' is
not an ASCII string (even though I wish it were...). But clearly the HDF
folks think we need one!

"""
Fixed-length ASCII

These are created when you use numpy.string_:

>>> dset.attrs["name"] = numpy.string_("Hello")

or the S dtype:

>>> dset = f.create_dataset("string_ds", (100,), dtype="S10")
"""
Pardon my py3 ignorance -- is numpy.string_ the same as 'S' in py3?
From another post, I thought you'd need to use numpy.bytes_ (which is the
same on py2).
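
One way to check the relationship is to ask NumPy directly. This is a small
sketch, assuming a Python 3 install of NumPy; it only inspects numpy.bytes_
and the 'S' dtype and does not touch h5py.

```python
import numpy as np

# The scalar type behind an 'S' dtype on Python 3.
s_scalar_type = np.dtype("S10").type
print(s_scalar_type)

# Going the other way: the dtype built from numpy.bytes_ has kind 'S'.
bytes_kind = np.dtype(np.bytes_).kind
print(bytes_kind)

# numpy.bytes_ is itself a subclass of the built-in bytes type.
print(issubclass(np.bytes_, bytes))
```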

"""Variable-length ASCII

These are created when you assign a byte string to an attribute:

>>> dset.attrs["attr"] = b"Hello"
or when you create a dataset with an explicit “bytes” vlen type:

>>> dt = h5py.special_dtype(vlen=bytes)
>>> dset = f.create_dataset("name", (100,), dtype=dt)

Note that they’re not fully identical to Python byte strings.
"""
This implies that HDF would be well served by an ascii text type.

"""
What about NumPy’s U type?

NumPy also has a Unicode type, a UTF-32 fixed-width format (4-byte
characters). HDF5 has no support for wide characters. Rather than trying to
hack around this and “pretend” to support it, h5py will raise an error when
attempting to create datasets or attributes of this type.
"""

Interesting, though I think irrelevant to this conversation -- but it would
be nice if h5py would encode/decode to/from utf-8 for these.
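
For what it's worth, the round trip can be done by hand on the NumPy side.
The sketch below is only an illustration of encoding a 'U' array to utf-8
bytes and back with numpy.char; the sample strings are made up, and this is
not something h5py does for you according to the docs quoted above.

```python
import numpy as np

# A 'U' array uses 4 bytes per character (UTF-32), so 'U5' is 20 bytes wide.
u = np.array(["h\u00e9llo", "abc"], dtype="U5")
print(u.dtype.itemsize)

# Encode each element to utf-8; the result is an 'S' (bytes) array.
encoded = np.char.encode(u, "utf-8")
print(encoded.dtype)

# Decode back to a 'U' array to recover the original text.
decoded = np.char.decode(encoded, "utf-8")
print(decoded)
```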

-Chris


> which IMHO got it about right.
>
> Regards, Freddie.
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov