[Numpy-discussion] using loadtxt to load a text file in to a numpy array
Chris Barker
chris.barker at noaa.gov
Thu Jan 23 16:51:14 EST 2014
On Thu, Jan 23, 2014 at 12:10 PM, <josef.pktd at gmail.com> wrote:
> > Exactly -- but what should those conversion/casting rules be? We can't
> > decide that unless we decide if 'S' is for text or for arbitrary bytes
> > -- it can't be both. I say text, that's what it's mostly trying to do
> > already. But if it's bytes, fine, then some things still need cleaning
> > up, and we could really use a one-byte-text type. And if it's text,
> > then we may need a bytes dtype.
>
> (remember I'm just a balcony muppet)
>
me too ;-)
> As far as I understand all codecs have the same ascii part.
nope -- certainly not multi-byte codecs. And one of the key points of utf-8
is that the ascii part is compatible -- none of the other full-unicode
encodings are.
Many of the one-byte-per-char ones do share the ascii part, but not all, or
not completely.
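
A quick way to check this claim, as a minimal sketch using only Python 3's built-in codecs. The helper name `shares_ascii` and the particular codec list are illustrative assumptions, not anything from NumPy; `cp037` (an EBCDIC code page) stands in for a one-byte codec that does not share the ascii part.

```python
# Every ASCII character, 0..127, as a str.
ASCII_TEXT = bytes(range(128)).decode("ascii")

def shares_ascii(codec, text=ASCII_TEXT):
    """Return True if `codec` encodes ASCII text to the same bytes ASCII does."""
    return text.encode(codec) == text.encode("ascii")

# utf-16-le / utf-32-le are used so that no byte-order mark is prepended.
for codec in ("utf-8", "latin-1", "cp1252", "utf-16-le", "utf-32-le", "cp037"):
    print(f"{codec:10s} ascii-compatible: {shares_ascii(codec)}")
```

With these codecs, utf-8, latin-1 and cp1252 agree with ascii on the 7-bit range, while the fixed-width Unicode encodings and the EBCDIC code page do not.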
> So I would cast on ascii and raise on anything else.
>
Still a fine option -- clearly defined and quite useful for scientific
text. However, I would prefer latin-1 -- that way you might get garbage
for the non-ascii parts, but it wouldn't raise an exception and it
round-trips through encoding/decoding. And you would have a somewhat more
useful subset -- including the latin-language characters and symbols like
the degree symbol, etc.
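
The difference between the two options can be sketched in plain Python 3 (no NumPy involved; the helper `decodes_as_ascii` and the sample bytes are made up for illustration):

```python
RAW = bytes(range(256))  # every possible byte value

# latin-1 maps each byte to the code point with the same number, so decoding
# never fails and re-encoding gives back exactly the original bytes.
text = RAW.decode("latin-1")
round_trips = text.encode("latin-1") == RAW

def decodes_as_ascii(data):
    """Return True if `data` is pure 7-bit ASCII, False if decoding raises."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False

degree = b"25\xb0C"  # 0xB0 is the degree sign in latin-1

print(round_trips)                # latin-1 round-trips all 256 byte values
print(decodes_as_ascii(degree))   # ascii rejects the 0xB0 byte
print(degree.decode("latin-1"))   # latin-1 decodes it as the degree sign
```

If the bytes were really written in some other encoding, the latin-1 result for the non-ascii bytes is the "garbage" mentioned above, but nothing is lost: encoding back to latin-1 recovers the original bytes.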
> or follow whatever the convention of numpy is:
>
> >>> s = -256
> >>> np.array((s,), dtype=np.uint8)[0] == s
> False
> >>> s = -1
> >>> np.array((s,), dtype=np.uint8)[0] == s
> False
>
I think text is distinct enough from numbers that we don't need to do
the same thing -- and that is the result of well-defined casting rules
built into the compiler (and hardware?) for the numeric types. I don't
think we have either the standard or compiler support for text conversions
like that.
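
For reference, the numeric wrap-around in the quoted example is just C-style conversion to an unsigned 8-bit integer, i.e. reduction modulo 256. The sketch below uses the standard-library `ctypes` module as a stand-in for that C-level conversion; it is not NumPy's own code path, and the helper name `as_uint8` is made up for illustration.

```python
import ctypes

def as_uint8(value):
    """Store a Python int in a C unsigned 8-bit integer (no overflow check)."""
    return ctypes.c_uint8(value).value

for s in (-256, -1, 255, 256):
    wrapped = as_uint8(s)
    print(s, "->", wrapped, "| equals s % 256:", wrapped == s % 256)
```

There is no comparably standard rule for what to do with a character that does not fit a given text encoding, which is why the choice has to be made explicitly.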
-CHB
PS: this is interesting, on py2:
In [176]: a = np.array((2222,), dtype='S')
In [177]: a
Out[177]:
array(['2'],
      dtype='|S1')
It converts it to a string, but only grabs the first character? (Is
it determining the size before converting to a string?)
and this:
In [182]: a = np.array(2222, dtype='S')
In [183]: a
Out[183]:
array('2222',
      dtype='|S24')
24? Where did that come from?
>
> Josef
>
> >
> > Key here is that we don't have the option of not breaking anything,
> > because there is a lot already broken.
> >
> > -Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov