[Numpy-discussion] using loadtxt to load a text file in to a numpy array

Chris Barker chris.barker at noaa.gov
Thu Jan 23 16:51:14 EST 2014


On Thu, Jan 23, 2014 at 12:10 PM, <josef.pktd at gmail.com> wrote:

> > Exactly -- but what should those conversion/casting rules be? We can't
> > decide that unless we decide if 'S' is for text or for arbitrary bytes --
> > it can't be both. I say text, that's what it's mostly trying to do
> > already. But if it's bytes, fine, then some things still need cleaning up,
> > and we could really use a one-byte-text type. And if it's text, then we
> > may need a bytes dtype.
>
> (remember I'm just a balcony muppet)
>

me too ;-)



> As far as I understand all codecs have the same ascii part.


nope -- certainly not multi-byte codecs. And one of the key points of utf-8
is that the ascii part is compatible -- none of the other full-unicode
encodings are.

many of the one-byte-per-char ones do share the ascii part, but not all, or
not completely.
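
To make that concrete, here is a small sketch that encodes the same ASCII letter with a handful of codecs and compares the raw bytes. The codec list is just my pick for illustration (cp037 is an EBCDIC code page that ships with Python); it is not meant to be exhaustive.

```python
# Encode the same ASCII letter with several codecs and compare the raw bytes.
# The codec selection is an illustrative assumption, not an exhaustive survey.
codecs_to_try = ["ascii", "utf-8", "latin-1", "utf-16-le", "utf-32-le", "cp037"]

encoded = {name: "A".encode(name) for name in codecs_to_try}

for name, raw in encoded.items():
    # A codec is ascii-compatible for this character only if it yields b"A".
    compatible = raw == b"A"
    print(f"{name:10s} {raw!r:22s} ascii-compatible: {compatible}")
```

utf-8 and latin-1 give the single byte b'A'; utf-16 and utf-32 pad it with null bytes, and the EBCDIC code page maps it to a different byte altogether.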

> So I would
> cast on ascii and raise on anything else.
>

still a fine option -- clearly defined and quite useful for scientific
text. However, I would prefer latin-1 -- that way you might get garbage
for the non-ascii parts, but it wouldn't raise an exception and it
round-trips through encoding/decoding. And you would have a somewhat more
useful subset -- including the latin-language characters and symbols like
the degree symbol, etc.
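
A minimal sketch of the round-trip property, in plain Python rather than numpy: every possible byte value decodes under latin-1 and encodes back unchanged, while ascii raises on anything above 0x7f. The sample bytes (a temperature with a degree sign) are made up for illustration.

```python
# Every one of the 256 byte values is a valid latin-1 character, so
# decoding and re-encoding arbitrary bytes is lossless.
raw = bytes(range(256))
text = raw.decode("latin-1")
round_tripped = text.encode("latin-1")
lossless = round_tripped == raw

# Hypothetical sample: 0xb0 is the degree sign in latin-1.
sample = b"25\xb0C"
as_latin1 = sample.decode("latin-1")

# The same bytes are rejected by the ascii codec.
try:
    sample.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(lossless, as_latin1, ascii_ok)
```

If the bytes were really in some other encoding, the latin-1 text would be garbage for the non-ascii part, but the original bytes are still recoverable by encoding back.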


> or follow whatever the convention of numpy is:
>
> >>> s = -256
> >>> np.array((s,), dtype=np.uint8)[0] == s
> False
> >>> s = -1
> >>> np.array((s,), dtype=np.uint8)[0] == s
> False
>

I think text is distinct enough from numbers that we don't need to do
that same thing -- and this is the result of well-defined casting rules built
into the compiler (and hardware?) for the numeric types. I don't think we
have either the standard or compiler support for text conversions like that.
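
For reference, the numeric behaviour in the quoted example is the usual C conversion to an unsigned type, i.e. the value modulo 256. A rough sketch using ctypes from the standard library as a stand-in for the numpy cast (that substitution is my assumption; to_uint8 is a hypothetical helper name):

```python
import ctypes


def to_uint8(value):
    """C-style conversion of a Python int to an unsigned 8-bit value.

    ctypes does no overflow checking, so out-of-range values wrap
    around modulo 256, as they do for a C cast.
    """
    return ctypes.c_uint8(value).value


inputs = (-1, -256, 255, 256)
wrapped = [to_uint8(v) for v in inputs]
print(wrapped)

# The wrapped value always agrees with plain modular arithmetic.
matches_modulo = all(to_uint8(v) == v % 256 for v in inputs)
```

There is no comparably universal rule for squeezing arbitrary text into one byte per character, which is why the choice of codec has to be made explicitly.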

-CHB

PS: this is interesting, on py2:


In [176]: a = np.array((2222,), dtype='S')

In [177]: a
Out[177]:
array(['2'],
      dtype='|S1')

It converts it to a string, but only grabs the first character? (Is
it determining the size before converting to a string?)

and this:

In [182]: a = np.array(2222, dtype='S')

In [183]: a
Out[183]:
array('2222',
      dtype='|S24')

24? Where did that come from?

>
> Josef
>
> >
> > Key here is that we don't  have the option of not breaking anything,
> because
> > there is a lot already broken.
> >
> > -Chris
> >



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov