[Numpy-discussion] String type again.

Aldcroft, Thomas aldcroft at head.cfa.harvard.edu
Fri Jul 18 12:06:57 EDT 2014


On Fri, Jul 18, 2014 at 11:10 AM, Julian Taylor <
jtaylor.debian at googlemail.com> wrote:

> On Thu, Jul 17, 2014 at 5:48 PM, Nathaniel Smith <njs at pobox.com> wrote:
> > On Tue, Jul 15, 2014 at 4:29 PM, Charles R Harris
> > <charlesr.harris at gmail.com> wrote:
> >> Thinking more about it, the easiest thing to do might be to make the S
> >> dtype a UTF-8 encoding. Most of the machinery to deal with that is
> >> already in place. That change might affect some users though, and we
> >> might need to do some work to make it backwards compatible with python 2.
> >
> > I'd be very concerned about backcompat for existing code that uses
> > e.g. "S128" as a dtype to mean "128 arbitrary bytes". An example is
> > this file format reading code:
> >    https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L123
> > The file format says there are 128 bytes there, and their
> > interpretation depends on other fields in the header -- but in one
> > case, for "large montages", there's an encoding where every 3 bytes
> > represents 4 characters using an ad hoc 6-bit character set:
> >    https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L133
> >
> > Perhaps this case could be handled better by using a u8 subarray or
> > something (that code also goes to some efforts to work around nul
> > padding), and that particular project hasn't been ported to py3 yet so
> > technically wouldn't be affected if we changed the meaning of "S" on
> > py3. But it does seem useful to have a "fixed length bytes" dtype even
> > in py3, and if we declare that to be "S" then it avoids breaking any
> > existing code depending on it...
> >
>
> We break code either way.
> Either we break applications that use S as a string type, because it
> now becomes bytes in python3,
> or we break applications that treat S as a byte type, because we change
> it to string in python3.
>
> Unfortunately, when adding python3 support we missed the opportunity to
> fix the exact same bytes/text boundary issue that is the main
> reason why python3 exists in the first place.
> We should have made porting to numpy3 an intentionally(!) backward
> incompatible change, just like python itself did.
>
> Now we are stuck with deciding which option breaks less.
> On the one hand, that S is bytes in python3 is somewhat established by
> now and lots of workarounds are already in place.
>

Removing workarounds is generally a good thing (!), and it is often not that
hard to gate them on the numpy version number in libraries that need to
support multiple numpy versions.  It's never ideal to break compatibility,
but in this case it would be fixing something that is currently not working
in a useful way.
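
As a rough sketch of what gating a workaround on the version number could
look like: the cutoff version and all helper names below are made up for
illustration, since no numpy release has actually changed the meaning of "S".
The version string is passed in explicitly so the logic does not depend on
which numpy is installed.

```python
import re

# Hypothetical cutoff: a placeholder for "the first numpy release in which
# 'S' means text". No such release exists; (99, 0) is for illustration only.
S_IS_TEXT_SINCE = (99, 0)


def version_tuple(version):
    """Return the leading (major, minor) of a version string like '1.8.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def needs_bytes_workaround(numpy_version):
    """True if 'S' elements are still bytes and have to be decoded by hand."""
    return version_tuple(numpy_version) < S_IS_TEXT_SINCE


def to_text(value, numpy_version, encoding="utf-8"):
    """Decode an 'S' element to str only where the workaround is needed."""
    if needs_bytes_workaround(numpy_version) and isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

A library would call something like to_text(arr[0], np.__version__) at the
places where it currently decodes by hand, and drop the helper once the old
numpy versions are no longer supported.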

- Tom


> On the other hand, I think code that relies on S being bytes is in the
> minority and python3 usage is probably still insignificant in this
> area. Unfortunately getting actual numbers and not wild guesses on
> this is probably not easy.
