[Numpy-discussion] proposal: smaller representation of string arrays

Thu Apr 20 14:59:44 EDT 2017

On Thu, Apr 20, 2017 at 8:17 PM Julian Taylor <jtaylor.debian at googlemail.com>
wrote:

> I probably should have formulated my goal with the proposal a bit better.
> I am not very interested in a repetition of the which-encoding-to-use debate.
> In the end what will be done allows any encoding via a dtype with
> metadata like datetime.
> This allows any codec (including truncated utf8) to be added easily (if
> python supports it) and allows sidestepping the debate.
>
> My main concern is whether it should be a new dtype or modifying the
> unicode dtype. Though the backward compatibility argument is strongly in
> favour of adding a new dtype that makes the np.unicode type redundant.
>

Creating a new dtype to handle encoded unicode, with the encoding specified
in the dtype, sounds perfectly reasonable to me. Changing the behaviour of
the existing unicode dtype seems like it's going to lead to massive
headaches unless exactly nobody uses it. The only downside to a new type is
having to find an obvious name that isn't already in use. (And having to
actively maintain/deprecate the old one.)
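
To make the idea concrete, here is a rough sketch of what "encoding
specified in the dtype" could look like using only what NumPy offers today.
It is not the proposed implementation: the "encoding" metadata key and the
choice of latin-1 are my own assumptions for illustration, and NumPy itself
does not interpret that metadata. It just shows the datetime64 precedent
(the unit lives in the dtype) and the size difference between the existing
unicode dtype and a one-byte-per-character encoded representation.

```python
import numpy as np

# Precedent: datetime64 stores its unit as part of the dtype.
unit, count = np.datetime_data(np.dtype("datetime64[ms]"))

# The existing unicode dtype stores 4 bytes per character (UCS-4).
words = np.array(["numpy", "dtype"])          # dtype is '<U5'
unicode_itemsize = words.dtype.itemsize       # 5 chars * 4 bytes

# Hypothetical stand-in for the new dtype: a bytes dtype plus an
# "encoding" entry in the dtype metadata (the key name is an assumption).
ENCODING = "latin-1"
encoded = np.char.encode(words, ENCODING)     # dtype is 'S5'
tagged = np.dtype(f"S{encoded.dtype.itemsize}",
                  metadata={"encoding": ENCODING})

# Decoding looks the codec up from the dtype rather than hard-coding it.
decoded = np.char.decode(encoded, tagged.metadata["encoding"])

print(unit, count)
print(unicode_itemsize, encoded.dtype.itemsize)
print(decoded.tolist())
```

A real dtype would of course do the encode/decode step itself on element
access; the point here is only that the codec travels with the dtype, so the
existing unicode dtype can stay exactly as it is.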

Anne