[Numpy-discussion] Bytes vs. Unicode in Python3

Fri Nov 27 11:04:57 EST 2009

On Friday 27 November 2009 16:41:04 Pauli Virtanen wrote:
> > > I think so.  However, I think S is probably closest to bytes... and
> > > maybe S can be reused for bytes... I'm not sure though.
> >
> > That could be a good idea because that would ensure compatibility with
> > existing NumPy scripts (i.e. old 'string' dtypes are mapped to 'bytes',
> > as it should).  The only thing that I don't like is that 'S' seems
> > to be the initial letter for 'string', which is actually 'unicode' in
> > Python 3 :-/ But, for the sake of compatibility, we can probably live
> > with that.
> 
> Well, we can "deprecate" 'S' (i.e. never show it in repr, always only 'B'
> or 'U').

Well, deprecating 'S' seems a sensible option too.  But why only avoid
showing it in repr?  Why not issue a DeprecationWarning as well?
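
To make the suggestion concrete, here is a minimal sketch in plain Python of
what warning on a deprecated typecode could look like.  It is only an
illustration: normalize_typecode and _DEPRECATED_TYPECODES are hypothetical
names, not NumPy API, and the 'S' -> 'B' mapping is just the proposal quoted
above, not something NumPy actually does.

```python
import warnings

# Hypothetical table of legacy typecodes and the replacement proposed
# in this thread ('S' -> 'B'); not part of NumPy.
_DEPRECATED_TYPECODES = {"S": "B"}


def normalize_typecode(code):
    """Return the preferred typecode, warning if a deprecated one is given."""
    replacement = _DEPRECATED_TYPECODES.get(code)
    if replacement is None:
        # Not deprecated: pass the code through unchanged.
        return code
    warnings.warn(
        "typecode %r is deprecated; use %r instead" % (code, replacement),
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return replacement
```

Old scripts would keep working with something like this, since 'S' is still
accepted, but their authors would be told about the change rather than only
seeing a different letter in repr.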

> > > Also, what will a bytes dtype mean within a py2 program context?  Does
> > > it matter if the bytes dtype just fails somehow if used in a py2
> > > program?
> >
> > Mmh, I'm of the opinion that the new 'bytes' type should be available
> > only with NumPy for Python 3.  Would that be possible?
> 
> I don't see a problem in making a bytes_ scalar type available for
> Python2. In fact, it would be useful for making upgrading to Py3 easier.

I think introducing a bytes_ scalar dtype could be somewhat confusing for
Python 2 users.  But if the 'S' typecode is also to be deprecated in NumPy
for Python 2, then it makes perfect sense to introduce bytes_ there too.
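
As a rough illustration of where the confusion could come from, the sketch
below uses only built-in types (no NumPy) and assumes Python 3.  There, bytes
and str are distinct types that never compare equal; in Python 2, bytes is
simply an alias for str, so a separate bytes_ name would not correspond to
any visible difference.

```python
raw = b"numpy"               # bytes: the kind of data a bytes_ scalar would hold
text = raw.decode("ascii")   # str: unicode text on Python 3

# On Python 3 both of these are False; on Python 2 both would be True.
same_type = type(raw) is type(text)
equal = (raw == text)
```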

-- 
Francesc Alted