[Numpy-discussion] Py3 merge
Michael Droettboom
mdroe at stsci.edu
Mon Dec 7 09:50:20 EST 2009
Pauli Virtanen wrote:
> On Mon, 2009-12-07 at 09:12 -0500, Michael Droettboom wrote:
>
>>> We need character arrays for the astro people. I assume these will be
>>> byte arrays. Maybe Michael will weigh in here.
>>>
>> I can't find in the thread where removing byte arrays (meaning arrays of
>> fixed-length non-unicode strings) was suggested -- though changing the
>> dtype specifier for them was. That is 'S' would change to 'B' in
>> python3 (with some deprecation period for 'S'), and 'U' would remain
>> 'U'. That seems acceptable to me, as long as we have some way to have
>> fixed-length 8-bit strings. Hopefully all the new chararray unit tests
>> will help with this transition.
>>
>
> Removal was suggested, with the motivation that people should just use
> byte arrays instead. I think we're not going to remove it at the moment,
> though.
>
Maybe I'm missing something, but those don't seem like the same thing. The
byte type is fundamentally numeric, whereas byte strings are
lexicographic. They construct, repr and sort differently, and many
numerical operations don't make sense on strings. It doesn't seem like
(at present) byte arrays are a reasonable substitute for string arrays.
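To make the difference concrete, here is a minimal sketch against present-day NumPy (an assumption; it is not the 2009 development branch under discussion). It shows that an 'S' array and a uint8 array can hold identical memory yet construct, convert and sort differently:

```python
import numpy as np

# Construction: bytes literals infer a fixed-length string dtype ('S1' here),
# while integers need an explicit numeric byte dtype (uint8, typecode 'B').
text = np.array([b'a', b'b'])
nums = np.array([97, 98], dtype=np.uint8)

# Element access: strings come back as bytes, numbers as integers.
print(text.tolist())   # [b'a', b'b']
print(nums.tolist())   # [97, 98]

# The memory is identical; a view only changes how it is interpreted.
print(text.view(np.uint8).tolist())  # [97, 98]

# Sorting strings is lexicographic, so b'ab' sorts before b'b'.
words = np.array([b'b', b'ab'], dtype='S2')
print(np.sort(words).tolist())       # [b'ab', b'b']

# The same bytes as a 2-D uint8 array (one row per string, NUL-padded).
# A numeric sort orders values within each row and scrambles the text.
raw = words.view(np.uint8).reshape(len(words), words.itemsize)
print(np.sort(raw, axis=1).tolist())  # [[0, 98], [97, 98]]
```

Reinterpreting string data as numeric bytes is possible, but every string-level operation (ordering, display, element conversion) would have to be rebuilt on top of it.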
> The character 'B' is already used by unsigned bytes -- I wonder if it's easy
> to support 'B123' and plain 'B' at the same time, or whether we have to
> pick a different letter for "byte strings". 'y' would be free...
>
It seems to me the motivation to change the 'S' dtype to something else
is to make things clearer with respect to the new conventions of Python
3 (where str became bytes and unicode became str). In that sense, I'm not
sure there's any advantage going from "S" to "y" (particularly without
doing "U" to "S"), whereas there's a strong backward-compatibility
advantage to keep it as "S", though admittedly it's confusing to someone
who doesn't know the pre-Python 3 history.
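For reference, a small sketch of how these typecode letters map in present-day NumPy (an assumption about current releases, not the branch being discussed here): 'S' is a fixed-length byte string, 'U' a fixed-length Unicode string, and 'B' the numeric unsigned byte.

```python
import numpy as np

# 'S': fixed-length byte strings; the scalar type numpy.bytes_ subclasses bytes.
s = np.dtype('S5')
# 'U': fixed-length Unicode strings; the scalar type numpy.str_ subclasses str.
u = np.dtype('U5')
# 'B': a single unsigned byte, i.e. the numeric uint8 type.
b = np.dtype('B')

print(issubclass(s.type, bytes), s.itemsize)  # True 5  (one byte per char)
print(issubclass(u.type, str), u.itemsize)    # True 20 (UCS-4, four bytes per char)
print(b == np.uint8, b.itemsize)              # True 1
```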
I'm not sure your suggestion of making both 'B' and 'B123' work is a good
one, given the semantic differences between numbers and
strings. Would np.array(['a', 'b']) have a repr of [97, 98] or ['a',
'b']? Sorting them would also not necessarily do the right thing.
> The chararray unit tests are all presently failing, so they are
> definitely useful :)
>
Glad to help :)
Mike
--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA