[Python-Dev] Replacement for array.array('u')?

Steven D'Aprano steve at pearwood.info
Fri Mar 22 05:24:23 EDT 2019


On Fri, Mar 22, 2019 at 08:31:33PM +1300, Greg Ewing wrote:
> A poster on comp.lang.python is asking about array.array('u').
> He wants an efficient mutable collection of unicode characters
> that can be initialised from a string.
> 
> According to the docs, the 'u' code is deprecated and will be
> removed in 4.0, but no alternative is suggested.
> 
> Why is this being deprecated, instead of keeping it and making
> it always 32 bits? It seems like useful functionality that can't
> be easily obtained another way.

I can't answer any of those questions, but perhaps the poster can do 
this instead:

py> from array import array
py> a = array('I', 'ℍℰâѵÿ Ϻεταł'.encode('utf-32be'))
py> a
array('I', [220266496, 807469056, 3791650816, 1963196416, 4278190080, 
536870912, 4194500608, 3036872704, 3288530944, 2969763840, 1107361792])

Getting the string out again is no harder:

py> bytes(a).decode('utf-32be')
'ℍℰâѵÿ Ϻεταł'
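A minimal sketch of the same round trip, wrapped in two helpers. `to_codepoints` and `from_codepoints` are hypothetical names, not part of the `array` module. The sketch assumes `array('I')` is 32 bits wide, which it checks at runtime. It also encodes in the machine's native byte order, so each element is the character's actual code point. By contrast, reading 'utf-32be' bytes as native ints on a little-endian machine gives byte-swapped values like the ones above.

```python
import sys
from array import array

# Native-endian UTF-32 (no BOM), so each 4-byte unit reads back as the
# character's real code point rather than a byte-swapped integer.
_CODEC = 'utf-32-le' if sys.byteorder == 'little' else 'utf-32-be'

def to_codepoints(s: str) -> array:
    """Copy s into a mutable array of 32-bit code points (hypothetical helper)."""
    a = array('I')
    # frombytes() needs the item size to match UTF-32's 4-byte units exactly.
    if a.itemsize != 4:
        raise RuntimeError("array('I') is not 32 bits on this platform")
    a.frombytes(s.encode(_CODEC))
    return a

def from_codepoints(a: array) -> str:
    """Decode an array built by to_codepoints() back into a str."""
    return a.tobytes().decode(_CODEC)

buf = to_codepoints('ℍℰâѵÿ Ϻεταł')
print(buf[0] == ord('ℍ'))    # True: elements are code points
buf[0] = ord('H')            # mutate in place
print(from_codepoints(buf))  # Hℰâѵÿ Ϻεταł
```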

But having said that, it would be nice to have an array code which 
treated the values as single UTF-32 characters:

array('?', ['ℍ', 'ℰ', 'â', 'ѵ', 'ÿ', ' ', 'Ϻ', 'ε', 'τ', 'α', 'ł'])

if for no other reason than it looks nicer than a bunch of 32-bit ints.
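Short of a new type code, something close to that display can be sketched in pure Python as a wrapper class. `CharArray` is a hypothetical name, not an existing library type. The sketch stores code points from `ord()` in an `array('L')`, which is at least 32 bits wide, so every code point fits. It inherits `append`, `extend`, `pop` and the other list-like methods from `collections.abc.MutableSequence`.

```python
from array import array
from collections.abc import MutableSequence

class CharArray(MutableSequence):
    """Hypothetical mutable sequence of characters backed by array('L')."""

    def __init__(self, s: str = ''):
        self._data = array('L', map(ord, s))

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        if isinstance(i, slice):
            return ''.join(map(chr, self._data[i]))
        return chr(self._data[i])  # IndexError here ends iteration

    def __setitem__(self, i, value):
        if isinstance(i, slice):
            # array slice assignment requires an array of the same type code
            self._data[i] = array('L', map(ord, value))
        elif len(value) != 1:
            raise ValueError('CharArray items must be single characters')
        else:
            self._data[i] = ord(value)

    def __delitem__(self, i):
        del self._data[i]

    def insert(self, i, value):
        if len(value) != 1:
            raise ValueError('CharArray items must be single characters')
        self._data.insert(i, ord(value))

    def __str__(self):
        return ''.join(map(chr, self._data))

    def __repr__(self):
        return f'CharArray({list(self)!r})'

ca = CharArray('ℍℰâѵÿ Ϻεταł')
ca[0] = 'H'
ca.append('!')
print(str(ca))  # Hℰâѵÿ Ϻεταł!
```

Because it is pure Python, every element access pays for a method call and a `chr()`/`ord()` conversion, so it is slower than a built-in type code would be.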


-- 
Steven

