[Python-Dev] Replacement for array.array('u')?

Victor Stinner vstinner at redhat.com
Fri Mar 22 03:45:03 EDT 2019


Hi,

Internally, CPython has a _PyUnicodeWriter, which is an efficient way
to create a string by appending substrings or characters.
_PyUnicodeWriter changes the internal storage format depending on
the characters' code points (ASCII or Latin-1: 1 byte/character, BMP:
2 b/c, full UCS: 4 b/c). I once tried to expose it in Python, but I
wasn't convinced by the performance. The overhead of method calls was
quite significant, and I wasn't convinced by the "writer += str"
performance either. Maybe I should try again. PyPy also has such an
object. It avoids the need for the "str += str" hack in ceval.c, which
is there to avoid very poor performance (_PyUnicodeWriter also uses
overallocation, which can be controlled with multiple parameters, to
reduce the number of reallocs).
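
To make the storage widths above concrete, here is a small sketch that
estimates the per-character cost of a str from Python. It assumes
CPython 3.3 or later, and bytes_per_char is a made-up helper name, not
an existing API. It only looks at finished str objects, not at
_PyUnicodeWriter itself, which is not exposed to Python.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate the per-character storage of a str made only of `ch`.

    Subtracting the sizes of two strings of different lengths cancels
    out the fixed object header, leaving only the character data.
    Assumption: CPython 3.3+ compact string representation.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


samples = [
    ("ascii", "a"),
    ("latin1", "\xe9"),
    ("BMP", "\u20ac"),
    ("full UCS", "\U0001F600"),
]
for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On CPython this should report 1, 1, 2 and 4 bytes per character
respectively; other implementations may store strings differently.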

Another alternative would be to add a "strarray" type, similar to
the bytes/bytearray pair.
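
As a rough illustration only, such a type could be approximated today
in pure Python. StrArray below is a hypothetical name and API, not an
existing or planned CPython type. It stores code points in an
array.array with an unsigned typecode wide enough for any code point,
so every character costs at least 4 bytes, unlike _PyUnicodeWriter.

```python
from array import array

# Pick an unsigned typecode wide enough for any code point (up to 0x10FFFF).
_TYPECODE = "I" if array("I").itemsize >= 4 else "L"


class StrArray:
    """Hypothetical mutable sequence of Unicode characters (sketch only)."""

    def __init__(self, text=""):
        # Store one code point per array item.
        self._codes = array(_TYPECODE, map(ord, text))

    def __len__(self):
        return len(self._codes)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return "".join(map(chr, self._codes[index]))
        return chr(self._codes[index])

    def __setitem__(self, index, char):
        # Only single-character assignment is supported in this sketch.
        self._codes[index] = ord(char)

    def append(self, char):
        self._codes.append(ord(char))

    def __str__(self):
        return "".join(map(chr, self._codes))


buf = StrArray("h\xe9llo")
buf[1] = "e"      # replace a character in place
buf.append("!")   # grow the buffer
print(str(buf))
```

Converting back with str() builds a new string each time, so this
only pays off when there are many in-place edits per conversion.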

Is this what you are looking for? Or do you really need the
array.array API?

Victor

On Fri, Mar 22, 2019 at 08:38, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>
> A poster on comp.lang.python is asking about array.array('u').
> He wants an efficient mutable collection of unicode characters
> that can be initialised from a string.
>
> According to the docs, the 'u' code is deprecated and will be
> removed in 4.0, but no alternative is suggested.
>
> Why is this being deprecated, instead of keeping it and making
> it always 32 bits? It seems like useful functionality that can't
> be easily obtained another way.
>
> --
> Greg



-- 
Night gathers, and now my watch begins. It shall not end until my death.

