Why is array.array('u') deprecated?

Peter Otten __peter__ at web.de
Fri May 8 07:35:30 EDT 2015


jonathan.slenders at gmail.com wrote:

> On Friday, 8 May 2015 at 12:29:15 UTC+2, Steven D'Aprano wrote:
>> On Fri, 8 May 2015 07:14 pm, jonathan.slenders wrote:
>> 
>> > Why is array.array('u') deprecated?
>> > 
>> > Will we get an alternative for a character array or mutable unicode
>> > string?
>> 
>> 
>> Good question.
>> 
>> Of the three main encodings for Unicode, two are variable-width:
>> 
>> * UTF-8 uses 1-4 bytes per character
>> * UTF-16 uses 2 or 4 bytes per character
>> 
>> while UTF-32 is fixed-width (4 bytes per character). So you could try
>> faking it with a 32-bit array and filling it with
>> string.encode('utf-32').
> 
> 
> I guess that doesn't work. I need to have something that I can pass to the
> re module for searching through it. Creating new strings all the time is
> not an option. (Think about gigabyte strings.)

Can you expand a bit on how array("u") helps here? Are the matches in the 
gigabyte range?
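
For reference, the UTF-32 suggestion quoted above could look roughly like
this. It is only a sketch: the helper names to_codepoints() and to_text() are
made up for illustration, and it assumes the platform has an unsigned array
typecode that is exactly 4 bytes wide (usually 'I'). It does not address
passing the buffer to the re module.

```python
import sys
from array import array

# Use the UTF-32 variant matching the native byte order, so that every array
# item is the code point itself and no byte-order mark is written.
CODEC = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"


def to_codepoints(text):
    """Hypothetical helper: copy a str into a mutable array of code points."""
    # Assumption: 'I' or 'L' is 4 bytes wide here; raises StopIteration if not.
    typecode = next(tc for tc in "IL" if array(tc).itemsize == 4)
    buf = array(typecode)
    buf.frombytes(text.encode(CODEC))
    return buf


def to_text(buf):
    """Hypothetical helper: turn the code point array back into a str."""
    return buf.tobytes().decode(CODEC)


text = "h\u00e9llo \U0001F40D"   # includes one character outside the BMP
buf = to_codepoints(text)
buf[0] = ord("J")                # in-place change, one item per code point
result = to_text(buf)
```

Each item holds exactly one code point, so indexing and slice assignment work
per character, at the cost of 4 bytes per character and a full copy whenever a
str is needed again.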



