byte count unicode string

Diez B. Roggisch deets at nospam.web.de
Wed Sep 20 04:26:20 EDT 2006


MonkeeSage wrote:
> John Machin wrote:
>> The answer is, "You can't", and the rationale would have to be that
>> nobody thought of a use case for counting the length of the UTF-8 form
>> but not creating the UTF-8 form. What is your use case?
> 
> Playing DA here, what if you need to send the byte-count on a server
> via a header, but need the utf8 representation for the actual data?

So what - you need it in the end, don't you?

The runtime complexity of the calculation will be the same - you have to 
consider each character either way, so it's O(n).
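
To illustrate that per-character pass, here is a minimal sketch of counting 
the UTF-8 length without building the encoded form. It assumes Python 3 
(where str is the unicode type), and utf8_len is a made-up name, not 
anything from the standard library:

```python
def utf8_len(text):
    """Count the bytes `text` would need in UTF-8 without encoding it."""
    total = 0
    for ch in text:  # one pass over the characters, hence O(n)
        cp = ord(ch)
        if cp < 0x80:        # ASCII range
            total += 1
        elif cp < 0x800:     # two-byte sequences
            total += 2
        elif cp < 0x10000:   # rest of the Basic Multilingual Plane
            total += 3
        else:                # supplementary planes
            total += 4
    return total
```

For any string that can actually be encoded, this should agree with 
len(text.encode("utf-8")). A lone surrogate is counted as 3 bytes here, 
whereas encode() would refuse it.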

Of course, creating the encoded copy will roughly double the memory 
consumption, since the original unicode object is already held as UCS2 or 
UCS4.

But then - if that really is a problem, how would you work with that 
string anyway?

In that case you have to resort to slicing the string and summing the 
encoded sizes of the parts, which will remedy the memory issue easily.
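
A rough sketch of that slicing approach, again assuming Python 3; the 
function name and the chunk_size default are arbitrary choices for 
illustration:

```python
def utf8_len_chunked(text, chunk_size=4096):
    """Sum the UTF-8 sizes of fixed-size slices of `text`.

    Only one encoded chunk exists at a time, so the extra memory is
    bounded by the chunk size instead of the whole string.
    """
    total = 0
    for start in range(0, len(text), chunk_size):
        part = text[start:start + chunk_size]
        total += len(part.encode("utf-8"))
    return total
```

Slicing a Python 3 str never splits a character. On a UCS2 build of older 
Pythons a slice boundary could fall between the two halves of a surrogate 
pair, so the chunk boundaries would need more care there.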

Diez


