byte count unicode string
Diez B. Roggisch
deets at nospam.web.de
Wed Sep 20 04:26:20 EDT 2006
MonkeeSage wrote:
> John Machin wrote:
>> The answer is, "You can't", and the rationale would have to be that
>> nobody thought of a use case for counting the length of the UTF-8 form
>> but not creating the UTF-8 form. What is your use case?
>
> Playing DA here, what if you need to send the byte-count on a server
> via a header, but need the utf8 representation for the actual data?
So what - you need it in the end, don't you?
The runtime complexity of the calculation will be the same - you have to
consider each character, so it's O(n).
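A minimal sketch of that point, assuming Python 3 (where str is the unicode type) and an HTTP-style header made up for illustration. build_message is a hypothetical helper, not something from this thread. The string is encoded once, and the same bytes serve for both the count and the payload:

```python
def build_message(text):
    """Encode once, then reuse the bytes for the count and the payload."""
    body = text.encode("utf-8")  # single O(n) pass over the characters
    # Hypothetical header layout, chosen only for this example.
    header = "Content-Length: %d\r\n\r\n" % len(body)
    return header.encode("ascii") + body


# "Gr\u00fc\u00dfe" has 5 characters but 7 bytes in UTF-8.
message = build_message("Gr\u00fc\u00dfe")
```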
Of course you will roughly double the memory consumption, since the original
unicode object is held as UCS2 or UCS4 alongside the encoded UTF-8 copy.
But then - if that really is a problem, how would you work with that
string anyway?
In that case you have to resort to slicing the string and computing the size
of each part, which remedies that easily.
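The slicing approach could look roughly like this. It is a sketch assuming Python 3. utf8_length and chunk_size are names invented here, and the default of 4096 characters is arbitrary. Only one slice is encoded at a time, so the extra memory stays bounded by the chunk size:

```python
def utf8_length(text, chunk_size=4096):
    """Count UTF-8 bytes without holding the whole encoded form at once."""
    total = 0
    for start in range(0, len(text), chunk_size):
        # Encode one slice, count its bytes, then let it be discarded.
        total += len(text[start:start + chunk_size].encode("utf-8"))
    return total
```

On a narrow (UCS2) Python 2 build a slice boundary could fall between the two halves of a surrogate pair, so the chunk borders would need extra care there.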
Diez