byte count unicode string

Duncan Booth duncan.booth at invalid.invalid
Wed Sep 20 04:33:34 EDT 2006


"MonkeeSage" <MonkeeSage at gmail.com> wrote:

> John Machin wrote:
>> The answer is, "You can't", and the rationale would have to be that
>> nobody thought of a use case for counting the length of the UTF-8 form
>> but not creating the UTF-8 form. What is your use case?
> 
> Playing DA here, what if you need to send the byte-count on a server
> via a header, but need the utf8 representation for the actual data?

Then you still need both the data and its length. John asked for an example 
where you need only the length and not the data itself.
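In the header scenario the usual approach is to encode once and reuse the resulting bytes for both the length and the payload. Here is a minimal sketch of that. It assumes modern Python 3, and the function name `build_body` is hypothetical.

```python
def build_body(text):
    # Encode once: the same bytes object supplies both the payload
    # and the value for the (HTTP-style) Content-Length header.
    body = text.encode("utf-8")
    headers = {"Content-Length": str(len(body))}
    return headers, body


headers, body = build_body("na\u00efve caf\u00e9")
print(headers["Content-Length"])  # 12 bytes, although the string has 10 characters
```

Since the encoded form is needed anyway, computing its length separately would only add work.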

I guess you could invent something like inserting a string into a database
that has fixed-size fields, silently truncates fields that are too long,
and stores strings internally in UTF-8 but accepts only UCS-2 through its
interface. That is pretty far-fetched, but if such a database exists I
suspect that an extra UTF-8 encoding here or there is the least of your
problems.
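For a length-only case like that, the UTF-8 byte count can be derived from the code points without building the encoded string. This is a minimal sketch, not a standard library function. It assumes Python 3, where `str` holds code points, and the helper names `utf8_length` and `fits_field` are hypothetical.

```python
def utf8_length(text):
    """Count the bytes UTF-8 would need for text, without encoding it."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1      # ASCII range
        elif cp < 0x800:
            total += 2      # e.g. Latin-1 accented letters
        elif cp < 0x10000:
            total += 3      # rest of the BMP; note that a lone surrogate is
                            # counted here, while str.encode("utf-8") would raise
        else:
            total += 4      # supplementary planes (e.g. emoji)
    return total


def fits_field(text, max_bytes):
    # Hypothetical check for a fixed-size field that stores UTF-8 internally.
    return utf8_length(text) <= max_bytes


s = "na\u00efve caf\u00e9 \u20ac \U0001F600"
print(utf8_length(s), len(s.encode("utf-8")))  # 21 21
```

In practice `len(text.encode("utf-8"))` gives the same answer and is simpler. The loop only avoids allocating the encoded copy.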


