About size of Unicode string

Laszlo Zsolt Nagy gandalf at geochemsource.com
Mon Jun 6 13:42:49 EDT 2005


Frank Abel Cancio Bello wrote:

>Hi all!
>
>I need to know the size of a string object independently of its encoding. For
>example:
>
>	len('123') == len('123'.encode('utf_8'))
>
>while the size of the '123' object is different from the size of
>'123'.encode('utf_8')
>
>More:
>I need to send a string in an HTTP request. Then I need to know the length of
>the string to set the "Content-Length" header independently of its encoding.
>
>Any idea?
>  
>
This is from RFC 2616 (HTTP/1.1), section 14.13:

>
> The Content-Length entity-header field indicates the size of the 
> entity-body, in decimal number of OCTETs, sent to the recipient or, in 
> the case of the HEAD method, the size of the entity-body that would 
> have been sent had the request been a GET.
>
>       Content-Length    = "Content-Length" ":" 1*DIGIT
>  
>
> An example is
>
>       Content-Length: 3495
>  
>
> Applications SHOULD use this field to indicate the transfer-length of 
> the message-body, unless this is prohibited by the rules in section 
> 4.4 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4>.
>
> Any Content-Length greater than or equal to zero is a valid value. 
> Section 4.4 describes how to determine the length of a message-body if 
> a Content-Length is not given.
>
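It looks to me like the Content-Length header is not measured in characters 
at all. It is very low-level stuff. The content length is given in OCTETs 
and it represents the size of the body as it is actually sent, that is, 
after the string has been encoded. So it is the number of bytes transferred 
in the body, not len() of your unicode string. Encode your unicode string 
first (for example with .encode('utf-8')) and take the length of the 
resulting byte string. If you build the body in a StringIO, write the 
encoded bytes into it and take the length of that.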
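
A minimal sketch of that idea, written for Python 3 (where str.encode() 
returns bytes); it is not taken from the original thread. The helper name 
content_length, the sample text and the header dict are made up for 
illustration, and UTF-8 is only an assumed default:

```python
def content_length(text, encoding="utf-8"):
    """Encode text and return (body, length), where length is the
    number of octets in the encoded body.

    Hypothetical helper for illustration; utf-8 is an assumed default.
    """
    body = text.encode(encoding)
    return body, len(body)


# Pure ASCII text: one character is one octet in UTF-8, which is why
# len('123') and len('123'.encode('utf_8')) agree in the question.
ascii_body, ascii_len = content_length("123")

# Non-ASCII text: the octet count depends on the encoding chosen.
text = "na\u00efve"  # 5 characters, the third one is i with diaeresis
utf8_body, utf8_len = content_length(text)
latin1_body, latin1_len = content_length(text, "latin-1")

# The header value must describe the bytes that are really sent.
headers = {
    "Content-Type": "text/plain; charset=utf-8",
    "Content-Length": str(utf8_len),
}

print(len(text), utf8_len, latin1_len)
```

The body that goes out must be the same byte string whose length was put 
in the header, so encode once and reuse the result.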

   Laci



