About size of Unicode string

Mon Jun 13 06:48:07 EDT 2005

Frank Abel Cancio Bello wrote:

> Can I get how many bytes have a string object independently of its encoding?

unicode strings hold characters, not bytes.  an encoding is used to
convert a stream of characters to a stream of bytes.  if you need to
know the number of bytes needed to hold an encoded string, you need
to know the encoding.

(and for variable-width encodings, including UTF-8, you need to *do*
the encoding before you can tell how many bytes you get)

> Is the "len" function the right way of get it?

len() on the encoded string, yes.
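
a small sketch of the point above, written for Python 3 (an assumption:
there, str holds characters and bytes holds the encoded result; in the
Python 2 of this thread those are unicode and str).  the sample text
and the encoded_size helper are made up for illustration:

```python
# one text, three encodings, three different byte counts
text = u"na\u00efve caf\u00e9 \u20ac"   # 12 characters, 3 of them non-ASCII


def encoded_size(s, encoding):
    """Return the number of bytes s occupies once encoded."""
    return len(s.encode(encoding))


sizes = dict(
    (enc, encoded_size(text, enc))
    for enc in ("iso-8859-15", "utf-8", "utf-16-le")
)

print(len(text))   # number of characters: 12
print(sizes)       # iso-8859-15: 12, utf-8: 16, utf-16-le: 24
```

len(text) counts characters and stays the same; only len() on the
encoded result tells you the byte count for a given encoding.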

> Laci look the following code:
>
> import urllib2
> request = urllib2.Request(url= 'http://localhost:6000')
> data = 'data to send\n'.encode('utf_8')
> request.add_data(data)
> request.add_header('content-length', str(len(data)))
> request.add_header('content-encoding', 'UTF-8')
> file = urllib2.urlopen(request)
>
> Is always true that "the size of the entity-body" is "len(data)"
> independently of the encoding of "data"?

your data variable contains bytes, not characters, so the answer is "yes".

on the other hand, that add_header line isn't really needed -- if you leave
it out, urllib2 will add the content-length header all by itself.

</F>