About size of Unicode string

Frank Abel Cancio Bello frankabel at tesla.cujae.edu.cu
Mon Jun 6 14:48:53 EDT 2005


Well, I will repeat the question:

Can I get the number of bytes in a string object independently of its encoding?
Is the "len" function the right way to get it?

Laci, look at the following code:

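	import urllib2
	request = urllib2.Request(url='http://localhost:6000')
	# Encode first, so "data" holds the bytes that will actually be sent
	data = u'data to send\n'.encode('utf_8')
	request.add_data(data)
	request.add_header('Content-Length', str(len(data)))
	# The charset goes in Content-Type (text/plain is assumed here);
	# Content-Encoding is meant for codings such as gzip
	request.add_header('Content-Type', 'text/plain; charset=UTF-8')
	response = urllib2.urlopen(request)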

Is it always true that "the size of the entity-body" is "len(data)",
independently of the encoding of "data"?
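
To make the question concrete, here is a minimal sketch of what I mean. It
assumes a current Python 3 interpreter, where "str" is Unicode text and
"bytes" holds the encoded octets; the helper name content_length and the
sample text are only illustrative. The idea is that len() on the text counts
characters, while len() on the encoded result counts the bytes that would go
on the wire:

```python
# Hypothetical helper: encode the text, then measure the encoded bytes.
def content_length(text, encoding="utf-8"):
    """Return (body, header_value) for text encoded with the given encoding."""
    body = text.encode(encoding)   # bytes that would be sent as the entity-body
    return body, str(len(body))    # len() of bytes counts octets


sample = "ñandú\n"  # 6 characters, two of them non-ASCII

for enc in ("utf-8", "latin-1", "utf-16-le"):
    body, length = content_length(sample, enc)
    # The character count stays the same; the byte count depends on the encoding.
    print(enc, "characters:", len(sample), "Content-Length:", length)
```

If that is right, the value for the header has to be taken after encoding,
from the exact bytes handed to the request, never from the unencoded text.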


> -----Original Message-----
> From: Laszlo Zsolt Nagy [mailto:gandalf at geochemsource.com]
> Sent: Monday, June 06, 2005 1:43 PM
> To: Frank Abel Cancio Bello; python-list at python.org
> Subject: Re: About size of Unicode string
> 
> Frank Abel Cancio Bello wrote:
> 
> >Hi all!
> >
> >I need to know the size of a string object independently of its encoding. For
> >example:
> >
> >	len('123') == len('123'.encode('utf_8'))
> >
> >while the size of the '123' object is different from the size of
> >'123'.encode('utf_8')
> >
> >More:
> >I need to send a string in an HTTP request. So I need to know the length of the
> >string to set the "content-length" header, independently of its encoding.
> >
> >Any idea?
> >
> >
> This is from the RFC:
> 
> >
> > The Content-Length entity-header field indicates the size of the
> > entity-body, in decimal number of OCTETs, sent to the recipient or, in
> > the case of the HEAD method, the size of the entity-body that would
> > have been sent had the request been a GET.
> >
> >       Content-Length    = "Content-Length" ":" 1*DIGIT
> >
> >
> > An example is
> >
> >       Content-Length: 3495
> >
> >
> > Applications SHOULD use this field to indicate the transfer-length of
> > the message-body, unless this is prohibited by the rules in section
> > 4.4 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4>.
> >
> > Any Content-Length greater than or equal to zero is a valid value.
> > Section 4.4 describes how to determine the length of a message-body if
> > a Content-Length is not given.
> >
> It looks to me like the Content-Length header has nothing to do with the
> encoding. It is very low-level stuff. The content length is given in
> OCTETs and it represents the size of the body. Clearly, it has nothing
> to do with MIME/encoding etc. It is about the number of bytes (octets)
> transferred in the body. Try to write your unicode strings into a StringIO
> and take its length....
> 
>    Laci
> 
> 
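
A rough sketch of the StringIO idea from the reply above, again assuming
Python 3, where the closest equivalent is an io.BytesIO wrapped in an
io.TextIOWrapper. The function name encoded_size is hypothetical. The text
is written through the encoder into an in-memory buffer and the buffer is
then measured:

```python
import io


# Hypothetical helper: write text through an encoder and measure the result.
def encoded_size(text, encoding="utf-8"):
    """Return how many bytes text occupies once written with encoding."""
    buf = io.BytesIO()
    # newline="" turns off newline translation, so "\n" is written as-is.
    writer = io.TextIOWrapper(buf, encoding=encoding, newline="")
    writer.write(text)
    writer.flush()                 # push any pending text into the buffer
    size = len(buf.getvalue())     # number of bytes actually written
    writer.detach()                # release buf without closing it
    return size


print(encoded_size("123"))
print(encoded_size("ñandú\n"))
```

For a string already in memory this should give the same number as
len(text.encode(encoding)), so the buffer is mostly useful when the body is
built up piece by piece.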







