byte count unicode string

willie willie at jamots.com
Wed Sep 20 03:22:23 EDT 2006


Marc 'BlackJack' Rintsch:

 >In <mailman.313.1158732191.10491.python-l... at python.org>, willie wrote:
 >> # What's the correct way to get the
 >> # byte count of a unicode (UTF-8) string?
 >> # I couldn't find a builtin method
 >> # and the following is memory inefficient.

 >> ustr = "example\xC2\x9D".decode('UTF-8')

 >> num_chars = len(ustr)    # 8

 >> buf = ustr.encode('UTF-8')

 >> num_bytes = len(buf)     # 9

 >That is the correct way.


# Apologies if I'm being dense, but it seems
# unusual that I'd have to make a copy of a
# unicode string, converting it into a byte
# string, before I can determine the size (in bytes)
# of the unicode string. Can someone provide the rationale
# for that or correct my misunderstanding?

# Thanks.
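
A possible way to look at the question above: a unicode string is a sequence of code points, so it has no single "size in bytes" until an encoding is chosen, and the same string yields different byte counts under different encodings. The sketch below illustrates this and also shows a way to compute the UTF-8 length without building the encoded copy. It is only an illustration under stated assumptions: it is written for Python 3 (where iterating a string yields whole code points), and `utf8_len` is a hypothetical helper, not a builtin.

```python
def utf8_len(text):
    """Count the bytes `text` would need in UTF-8 without encoding it.

    Hypothetical helper for illustration. Assumes Python 3, where each
    item of `text` is one full code point. Lone surrogates, which UTF-8
    cannot encode, are counted as 3 bytes here rather than rejected.
    """
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:        # ASCII range: 1 byte
            total += 1
        elif cp < 0x800:     # up to U+07FF: 2 bytes
            total += 2
        elif cp < 0x10000:   # rest of the Basic Multilingual Plane: 3 bytes
            total += 3
        else:                # supplementary planes: 4 bytes
            total += 4
    return total


# Same string as in the quoted example: "example" followed by U+009D.
ustr = u"example\u009d"

num_chars = len(ustr)                          # 8 code points
num_bytes_utf8 = len(ustr.encode("utf-8"))     # 9 bytes (makes a copy)
num_bytes_utf16 = len(ustr.encode("utf-16-le"))  # 16 bytes (makes a copy)
num_bytes_counted = utf8_len(ustr)             # 9 bytes, no encoded copy

print(num_chars, num_bytes_utf8, num_bytes_utf16, num_bytes_counted)
```

Whether the loop is actually cheaper than `len(ustr.encode('UTF-8'))` depends on the string size; the encode call runs in C and is usually faster, while the loop only avoids the temporary byte string.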


