byte count unicode string
willie
willie at jamots.com
Wed Sep 20 03:22:23 EDT 2006
Marc 'BlackJack' Rintsch:
>In <mailman.313.1158732191.10491.python-l... at python.org>, willie wrote:
>> # What's the correct way to get the
>> # byte count of a unicode (UTF-8) string?
>> # I couldn't find a builtin method
>> # and the following is memory inefficient.
>> ustr = "example\xC2\x9D".decode('UTF-8')
>> num_chars = len(ustr) # 8
>> buf = ustr.encode('UTF-8')
>> num_bytes = len(buf) # 9
>That is the correct way.
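For readers on Python 3, where `str` is Unicode text and `bytes` is raw data, the quoted approach translates roughly as follows. This is a sketch. The extra encodings at the end are only there to show that a "byte count" belongs to an encoding, not to the text itself.

```python
# Python 3 version of the quoted snippet.
raw = b"example\xC2\x9D"                 # 7 ASCII bytes + a 2-byte UTF-8 sequence (U+009D)
ustr = raw.decode("utf-8")               # decode bytes -> text
num_chars = len(ustr)                    # 8 code points
num_bytes = len(ustr.encode("utf-8"))    # 9 bytes once encoded as UTF-8

# The same 8 code points occupy a different number of bytes per encoding.
sizes = {enc: len(ustr.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
print(num_chars, num_bytes, sizes)
# prints: 8 9 {'utf-8': 9, 'utf-16-le': 16, 'utf-32-le': 32}
```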
# Apologies if I'm being dense, but it seems
# unusual that I'd have to make a copy of a
# unicode string, converting it into a byte
# string, before I can determine the size (in bytes)
# of the unicode string. Can someone provide the rationale
# for that or correct my misunderstanding?
# Thanks.
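Part of the answer is that CPython does not keep a unicode string as UTF-8 in memory (Python 2 stores it as UCS-2 or UCS-4, depending on the build), so there is no stored UTF-8 length to read off. If the temporary copy is the concern, one option is to add up the UTF-8 length of each code point instead. The sketch below is written for Python 3, and `utf8_len` is a hypothetical helper, not a built-in. It assumes the string holds no lone surrogates, which strict UTF-8 encoding would reject.

```python
def utf8_len(text: str) -> int:
    """Bytes that UTF-8 would use for `text`, without building the encoded copy.

    Hypothetical helper; assumes `text` contains no lone surrogates.
    """
    total = 0
    for ch in text:          # in Python 3, iterating a str yields code points
        cp = ord(ch)
        if cp < 0x80:
            total += 1       # U+0000..U+007F (ASCII): 1 byte
        elif cp < 0x800:
            total += 2       # U+0080..U+07FF: 2 bytes
        elif cp < 0x10000:
            total += 3       # rest of the Basic Multilingual Plane: 3 bytes
        else:
            total += 4       # U+10000..U+10FFFF: 4 bytes
    return total


ustr = "example\u009d"
print(utf8_len(ustr), len(ustr.encode("utf-8")))
```

This trades speed for memory. The pure-Python loop is much slower than `encode`, so it only pays off when the string is large enough that a second full-size copy is a real problem.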