[Tutor] How does len() compute length of a string in UTF-8, 16, and 32?
Ben Finney
ben+python at benfinney.id.au
Mon Aug 7 23:01:22 EDT 2017
boB Stepp <robertvstepp at gmail.com> writes:
> How is len() getting these values?
By asking the objects themselves to report their length. You are
creating different objects with different content::
    >>> s = 'Hello!'
    >>> s_utf8 = s.encode("UTF-8")
    >>> s == s_utf8
    False
    >>> s_utf16 = s.encode("UTF-16")
    >>> s == s_utf16
    False
    >>> s_utf32 = s.encode("UTF-32")
    >>> s == s_utf32
    False
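To make "asking the objects themselves" concrete: as far as I know, ``len(obj)`` delegates to the ``__len__`` method defined on the object's type. The sketch below assumes CPython 3; the names ``str_len`` and ``bytes_len`` are just illustrative.

```python
s = 'Hello!'
s_utf16 = s.encode("UTF-16")

# len(obj) asks the object's type for its __len__ method and calls it.
# For str, that method counts code points; for bytes, it counts bytes.
str_len = type(s).__len__(s)
bytes_len = type(s_utf16).__len__(s_utf16)

print(type(s).__name__, str_len, len(s))
print(type(s_utf16).__name__, bytes_len, len(s_utf16))
```

Each type answers the question in its own terms, so the same text can report a different length once it has been encoded.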
So it shouldn't be surprising that, with different content, they will
have different lengths::
    >>> type(s), len(s)
    (<class 'str'>, 6)
    >>> type(s_utf8), len(s_utf8)
    (<class 'bytes'>, 6)
    >>> type(s_utf16), len(s_utf16)
    (<class 'bytes'>, 14)
    >>> type(s_utf32), len(s_utf32)
    (<class 'bytes'>, 28)
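If the 14 and 28 look odd (rather than 12 and 24), a rough accounting may help. The generic "UTF-16" and "UTF-32" codecs prepend a byte order mark (BOM) to their output, while the endianness-specific variants such as "UTF-16-LE" do not. This is a sketch, not a full treatment of the codecs; the variable names are my own.

```python
import codecs

s = 'Hello!'

# Show the raw bytes each encoding produces for the same six characters.
for name in ("UTF-8", "UTF-16", "UTF-32"):
    data = s.encode(name)
    print(name, len(data), data)

# Size of the byte order mark that the generic codecs prepend.
utf16_bom_len = len(codecs.BOM_UTF16)   # 2 bytes
utf32_bom_len = len(codecs.BOM_UTF32)   # 4 bytes

# Endianness-specific codecs emit no BOM, so only the code units remain:
# 2 bytes per character for UTF-16, 4 bytes per character for UTF-32.
utf16_no_bom = s.encode("UTF-16-LE")
utf32_no_bom = s.encode("UTF-32-LE")

print(utf16_bom_len + len(utf16_no_bom))
print(utf32_bom_len + len(utf32_no_bom))
```

UTF-8 comes out at 6 bytes here only because every character in 'Hello!' is ASCII; text with non-ASCII characters would take more bytes than it has characters.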
What is it you think ‘str.encode’ does?
--
\ “In the long run, the utility of all non-Free software |
`\ approaches zero. All non-Free software is a dead end.” —Mark |
_o__) Pilgrim, 2006 |
Ben Finney