[Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

Mon Aug 7 23:01:22 EDT 2017

boB Stepp <robertvstepp at gmail.com> writes:

> How is len() getting these values?

By asking the objects themselves to report their length. You are
creating different objects with different content::

    >>> s = 'Hello!'
    >>> s_utf8 = s.encode("UTF-8")
    >>> s == s_utf8
    False
    >>> s_utf16 = s.encode("UTF-16")
    >>> s == s_utf16
    False
    >>> s_utf32 = s.encode("UTF-32")
    >>> s == s_utf32
    False

So it shouldn't be surprising that, with different content, they
report different lengths::

    >>> type(s), len(s)
    (<class 'str'>, 6)
    >>> type(s_utf8), len(s_utf8)
    (<class 'bytes'>, 6)
    >>> type(s_utf16), len(s_utf16)
    (<class 'bytes'>, 14)
    >>> type(s_utf32), len(s_utf32)
    (<class 'bytes'>, 28)
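
For a ‘bytes’ object, len() counts bytes, not characters. As a rough
accounting of the numbers above (a sketch, assuming CPython's behaviour
that the plain "UTF-16" and "UTF-32" codecs prepend a byte order mark,
and using the endian-specific codec names only for comparison):

```python
import codecs

s = 'Hello!'

utf8 = s.encode("UTF-8")
utf16 = s.encode("UTF-16")
utf32 = s.encode("UTF-32")

# The plain "UTF-16" and "UTF-32" codecs start their output with a
# byte order mark (BOM) in the platform's native byte order.
has_bom16 = utf16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
has_bom32 = utf32[:4] in (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)

# All six characters are ASCII, so each costs 1 byte in UTF-8,
# 2 bytes in UTF-16, and 4 bytes in UTF-32.
print(len(utf8))    # 6  = 6 * 1
print(len(utf16))   # 14 = 2 (BOM) + 6 * 2
print(len(utf32))   # 28 = 4 (BOM) + 6 * 4

# The endian-specific codecs write no BOM, so only the characters count.
no_bom16 = s.encode("UTF-16-LE")
no_bom32 = s.encode("UTF-32-LE")
print(len(no_bom16))   # 12
print(len(no_bom32))   # 24
```

Text containing non-ASCII characters would give different figures,
since UTF-8 then needs more than one byte for some characters.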

What is it you think ‘str.encode’ does?

-- 
 \              “In the long run, the utility of all non-Free software |
  `\      approaches zero. All non-Free software is a dead end.” —Mark |
_o__)                                                    Pilgrim, 2006 |
Ben Finney