UTF-8 question from Dive into Python 3

Tim Roberts timr at probo.com
Wed Jan 19 02:21:14 EST 2011


Tim Harig <usernet at ilthio.net> wrote:
>On 2011-01-17, carlo <sysengp2p at gmail.com> wrote:
>
>> 2- If that were true, can you point me to some documentation about the
>> math that, as Mark says, demonstrates this?
>
>It is true because UTF-8 is essentially an 8-bit encoding: once it exhausts
>the addressable space of the current byte, it moves to the next one.  Since
>the bytes are accessed and assessed sequentially, they must be in big-endian
>order.

You were doing excellently up to that last phrase.  Endianness only applies
when you treat a series of bytes as a larger entity.  That doesn't apply to
UTF-8.  None of the bytes is more "significant" than any other, so by
definition it is neither big-endian nor little-endian.
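
To make the distinction concrete, here is a minimal Python 3 sketch.  It is
only an illustration, and the euro sign is an arbitrary sample character of
my choosing.  UTF-16 treats each pair of bytes as one 16-bit unit, so it comes
in two byte orders; UTF-8 is defined as a plain sequence of bytes, so there is
only one possible serialization.

```python
# Illustrative sample character (an assumption, any non-ASCII one would do).
text = "\u20ac"  # EURO SIGN, U+20AC

# UTF-8: a single, fixed byte sequence. There is no "-le" or "-be" variant.
utf8 = text.encode("utf-8")

# UTF-16: the same 16-bit code unit can be stored in either byte order.
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")

print(utf8.hex())      # e282ac
print(utf16_be.hex())  # 20ac
print(utf16_le.hex())  # ac20
```

The two UTF-16 results are byte-swapped versions of each other, which is
exactly the ambiguity that a byte order mark exists to resolve.  UTF-8 has
no corresponding choice to make.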
-- 
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.


