[Tutor] Tutor Digest, Vol 162, Issue 25

Jaroslaw Michalski jm77tmp at gmail.com
Wed Aug 9 11:37:31 EDT 2017


>
> > The machine I'm on here is writing big endian UTF-16 and UTF-32.
> >
> > As you note, the 16 and 32 forms are (6 + 1) times 2 or 4 respectively. This
> > is because each encoding has a leading byte order marker to indicate the big
> > endianness or little endianness. For big endian data that is \xff\xfe; for
> > little endian data it would be \xfe\xff.
>
> The arithmetic as I mentioned in my original post is what I am
> expecting in "bytes", but my current thinking is that if I have for
> the BOM you point out "\xff\xfe", I translate that as 4 hex digits,
> each having 16 bits, for a total of 64 bits or 8 bytes.  What am I
> misunderstanding here?  Is a definition of "byte" meaning something
> other than 8 bits here?  I vaguely recall reading somewhere that
> "byte" can mean different numbers of bits in different contexts.
>
> And is len() actually counting "bytes" or something else for these encodings
>

Hi there.
"\xff\xfe" looks like 4 hex digits, but one hex digit takes only
4 binary digits (bits) to represent.
1 byte is 8 bits, so there are 2 hex digits per byte.
\xff is one byte, written with 2 hex digits.
The hex digit f can be written in binary as 1111.
\xff\xfe in binary is 1111 1111 followed by 1111 1110.
4 hex digits = 16 bits = 2 bytes.
For example, 0xff is 0b11111111 = 255, which is the biggest number
you can store in only 8 bits (1 byte).
The mistake was assuming 16 bits per hex digit. One hex digit is only
4 bits, from 0b0000 (0x0) to 0b1111 (0xf, decimal 15).
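
If it helps, here is a small sketch you can paste into a Python 3
interpreter to check the arithmetic yourself. The string "abcdef" is
only my assumption, a stand-in for any 6-character ASCII string; it is
not the one from the earlier messages. It also shows that len() on a
bytes object counts bytes, not hex digits or bits.

```python
import codecs

# The two bytes from the quoted message, as a bytes literal.
bom = b"\xff\xfe"

# len() on a bytes object counts bytes.
print(len(bom))        # 2
print(len(bom) * 8)    # 16 bits in total

# Each byte is written with 2 hex digits; each hex digit is 4 bits.
print(bom.hex())                                      # fffe
print(format(bom[0], "08b"), format(bom[1], "08b"))   # 11111111 11111110
print(int("f", 16), int("ff", 16))                    # 15 255

# Side note: Python's codecs module lists this byte pair as the
# little-endian UTF-16 byte order mark.
print(bom == codecs.BOM_UTF16_LE)                     # True

# Hypothetical 6-character sample string (an assumption, see above).
text = "abcdef"
print(len(text.encode("utf-16")))   # (6 + 1) * 2 = 14
print(len(text.encode("utf-32")))   # (6 + 1) * 4 = 28
```

The plain "utf-16" and "utf-32" codecs prepend a byte order mark, which
is where the extra "+ 1" in the length comes from.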
I hope it's clear now.

