How to waste computer memory?

Sun Mar 20 08:27:58 EDT 2016

On Sun, Mar 20, 2016 at 11:14 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>>> On the other hand, I believe that the output of the UTF transformations
>>> is explicitly described in terms of 8-bit bytes and 16- or 32-bit words.
>>> For instance, the UTF-8 encoding of "A" has to be a single byte with
>>> value 0x41 (decimal 65). It isn't that this is the most obvious
>>> implementation, it's that it can't be anything else and still be UTF-8.
>>
>> Exactly. Aside from the way UTF-16 and UTF-32 have LE and BE variants,
>
> Blame the chip manufacturers for that. Actually, I think we can blame Intel
> specifically for that, for reversing the normal layout of words in memory.

No, I disagree; it's inherent in the notion of representing a 16-bit
or 32-bit value across bytes. Maybe there could have been one
most-common standard, but there'd still have been another way of doing
it. Little-endianness and big-endianness are both important enough
that we have to deal with them.
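
To make that concrete, here is a rough sketch using nothing but the
stdlib struct module and the built-in codecs. The variable names are
just mine for illustration. It splits the same 16-bit value across two
bytes both ways, and shows that the UTF-16 LE and BE variants are
exactly those two layouts, while UTF-8 has no such choice to make:

```python
import struct

value = 0x0041  # code point of "A", treated as a 16-bit value

# One 16-bit value, two possible byte layouts.
big = struct.pack(">H", value)     # most significant byte first
little = struct.pack("<H", value)  # least significant byte first

print(big)     # b'\x00A'
print(little)  # b'A\x00'

# The UTF-16 variants correspond to those two layouts.
assert "A".encode("utf-16-be") == big
assert "A".encode("utf-16-le") == little

# UTF-8 output is defined byte by byte, so there is only one answer.
assert "A".encode("utf-8") == b"\x41"
```

Neither layout is "the" natural one as far as the arithmetic goes;
both round-trip fine as long as the reader and the writer agree.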

ChrisA