Lies in education [was Re: The "loop and a half"]

Peter J. Holzer hjp-usenet3 at hjp.at
Fri Oct 13 06:06:50 EDT 2017


On 2017-10-13 05:28, Gregory Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Grant Edwards wrote:
>> On 2017-10-13, Stefan Ram <ram at zedat.fu-berlin.de> wrote:
>>>      1 byte
>>>
>>>      addressable unit of data storage large enough to hold
>>>      any member of the basic character set of the execution
>>>      environment«
>>>
>>>    ISO C standard
>
> Hmmm. So an architecture with memory addressed in octets
> and Unicode as the basic character set would have a
> char of 8 bits and a byte of 32 bits?

No, because a char is also "large enough to store any member of the
basic execution character set" (§6.2.5). A "byte" is just the amount of
storage a "char" occupies:

| The sizeof operator yields the size (in bytes) of its operand
[...]
| When applied to an operand that has type char, unsigned char, or signed
| char, (or a qualified version thereof) the result is 1. 
    (§6.5.3.4)

So if a C implementation used Unicode as the basic character set, a byte
would have to be at least 21 bits (enough for the highest code point,
U+10FFFF), a char the same, and the size of every other type would have
to be a multiple of that. On any modern architecture that would be
rounded up to 32 bits. (I am quite certain that there was at least one
computer with a 21-bit word size, but I can't find it: lots of 18-bit
and 24-bit machines, but nothing in between.)

An implementation could also choose the BMP (Basic Multilingual Plane,
U+0000 to U+FFFF) as the basic character set and the rest of Unicode as
the extended character set. That would result in a 16-bit byte and char
(and most likely UTF-16 as the multibyte character representation).


> Not only does "byte" not always mean "8 bits", but
> "char" isn't always short for "character"...

True. A character often occupies more space than a char, and you can
store non-character data in a char.

        hp


-- 
   _  | Peter J. Holzer    | Curse of electronic word processing:
|_|_) |                    | You keep polishing your text until
| |   | hjp at hjp.at      | the parts of the sentence no longer
__/   | http://www.hjp.at/ | fits together. -- Ralph Babel


