How to waste computer memory?

Steven D'Aprano steve at pearwood.info
Sat Mar 19 10:18:00 EDT 2016


On Sat, 19 Mar 2016 11:24 pm, BartC wrote about combining characters:

> So a string that looks like:
> 
> "ññññññññññññññññññññññññññññññññññññññññññññññññññ"
> 
> can have 2**50 different representations? 

Yes.
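Here is a minimal Python sketch of where the 2**50 comes from. It is an illustration added here and was not part of the original exchange. Each ñ can be stored either as the precomposed code point U+00F1 or as "n" followed by U+0303 COMBINING TILDE, so a run of n of them has 2**n possible code point sequences. The helper name all_spellings is hypothetical. unicodedata.normalize is the standard-library function that folds every spelling back to one canonical form. The sketch uses 3 characters instead of 50 so the list stays small.

```python
import itertools
import unicodedata

COMPOSED = "\u00f1"     # LATIN SMALL LETTER N WITH TILDE, one code point
DECOMPOSED = "n\u0303"  # "n" + COMBINING TILDE, two code points

def all_spellings(count):
    """Return every code point sequence that displays as `count` copies of ñ.

    Each position independently uses the composed or the decomposed form,
    so there are 2**count sequences in total.
    """
    return [
        "".join(choice)
        for choice in itertools.product((COMPOSED, DECOMPOSED), repeat=count)
    ]

spellings = all_spellings(3)
print(len(spellings))       # 8, i.e. 2**3
print(len(set(spellings)))  # 8: every spelling is a distinct string
# NFC normalization maps every spelling to the same fully composed string.
print({unicodedata.normalize("NFC", s) for s in spellings} == {COMPOSED * 3})
```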

> And occupy somewhere between 50 and 200 bytes? Or is that 400?

The minimum storage would use a legacy single-byte encoding (such as MacRoman
or Latin-1) with the composed ñ character. That gives 50 one-byte characters,
or 50 bytes.
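The minimum can be checked against Python's built-in codecs with the following sketch. "latin-1" and "mac_roman" are standard Python codec names. UTF-8 appears only for comparison.

```python
text = "\u00f1" * 50  # 50 precomposed ñ (U+00F1) characters

# Single-byte legacy encodings store the precomposed ñ in one byte each;
# UTF-8 needs two bytes for U+00F1.
sizes = {
    codec: len(text.encode(codec))
    for codec in ("latin-1", "mac_roman", "utf-8")
}
print(sizes)  # {'latin-1': 50, 'mac_roman': 50, 'utf-8': 100}
```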

The maximum storage would be if all 50 characters were decomposed into two
code points each (giving 100 code points) and then stored as UTF-32 at four
bytes per code point, giving 400 bytes all up.
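The sketch below does the same for the maximum and assumes NFD normalization is used to decompose every ñ. Python's plain "utf-32" codec prepends a 4-byte byte order mark. The sketch therefore uses the byte-order-specific "utf-32-le" codec to get the bare 400 bytes.

```python
import unicodedata

text = "\u00f1" * 50
decomposed = unicodedata.normalize("NFD", text)  # each ñ -> "n" + U+0303

print(len(decomposed))                      # 100 code points
print(len(decomposed.encode("utf-32-le")))  # 400 bytes, 4 per code point
print(len(decomposed.encode("utf-32")))     # 404 bytes: BOM included
```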


> OK...

You say that as if 400 bytes were a lot.

Besides, this is hardly any different from (say) a pure ASCII version of
the "permille" (per thousand) symbol. In Unicode I can write ‰ (two bytes
in UTF-16), but in ASCII I am forced to write O/oo (four bytes) or,
worse, "per thousand" (12 bytes). Imagine a string of "‰"*50, written in
ASCII, for a total of 600 bytes...
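The following short Python sketch confirms those byte counts. UTF-16 is encoded little-endian here so that no byte order mark is added. UTF-8, which needs three bytes for ‰, is included for comparison.

```python
per_mille = "\u2030"  # ‰ PER MILLE SIGN

counts = {
    "utf-16": len(per_mille.encode("utf-16-le")),          # 2 bytes, no BOM
    "utf-8": len(per_mille.encode("utf-8")),               # 3 bytes
    "O/oo": len("O/oo".encode("ascii")),                   # 4 bytes
    "per thousand": len("per thousand".encode("ascii")),   # 12 bytes
}
print(counts)

# Fifty spelled-out symbols in plain ASCII:
print(len(("per thousand" * 50).encode("ascii")))  # 600
```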

Yes, this is silly. Really, if you've got 50 ñ in a string, they take up the
space they take up, and memory is cheap. The days of thinking that 128
characters is all you need (7-bit ASCII) are long, long gone, just like the
days when it was appropriate for ints to be 16 bits.

When I first started programming, the default "integer" type in Pascal,
Forth and other languages was 16 bits, which meant that the largest number
you could represent in a calculation was 32767. My four-function calculator
had an 8-digit display and could calculate up to 99999999, while Pascal
choked on anything above 32767. (Or 65535 if you used unsigned numbers.) Now,
I routinely and without hesitation generate numbers thousands of bits long,
like 2**10000, and my computer calculates and prints the result faster than
I can enter the calculation in the first place. Worrying about the fact that
characters use more than 8 bits is oh-so-1990s.
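The following rough Python sketch shows the contrast. The helpers wrap_int16 and wrap_uint16 are hypothetical. They model silent two's-complement wraparound, whereas a real Pascal compiler with range checking enabled might instead stop with an error. Python's own int grows as needed.

```python
def wrap_int16(value):
    """Hypothetical helper: reduce value to the signed 16-bit range,
    wrapping around the way two's-complement hardware would."""
    return (value + 32768) % 65536 - 32768

def wrap_uint16(value):
    """Hypothetical helper: reduce value to the unsigned range 0..65535."""
    return value % 65536

print(wrap_int16(32767 + 1))   # -32768: the signed maximum overflows
print(wrap_uint16(65535 + 1))  # 0: the unsigned maximum is 65535

big = 2**10000                 # Python ints have arbitrary precision
print(big.bit_length())        # 10001 bits
print(len(str(big)))           # 3011 decimal digits
```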


-- 
Steven




More information about the Python-list mailing list