How to waste computer memory?

Rustom Mody rustompmody at gmail.com
Sun Mar 20 02:20:47 EDT 2016


On Sunday, March 20, 2016 at 10:32:07 AM UTC+5:30, Steven D'Aprano wrote:
> On Sun, 20 Mar 2016 03:12 am, Marko Rauhamaa wrote:
> 
> > Steven D'Aprano :
> > 
> >> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote:
> >>> Yes, but UTF-16 produces 16-bit values that are outside Unicode.
> >>
> >> Show me.
> >>
> >> Before you answer, if your answer is "surrogate pairs", that is
> >> incorrect. Surrogate pairs is how UTF-16 encodes astral characters.
> > 
> > UTF-16 inputs a Unicode stream and produces a stream of 16-bit numbers.
> > Thus, the output of UTF-16 is not Unicode.
> 
> I'm not sure what point you think you are making.
> 
> Unicode (the character set part of it) is a set of abstract 23-bit numbers,

23? Or 21?
AIUI, if the 'least count' (the storage granularity in bits) is 1, it's 21.
If it's 8, it's 24.
If it's 16, it's 32.
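
To make the arithmetic concrete, here is a rough sketch (the name bits_needed is mine, purely for illustration) that rounds the bit length of the largest code point, U+10FFFF, up to a given storage granularity:

```python
# Largest Unicode code point.
MAX_CODE_POINT = 0x10FFFF

def bits_needed(unit: int) -> int:
    """Smallest multiple of `unit` bits that can hold MAX_CODE_POINT.

    `bits_needed` is a hypothetical helper, not a standard-library function.
    """
    exact = MAX_CODE_POINT.bit_length()   # bits needed with no rounding
    return -(-exact // unit) * unit       # ceiling division, then scale back up

for unit in (1, 8, 16):
    print(f"granularity {unit:2d} bits -> {bits_needed(unit)} bits")
```

With granularities of 1, 8 and 16 bits this gives 21, 24 and 32 respectively.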

More pertinently, if the number of bits is significant, in what sense is
the word 'abstract' being used?

> or code points, representing (among other things) characters, and numbered
> from U+0000 to U+10FFFF. Any UTF is, by definition, a transformation from
> such abstract code points to sequences of machine words or bytes (and vice
> versa). What's your point?

I think it's more useful to think of data transformations between formats
rather than calling one format more abstract than another.
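
As a small illustration of that view (a sketch only; the choice of U+1F600 and the variable names are mine), the same code point can be pushed through several encodings and recovered from each, with none of the byte forms being privileged:

```python
# One code point outside the Basic Multilingual Plane (chosen for illustration).
ch = "\U0001F600"

# Transform the same text into three byte formats using Python's built-in codecs.
forms = {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

for name, data in forms.items():
    # Each format round-trips back to the same str.
    print(f"{name:10s} {data.hex():10s} round-trips: {data.decode(name) == ch}")

# The UTF-16 form is a surrogate pair: two 16-bit units for one code point.
high, low = (int.from_bytes(forms["utf-16-be"][i:i + 2], "big") for i in (0, 2))
print(hex(high), hex(low))
```

Here the UTF-16 output is the pair 0xd83d 0xde00, which is the "surrogate pairs" case discussed in the quoted text.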


