How to waste computer memory?

Chris Angelico rosuav at gmail.com
Sat Mar 19 12:23:32 EDT 2016


On Sun, Mar 20, 2016 at 3:12 AM, Marko Rauhamaa <marko at pacujo.net> wrote:
> Steven D'Aprano <steve at pearwood.info>:
>
>> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote:
>>> Yes, but UTF-16 produces 16-bit values that are outside Unicode.
>>
>> Show me.
>>
>> Before you answer, if your answer is "surrogate pairs", that is
>> incorrect. Surrogate pairs are how UTF-16 encodes astral characters.
>
> UTF-16 inputs a Unicode stream and produces a stream of 16-bit numbers.
> Thus, the output of UTF-16 is not Unicode.

Then UTF-16 produces 16-bit values that have nothing whatsoever to do
with Unicode. Is that what you're saying? If so, you're correct;
UTF-16LE produces two bytes to represent every BMP character, and four
bytes to represent every non-BMP character, and those bytes are not
themselves Unicode.
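
A minimal sketch of that byte count in Python, assuming the standard
"utf-16-le" codec. The helper name utf16le_length and the two sample
characters are made up for illustration:

```python
# Hypothetical helper, for illustration only: count the bytes that
# UTF-16LE uses to encode a string.
def utf16le_length(text: str) -> int:
    return len(text.encode("utf-16-le"))


bmp_char = "\u00e9"         # U+00E9, inside the Basic Multilingual Plane
astral_char = "\U0001F600"  # U+1F600, outside the BMP

print(utf16le_length(bmp_char))     # 2
print(utf16le_length(astral_char))  # 4

# The four bytes are a surrogate pair (0xD83D, 0xDE00), each 16-bit
# unit stored low byte first.
print(astral_char.encode("utf-16-le").hex())  # 3dd800de
```

The str objects hold Unicode code points; what encode() returns is a
bytes object, which is the encoded form rather than Unicode itself.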

ChrisA


