how to handle surrogate encoding: read from fs write to database

Marko Rauhamaa marko at pacujo.net
Sun Jun 12 15:08:54 EDT 2016


Random832 <random832 at fastmail.com>:

> On Sun, Jun 12, 2016, at 12:50, Steven D'Aprano wrote:
>> I think Windows also gets it almost right: NTFS uses UTF-16, and (I
>> think) only allows valid Unicode file names.
>
> Nope. Windows allows any sequence of 16-bit units (except for a dozen or
> so ASCII characters) in filenames.

Also, somewhat related, Python allows strings to contain code points
that are not Unicode scalar values, namely the surrogate code points
U+D800..U+DFFF. Thus, Python's native character set is a superset of
what a well-formed Unicode string can contain.
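
A minimal sketch of what that means in practice, assuming Python 3 and
its standard "surrogateescape" error handler (the variable names are
only illustrative). A str may hold a lone surrogate, a strict UTF-8
encode rejects it, and surrogateescape round-trips the original byte:

```python
# A lone surrogate is a legal element of a Python 3 str.
s = "abc\udc80"

# Strict UTF-8 encoding refuses it, since it is not a Unicode scalar value.
try:
    s.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With surrogateescape, an undecodable byte such as 0x80 is mapped to
# U+DC80 on decode and restored to the same byte on encode.
raw = b"abc\x80"
decoded = raw.decode("utf-8", errors="surrogateescape")
roundtrip = decoded.encode("utf-8", errors="surrogateescape")
```

This is why a file name read from the file system can fail when it is
written to a database that insists on valid UTF-8: the escaped
surrogate has to be handled explicitly (for example by storing the raw
bytes) before the string leaves Python.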


Marko


