how to handle surrogate encoding: read from fs write to database

Random832 random832 at fastmail.com
Sun Jun 12 13:59:44 EDT 2016


On Sun, Jun 12, 2016, at 12:50, Steven D'Aprano wrote:
> I think Windows also gets it almost right: NTFS uses UTF-16, and (I
> think) only allows valid Unicode file names.

Nope. Windows allows any sequence of 16-bit code units, unpaired
surrogates included (except for a dozen or so ASCII characters), in
filenames.
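A minimal sketch of what such a name looks like from Python. It assumes a hypothetical filename containing the unpaired high surrogate U+D800. A strict UTF-8 encode rejects it, while the 'surrogatepass' error handler lets it round-trip through UTF-16-LE, the 16-bit form Windows itself stores:

```python
# Hypothetical Windows filename with an unpaired high surrogate (U+D800).
# NTFS accepts it even though it is not a valid Unicode string.
name = "report\ud800.txt"

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError as exc:
    strict_ok = False
    print("strict UTF-8 fails:", exc.reason)  # surrogates not allowed

# 'surrogatepass' writes the code unit through unchanged, so the name
# survives a round trip through UTF-16-LE.
raw = name.encode("utf-16-le", "surrogatepass")
assert raw.decode("utf-16-le", "surrogatepass") == name
```

Anything downstream that insists on well-formed UTF-8 (many database text columns, for instance) will reject such a name unless it is stored in some escaped or binary form.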

Of course, you're not particularly _likely_ to encounter invalid
surrogates, since nothing is going to create them without deliberately
setting out to (unlike Linux where 'invalid' filenames will be created
by any program using the 'wrong' locale).
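On Linux, by contrast, filenames are raw bytes. Python decodes bytes that are not valid in the filesystem encoding using the 'surrogateescape' error handler (what os.fsdecode does on POSIX), so a name written under a Latin-1 locale turns into a str with lone surrogates. The sketch below is illustrative only: it assumes a hypothetical file name and an in-memory SQLite table with a BLOB column. It shows one lossless option for the "write to database" side: store the original bytes rather than the str.

```python
import sqlite3

# Bytes a program running under a Latin-1 locale might have written:
# "café.txt" with é as the single byte 0xE9, which is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Decode the way Python does for undecodable filename bytes under a
# UTF-8 filesystem encoding: each bad byte becomes a lone surrogate.
name = raw_name.decode("utf-8", "surrogateescape")
print(repr(name))  # 'caf\udce9.txt'

# Encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw_name

# Binding the str itself would be rejected (it cannot be encoded as
# UTF-8), so store the original bytes in a BLOB column instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name BLOB)")
conn.execute(
    "INSERT INTO files VALUES (?)",
    (name.encode("utf-8", "surrogateescape"),),
)
stored = conn.execute("SELECT name FROM files").fetchone()[0]
assert stored == raw_name
```

Storing bytes keeps the name exact. The trade-off is that the column is no longer searchable as text, so some applications keep a separate, lossy display string alongside it.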


