[Python-Dev] Python-3.0, unicode, and os.environ

Hagen Fürstenau hfuerstenau at gmx.net
Sun Dec 7 10:35:15 CET 2008


>> As far as I can see all Python Unicode strings can be encoded to UTF-8,
>> even things like lone surrogates because Python doesn't care about them.
>> So both the Unicode API and the binary API would be fail-safe on Windows.
> 
> Python is broken and needs to be fixed.
> 
> http://bugs.python.org/issue3672
> http://bugs.python.org/issue3297

But the question of whether Python should care about lone surrogates
is at best tangential to the issue at hand.  If you have lone
surrogates in the Unicode API (and no exception was raised on the way
there), then the sensible thing is to encode them as lone surrogates
in UTF-8.  Even if you wanted to prevent lone surrogates, encoding to
UTF-8 for the binary API would not be the place to enforce it.
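
To illustrate what encoding a lone surrogate to UTF-8 looks like, here
is a minimal sketch.  It assumes a current CPython 3 release, where the
strict UTF-8 codec rejects lone surrogates and the "surrogatepass" error
handler has to be requested explicitly; that is not necessarily how
Python 3.0 behaves.  The helper names are hypothetical and only serve
the example.

```python
# Sketch only: shows the behaviour of current CPython 3 releases,
# not necessarily that of Python 3.0 as discussed in this thread.

LONE_SURROGATE = "\ud800"  # high surrogate with no matching low surrogate


def encode_for_binary_api(text):
    """Encode text to UTF-8, letting lone surrogates through (hypothetical helper)."""
    # "surrogatepass" encodes a surrogate code point as the three-byte
    # sequence a naive UTF-8 encoder would produce for it.
    return text.encode("utf-8", "surrogatepass")


def decode_from_binary_api(data):
    """Inverse of encode_for_binary_api (hypothetical helper)."""
    return data.decode("utf-8", "surrogatepass")


def strict_encoding_fails(text):
    """Return True if the default (strict) UTF-8 codec rejects text."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


value = "PATH" + LONE_SURROGATE
encoded = encode_for_binary_api(value)
print(encoded)                                   # b'PATH\xed\xa0\x80'
print(decode_from_binary_api(encoded) == value)  # True
print(strict_encoding_fails(value))              # True
```

With this approach the conversion itself never fails, so any policy on
lone surrogates would have to be enforced elsewhere.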

- Hagen

