[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Barry Scott barry at barrys-emacs.org
Wed Apr 29 18:41:16 EDT 2009


On 22 Apr 2009, at 07:50, Martin v. Löwis wrote:

>
> If the locale's encoding is UTF-8, the file system encoding is set to
> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
>

Forgive me if this has been covered. I've been reading this thread for
a long time and still have 100-odd replies to go...

How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?
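A minimal sketch of one possible answer. It assumes Python 3, where PEP 383 was ultimately implemented as the "surrogateescape" error handler rather than a separate "utf-8b" codec, and it assumes the file system encoding is UTF-8. The idea is to encode the str back to its original bytes with the same handler, then decode again with a lossy handler so each undecodable byte becomes U+FFFD (or a visible \xNN escape). `printable_name` is a hypothetical helper, not a standard library function.

```python
def printable_name(name: str, errors: str = "replace") -> str:
    """Return a display-safe copy of a filename that may contain lone
    surrogates U+DC80..U+DCFF produced by the surrogateescape handler.

    Assumes the name was decoded as UTF-8 with errors="surrogateescape".
    """
    # Step 1: undo the decoding; each lone surrogate U+DCxx turns back
    # into the original byte 0xxx.
    raw = name.encode("utf-8", "surrogateescape")
    # Step 2: decode again with a lossy handler. "replace" yields U+FFFD;
    # "backslashreplace" yields a visible escape such as \xe9.
    return raw.decode("utf-8", errors)


if __name__ == "__main__":
    # Simulate what the OS layer hands back for the non-UTF-8 name b"caf\xe9.txt".
    name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")
    print(ascii(name))                               # 'caf\udce9.txt'
    print(printable_name(name, "backslashreplace"))  # caf\xe9.txt
```

Printing the surrogate-containing str directly may raise UnicodeEncodeError on a strict UTF-8 output stream, which is why the round trip helps. The result is for display only; the original str is still the one to pass back to open() or os.stat().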

I'm guessing that an app has to understand that filenames come in two
forms, Unicode and bytes, when they are not UTF-8 data. Why not simply
return a string if the name is valid UTF-8 and bytes otherwise? Then the
app checks the type of the object, string or bytes, and deals with
reporting errors appropriately.
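The alternative proposed above can be sketched roughly as follows, assuming Python 3. The helpers `decode_or_bytes`, `list_names` and `display` are hypothetical names used only for illustration; Python does not provide this API.

```python
import os
from typing import List, Union

Name = Union[str, bytes]


def decode_or_bytes(raw: bytes) -> Name:
    """Return str if raw is valid UTF-8, otherwise return raw unchanged."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw


def list_names(path: str) -> List[Name]:
    # Passing a bytes path makes os.listdir return bytes entries,
    # so nothing is decoded or escaped before we inspect it.
    return [decode_or_bytes(raw) for raw in os.listdir(os.fsencode(path))]


def display(name: Name) -> str:
    # Every caller has to branch on the type of each name.
    if isinstance(name, str):
        return name
    # Not valid UTF-8: show the bytes literal so the user sees something.
    return repr(name)


if __name__ == "__main__":
    for name in list_names("."):
        print(display(name))
```

The cost of this design shows up in display(): every consumer of a filename has to handle two types, whereas with the escaped-str approach all names stay str.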

Barry




More information about the Python-list mailing list