[Python-Dev] Bytes path support

Cameron Simpson cs at zip.com.au
Fri Aug 22 00:27:21 CEST 2014


On 21Aug2014 09:20, Antoine Pitrou <antoine at python.org> wrote:
>On 21/08/2014 00:52, Cameron Simpson wrote:
>>The "bytes in some arbitrary encoding where at least the slash character
>>(and maybe a couple others) is ascii compatible" notion is completely bogus.
>>There's only one special byte, the slash (code 47). There's no OS-level
>>need that it or anything else be ASCII compatible.
>
>Of course there is. Try to split a UTF-16-encoded file path on the
>byte 47 and you'll get a lot of garbage. So, yes, POSIX implicitly
>mandates an ASCII-compatible encoding for file paths.

[Rolls eyes.] Looking at the UTF-16 encoding, it looks like it also embeds NUL
bytes for various codes below 32768. How are they handled? As remarked, codes 0
(NUL) and 47 (ASCII slash code) _are_ special to UNIX filename byte strings.
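
To make that concrete, here is a minimal Python 3 sketch. The helper names
(utf16_path_bytes, has_special_bytes) and the sample strings are made up for
illustration; only str.encode and the "utf-16-le" codec are standard.

```python
def utf16_path_bytes(name):
    """Encode a name as UTF-16 little-endian, without a byte order mark."""
    return name.encode("utf-16-le")


def has_special_bytes(raw):
    """Return (has_nul, has_slash): whether bytes 0 and 47 occur in raw."""
    return (0 in raw, 47 in raw)


# Hypothetical sample names: plain ASCII, one with a slash, and U+012F,
# whose UTF-16-LE form happens to contain byte 47 without being a slash.
for sample in ["abc", "a/b", "\u012f"]:
    raw = utf16_path_bytes(sample)
    print(ascii(sample), raw, has_special_bytes(raw))
```

Even the plain ASCII name "abc" comes out as b'a\x00b\x00c\x00', so NUL turns
up long before byte 47 becomes the problem.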

If you imagine you can embed bare UTF-16 freely even excluding code 47, I think 
one of us is missing something.

That's not "ASCII compatible". That's "not all byte codes can be freely used 
without thought", and any multibyte coding will have to consider such things 
when embedding itself in another coding scheme.
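
As a rough illustration of "considering such things", the Python 3 sketch
below scans a range of code points and reports those whose encoded form
contains byte 0 or byte 47 even though the character is neither NUL nor slash.
The function name special_bytes_leaked is hypothetical, and UTF-8 is used only
as one example of a multibyte coding that keeps those two byte values out of
its multibyte sequences.

```python
def special_bytes_leaked(encoding, code_points):
    """List code points (other than NUL and '/') whose encoding in the
    given codec contains byte 0 or byte 47."""
    leaked = []
    for cp in code_points:
        # Skip the two special characters themselves, and the surrogate
        # range, which cannot be encoded on its own.
        if cp in (0, 47) or 0xD800 <= cp <= 0xDFFF:
            continue
        raw = chr(cp).encode(encoding)
        if 0 in raw or 47 in raw:
            leaked.append(cp)
    return leaked


bmp = range(0x10000)  # Basic Multilingual Plane only, to keep the scan quick
print("utf-8 leaks:", len(special_bytes_leaked("utf-8", bmp)))
print("utf-16-le leaks:", len(special_bytes_leaked("utf-16-le", bmp)))
```

The UTF-8 list comes back empty, while the bare UTF-16 one does not, which is
the distinction being drawn above.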

Cheers,
Cameron Simpson <cs at zip.com.au>

Microsoft:  Committed to putting the "backward" into "backward compatibility."

