[Python-Dev] Bytes path support

"Martin v. Löwis" martin at v.loewis.de
Fri Aug 22 17:25:16 CEST 2014


On 22.08.14 01:56, Glenn Linderman wrote:
> 0 and 47 are certainly originally derived from ASCII.  However, there
> could be lots of encodings that are not ASCII compatible (but in
> practice, probably very few, since most encodings _are_ ASCII
> compatible) that could fit those constraints.
> 
> So while as a technical matter, Cameron is correct that Unix only treats
> 0 & 47 as special, and that is insufficient to declare that encodings
> must be ASCII compatible, as a practical matter, since most encodings
> are ASCII compatible anyway, it would be hard to find very many that
> could be used successfully with Unix file names that are not ASCII
> compatible, that could comply with the 0 & 47 requirements.

More importantly, existing encodings that are distinctly *not*
ASCII compatible (e.g. the EBCDIC ones) do not put the slash at 47
(instead, it is at 97 in EBCDIC, where 47 is the BEL control character).
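
As a rough illustration, this can be checked from Python. The sketch
below assumes cp037, one of the EBCDIC code pages that ships with
CPython, as a representative of the EBCDIC family; other EBCDIC
variants may differ in detail. The sample path is made up.

```python
# Sketch: where does '/' live in an EBCDIC code page?
# Assumption: cp037 stands in for "the EBCDIC ones".
slash = "/".encode("cp037")
print(slash[0])          # byte value that cp037 uses for '/'

byte_47 = bytes([47]).decode("cp037")
print(repr(byte_47))     # the character that byte 47 stands for in cp037

# A path encoded in cp037 has no byte 47 at its separators, so a Unix
# kernel would treat the whole thing as a single path component.
path = "usr/lib/python".encode("cp037")   # illustrative path only
print(47 in path)
```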

There are boundary cases, of course. VISCII is "mostly ASCII
compatible": it puts graphic characters into some of the
control-character positions, but only ones that are not used anyway.

And then there is the YUSCII family of encodings, which definitely
is not ASCII compatible, as it does not contain Latin characters,
but still puts the / into 47 (and also keeps the ASCII digits and
special characters in their positions). There is also SI 960, which
has the slash, the ASCII uppercase letters, digits and special
characters, but replaces the lower-case characters with Hebrew.

So yes, Unix doesn't mandate ASCII-compatible encodings; but it
still mandates ASCII-inspired encodings. I wonder how you would
run "gcc", though, on an SI 960 system; you'd have to type
חדד.
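
To make the "gcc" example concrete, here is a minimal sketch.
CPython ships no SI 960 codec, so si960_decode is a hypothetical
helper built by hand; it assumes the Hebrew letters alef through tav
occupy byte positions 0x60 to 0x7A in Unicode order, with everything
else left as in ASCII.

```python
# Hypothetical helper, not a real codec: the 0x60..0x7A range and the
# Unicode-order layout of the Hebrew letters are assumptions.
HEBREW_ALEF = 0x05D0

def si960_decode(data: bytes) -> str:
    """Decode bytes as if they were SI 960 (7-bit Hebrew)."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError("SI 960 is a 7-bit code")
        if 0x60 <= b <= 0x7A:
            # Lower-case range: Hebrew letter instead of a Latin one.
            out.append(chr(HEBREW_ALEF + (b - 0x60)))
        else:
            # Slash, digits, upper-case letters and specials are unchanged.
            out.append(chr(b))
    return "".join(out)

print(si960_decode(b"gcc"))           # what the command name looks like
print(si960_decode(b"/USR/BIN/gcc"))  # slash and upper case survive as-is
```

The bytes on disk are the same either way; only their interpretation
as characters changes, which is why the kernel does not care.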

Regards,
Martin


