os.listdir, gets unicode, returns unicode... USUALLY?!?!?

Leo Kislov Leo.Kislov at gmail.com
Tue Nov 21 04:41:45 EST 2006


Martin v. Löwis wrote:
> Ross Ridge schrieb:
> > Ross Ridge schrieb:
> >> That would conflict with private use characters appearing in file
> >> names.
> >
> > Martin v. Löwis wrote:
> >> Not necessarily: they could get escaped.
> >
> > How?
>
> Suppose I use U+E001..U+E0FF as the PUA characters for unencodable
> bytes; U+E000 wouldn't be needed since \0 cannot be part of
> a file name in POSIX.
>
> Then I would use U+E000 for escaping. Each PUA character in the
> listed file name would get escaped with U+E000 in the Python
> string; when the file name is converted back to the system, it
> gets unescaped.
>
> Notice that I think this is a really unrealistic case - I expect
> that all file names containing PUA characters were deliberately
> crafted to investigate using PUA characters in file names.
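Here is a rough sketch of the escaping scheme Martin describes. It is written for Python 3 and assumes UTF-8 file names. The function names name_from_bytes and name_to_bytes are hypothetical. Escaping only the reserved range U+E000..U+E0FF, rather than every PUA character, is also an assumption. Python's surrogateescape error handler is used only as a convenient way to locate the undecodable bytes.

```python
# Hypothetical sketch of the PUA escaping proposal (not an actual Python API).
ESC = "\ue000"                # U+E000: escape marker; never needed for a byte, since \0 can't occur
PUA_LO, PUA_HI = 0xE000, 0xE0FF  # U+E001..U+E0FF stand for raw bytes 0x01..0xFF


def name_from_bytes(raw: bytes) -> str:
    """Decode a POSIX file name, mapping undecodable bytes into U+E001..U+E0FF."""
    # surrogateescape turns each undecodable byte b into the lone surrogate U+DC00 + b
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # undecodable byte: remap U+DC00 + b to U+E000 + b
            out.append(chr(PUA_LO + (cp - 0xDC00)))
        elif PUA_LO <= cp <= PUA_HI:
            # a genuine PUA character in the name: protect it with the escape marker
            out.append(ESC + ch)
        else:
            out.append(ch)
    return "".join(out)


def name_to_bytes(name: str) -> bytes:
    """Invert name_from_bytes: unescape literals, turn reserved PUA chars back into bytes."""
    out = bytearray()
    chars = iter(name)
    for ch in chars:
        cp = ord(ch)
        if ch == ESC:
            # the next character is a literal PUA character from the original name
            literal = next(chars, None)
            if literal is None:
                raise ValueError("dangling escape marker at end of name")
            out += literal.encode("utf-8")
        elif PUA_LO < cp <= PUA_HI:
            # unescaped reserved character: it stands for a single raw byte
            out.append(cp - PUA_LO)
        else:
            out += ch.encode("utf-8")
    return bytes(out)
```

The round trip is lossless. However, the resulting strings only mean something to code that knows the convention. A name decoded from b"bad\xff" comes out as "bad\ue0ff", and any other program will see that as an ordinary private-use character.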

How will it interoperate with the non-Python world? Will these file names
ever escape the Python process?

The Unicode Consortium thinks "safe" UTF-8 is a bad idea:

http://www.mail-archive.com/unicode@unicode.org/msg27241.html

[Lars Kristan]
> Which could be understood as "a proposal to amend UTF-8 to allow invalid
> sequences".

[Kenneth Whistler, Technical Director, The Unicode Consortium]
O.k., and as pointed out already, that simply won't fly. *Nobody*
in the UTC or WG2 is going to go for that. It would destroy
UTF-8, not fix it.
---------------------------------

Kenneth Whistler on invalid file names:
http://www.mail-archive.com/unicode@unicode.org/msg27225.html


And also: http://www.mail-archive.com/unicode@unicode.org/msg27167.html

[Lars Kristan]
> Should all
> filenames that do not conform to UTF-8 be declared invalid?

[Doug Ewell, the guy behind Unicode Technical Note #14]
If you have a UTF-8 file system, yes.
--------------------------------------------------------------
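For illustration, here is a minimal Python 3 sketch of the check implied by Doug Ewell's answer: a name conforms only if its raw bytes decode as strict UTF-8. The helper names is_utf8_name and non_utf8_names are hypothetical. Passing a bytes path to os.listdir makes it return the raw bytes names.

```python
import os


def is_utf8_name(raw: bytes) -> bool:
    """Return True if the raw file name is well-formed UTF-8."""
    try:
        raw.decode("utf-8")  # strict decoding rejects invalid or overlong sequences
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_names(directory: bytes = b"."):
    """Yield directory entries (as bytes) that would be 'invalid' on a UTF-8 file system."""
    for raw in os.listdir(directory):  # bytes argument -> bytes entries
        if not is_utf8_name(raw):
            yield raw
```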

  -- Leo
