os.listdir, gets unicode, returns unicode... USUALLY?!?!?
Leo Kislov
Leo.Kislov at gmail.com
Tue Nov 21 04:41:45 EST 2006
Martin v. Löwis wrote:
> Ross Ridge schrieb:
> > Ross Ridge schrieb:
> >> That would conflict with private use characters appearing in file
> >> names.
> >
> > Martin v. Löwis wrote:
> >> Not necessarily: they could get escaped.
> >
> > How?
>
> Suppose I use U+E001..U+E0FF as the PUA characters for unencodable
> bytes; U+E000 wouldn't be needed since \0 cannot be part of
> a file name in POSIX.
>
> Then I would use U+E000 for escaping. Each PUA character in the
> listed file name would get escaped with U+E000 in the Python
> string; when the file name is converted back to the system, it
> gets unescaped.
>
> Notice that I think this is a really unrealistic case - I expect
> that all file names containing PUA characters were deliberately
> crafted to investigate using PUA characters in file names.
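
To make sure I read the scheme right, here is a rough sketch of it as I
understand it. This is my interpretation, not Martin's code: the names
decode_name and encode_name are made up, I assume the file system encoding
is UTF-8, and I lean on the "surrogateescape" error handler of present-day
Python 3 only as a convenient way to spot the undecodable bytes.

```python
ESC = "\ue000"  # escape marker; free because byte 0 cannot occur in a POSIX file name


def decode_name(raw: bytes) -> str:
    """File system bytes -> Python string, following the quoted scheme."""
    out = []
    # surrogateescape turns each undecodable byte 0xNN into U+DCNN. Strict
    # UTF-8 never yields those code points otherwise, so they mark exactly
    # the bytes that need a PUA stand-in.
    for ch in raw.decode("utf-8", "surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(chr(0xE000 + (cp - 0xDC00)))  # undecodable byte -> U+E0NN
        elif 0xE000 <= cp <= 0xE0FF:
            out.append(ESC + ch)  # genuine PUA character: escape it
        else:
            out.append(ch)
    return "".join(out)


def encode_name(name: str) -> bytes:
    """Python string -> file system bytes, undoing decode_name."""
    out = bytearray()
    chars = iter(name)
    for ch in chars:
        if ch == ESC:
            # The character after the marker is a literal PUA character.
            literal = next(chars, None)
            if literal is None:
                raise ValueError("dangling escape marker at end of name")
            out += literal.encode("utf-8")
        elif 0xE001 <= ord(ch) <= 0xE0FF:
            out.append(ord(ch) - 0xE000)  # PUA stand-in -> original raw byte
        else:
            out += ch.encode("utf-8")
    return bytes(out)
```

With this, a Latin-1 name such as b"caf\xe9.txt" round-trips through the
Python string "caf\ue0e9.txt", and a name that really contains U+E042 comes
back with U+E000 in front of it. Both strings mean something only to code
that knows the convention.
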
How will it interoperate with the non-Python world? Will these file names
ever escape the Python process?
The Unicode Consortium thinks "safe" UTF-8 is a bad idea:
http://www.mail-archive.com/unicode@unicode.org/msg27241.html
[Lars Kristan]
> Which could be understood as "a proposal to amend UTF-8 to allow invalid
> sequences".
[Kenneth Whistler, Technical Director, The Unicode Consortium]
O.k., and as pointed out already, that simply won't fly. *Nobody*
in the UTC or WG2 is going to go for that. It would destroy
UTF-8, not fix it.
---------------------------------
Kenneth Whistler on invalid file names:
http://www.mail-archive.com/unicode@unicode.org/msg27225.html
And also: http://www.mail-archive.com/unicode@unicode.org/msg27167.html
[Lars Kristan]
> Should all
> filenames that do not conform to UTF-8 be declared invalid?
[Doug Ewell, the guy behind Unicode Technical Note #14]
If you have a UTF-8 file system, yes.
--------------------------------------------------------------
-- Leo