os.listdir, gets unicode, returns unicode... USUALLY?!?!?

Leo Kislov Leo.Kislov at gmail.com
Sat Nov 18 04:23:22 EST 2006


Martin v. Löwis wrote:
> Leo Kislov schrieb:
> > How about returning two lists, first list contains unicode names, the
> > second list contains undecodable names:
> >
> > files, troublesome = os.listdir(separate_errors=True)
> >
> > and make separate_errors=True by default in python 3.0 ?
>
> That would be quite an incompatible change, no?

Yeah, that was an idea dump. Actually, it is possible to make this idea
mostly backward compatible by making os.listdir() return only Unicode
names and a new os.binlistdir() return only binary directory entries.
Unfortunately the same trick will not work for getcwd.
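
To make the split concrete, here is a rough sketch. It assumes a
Python 3 interpreter, where os.listdir() given a bytes path returns
bytes names. The helper names split_names and listdir_separated are
made up for illustration; neither exists in the os module, and
os.binlistdir() above is only a proposal.

```python
import os
import sys


def split_names(raw_names, encoding=None):
    """Split raw byte names into (decodable, undecodable) lists.

    Hypothetical helper: names that decode cleanly are returned as str,
    the rest are returned untouched as bytes.
    """
    encoding = encoding or sys.getfilesystemencoding()
    decoded, troublesome = [], []
    for raw in raw_names:
        try:
            # Strict decoding: any invalid byte raises UnicodeDecodeError.
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            troublesome.append(raw)
    return decoded, troublesome


def listdir_separated(path="."):
    """List a directory as bytes, then separate the undecodable names."""
    return split_names(os.listdir(os.fsencode(path)))
```

Usage would look like `files, troublesome = listdir_separated(".")`,
which mirrors the separate_errors=True call quoted above.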

Another idea is to map all 256 byte values to Unicode private code
points. When a file name cannot be fully decoded, the undecodable bytes
are mapped to these specially allocated code points. Unfortunately this
idea seems to leak as soon as the program wants to write such a Unicode
string to a file. Python would have to raise an exception, since we
don't know whether it is OK to write a broken string to a file. So we
are back to square one: programs need to deal with filesystem garbage :(
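
A minimal sketch of the private code point mapping, again assuming
Python 3. The base U+F700, the handler name "pua_escape" and all three
function names are arbitrary choices of mine, not an existing API. It
shows both halves: the round trip back to the original bytes works,
but writing the string as ordinary text has to be refused.

```python
import codecs

PUA_BASE = 0xF700  # assumed block: byte b is mapped to U+F700 + b


def _bytes_to_pua(exc):
    # Decode error handler: replace each undecodable byte with its
    # private-use code point and resume decoding after the bad bytes.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error("pua_escape", _bytes_to_pua)  # made-up handler name


def decode_name(raw, encoding="utf-8"):
    """Decode a file name, escaping undecodable bytes instead of failing."""
    return raw.decode(encoding, "pua_escape")


def encode_name(name, encoding="utf-8"):
    """Reverse decode_name: escaped code points become raw bytes again."""
    out = bytearray()
    for ch in name:
        cp = ord(ch)
        if PUA_BASE <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)
        else:
            out += ch.encode(encoding)
    return bytes(out)


def encode_for_text_file(name, encoding="utf-8"):
    """Refuse to write a name that still carries escaped bytes."""
    if any(PUA_BASE <= ord(ch) <= PUA_BASE + 0xFF for ch in name):
        raise ValueError("name contains escaped undecodable bytes")
    return name.encode(encoding)
```

Note that a plain name.encode("utf-8") on an escaped name would succeed
silently and produce bytes that differ from the original file name,
which is exactly the leak: only the program knows whether that is
acceptable.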

  -- Leo



