[issue3187] os.listdir can return byte strings

STINNER Victor report at bugs.python.org
Thu Aug 21 10:20:42 CEST 2008


STINNER Victor <haypo at users.sourceforge.net> added the comment:

If the filename cannot be encoded correctly in the system charset,
it's not really a problem. The goal is to be able to use open(),
shutil.copyfile(), os.unlink(), etc. with the given filename.

orig = filename from the kernel (bytes)
filename = filename from listdir() (str)
dest = filename to the kernel (bytes)

The goal is to get orig == dest. In my program Hachoir, to work around
this problem I store the original filename (bytes) and convert it to
unicode with character replacement (e.g. replacing each invalid byte
sequence with "?"). So the byte string is used for open(),
unlink(), ... and the unicode string is displayed on stdout for the
user.
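
A minimal sketch of the workaround described above, assuming UTF-8 as
the display charset. The helper names display_name() and
list_for_display() are hypothetical and are not taken from Hachoir:

```python
import os


def display_name(raw: bytes, charset: str = "utf-8") -> str:
    """Lossy, display-only conversion of a filename.

    Invalid byte sequences are decoded to U+FFFD by errors="replace",
    then shown as "?" as described in the message.
    """
    return raw.decode(charset, errors="replace").replace("\ufffd", "?")


def list_for_display(directory: bytes):
    """Return (raw, shown) pairs for every entry in `directory`.

    Passing a bytes path to os.listdir() makes it return bytes names,
    so `raw` is what should be handed back to open(), os.unlink(), etc.
    """
    return [(raw, display_name(raw)) for raw in os.listdir(directory)]


if __name__ == "__main__":
    for raw, shown in list_for_display(b"."):
        print(shown)  # only the str form is ever printed
```

Only the str form is shown to the user; every system call keeps using
the untouched bytes, so the name given back to the kernel is the one it
returned.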

IMHO, the best solution is to create a class like this:

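class Filename:
    def __init__(self, orig):
        # Original name from the kernel, kept unchanged so it round-trips
        self.as_bytes = orig
        # Displayable text; myformat() is a placeholder for the bytes -> str
        # conversion (not defined in this message)
        self.as_str = myformat(orig)
    def __str__(self):
        return self.as_str
    def __bytes__(self):
        return self.as_bytes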

New problems: I guess that functions operating on filenames 
(os.path.*) will have to support this new type (Filename class).
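
To illustrate what "supporting the new type" could involve, here is a
rough sketch of a join() wrapper that accepts str, bytes, or Filename.
Everything beyond the Filename class itself is an assumption: the
helpers to_kernel_bytes() and join() are hypothetical, UTF-8 is assumed
as the filesystem charset, and a lossy UTF-8 decode stands in for the
undefined myformat():

```python
import os


class Filename:
    def __init__(self, orig: bytes):
        self.as_bytes = orig
        # Stand-in for myformat(): lossy decode, for display only.
        self.as_str = orig.decode("utf-8", errors="replace")

    def __str__(self):
        return self.as_str

    def __bytes__(self):
        return self.as_bytes


def to_kernel_bytes(name) -> bytes:
    """Hypothetical helper: the bytes that should be given to the kernel."""
    if isinstance(name, Filename):
        return bytes(name)          # original bytes, unchanged
    if isinstance(name, bytes):
        return name
    if isinstance(name, str):
        return name.encode("utf-8")  # assumption: UTF-8 filesystem charset
    raise TypeError("unsupported filename type: %r" % type(name))


def join(directory, name) -> Filename:
    """Hypothetical os.path.join() wrapper that preserves the raw bytes."""
    return Filename(os.path.join(to_kernel_bytes(directory),
                                 to_kernel_bytes(name)))


if __name__ == "__main__":
    path = join("some_dir", Filename(b"caf\xe9"))
    print(str(path))     # displayable form, invalid bytes replaced
    print(bytes(path))   # exact bytes to pass to open(), os.unlink(), ...
```

Each os.path function would need a similar conversion step, which is the
cost this approach carries.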

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3187>
_______________________________________

