os.walk and os.listdir problems python 3.0+

Amos Anderson amosanderson at gmail.com
Wed Jun 24 23:07:14 EDT 2009


I've run into a bit of an issue iterating through files in Python 3.0 and
3.1rc2. When it comes to a file with '\u200b' in the file name, it gives the
error...

Traceback (most recent call last):
  File "ListFiles.py", line 19, in <module>
    f.write("file:{0}\n".format(i))
  File "c:\Python31\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in position 30: character maps to <undefined>


Code is as follows...

import os

f = open("dirlist.txt", 'w')

for root, dirs, files in os.walk("C:\\Users\\Filter\\"):
    f.write("root:{0}\n".format(root))
    f.write("dirs:\n")
    for i in dirs:
        f.write("dir:{0}\n".format(i))
    f.write("files:\n")
    for i in files:
        f.write("file:{0}\n".format(i))

f.close()

input("done")


The file it's choking on happens to be a link that Internet Explorer
created. There are two files that appear in Explorer to have the same name,
but one actually has a zero width space ('\u200b') just before the .url
extension. While playing around with this I've found several files with the
same character throughout my file system. OS: Vista SP2, Language: US
English.
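
For anyone who wants to check their own file system for such names, here is
a minimal sketch. The function names and the choice to print with ascii()
are illustrative assumptions, not part of the original script. ascii()
escapes every non-ASCII character, so printing its result should not depend
on what the console encoding can represent.

```python
import os

ZERO_WIDTH_SPACE = "\u200b"


def has_zero_width_space(name):
    """Return True if the given file or directory name contains U+200B."""
    return ZERO_WIDTH_SPACE in name


def names_with_zero_width_space(top):
    """Yield (directory, name) for each entry under top whose name contains U+200B."""
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            if has_zero_width_space(name):
                yield root, name


def show(root, name):
    """Return an ASCII-only rendering of the full path, with '\\u200b' spelled out."""
    return ascii(os.path.join(root, name))
```

Calling print(show(root, name)) for each pair yielded by
names_with_zero_width_space("C:\\Users\\Filter\\") would list the affected
files with the invisible character made visible as an escape.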


Am I doing something wrong, or did I find a bug? It's worth noting that
Python 2.6 just displays this character as a ?, just as it appears if you
type dir at the Windows command prompt.
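
One thing the traceback suggests: the failure is raised in
encodings\cp1252.py during f.write(), so os.walk has already returned the
name and it is the output file's encoding that cannot represent '\u200b'.
The sketch below is a possible workaround under that reading, not a
confirmed diagnosis. It assumes a UTF-8 output file is acceptable; the
function name write_listing and its parameters are hypothetical.

```python
import os


def write_listing(top, out_path):
    """Write the directory listing for top to out_path as UTF-8 text."""
    # Assumption: UTF-8 output is acceptable. UTF-8 can encode '\u200b';
    # cp1252 has no mapping for it.
    with open(out_path, "w", encoding="utf-8") as f:
        for root, dirs, files in os.walk(top):
            f.write("root:{0}\n".format(root))
            f.write("dirs:\n")
            for name in dirs:
                f.write("dir:{0}\n".format(name))
            f.write("files:\n")
            for name in files:
                f.write("file:{0}\n".format(name))


def cp1252_can_encode(text):
    """Return True if text can be encoded as cp1252, False otherwise."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True
```

cp1252_can_encode("\u200b") reproduces the encoding failure without
touching the file system. If the file has to stay in cp1252, passing
errors="backslashreplace" (or errors="replace") to open() instead of a
different encoding is another option; the zero width space is then written
as an escape or a ? rather than raising.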