WinXP, Python3.1.2, dir-listing to XML - problem with unicode file names
Mark Tolonen
metolone+gmane at gmail.com
Sat Apr 3 17:24:19 EDT 2010
"kai_nerda" <ewomy at yahoo.com> wrote in message
news:hp69ri+ao5e at eGroups.com...
> Hi,
>
> OS = Windows XP (German language)
> Python = 3.1.2
>
> I need to write a directory listing into an XML file.
> And after hours of trying and searching I have no clue.
>
> My main problem is that the file and folder names can
> have characters of different languages like
> German, Turkish, Russian, and possibly others.
>
> Because Python 3.1 is better with unicode, I
> decided to use that instead of 2.6
>
> For testing I have created the following files:
> http://img340.imageshack.us/img340/3461/files.png
> (google for the words
> russia, turkish, deutsch, france
> to find websites with special characters and copy & paste)
>
> And this is the code I have now:
> ############################################
> # -*- coding: iso-8859-1 -*-
> # inspired by:
> # http://www.dpawson.co.uk/java/dirlist.py
> # (for Python ~2.4)
>
> import sys
> print ('filesystemencoding: ' + sys.getfilesystemencoding())
> print ('defaultencoding: ' + sys.getdefaultencoding())
>
>
> from pprint import pprint
> import os.path
> from stat import *
> from xml.sax.saxutils import XMLGenerator
>
> def recurse_dir(path, writer):
>     for cdir, subdirs, files in os.walk(path):
>         pprint (cdir)
>         writer.startElement('dir', { 'name': cdir })
>         for f in files:
>             uf = f.encode('utf-8')
>             pprint (uf)
>             attribs = {'name': f}
>             attribs['size'] = str(os.stat(os.path.join(cdir,f))[ST_SIZE])
>             pprint (attribs)
>             writer.startElement('file', attribs)
>             writer.endElement('file')
>         for subdir in subdirs:
>             recurse_dir(os.path.join(cdir, subdir), writer)
>         writer.endElement('directory')
The closing name has to match the element opened by startElement('dir', ...)
above, so this should be:
    writer.endElement('dir')
>         break
>
> if __name__ == '__main__':
>     directory = 'c:\\_TEST\\'
>     out = open('C:\\_TEST.xml','w')
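The above line opens the file in the platform's default text encoding (the
value locale.getpreferredencoding() reports, most likely cp1252 on a German
Windows XP system rather than cp850, which is the console code page), not
in UTF-8. Names with characters outside that code page, such as the Russian
and Turkish ones, cannot be written to it, and the file would not match the
'utf-8' that XMLGenerator declares. Try: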
out = open('C:\\_TEST.xml','w',encoding='utf8')
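
Putting both fixes together, a minimal sketch might look like the following.
It is not a drop-in replacement for your script: the function name
write_listing and the wrapper element 'listing' are made up for
illustration, and it writes a flat list of <dir> elements straight from
os.walk() instead of nesting them by recursion.

```python
import os
from xml.sax.saxutils import XMLGenerator


def write_listing(root, xml_path):
    """Write a flat directory listing of `root` to `xml_path` as UTF-8 XML."""
    # Passing encoding explicitly makes the bytes on disk match the
    # encoding that XMLGenerator puts in the XML declaration.
    with open(xml_path, 'w', encoding='utf-8') as out:
        writer = XMLGenerator(out, 'utf-8')
        writer.startDocument()
        # 'listing' is a hypothetical wrapper element (not in the original
        # script) so that the document has exactly one root element.
        writer.startElement('listing', {})
        for cdir, subdirs, files in os.walk(root):
            writer.startElement('dir', {'name': cdir})
            for f in sorted(files):
                size = os.path.getsize(os.path.join(cdir, f))
                writer.startElement('file', {'name': f, 'size': str(size)})
                writer.endElement('file')
            # Close with the same element name that was opened.
            writer.endElement('dir')
        writer.endElement('listing')
        writer.endDocument()
```

Called as write_listing('c:\\_TEST\\', 'C:\\_TEST.xml'), this should produce a
file that an XML parser reads back with the original names intact. The
f.encode('utf-8') step from your script is not needed, because the file
object does the encoding when the text is written.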
Regards,
-Mark
>     writer = XMLGenerator(out, 'utf-8')
>     writer.startDocument()
>     recurse_dir(directory, writer)
>
>     out.close()
> ############################################