WinXP, Python3.1.2, dir-listing to XML - problem with unicode file names

Mark Tolonen metolone+gmane at gmail.com
Sat Apr 3 17:24:19 EDT 2010


"kai_nerda" <ewomy at yahoo.com> wrote in message 
news:hp69ri+ao5e at eGroups.com...
> Hi,
>
> OS = Windows XP (German language)
> Python = 3.1.2
>
> I need to write a directory listing into a XML file.
> And after hours of trying and searching i have no clue.
>
> My main problem is that the file and folder names can
> have characters of different languages like
> German, Turkish, Russian, maybe else.
>
> Because Python 3.1 is better with unicode, I
> decided to use that instead of 2.6
>
> For testing I have created the following files:
> http://img340.imageshack.us/img340/3461/files.png
> (google for the words
> russia, turkish, deutsch, france
> to find websites with special characters and copy & paste)
>
> And this is the code I have now:
> ############################################
> # -*- coding: iso-8859-1 -*-
> # inspired by:
> # http://www.dpawson.co.uk/java/dirlist.py
> # (for Python ~2.4)
>
> import sys
> print ('filesystemencoding: ' + sys.getfilesystemencoding())
> print ('defaultencoding:    ' + sys.getdefaultencoding())
>
>
> from pprint import pprint
> import os.path
> from stat import *
> from xml.sax.saxutils import XMLGenerator
>
> def recurse_dir(path, writer):
>     for cdir, subdirs, files in os.walk(path):
>         pprint (cdir)
>         writer.startElement('dir', { 'name': cdir })
>         for f in files:
>             uf = f.encode('utf-8')
>             pprint (uf)
>             attribs = {'name': f}
>             attribs['size'] = str(os.stat(os.path.join(cdir,f))[ST_SIZE])
>             pprint (attribs)
>             writer.startElement('file', attribs)
>             writer.endElement('file')
>         for subdir in subdirs:
>             recurse_dir(os.path.join(cdir, subdir), writer)
>         writer.endElement('directory')

The element was opened as 'dir', so the closing name has to match.  This
should be:

           writer.endElement('dir')
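
XMLGenerator writes whatever name you hand to endElement() and does not
compare it with the element that is currently open, so the mismatch only
shows up when something tries to parse the result.  A small sketch of that
behaviour (render() and is_well_formed() are names I made up for the
illustration, they are not in your script):

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator


def render(close_name):
    """Open a <dir> element, close it with close_name, return the XML text."""
    buf = io.StringIO()
    writer = XMLGenerator(buf, 'utf-8')
    writer.startElement('dir', {'name': 'c:\\_TEST\\'})
    # The name is written as given; it is not checked against 'dir'.
    writer.endElement(close_name)
    return buf.getvalue()


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == '__main__':
    print(is_well_formed(render('dir')))
    print(is_well_formed(render('directory')))
```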

>         break
>
> if __name__ == '__main__':
>     directory = 'c:\\_TEST\\'
>     out = open('C:\\_TEST.xml','w')

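The above line opens the file in text mode with the platform's default
encoding (locale.getpreferredencoding(), which on Windows is the ANSI code
page, typically cp1252 on a German system).  That code page cannot represent
the Russian or Turkish names, and it does not match the 'utf-8' you pass to
XMLGenerator.  Try: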

    out = open('C:\\_TEST.xml','w',encoding='utf8')
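
For reference, here is a minimal sketch of how the whole thing could look
with both fixes applied.  It is not a drop-in replacement for your script:
write_listing() and the <listing> root element are my own inventions, and I
let os.walk() do the recursion and emit one flat <dir> element per
directory instead of nesting them.

```python
import os
from xml.sax.saxutils import XMLGenerator


def write_listing(root, xml_path):
    """Write every directory under root, with its files, to xml_path."""
    # Explicit encoding: the bytes on disk are UTF-8 regardless of the
    # system code page, matching the encoding declared in the XML header.
    with open(xml_path, 'w', encoding='utf-8') as out:
        writer = XMLGenerator(out, 'utf-8')
        writer.startDocument()
        writer.startElement('listing', {})  # hypothetical root element
        for cdir, subdirs, files in os.walk(root):
            writer.startElement('dir', {'name': cdir})
            for f in files:
                size = os.path.getsize(os.path.join(cdir, f))
                writer.startElement('file', {'name': f, 'size': str(size)})
                writer.endElement('file')
            writer.endElement('dir')  # same name as in startElement
        writer.endElement('listing')
        writer.endDocument()


if __name__ == '__main__':
    write_listing('c:\\_TEST\\', 'C:\\_TEST.xml')
```

Since the names stay str objects all the way through, there is no need for
f.encode('utf-8') anywhere; the file object does the encoding on write.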

Regards,
-Mark

>     writer = XMLGenerator(out, 'utf-8')
>     writer.startDocument()
>     recurse_dir(directory, writer)
>
>     out.close()
> ############################################




