unicode + xml

Laurent Luce laurentluce49 at yahoo.com
Mon Sep 7 20:55:01 EDT 2009


Hello,

I am trying to do the following:

- Read the list of folders in a specific directory with os.listdir(). Some folder names contain Japanese characters.
- Post the list of folders as XML to a web server. I use the content type 'text/xml' and start the XML data with '<?xml version="1.0" encoding="utf-8"?>'.
- On the server side (Django), I get the data from post_data and parse it with minidom.parseString(). I get an exception because of the following in the XML for one of the folder names:
'/ufffdX/ufffd^/ufffd[/ufffdg /ufffd/ufffd/ufffdj/ufffd/ufffd/ufffd['

The weird thing is that I see 5 bytes for each Unicode character, for example: /ufffdX
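
One possible reading of that pattern, offered as an assumption rather than a confirmed diagnosis: the /ufffd sequences look like U+FFFD, the Unicode replacement character, which a decoder emits for bytes it cannot interpret. The sketch below assumes the folder name was stored in a Japanese legacy encoding (cp932 is a guess) and that its bytes were decoded as UTF-8 somewhere along the way. The sample name is hypothetical, chosen only because it reproduces the string quoted above.

```python
# Hypothetical reconstruction: both the folder name and the cp932 encoding
# are assumptions, picked because they reproduce the pattern in the post.
original = "\u30b9\u30bf\u30fc\u30c8 \u30e1\u30cb\u30e5\u30fc"  # katakana for "Start Menu"

# Bytes as a Japanese Windows system might hand them over.
raw = original.encode("cp932")

# Decoding those bytes as UTF-8 fails on every lead byte; with
# errors="replace" each bad byte becomes U+FFFD and the ASCII-range
# trail bytes survive as ordinary characters such as 'X' or '^'.
garbled = raw.decode("utf-8", errors="replace")
print(ascii(garbled))
# '\ufffdX\ufffd^\ufffd[\ufffdg \ufffd\ufffd\ufffdj\ufffd\ufffd\ufffd['

# Decoding with the codec the bytes were written in recovers the name.
recovered = raw.decode("cp932")
```

If this is what happened, the original characters are already lost by the time U+FFFD appears, so the fix belongs at the point where the names are first decoded, not in the XML parser.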

Should I format the data differently inside the XML so that minidom is happy?
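
For reference, a minimal sketch of the round trip described above, written for Python 3. The helper names (list_folders, folders_to_xml, xml_to_folders) and the <folders>/<folder> element names are hypothetical, since the post does not show its code or XML layout. It assumes the folder names are already Unicode strings before the XML is built, which os.listdir() provides when given a str path (on Python 2, a unicode path).

```python
import os
from xml.dom import minidom


def list_folders(path):
    """Return the sub-folder names of path as Unicode strings."""
    return [name for name in os.listdir(path)
            if os.path.isdir(os.path.join(path, name))]


def folders_to_xml(names):
    """Build the XML payload and serialize it as UTF-8 bytes."""
    doc = minidom.Document()
    root = doc.createElement("folders")  # hypothetical element name
    doc.appendChild(root)
    for name in names:
        node = doc.createElement("folder")  # hypothetical element name
        node.appendChild(doc.createTextNode(name))
        root.appendChild(node)
    # toxml() writes the XML declaration with the requested encoding and
    # encodes the text, so the declaration and the bytes always agree.
    return doc.toxml(encoding="utf-8")


def xml_to_folders(payload):
    """Parse the UTF-8 payload on the server side with minidom."""
    doc = minidom.parseString(payload)
    return [node.firstChild.data
            for node in doc.getElementsByTagName("folder")]
```

The point of the sketch is that the encoding step happens exactly once, from Unicode text to UTF-8 bytes, so the bytes posted to the server match the encoding named in the XML declaration.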

Laurent



