minidom utf-8 encoding

fscked fsckedagain at gmail.com
Thu Jan 4 12:14:29 EST 2007


Martin v. Löwis wrote:
<...snip...>

> I find that hard to believe. There is no code in Python that does
> removal of characters, and I can't see any other reason why it gets
> removed.
>
> OTOH, what I do get when writing to a file is a UnicodeError, when
> it tries to convert the Unicode string that toxml gives to a byte
> string.
>
> So I recommend you pass encoding="utf-8" to the toprettyxml invocation
> also.
>
> Regards,
> Martin
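Martin's point can be checked with a small sketch. It assumes Python 3, which postdates this thread, and the "box"/"city" names and the "Zürich" value are made up for illustration. Called with no arguments, toprettyxml() returns a text string, and writing that to a file forces some later, implicit encoding step. With encoding="utf-8", it returns UTF-8 bytes whose XML declaration names the encoding, and those bytes can go straight into a file opened in binary mode.

```python
from xml.dom.minidom import Document

# Build a tiny document with one non-ASCII attribute value
# (element/attribute names here are illustrative only).
doc = Document()
box = doc.createElement("box")
box.setAttribute("city", "Zürich")
doc.appendChild(box)

# No encoding argument: toprettyxml() returns a text (str) object.
text = doc.toprettyxml()

# encoding="utf-8": toprettyxml() returns bytes that are already UTF-8
# encoded, and the XML declaration says encoding="utf-8".
data = doc.toprettyxml(encoding="utf-8")

# Write the bytes through a binary-mode file, so no implicit
# text-to-bytes conversion happens (and none can fail).
with open("test.xml", "wb") as out:
    out.write(data)
```

The binary-mode file is the key choice here. Once the bytes are produced by toprettyxml itself, nothing downstream has to guess an encoding.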

OK, now I am really confused. After spending hours trying every variation of opening, writing, and encoding, plus all the other voodoo I could find on the web, I decided to put the script back the way it was when it did everything right except keep the Unicode characters.

And now it just works...

I hate it when that happens. In case you are wondering, here is the code
that caused me all this (seemingly odd) pain:

import csv
from xml.dom.minidom import Document

out = open("test.xml", "w")

# Create the minidom document
doc = Document()

# Create the <boxes> base element and attach it to the document once;
# a document can have only one root element
boxes = doc.createElement("boxes")
doc.appendChild(boxes)

myfile = open('ClientsXMLUpdate.txt')
csvreader = csv.reader(myfile)
# Each CSV row becomes one <box> element. Use `row` directly: a second
# csv.reader on the same file would pull the next line and skip every other row
for row in csvreader:
    mainbox = doc.createElement("box")
    mainbox.setAttribute("city", row[10])
    mainbox.setAttribute("country", row[9])
    mainbox.setAttribute("phone", row[8])
    mainbox.setAttribute("address", row[7])
    mainbox.setAttribute("name", row[6])
    mainbox.setAttribute("pl_heartbeat", row[5])
    mainbox.setAttribute("sw_ver", row[4])
    mainbox.setAttribute("hw_ver", row[3])
    mainbox.setAttribute("date_activated", row[2])
    mainbox.setAttribute("mac_address", row[1])
    mainbox.setAttribute("boxid", row[0])
    boxes.appendChild(mainbox)
myfile.close()


# Write our newly created XML
out.write(doc.toprettyxml())
out.close()


And it just works...
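For anyone doing the same job on Python 3, where csv yields Unicode str rows, here is a minimal sketch. It assumes the input file is UTF-8 encoded, has no header row, and has at least 11 columns in the order the script above uses. The file names come from the post. The helper names build_boxes and convert are hypothetical. The sketch iterates a single csv.reader, attaches <boxes> to the document exactly once, and writes UTF-8 bytes produced by toprettyxml(encoding="utf-8").

```python
import csv
from xml.dom.minidom import Document

# (attribute name, CSV column) pairs, in the order the original script sets them.
FIELDS = [
    ("city", 10), ("country", 9), ("phone", 8), ("address", 7),
    ("name", 6), ("pl_heartbeat", 5), ("sw_ver", 4), ("hw_ver", 3),
    ("date_activated", 2), ("mac_address", 1), ("boxid", 0),
]


def build_boxes(rows):
    """Hypothetical helper: one <box> per CSV row under a single <boxes> root."""
    doc = Document()
    boxes = doc.createElement("boxes")
    doc.appendChild(boxes)  # attach the root element exactly once
    for row in rows:
        box = doc.createElement("box")
        for attr, col in FIELDS:
            box.setAttribute(attr, row[col])
        boxes.appendChild(box)
    return doc


def convert(csv_path="ClientsXMLUpdate.txt", xml_path="test.xml"):
    """Hypothetical helper: read a UTF-8 CSV file, write UTF-8 XML."""
    # Assumption: the CSV is UTF-8; newline="" is what the csv module expects.
    with open(csv_path, newline="", encoding="utf-8") as src:
        doc = build_boxes(csv.reader(src))
    # toprettyxml(encoding="utf-8") returns bytes, so write in binary mode.
    with open(xml_path, "wb") as out:
        out.write(doc.toprettyxml(encoding="utf-8"))
```

Calling convert() with no arguments reads ClientsXMLUpdate.txt and writes test.xml. Keeping build_boxes separate from file handling makes the XML construction easy to try on in-memory rows.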