Problem with minidom and special chars in HTML

Horst Gutmann zerok at zerokspot.com
Tue Feb 22 11:20:42 EST 2005


Hi :-)
I currently have quite a big problem with minidom and special characters
(for example ü) in HTML.

Let's say I have the following input file:
--------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
             "http://www.w3.org/TR/html4/strict.dtd">
<html>
<body>
ü
</body>
</html>
--------------------------------------------------

And the following Python script:
--------------------------------------------------
from xml.dom import minidom

if __name__ == '__main__':
    # Parse the input file into a DOM tree.
    doc = minidom.parse('test2.html')
    # Serialize the tree and write it to the output file.
    f = open('test3.html', 'w+')
    f.write(doc.toxml())
    f.close()
--------------------------------------------------

test3.html only has a blank line where the ü should be. It is simply
removed.
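
For comparison, here is a minimal sketch of the same parse-and-serialize round trip done in memory. It rests on several assumptions that are not in the report above: a current Python 3, the ü stored as a literal character in a UTF-8 encoded file, and an explicit encoding added to the XML declaration. The DOCTYPE is left out to keep the sketch short, and the names SOURCE and roundtrip are made up for illustration. It does not cover the case where the character is written as an entity reference.

```python
from xml.dom import minidom

# Assumption: the document is UTF-8 and says so in its XML declaration.
# The DOCTYPE from the original input is omitted for brevity.
SOURCE = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<html>\n'
    '<body>\n'
    '\u00fc\n'  # the literal character ü
    '</body>\n'
    '</html>\n'
).encode('utf-8')


def roundtrip(data):
    """Parse XML bytes with minidom and serialize them back as UTF-8 bytes."""
    doc = minidom.parseString(data)
    try:
        # Passing an encoding makes toxml() return bytes instead of str.
        return doc.toxml(encoding='utf-8')
    finally:
        doc.unlink()  # release the DOM tree


result = roundtrip(SOURCE)
```

Because toxml() returns bytes when an encoding is given, the result would be written to a file opened in binary mode ('wb') rather than text mode.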

Any idea how I could solve this problem?

Best regards, Horst


