Problem with minidom and special chars in HTML

Horst Gutmann zerok at zerokspot.com
Tue Feb 22 13:36:38 EST 2005


Fredrik Lundh wrote:
> umm.  doesn't that doctype point to an SGML DTD?  even if minidom did fetch
> external DTD's (I don't think it does), it would probably choke on that DTD.
> 
> running your documents through "tidy -asxml -numeric" before parsing them as
> XML might be a good idea...
> 
>     http://tidy.sourceforge.net/ (command-line binaries, library)
>     http://utidylib.berlios.de/ (python bindings)
> 
> </F> 

Thanks, but the problem is that I can't use the numeric representations 
of these special characters. I will probably just run a find-and-replace on 
the entities before feeding the document into minidom and change the output 
back afterwards :-)
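
A rough sketch of that find-and-replace idea, in case it helps. The helper 
names (protect_entities, restore_entities) and the [[ENT:name]] placeholder 
format are made up for illustration and are not part of minidom. It assumes 
the placeholder text never occurs in the real document, and it leaves the five 
entities that XML itself predefines alone, since the parser handles those.

```python
import re
from xml.dom import minidom

# Entities every XML parser understands without a DTD; leave these alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entity references such as &uuml; or &nbsp;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
# Matches the (hypothetical) placeholder format used below
PLACEHOLDER_RE = re.compile(r"\[\[ENT:([A-Za-z][A-Za-z0-9]*)\]\]")


def protect_entities(text):
    """Swap named entities for plain-text placeholders before parsing."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        return "[[ENT:%s]]" % name
    return ENTITY_RE.sub(repl, text)


def restore_entities(text):
    """Turn the placeholders back into named entities after serializing."""
    return PLACEHOLDER_RE.sub(r"&\1;", text)


source = "<p>Gr&uuml;&szlig;e &amp; more</p>"
doc = minidom.parseString(protect_entities(source))
# ... work on the DOM here ...
result = restore_entities(doc.documentElement.toxml())
```

The placeholders are ordinary text as far as minidom is concerned, so they 
pass through parsing and serialization untouched.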

Best regards, Horst


