Problem with minidom and special chars in HTML

Horst Gutmann zerok at zerokspot.com
Wed Feb 23 05:01:15 EST 2005


Jarek Zgoda wrote:
> Horst Gutmann wrote:
> 
>> I currently have quite a big problem with minidom and special chars
>> (for example ü) in HTML.
>>
>> Let's say I have the following input file:
>> --------------------------------------------------
>> <?xml version="1.0"?>
>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
>>             "http://www.w3.org/TR/html4/strict.dtd">
> 
> 
> HTML4 is not an XML application. Even if minidom fetched this DTD and
> were able to parse the character entities, it might still not be able
> to parse the document.
> 
>> Any idea how I could solve this problem?
> 
> 
> Either don't use minidom, or convert the HTML4 to XHTML and change the
> doctype declaration.
> 
That was just a bad example :-) I get the same problem with an XHTML
doctype. The odd thing here, IMO, is that the special chars are simply
removed. No warning, nothing :-(
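
One possible workaround, sketched here under the assumption that the
dropped characters are written in the input as named entity references
such as &uuml;, is to rewrite those references as numeric character
references before minidom sees the text. Numeric references need no DTD,
so the parser can resolve them on its own. The helper name
named_to_numeric and the sample markup are made up for illustration, and
the html.entities module is the Python 3 spelling (older releases call it
htmlentitydefs).

```python
import re
from html.entities import name2codepoint
from xml.dom import minidom

# The five entities every XML parser already knows; leave them alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(markup):
    """Replace HTML named entities (e.g. &uuml;) with numeric references.

    Hypothetical helper for illustration; unknown names are left untouched.
    """
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, markup)


# Sample input (an assumption, not the original file from this thread).
source = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html><body><p>f&uuml;r</p></body></html>"""

doc = minidom.parseString(named_to_numeric(source))
text = doc.getElementsByTagName("p")[0].firstChild.data
```

With this preprocessing step the text node of the <p> element should
contain the ü itself. A plain regex pass like this also rewrites entity
names inside CDATA sections and comments, so it is only suitable for
simple documents.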

Kind regards, Horst


