Problem with minidom and special chars in HTML
Horst Gutmann
zerok at zerokspot.com
Wed Feb 23 05:01:15 EST 2005
Jarek Zgoda wrote:
> Horst Gutmann wrote:
>
>> I currently have quite a big problem with minidom and special chars
>> (for example ü) in HTML.
>>
>> Let's say I have the following input file:
>> --------------------------------------------------
>> <?xml version="1.0"?>
>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
>> "http://www.w3.org/TR/html4/strict.dtd">
>
>
> HTML4 is not an XML application. Even if minidom fetched this DTD and
> were able to parse the character entities, it might still be unable to
> parse the document.
>
>> Any idea how I could solve this problem?
>
>
> Either don't use minidom, or convert the HTML4 to XHTML and change the
> doctype declaration.
>
This was just a bad example :-) I get the same problem with an XHTML
doctype. The funny thing here, IMO, is that the special chars are simply
removed. No warning, no nothing :-(
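
A minimal sketch of how this can be reproduced with Python 3's standard
library. The sample markup (`<p>f&uuml;r</p>`) and the helper name
`paragraph_text` are made up for illustration and are not taken from the
input file quoted above. The sketch assumes minidom's default behaviour of
not fetching the external DTD, so the parser has no definition for `&uuml;`.
The last part shows one possible workaround, a numeric character reference,
which needs no DTD at all.

```python
from xml.dom import minidom

DOCTYPE = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    ' "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
)

# Hypothetical sample documents, identical except for how the "ü" is written.
WITH_NAMED_ENTITY = DOCTYPE + (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><p>f&uuml;r</p></body></html>'
)
WITH_NUMERIC_REFERENCE = DOCTYPE + (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><p>f&#252;r</p></body></html>'
)


def paragraph_text(source):
    """Return the concatenated text of the first <p> element."""
    doc = minidom.parseString(source)
    paragraph = doc.getElementsByTagName("p")[0]
    return "".join(
        node.data
        for node in paragraph.childNodes
        if node.nodeType == node.TEXT_NODE
    )


# The named entity is not defined without the external DTD. Parsing does
# not fail, but the "ü" is missing from the resulting text.
print(repr(paragraph_text(WITH_NAMED_ENTITY)))

# The numeric character reference is resolved by the parser itself.
print(repr(paragraph_text(WITH_NUMERIC_REFERENCE)))
```

Writing the character literally in a correctly declared encoding should
work in the same way as the numeric reference.
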
Kind regards, Horst