[XML-SIG] sgmlop and html parsing

Walter Dörwald walter at livinglogic.de
Wed Jan 14 14:32:56 EST 2004


Martin v. Löwis wrote:

> Walter Dörwald wrote:
> 
>> Wouldn't it make sense to implement an SGMLParser that supports
>> unicode?
> 
> No. In SGML, the SGML declaration defines the document encoding, e.g.
> [...]
> So to understand a character reference, you have to know the SGML
> declaration. It is Unicode only if the declaration says
> [...]

It would at least help with parsing HTML. With the encoding
attribute set to None, the parser would return 8-bit strings,
and decoding them would be the application's job.
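
As a rough sketch of the idea, written for current Python and using the
standard library's html.parser instead of sgmlop: the class name
TextCollector, the feed_bytes method and the encoding attribute are all
made up for illustration and are not an existing API. It assumes an
ASCII-compatible encoding and that the whole document is fed in one call.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical parser with an `encoding` attribute (not sgmlop's API)."""

    def __init__(self, encoding=None):
        # Leave character references alone; resolving them would require
        # knowing the document character set, which is the point at issue.
        super().__init__(convert_charrefs=False)
        self.encoding = encoding
        self.chunks = []

    def feed_bytes(self, data):
        # latin-1 maps every byte to exactly one code point, so the
        # original bytes can be recovered losslessly in handle_data.
        self.feed(data.decode("latin-1"))

    def handle_data(self, data):
        raw = data.encode("latin-1")  # back to the original bytes
        if self.encoding is None:
            # No encoding set: hand 8-bit strings to the application.
            self.chunks.append(raw)
        else:
            # Encoding set: the parser decodes on the application's behalf.
            self.chunks.append(raw.decode(self.encoding))


def collect_text(data, encoding=None):
    parser = TextCollector(encoding)
    parser.feed_bytes(data)
    parser.close()
    return parser.chunks
```

With encoding=None the application gets bytes and decodes them itself,
e.g. collect_text(doc)[0].decode("utf-8"); with encoding="utf-8" it
gets unicode strings directly.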

Bye,
    Walter Dörwald



