[XML-SIG] sgmlop and html parsing
Alexandre Fayolle
Alexandre.Fayolle at logilab.fr
Wed Jan 14 09:36:52 EST 2004
On Wed, Jan 14, 2004 at 09:26:17AM -0500, Thomas B. Passin wrote:
> Alexandre Fayolle wrote:
> >>
> >>This should happen only if self->unicode is false. This is XML parsing,
> >>right? If so, you should enable self->unicode, and it will give you
> >>a unicode character (in handle_data).
> >
> >
> >This is netscape bookmark parsing, so this is not well formed XML (lots
> >of tags are not closed).
> >
> >demo/xbel/ns_parse.py calls sax2exts.SGMLParserFactory.make_parser(), so
> >I expect it to return an SGML parser, and not an XML reader.
>
> I took a different approach. To parse Netscape bookmark files, I just
> take the default parser and handle the encoding with a few patches in
> the downstream code. (I have found that setting the encoding to utf-8
> works reliably in Mozilla-derived browsers on Windows 2000.)
Would you mind committing your changes to CVS so that they can ship
in pyxml 0.8.4? Your patches are likely to be better than mine, since you
seem to be using the tools on a daily basis.
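
To make the discussion concrete, here is a minimal sketch of tolerant
bookmark parsing with an explicitly chosen encoding. It is not the code
from demo/xbel/ns_parse.py, and it does not use sgmlop or sax2exts: it
assumes a current Python and the standard library's html.parser, which
works on text, so the bytes are decoded before being fed to the parser
rather than downstream. The names BookmarkParser and parse_bookmarks are
hypothetical, and utf-8 is only the default suggested above.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (title, url) pairs from Netscape-style bookmark markup.

    Hypothetical helper for illustration; not part of PyXML.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.bookmarks = []   # finished (title, url) pairs
        self._href = None     # URL of the <A> element currently open, if any
        self._text = []       # text chunks seen inside that <A>

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased. Unclosed <DT> and <P>
        # tags are simply ignored; only <A> matters here.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Data is already a unicode string at this point.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.bookmarks.append((title, self._href))
            self._href = None


def parse_bookmarks(raw_bytes, encoding="utf-8"):
    """Decode the file with the given encoding, then parse it leniently."""
    parser = BookmarkParser()
    # errors="replace" is an assumption: keep going on stray bytes
    # instead of aborting the whole import.
    parser.feed(raw_bytes.decode(encoding, errors="replace"))
    parser.close()
    return parser.bookmarks
```

Calling parse_bookmarks() on the raw bytes of a bookmarks file returns a
list of (title, url) tuples; passing a different encoding argument is
the only change needed for files that are not utf-8.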
--
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Advanced software development - Artificial Intelligence - Training