[Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike

Steve Holden sholden at holdenweb.com
Tue Dec 2 07:39:08 EST 2003


> -----Original Message-----
> From: web-sig-bounces+sholden=holdenweb.com at python.org
> [mailto:web-sig-bounces+sholden=holdenweb.com at python.org]On Behalf Of
> Casey Duncan
> Sent: Tuesday, December 02, 2003 12:11 AM
> To: Casey Duncan; John J Lee
> Cc: web-sig at python.org
> Subject: Re: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike
>
>
> > On Mon, 1 Dec 2003 20:55:47 +0000 (GMT)
> > John J Lee <jjl at pobox.com> wrote:
> [snip]
> > > Problems:
> > >
> > > 1. no volunteer to write a plain-old-C-API wrapper of tidylib
> >
> > I'll look into this, but I'll hold off volunteering until I see how big
> > the API is. I suspect not very.
>
> After looking at it I'd say it's certainly a non-trivial task to wrap (by
> hand), depending on what the real needs are. Do we simply want a 1-to-1
> (perhaps swigged) wrapper, do we want something pythonic, or what? The
> latter is obviously more involved and would need much more discussion and
> vetting, especially given its DOM-ish aspirations.
>
> Perhaps the most reasonable approach would be to generate a simple low-level
> wrapper first and then gradually develop a high-level interface to it,
> mostly written in Python. That might also better insulate us from future
> changes to the tidy API.
>
I think we also want to consider seriously whether tidy is what we need.
Does it really provide a necessary function? And, even if it does, how
valuable would that function be? I wasn't impressed with tidy in either
of the two attempts I made to use it.

Then, of course, there's the question of prior art:

	http://www.lemburg.com/files/python/mxTidy.html

might be worth looking at before you go too much further ...

regards
--
Steve Holden          +1 703 278 8281        http://www.holdenweb.com/
Improve the Internet           http://vancouver-webpages.com/CacheNow/
Python Web Programming                http://pydish.holdenweb.com/pwp/
Interview with GvR August 14, 2003       http://www.onlamp.com/python/