[Chicago] web page content scraper

Adrian Holovaty web at holovaty.com
Wed Apr 9 18:27:45 CEST 2008


On Tue, Apr 8, 2008 at 9:25 AM, Tom Printy <tprinty at mail.edisonave.net> wrote:
> Wow, this library is super cool. Anyone got slides or notes from the
> talk?

Hey, that's my library and was my talk. Note that the current version
of templatemaker (on Google Code) is pretty "dumb" when dealing with
HTML.

Since that talk, I've developed a new version, based on lxml, that
analyzes differences in the HTML trees. It's a *lot* better (I'd even
call it *awesome*), but I haven't released it open-source yet. Stay
tuned.
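
For a rough idea of what "analyzing differences in the HTML trees" could
look like, here is a minimal sketch. It is not the unreleased code, which
isn't public. It assumes well-formed markup and uses the standard
library's xml.etree.ElementTree in place of lxml so it runs without extra
installs. The function name diff_trees and the two sample pages are made
up for illustration.

```python
import xml.etree.ElementTree as ET


def diff_trees(a, b, path="", index=0):
    """Yield (path, value_a, value_b) wherever two parsed pages differ.

    Hypothetical helper for illustration only. It compares element text
    and ignores attributes and tail text.
    """
    here = f"{path}/{a.tag}[{index}]"
    if a.tag != b.tag or len(a) != len(b):
        # The structure diverges here, so report the whole subtree as one hole.
        yield (here,
               ET.tostring(a, encoding="unicode"),
               ET.tostring(b, encoding="unicode"))
        return
    text_a = (a.text or "").strip()
    text_b = (b.text or "").strip()
    if text_a != text_b:
        # Same position in both trees, different text: a candidate "hole".
        yield here, text_a, text_b
    # Walk the children pairwise, tracking each child's position.
    for i, (child_a, child_b) in enumerate(zip(a, b)):
        yield from diff_trees(child_a, child_b, here, i)


# Two made-up pages that share a layout but differ in one paragraph.
page1 = ET.fromstring("<html><body><h1>News</h1><p>Rain today</p></body></html>")
page2 = ET.fromstring("<html><body><h1>News</h1><p>Sun tomorrow</p></body></html>")

holes = list(diff_trees(page1, page2))
print(holes)
```

Comparing nodes position by position, instead of comparing the raw HTML
strings, means markup that is identical across pages is never mistaken
for content. Only the paragraph text shows up as a difference here.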

Adrian