Instrumented web proxy

Paul Rubin
Thu Mar 27 17:24:42 EDT 2008


Andrew McLean <andrew-news at andros.org.uk> writes:
> I would like to write a web (http) proxy which I can instrument to
> automatically extract information from certain web sites as I browse
> them. Specifically, I would want to process URLs that match a
> particular regexp. For those URLs I would have code that parsed the
> content and logged some of it.
> 
> Think of it as web scraping under manual control.

I've used Proxy 3 for this. It's a very cool program with powerful
capabilities for on-the-fly HTML rewriting.

http://theory.stanford.edu/~amitp/proxy.html
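
For anyone who would rather stay with the Python standard library, here
is a minimal sketch of the idea in the question: a forwarding proxy that
inspects the body of any response whose URL matches a regexp. It does not
show how Proxy 3 works. The names URL_PATTERN, TITLE_RE, inspect,
InstrumentedProxy and run are hypothetical, and so are the example.com
pattern and the choice to log the page title. It assumes Python 3.7 or
later and handles only plain-HTTP GET requests (no HTTPS CONNECT, no
request-header passthrough, and upstream error statuses are reported as
502).

```python
import re
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical pattern: only URLs matching this are inspected.
URL_PATTERN = re.compile(r"^http://example\.com/items/\d+")
# Hypothetical extraction rule: pull out the page title.
TITLE_RE = re.compile(rb"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

captured = []  # (url, title) pairs collected while browsing


def inspect(url, body):
    """Record the page title if the URL matches; return True on a match."""
    if not URL_PATTERN.search(url):
        return False
    match = TITLE_RE.search(body)
    title = match.group(1).decode("utf-8", "replace").strip() if match else ""
    captured.append((url, title))
    return True


class InstrumentedProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A browser talking to an HTTP proxy sends the absolute URL in
        # the request line, so self.path holds the full target URL.
        # An empty ProxyHandler keeps urllib from using a system proxy
        # (which might be this very server).
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        try:
            with opener.open(self.path, timeout=30) as upstream:
                status = upstream.status
                ctype = upstream.headers.get(
                    "Content-Type", "application/octet-stream")
                body = upstream.read()
        except Exception as exc:
            self.send_error(502, "Upstream fetch failed", str(exc))
            return

        inspect(self.path, body)  # the instrumentation hook

        # Relay the response to the browser unchanged.
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8080):
    """Serve on localhost until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), InstrumentedProxy).serve_forever()
```

To try it, call run() and point the browser's HTTP proxy setting at
127.0.0.1:8080. Matching pages accumulate in captured as you browse.
Keeping the matching and parsing in inspect(), separate from the handler,
means the scraping logic can be tested without any network traffic.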


