python html

Mike Meyer mwm at mired.org
Thu Aug 18 23:01:29 EDT 2005


Steve Young <drevil_53711 at yahoo.com> writes:

> Hi, I am looking for something where I can go through
> an HTML page and easily change the URLs for all the
> links, images, hrefs, etc. If anyone knows
> of something, please let me know. Thanks.

I've been doing a lot of that today. But the tools I'm using are sh and
sed, because what I'm doing is captured nicely by regular expressions
on the URLs. You might consider that option.
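
The same regular-expression idea can be sketched in Python with the re
module instead of sed. This is only an illustration: OLD_PREFIX,
NEW_PREFIX, and rewrite_urls are made-up names, and it assumes every URL
you care about is a quoted href or src value starting with a known prefix.

```python
import re

# Hypothetical old and new prefixes, chosen only for illustration.
OLD_PREFIX = "http://old.example.com/"
NEW_PREFIX = "http://new.example.com/"

# Match href= or src=, an opening quote, then the old prefix.
URL_ATTR = re.compile(
    r'''(\b(?:href|src)\s*=\s*["'])''' + re.escape(OLD_PREFIX),
    re.IGNORECASE,
)

def rewrite_urls(html):
    """Swap OLD_PREFIX for NEW_PREFIX in quoted href and src values."""
    # Keep the attribute name and quote (group 1), replace only the prefix.
    return URL_ATTR.sub(lambda m: m.group(1) + NEW_PREFIX, html)

print(rewrite_urls('<a href="http://old.example.com/a.html">x</a>'))
```

Like the sed approach, this knows nothing about HTML structure, so it
will miss unquoted attribute values and URLs in other attributes.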

If you have well-formed HTML, you can use the HTMLParser module, and
write out the mangled data as it passes through your subclass of the
HTMLParser class.
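
Here is a rough sketch of that pass-through subclass, assuming Python 3,
where the module lives at html.parser. URLRewriter, rewrite, URL_ATTRS,
and the fix_url callback are illustrative names, not part of the library,
and only href and src are treated as URL attributes.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: only these attributes carry URLs worth rewriting.
URL_ATTRS = {"href", "src"}

class URLRewriter(HTMLParser):
    """Re-emit the document as it is parsed, passing URLs through fix_url."""

    def __init__(self, fix_url):
        # Keep entity and character references as separate events so they
        # can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.fix_url = fix_url
        self.out = []

    def _tag(self, tag, attrs, close=""):
        parts = [tag]
        for name, value in attrs:
            if value is None:            # attribute with no value
                parts.append(name)
                continue
            if name in URL_ATTRS:
                value = self.fix_url(value)
            # The parser unescapes attribute values, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (" ".join(parts), close))

    def handle_starttag(self, tag, attrs):
        self._tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)

def rewrite(html, fix_url):
    parser = URLRewriter(fix_url)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling rewrite(page, some_function) returns the page with every href and
src value replaced by whatever some_function returns for it. The output
is not byte-for-byte identical otherwise: the parser lowercases tag and
attribute names, and this sketch always writes double-quoted values.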

If the HTML isn't well-formed (which is probably true for most of the
stuff on the web), you need a more forgiving parser. I'd look into
using BeautifulSoup for this, though I've only used it to extract
information from web pages, not to modify them.
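
For what it's worth, modifying a page that way might look roughly like
the following. This is an untested-in-anger sketch that assumes the
current bs4 package with its built-in "html.parser" backend;
rewrite_with_soup and fix_url are made-up names, and only href and src
are handled.

```python
from bs4 import BeautifulSoup

def rewrite_with_soup(html, fix_url):
    """Parse possibly sloppy HTML and pass href/src values through fix_url."""
    soup = BeautifulSoup(html, "html.parser")
    for attr in ("href", "src"):
        # attrs={attr: True} selects every tag that has the attribute at all.
        for tag in soup.find_all(attrs={attr: True}):
            tag[attr] = fix_url(tag[attr])
    # Serializing the tree also closes any tags the input left open.
    return str(soup)
```

Note that the result is the parser's repaired version of the page, so
markup beyond the URLs may change as well.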

       <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


