Absolute TO Relative URLs

Jonathan Feinberg jdf at pobox.com
Tue May 8 23:51:41 EDT 2001


Just van Rossum <just at letterror.com> writes:

> Satheesh Babu wrote:
> 
> > Let us say I've an HTML document. I would like to write a small
> > Python script that reads this document, goes through the absolute
> > URLS (A HREF, IMG SRC etc) and replaces them with relative URLs. I
> > can pass a parameter which specifies the BASE HREF of the
> > document.
> 
> It's indeed pretty easy by deriving from SGMLParser.

Subclassing SGMLParser (or HTMLParser) I understand; using such a
technique to modify the input document on the fly, I do not.  I find
the documentation for the formatter classes to be rather obscure,
though I feel like the proper application of a formatter is probably
just the thing.  Can you shed a little light?  As a trivial example,
let's say I want to capitalize all of the attributes of my anchor
tags.

class Capper(htmllib.HTMLParser):
	def start_a(self, attrs):
        newattrs = [ (x[0].upper(), x[1]) for x in attrs ]
        # what do I do now?

-- 
Jonathan Feinberg   jdf at pobox.com   Sunny Brooklyn, NY
http://pobox.com/~jdf



More information about the Python-list mailing list