Absolute TO Relative URLs
Jonathan Feinberg
jdf at pobox.com
Tue May 8 23:51:41 EDT 2001
Just van Rossum <just at letterror.com> writes:
> Satheesh Babu wrote:
>
> > Let us say I've an HTML document. I would like to write a small
> > Python script that reads this document, goes through the absolute
> > URLS (A HREF, IMG SRC etc) and replaces them with relative URLs. I
> > can pass a parameter which specifies the BASE HREF of the
> > document.
>
> It's indeed pretty easy by deriving from SGMLParser.
Subclassing SGMLParser (or HTMLParser) I understand; using such a
technique to modify the input document on the fly, I do not. I find
the documentation for the formatter classes to be rather obscure,
though I feel like the proper application of a formatter is probably
just the thing. Can you shed a little light? As a trivial example,
let's say I want to capitalize all of the attributes of my anchor
tags.
class Capper(htmllib.HTMLParser):
def start_a(self, attrs):
newattrs = [ (x[0].upper(), x[1]) for x in attrs ]
# what do I do now?
--
Jonathan Feinberg jdf at pobox.com Sunny Brooklyn, NY
http://pobox.com/~jdf
More information about the Python-list
mailing list