Absolute TO Relative URLs

Sean 'Shaleh' Perry shaleh at valinux.com
Tue May 8 14:10:06 EDT 2001


On 08-May-2001 Satheesh Babu wrote:
> Hi,
> 
> Let us say I have an HTML document. I would like to write a small Python
> script that reads this document, goes through the absolute URLs (A HREF, IMG
> SRC, etc.) and replaces them with relative URLs. I can pass a parameter which
> specifies the BASE HREF of the document.
> 
> I'm not sure whether I should proceed with a regex nightmare or whether
> there is an easier solution?
> 
> Any help/pointers will be greatly appreciated.
> 

More likely a combination of the two approaches.  There is a wonderful *ML parser
framework derived from SGMLParser.  You would just implement hooks for the tags
you care about (a, img, etc.) and look at their href and src attributes.  Then
once you find a URL, use a regex or string compare against the base to make it
relative.
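
A rough sketch of that idea, not a finished tool.  It assumes a current Python,
where sgmllib no longer ships and html.parser.HTMLParser fills the same role of
calling a hook for each start tag.  The names URL_ATTRS, LinkCollector and
relativize are made up for illustration, and the final rewrite step assumes the
attribute values are double-quoted and contain no character entities.

```python
from html.parser import HTMLParser

# Which attribute carries the URL for each tag (illustrative, not complete).
URL_ATTRS = {"a": "href", "img": "src"}


class LinkCollector(HTMLParser):
    """Collect absolute URLs that live under `base`, with their relative forms."""

    def __init__(self, base):
        super().__init__()
        # Normalize so ".../docs" and ".../docs/" behave the same.
        self.base = base.rstrip("/") + "/"
        self.found = {}  # absolute URL -> relative URL

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            # Plain string compare: only URLs under the base are touched.
            if name == wanted and value and value.startswith(self.base):
                self.found[value] = value[len(self.base):] or "./"


def relativize(html, base):
    """Return `html` with URLs under `base` rewritten as relative URLs."""
    parser = LinkCollector(base)
    parser.feed(html)
    parser.close()
    # Assumption: attribute values are double-quoted in the source text.
    for absolute, relative in parser.found.items():
        html = html.replace('"%s"' % absolute, '"%s"' % relative)
    return html
```

For example, relativize('<a href="http://example.com/docs/page.html">p</a>',
"http://example.com/docs/") should give back '<a href="page.html">p</a>', while
links to other hosts are left alone.  Extending URL_ATTRS (link, script, and so
on) is where the "etc." comes in.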



