cut strings and parse for images

Paul McGuire ptmcg at austin.rr._bogus_.com
Mon Dec 6 15:36:36 EST 2004


"Andreas Volz" <usenet-spam-trap at brachttal.net> wrote in message
news:20041206203456.251c6d85 at frodo.mittelerde...
> Hi,
>
> I used SGMLParser to parse all hrefs in an HTML file. Now I need to cut
> some strings. For example:
>
> http://www.example.com/dir/example.html
>
> Now I would like to cut the string so that only the domain and directory
> are left. Expected result:
>
> http://www.example.com/dir/
>
> I know how to do this in bash, but not in Python. How could this be
> done?
>
> The next problem is to extract not only hrefs, but also images. An href
> is easy:
>
> <a href="install.php">Install</a>
>
> But an image is a little harder:
>
> <img class="bild" src="images/marine.jpg">
>

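Check out the urlparse module (in the standard distribution). It can split
a URL into its parts, which covers trimming the file name off the path. For
images, urlparse.urljoin resolves a relative reference against a base URL,
so you can expand "images/marine.jpg" relative to the current location.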
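
As a rough sketch of both steps, here is one way it could look. It assumes
Python 3, where urlparse lives in urllib.parse, and where sgmllib no longer
exists, so html.parser.HTMLParser stands in for SGMLParser. The names
base_directory and LinkCollector are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def base_directory(url):
    """Return the URL with its last path segment (the file name) removed."""
    # Resolving "." against a URL yields the directory that contains it.
    return urljoin(url, ".")


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href=...> and <img src=...> tags."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url  # URL the HTML was fetched from
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Expand the href relative to the page it appeared on.
            self.links.append(urljoin(self.page_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            # Same treatment for image sources.
            self.images.append(urljoin(self.page_url, attrs["src"]))


page_url = "http://www.example.com/dir/example.html"
html = ('<a href="install.php">Install</a> '
        '<img class="bild" src="images/marine.jpg">')

collector = LinkCollector(page_url)
collector.feed(html)

print(base_directory(page_url))  # http://www.example.com/dir/
print(collector.links)   # ['http://www.example.com/dir/install.php']
print(collector.images)  # ['http://www.example.com/dir/images/marine.jpg']
```

Using urljoin rather than slicing the string by hand means hrefs that are
already absolute, or that start with "/", should come out right as well.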

-- Paul




