[Tutor] list all links with certain extension in an html file python

Santosh Kumar sntshkmr60 at gmail.com
Sun Sep 16 09:20:09 CEST 2012


I want to extract (not download) all links in an HTML file that end in
a certain extension.

Suppose there is a webpage whose head links to 4 different CSS files
on an external server. Let the head look like this:

    <link rel="stylesheet" type="text/css" href="http://foo.bar/part1.css">
    <link rel="stylesheet" type="text/css" href="http://foo.bar/part2.css">
    <link rel="stylesheet" type="text/css" href="http://foo.bar/part3.css">
    <link rel="stylesheet" type="text/css" href="http://foo.bar/part4.css">

Please note that I don't want to download those CSS files; instead, I
want output like this (to stdout):

    http://foo.bar/part1.css
    http://foo.bar/part2.css
    http://foo.bar/part3.css
    http://foo.bar/part4.css

Also, I don't want to use external libraries. Which standard-library
modules and functions should I use?
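
For illustration, here is a minimal sketch of one possible approach,
assuming Python 3 and only the standard library (html.parser and
urllib.parse). The names ExtensionLinkParser and extract_links are
made up for this example, and checking both href and src attributes
is an assumption about what counts as a "link":

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExtensionLinkParser(HTMLParser):
    """Collect href/src values whose URL path ends in a given extension.

    Hypothetical helper class, not part of the standard library.
    """

    def __init__(self, extension):
        super().__init__()
        self.extension = extension.lower()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Compare only the path, so a query string such as
                # "?v=2" does not hide the extension.
                path = urlparse(value).path
                if path.lower().endswith(self.extension):
                    self.links.append(value)


def extract_links(html_text, extension):
    """Return all matching links in html_text, in document order."""
    parser = ExtensionLinkParser(extension)
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = """
    <link rel="stylesheet" type="text/css" href="http://foo.bar/part1.css">
    <link rel="stylesheet" type="text/css" href="http://foo.bar/part2.css">
    <link rel="stylesheet" type="text/css" href="http://foo.bar/part3.css">
    <link rel="stylesheet" type="text/css" href="http://foo.bar/part4.css">
    """
    # Print one matching URL per line; nothing is downloaded.
    for link in extract_links(sample, ".css"):
        print(link)
```

The parser only reads the attribute values from the markup, so no
network requests are made for the linked files. To process a local
file, its contents could be read with open() and passed to
extract_links as a string.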

