Is it possible to download only the <head> of a web page?

Fredrik Lundh fredrik at pythonware.com
Thu Sep 4 17:53:33 EDT 2008


Rex wrote:

> I am writing a script that executes a bunch of queries through a form
> on a website and reads the results. I am only interested in the
> <title> section in the <head> of each web page. Currently, each page
> the server returns is about 100kb and contains a bunch of HTML and
> Javascript, all of which I don't need; I don't want to waste bandwidth
> or consume too much of the server's resources. I just need the <title>
> string.

you need to issue a GET request to get the HTML head section (an HTTP 
HEAD request only returns the HTTP headers, not the HTML <head>), which 
almost always means that the server will build the entire page before 
sending it to you (so it can set Content-Length etc).

you can save on network traffic by parsing the data as it arrives, and 
stopping when you've gotten the TITLE element:

     http://effbot.org/librarybook/sgmllib.htm
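The sgmllib module used on that page was removed in Python 3, so here is a minimal sketch of the same "parse as it arrives, stop at the title" idea using the standard html.parser.HTMLParser instead. The names TitleParser, _StopParsing, title_from_chunks and fetch_title are hypothetical helpers made up for this example. It assumes the page is UTF-8 when the server declares no charset, and it ignores any <meta charset> in the page. Closing the connection early only saves network traffic; the server has usually built the whole page anyway.

```python
import codecs
from html.parser import HTMLParser
from typing import Iterable, Optional
from urllib.request import urlopen


class _StopParsing(Exception):
    """Hypothetical internal signal: raised once there is nothing more to read."""


class TitleParser(HTMLParser):
    """Hypothetical parser that collects the text of the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._parts: list[str] = []
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            # Reached the body without finding a title: give up.
            raise _StopParsing

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self.title = "".join(self._parts).strip()
            raise _StopParsing  # stop reading the rest of the page
        if tag == "head":
            raise _StopParsing  # the head is over, no title here


def title_from_chunks(chunks: Iterable[bytes], encoding: str = "utf-8") -> Optional[str]:
    """Feed byte chunks to the parser, stopping as soon as the title is known."""
    # An incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    parser = TitleParser()
    try:
        for chunk in chunks:
            parser.feed(decoder.decode(chunk))
        parser.feed(decoder.decode(b"", final=True))
        parser.close()
    except _StopParsing:
        pass  # remaining chunks are never pulled from the iterator
    return parser.title


def fetch_title(url: str, chunk_size: int = 1024) -> Optional[str]:
    """Read the response in small blocks; leaving the with block closes it early."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        try:
            codecs.lookup(charset)
        except LookupError:
            charset = "utf-8"  # assumption: fall back for unknown charsets
        chunks = iter(lambda: response.read(chunk_size), b"")
        return title_from_chunks(chunks, charset)
```

Usage would be something like `fetch_title("http://example.com/")`. Keeping the chunk loop separate from the network code makes it easy to check with canned data that only the first few chunks are ever read.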

</F>
