What is the best way to "get" a web page?

George Sakkis george.sakkis at gmail.com
Sun Sep 24 00:40:58 EDT 2006


Pete wrote:

> The file "temp.html" is definitely different than the first run, but
> still not anything close to www.python.org . Any other suggestions?

If you mean that the page looks different in a browser, one reason is
that you have to download the CSS files it references too. Here's the
relevant extract from the main page:

<link media="screen" href="styles/screen-switcher-default.css"
type="text/css" id="screen-switcher-stylesheet" rel="stylesheet" />
<link media="scReen" href="styles/netscape4.css" type="text/css"
rel="stylesheet" />
<link media="print" href="styles/print.css" type="text/css"
rel="stylesheet" />
<link media="screen" href="styles/largestyles.css" type="text/css"
rel="alternate stylesheet" title="large text" />
<link media="screen" href="styles/defaultfonts.css" type="text/css"
rel="alternate stylesheet" title="default fonts" />

You can either hardcode the URLs of the CSS files, or parse the page,
extract the CSS links and normalize them to absolute URLs. The first is
simpler, but the second is more robust in case a new CSS file is added
or an existing one is renamed or removed.
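
A rough sketch of the second approach, assuming Python 3 and only the
standard library (html.parser and urllib.parse). The names
StylesheetLinkParser and extract_stylesheet_urls are made up for
illustration, and the base URL is whatever address the page was
fetched from:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StylesheetLinkParser(HTMLParser):
    """Collect the href of every <link> whose rel includes 'stylesheet'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel may hold several tokens, e.g. "alternate stylesheet".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if "stylesheet" in rel_tokens and href:
            self.hrefs.append(href)


def extract_stylesheet_urls(html, base_url):
    """Return absolute URLs of the stylesheets linked from html."""
    parser = StylesheetLinkParser()
    parser.feed(html)
    parser.close()
    # urljoin resolves relative hrefs against the page's own URL.
    return [urljoin(base_url, href) for href in parser.hrefs]
```

Given the extract above and "http://www.python.org/" as the base URL,
"styles/print.css" becomes "http://www.python.org/styles/print.css".
Each returned URL can then be downloaded the same way as the page
itself. Note that this also picks up the "alternate stylesheet"
entries; filter on the rel value if you only want the default ones.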

George



