What is the best way to "get" a web page?
George Sakkis
george.sakkis at gmail.com
Sun Sep 24 00:40:58 EDT 2006
Pete wrote:
> The file "temp.html" is definitely different than the first run, but
> still not anything close to www.python.org . Any other suggestions?
If you mean that the saved page looks different in a browser, one reason
is that you have to download the CSS files too: the page references them
by relative URL, so a local copy renders without them. Here's the
relevant extract from the main page:
<link media="screen" href="styles/screen-switcher-default.css"
type="text/css" id="screen-switcher-stylesheet" rel="stylesheet" />
<link media="scReen" href="styles/netscape4.css" type="text/css"
rel="stylesheet" />
<link media="print" href="styles/print.css" type="text/css"
rel="stylesheet" />
<link media="screen" href="styles/largestyles.css" type="text/css"
rel="alternate stylesheet" title="large text" />
<link media="screen" href="styles/defaultfonts.css" type="text/css"
rel="alternate stylesheet" title="default fonts" />
You may either hardcode the URLs of the CSS files, or parse the page,
extract the CSS links and normalize them to absolute URLs. The first is
simpler, but the second is more robust in case a new stylesheet is added
or an existing one is renamed or removed.
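
The second approach could be sketched roughly as follows, using only the
standard library (Python 3 module names). This is one possible way to do it,
not the only one: the class name StylesheetLinkParser and the function
extract_stylesheet_urls are made up for illustration, and the sketch assumes
the page does not override its base URL with a <base> tag. It treats any
<link> whose rel contains "stylesheet" as a CSS link, so the "alternate
stylesheet" entries above are included as well.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StylesheetLinkParser(HTMLParser):
    """Collect the href of every <link> whose rel contains 'stylesheet'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<link ... />) are routed here as well.
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated token list, e.g. "alternate stylesheet".
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "stylesheet" in rel_tokens and href:
            self.hrefs.append(href)


def extract_stylesheet_urls(html, base_url):
    """Return absolute URLs of the stylesheets linked from html.

    base_url is the URL the page was fetched from; relative hrefs
    are resolved against it.
    """
    parser = StylesheetLinkParser()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, href) for href in parser.hrefs]
```

Calling extract_stylesheet_urls(page_source, "http://www.python.org/") on
the extract above would turn "styles/print.css" into
"http://www.python.org/styles/print.css", and likewise for the other links.
Each resulting URL can then be downloaded the same way as the page itself,
for example with urllib.request.urlopen.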
George