What is the best way to "get" a web page?

Pete harbingerofpeace at post.com
Sat Sep 23 23:49:43 EDT 2006


> >I have the following code:
> >
> >>>> web_page = urllib.urlopen("http://www.python.org")
> >>>> file = open("temp.html", "w")
> >>>> web_page_contents = web_page.read()
> >>>> file.write(web_page_contents)
> >>>> file.close
> > <built-in method close of file object at 0xb7cc76e0>
> >>>>
> >
> > The file "temp.html" is created, but it doesn't look like the page at
> > www.python.org. I'm guessing there are multiple frames and my code did
> > not get everything. Can anyone point me to a tutorial or other
> > reference on how to "get" all of the html contents at a particular
> > page?
> >
> > Why did Python print the line after "file.close"?
> >
> > Thanks,
> > Pete
> >
>
> A. You didn't actually invoke the close method, you simply referenced it,
> which is why you got the output line after file.close.  Python is not VB.
> To call close, you have to follow it with ()'s, as in:
>
> file.close()
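Here is a minimal sketch of the difference between referencing and calling the method. It assumes Python 3, and the temporary path and the variable name `out` are hypothetical. The interactive interpreter echoes the repr of any expression whose value is not None. That is why the bare `file.close` printed a `<built-in method close ...>` line and left the file open.

```python
import os
import tempfile

# Hypothetical throwaway path, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "temp.html")

out = open(path, "w")
out.write("<html></html>")

ref = out.close        # only *references* the method: nothing is closed yet
print(callable(ref))   # True: a method object bound to out
print(out.closed)      # False

out.close()            # the parentheses actually *call* the method
print(out.closed)      # True
```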

Ahhhh. Thank you very much!

> This will have the added benefit of flushing the output to temp.html,
> probably containing the missing content you were looking for.
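The download-and-save step can use a `with` block, which closes the file automatically and therefore flushes it. The sketch below assumes Python 3, where `urllib.urlopen` became `urllib.request.urlopen`. The helper name `save_page` is hypothetical. Opening the file in binary mode (`"wb"`) stores the bytes exactly as the server sent them, without any decoding step.

```python
import urllib.request


def save_page(url, path):
    """Download url and write the raw bytes to path (hypothetical helper).

    Returns the number of bytes written.
    """
    # urlopen() returns a file-like response; read() gives bytes.
    with urllib.request.urlopen(url) as response:
        data = response.read()
    # Leaving the with-block closes (and therefore flushes) the file,
    # even if write() raises, so no explicit close() call is needed.
    with open(path, "wb") as out_file:
        out_file.write(data)
    return len(data)
```

For example, `save_page("http://www.python.org", "temp.html")` would save the page to `temp.html` with no separate `close()` call to forget.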
>
> B. Don't name variables "file", "list", "str", "dict", "int", etc.  Doing
> so masks the built-in names of those data types.  Try "tempFile" instead.
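The short illustration below shows the masking Paul describes. It assumes Python 3, which no longer has a built-in named `file`, although `list`, `str`, `dict`, and `int` remain. Once a module-level variable is named `list`, the built-in type stays hidden until that binding is removed. The variable names here are purely illustrative.

```python
# Assigning to a built-in name at module level creates a global that
# hides the built-in for every later lookup in this module.
list = ["a", "b"]

try:
    list("abc")  # calls the ["a", "b"] object, not the list type
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)  # TypeError: 'list' object is not callable

# Deleting the global makes name lookup fall through to the built-in again.
del list
print(list("abc"))  # ['a', 'b', 'c']
```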

Oh. Thanks again!
The file "temp.html" is definitely different from the first run, but
still not anything close to www.python.org. Any other suggestions?

Thanks,
Pete

> -- Paul




More information about the Python-list mailing list