You gotta love a 2-line python solution

DFS nospam at dfs.com
Mon May 2 23:56:55 EDT 2016


On 5/2/2016 11:27 PM, jfong at ms4.hinet.net wrote:
> DFS at 2016/5/3 9:12:24AM wrote:
>> try
>>
>> from urllib.request import urlretrieve
>>
>> http://stackoverflow.com/questions/21171718/urllib-urlretrieve-file-python-3-3
>>
>>
>> I'm running python 2.7.11 (32-bit)
>
> Alright, it works... somehow.
>
> I tried to get a zip file. It works; the file can be unzipped correctly.
>
>>>> from urllib.request import urlretrieve
>>>> urlretrieve("http://www.caprilion.com.tw/fed.zip", "d:\\temp\\temp.zip")
> ('d:\\temp\\temp.zip', <http.client.HTTPMessage object at 0x03102C50>)
>>>>
>
> But when I try to get this forum page, it does save an HTML file, but the file can't be viewed normally.
>
>>>> urlretrieve("https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A", "d:\\temp\\temp.html")
> ('d:\\temp\\temp.html', <http.client.HTTPMessage object at 0x03102A90>)
>>>>
>
> I suppose HTML is a much more complex case, where more processing has to be done before a web browser can open it :-)


Who knows what Google has done... the saved file won't open in Opera.  The 
tab title shows up, but after 20-30 seconds the page is still blank and 
it stops loading.
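A guess at part of the problem, not something I tested: the `#!topic/...` bit of that URL is a fragment, and fragments are never sent to the server. So urlretrieve only fetches the bare https://groups.google.com/forum/ page, which (my assumption about how Google Groups works) is mostly a JavaScript shell that builds the thread in the browser from that fragment. The standard library shows the split without touching the network:

```python
from urllib.parse import urldefrag
from urllib.request import Request

url = "https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A"

# Split off the fragment (everything after '#'); it stays on the client side.
base, fragment = urldefrag(url)
print(base)      # https://groups.google.com/forum/
print(fragment)  # !topic/comp.lang.python/jFl3GJbmR7A

# urllib does the same: the path it would request omits the fragment.
# Building a Request does not open any connection.
req = Request(url)
print(req.selector)  # /forum/
print(req.fragment)  # !topic/comp.lang.python/jFl3GJbmR7A
```

If that guess is right, prettifying the saved file makes it readable but won't bring back the thread content.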

It's a mess; try running it through BeautifulSoup's prettify() and it 
looks better.

------------------------------------------------------------
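# Python 3 (urllib.request, print() function)
from urllib.request import urlretrieve
from bs4 import BeautifulSoup  # BeautifulSoup 4: pip install beautifulsoup4

webfile = "D:\\afile.html"
urlretrieve("https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A", webfile)
# open in binary mode and let BeautifulSoup detect the page encoding
with open(webfile, "rb") as f:
    soup = BeautifulSoup(f, "html.parser")
print(soup.prettify())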
------------------------------------------------------------
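If you want to watch the two-line urlretrieve call work without hitting the network, here is a minimal offline sketch. It builds a small zip locally and "downloads" it through a file: URL. The file names and the demo_urlretrieve_zip helper are hypothetical, standing in for the fed.zip download quoted above.

```python
import tempfile
import zipfile
from pathlib import Path
from urllib.request import urlretrieve


def demo_urlretrieve_zip(workdir: Path) -> Path:
    """Hypothetical helper: build a zip in workdir, then fetch it via a file: URL."""
    source = workdir / "fed.zip"  # local stand-in for the remote fed.zip
    with zipfile.ZipFile(source, "w") as zf:
        zf.writestr("hello.txt", "hello from the zip")
    target = workdir / "temp.zip"
    # urlretrieve returns (local_filename, headers), just as in the HTTP case
    filename, _headers = urlretrieve(source.as_uri(), str(target))
    return Path(filename)


with tempfile.TemporaryDirectory() as d:
    saved = demo_urlretrieve_zip(Path(d))
    print(zipfile.is_zipfile(saved))  # True: the copy is a valid zip
    with zipfile.ZipFile(saved) as zf:
        print(zf.read("hello.txt").decode())  # hello from the zip
```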

More information about the Python-list mailing list