You gotta love a 2-line python solution

Steven D'Aprano steve at pearwood.info
Tue May 3 21:20:28 EDT 2016


On Tue, 3 May 2016 01:56 pm, DFS wrote:

> On 5/2/2016 11:27 PM, jfong at ms4.hinet.net wrote:
>> DFS at 2016/5/3 9:12:24AM wrote:
>>> try
>>>
>>> from urllib.request import urlretrieve
>>>
>>>
>>> http://stackoverflow.com/questions/21171718/urllib-urlretrieve-file-python-3-3
>>>
>>>
>>> I'm running python 2.7.11 (32-bit)
>>
>> Alright, it works... somehow.
>>
>> I tried to get a zip file. It works; the file can be unzipped correctly.
>>
>>>>> from urllib.request import urlretrieve
>>>>> urlretrieve("http://www.caprilion.com.tw/fed.zip",
>>>>> "d:\\temp\\temp.zip")
>> ('d:\\temp\\temp.zip', <http.client.HTTPMessage object at 0x03102C50>)
>>>>>
>>
>> But when I try to get this forum page, I do get an HTML file, but it
>> can't be viewed normally.
>>
>>>>> urlretrieve("https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A",
>>>>> "d:\\temp\\temp.html")
>> ('d:\\temp\\temp.html', <http.client.HTTPMessage object at 0x03102A90>)
>>>>>
>>
>> I suppose an HTML page is a much more complex case, where more processing
>> needs to be done before it can be opened by a web browser :-)
> 
> 
> Who knows what Google has done... it won't open in Opera.  The tab title
> shows up, but after 20-30 seconds the screen just stays blank and the
> cursor quits loading.


Dennis has given the answer to this, but since he has X-No-Archive=Yes, his
useful and well-written answer will be lost forever.

So I've taken the liberty of copying his answer here:

Dennis Lee Bieber says:

        There's practically no HTML in that page -- just miles of
JavaScript. The one obvious item is:

-=-=-=-=-=-
<script type="text/javascript" language="javascript"
src="/forum/C53652DA8B67255A46256B72F0D65A40.cache.js">
        
      </script>
-=-=-=-=-=-

which is a RELATIVE path: it has no scheme or host name, so the browser
resolves it against wherever the page itself was loaded from. If you copy
the file to your machine and then load it in a browser, it will be looking
for

/forum/C53652DA8B67255A46256B72F0D65A40.cache.js

on your machine rather than on Google's server.
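To make the resolution step concrete, here is a minimal sketch using the
standard library's urllib.parse.urljoin. The two base URLs are assumptions
for illustration: the forum address from the earlier urlretrieve() call
with its #! fragment dropped, and the saved copy written as a file: URL. A
real browser may differ in details, so treat this as an illustration of the
general rule rather than of any one browser.

```python
from urllib.parse import urljoin

# The script path quoted above: it starts with "/" and has no scheme or host.
SCRIPT_SRC = "/forum/C53652DA8B67255A46256B72F0D65A40.cache.js"

# Assumed base URLs: the page as served by Google, and the local copy
# saved by the earlier urlretrieve() call, expressed as a file: URL.
ONLINE_BASE = "https://groups.google.com/forum/"
LOCAL_BASE = "file:///d:/temp/temp.html"

# urljoin() resolves a reference against a base URL.
online = urljoin(ONLINE_BASE, SCRIPT_SRC)
local = urljoin(LOCAL_BASE, SCRIPT_SRC)

print("served by Google:", online)
print("saved locally:   ", local)
```

Against the online base the script resolves to a URL on groups.google.com.
Against the saved copy it resolves to a file: URL on the local machine,
where no such script exists.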

        You'd have to recreate most of the Google environment and fetch
anything that was referenced through a relative path first, to get the
content to display. Of course, you may find, for example, that the
JavaScript at some point is doing a database lookup -- and then you'd maybe
have to duplicate the database as well...
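
As a rough first step towards "fetch anything that was referenced through a
relative path", the sketch below lists the script files a saved page refers
to and turns each one into an absolute URL. The names ScriptSrcCollector and
script_urls are hypothetical helpers made up for this example, and the page
URL passed in is an assumption. It only finds the scripts. It does not run
them, and it cannot help with any lookups they perform on Google's side.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcCollector(HTMLParser):
    """Hypothetical helper: collect the src of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; inline scripts have no src.
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def script_urls(html_text, page_url):
    """Return absolute URLs for the scripts referenced in html_text."""
    collector = ScriptSrcCollector()
    collector.feed(html_text)
    return [urljoin(page_url, src) for src in collector.sources]


# The snippet quoted in Dennis's answer, used here as sample input.
SAMPLE = '''<script type="text/javascript" language="javascript"
src="/forum/C53652DA8B67255A46256B72F0D65A40.cache.js">
</script>'''

for url in script_urls(SAMPLE, "https://groups.google.com/forum/"):
    print(url)
```

Each URL printed could then be handed to urlretrieve(), although for a page
like this one that is only the start of the job.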



-- 
Steven



