Web page data and urllib2.urlopen

Kushal Kumaran kushal.kumaran+python at gmail.com
Fri Aug 7 02:52:31 EDT 2009


On Fri, Aug 7, 2009 at 3:47 AM, Dave Angel<davea at ieee.org> wrote:
>
>
> Piet van Oostrum wrote:
>>
>> <snip>
>>>
>>> DA> All I can guess is that it has something to do with "browser type" or
>>> DA> cookies.  And that would make lots of sense if this was a cgi page.  But
>>> DA> the URL doesn't look like that, as it doesn't end in pl, py, asp, or any of
>>> DA> another dozen special suffixes.
>>>

Note that a URL does not need any special suffix for the page behind it
to be dynamically generated.  See any page on Wikipedia, for example.
MediaWiki, the software running the site, is a PHP application.
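
To illustrate, here is a minimal sketch of a dynamically generated page
served at a suffix-less path.  It assumes Python 3 and the standard WSGI
interface; the function name `app` and the path used below are made up
for the example, and this is not how MediaWiki itself works.

```python
from datetime import datetime, timezone
from html import escape


def app(environ, start_response):
    """WSGI application that builds its response afresh on every request."""
    # The requested path needs no .py/.php/.cgi suffix; the server decides
    # how to handle it, not the URL's spelling.
    path = environ.get("PATH_INFO", "/")
    now = datetime.now(timezone.utc).isoformat()
    body = "<html><body><p>You asked for {} at {}</p></body></html>".format(
        escape(path), now
    )
    data = body.encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(data))),
        ],
    )
    return [data]


# To try it locally (assumed host and port):
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, app).serve_forever()
```

Requesting something like /wiki/Python from such a server returns HTML
that differs on every hit, even though the URL looks static.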

>>
>>
>>>
>>> DA> Any hints, anybody???
>>>
>>
>> If you look into the HTML that Firefox gets, there is a lot of
>> javascript in it.
>>
>
> But the raw page didn't have any javascript.  So what about that original
> raw page triggered additional stuff to be loaded?

FWIW, I'm getting a ton of JavaScript in the page downloaded using
your code fragment.

> Is it "user agent", as someone else brought out?  And is there somewhere I
> can read more about that aspect of things?  I've mostly built very static
> html pages, where the server yields the same page to everybody.  And some
> form stuff, where the user clicks on a "submit" button to trigger a script
> that's not shown on the URL line.
>
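
If you want to test the user-agent theory, one way is to send a
browser-like User-Agent header and compare the result with the default
fetch.  By default the library identifies itself as Python-urllib/x.y,
which some servers treat differently from a browser.  What follows is
only a sketch, assuming Python 3's urllib.request (with the urllib2 of
this thread the equivalents are urllib2.Request and urllib2.urlopen).
The URL and the user-agent string are placeholders, and the helper names
build_request and fetch are mine.

```python
import urllib.request


def build_request(url, user_agent):
    """Return a Request carrying a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent):
    """Download the page body as bytes, identifying as `user_agent`."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


# Placeholder values; substitute the real page and your browser's string.
req = build_request(
    "http://example.com/",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/3.5",
)
# Request normalises header names, so the lookup key is "User-agent".
print(req.get_header("User-agent"))
```

Calling fetch() once with the browser string and once with a string
like "Python-urllib/3.x", then diffing the two bodies, should show
whether the server varies its output by user agent.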

-- 
kushal


