some site login problem help plz..

lkcl luke.leighton at googlemail.com
Mon Oct 12 13:35:57 EDT 2009


On Oct 5, 8:26 am, "Diez B. Roggisch" <de... at nospam.web.de> wrote:
> james27 wrote:
>
> > hello..
> > i'm new to python.
> > i have a problem with mechanize.
> > i have used mechanize before with no problem,
> > but i couldn't log in successfully to one site.
> > i have looked for a solution for several days but failed.
> > my problem is: logging in is no problem, but i can't retrieve the html
> > source code from the opened site.
> > actually i can only read a small piece of html, like the one below.
>
> > <html>
> > <script language=javascript>
> > location.replace("http://www.naver.com");
> > </script>
> > </html>
>
> > i want to retrieve the full html source code, but i can't. i have tried
> > twill, mechanize, urllib and so on.
> > i have no idea.. can anyone help me?
>
> Your problem is that the site uses JavaScript to replace itself. Mechanize
> can't do anything about that. You might have more luck with scripting a
> browser. No idea if there are any special packages available for that
> though.

 yes, there are.  i've mentioned this a few times on comp.lang.python
 (so you can search for them) and have the instances documented here:

 http://wiki.python.org/moin/WebBrowserProgramming

 basically, you're not going to like this, but you actually need
 a _full_ web browser engine, and to _execute_ the javascript.
 then, after a suitable period of time (or after the engine's
 "stopped executing" callback has been called, if it has one)
 you can then node-walk the DOM of the engine, grab the engine's
 document.body.innerHTML property, or use the engine's built-in
 XPath support (if it has it) to find specific parts of the DOM
 faster than if you extracted the text (into lxml etc).
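
 that said, for the one stub page quoted above, where the script is
 nothing but a single location.replace("...") call, there is a much
 lighter shortcut that does not execute any javascript at all: pull
 the target url out with a regular expression and fetch that instead.
 the sketch below is only that, a sketch.  the function names
 (extract_js_redirect, fetch_following_js_redirect) and the max_hops
 limit are made up for illustration, it is written for python 3's
 urllib, and it assumes the redirect is a plain quoted string literal.
 anything more dynamic still needs a real engine, as described above.

```python
import re
import urllib.parse
import urllib.request

# Matches location.replace("...") or location.assign('...') with a plain
# string literal argument. This is pattern matching, not JavaScript
# execution, so computed URLs will not be found.
_JS_REDIRECT = re.compile(
    r"""location\.(?:replace|assign)\(\s*(["'])(.*?)\1\s*\)"""
)


def extract_js_redirect(html):
    """Return the redirect target found in html, or None if there is none."""
    match = _JS_REDIRECT.search(html)
    return match.group(2) if match else None


def fetch_following_js_redirect(url, opener=None, max_hops=5):
    """Fetch url, following simple JavaScript redirects (hypothetical helper).

    Returns (final_url, html). The opener keeps cookies between hops so a
    login session is carried along; pass in your own opener if you have
    already logged in with it.
    """
    if opener is None:
        opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor()
        )
    for _ in range(max_hops):
        with opener.open(url) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            html = resp.read().decode(charset, errors="replace")
        target = extract_js_redirect(html)
        if target is None:
            return url, html
        # The target may be relative, so resolve it against the current URL.
        url = urllib.parse.urljoin(url, target)
    raise RuntimeError("too many JavaScript redirects")
```

 calling fetch_following_js_redirect() on the url that returned the
 stub page should give you whatever the redirect points at, provided
 the site does nothing cleverer than that one line of script.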

 you should not be shocked by this - by the fact that it takes
 a whopping 10 or 20mb library, including a graphical display
 mechanism, to execute a few bits of javascript.

 also, if you ask him nicely, flier liu is currently working on
 http://code.google.com/p/pyv8 and on implementing the W3C DOM
 standard as a "daemon" service (i.e. with no GUI component) and
 he might be able to help you out.  the pyv8 project comes with
 an example w3c.py file which implements DOM partially, but i
 know he's done a lot more.

 so - it's all doable, but for a given value of "do" :)

 l.


