some site login problem help plz..
lkcl
luke.leighton at googlemail.com
Mon Oct 12 13:35:57 EDT 2009
On Oct 5, 8:26 am, "Diez B. Roggisch" <de... at nospam.web.de> wrote:
> james27 wrote:
>
> > hello..
> > i'm new to python.
> > i have a problem with mechanize.
> > i have used mechanize before with no problems,
> > but i can't log in successfully to one site.
> > i have looked for a solution for several days but failed.
> > my problem is: logging in is no problem, but i can't retrieve the html
> > source code from the opened site.
> > i can only read a small piece of html, like the one below.
>
> > <html>
> > <script language=javascript>
> > location.replace("http://www.naver.com");
> > </script>
> > </html>
>
> > i want to retrieve the full html source code, but i can't. i tried with
> > twill and mechanize and urllib and so on.
> > i have no idea.. can anyone help me?
>
> Your problem is that the site uses JavaScript to replace itself. Mechanize
> can't do anything about that. You might have more luck with scripting a
> browser. No idea if there are any special packages available for that
> though.
yes, there are. i've mentioned them a few times on comp.lang.python
(so you can search for them), and have the options documented here:
http://wiki.python.org/moin/WebBrowserProgramming
basically, you're not going to like this, but you actually need
a _full_ web browser engine, and to _execute_ the javascript.
then, after a suitable period of time (or after the engine's
"stopped executing" callback has been called, if it has one),
you can node-walk the engine's DOM, grab its
document.body.innerHTML property, or use its built-in
XPath support (if it has any) to find specific parts of the DOM
faster than if you extracted the text (into lxml etc).
you should not be shocked by this - by the fact that it takes
a whopping 10 or 20mb library, including a graphical display
mechanism, to execute a few bits of javascript.
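that said, for the specific stub quoted above there may be a cheaper
shortcut that avoids executing anything: the page only calls
location.replace() with a literal URL, so the target can be pulled out
with a regular expression and fetched directly. a minimal sketch using
only the python standard library follows. it is not a javascript engine,
it only recognises trivial literal redirects, and the names
find_js_redirect and stub are made up for illustration:

```python
import re

# Matches trivial literal redirects such as
#   location.replace("http://example.com")
#   window.location.href = '/home'
# This is a pattern match only; no JavaScript is executed.
_REDIRECT_RE = re.compile(
    r"""location(?:\.href)?\s*(?:\.replace\s*\(|=)\s*(["'])(?P<url>.+?)\1""",
    re.IGNORECASE,
)


def find_js_redirect(html):
    """Return the target URL of a literal JavaScript redirect, or None.

    Hypothetical helper for illustration; it is not part of mechanize.
    """
    match = _REDIRECT_RE.search(html)
    return match.group("url") if match else None


# The stub page from the original question.
stub = """<html>
<script language=javascript>
location.replace("http://www.naver.com");
</script>
</html>"""

print(find_js_redirect(stub))
```

if this returns a URL, requesting it with the same mechanize browser (so
that the login cookies are kept) might be enough. if the site builds the
URL dynamically, or does anything else in script, this shortcut will not
help and you are back to needing a real engine.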
also, if you ask him nicely, flier liu is currently working on
http://code.google.com/p/pyv8 and on implementing the W3C DOM
standard as a "daemon" service (i.e. with no GUI component) and
he might be able to help you out. the pyv8 project comes with
an example w3c.py file which implements DOM partially, but i
know he's done a lot more.
so - it's all doable, but for a given value of "do" :)
l.