build_opener

Mon Jan 9 14:46:50 EST 2006

Hello, I have a question about urllib2's build_opener() function. I am trying to get the HTML of any web page as a string, but I need everything on the page to be the same as what I would see if I browsed to that page (at the very least, all the hrefs). This is my code:

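  from urllib2 import Request, build_opener

  def get_page():  # function name assumed; the snippet is the body of a function
      url = 'http://news.yahoo.com/fc/world/iraq'
      req = Request(url)
      f = build_opener().open(req)  # fetch the URL with a default opener
      page = f.read()               # whole response body as a string
      f.close()
      return page
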
Looking at the source of the page when I browse to it, one of the links has an href that looks like this:

  href = http://news.yahoo.com/s/ap/20051118/ap_on_re_mi_ea/iraq_051118153857;_ylt=AiPsFWWIyLLbGdlCQFLMn8NX6GMA;_ylu=X3oDMTBiMW04NW9mBHNlYwMlJVRPUCUl

After running the code and looking at the same link in the returned page, it looks like this:

  href = http://192.168.23.106/s/ap/20051118/ap_on_re_mi_ea/iraq_051118153857

It seems that everything after the semicolon is missing from the page
returned by build_opener(). Is there a way to get the page as a string
with all the links (hrefs) intact? Thanks.

-Steve
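
To compare the links in the fetched page against the ones the browser shows, it may help to list every href and split off the part after the semicolon. The following is only a minimal sketch, not a fix for the problem above: HrefCollector and list_hrefs are hypothetical names, the sample string stands in for the page returned by f.read(), and the imports fall back to the Python 2 module names that go with urllib2.

```python
try:  # Python 3 module names
    from html.parser import HTMLParser
    from urllib.parse import urlparse
except ImportError:  # Python 2, matching the urllib2 code in this post
    from HTMLParser import HTMLParser
    from urlparse import urlparse


class HrefCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def list_hrefs(page):
    """Return all href values found in an HTML string."""
    collector = HrefCollector()
    collector.feed(page)
    collector.close()
    return collector.hrefs


# Stand-in for the fetched page; the URL is the one quoted in the post.
sample = ('<a href="http://news.yahoo.com/s/ap/20051118/ap_on_re_mi_ea/'
          'iraq_051118153857;_ylt=AiPsFWWIyLLbGdlCQFLMn8NX6GMA;'
          '_ylu=X3oDMTBiMW04NW9mBHNlYwMlJVRPUCUl">story</a>')

for href in list_hrefs(sample):
    parts = urlparse(href)
    # parts.params holds whatever follows ';' in the last path segment
    print('host=%s path=%s params=%s' % (parts.netloc, parts.path, parts.params))
```

Running the same loop over the page as saved from the browser and over the string returned by the opener should show which hrefs differ in host or in the parameters after the semicolon.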
