[Tutor] getting results from encoded data i sent to website

Prasad, Ramit ramit.prasad at jpmorgan.com
Mon Jul 30 22:37:58 CEST 2012


Please always respond to the list (or at least CC it) and 
not the individual person.

> By "manually" I mean when I type the isbn into the text box and hit enter. The
> page that the browser shows has this html code (I just pasted the start of
> it):
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"><html>
> <head>
> <!--eBay V3- msxml 4.0 XXXXXXXXXXXXXXXXXXXXXXXXXX-->
> <!--srcId: HalfCommonPage-->
> <title>Half.com: The Cat in the Hat by Dr. Seuss (1957,
> Hardcover)(9780394800011): Dr. Seuss: Books </title>
> <meta name="google-site-verification"
> content="8kHr3jd3Z43q1ovwo0KVgo_NZKIEMjthBxti8m8fYTg">
> <meta name="keywords" content="the cat in the hat by dr. seuss 1957,
> hardcover, dr. seuss, 039480001x 9780394800011, random house children's books,
> hardcover">
> <meta name="description" content="When Mom's away, the mischievous Cat in the
> Hat comes to play--and turns the house into a total ...,($0.75),Dr.
> Seuss,1957,Random House Children's Books,9780394800011">
> <meta name="copyright" content="Copyright 1995-2008 eBay.com">
> <meta name="robots" content="follow,index">
> <meta name="revisit-after" content="7 days">
> <meta http-equiv="expires" content="0">
> <meta http-equiv="pragma" content="no-cache">
> <meta http-equiv="cache-control" content="no-cache,no-store,must-revalidate">
> <meta http-equiv="content-language" content="en">
> <meta http-equiv="content-type" content="text/html; charset=UTF-8">
> <meta property="og:title" content="The Cat in the Hat by Dr. Seuss (1957,
> Hardcover)">
> <meta property="og:description" content="Half.com (Best Price $0.75):When
> Mom's away, the mischievous Cat in the Hat comes to play--and turns the house
> into a total uproar. But the Cat manages to make things right in the end, not
> a split second before Mom returns. Dr. Seuss's">
> <meta property="og:image"
> content="http://i.ebayimg.com/03/!!eCcIY!!2M~$(KGrHqV,!iUE0GshyQ4nBNRUo50i8g~~
> _7.JPG?set_id=89040003C1">
> <meta property="og:site_name" content="Half.com">
> <meta property="og:type" content="product">
> <meta property="fb:app_id" content="102628213125203"><script
> type="text/javascript" language="JavaScript1.1">includeHost =
> 'http://include.ebaystatic.com/';</script><script
> src="http://include.ebaystatic.com/js/e783/us/rover_e7836us.js">
> </script><script language="JavaScript" type="text/javascript"><!--
> 				var s_pageName = "info-refresh/Books";
> 
> 				  var rover = new Rover();
> 				  var ns = rover.createNSTracker();
> 
> 					ns.setSvrGMT(1343591318016);
> 
> 					var RoverDomainBaseUrl =
> "http://rover.ebay.com";
> 
> 
> 				  ns.hasTpimCookielet();
> 
> 

> But when I run the program, it returns this:


> ['<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
> "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-
> Type" content="text/html; charset=UTF-8"><script>var pageHasRtmPlacements =
> true;</script><style type="text/css">body,.g-std{font-
> family:Arial,Helvetica,sans-serif;font-
> size:small}form{margin:0;padding:0}a:active,a:link,.g-novisited
> a:visited{color:#00c;text-decoration:none}a:visited{color:#909;text-
> decoration:none}.g-b{font-weight:bold}.g-em{color:#090;font-weight:bold}.g-
> err{color:#f00}.g-hlp{color:#666}.g-pipe{color:#99f}.g-txtBx,.g-btn,.g-
> nav{font-family:Verdana;font-size:x-small}.g-txtBxHlp{color:#666;font-
> family:Verdana;font-size:x-small}.g-m0{margin:0}a:hover{text-
> decoration:underline}.g-i{font-style:italic}.g-bi{font-weight:bold;font-
> style:italic}.g-dft{font-weight:normal}.g-dfti{font-weight:normal;font-
> style:italic}.g-s{font-size:small}.g-xs{font-size:x-small}.g-m{font-
> size:medium}.g-l{font-size:large}.g-xl{font-size:x-large}.g-hdn{height:0;line-
> height:0;overflow:hidden;width:0;position:absolute;font-size:0;z-index:-
> 1;outline:none}#CentralArea{clear:both!important;width:980px!important;margin:
> 0 auto!important}#footer_wrapper{clear:both!important;margin:0
> auto!important;width:100%!important}.iss-
> rtmw{width:980px!important;height:320px;margin:0 auto}.iss-wrp{margin:0
> auto;position:relative;width:980px!important;height:388px;overflow:hidden;bord
> er:1px solid #ccc}.iss-srcd1{width:820px}.iss-srcd2{width:850px;margin-
> left:70px}.iss-txtd{font-size:24px;font-
> family:tahoma;color:#333;position:absolute;left:240px;top:180px}.iss-
> txts1{font-size:24px;font-family:tahoma;color:#333}.iss-txts2{font-
> size:24px;font-family:tahoma;color:#f18113}.iss-txtbtdiv1{padding-
> left:30px;width:750px}.iss-textInput1{width:475px;border-width:1px;border-
> style:solid;border-color:#f00;font-size:13px;font-
> family:Arial;color:#333}.iss-textInput2{width:475px;font-
> size:13px;color:#333;font-family:Arial}.iss-inlineErrDiv1{font-
> size:11px;color:#f00;alignment-adjust:left;margin-bottom:3px}.iss-
> innerHelpSpan1{font-size:11px;color:#333;width:260px;margin-top:5px}.iss-
> title{width:100%!important}.iss-titleDiv{font-family:tahoma;font-
> size:32px;color:#333;width:970px!important;margin:0 auto}.iss-linkTd1{padding-
> top:30px;font-size:13px}.iss-linkTd2{width:810px}.iss-innerHelpSpan2{font-
> size:11px;color:#333;width:260px}.iss-searchDiv{width:820px;margin-
> bottom:80px}.iss-div1{border:1px solid
> #dedede;width:315px;height:320px;margin:7px 0 0 0!important}.iss-
> div2{border:1px solid #dedede;width:315px;height:320px;margin-
> left:10px!important;margin-top:7px!important}.iss-div3{border:1p

> That's the "mumbo-jumbo" I was talking about.
> I'm not sure why the program isn't returning what I expect.

Well, when I compared ebay.com as fetched by urllib with the same
page in the browser, the first few lines of HTML were identical.
So I would assume the difference comes from the way you are
"filling in" the text box data. Most of that "mumbo-jumbo" looks
like a CSS definition. I only have a passing knowledge of
urllib(2), so maybe someone else on the list can help out.
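
To illustrate what "filling in" the text box amounts to, here is a
minimal sketch, not your actual program. It is written for Python 3,
where the urllib/urllib2 functions live in urllib.parse and
urllib.request. SEARCH_URL and FIELD_NAME are made-up placeholders;
the real values have to be read from the <form> element in the page
source (its action, its method, and the name of the <input>). Nothing
is sent over the network; the code only builds the two kinds of
request so you can compare them with what the browser sends.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: the real URL and field name must be read from the
# <form> element (action, method, <input name=...>) in the page source.
SEARCH_URL = "http://www.example.com/search"
FIELD_NAME = "query"


def build_get_request(isbn):
    """Put the form data in the query string, as a method="get" form does."""
    query = urlencode({FIELD_NAME: isbn})
    return Request(SEARCH_URL + "?" + query)


def build_post_request(isbn):
    """Put the form data in the request body, as a method="post" form does."""
    body = urlencode({FIELD_NAME: isbn}).encode("ascii")
    return Request(SEARCH_URL, data=body)


get_req = build_get_request("9780394800011")
post_req = build_post_request("9780394800011")

# Inspect what would be sent before actually calling urlopen().
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

If the form uses one method and the program uses the other, or the
field name does not match, the server may well answer with a generic
page instead of the search result.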

From what I understand, all of this can be done by hand with
the Python standard library, but you may want to look at a
couple of packages that were written to help. I have not used
either myself, but both have been recommended on the list.

BeautifulSoup is for parsing HTML and is a much better option
than reaching for regular expressions (an instinct I see
regularly). Mechanize helps with programmatic browsing and,
I think, with text box entry.

http://www.crummy.com/software/BeautifulSoup/ 
http://wwwsearch.sourceforge.net/mechanize/
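
As a rough illustration of using a parser instead of regular
expressions, here is a small sketch that pulls the <title> out of a
page, which is a quick way to check which page the server returned.
It uses the standard library's html.parser rather than BeautifulSoup
(which I have not used), so it is a stand-in for the idea and not
BeautifulSoup's API. TitleParser and extract_title are names I made
up; the sample text is the title from the browser output quoted above.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect all of them.
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html):
    """Return the page title with runs of whitespace collapsed."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.title_parts).split())


sample = """<html><head>
<title>Half.com: The Cat in the Hat by Dr. Seuss (1957,
Hardcover)(9780394800011): Dr. Seuss: Books </title>
</head><body></body></html>"""

print(extract_title(sample))
```

Running the same function on the text your program gets back would
show at a glance whether it is the book page or something else.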


