[Tutor] weather scraping with Beautiful Soup

Che M pine508 at hotmail.com
Fri Jul 17 10:01:29 CEST 2009



> The posts basically say go and look at the HTML and find the
> right tags for the data you need. This is fundamental to any kind of web
> scraping: you need to understand the HTML tree well enough to identify
> where your data exists.
> 
> How familiar are you with HTML and its structures?

Reasonably familiar.

> Can you view the source in your browser and identify the hierarchy
> of tags to the place where your data lives?

I can view the source, and have made my own web pages (HTML, CSS).
I am less sure about the hierarchy of tags.  For example, here is the
section around the current temperature:
<div class="blueBox">
	<div id="curcondbox">
		<div class="subG b">West of Town, Jamestown, Pennsylvania (PWS)</div>
		<div class="bm10">Updated: <span class="pwsrt" pwsid="KPAJAMES1" pwsunit="english" pwsvariable="lu" value="1247814018">3:00 AM EDT on July 17, 2009</span></div>
		<table cellspacing="0" cellpadding="0" class="full">
		<tr>
		<td class="vaT full">
		<table cellspacing="0" cellpadding="5" class="full">
		<tr>
		<td class="vaM taC"><img src="http://icons-pe.wxug.com/i/c/a/nt_clear.gif" width="42" height="42" alt="Clear" class="condIcon" /></td>
		<td class="vaM taC full">
		<div style="font-size: 17px;"><span class="pwsrt" pwsid="KPAJAMES1" pwsunit="english" pwsvariable="tempf" english="&deg;F" metric="&deg;C" value="60.3">
  <span class="nobr"><span class="b">60.3</span>&nbsp;&#176;F</span>
</span></div>

The 60.3 is the value I want to extract.  It appears to be down within a hierarchy
something like:

<body
<div class="blueBox">
    <div id="curcondbox">
         <table 
            <table 
               <div>
                   <span class="nobr">
                         <span class="b">
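
One way to check an outline like this without matching tags by eye is to let
a parser keep track of the nesting. Below is a minimal sketch that uses only
the standard library's html.parser rather than Beautiful Soup. The class name
AncestorTracker, the label format, and the trimmed-down snippet (fewer
attributes, a placeholder image name) are made up for illustration and are
not part of the real page.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class AncestorTracker(HTMLParser):
    """Record the chain of open tags around the first text equal to `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []    # (tag, label) pairs for open tags, outermost first
        self.path = None   # filled in once the target text is seen

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        label = tag
        if attrs.get("id"):
            label += "#" + attrs["id"]
        elif attrs.get("class"):
            label += "." + attrs["class"].replace(" ", ".")
        self.stack.append((tag, label))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.path is None and data.strip() == self.target:
            self.path = [label for _, label in self.stack]


# Trimmed version of the page source quoted above (hypothetical image name).
snippet = """
<div class="blueBox">
  <div id="curcondbox">
    <table class="full"><tr><td class="vaT full">
      <table class="full"><tr>
        <td class="vaM taC"><img src="x.gif" alt="Clear" /></td>
        <td class="vaM taC full">
          <div style="font-size: 17px;"><span class="pwsrt" value="60.3">
            <span class="nobr"><span class="b">60.3</span>&nbsp;&#176;F</span>
          </span></div>
"""

tracker = AncestorTracker("60.3")
tracker.feed(snippet)
print(" > ".join(tracker.path))
```

On this snippet the printed chain also contains the tr and td elements and
the outer span with class "pwsrt", all of which the hand-written outline
leaves out.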


But I am far from sure I got all that right; it is not easy to
look at HTML and match each <div> with its </div>.  Unless I am missing
something?  Do I have to spell out all of the above in my Beautiful
Soup code?
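
For illustration, here is a sketch of what skipping the hierarchy might look
like, on the guess that Beautiful Soup can search the whole tree by tag name
and attributes. It assumes the current bs4 package (the 2009-era import was
`from BeautifulSoup import BeautifulSoup`), the built-in "html.parser"
backend, and a shortened copy of the snippet above; the variable names are
made up.

```python
from bs4 import BeautifulSoup

# Shortened copy of the page source; only the temperature block is kept.
html = """
<div class="blueBox">
  <div id="curcondbox">
    <div style="font-size: 17px;"><span class="pwsrt" pwsid="KPAJAMES1"
        pwsunit="english" pwsvariable="tempf" value="60.3">
      <span class="nobr"><span class="b">60.3</span>&nbsp;&#176;F</span>
    </span></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Search the whole tree for the span whose pwsvariable attribute is "tempf".
temp_span = soup.find("span", attrs={"pwsvariable": "tempf"})

# The reading is available both as an attribute and as the nested text.
temp_from_attr = float(temp_span["value"])
temp_from_text = float(temp_span.find("span", class_="b").get_text(strip=True))

print(temp_from_attr, temp_from_text)
```

If this works on the real page, none of the enclosing <div> or <table> tags
need to be named, because find() looks through all descendants.
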
CM
