Parsing html :: output to comma delimited

samuels ssweber at gmail.com
Sat Jul 16 15:02:53 EDT 2005


Hello All,

I am a total Python newbie, and I need help writing a script.

This is what I want to do:

There is a list of links at http://www.rentalhq.com/fulllist.asp.  Each
link goes to a page like
http://www.rentalhq.com/store.asp?id=907%2F272%2D4425 that contains a
company name, address, phone, and fax.  I want to fetch each page, parse
this information, and export it to a comma-delimited or tab-delimited
text file.  The important information in each page is:

<table border="0" cellpadding="0" cellspacing="0"
style="border-collapse: collapse" bordercolor="#111111" width="100%"
id="AutoNumber1">
  <tr>
    <td width="100%" colspan="2">
    <h2 style="text-align: center; margin-top:2; margin-bottom:2;
line-height:14px" class="title">
    <font size="4">United Rentals Inc.</font>
    </h2>

    <h3 style="text-align: center; margin-top:4;
margin-bottom:4">3401 Commercial Dr. 
    Anchorage AK, 99501-3024
    </h3>
    <p style="text-align: center; margin-top:4; margin-bottom:4">
    <a target="_blank"
href="http://maps.google.com/maps?q=3401+Commercial+Dr%2E Anchorage AK
99501-3024 ">
<!--    <a target="_blank"
href="http://www.mapquest.com/maps/map.adp?city=Anchorage&state=AK&address=3401+Commercial+Dr.&zip=99501-3024&country=&zoom=8">-->
    <img height="15" src="Scraps/Rental_Images/map.gif" width="33"
border="0"></a>
    </p>
    </td>
  </tr>
  <tr>
    <td width="50%" valign="top">
    <p style="text-align: center; line-height:100%; margin-top:0;
margin-bottom:0"> 
    </p>
    <p style="text-align: center; line-height: 100%; margin-top:0;
margin-bottom:0">
    <b>Phone</b> - 907/272-4425<br>
     <b>Fax</b> - 907/272-9683 </p>

From that, I want output like:

United Rentals Inc.,3401 Commercial Dr.,Anchorage,AK,"995013024","9072724425","9072729683"

or

United Rentals Inc.	3401 Commercial Dr.	Anchorage	AK	995013024	9072724425	9072729683
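
To make the target concrete, here is a minimal sketch of the parse-and-export
step in modern Python 3, using only the standard library. It rests on several
assumptions that are not confirmed for every store page: the company name is
the first <h2>, the address is the first <h3> with the street on one line and
"City ST, ZIP" on the next, and the phone and fax follow <b>Phone</b> and
<b>Fax</b> labels. The names parse_store, to_delimited, and SAMPLE are
hypothetical helpers for illustration. Note that the csv module only quotes
fields that need it, so the numbers come out unquoted.

```python
import csv
import io
import re
from html import unescape

# Trimmed copy of the store page markup quoted above.
SAMPLE = """
<h2 style="text-align: center; margin-top:2; margin-bottom:2;
line-height:14px" class="title">
    <font size="4">United Rentals Inc.</font>
    </h2>
    <h3 style="text-align: center; margin-top:4;
margin-bottom:4">3401 Commercial Dr.
    Anchorage AK, 99501-3024
    </h3>
    <b>Phone</b> - 907/272-4425<br>
     <b>Fax</b> - 907/272-9683 </p>
"""

def clean(fragment):
    """Drop tags, decode entities, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", fragment)
    return " ".join(unescape(text).split())

def digits_only(text):
    """Keep only the digits, e.g. '907/272-4425' -> '9072724425'."""
    return re.sub(r"\D", "", text)

def parse_store(page):
    """Return [name, street, city, state, zip, phone, fax] for one page."""
    def inner(pattern):
        m = re.search(pattern, page, re.S | re.I)
        return m.group(1) if m else ""

    name = clean(inner(r"<h2\b[^>]*>(.*?)</h2>"))

    # Assumption: street on the first non-empty line, "City ST, ZIP" on the second.
    lines = [clean(l) for l in inner(r"<h3\b[^>]*>(.*?)</h3>").splitlines()]
    lines = [l for l in lines if l]
    street = lines[0] if lines else ""
    city = state = zip_code = ""
    if len(lines) > 1:
        m = re.match(r"(.*?)\s+([A-Z]{2}),?\s+(\d{5}(?:-\d{4})?)$", lines[1])
        if m:
            city, state = m.group(1), m.group(2)
            zip_code = digits_only(m.group(3))

    phone = digits_only(inner(r"<b>Phone</b>\s*-\s*([\d/\s\-]+)"))
    fax = digits_only(inner(r"<b>Fax</b>\s*-\s*([\d/\s\-]+)"))
    return [name, street, city, state, zip_code, phone, fax]

def to_delimited(rows, delimiter=","):
    """Render rows as delimited text; pass delimiter='\\t' for tabs."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()

print(to_delimited([parse_store(SAMPLE)]), end="")
```

Regular expressions are used here only because the quoted markup is small and
regular; pages that deviate from the assumed layout would yield empty fields
rather than an error.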


I have been messing around with Beautiful Soup
(http://www.crummy.com/software/BeautifulSoup/index.html) but haven't
gotten very far (especially because the HTML is so sloppy).
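
For the first step, collecting the store links from the list page, a lenient
parser may cope with sloppy markup better than a strict one. The following is
a minimal sketch in modern Python 3 using the standard library's html.parser.
The markup of the list page is not shown in this message, so LIST_SAMPLE is an
invented fragment, and the assumption that every store link contains
"store.asp?id=" is taken from the example URL above. StoreLinkCollector is a
hypothetical name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://www.rentalhq.com/fulllist.asp"

class StoreLinkCollector(HTMLParser):
    """Collect absolute URLs of links that look like store pages."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: store pages are identified by "store.asp?id=".
        if href and "store.asp?id=" in href.lower():
            url = urljoin(self.base_url, href)
            if url not in self.links:  # skip duplicates, keep page order
                self.links.append(url)

# Invented fragment standing in for the real list page, with sloppy markup:
# mixed-case tags, a duplicate link, and an unclosed <a>.
LIST_SAMPLE = """
<table><tr><td>
<a href="store.asp?id=907%2F272%2D4425">United Rentals Inc.</a><br>
<A HREF="store.asp?id=907%2F272%2D4425">duplicate link</a>
<a href="about.asp">About
</td></tr></table>
"""

collector = StoreLinkCollector(BASE_URL)
collector.feed(LIST_SAMPLE)
print(collector.links)
```

In a real run the page text would come from the network, for example via
urllib.request.urlopen(url).read(), decoded to a string before being passed to
feed().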

Any help would be really appreciated!  Just point me in the right
direction, what to use, examples...  Thanks!

-Sam



