newb: BeautifulSoup
7stud
bbxx789_05ss at yahoo.com
Fri Sep 21 03:38:24 EDT 2007
On Sep 20, 9:04 pm, crybaby <joemystery... at gmail.com> wrote:
> I need to traverse an HTML page with a big table that has many rows and
> columns. For example, how do I go to the 35th td tag and use a regex to
> retrieve the content? After that is done, how do I move down 15 td tags
> from the 35th tag (35+15) and use a regex to retrieve the content?
1) You can find your table using one of these methods:
a)
target_table = soup.find('table', id='car_parts')
b)
tables = soup.findAll('table')
target_table = tables[2]
The tables are put in a list in the order that they appear on the
page.
2) You can get all the td's in the table using this statement:
all_tds = target_table.findAll('td')
3) You can get the contents of the tags using these statements:
print all_tds[34].string   # 35th td (list indexes start at 0)
print all_tds[49].string   # 50th td (35 + 15)
Here is an example:
from BeautifulSoup import BeautifulSoup
doc = """
<html>
<head>
<title></title>
</head>
<body>
<table>
</table>
<table>
<tr><td>hello</td></tr>
<tr><td>world</td><td>goodbye</td></tr>
</table>
</body>
</html>
"""
soup = BeautifulSoup(doc)
tables = soup.findAll('table')
target_table = tables[1]
all_tds = target_table.findAll('td')
print all_tds[0].string
print all_tds[2].string
--output:--
hello
goodbye
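For the regex part of the question, here is a minimal sketch. It assumes the cell texts have already been pulled out as plain strings (for instance from the .string attribute of each td); the sample data, the pattern, and the extract() helper are all made up for illustration and are not part of BeautifulSoup.

```python
import re

# Hypothetical cell texts standing in for the strings taken from all_tds.
cell_texts = ["part %d: $%d.00" % (i, i * 3) for i in range(1, 61)]

# Hypothetical pattern: capture the whole-dollar amount after the "$".
price_re = re.compile(r"\$(\d+)")

def extract(cells, position, pattern):
    """Return group 1 of the first match in the cell at 1-based position, or None."""
    text = cells[position - 1]   # list indexes start at 0
    if text is None:             # assumption: a cell may have no usable string
        return None
    match = pattern.search(text)
    return match.group(1) if match else None

first = extract(cell_texts, 35, price_re)        # the 35th td
second = extract(cell_texts, 35 + 15, price_re)  # 15 cells further on: the 50th td
print(first)
print(second)
```

Keeping the position 1-based inside the helper lets you write 35 and 35 + 15 exactly as in the question, with the off-by-one adjustment in one place.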