Beautiful Soup Table Parsing

Tom Russell tsrdatatech at gmail.com
Wed Aug 8 19:58:56 EDT 2012


I am parsing a web page at
http://online.wsj.com/mdc/public/page/2_3021-tradingdiary2.html?mod=mdc_pastcalendar
using BeautifulSoup.

My problem is that I can parse down to the table where the data I
want resides, but I cannot figure out how to grab the contents of the
cell next to the row header I want.

For instance, the code below:

from urllib2 import urlopen                    # Python 2
from BeautifulSoup import BeautifulSoup        # or: from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('http://online.wsj.com/mdc/public/page/2_3021-tradingdiary2.html?mod=mdc_pastcalendar'))

table = soup.find("table", {"class": "mdcTable"})
for row in table.findAll("tr"):
    for cell in row.findAll("td"):
        print cell.findAll(text=True)          # all text nodes in the cell

prints one list per cell, like this:

[u'NYSE']
[u'Latest close']
[u'Previous close']
[u'Week ago']
[u'Issues traded']
[u'3,114']
[u'3,136']
[u'3,134']
[u'Advances']
[u'1,529']
[u'1,959']
[u'1,142']
[u'Declines']
[u'1,473']
[u'1,070']
[u'1,881']
[u'Unchanged']
[u'112']
[u'107']
[u'111']
[u'New highs']
[u'141']
[u'202']
[u'222']
[u'New lows']
[u'15']
[u'11']
[u'42']
[u'Adv. volume*']
[u'375,422,072']
[u'502,402,887']
[u'345,372,893']
[u'Decl. volume*']
[u'245,106,870']
[u'216,507,612']
[u'661,578,907']
[u'Total volume*']
[u'637,047,653']
[u'728,170,765']
[u'1,027,754,710']
[u'Closing tick']
[u'+131']
[u'+102']
[u'-505']
[u'Closing Arms (TRIN)\x86']
[u'0.62']
[u'0.77']
[u'1.20']
[u'Block trades*']
[u'3,874']
[u'4,106']
[u'4,463']
[u'Adv. volume']
[u'1,920,440,454']
[u'2,541,919,125']
[u'1,425,279,645']
[u'Decl. volume']
[u'1,149,672,387']
[u'1,063,007,504']
[u'2,812,073,564']
[u'Total volume']
[u'3,186,154,537']
[u'3,643,871,536']
[u'4,322,541,539']
[u'Nasdaq']
[u'Latest close']
[u'Previous close']
[u'Week ago']
[u'Issues traded']
[u'2,607']
[u'2,604']
[u'2,554']
[u'Advances']
[u'1,085']
[u'1,596']
[u'633']
[u'Declines']
[u'1,390']
[u'880']
[u'1,814']
[u'Unchanged']
[u'132']
[u'128']
[u'107']
[u'New highs']
[u'67']
[u'87']
[u'41']
[u'New lows']
[u'36']
[u'36']
[u'83']
[u'Closing tick']
[u'+225']
[u'+252']
[u'+588']
[u'Closing Arms (TRIN)\x86']
[u'0.48']
[u'0.46']
[u'0.69']
[u'Block trades']
[u'10,790']
[u'8,961']
[u'5,890']
[u'Adv. volume']
[u'1,114,620,628']
[u'1,486,955,619']
[u'566,904,549']
[u'Decl. volume']
[u'692,473,754']
[u'377,852,362']
[u'1,122,931,683']
[u'Total volume']
[u'1,856,979,279']
[u'1,883,468,274']
[u'1,714,837,606']
[u'NYSE Amex']
[u'Latest close']
[u'Previous close']
[u'Week ago']
[u'Issues traded']
[u'434']
[u'432']
[u'439']
[u'Advances']
[u'185']
[u'204']
[u'202']
[u'Declines']
[u'228']
[u'202']
[u'210']
[u'Unchanged']
[u'21']
[u'26']
[u'27']
[u'New highs']
[u'10']
[u'12']
[u'29']
[u'New lows']
[u'4']
[u'7']
[u'13']
[u'Adv. volume*']
[u'2,365,755']
[u'5,581,737']
[u'11,992,771']
[u'Decl. volume*']
[u'4,935,335']
[u'4,619,515']
[u'15,944,286']
[u'Total volume*']
[u'7,430,052']
[u'10,835,106']
[u'28,152,571']
[u'Closing tick']
[u'+32']
[u'+24']
[u'+24']
[u'Closing Arms (TRIN)\x86']
[u'1.63']
[u'0.64']
[u'1.12']
[u'Block trades*']
[u'75']
[u'113']
[u'171']
[u'NYSE Arca']
[u'Latest close']
[u'Previous close']
[u'Week ago']
[u'Issues traded']
[u'1,188']
[u'1,205']
[u'1,176']
[u'Advances']
[u'580']
[u'825']
[u'423']
[u'Declines']
[u'562']
[u'361']
[u'730']
[u'Unchanged']
[u'46']
[u'19']
[u'23']
[u'New highs']
[u'17']
[u'45']
[u'42']
[u'New lows']
[u'5']
[u'25']
[u'12']
[u'Adv. volume*']
[u'72,982,336']
[u'140,815,734']
[u'73,868,550']
[u'Decl. volume*']
[u'58,099,822']
[u'31,998,976']
[u'185,213,281']
[u'Total volume*']
[u'146,162,965']
[u'175,440,329']
[u'260,075,071']
[u'Closing tick']
[u'+213']
[u'+165']
[u'+83']
[u'Closing Arms (TRIN)\x86']
[u'0.86']
[u'0.73']
[u'1.37']
[u'Block trades*']
[u'834']
[u'1,043']
[u'1,593']

What I want is to get only the data for NYSE and nothing else, but I
do not know if that's possible. I also want to do something like
this:

if cell.contents[0] == "Advances":
    advances = next cell   # <-- this is the part I am not sure how to do
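
The closest thing I have found for the "next cell" part is
findNextSibling, so maybe something like this in place of the inner
loop (untested, and I am not sure find(text=True) is the right way to
compare the header text):

for cell in row.findAll("td"):
    if cell.find(text=True) == "Advances":          # the row-header cell
        value_cell = cell.findNextSibling("td")     # hopefully the cell with '1,529'
        print value_cell.find(text=True)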

Can someone help point me in the right direction to get the first
data point in the Advances row? I have other values to get as well,
but I figure once I understand how to do this one I can do the rest.
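
For getting only the NYSE numbers, the only idea I have had is to
watch for the section header rows and keep a flag. The sketch below
is roughly what I had in mind; the section names ("Nasdaq",
"NYSE Amex", "NYSE Arca") and the cells[1] indexing are just guesses
based on the output above, not anything I have verified:

from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup    # or: from bs4 import BeautifulSoup

url = 'http://online.wsj.com/mdc/public/page/2_3021-tradingdiary2.html?mod=mdc_pastcalendar'
table = BeautifulSoup(urlopen(url)).find("table", {"class": "mdcTable"})

in_nyse = False
for row in table.findAll("tr"):
    cells = row.findAll("td")
    if not cells:
        continue
    label = (cells[0].find(text=True) or u'').strip()   # row-header text
    if label == "NYSE":
        in_nyse = True                        # entering the NYSE section
    elif label in ("Nasdaq", "NYSE Amex", "NYSE Arca"):
        in_nyse = False                       # some other exchange's section
    elif in_nyse and label == "Advances" and len(cells) > 1:
        advances = cells[1].find(text=True)   # cell next to the "Advances" header
        print "NYSE Advances (latest close):", advances
        break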

Thanks,

Tom


