[Numpy-discussion] Importing data from html tables
Robert Kern
robert.kern at gmail.com
Fri Sep 7 14:17:44 EDT 2007
ale wrote:
> Hi,
> I'm trying to import the data contained in an HTML table into an array.
> I use BeautifulSoup as the HTML parser:
>
> html = open('T0015.html','r')
> bs = BeautifulSoup(html)
> table = []
> for tr in bs.findAll('tr')[1:]:
>     table.append([td.p.string for td in tr.findAll('td')])
>
> and I get this:
>
> print table
>
> [[u'1925', u'--', u'--', u'--', u'--', u'--', u'--', u'--', u'--', u'105.0'],
> [u'1926', u'--', u'--', u'--', u'--', u'--', u'--', u'--', u'--', u'136.0'],
> [u'1927', u'--', u'--', u'--', u'--', u'--', u'--', u'--', u'--', u'51.0'],
> [u'1928', u'--', u'--', u'--', u'nn', u'--', u'--', u'--', u'--', u'104.0'],
> ... and so on]
>
> How can I put this list of lists of strings into a numpy array, with '--'
> and 'nn' set to NaN?
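from numpy import array, nan

def myfloat(x):
    # Both '--' and 'nn' mark missing values in the table.
    if x in ('--', 'nn'):
        return nan
    else:
        return float(x)

# A nested list comprehension works whether or not map() returns a list.
arr = array([[myfloat(x) for x in row] for row in table])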
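
A self-contained sketch of the same conversion, run on the four rows quoted above so the result can be checked by hand. It assumes that 'nn' should be treated as missing just like '--', as the question asks; the names MISSING and to_float are illustrative and not part of numpy.

```python
import numpy as np

# Sample rows copied from the printed table in the question (first four years).
table = [
    ['1925', '--', '--', '--', '--', '--', '--', '--', '--', '105.0'],
    ['1926', '--', '--', '--', '--', '--', '--', '--', '--', '136.0'],
    ['1927', '--', '--', '--', '--', '--', '--', '--', '--', '51.0'],
    ['1928', '--', '--', '--', 'nn', '--', '--', '--', '--', '104.0'],
]

# Assumption: these two strings are the only missing-value markers.
MISSING = {'--', 'nn'}

def to_float(cell):
    """Return NaN for a missing-value marker, otherwise the cell as a float."""
    return np.nan if cell in MISSING else float(cell)

arr = np.array([[to_float(cell) for cell in row] for row in table])

print(arr.shape)               # (4, 10)
print(np.isnan(arr).sum())     # 32 missing cells (8 per row)
print(np.nanmean(arr[:, -1]))  # 99.0, the mean of the last column ignoring NaN
```

Any other non-numeric string still raises ValueError in float(), which is probably what you want while you are still finding out which markers the table uses.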
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco