[Numpy-discussion] Importing data from html tables

Robert Kern robert.kern at gmail.com
Fri Sep 7 14:17:44 EDT 2007


ale wrote:
> Hi,
> I'm trying to import the data contained in an HTML table into an array.
> I use BeautifulSoup as the HTML parser:
> 
> html = open('T0015.html','r')
> bs = BeautifulSoup(html)
> table = []
> for tr in bs.findAll('tr')[1:]:
>     table.append([td.p.string for td in tr.findAll('td')])
> 
> and I get this:
> 
> print table
> 
> [[u'1925', u'--', u'--', u'--', u'--', u'--', u'--', u'--', u'--', u'105.0'],
>  [u'1926', u'--', u'--', u'--', u'--', u'--', u'--', u'--', u'--', u'136.0'],
>  [u'1927', u'--', u'--', u'--', u'--', u'--', u'--', u'--', u'--', u'51.0'],
>  [u'1928', u'--', u'--', u'--', u'nn', u'--', u'--', u'--', u'--', u'104.0'],
>  ... and so on]
> 
> How can I put this list of lists of strings into a numpy array, and set
> '--' and 'nn' to NaN?

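from numpy import array, nan

def myfloat(x):
  # Both '--' and 'nn' mark missing values in the table.
  if x in ('--', 'nn'):
    return nan
  else:
    return float(x)

# A nested list comprehension gives the same result on Python 2 and 3
# (on Python 3, map() returns an iterator rather than a list).
arr = array([[myfloat(x) for x in row] for row in table])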
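
For anyone who wants to try the idea without the HTML file, here is a minimal self-contained sketch. The two sample rows are copied from the output quoted above; the names MISSING and to_float are made up for illustration, and the set of markers is an assumption based only on the two mentioned in the question.

```python
import numpy as np

# Two sample rows copied from the table printed in the question.
table = [
    ['1925', '--', '--', '--', '--', '--', '--', '--', '--', '105.0'],
    ['1928', '--', '--', '--', 'nn', '--', '--', '--', '--', '104.0'],
]

# Markers treated as missing (assumption: only these two occur in the data).
MISSING = {'--', 'nn'}

def to_float(cell):
    """Return NaN for a missing-value marker, otherwise the cell as a float."""
    return np.nan if cell in MISSING else float(cell)

# Convert every cell; the result is a 2-D float64 array.
arr = np.array([[to_float(cell) for cell in row] for row in table])

# NaN never compares equal to anything, so use isnan() to locate the gaps.
n_missing = int(np.isnan(arr).sum())

# nanmean() skips NaN entries; here it averages the last column.
last_col_mean = np.nanmean(arr[:, -1])
```

Any cell that is neither a listed marker nor a valid number still raises ValueError in float(), which is probably what you want while you are still discovering which markers the table uses.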

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco


