HTML to dictionary

WEINHANDL Herbert weinhand at unileoben.ac.at
Tue Feb 27 06:14:55 EST 2007


Tina I wrote:
> Hi everyone,
> 
> I have a small, probably even trivial, problem. I have the following HTML:
>> <b>
>>  METAR:
>> </b>
>> ENBR 270920Z 00000KT 9999 FEW018 02/M01 Q1004 NOSIG
>> <br />
...

BeautifulSoup is really fun to work with ;-)

> I have played around with BeautifulSoup, but I'm stuck at stripping off
> the tags and chopping the text up into what I need to put in the dict.
> If someone can offer some hints or an example to get me going, I would
> greatly appreciate it.
> 
> Thanks!
> Tina

#!/usr/bin/python
# -*- coding: utf-8 -*-

from   BeautifulSoup import BeautifulSoup, Tag, NavigableString

html = """<html> <head><title>Title</title> </head>
<body>
<b> METAR:    </b> ENBR 270920Z 00000KT 9999 ... <br />
<b> short-TAF:</b> ENBR 270800Z 270918 VRB05KT ... <br />
<b> long-TAF: </b> ENBR 271212 VRB05KT 9999  ... <br />
</body>
</html>
"""

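# Parse the page; convertEntities='html' turns HTML entities into characters.
soup  = BeautifulSoup( html, convertEntities='html' )
bolds = soup.findAll( 'b' )

# Named "reports" rather than "dict" so the built-in dict type is not shadowed.
reports = {}

for b in bolds :
    # b.next is the text inside <b>...</b>, i.e. the label.
    key = b.next.strip()
    # The element after that label text is the text following </b>.
    val = b.next.next.strip()
    print 'key=', key
    print 'val=', val, '\n'
    reports[key] = val

print reports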

#---- end ----
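
For anyone reading this without BeautifulSoup installed, roughly the same
idea (take the text of each <b> tag as the key and the text that follows it
as the value) can be sketched with the standard library alone. This is only
a sketch, assuming Python 3 and its html.parser module. The names
BoldLabelParser and parse_reports are made up for this example and are not
part of any library, and it assumes that every label sits in a <b> tag and
that every value ends at the next <br />.

```python
from html.parser import HTMLParser


class BoldLabelParser(HTMLParser):
    """Hypothetical helper: collect {bold label: text following it}."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.result = {}
        self._in_bold = False    # True while inside <b>...</b>
        self._label_parts = []   # text pieces of the current label
        self._key = None         # label whose value is being collected

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self._in_bold = True
            self._label_parts = []
        elif tag == "br":
            # Assumption: a line break ends the current value.
            self._key = None

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_bold = False
            self._key = "".join(self._label_parts).strip()
            self.result[self._key] = ""

    def handle_data(self, data):
        if self._in_bold:
            self._label_parts.append(data)
        elif self._key is not None:
            self.result[self._key] += data


def parse_reports(html):
    """Return a dict mapping each bold label to the stripped text after it."""
    parser = BoldLabelParser()
    parser.feed(html)
    parser.close()
    return {key: val.strip() for key, val in parser.result.items()}


html = """<html> <head><title>Title</title> </head>
<body>
<b> METAR:    </b> ENBR 270920Z 00000KT 9999 ... <br />
<b> short-TAF:</b> ENBR 270800Z 270918 VRB05KT ... <br />
<b> long-TAF: </b> ENBR 271212 VRB05KT 9999  ... <br />
</body>
</html>
"""

reports = parse_reports(html)
for key, val in reports.items():
    print("key=", key)
    print("val=", val, "\n")
```

As in the BeautifulSoup version, the keys keep their trailing colon
("METAR:"); use key.rstrip(":") if you would rather drop it.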


happy pythoning

Herbert


