[python-win32] Parse HTML String only not file

Randy Syring rsyring at inteli-com.com
Thu Jun 17 22:42:12 CEST 2010


We have had great success with PyQuery for getting API access to XML data:

http://pypi.python.org/pypi/pyquery
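If installing PyQuery isn't an option, the standard library's xml.etree.ElementTree gives similar API access. Below is a rough Python 3 sketch; note that it uses ElementTree, not PyQuery. The <FIELDS> wrapper is a hypothetical name, added only because the fragment has two top-level <FIELD> elements and so is not a well-formed XML document on its own. The sketch builds both the flat list and one dict per field:

```python
import xml.etree.ElementTree as ET

line = ('<FIELD><NAME>BSCS status</NAME><TYPE>string</TYPE><VALUE>none</VALUE></FIELD>'
        '<FIELD><NAME>TopCre_life</NAME><TYPE>integer</TYPE><VALUE>0</VALUE></FIELD>')

# The fragment has two top-level <FIELD> elements, so wrap it in a
# hypothetical <FIELDS> root to make it a well-formed XML document.
root = ET.fromstring('<FIELDS>' + line + '</FIELDS>')

# Flat list of every child's text, in document order.
values = [child.text for field in root.findall('FIELD') for child in field]

# One dict per <FIELD>, keyed by the child tag name (NAME, TYPE, VALUE).
fields = [{child.tag: child.text for child in field}
          for field in root.findall('FIELD')]

print(values)
print(fields)
```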

--------------------------------------
Randy Syring
Intelicom
502-644-4776

"Whether, then, you eat or drink or 
whatever you do, do all to the glory
of God." 1 Cor 10:31



Tim Roberts wrote:
> On 6/17/2010 11:09 AM, Mauricio Martinez Garcia wrote:
>   
>> Hi, how can I parse an HTML string?
>> I need to parse the following line:
>>
>> '<FIELD><NAME>BSCS status</NAME><TYPE>string</TYPE><VALUE>none</VALUE></FIELD><FIELD><NAME>TopCre_life</NAME><TYPE>integer</TYPE><VALUE>0</VALUE></FIELD>'
>>     
>
> That's not HTML.  It's XML.  You CAN parse this with the SGMLParser
> (since XML is a restricted form of SGML), but you might consider whether
> you would be better served using xmllib, or even xml.sax.
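For the xml.sax route, a minimal Python 3 sketch might look like the following. The handler class name TextCollector and the <FIELDS> wrapper element are hypothetical; the wrapper is needed because a SAX parser rejects a fragment with more than one top-level element. The handler buffers the character data of each element, because characters() may be called several times for a single text node:

```python
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects the text found inside each element."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._buffer = []

    def startElement(self, name, attrs):
        # Start a fresh buffer for each new element.
        self._buffer = []

    def characters(self, content):
        # May be called more than once per text node, so accumulate.
        self._buffer.append(content)

    def endElement(self, name):
        text = ''.join(self._buffer)
        if text:  # container elements like <FIELD> have no text of their own
            self.results.append(text)
        self._buffer = []

line = ('<FIELD><NAME>BSCS status</NAME><TYPE>string</TYPE><VALUE>none</VALUE></FIELD>'
        '<FIELD><NAME>TopCre_life</NAME><TYPE>integer</TYPE><VALUE>0</VALUE></FIELD>')

handler = TextCollector()
# Wrap in a single root element and pass bytes, which parseString accepts.
xml.sax.parseString(('<FIELDS>' + line + '</FIELDS>').encode('utf-8'), handler)
print(handler.results)
```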
>
>
>   
>> The result of the program is:
>>
>> bash-3.1$ ./pruebasDOM.py
>> ['BSCS status']
>> ['string']
>> ['none']
>> ['TopCre_life']
>> ['integer']
>> ['0']
>>
>>
>> I can't get the data into a single dict() or list ([]).  I need all the
>> values: ['BSCS status', 'string', 'none', 'TopCre_life', 'integer', '0']
>>
>> What can I do?
>>     
>
> Of course.  Just change your ParserHTML class to create a list in "def
> __init__", then append the values that you get to the list instead of
> printing them.  So, for example:
>
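> from sgmllib import SGMLParser   # Python 2 only; sgmllib was removed in Python 3
>
> class ParserHTML(SGMLParser):
>     def __init__(self):
>         SGMLParser.__init__(self)
>         self.results = []   # collect the text here instead of printing it
>     ...
>     def handle_data(self, data):
>         ...
>         self.results.append(data)
>     ...
> if __name__ == '__main__':
>     ...
>     p = ParserHTML()
>     p.feed(node)
>     p.close()            # flush any data still buffered by the parser
>     print p.results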
>
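Since sgmllib no longer exists in Python 3, here is a rough equivalent of the same "append instead of print" idea using the standard html.parser.HTMLParser. The class name FieldCollector is hypothetical. HTMLParser lowercases tag names, but the text passed to handle_data() is unchanged:

```python
from html.parser import HTMLParser

class FieldCollector(HTMLParser):
    """Hypothetical parser: stores each run of text between tags."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_data(self, data):
        # Called with the text found between tags; append instead of print.
        self.results.append(data)

line = ('<FIELD><NAME>BSCS status</NAME><TYPE>string</TYPE><VALUE>none</VALUE></FIELD>'
        '<FIELD><NAME>TopCre_life</NAME><TYPE>integer</TYPE><VALUE>0</VALUE></FIELD>')

p = FieldCollector()
p.feed(line)
p.close()  # flush any remaining buffered text
print(p.results)
```

If the real input contains whitespace or newlines between tags, those would also arrive via handle_data(), so filtering with data.strip() may be needed.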
>   

