SimplePrograms challenge

Wed Jun 13 11:36:31 EDT 2007

Rob Wolfe wrote:
> Steve Howell wrote:
> 
>> I suggested earlier that maybe we post multiple
>> solutions.  That makes me a little nervous, to the
>> extent that it shows that the Python community has a
>> hard time coming to consensus on tools sometimes.
> 
> We agree that BeautifulSoup is the best for parsing HTML. :)
> 
>> This is not a completely unfair knock on Python,
>> although I think the reason multiple solutions tend to
>> emerge for this type of thing is precisely due to the
>> simplicity and power of the language itself.
>>
>> So I don't know.  What about trying to agree on an XML
>> parsing example instead?
>>
>> Thoughts?
> 
> I vote for an example with ElementTree (without xpath)
> with a mention of using ElementSoup for invalid HTML.

Sounds good to me.  Maybe something like::

import xml.etree.ElementTree as etree
dinner_recipe = '''
<ingredients>
<ing><amt><qty>24</qty><unit>slices</unit></amt><item>baguette</item></ing>
<ing><amt><qty>2+</qty><unit>tbsp</unit></amt><item>olive oil</item></ing>
<ing><amt><qty>1</qty><unit>cup</unit></amt><item>tomatoes</item></ing>
<ing><amt><qty>1-2</qty><unit>tbsp</unit></amt><item>garlic</item></ing>
<ing><amt><qty>1/2</qty><unit>cup</unit></amt><item>Parmesan</item></ing>
<ing><amt><qty>1</qty><unit>jar</unit></amt><item>pesto</item></ing>
</ingredients>'''
pantry = set(['olive oil', 'pesto'])
tree = etree.fromstring(dinner_recipe)
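# print every ingredient that is not already in the pantry
# (older ElementTree versions spell iter() as getiterator())
for item_elem in tree.iter('item'):
    if item_elem.text not in pantry:
        print(item_elem.text)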
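If the example should show a bit more of the tree than the <item> text, the same loop can walk each <ing> element and pull the quantity and unit from its <amt> child. This is only a sketch of one possible extension; the shopping_list variable and the "qty unit item" output format are my own choices, not part of the proposal above.

```python
import xml.etree.ElementTree as etree

dinner_recipe = '''
<ingredients>
<ing><amt><qty>24</qty><unit>slices</unit></amt><item>baguette</item></ing>
<ing><amt><qty>2+</qty><unit>tbsp</unit></amt><item>olive oil</item></ing>
<ing><amt><qty>1</qty><unit>cup</unit></amt><item>tomatoes</item></ing>
<ing><amt><qty>1-2</qty><unit>tbsp</unit></amt><item>garlic</item></ing>
<ing><amt><qty>1/2</qty><unit>cup</unit></amt><item>Parmesan</item></ing>
<ing><amt><qty>1</qty><unit>jar</unit></amt><item>pesto</item></ing>
</ingredients>'''
pantry = set(['olive oil', 'pesto'])

tree = etree.fromstring(dinner_recipe)
shopping_list = []  # hypothetical name, just for this sketch
for ing in tree.iter('ing'):
    item = ing.findtext('item')
    if item not in pantry:
        # <amt> holds the <qty> and <unit> children for this ingredient
        amt = ing.find('amt')
        qty = amt.findtext('qty')
        unit = amt.findtext('unit')
        shopping_list.append('%s %s %s' % (qty, unit, item))

for line in shopping_list:
    print(line)
```

That keeps to plain element navigation (find/findtext with child tag names), in the spirit of the "without xpath" vote.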

Though I wouldn't know where to put the ElementSoup link in this one...
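One possible place is a short follow-up showing what plain ElementTree does with sloppy real-world HTML, since that is exactly the case ElementSoup is for. A minimal sketch, assuming a Python recent enough to have etree.ParseError; the HTML snippet is made up for illustration, and I'm deliberately not showing ElementSoup's own API here.

```python
import xml.etree.ElementTree as etree

# Typical tag-soup HTML: the <p> and <br> tags are never closed,
# so this is not well-formed XML.
sloppy_html = '<html><body><p>Buy bread<br><p>Buy garlic</body></html>'

try:
    etree.fromstring(sloppy_html)
    parsed_ok = True
except etree.ParseError as err:
    # ElementTree's XML parser rejects the markup; a forgiving
    # tag-soup parser such as ElementSoup is the suggested fallback.
    parsed_ok = False
    print('ElementTree could not parse it: %s' % err)
```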

STeVe