SimplePrograms challenge

Rob Wolfe rw at smsnet.pl
Wed Jun 13 15:31:42 EDT 2007


Steven Bethard <steven.bethard at gmail.com> writes:

>> I vote for example with ElementTree (without xpath)
>> with a mention of using ElementSoup for invalid HTML.
>
> Sounds good to me.  Maybe something like::
>
> import xml.etree.ElementTree as etree
> dinner_recipe = '''
> <ingredients>
> <ing><amt><qty>24</qty><unit>slices</unit></amt><item>baguette</item></ing>
> <ing><amt><qty>2+</qty><unit>tbsp</unit></amt><item>olive_oil</item></ing>
                                                      ^^^^^^^^^

Is that a typo here?

> <ing><amt><qty>1</qty><unit>cup</unit></amt><item>tomatoes</item></ing>
> <ing><amt><qty>1-2</qty><unit>tbsp</unit></amt><item>garlic</item></ing>
> <ing><amt><qty>1/2</qty><unit>cup</unit></amt><item>Parmesan</item></ing>
> <ing><amt><qty>1</qty><unit>jar</unit></amt><item>pesto</item></ing>
> </ingredients>'''
> pantry = set(['olive oil', 'pesto'])
> tree = etree.fromstring(dinner_recipe)
> for item_elem in tree.getiterator('item'):
>     if item_elem.text not in pantry:
>         print item_elem.text

That's a nice example. :)
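The same loop also runs on Python 3, with one assumption beyond this thread: on Python 3.9+ `getiterator()` is gone and `Element.iter()` replaces it. The sketch below wraps the loop in a helper named `missing_items` (a hypothetical name used here for illustration). It spells the item `olive oil` so it matches the pantry entry. With the `olive_oil` spelling, the oil would be reported as missing even though it is in the pantry.

```python
import xml.etree.ElementTree as etree

def missing_items(recipe_xml, pantry):
    """Return the text of every <item> element that is not in the pantry."""
    tree = etree.fromstring(recipe_xml)
    # iter() walks the whole tree, like the old getiterator()
    return [elem.text for elem in tree.iter('item') if elem.text not in pantry]

dinner_recipe = '''
<ingredients>
<ing><amt><qty>24</qty><unit>slices</unit></amt><item>baguette</item></ing>
<ing><amt><qty>2+</qty><unit>tbsp</unit></amt><item>olive oil</item></ing>
<ing><amt><qty>1</qty><unit>cup</unit></amt><item>tomatoes</item></ing>
<ing><amt><qty>1-2</qty><unit>tbsp</unit></amt><item>garlic</item></ing>
<ing><amt><qty>1/2</qty><unit>cup</unit></amt><item>Parmesan</item></ing>
<ing><amt><qty>1</qty><unit>jar</unit></amt><item>pesto</item></ing>
</ingredients>'''

pantry = {'olive oil', 'pesto'}

for name in missing_items(dinner_recipe, pantry):
    print(name)
# prints baguette, tomatoes, garlic, Parmesan (one per line)
```

With `olive_oil` in the recipe, the same call would also list `olive_oil`, because the set lookup is an exact string match.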

> Though I wouldn't know where to put the ElementSoup link in this one...

I had a regular HTML page in mind, something like:

<code>
# HTML page
dinner_recipe = '''
<html><head><title>Recipe</title></head><body>
<table>
<tr><th>amt</th><th>unit</th><th>item</th></tr>
<tr><td>24</td><td>slices</td><td>baguette</td></tr>
<tr><td>2+</td><td>tbsp</td><td>olive oil</td></tr>
<tr><td>1</td><td>cup</td><td>tomatoes</td></tr>
<tr><td>1-2</td><td>tbsp</td><td>garlic</td></tr>
<tr><td>1/2</td><td>cup</td><td>Parmesan</td></tr>
<tr><td>1</td><td>jar</td><td>pesto</td></tr>
</table>
</body></html>'''

# program
import xml.etree.ElementTree as etree
tree = etree.fromstring(dinner_recipe)

# For invalid HTML, use the ElementSoup wrapper for BeautifulSoup instead:
#import ElementSoup as etree
#from cStringIO import StringIO
#tree = etree.parse(StringIO(dinner_recipe))

pantry = set(['olive oil', 'pesto'])

for ingredient in tree.getiterator('tr'):
    amt, unit, item = ingredient.getchildren()
    # the header row holds <th> cells, so only <td> rows are ingredients
    if item.tag == "td" and item.text not in pantry:
        print "%s: %s %s" % (item.text, amt.text, unit.text)
</code>
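A Python 3 version of the table example is sketched below. It assumes Python 3.9+, where `getiterator()` and `getchildren()` were removed. It uses `Element.iter()` and `list(elem)` in their place. The helper name `missing_rows` and the guard that skips rows without exactly three cells are my own additions, not part of the original. Like the original, this only works for well-formed (XHTML-like) markup. Invalid HTML still needs the ElementSoup route from the commented lines.

```python
import xml.etree.ElementTree as etree

def missing_rows(html, pantry):
    """Yield (item, amount, unit) for each table row whose item is not in pantry."""
    tree = etree.fromstring(html)
    for row in tree.iter('tr'):
        cells = list(row)  # list(elem) replaces the removed getchildren()
        if len(cells) != 3:  # added guard: skip rows of an unexpected shape
            continue
        amt, unit, item = cells
        # the header row uses <th> cells, so checking for <td> skips it
        if item.tag == 'td' and item.text not in pantry:
            yield item.text, amt.text, unit.text

dinner_recipe = '''
<html><head><title>Recipe</title></head><body>
<table>
<tr><th>amt</th><th>unit</th><th>item</th></tr>
<tr><td>24</td><td>slices</td><td>baguette</td></tr>
<tr><td>2+</td><td>tbsp</td><td>olive oil</td></tr>
<tr><td>1</td><td>cup</td><td>tomatoes</td></tr>
<tr><td>1-2</td><td>tbsp</td><td>garlic</td></tr>
<tr><td>1/2</td><td>cup</td><td>Parmesan</td></tr>
<tr><td>1</td><td>jar</td><td>pesto</td></tr>
</table>
</body></html>'''

pantry = {'olive oil', 'pesto'}

for item, amt, unit in missing_rows(dinner_recipe, pantry):
    print("%s: %s %s" % (item, amt, unit))
# prints "baguette: 24 slices", "tomatoes: 1 cup", "garlic: 1-2 tbsp", "Parmesan: 1/2 cup"
```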

But if that's too complicated, I won't insist on it. :)
Your example is good enough.

-- 
Regards,
Rob


