Extracting xml from html

kyosohma at gmail.com
Wed Sep 19 10:08:12 EDT 2007


On Sep 19, 3:13 am, Stefan Behnel <stefan.behnel-n05... at web.de> wrote:
> kyoso... at gmail.com wrote:
> > Does this make sense? It works pretty well, but I don't really
> > understand everything that I'm doing.
>
> > def Parser(filename):
>
> It's uncommon to give a function a capitalised name, unless it's a factory
> function (which this isn't).
>

Yeah. I was going to use a class (and I still might), so that's how it
got capitalized.


> >     events = ("recordnum", "primaryowner", "customeraddress")
>
> You're not using this anywhere below, so I assume this is left-over code.
>

I realized I didn't need that line soon after I posted. Sorry about
that!


> You could do this more easily in a couple of ways. One is to use XPath:
>
>    print [el.text for el in tree.xpath("//primaryowner|//customeraddress")]
>

This works quite well. Wish I'd thought of it.
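
For anyone without lxml handy, here is a rough standard-library sketch of the same idea, using xml.etree.ElementTree instead of lxml. ElementTree's find methods don't support the XPath "|" union, so this version filters on tag names while iterating. The sample document and the extract_fields name are made up for illustration; my real data isn't shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real document layout is not shown in this thread.
SAMPLE = """
<Results>
  <Row>
    <recordnum>1</recordnum>
    <primaryowner>Jane Doe</primaryowner>
    <customeraddress>123 Main St</customeraddress>
  </Row>
</Results>
"""

# Tag names taken from the XPath expression quoted above.
WANTED = {"primaryowner", "customeraddress"}


def extract_fields(xml_text, wanted=WANTED):
    """Return the text of every element whose tag is in `wanted`."""
    root = ET.fromstring(xml_text)
    # iter() walks all elements in document order, much like "//" in XPath.
    return [el.text for el in root.iter() if el.tag in wanted]


print(extract_fields(SAMPLE))
```

With lxml the single tree.xpath(...) call quoted above does the same job, so this is only worth it when lxml isn't an option.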

> Note that this works directly on the tree that you retrieved right in the
> third line of your code.
>
> Another (and likely simpler) solution is to first find the "Row" element and
> then start from that:
>
>    row = tree.find("//Row")
>    print row.findtext("primaryowner")
>    print row.findtext("customeraddress")
>

I tried it your way and Laurent's way, and both give me this error:

AttributeError: 'NoneType' object has no attribute 'findtext'
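
The error itself just means find() matched nothing and returned None, so the findtext() call was made on None. A minimal sketch of that behaviour, using the standard library's xml.etree.ElementTree rather than lxml. The lowercase sample below is an assumption: one possible cause is a tag-name mismatch (for instance, an HTML parser lowercasing element names), but I haven't confirmed that's what is happening with my data. The row_fields helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with lowercase tags, to illustrate a case mismatch.
SAMPLE = (
    "<results><row>"
    "<primaryowner>Jane Doe</primaryowner>"
    "<customeraddress>123 Main St</customeraddress>"
    "</row></results>"
)


def row_fields(xml_text, row_tag="row"):
    """Return (owner, address) from the first matching row, or None."""
    root = ET.fromstring(xml_text)
    # ".//" searches all descendants; tag names are case-sensitive.
    row = root.find(".//" + row_tag)
    if row is None:
        # Nothing matched: bail out instead of calling findtext() on None.
        return None
    return row.findtext("primaryowner"), row.findtext("customeraddress")


print(row_fields(SAMPLE, "Row"))  # no element named "Row" in this sample
print(row_fields(SAMPLE, "row"))
```

Checking for None before touching the result at least turns the AttributeError into something that can be handled or logged.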


> See the lxml tutorial on this, as well as the documentation on XPath support
> and tree iteration:
>
> http://codespeak.net/lxml/xpathxslt.html#xpath
> http://codespeak.net/lxml/api.html#iteration
>
> Hope this helps,
> Stefan

I'm not sure what George's deal is. I'm not a beginner with Python,
just with lxml. I don't have all of Python's hundreds of modules
memorized, and I have yet to meet anyone who does. Even if I had used
Beautiful Soup, my code would probably still suck. Besides, my boss
told me explicitly to avoid adding new dependencies to my programs
whenever possible.

Thanks for the help. I'll add the list comprehension to my code.

Mike



