"Soup Strainer" for ElementSoup?

erikcw erikwickstrom at gmail.com
Tue Mar 25 16:13:13 EDT 2008


On Mar 25, 12:17 am, John Nagle <na... at animats.com> wrote:
> erikcw wrote:
> > Hi all,
>
> > I was reading in the Beautiful Soup documentation that you should use
> > a "Soup Strainer" object to keep memory usage down.
>
> > Since I'm already using ElementTree elsewhere in the project, I
> > figured it would make sense to use ElementSoup to keep the API
> > consistent (and cElementTree should be faster, right?).
>
> > I can't seem to figure out how to pass ElementSoup a "soup strainer"
> > though.
>
> > Any ideas?
>
> > Also - do I need to use the extract() method with ElementSoup like I
> > do with Beautiful Soup to keep garbage collection working?
>
> > Thanks!
> > Erik
>
>     I really should get my version of BeautifulSoup merged back into
> the mainstream.  I have one that's been modified to use weak pointers
> for all "up" and "left" links, which makes the graph cycle free. So
> the memory is recovered by reference count update as soon as you
> let go of the head of the tree.  That helps with the garbage problem.
>
>     What are you parsing?  If you're parsing well-formed XML,
> BeautifulSoup is overkill.  If you're parsing real-world HTML,
> ElementTree is too brittle.
>
>                                         John Nagle
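John's modified BeautifulSoup is not shown in this thread, so the following is only a minimal stdlib illustration of the design he describes, not his code. It assumes a hypothetical `Node` class (with hypothetical `append` and `tree_is_freed_by_refcount` helpers). Parents hold their children strongly, while the "up" (parent) and "left" (previous sibling) links are `weakref.ref` objects. Because the graph has no reference cycles, CPython's reference counting frees the whole tree as soon as the last reference to the root goes away. The demo disables the cyclic garbage collector to show that it is not needed. This behaviour is specific to CPython; implementations without reference counting, such as PyPy, free objects later.

```python
import gc
import weakref


class Node:
    """Hypothetical tree node: strong links point down, weak links point up/left.

    Children are held strongly by their parent; the "up" (parent) and "left"
    (previous sibling) links are weakref.ref objects, so the object graph
    contains no reference cycles.
    """

    def __init__(self, name):
        self.name = name
        self.children = []      # strong references: parent -> children
        self._parent = None     # weakref.ref to the parent, or None
        self._previous = None   # weakref.ref to the left sibling, or None

    @property
    def parent(self):
        return self._parent() if self._parent is not None else None

    @property
    def previous_sibling(self):
        return self._previous() if self._previous is not None else None

    def append(self, child):
        """Attach `child` as the last child, linking it back weakly."""
        child._parent = weakref.ref(self)
        if self.children:
            child._previous = weakref.ref(self.children[-1])
        self.children.append(child)
        return child


def tree_is_freed_by_refcount():
    """Build a small tree, drop it, and report whether it died without gc."""
    was_enabled = gc.isenabled()
    gc.disable()                # rule out the cyclic garbage collector
    try:
        root = Node("html")
        body = root.append(Node("body"))
        first = body.append(Node("p"))
        second = body.append(Node("p"))
        assert second.parent is body and second.previous_sibling is first
        probe = weakref.ref(second)
        del root, body, first, second
        return probe() is None  # True on CPython: freed by refcounting alone
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(tree_is_freed_by_refcount())
```

If the parent and sibling links were ordinary strong references, every parent/child pair would form a cycle. Such a tree would only be reclaimed when the cycle collector runs, which is the problem the original question about `extract()` is working around.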

I'm parsing real-world HTML with BeautifulSoup and XML with
cElementTree.
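The split makes sense because of how John describes ElementTree as brittle on real-world HTML. Any markup that is not well-formed XML is rejected outright, while well-formed XML parses cleanly. The snippet below is a small illustration using today's standard library. In Python 3, `xml.etree.ElementTree` uses its C accelerator automatically; the separate `cElementTree` module was deprecated in 3.3 and removed in 3.9. The sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET

# Tag soup that browsers accept: <br> is never closed, so an XML parser
# reaches </p> while it is still expecting </br> and raises ParseError.
real_world_html = "<p>First line<br>Second line</p>"

try:
    ET.fromstring(real_world_html)
except ET.ParseError as err:
    print("ElementTree rejected it:", err)

# Well-formed XML, by contrast, parses without complaint.
item = ET.fromstring("<item><name>widget</name></item>")
print(item.find("name").text)
```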

I'm guessing that the only benefit to using ElementSoup is that I'll
have one less API to keep track of, right?  Or are there memory
benefits in converting the Soup object to an ElementTree?

Any idea about using a Soup Strainer with ElementSoup?
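This is not ElementSoup's API. In BeautifulSoup itself, the strainer is passed to the constructor (`parseOnlyThese=SoupStrainer(...)` in BeautifulSoup 3, `parse_only=` in bs4). The thread does not show how, or whether, ElementSoup forwards that argument. The same "only keep what matches" idea can be sketched with the standard library, though, and it produces ElementTree `Element` objects directly. The sketch assumes a hypothetical `StrainedParser` class (with a hypothetical `result()` method). It subclasses `html.parser.HTMLParser`, which tolerates tag soup, and feeds only the wanted tags into an `xml.etree.ElementTree.TreeBuilder`. Markup outside those tags is discarded as it streams past, so no tree is ever built for it. For simplicity, any nested markup inside a kept element is flattened into that element's text.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class StrainedParser(HTMLParser):
    """Hypothetical strainer: build an ElementTree of only the `wanted` tags."""

    def __init__(self, wanted):
        super().__init__(convert_charrefs=True)
        self.wanted = set(wanted)
        self.builder = ET.TreeBuilder()
        self.builder.start("root", {})   # synthetic root holding kept elements
        self._open = None                # wanted tag currently being collected

    def handle_starttag(self, tag, attrs):
        if self._open is None and tag in self.wanted:
            # HTML attributes may have no value (e.g. <input disabled>).
            attrib = {k: (v if v is not None else "") for k, v in attrs}
            self.builder.start(tag, attrib)
            self._open = tag

    def handle_endtag(self, tag):
        if tag == self._open:
            self.builder.end(tag)
            self._open = None

    def handle_data(self, data):
        # Text is kept only while inside a wanted element.
        if self._open is not None:
            self.builder.data(data)

    def result(self):
        """Finish parsing and return the root Element of the strained tree."""
        self.close()                     # flush any buffered text
        if self._open is not None:       # tolerate a wanted tag left unclosed
            self.builder.end(self._open)
            self._open = None
        self.builder.end("root")
        return self.builder.close()


html = """<html><body><p>Intro <b>text</b>
<a href="/one">One</a> and <a href="/two">Two <i>links</i></a>
<p>unclosed paragraph &amp; more</body></html>"""

parser = StrainedParser({"a"})
parser.feed(html)
links = parser.result()
print([(a.get("href"), a.text) for a in links])
```

Because the result is a plain `Element`, the rest of the code can use the same `find`/`iter` calls as for the XML side. Unlike BeautifulSoup, however, this sketch does not attempt to repair broken nesting.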

Thanks!



More information about the Python-list mailing list