"Soup Strainer" for ElementSoup?

Fredrik Lundh fredrik at pythonware.com
Sun Mar 30 12:45:51 EDT 2008


erikcw wrote:

> I'm parsing real-world HTML with BeautifulSoup and XML with
> cElementTree.
> 
> I'm guessing that the only benefit to using ElementSoup is that I'll
> have one less API to keep track of, right?  Or are there memory
> benefits in converting the Soup object to an ElementTree?

It's purely an API thing: ElementSoup loads the entire HTML file with 
BeautifulSoup, and then uses the resulting BS data structure to build an 
ET tree.

The ET tree doesn't contain cycles, though, so you can safely pull out 
the strings you need from ET and throw away the rest of the tree.
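As a rough illustration of "pull out the strings and drop the tree", here is a minimal sketch using the standard library's xml.etree.ElementTree rather than ElementSoup itself. The sample markup and the helper name extract_link_texts are made up for the example; with 2008-era ElementTree you would use getiterator() where this uses iter().

```python
import xml.etree.ElementTree as ET

def extract_link_texts(root):
    # Copy out plain strings; nothing in the result refers back to the tree.
    return [a.text or "" for a in root.iter("a")]

# Hypothetical sample document, standing in for a tree built by ElementSoup.
root = ET.fromstring(
    "<html><body><p>See <a href='x'>first</a> and "
    "<a href='y'>second</a>.</p></body></html>"
)

texts = extract_link_texts(root)

# The ET tree has no parent pointers or other cycles, so dropping the last
# reference is enough for it to be reclaimed by reference counting.
del root

print(texts)
```

The point is only that the extracted strings are independent of the tree, so holding on to them does not keep the parsed document alive.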

> Any idea about using a Soup Strainer with ElementSoup?

The strainer is used when parsing the file, to control what goes into 
the BS tree; to add straining support to ElementSoup, you could e.g. add 
a parseOnlyThese option to its parse function and pass that through to 
the BeautifulSoup constructor.
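To make the "filter while parsing" idea concrete, here is a small self-contained sketch. It is not ElementSoup's code and does not use BeautifulSoup: it swaps in the standard library's html.parser as the tokenizer and feeds only the wanted elements to an ElementTree TreeBuilder. The class name StrainingBuilder and the parse_only parameter are hypothetical, and the sketch assumes well-nested markup without unclosed void tags such as a bare <br>, so it is far less forgiving than BeautifulSoup.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class StrainingBuilder(HTMLParser):
    """Build an ET tree that keeps only the wanted tags and their contents."""

    def __init__(self, parse_only):
        super().__init__()
        self.parse_only = set(parse_only)  # hypothetical strainer: tag names
        self.builder = ET.TreeBuilder()
        self.builder.start("html", {})     # synthetic root to hold the matches
        self.depth = 0                     # > 0 while inside a wanted element

    def handle_starttag(self, tag, attrs):
        if self.depth or tag in self.parse_only:
            # Valueless attributes arrive as None; store them as "".
            self.builder.start(tag, {k: v or "" for k, v in attrs})
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.builder.end(tag)
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.builder.data(data)

    def close(self):
        super().close()
        self.builder.end("html")
        return self.builder.close()

def parse(text, parse_only):
    parser = StrainingBuilder(parse_only)
    parser.feed(text)
    return parser.close()

root = parse(
    "<html><body><p>skip me</p><a href='/x'>keep <b>me</b></a></body></html>",
    parse_only=["a"],
)
links = [(a.get("href"), "".join(a.itertext())) for a in root.iter("a")]
print(links)
```

Everything outside the wanted elements is discarded as it is tokenized, so it never reaches the tree. In ElementSoup the equivalent change would presumably be much smaller, since the filtering is already implemented on the BeautifulSoup side and only the option needs forwarding.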

</F>



