Parsing HTML with BeautifulSoup

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Mon Dec 14 17:39:35 EST 2009


On Mon, 14 Dec 2009 03:58:34 -0300, Johann Spies <jspies at sun.ac.za>
wrote:
> On Sun, Dec 13, 2009 at 07:58:55AM -0300, Gabriel Genellina wrote:

>> cell.findAll(text=True) returns a list of all text nodes inside a
>> <td> cell; I preprocess all \n and &nbsp; in each text node, and
>> join them all. lines is a list of lists (each entry one cell), as
>> expected by the csv module used to write the output file.
>
> I have struggled a bit to find the documentation for (text=True).
> Most of the documentation I saw for BeautifulSoup contained mostly
> examples without explaining what the options do.  Thanks for your
> explanation.

See  
http://www.crummy.com/software/BeautifulSoup/documentation.html#arg-text

> As far as I can see there was no documentation installed with the
> debian package.

BeautifulSoup is very small - a single .py file, no dependencies. The  
whole documentation is contained in the above linked page.
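
For reference, the text=True approach quoted above can be sketched roughly
as follows. This is only an illustration under some assumptions: it uses the
later bs4 package rather than the single-file module discussed here, where
find_all(string=True) is the counterpart of findAll(text=True); the sample
table and the helper names cell_text and table_rows are made up for the
example and do not come from the original script.

```python
import csv
import io

# Assumption: the modern bs4 package, not the 2009 single-file module.
from bs4 import BeautifulSoup

# Hypothetical sample table, not taken from the original thread.
HTML = """
<table>
  <tr><td>Name</td><td>Home&nbsp;town</td></tr>
  <tr><td>Johann\nSpies</td><td><b>Stellen</b>bosch</td></tr>
</table>
"""


def cell_text(cell):
    """Return the cleaned, joined text of one <td> cell."""
    # find_all(string=True) returns every text node nested inside the cell.
    pieces = cell.find_all(string=True)
    # Replace newlines and non-breaking spaces in each node, then join.
    cleaned = [p.replace("\n", " ").replace("\xa0", " ") for p in pieces]
    return "".join(cleaned).strip()


def table_rows(html):
    """Build a list of lists: one inner list per row, one entry per cell."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        [cell_text(td) for td in tr.find_all("td")]
        for tr in soup.find_all("tr")
    ]


lines = table_rows(HTML)

# The csv module accepts the list of lists directly; an in-memory buffer
# stands in for the output file here.
buffer = io.StringIO()
csv.writer(buffer).writerows(lines)
```

Joining the text nodes with an empty string keeps text split across inline
tags (such as <b>) intact; whether to join with a space instead depends on
the markup being parsed.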

-- 
Gabriel Genellina



