How to find <tag> to </tag> HTML strings and 'save' them?

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Sun Mar 25 19:05:15 EDT 2007


On Sun, 25 Mar 2007 19:44:17 -0300, <mark at agtechnical.co.uk> wrote:

> from BeautifulSoup import BeautifulSoup
> import re
>
> page = open("soup_test/tomatoandcream.html", 'r')
> soup = BeautifulSoup(page)
>
> myTagSearch = str(soup.findAll('h2'))
>
> myFile = open('Soup_Results.html', 'w')
> myFile.write(myTagSearch)
> myFile.close()
>
> del myTagSearch
> ...............................
>
> Firstly, I'm getting the following character: "[" at the start, "]" at
> the end of the code. Along with "," in between each tag line listing.
> This seems like normal behaviour but I can't find the way to strip
> them out.

findAll() returns a list. You convert that list to its string
representation with str(...), and that is simply how a list looks: []
around it, and commas separating the elements. If you don't like that,
don't use str(some_list).
Do you want one item per line? Use "\n".join(str(tag) for tag in myTagSearch)
(remember to remove the str() around findAll; join() only accepts
strings, so each tag is converted on its own).
Do you want comma-separated items? Use ",".join(str(tag) for tag in myTagSearch)
Read about lists at http://docs.python.org/lib/typesseq.html and about
strings at http://docs.python.org/lib/string-methods.html
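
The difference between the three forms can be sketched as follows. This is
only an illustration: FakeTag is a hypothetical stand-in class, not part of
BeautifulSoup, used so the snippet runs without the library or the HTML
file. With the real library, the list returned by soup.findAll('h2') would
take the place of `tags`.

```python
class FakeTag(object):
    """Hypothetical stand-in for a BeautifulSoup tag (not a library class)."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup

    # A list displays its elements with repr(), so mirror str() here.
    __repr__ = __str__


# Stand-in for the list that soup.findAll('h2') would return.
tags = [FakeTag("<h2>Tomato</h2>"), FakeTag("<h2>Cream</h2>")]

# str() on the whole list: brackets and ", " separators come from the list.
as_list_text = str(tags)

# join() on the individual tags: only the separator you chose appears.
one_per_line = "\n".join(str(t) for t in tags)
comma_separated = ",".join(str(t) for t in tags)

print(as_list_text)
print(one_per_line)
print(comma_separated)
```

Writing one_per_line to the output file instead of str(list) would leave
out the "[", "]" and "," characters.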

For the remaining questions, I strongly suggest reading the Python
Tutorial (or a book such as Dive into Python). You should have at least
a basic grasp of the language before trying to use other tools like
BeautifulSoup; learning both at once is too much for a single step.

-- 
Gabriel Genellina



