How to find <tag> to </tag> HTML strings and 'save' them?
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Sun Mar 25 19:05:15 EDT 2007
On Sun, 25 Mar 2007 19:44:17 -0300, <mark at agtechnical.co.uk> wrote:
> from BeautifulSoup import BeautifulSoup
> import re
>
> page = open("soup_test/tomatoandcream.html", 'r')
> soup = BeautifulSoup(page)
>
> myTagSearch = str(soup.findAll('h2'))
>
> myFile = open('Soup_Results.html', 'w')
> myFile.write(myTagSearch)
> myFile.close()
>
> del myTagSearch
> ...............................
>
> Firstly, I'm getting the following character: "[" at the start, "]" at
> the end of the code. Along with "," in between each tag line listing.
> This seems like normal behaviour but I can't find the way to strip
> them out.
findAll() returns a list. You are converting that list to its string
representation with str(...), and that is what lists look like: with []
around them and commas separating the elements. If you don't like that,
don't use str(some_list).
Do you want one item per line? Use "\n".join(str(tag) for tag in myTagSearch)
(remember to remove the str() around findAll; join() only accepts strings,
so each item has to be converted on its own).
Do you want comma-separated items? Use ",".join(str(tag) for tag in myTagSearch)
Read about lists here http://docs.python.org/lib/typesseq.html and strings
here http://docs.python.org/lib/string-methods.html
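
To make the difference concrete, here is a minimal sketch. It does not use
BeautifulSoup at all: FakeTag is a hypothetical stand-in I made up so the
example runs on its own, and the two <h2> strings are invented sample data.
It only assumes that each item in the list turns into its markup when passed
to str(), which is how the tags returned by findAll() behave.

```python
class FakeTag(object):
    """Hypothetical stand-in for a tag object; not part of any library."""

    def __init__(self, markup):
        self.markup = markup

    def __repr__(self):
        # A list's str() uses the repr() of each element.
        return self.markup

    __str__ = __repr__


# Invented sample data, standing in for the result of soup.findAll('h2').
tags = [FakeTag("<h2>Tomato</h2>"), FakeTag("<h2>Cream</h2>")]

# str() on the whole list: brackets around, comma + space between items.
as_list_repr = str(tags)

# join() needs strings, so convert each item individually first.
one_per_line = "\n".join(str(tag) for tag in tags)
comma_separated = ",".join(str(tag) for tag in tags)

print(as_list_repr)
print(one_per_line)
print(comma_separated)
```

Writing one_per_line to the file instead of str(the_list) gives output
without the brackets and commas.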
For the remaining questions, I strongly suggest reading the Python
Tutorial (or a book such as Dive into Python). You should have at least a
basic grasp of the language before trying to use other tools like
BeautifulSoup; learning both at once is too much for a single step.
--
Gabriel Genellina