Mining strings from an HTML document.

Wed Jan 25 04:23:44 EST 2006

Hi,

I am new to Python and have been doing most of my work with PHP until
now. I find Python to be *much* nicer for the development of local apps
(running on my machine), but I am very new to the Python way of thinking
and I don't really know where to start other than just by doing it... so
far I've only worked through the tutorial :)

My problem is as follows:
I have an HTML file with a list of records from a database. The list of
records is delimited by a pair of comments, and the format is as follows:

<!-- comment first-->
<a href="slfdhksah kkshdfksahdf">Record 1</a>
<b>Field1</b>Data data data<br><b>Field2</b>Data data
data<br><b>Field3</b>Data data data<br><b>Field4</b>Data data data<br>

<a href="slfdhksah kkshdfksahdf">Record 2</a>
<b>Field1</b>Data data data<br><b>Field2</b>Data data
data<br><b>Field3</b>Data data data<br><b>Field4</b>Data data data<br>

<a href="slfdhksah kkshdfksahdf">Record 3</a>
<b>Field1</b>Data data data<br><b>Field2</b>Data data
data<br><b>Field3</b>Data data data<br><b>Field4</b>Data data data<br>
<!-- comment last-->

The data fields could be up to 2 or 3 paragraphs each. The number and
names of fields may differ between records (some info appears in one
record but not in another, i.e. null values do not show up in the HTML).

What are the string functions I would use, and how would I use them? I
saw something about HTML parsing in Python, but that might be overkill.
Baby steps.
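
To make the question concrete, here is a very rough sketch of what I
imagine the string-function route might look like. The name
parse_records is made up for illustration, and the code assumes the
layout shown above holds exactly: the two comments appear verbatim,
every record starts with an <a ...> tag whose attributes contain no
">", and every field ends with <br>. It uses str.find, slicing and
str.split, plus re.findall for the fields, and may well not be the
best way to do it.

```python
import re

# Assumed delimiters, copied from the sample layout above.
START_MARKER = "<!-- comment first-->"
END_MARKER = "<!-- comment last-->"


def parse_records(html):
    """Return a list of (title, fields) tuples; fields is a dict."""
    start = html.find(START_MARKER)
    end = html.find(END_MARKER)
    if start == -1 or end == -1:
        return []  # delimiters not found, nothing to parse

    # Keep only the text between the two comments.
    block = html[start + len(START_MARKER):end]

    records = []
    # Every record begins with an <a ...> tag, so split on its opening.
    # The first piece is whatever precedes the first record; skip it.
    for chunk in block.split("<a ")[1:]:
        # The title sits between the end of the <a ...> tag and </a>.
        title_start = chunk.find(">") + 1
        title_end = chunk.find("</a>")
        title = chunk[title_start:title_end].strip()
        body = chunk[title_end + len("</a>"):]

        fields = {}
        # Each field looks like <b>Name</b>value<br>; DOTALL lets a
        # value span several lines.
        for name, value in re.findall(r"<b>(.*?)</b>(.*?)<br>", body, re.DOTALL):
            # Collapse line breaks and repeated whitespace in the value.
            fields[name.strip()] = " ".join(value.split())
        records.append((title, fields))
    return records
```

Calling parse_records on the contents of the file would give one
(title, fields) pair per record, with only the fields that actually
appear in that record, so differing field names between records would
not matter.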

Thanks