Parsing an HTML file

Paul McGuire ptmcg at users.sourceforge.net
Wed Dec 17 15:53:56 EST 2003


"CodeGuru73" <eddiembabaali at yahoo.com> wrote in message
news:5e290f27.0312170808.4590723e at posting.google.com...
> I am trying to find the best way to parse a bunch of html files. They
> are all similar in structure and I need to get them into a database.
> Their relevant structure is:
> <html><head></head>
> <body>
> <h1>title</h1>
> <address> authors </address>
> <div> Main html content</div>
>
> I basically need to get the values between <h1></h1>,
> <address></address> and <div></div>
>
> I am able to read the files into an array.

Check out this simple XML parsing code:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/157358
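
If the files turn out not to be well-formed enough for an XML parser, a
rough alternative is the standard library's html.parser module. The sketch
below is not the code from the recipe above. The class name
SectionExtractor and the sample document are made up for illustration, and
it assumes each file has one <h1>, one <address>, and one top-level <div>.
It keeps only the text inside each element, so any markup nested in the
<div> is dropped.

```python
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Collect the text inside the first <h1>, <address>, and <div>."""

    WANTED = ("h1", "address", "div")

    def __init__(self):
        super().__init__()
        self.sections = {}     # tag name -> captured text
        self._current = None   # tag currently being captured, if any
        self._depth = 0        # nesting depth of same-named tags
        self._parts = []       # text fragments for the current tag

    def handle_starttag(self, tag, attrs):
        if self._current is None:
            # Start capturing only the first occurrence of a wanted tag.
            if tag in self.WANTED and tag not in self.sections:
                self._current = tag
                self._depth = 1
                self._parts = []
        elif tag == self._current:
            # Same tag nested inside itself (e.g. a <div> within the <div>).
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == self._current:
            self._depth -= 1
            if self._depth == 0:
                self.sections[tag] = "".join(self._parts).strip()
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._parts.append(data)


# Hypothetical sample, modelled on the structure quoted in the question.
sample = """<html><head></head>
<body>
<h1>title</h1>
<address> authors </address>
<div> Main html content</div>
</body></html>"""

parser = SectionExtractor()
parser.feed(sample)
parser.close()
print(parser.sections)
```

For each file, parser.sections ends up as a dictionary keyed by tag name,
which should be straightforward to hand to whatever database layer you are
using. Create a fresh SectionExtractor per file so results do not carry
over.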

-- Paul





