getfirst and re

Tim Chase python.list at tim.thechases.com
Wed Jan 6 12:59:52 EST 2010


Victor Subervi wrote:
> On Wed, Jan 6, 2010 at 1:27 PM, Tim Chase <python.list at tim.thechases.com> wrote:
> 
>> But if you're using it on HTML form text, regexps are usually the wrong
>> tool, and you should be using an HTML parser (such as BeautifulSoup) that
>> knows how to handle odd text and escapings better and more robustly than
>> regexps will.
> 
> I have an automatically generated HTML form from which I need to extract
> data to the script which this form calls (to which the information is sent).
> I believe BeautifulSoup is geared to scraping pages that exist permanently
> on the web. By the time BeautifulSoup was called, this page would be gone.

BeautifulSoup takes whatever string data you feed it and builds a 
structure that can be neatly navigated.  That string can come from 
a web page, a file on disk, a serial port, a random-character 
generator, or HTML that's built up in memory and never sees a 
network or a disk.  It's worth reading its documentation[1] and 
trying its examples to get familiar with it.
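
As a rough sketch of the "HTML that only lives in memory" case: the 
snippet below is not BeautifulSoup itself.  It swaps in the standard 
library's html.parser so it runs with nothing extra installed, and 
the form markup, the field names, and the InputCollector class are 
all made up for illustration.  The point is only that a parser is 
handed a plain string and never needs the page to exist anywhere.

```python
from html.parser import HTMLParser

# Hypothetical form markup built up in memory; it never touches
# a network or a disk.
form_html = (
    '<form action="/cgi-bin/handler.py" method="post">'
    '<input type="text" name="first" value="Victor">'
    '<input type="hidden" name="token" value="a&amp;b">'
    '</form>'
)


class InputCollector(HTMLParser):
    """Collect name/value pairs from <input> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples, with
        # character references in the values already unescaped.
        if tag == "input":
            attr = dict(attrs)
            if "name" in attr:
                self.fields[attr["name"]] = attr.get("value", "")


collector = InputCollector()
collector.feed(form_html)   # the parser only ever sees a string
collector.close()
print(collector.fields)     # {'first': 'Victor', 'token': 'a&b'}
```

Note that the parser unescapes "&amp;" for you, which is exactly the 
sort of detail a hand-rolled regexp tends to miss.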

-tkc


[1] http://www.crummy.com/software/BeautifulSoup/documentation.html





