Starting with Python, need some pointers with manipulating strings

Kent Johnson kent3737 at yahoo.com
Thu Jan 27 17:21:07 EST 2005


Benji99 wrote:
> I've managed to load the html source I want into an object 
> called htmlSource using:
> 
> 
> >>> import urllib
> >>> sock = urllib.urlopen("URL Link")
> >>> htmlSource = sock.read()
> >>> sock.close()
> 
> 
> I'm assuming that htmlSource is a string with \n at the end of 
> each line.
> NOTE: I've become very accustomed to the TStringList class in 
> Delphi so forgive me if I'm trying to work in that way with 
> Python...
> 
> Basically, I want to search through the whole string 
> (htmlSource) for a specific keyword. When it's found, I want to 
> know which line it's on so that I can retrieve that line and 
> then I should be able to parse/extract what I need using Regular 
> Expressions (which I'm getting quite comfortable with). So how 
> can this be accomplished?

The Pythonic way to do this is to iterate through the lines of htmlSource and process them one at a 
time.
lines = htmlSource.split('\n')  # Split on newline, making a list of lines
for line in lines:
    # Do something with line - check to see if it has the text of interest
    if 'keyword' in line:  # 'keyword' is a placeholder for your search text
        print(line)
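
If you also need the line number, something along these lines might work. This is only a sketch, assuming Python 3; the helper name find_lines, the sample string standing in for htmlSource, and the href pattern are all made up for illustration.

```python
import re

def find_lines(text, keyword):
    """Return (line_number, line) pairs for every line containing keyword."""
    matches = []
    # splitlines() copes with \n, \r\n and \r line endings
    for number, line in enumerate(text.splitlines(), start=1):
        if keyword in line:
            matches.append((number, line))
    return matches

# Hypothetical sample standing in for htmlSource
sample = (
    '<html>\n'
    '<body>\n'
    '<a href="http://example.com/a">first</a>\n'
    '<p>nothing here</p>\n'
    '<a href="http://example.com/b">second</a>\n'
    '</body>\n'
    '</html>'
)

hits = find_lines(sample, 'href')
for number, line in hits:
    # Pull the URL out of the matching line with a regular expression
    found = re.search(r'href="([^"]*)"', line)
    if found:
        print(number, found.group(1))
```

Line numbers here start at 1; drop start=1 if you would rather use them as list indexes.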

You might want to look at Beautiful Soup. If you can find the links of interest by the tags around 
them it might do what you want:
http://www.crummy.com/software/BeautifulSoup/
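
To give a rough idea of what finding links by their tags looks like, here is a sketch that uses the standard library's html.parser module as a stand-in. It is not Beautiful Soup's API, and it assumes Python 3; the class name LinkCollector and the sample markup are invented for the example.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)

# Hypothetical sample standing in for htmlSource
sample = (
    '<p><a href="http://example.com/a">first</a> and '
    '<a href="http://example.com/b">second</a></p>'
)

collector = LinkCollector()
collector.feed(sample)
collector.close()
print(collector.links)
```

Unlike the line-by-line loop, this does not care whether a tag is split across several lines.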

Kent


