regular expression, help

MRAB google at mrabarnett.plus.com
Tue Jan 27 12:39:30 EST 2009


Vincent Davis wrote:
> I think there are two parts to this question and I am sure lots I am 
> missing. I am hoping an example will help me
> I have an HTML doc that I am trying to use regular expressions to get a 
> value out of.
> Here is an example of the line:
> <td colspan='2'>Parcel ID: 39-034-15-009 </td>
> I want to get the number "39-034-15-009" after "Parcel ID:" The number 
> will be different each time but always the same format.
> I think I can match "Parcel ID:" but not sure how to get the number 
> after. "Parcel ID:" only occurs once in the document.
> 
> Is this how I need to start?
> pid = re.compile('Parcel ID: ')
> 
> Basically I am completely lost and am not finding examples I find helpful.
> 
> I am getting the html using myurl=urllib.urlopen(). 
> Can I use RE like this
> thenum=pid.match(myurl) 
> 
> 
> I think the two key things I need to know are
> 1, how do I get the text after a match?
> 2, when I use myurl=urllib.urlopen(http://.......). can I use the myurl 
> as the string in a RE, thenum=pid.match(myurl)
> 
Something like:

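import re
import urllib

pid = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')
myurl = urllib.urlopen(url)  # url holds the page address
text = myurl.read()          # read the whole page into a string
myurl.close()
match = pid.search(text)     # search() scans the whole text; match() only tries the start
thenum = match.group(1) if match else None  # group 1 is the part inside the parentheses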
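
To make both answers concrete without touching the network, here is a minimal Python 3 sketch. (In Python 3, urllib.urlopen became urllib.request.urlopen, and its read() returns bytes rather than a string.) It shows why match() finds nothing on the example line while search() succeeds, and how group(1) returns the text captured after the label. The name find_parcel_id is hypothetical, io.BytesIO stands in for the HTTP response, and decoding as UTF-8 is an assumption; a real page may declare a different charset.

```python
import io
import re

# Group 1 captures the digits-and-hyphens run that follows the label.
PID_RE = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')

def find_parcel_id(html):
    """Hypothetical helper: return the parcel ID in html, or None if absent."""
    # search() scans the whole string; match() only tries position 0.
    m = PID_RE.search(html)
    return m.group(1) if m else None

line = "<td colspan='2'>Parcel ID: 39-034-15-009 </td>"
print(PID_RE.match(line))    # None: the line starts with "<td", not "Parcel ID:"
print(find_parcel_id(line))  # 39-034-15-009

# A urlopen() response is file-like, not a string, so read it first.
# io.BytesIO stands in for the response here; UTF-8 is an assumed encoding.
response = io.BytesIO(line.encode("utf-8"))
text = response.read().decode("utf-8")
print(find_parcel_id(text))  # 39-034-15-009
```

Returning None instead of calling group() directly avoids an AttributeError when the label is missing from the page.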

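That said, an HTML parser such as BeautifulSoup is the preferred solution, since it copes better than a regex with changes in the surrounding markup.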
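
BeautifulSoup is a third-party package. As a stdlib-only sketch of the same parser-based idea (this is not BeautifulSoup's API), Python 3's html.parser.HTMLParser strips the tags and passes only the text between them to handle_data, so the regex no longer has to account for the markup. The ParcelIdParser class is hypothetical, and the sketch assumes the label and number sit in the same text node, as in the example line.

```python
import re
from html.parser import HTMLParser

class ParcelIdParser(HTMLParser):
    """Hypothetical parser: record the first parcel ID found in the page text."""

    def __init__(self):
        super().__init__()
        self.parcel_id = None

    def handle_data(self, data):
        # Called with each run of text between tags; the markup is already gone.
        # Assumes the label and the number appear in the same text node.
        if self.parcel_id is None:
            m = re.search(r'Parcel ID:\s*(\d+(?:-\d+)*)', data)
            if m:
                self.parcel_id = m.group(1)

parser = ParcelIdParser()
parser.feed("<table><tr><td colspan='2'>Parcel ID: 39-034-15-009 </td></tr></table>")
parser.close()
print(parser.parcel_id)  # 39-034-15-009
```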



More information about the Python-list mailing list