Regular Expression problem

Justin Azoff justin.azoff at gmail.com
Thu Jul 13 20:43:06 EDT 2006


John Blogger wrote:
> I want a particular attribute value from one of my HTML files.
>
> i.e., I want only the value after 'href=' in this tag:
>
> '<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
>
> Here it would be 'mystylesheet.css'. I used the following regex to get
> this value (I don't know if it is good).

No matter how good it is, you should still use something that
understands HTML:

>>> from BeautifulSoup import BeautifulSoup
>>> html='<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
>>> page=BeautifulSoup(html)
>>> page.link.get('href')
'mystylesheet.css'
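
If installing a third-party package is not an option, roughly the same
idea can be sketched with the standard library's html.parser module
(Python 3). This is only an illustration, not part of the session above:
the names LinkHrefParser and link_hrefs are made up for this example,
and it collects the href of every <link> tag rather than just the first.

```python
from html.parser import HTMLParser


class LinkHrefParser(HTMLParser):
    """Hypothetical helper: collect the href of every <link> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag == "link":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def link_hrefs(html):
    """Return the href values of all <link> tags found in html."""
    parser = LinkHrefParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs


html = '<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
print(link_hrefs(html))  # ['mystylesheet.css']
```

Like the BeautifulSoup version, this does not care about attribute
order, quoting style, or upper/lower case in the tag, which is where a
hand-written regex tends to break.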

-- 
- Justin
