Regular Expression problem

Ant antroy at gmail.com
Fri Jul 14 02:04:56 EDT 2006


> So what should I do to get the exact value (here, the value after
> 'href=') in every case, even if the tags look like these?
>
> <link rel="stylesheet" href="mystylesheet.css" type="text/css">
> -OR-
> <link href="mystylesheet.css" rel="stylesheet" type="text/css">
> -OR-
> <link type="text/css" href="mystylesheet.css" rel="stylesheet">

The following should do it:

expr = r'<link .*?href="(.*?)"'
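
As a quick check, here is a minimal sketch that runs that pattern against the three orderings from the question. The names tags and hrefs are just for illustration, and it assumes each tag sits on a single line, since '.' does not match a newline by default.

```python
import re

# The pattern from the post: lazily skip ahead to the first href="..."
# that follows "<link ", and capture what is between the double quotes.
expr = r'<link .*?href="(.*?)"'

# The three attribute orderings from the question.
tags = [
    '<link rel="stylesheet" href="mystylesheet.css" type="text/css">',
    '<link href="mystylesheet.css" rel="stylesheet" type="text/css">',
    '<link type="text/css" href="mystylesheet.css" rel="stylesheet">',
]

hrefs = []
for tag in tags:
    m = re.search(expr, tag)
    # group(1) is the text captured between the double quotes.
    hrefs.append(m.group(1) if m else None)

print(hrefs)
```

All three tags should give 'mystylesheet.css', because the lazy .*? stops at the first href=" it can reach, wherever that attribute appears in the tag.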

Or, if single quotes might have been used:

expr = r'''<link .*?href=["'](.*?)['"]'''
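
Two caveats with that pattern: it accepts a value that opens with one kind of quote and closes with the other, and the .*? before href can run past the end of a <link> tag that has no href and pick up an href from a later tag on the same line. A possible tightening is sketched below. The names strict_expr and find_href are my own, not anything standard, and this is still only a regex sketch, not a real HTML parser.

```python
import re

# Pattern from the post: either quote character may open or close the value.
expr = r'''<link .*?href=["'](.*?)['"]'''

# Hypothetical stricter variant (an assumption, not from the post):
#   [^>]*?  keeps the search inside the <link ...> tag,
#   (["'])  captures the opening quote and \1 requires the same one to close.
strict_expr = r'''<link\s[^>]*?href=(["'])(.*?)\1'''

def find_href(html):
    """Return the href of the first <link> tag in html, or None."""
    m = re.search(strict_expr, html)
    # Group 1 is the quote character, so the value is in group 2.
    return m.group(2) if m else None

single = "<link href='mystylesheet.css' rel='stylesheet'>"
no_href = '<link rel="stylesheet"><a href="other.html">'

print(find_href(single))                   # value from a single-quoted href
print(find_href(no_href))                  # no href inside the <link> tag
print(re.search(expr, no_href).group(1))   # looser pattern crosses into <a>
```

With the stricter pattern the second input gives None, while the looser one returns the href of the <a> tag that follows.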

But as the others have said, Beautiful Soup is very good for things
like this.
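
To show what the parser-based approach looks like without needing a third-party install, here is a rough sketch using the standard library's html.parser module instead of Beautiful Soup (so this is a stand-in for the idea, not Beautiful Soup's API). It assumes Python 3, and the class name LinkHrefCollector is made up for the example.

```python
from html.parser import HTMLParser

class LinkHrefCollector(HTMLParser):
    """Collect the href attribute of every <link> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and strips the
        # quotes from values, so attribute order and quote style do not
        # matter here.
        if tag == "link":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)

html = """
<link rel="stylesheet" href="mystylesheet.css" type="text/css">
<link href='mystylesheet.css' rel="stylesheet" type="text/css">
<LINK type="text/css" HREF="mystylesheet.css" rel="stylesheet">
"""

collector = LinkHrefCollector()
collector.feed(html)
collector.close()
print(collector.hrefs)
```

The parser deals with attribute order, quote style and upper-case tag names, which are exactly the cases a hand-written regex tends to miss.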



