RE multiline

Diez B. Roggisch deets at nospam.web.de
Sun Nov 30 15:33:41 EST 2008


Guy Doune wrote:
> Hi,
> 
> I am trying to figure out the equivalent of:
> 
> (.*?)
> 
> for the same purpose, but across multiple lines.
> 
> I would like to capture the variable part of the elements I am searching for.
> 
> Example:
> 
> <table width="95%" cellpadding="0" cellspacing="0" border="0" 
> align="center">
> 
> is the beginning of the variable element that I want to capture...
> 
> </table>
> 
> is the end of the element, so I would like to capture what is between
> those two patterns.
> 
> 
> pattern1+r"(.*?)"+pattern2
> 
> was working OK for a single-line selection like:
> 
> <table width="95%" cellpadding="0" cellspacing="0" border="0" 
> align="center">"variable element of the search"</table>
> 
> I hope I have been clear.

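See the flags of the re module, especially re.DOTALL: by default "." does
not match a newline, and re.DOTALL makes it match any character, including
a newline.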
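A minimal sketch in modern Python 3 (the thread predates it), reusing the
question's pattern1 + r"(.*?)" + pattern2 construction. The sample HTML and
the use of re.escape to match the tags literally are assumptions for
illustration. The start tag is kept on one line because a literal pattern
would not match a tag whose attributes are wrapped across lines.

```python
import re

# Literal start and end markers from the question; re.escape makes any
# regex metacharacters in them match themselves.
pattern1 = re.escape('<table width="95%" cellpadding="0" cellspacing="0" border="0" align="center">')
pattern2 = re.escape("</table>")

# Hypothetical sample input where the variable element spans several lines.
html = """<table width="95%" cellpadding="0" cellspacing="0" border="0" align="center">
first line of the variable element
second line
</table>"""

# Without re.DOTALL, "." stops at newlines, so nothing matches.
plain = re.search(pattern1 + r"(.*?)" + pattern2, html)

# With re.DOTALL, "." also matches newlines and the group spans all lines.
multi = re.search(pattern1 + r"(.*?)" + pattern2, html, re.DOTALL)

print(plain)  # None
print(repr(multi.group(1)))
```

re.S is the short alias for re.DOTALL. Note that re.MULTILINE, despite its
name, is a different flag: it only changes how ^ and $ behave.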

However, you are just experiencing the fact that regular expressions
aren't the proper tool for the job of dealing with HTML/XML.

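What would you do if, for example, one table were nested inside another?
The non-greedy (.*?) would stop at the first </table>, which closes the
inner table, not the outer one.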
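To make the nested-table problem concrete, here is a small sketch with
made-up markup. The non-greedy group ends at the inner table's </table>,
so everything after it in the outer table is dropped. Switching to a
greedy (.*) does not really help either: with re.DOTALL it runs to the
last </table> in the input and would swallow any sibling tables.

```python
import re

# Hypothetical markup: an inner table nested inside an outer one.
html = """<table>
  <tr><td>
    <table><tr><td>inner</td></tr></table>
  </td></tr>
  <tr><td>after the inner table</td></tr>
</table>"""

match = re.search(r"<table>(.*?)</table>", html, re.DOTALL)
captured = match.group(1)

# The non-greedy group stops at the FIRST "</table>" (the inner table's
# close), so the outer table's second row is lost.
print("after the inner table" in captured)  # False
```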

Instead, use tools like BeautifulSoup or lxml, which provide
error-tolerant HTML parsers and expression/filter-based element
extraction. They are much better suited to your task.
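BeautifulSoup and lxml are third-party packages. To keep this sketch
dependency-free, it swaps in the standard library's
html.parser.HTMLParser (Python 3), a lower-level, event-based parser that
is neither of those tools and builds no tree. It simply tracks <table>
nesting depth, which is exactly what the regex cannot do. The class name
TableTextExtractor is hypothetical, and it collects each top-level table's
text rather than its raw markup.

```python
from html.parser import HTMLParser


class TableTextExtractor(HTMLParser):
    """Collect the text of each top-level <table>, nested tables included."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # current <table> nesting level
        self.tables = []  # one joined text string per top-level table
        self._buf = []    # text pieces of the table being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1
            # Only the outermost </table> finishes a result.
            if self.depth == 0:
                self.tables.append(" ".join(self._buf))
                self._buf = []

    def handle_data(self, data):
        # Keep non-whitespace text that appears inside any table.
        if self.depth > 0 and data.strip():
            self._buf.append(data.strip())


html = """<table width="95%" cellpadding="0" cellspacing="0" border="0" align="center">
  <tr><td>
    <table><tr><td>inner</td></tr></table>
  </td></tr>
  <tr><td>after the inner table</td></tr>
</table>"""

parser = TableTextExtractor()
parser.feed(html)
parser.close()
print(parser.tables)  # ['inner after the inner table']
```

Because the inner </table> only lowers the depth from 2 to 1, the outer
table is still read to its real end.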

Diez



More information about the Python-list mailing list