ask for a RE pattern to match TABLE in html

David C. Ullrich dullrich at sprynet.com
Thu Jun 26 14:26:55 EDT 2008


In article <mailman.877.1214489508.1044.python-list at python.org>,
 Cédric Lucantis <omer at no-log.org> wrote:

> On Thursday 26 June 2008 15:53:06 oyster, you wrote:
> > that is, a TABLE with no TABLE tag inside it, for example
> > <table >something without a table tag</table>
> > what is the RE pattern? thanks
> >
> > the following is not right
> > <table.*?>[^table]*?</table>
> 
> The construct [abc] does not match a whole word but only one char, so  
> [^table] means "any char which is not t, a, b, l or e".
> 
> Anyway the inside table word won't match your pattern, as there are '<' 
> and '>' in it, and these chars have to be escaped when used as simple text.
> So this should work:
> 
> re.compile(r'<table(|[ ].*)>.*</table>')
>                     ^ this is to avoid matching a tag name starting with
>                       table (like <table_ext>)

Doesn't work - the '.*' parts are greedy, so for example it matches all
of '<table></table><table></table>' as a single match (and in fact if
the html contains any number of tables it's going to match the string
starting at the start of the first table and ending at the end of the
last one.)
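
A small sketch of the problem, plus one possible alternative. The name
`innermost` and the sample string are made up for illustration, and the
second pattern is only an assumption about what the original poster
wants (a table with no other table opening inside it); a regex is not a
real HTML parser, so treat this as a rough approach rather than a
general solution.

```python
import re

# The pattern quoted above: both '.*' parts are greedy.
greedy = re.compile(r'<table(|[ ].*)>.*</table>')

html = '<table>a</table><p>x</p><table>b</table>'

# One match running from the first <table> to the last </table>.
greedy_span = greedy.search(html).group(0)

# Hypothetical alternative: refuse to step over another '<table' while
# scanning for the closing tag, so only tables without a nested table
# are matched.
innermost = re.compile(
    r'<table\b[^>]*>'        # opening tag; \b rejects names like table_ext
    r'(?:(?!<table\b).)*?'   # any char that does not start another <table
    r'</table>',
    re.IGNORECASE | re.DOTALL,
)

tables = innermost.findall(html)
print(greedy_span == html)
print(tables)
```

With nested tables this second pattern picks out only the inner one,
which seems to be what was asked for; for anything messier than that an
actual HTML parser is the safer choice.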

-- 
David C. Ullrich


