ask for a RE pattern to match TABLE in html

MRAB google at mrabarnett.plus.com
Thu Jun 26 18:22:54 EDT 2008


On Jun 26, 7:26 pm, "David C. Ullrich" <dullr... at sprynet.com> wrote:
> In article <mailman.877.1214489508.1044.python-l... at python.org>,
>  Cédric Lucantis <o... at no-log.org> wrote:
>
>
>
> > On Thursday 26 June 2008 15:53:06, oyster wrote:
> > > that is, a TABLE with no TABLE tag nested inside it, for example
> > > <table >something without table tag</table>
> > > what is the RE pattern? thanks
>
> > > the following is not right
> > > <table.*?>[^table]*?</table>
>
> > The construct [abc] does not match a whole word but only one char, so  
> > [^table] means "any char which is not t, a, b, l or e".
>
> > Anyway the inside table word won't match your pattern, as there are '<'
> > and '>' in it, and these chars have to be escaped when used as simple text.
> > So this should work:
>
> > re.compile(r'<table(|[ ].*)>.*</table>')
> >                     ^ this is to avoid matching a tag name starting with
> >                     table (like <table_ext>)
>
> Doesn't work - for example it matches '<table></table><table></table>'
> (and in fact if the html contains any number of tables it's going
> to match the string starting at the start of the first table and
> ending at the end of the last one.)
>
Try something like:

re.compile(r'<table\b.*?>.*?</table>', re.DOTALL)
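A quick sketch of how this differs from the greedy pattern quoted above. The sample strings are made up for illustration, and the third pattern (the name "innermost") is not from this thread: it is a hypothetical variant that uses a negative lookahead to refuse any match containing a nested <table, which seems closer to what the original poster asked for.

```python
import re

# Greedy pattern quoted earlier in the thread: '.*' runs on to the LAST '</table>'.
greedy = re.compile(r'<table(|[ ].*)>.*</table>')

# Non-greedy pattern suggested above: '.*?' stops at the FIRST '</table>',
# '\b' rejects tag names such as <table_ext>, DOTALL lets '.' match newlines.
lazy = re.compile(r'<table\b.*?>.*?</table>', re.DOTALL)

# Hypothetical variant (an assumption, not from the thread): the lookahead
# '(?!<table\b)' forbids another opening table tag inside the match.
innermost = re.compile(r'<table\b[^>]*>(?:(?!<table\b).)*?</table>', re.DOTALL)

# Made-up sample inputs.
one_line = '<table></table><table></table>'
two_tables = '<table></table><table id="t2">\n<tr><td>x</td></tr>\n</table>'
nested = '<table><table>a</table></table>'

print(greedy.search(one_line).group(0))    # spans both tables
print(lazy.findall(two_tables))            # one entry per table
print(lazy.search(nested).group(0))        # outer open tag to inner close tag
print(innermost.findall(nested))           # only the table with no table inside
```

Note that the non-greedy pattern still pairs the outer opening tag with the inner closing tag when tables are nested, so it only does the right thing on input without nesting. Passing re.IGNORECASE as well would cover <TABLE> written in upper case.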


