Regular Expressions

Benjamin Arai benjamin at araisoft.com
Mon Apr 26 11:57:51 EDT 2004


I would just use the re module, because a regular expression lets you
get right down to the data in one pass, without a separate parsing
step.  Running the page through HTMLParser first may just add
unneeded processing time.
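
A minimal sketch of what I mean, assuming the page looks like the
sample quoted below (the opening <HTML><BODY> tags and the variable
names are my own assumptions, and it is written for a current
Python 3):

```python
import re

# Sample input reconstructed from the quoted message; the opening
# <HTML><BODY> tags are assumed, since the quote starts mid-page.
html = """<HTML><BODY>
<FONT COLOR="#FF0000">A - TYPE1: any_text<BR>
B - TYPE2: any_text_2<BR>
C - TYPE2: any_text_3<BR>
w - any_text_15<BR>
</FONT>
html code
</BODY></HTML>"""

# One word character, the literal " - TYPE2: ", then everything up to
# the next <BR> (non-greedy, so each match stops at its own line).
pattern = re.compile(r"(\w) - TYPE2: (.*?)<BR>")

pairs = pattern.findall(html)
print(pairs)  # [('B', 'any_text_2'), ('C', 'any_text_3')]
```

This only holds up while the markup stays as regular as the sample; if
the <BR> tags change case or pick up attributes, the pattern would need
adjusting.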

On Mon, 2004-04-26 at 06:38, Diez B. Roggisch wrote:

> > <FONT COLOR="#FF0000">A - TYPE1: any_text<BR>
> > B - TYPE2: any_text_2<BR>
> > C - TYPE2: any_text_3<BR>
> > w - any_text_15<BR>
> > </FONT>
> > html code
> > </BODY></HTML>
> > 
> > I need to have only following data:
> > (B, any_text_2)
> > (C, any_text_3)
> > that is, only the lines in which TYPE2 appears.
> 
> You should use the HTMLParser class to extract the text first. Then this
> regular expression might help:
> 
> r"(.) TYPE. : (.*)"
> 
>  
> -- 
> Regards,
> 
> Diez B. Roggisch
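
For comparison, the two-step approach quoted above could be sketched
roughly as follows. Note that the quoted pattern, as written, would not
match the sample lines (they have " - " after the letter and no space
before the colon), so this sketch uses an adjusted pattern. The
TextExtractor class name and the sample's opening tags are assumptions
for illustration, and the import path is the Python 3 one:

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text content and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Treat <BR> as a line break so each record stays on its own line.
        if tag == "br":
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


# Sample input reconstructed from the quoted message (opening tags assumed).
html = """<HTML><BODY>
<FONT COLOR="#FF0000">A - TYPE1: any_text<BR>
B - TYPE2: any_text_2<BR>
C - TYPE2: any_text_3<BR>
w - any_text_15<BR>
</FONT>
html code
</BODY></HTML>"""

extractor = TextExtractor()
extractor.feed(html)
extractor.close()
text = "".join(extractor.chunks)

# With the tags gone, match whole lines of the form "X - TYPE2: rest".
type2_pairs = re.findall(r"^(\w) - TYPE2: (.*)$", text, re.MULTILINE)
print(type2_pairs)  # [('B', 'any_text_2'), ('C', 'any_text_3')]
```

It gives the same pairs, at the cost of a full pass through the parser
before the regular expression ever runs.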

Benjamin Arai
Araisoft

Email: benjamin at araisoft.com
Website: http://www.araisoft.com