Regular Expressions

Robert Brewer fumanchu at amor.org
Mon Apr 26 11:35:10 EDT 2004


sjf wrote:
> I would like some help building a regular expression.
> My files contain the following piece of HTML code:
> 
> <FONT COLOR="#FF0000">A - TYPE1: any_text<BR>
> B - TYPE2: any_text_2<BR>
> C - TYPE2: any_text_3<BR>
> w - any_text_15<BR>
> </FONT>
> html code
> </BODY></HTML>
> 
> I need only the following data:
> (B, any_text_2)
> (C, any_text_3)
> that is, only the entries marked TYPE2.

If you can guarantee that every TYPE2 entry is on its own line with the
same formatting, findall with a multiline pattern will do it:

>>> s = ('<FONT COLOR="#FF0000">A - TYPE1: any_text<BR>\n'
...      'B - TYPE2: any_text_2<BR>\n'
...      'C - TYPE2: any_text_3<BR>\n'
...      'w - any_text_15<BR>\n'
...      '</FONT>\nhtml code')
>>> import re
>>> re.findall(r'(?m)^(.) - TYPE2: (.*)<BR>$', s)
[('B', 'any_text_2'), ('C', 'any_text_3')]
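
In the pattern, (?m) turns on multiline mode so that ^ and $ match at the
start and end of each line, (.) captures the single-character label, and
(.*) captures everything between "TYPE2: " and the closing <BR>. If you
want to reuse it across files, something like the following sketch might
do. The names TYPE2_LINE and extract_type2 are only for illustration, and
it still assumes one entry per line in exactly the format you showed:

```python
import re

# Same pattern as in the session above, with re.MULTILINE passed as a
# flag instead of the inline (?m). TYPE2_LINE is a hypothetical name.
TYPE2_LINE = re.compile(r'^(.) - TYPE2: (.*)<BR>$', re.MULTILINE)

def extract_type2(html):
    """Return (label, text) pairs for lines like 'X - TYPE2: text<BR>'."""
    return TYPE2_LINE.findall(html)

sample = (
    '<FONT COLOR="#FF0000">A - TYPE1: any_text<BR>\n'
    'B - TYPE2: any_text_2<BR>\n'
    'C - TYPE2: any_text_3<BR>\n'
    'w - any_text_15<BR>\n'
    '</FONT>\nhtml code'
)

print(extract_type2(sample))
```

Note that because of the ^ anchor, a TYPE2 entry that shares a line with
the opening <FONT> tag (as the TYPE1 entry does in your sample) would not
be picked up.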


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org