Regular expression problem

Tim Legant tim-dated-1015476021.3d1cbb at catseye.net
Wed Feb 27 23:40:21 EST 2002


Asheesh Laroia <pan-news at asheeshenterprises.com> writes:

> This is great, thanks!
> 
> Only one problem.  I'm having trouble (I did give it a try) making the
> following work:
> 
> 	<@Trap Body text:>Useful Text
> 
> I need to still be able to extract "Useful Text", not delete it.
> 
> Thanks again!

Try this:

>>> import re
>>> rc = re.compile(r'<@Trap\s+\w+\s+\w+=?(?:<.+?>)*>', re.MULTILINE|re.DOTALL)

>>> text = """<@Trap Body text=<FONT "Times"><CCOLOR\n   "Black"><
11><HORIZONTAL 100><LETTERSPACE 0><CTRACK 127><CSSIZE 70><C+SIZE\n
58.3><C-POSITION 33.3><C+POSITION 33.3><P><CBASELINE 0><CNOBREAK
0><CLEADING -0.05\n  ><GGRID 0><GLEFT 0><GRIGHT 0><GFIRST 19.2><G+BEFORE
0><G+AFTER 0><GALIGNMENT \n  "justify\n  "><GMETHOD "proportional"><G&
"ENGLISH"><GPAIRS 4><G% 120><GKNEXT 0><GKWIDOW \n  1><GKORPHAN\n
1><GTABS $><GHYPHENATION 2 36 0><GWORDSPACE 75 100 150><GSPACE -5 0
25>>Useful Text"""

>>> rc.sub('', text)
'Useful Text'
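
For reference, here is roughly the same pattern spelled out with re.VERBOSE so each piece can be commented. This is only a sketch: the names TRAP_TAG and strip_trap are made up for illustration, and accepting ':' as well as '=' after the second word is an assumption, added so that the short "<@Trap Body text:>" form quoted above is also covered.

```python
import re

# Illustrative names only; TRAP_TAG and strip_trap are not from the original post.
TRAP_TAG = re.compile(r"""
    <@Trap          # literal start of the tag
    \s+ \w+         # first word, e.g. Body
    \s+ \w+         # second word, e.g. text
    [=:]?           # optional separator (accepting ':' is an assumption)
    (?: < .+? > )*  # zero or more nested <...> groups, each matched lazily
    >               # the '>' that closes the <@Trap ...> tag itself
""", re.VERBOSE | re.DOTALL)


def strip_trap(text):
    """Remove every <@Trap ...> tag and keep the text that follows it."""
    return TRAP_TAG.sub('', text)


print(strip_trap('<@Trap Body text:>Useful Text'))
print(strip_trap('<@Trap Body text=<FONT "Times"><CCOLOR\n "Black">>Useful Text'))
```

re.DOTALL is what lets '.+?' run across the line breaks inside a nested group. The lazy '+?' keeps each group from swallowing the '>' that ends it.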


Tim



