Regular expression problem
Wolfgang Grafen
wolfgang.grafen at gmx.de
Wed Feb 27 20:19:56 EST 2002
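import re
# Raw strings keep \s and \Z as regex escapes rather than Python string escapes.
# \Z anchors at the end of the string, so each test string below must contain
# the tag and nothing after it.
rc=re.compile(r"<@Trap Body text\s*"
r"(?:(?P<assigned>=)|(?P<unassigned>>))\s*"
r"(?P<rest>.*?)\s*\Z",
re.MULTILINE|re.DOTALL).match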
t1='<@Trap Body text>'
t2='<@Trap Body text=<FONT "Times">'
t3="""<@Trap Body text=<FONT "Times"><CCOLOR\n "Black"><SIZE 11><HORIZONTAL
100><LETTERSPACE 0><CTRACK 127><CSSIZE 70><C+SIZE\n 58.3><C-POSITION 33.3><C+POSITION
33.3><P><CBASELINE 0><CNOBREAK 0><CLEADING -0.05\n ><GGRID 0><GLEFT 0><GRIGHT 0><GFIRST
19.2><G+BEFORE 0><G+AFTER 0><GALIGNMENT \n "justify\n "><GMETHOD "proportional"><G&
"ENGLISH"><GPAIRS 4><G% 120><GKNEXT 0><GKWIDOW \n 1><GKORPHAN\n 1><GTABS $><GHYPHENATION
2 36 0><GWORDSPACE 75 100 150><GSPACE -5 0 25>>"""
rc(t1).groups()
(None, '>', '')
rc(t2).groups()
('=', None, '<FONT "Times">')
rc(t3).groups()
('=', None, '<FONT "Times"><CCOLOR\n "Black"><SIZE 11><HORIZONTAL 100><LETTERSPACE
0><CTRACK 127><CSSIZE 70><C+SIZE\n 58.3><C-POSITION 33.3><C+POSITION 33.3><P><CBASELINE
0><CNOBREAK 0><CLEADING -0.05\n ><GGRID 0><GLEFT 0><GRIGHT 0><GFIRST 19.2><G+BEFORE
0><G+AFTER 0><GALIGNMENT \n "justify\n "><GMETHOD "proportional"><G& "ENGLISH"><GPAIRS
4><G% 120><GKNEXT 0><GKWIDOW \n 1><GKORPHAN\n 1><GTABS $><GHYPHENATION 2 36 0><GWORDSPACE
75 100 150><GSPACE -5 0 25>>')
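The pattern above only works when the string is the tag and nothing else, because it anchors on \Z. If the tag sits inside a larger document and has to be deleted, counting angle brackets may be easier than a single regular expression, since the tag contains nested <...> groups. What follows is only a minimal sketch: strip_tag and its opener parameter are names made up for this example, and it assumes that no quoted value inside the tag contains a bare "<" or ">".

```python
def strip_tag(text, opener="<@Trap Body text"):
    """Remove every tag starting with `opener`, including nested <...> groups.

    Hypothetical helper for illustration; assumes angle brackets are balanced
    and never appear inside quoted values.
    """
    out = []
    pos = 0
    while True:
        start = text.find(opener, pos)
        if start == -1:
            out.append(text[pos:])      # no more tags: keep the remainder
            break
        out.append(text[pos:start])     # keep everything before the tag
        depth = 0
        i = start
        while i < len(text):
            if text[i] == "<":
                depth += 1
            elif text[i] == ">":
                depth -= 1
                if depth == 0:          # found the '>' closing the outer tag
                    break
            i += 1
        if i >= len(text):              # unbalanced tag: leave it untouched
            out.append(text[start:])
            break
        pos = i + 1                     # continue scanning after the tag
    return "".join(out)


print(strip_tag('before <@Trap Body text=<FONT "Times"><SIZE 11>> after'))
```

Because the scan tracks nesting depth, it stops at the ">" that closes the outer tag rather than at the first ">" it meets.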
cheers
wolfgang
Asheesh Laroia wrote:
> I have some SGML input (PageMaker 6.5 tagged text), and I want to be able
> to recognize (and delete) a tag. That tag looks like:
>
> <@Trap Body text:>
>
> It may also look like <@Trap Body text: useless-data>.
>
> So, I tried the regular expression r"<@.?>". That doesn't match the
> above string. Nor does r"<@.?Trap Body text.?>". What RE should I be
> using, and why doesn't this work?
>
> Thanks in advance!
>
> -- Asheesh Laroia.
>
> PS: An example of the tag "in the wild" is the following string:
>
> <@Trap Body text=<FONT "Times"><CCOLOR
> "Black"><SIZE 11><HORIZONTAL 100><LETTERSPACE 0><CTRACK 127><CSSIZE 70><C+SIZE
> 58.3><C-POSITION 33.3><C+POSITION 33.3><P><CBASELINE 0><CNOBREAK 0><CLEADING -0.05
> ><GGRID 0><GLEFT 0><GRIGHT 0><GFIRST 19.2><G+BEFORE 0><G+AFTER 0><GALIGNMENT "justify
> "><GMETHOD "proportional"><G& "ENGLISH"><GPAIRS 4><G% 120><GKNEXT 0><GKWIDOW 1><GKORPHAN
> 1><GTABS $><GHYPHENATION 2 36 0><GWORDSPACE 75 100 150><GSPACE -5 0 25>>
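On the "why doesn't this work?" part of the question: as written, ".?" matches at most one character, so r"<@.?>" needs the ">" to follow "<@" almost immediately. A non-greedy ".*?" gets further, but it stops at the first ">", which is too early once the tag contains nested <...> groups, and "." does not cross a newline without re.DOTALL. A small check, using the tag forms quoted above (the variable names are just for this example):

```python
import re

simple = "<@Trap Body text:>"
nested = '<@Trap Body text=<FONT "Times">>'

# ".?" is "zero or one character", so nothing can bridge "<@" and ">".
no_match = re.search(r"<@.?>", simple)
print(no_match)                      # None

# ".*?" is "as few characters as possible", which is enough for the simple form.
lazy_simple = re.search(r"<@.*?>", simple).group(0)
print(lazy_simple)                   # <@Trap Body text:>

# On the nested form, the lazy match ends at the first ">", cutting the tag short.
lazy_nested = re.search(r"<@.*?>", nested).group(0)
print(lazy_nested)                   # <@Trap Body text=<FONT "Times">
```

The last result is one ">" short of the full tag, which is why a plain non-greedy pattern is not enough for the "in the wild" example.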