Regular expression problem

Wolfgang Grafen wolfgang.grafen at gmx.de
Wed Feb 27 20:19:56 EST 2002


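import re

# Raw strings keep the backslashes in \s and \Z intact for the regex engine.
# The pattern expects the whole string to be one tag: "=" starts an assigned
# value, a bare ">" means nothing was assigned, and \Z anchors at the end.
rc = re.compile(r"<@Trap Body text\s*"
                r"(?:(?P<assigned>=)|(?P<unassigned>>))\s*"
                r"(?P<rest>.*?)\s*\Z",
                re.MULTILINE | re.DOTALL).match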

# Sample inputs: a bare tag, a tag with a short value, and the full tag
# from the question.
t1 = '<@Trap Body text>'
t2 = '<@Trap Body text=<FONT "Times">'
t3="""<@Trap Body text=<FONT "Times"><CCOLOR\n   "Black"><SIZE 11><HORIZONTAL
100><LETTERSPACE 0><CTRACK 127><CSSIZE 70><C+SIZE\n  58.3><C-POSITION 33.3><C+POSITION
33.3><P><CBASELINE 0><CNOBREAK 0><CLEADING -0.05\n  ><GGRID 0><GLEFT 0><GRIGHT 0><GFIRST
19.2><G+BEFORE 0><G+AFTER 0><GALIGNMENT \n  "justify\n  "><GMETHOD "proportional"><G&
"ENGLISH"><GPAIRS 4><G% 120><GKNEXT 0><GKWIDOW \n  1><GKORPHAN\n  1><GTABS $><GHYPHENATION
2 36 0><GWORDSPACE 75 100 150><GSPACE -5 0 25>>"""

rc(t1).groups()
(None, '>', '')

rc(t2).groups()
('=', None, '<FONT "Times">')

rc(t3).groups()
('=', None, '<FONT "Times"><CCOLOR\n   "Black"><SIZE 11><HORIZONTAL 100><LETTERSPACE
0><CTRACK 127><CSSIZE 70><C+SIZE\n  58.3><C-POSITION 33.3><C+POSITION 33.3><P><CBASELINE
0><CNOBREAK 0><CLEADING -0.05\n  ><GGRID 0><GLEFT 0><GRIGHT 0><GFIRST 19.2><G+BEFORE
0><G+AFTER 0><GALIGNMENT \n  "justify\n  "><GMETHOD "proportional"><G& "ENGLISH"><GPAIRS
4><G% 120><GKNEXT 0><GKWIDOW \n  1><GKORPHAN\n  1><GTABS $><GHYPHENATION 2 36 0><GWORDSPACE
75 100 150><GSPACE -5 0 25>>')
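
If the tag sits inside a longer document instead of being the whole string, the \Z anchor in the pattern no longer helps. Below is a minimal sketch of one way to delete the tag in that case. It assumes that angle brackets inside the tag are always balanced and never occur inside quoted values; that is an assumption about the input, not something checked against the PageMaker format. The names strip_trap_tags and prefix are made up for this illustration.

```python
def strip_trap_tags(text, prefix="<@Trap Body text"):
    """Remove every tag starting with `prefix`, including nested <...> parts.

    Assumes '<' and '>' are balanced inside the tag (hypothetical helper).
    """
    out = []
    pos = 0
    while True:
        start = text.find(prefix, pos)
        if start == -1:
            # No further tag: keep the remainder and stop.
            out.append(text[pos:])
            break
        out.append(text[pos:start])
        depth = 0
        i = start
        # Walk forward, counting brackets until the opening '<' is closed.
        while i < len(text):
            if text[i] == "<":
                depth += 1
            elif text[i] == ">":
                depth -= 1
                if depth == 0:
                    break
            i += 1
        if i == len(text):
            # Unbalanced tag: leave the rest of the text untouched.
            out.append(text[start:])
            break
        pos = i + 1
    return "".join(out)


sample = 'before <@Trap Body text=<FONT "Times"><SIZE 11>> after'
cleaned = strip_trap_tags(sample)
```

Counting brackets avoids asking a regular expression to match nested delimiters, which the re module cannot do in general.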

cheers

wolfgang


Asheesh Laroia wrote:

> I have some SGML input (PageMaker 6.5 tagged text), and I want to be able
> to recognize (and delete) a tag.  That tag looks like:
>
>         <@Trap Body text:>
>
> It may also look like <@Trap Body text: useless-data>.
>
> So, I tried the regular expression r"<@.?>".  That doesn't match the
> above string.  Nor does r"<@.?Trap Body text.?>".  What RE should I be
> using, and why doesn't this work?
>
> Thanks in advance!
>
> -- Asheesh Laroia.
>
> PS: An example of the tag "in the wild" is the following string:
>
> <@Trap Body text=<FONT "Times"><CCOLOR
>  "Black"><SIZE 11><HORIZONTAL 100><LETTERSPACE 0><CTRACK 127><CSSIZE 70><C+SIZE
> 58.3><C-POSITION 33.3><C+POSITION 33.3><P><CBASELINE 0><CNOBREAK 0><CLEADING -0.05
> ><GGRID 0><GLEFT 0><GRIGHT 0><GFIRST 19.2><G+BEFORE 0><G+AFTER 0><GALIGNMENT "justify
> "><GMETHOD "proportional"><G& "ENGLISH"><GPAIRS 4><G% 120><GKNEXT 0><GKWIDOW 1><GKORPHAN
> 1><GTABS $><GHYPHENATION 2 36 0><GWORDSPACE 75 100 150><GSPACE -5 0 25>>
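
On the "why doesn't this work" part of the question: in a regular expression, ".?" matches at most one character, so r"<@.?>" can only match strings like "<@>" or "<@x>". A small sketch of the difference, using shortened sample tags made up for illustration:

```python
import re

tag = '<@Trap Body text: useless-data>'

# ".?" is "zero or one character", so this pattern finds nothing in the tag.
no_match = re.search(r"<@.?>", tag)

# ".*?" is "as few characters as possible, but any number of them".
# re.DOTALL lets "." also match the newlines found in the real tag.
short_match = re.search(r"<@.*?>", tag, re.DOTALL)

# With nested <...> parts, the non-greedy match stops at the first ">".
nested = '<@Trap Body text=<FONT "Times"><SIZE 11>>'
nested_match = re.search(r"<@.*?>", nested, re.DOTALL)

print(no_match)
print(short_match.group(0))
print(nested_match.group(0))
```

The last case shows why a simple ".*?" pattern is not enough for the tag "in the wild": it ends after the first nested element rather than at the closing ">>".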



