Reg Exp: Need advice concerning "greediness"

Franz GEIGER fgeiger at datec.at
Sat Sep 30 09:08:02 EDT 2000


Hello all,

I want to change the font colors of headings of a certain level in HTML files.

I have a line containing a heading level 1, e.g.: <h1><font
COLOR="#FF0000">Heading Level 1</font></h1>.

Now I want to split this into 3 groups: Everything before "COLOR=xyz",
"COLOR=xyz" itself, and everything after "COLOR=xyz".

I tried:
import re
sRslt = '<h1><font COLOR="#FF0000">Heading Level 1</font></h1>'
print(re.findall(re.compile(r'(.*?FONT.*?)(COLOR=.*?)*([ |>].*)', re.I |
re.S), sRslt))

This returns [('<h1><font', '', ' COLOR="#FF0000">Heading Level 1</font></h1>')].
I'd expected to receive [('<h1><font ', 'COLOR="#FF0000"', '>Heading Level
1</font></h1>')].

It works if I replace (COLOR=.*?)* with (COLOR=.*?). But I need the '*'
because there may be headings without the color attribute but with a face
attribute.
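
For comparison, here is a small sketch that runs both variants side by side on
the sample line. The names with_star and without_star are chosen only for
illustration; the patterns themselves are the two quoted above.

```python
import re

line = '<h1><font COLOR="#FF0000">Heading Level 1</font></h1>'

# Variant from the post: the COLOR group may repeat zero or more times.
with_star = re.compile(r'(.*?FONT.*?)(COLOR=.*?)*([ |>].*)', re.I | re.S)
# Variant without the '*': the COLOR group must match exactly once.
without_star = re.compile(r'(.*?FONT.*?)(COLOR=.*?)([ |>].*)', re.I | re.S)

star_result = with_star.findall(line)
plain_result = without_star.findall(line)

print(star_result)   # COLOR group comes back empty
print(plain_result)  # COLOR group holds the attribute
```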

As I have understood it so far, '*' means 'zero or more of the preceding, but
as many as possible'. If a color attribute is present, 'as many as possible'
means 'the one that is there', doesn't it? If there is no such attribute,
well - then it's 'zero'.
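
One way to probe this reading of '*' is to ask the match object where each
group starts and ends. The sketch below uses only standard re calls; the
variable names are hypothetical and the pattern is the one from the attempt
above.

```python
import re

line = '<h1><font COLOR="#FF0000">Heading Level 1</font></h1>'
pattern = re.compile(r'(.*?FONT.*?)(COLOR=.*?)*([ |>].*)', re.I | re.S)

m = pattern.match(line)

# Group 1 is lazy, so it stops as early as the rest of the pattern allows.
end_of_group1 = m.end(1)
# The text right after group 1 is where '(COLOR=...)*' gets tried.
next_text = line[end_of_group1:end_of_group1 + 6]
# span(2) is (-1, -1) when the group took no part in the match.
color_span = m.span(2)

print(repr(m.group(1)), repr(next_text), color_span)
```

In this run the text at that position starts with a space rather than with
COLOR, so zero repetitions is the only count available there. This suggests
that 'as many as possible' is decided at the position where the group is
tried, not over the whole line.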

What did I miss?

Best regards
Franz





