[regexp] Where's the error in this ini-file reading regexp?

F. GEIGER fgeiger at datec.at
Wed Nov 13 14:05:15 EST 2002


Dear all,

I have to parse a string which contains data in ini-file-format, i.e.:

   s = \
'''
[Section 1]
Key11=Value11
Key12=Value12
Key13=Value13
[Section 2]
Key21=Value21
Key22=Value22
Key23=Value23
'''

I decided to try a solution using re (yes, ConfigParser or simple string
splitting would be another way to do it), because the structure really is
regular: sections, which contain key/value pairs.
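For reference, the ConfigParser route mentioned above might look roughly like this. It is only a sketch: it assumes a current Python, where the module is spelled configparser and read_string() is available, and the name "result" is made up for illustration.

```python
import configparser  # spelled ConfigParser in older Python versions

s = '''
[Section 1]
Key11=Value11
Key12=Value12
Key13=Value13
[Section 2]
Key21=Value21
Key22=Value22
Key23=Value23
'''

parser = configparser.ConfigParser()
# By default option names are lower-cased; keep them exactly as written.
parser.optionxform = str
parser.read_string(s)

# Map each section name to its list of (key, value) tuples.
result = {name: parser.items(name) for name in parser.sections()}
```

Here result["Section 1"] should hold all three pairs of the first section, which is exactly what the regexp attempt below fails to deliver.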

I'm sure there's only a tiny step left to succeed.

I tried:

import re

rex = re.compile(r"(\s*(\[.+\])(\s*((.+)=(.+)))+?)",
                 re.MULTILINE|re.IGNORECASE)
L = rex.findall(s)

where s is the string already shown above.

What I get is:

[('\n[Section 1]\nKey11=Value11', '[Section 1]', '\nKey11=Value11', \
'Key11=Value11', 'Key11', 'Value11'), ('\n[Section 2]\nKey21=Value21', \
'[Section 2]', '\nKey21=Value21', 'Key21=Value21', 'Key21', 'Value21')]

So L[0][1] contains the string '[Section 1]', L[0][4] the string 'Key11',
L[0][5] the string 'Value11'.
'[Section 2]' is also there, in L[1].

So what I have is both sections, but only one (i.e. the first one) key/value
pair for each of those two sections.

When I remove the last '?' in ...(.+=.+))+?)" then I get the last key/value
pair instead of the first one. So this must have to do with
greediness/non-greediness. What I wonder is, why do I not get all three
key/value pairs? How can I tell the re engine "gimme *all* groups having
key=value form", when a '+' delivers only the last, and '+?' only the first
of those pairs?
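As far as I can tell, a capturing group that sits inside a repetition only keeps the text of its last iteration, which would explain both results. The sketch below first shows that on a tiny string and then tries a two-pass fallback: one expression grabs each section with its whole block of key=value lines, a second one splits that block into pairs. The names section_rex, pair_rex and result are made up for illustration, and this is a workaround under those assumptions rather than the single-regexp answer I am after.

```python
import re

# A group inside a repetition remembers only one iteration:
# the last one when greedy, the first one when non-greedy.
greedy = re.match(r"(?:(\w+)=(\w+)\n)+", "a=1\nb=2\nc=3\n")
lazy = re.match(r"(?:(\w+)=(\w+)\n)+?", "a=1\nb=2\nc=3\n")
# greedy.groups() is ('c', '3'), lazy.groups() is ('a', '1')

s = '''
[Section 1]
Key11=Value11
Key12=Value12
Key13=Value13
[Section 2]
Key21=Value21
Key22=Value22
Key23=Value23
'''

# Pass 1: one match per section; the second group captures the
# complete run of key=value lines that follows the header.
section_rex = re.compile(r"^\[(.+)\]\n((?:.+=.+\n?)*)", re.MULTILINE)
# Pass 2: one match per key=value line inside such a run.
pair_rex = re.compile(r"^(.+?)=(.*)$", re.MULTILINE)

result = {}
for name, body in section_rex.findall(s):
    result[name] = pair_rex.findall(body)
```

With the string s from above, result maps 'Section 1' and 'Section 2' to lists of three (key, value) tuples each.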


Many thanks in advance
Franz GEIGER


P.S.: You guessed it, I use regular expressions only from time to time.
Therefore I almost always run into trouble like this. Would Friedl's
Mastering Regular Expressions be an option for people like me? Does anyone
own it? Does it have a cookbook section too? Any other pointers besides
those "if you want to parse a telephone number..." examples?






