Regexp: unexpected splitting of string in several groups

Piet pit.grinja at gmx.de
Tue Jun 1 16:14:40 EDT 2004


"Robert Brewer" <fumanchu at amor.org> wrote in message news:<mailman.450.1086010426.6949.python-list at python.org>...
> Piet wrote:
> > vartype(width[,decimals]|list) further variable attributes.
> > Typical examples are:
> > char(30) binary
> > int(10) zerofill
> > float(3,2)...
> > I would like to extract the vartype, the bracketed string and the
> > further properties separately and thus defined the following regular
> > expression:
> > #snip
> > vartypePattern = re.compile(r"([a-zA-Z]+)(\(.*\))*([^(].*[^)])")
> > vartypeSplit = vartypePattern.match("float(3,2) not null")
> 
> You might try collecting the parentheses and "further attributes" into
> their own group:
> 
> >>> a = "char(30) binary"
> >>> b = "float"
> >>> pat = r"([a-zA-Z]+)((\(.*\))(.*))*"
> >>> re.match(pat, a).groups()
> ('char', '(30) binary', '(30)', ' binary')
> >>> re.match(pat, b).groups()
> ('float', None, None, None)
Thanks for the tip. The concept of "nested groups" in regexes seems
interesting. I tried that approach in a much simpler version at the
beginning and was quite irritated that some parts of the string are
returned twice. Sure, the result can be evaluated unambiguously, but I
preferred "one group for one match". What finally worked for me was

vartypePattern = re.compile(r"([a-zA-Z]*)(?:\((.*)\))*(.*)")
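
For reference, a minimal sketch of how this pattern behaves on the
examples from the original question. The helper name split_vartype and
the stricter variant strict_pattern are assumptions made for
illustration and do not come from the thread. The non-capturing group
(?:...) is what keeps the bracketed part from being returned twice.

```python
import re

# The pattern from the post, written as a raw string.
vartype_pattern = re.compile(r"([a-zA-Z]*)(?:\((.*)\))*(.*)")

# Hypothetical stricter variant (an assumption, not from the thread):
# [^)]* stops at the first closing parenthesis, and \s* drops the
# whitespace in front of the further attributes.
strict_pattern = re.compile(r"([a-zA-Z]+)(?:\(([^)]*)\))?\s*(.*)")


def split_vartype(definition, pattern=vartype_pattern):
    """Return (vartype, bracketed string or None, further attributes)."""
    match = pattern.match(definition)
    if match is None:
        raise ValueError("not a column type definition: %r" % definition)
    return match.groups()


print(split_vartype("char(30) binary"))      # ('char', '30', ' binary')
print(split_vartype("float(3,2) not null"))  # ('float', '3,2', ' not null')
print(split_vartype("float"))                # ('float', None, '')

# A special case: the greedy .* runs up to the LAST closing parenthesis.
print(split_vartype("int(10) default (0)"))
# ('int', '10) default (0', '')
print(split_vartype("int(10) default (0)", strict_pattern))
# ('int', '10', 'default (0)')
```

The stricter variant is only one possible way to handle attributes
that contain parentheses themselves. Whether it is needed depends on
the definitions that actually occur in the data.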

This seems to be similar to the ideas proposed above. I will definitely
keep those in mind in case I stumble over a special case that is not
handled correctly by that pattern. Anyway, my admiration goes to the
people who could quickly offer a solution to a problem that cost me
almost half a day without my getting it to work.
Thanks a lot
Peter


