Regexp: unexpected splitting of string in several groups

Christos TZOTZIOY Georgiou tzot at sil-tec.gr
Mon May 31 18:28:53 EDT 2004


On 31 May 2004 04:41:11 -0700, rumours say that pit.grinja at gmx.de (Piet)
might have written:

>vartype is a simple string(varchar, tinyint ...) which might be
>followed by a string in curved brackets. This bracketed string is
>either composed of a single number, two numbers separated by a comma,
>or a list of strings separated by a comma. After the bracketed string,
>there might be a list of further strings (separated by blanks)
>describing some more properties of the column.
>Typical examples are:
>char(30) binary
>int(10) zerofill
>float(3,2)...
>I would like to extract the vartype, the bracketed string and the
>further properties separately and thus defined the following regular
>expression:

Does this RE work for you?

import re

tre = re.compile(r"(\w+)"                    # vartype
                 r"(?:\((\w+(?:,\w+)*)\))?"  # optional bracketed string
                 r"(\s+\w+)*")               # further properties

For your examples:

>>> tre.match("char(30) binary").groups()
('char', '30', ' binary')
>>> tre.match("int(10) zerofill").groups()
('int', '10', ' zerofill')
>>> tre.match("float(3,2)").groups()
('float', '3,2', None)

PS1 If you make the RE slightly more complex, you can avoid the leading
space in the third ("properties") group.  I also assumed there is no
space between the "vartype" and the left parenthesis (when there is one).
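
A possible sketch of the "slightly more complex" RE mentioned in PS1. The
name tre2 and the extra test strings are assumptions for illustration only.
It moves the whitespace outside the capturing part of the third group and
captures all trailing properties at once; in Python's re module a repeated
capturing group such as (\s+\w+)* keeps only its last repetition. It also
tolerates optional whitespace before the left parenthesis, which the RE
above does not.

```python
import re

# Hypothetical variant of the RE above; tre2 is an illustrative name.
tre2 = re.compile(
    r"(\w+)"                          # vartype
    r"(?:\s*\((\w+(?:,\w+)*)\))?"     # optional bracketed string, space allowed before "("
    r"(?:\s+(\w+(?:\s+\w+)*))?"       # all further properties, without the leading space
)

print(tre2.match("char(30) binary").groups())
# ('char', '30', 'binary')
print(tre2.match("int(10) unsigned zerofill").groups())
# ('int', '10', 'unsigned zerofill')
print(tre2.match("float(3,2)").groups())
# ('float', '3,2', None)

# The properties can then be split into a list if needed.
props = tre2.match("int(10) unsigned zerofill").group(3)
print(props.split() if props else [])
# ['unsigned', 'zerofill']
```

Like the RE above, this does not handle quoted strings inside the brackets.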

PS2 redemo.py, somewhere in your Python installation, is a good friend
of yours.

PS3 I have been a fan of regular expressions for years, and I often
overuse them.  Perhaps somebody else can give you better advice than I can.
-- 
TZOTZIOY, I speak England very best,
"I have a cunning plan, m'lord" --Sean Bean as Odysseus/Ulysses
