splitting words with brackets

Tim Chase python.list at tim.thechases.com
Wed Jul 26 16:37:25 EDT 2006


 > "a (b c) d [e f g] h i"
 > should be split into
 > ["a", "(b c)", "d", "[e f g]", "h", "i"]
 >
 > As speed is a factor to consider, it's best if a single-line
 > regular expression can handle this.  I tried this but
 > failed:
 > re.split(r"(?![\(\[].*?)\s+(?!.*?[\)\]])", s).  It works
 > for "(a b) c" but not for "a (b c)" :(
 >
 > Any hint?

[and later added]
 > Sorry, I forgot to mention one constraint: if a letter is
 > next to a bracket, they should be treated as one word, e.g.
 > "a(b c) d" becomes ["a(b c)", "d"] because there is no
 > blank between "a" and "(".


 >>> import re
 >>> s = 'a (b c) d [e f g] h ia abcd(b c)xyz d [e f g] h i'
 >>> r = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+')
 >>> r.findall(s)
['a', '(b c)', 'd', '[e f g]', 'h', 'ia', 'abcd(b c)xyz', 'd', 
'[e f g]', 'h', 'i']
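
For readers who want to see how the pattern is put together, here
is the same expression written out with re.VERBOSE. This is only a
sketch for explanation: the name WORD and the comments are
additions, not part of the original one-liner, and the redundant
backslash inside [^\)] has been dropped.

```python
import re

# Same pattern as the one-liner above, spelled out piece by piece.
# WORD is a hypothetical name chosen for this sketch.
WORD = re.compile(r"""
    \S*                 # optional non-blank characters glued to the front
    (?:
        \( [^)]* \)     # a parenthesised group, blanks allowed inside
      |
        \[ [^\]]* \]    # a square-bracketed group, blanks allowed inside
    )
    \S*                 # optional non-blank characters glued to the back
  |
    \S+                 # otherwise, a plain run of non-blank characters
""", re.VERBOSE)

print(WORD.findall("a (b c) d [e f g] h i"))
print(WORD.findall("a(b c) d"))
```

The bracketed alternative comes first so that a word containing a
bracket group is tried as a whole before falling back to \S+. With
findall() and no capturing groups, each result is the full match.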

I'm sure there's a *much* more elegant pyparsing solution to
this, but I don't have the pyparsing module on this machine.
A pyparsing grammar would be much clearer and far more readable
when you come back to it later.

However, the above monstrosity passes the tests I threw at
it.
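
As a rough check, the cases from the question can be rerun
alongside two inputs the thread did not ask about. The nested and
double-group strings below are made up for illustration; they
show where a single non-nesting pattern of this kind stops
working, so treat them as a sketch of its limits rather than as
requirements.

```python
import re

r = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+')

# Cases taken from the question: these split as requested.
print(r.findall("a (b c) d [e f g] h i"))
# ['a', '(b c)', 'd', '[e f g]', 'h', 'i']
print(r.findall("a(b c) d"))
# ['a(b c)', 'd']

# Hypothetical inputs outside the stated requirements.
# Nested brackets: the group ends at the first ")".
print(r.findall("(a (b c) d)"))
# ['(a (b c)', 'd)']
# Two bracket groups in one word: only the first is kept whole.
print(r.findall("f(a b)(c d)"))
# ['f(a b)(c', 'd)']
```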

-tkc








