splitting words with brackets

Simon Forman rogue_pedro at yahoo.com
Wed Jul 26 16:39:06 EDT 2006


Qiangning Hong wrote:
> faulkner wrote:
> > re.findall('\([^\)]*\)|\[[^\]]*|\S+', s)
>
> sorry i forgot to give a limitation: if a letter is next to a bracket,
> they should be considered as one word. i.e.:
> "a(b c) d" becomes ["a(b c)", "d"]
> because there is no blank between "a" and "(".

This variation seems to do it:

import re

s = "a (b c) d [e f g] h i(j k) l [m n o]p q"

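def splitup(s):
    # Raw string keeps the backslashes intact; re.VERBOSE ignores the
    # whitespace used to lay out the three alternatives.
    return re.findall(r'''
        \S*\( [^\)]* \)\S*  |
        \S*\[ [^\]]* \]\S*  |
        \S+
        ''', s, re.VERBOSE)

print(splitup(s))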

# Prints

['a', '(b c)', 'd', '[e f g]', 'h', 'i(j k)', 'l', '[m n o]p', 'q']
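
As a quick check against the "a(b c) d" case from the quoted message, here
is a minimal sketch of the same pattern, precompiled and with each
alternative commented. The names WORD_PATTERN and split_words are only
illustrative and do not come from the thread.

```python
import re

# Same three alternatives as in splitup(), written as a raw string and
# compiled once. WORD_PATTERN and split_words are illustrative names.
WORD_PATTERN = re.compile(r'''
    \S*\( [^)]* \)\S*   |   # optional prefix, a (...) group, optional suffix
    \S*\[ [^\]]* \]\S*  |   # optional prefix, a [...] group, optional suffix
    \S+                     # any other run of non-blank characters
    ''', re.VERBOSE)

def split_words(s):
    """Return the words of s, keeping a bracketed group with its neighbours."""
    return WORD_PATTERN.findall(s)

print(split_words("a(b c) d"))   # ['a(b c)', 'd']
```

Note that this only keeps one bracketed group per word. Something like
"f(a b)(c d)" still gets split at the blank inside the second group, and
nested brackets are not handled either.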


Peace,
~Simon



