Splitting a string

Fredrik Lundh fredrik at pythonware.com
Tue Feb 14 03:00:15 EST 2006


Nico Grubert wrote:

> I'd like to split a string where 'and', 'or', 'and not' occurs.
>
> Example string:
> s = 'Smith, R. OR White OR Blue, T. AND Black AND Red AND NOT Green'
>
> I need to split s in order to get this list:
> ['Smith, R.', 'White', 'Blue, T.', 'Black', 'Red', 'Green']
>
> Any idea, how I can split a string where 'and', 'or', 'and not' occurs?

try re.split:

>>> s = 'Smith, R. OR White OR Blue, T. AND Black AND Red AND NOT Green'

>>> import re
>>> re.split("AND NOT|AND|OR", s) # look for longest first!
['Smith, R. ', ' White ', ' Blue, T. ', ' Black ', ' Red ', ' Green']
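
the order matters because the regex engine tries the alternatives left to right and takes the first one that matches. a small sketch of what goes wrong when "AND" is listed before "AND NOT" (the names wrong_order and right_order are just for illustration):

```python
import re

s = 'Smith, R. OR White OR Blue, T. AND Black AND Red AND NOT Green'

# shorter alternative listed first: "AND" matches before "AND NOT" is ever tried
wrong_order = re.split("AND|OR|AND NOT", s)

# longest alternative first, as in the pattern above
right_order = re.split("AND NOT|AND|OR", s)

print(wrong_order[-1])   # ' NOT Green'  (the NOT is left behind in the data)
print(right_order[-1])   # ' Green'
```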

to get rid of the whitespace, you can either use strip

>>> [w.strip() for w in re.split("AND NOT|AND|OR", s)]
['Smith, R.', 'White', 'Blue, T.', 'Black', 'Red', 'Green']

or tweak the split pattern somewhat:

>>> re.split(r"\s*(?:AND NOT|AND|OR)\s*", s)
['Smith, R.', 'White', 'Blue, T.', 'Black', 'Red', 'Green']

to make the split case insensitive (so it matches "AND" as well as "and"
and "AnD" and any other combination), prepend (?i) to the pattern:

>>> re.split(r"(?i)\s*(?:and not|and|or)\s*", s)
['Smith, R.', 'White', 'Blue, T.', 'Black', 'Red', 'Green']
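
one thing to watch for: once the match is case insensitive, "or" and "and" can also match inside names such as "Gordon" or "Anderson". if that can occur in your data, a possible variation (not part of the patterns above) is to wrap the keywords in \b word boundaries. a minimal sketch, with a made-up input string:

```python
import re

# hypothetical input, chosen so the names contain "or" and "and"
s2 = "Gordon, A. or Anderson, B. and not Orwell, G."

# without word boundaries the keywords also match inside the names
naive = re.split(r"(?i)\s*(?:and not|and|or)\s*", s2)

# \b only lets a keyword match when it stands alone as a word
bounded = re.compile(r"\s*\b(?:and not|and|or)\b\s*", re.IGNORECASE)
parts = bounded.split(s2)

print(len(naive) > len(parts))  # True: the naive pattern cut the names apart
print(parts)                    # ['Gordon, A.', 'Anderson, B.', 'Orwell, G.']
```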

to keep the separators, change (?:...) to (...):

>>> re.split(r"(?i)\s*(and not|and|or)\s*", s)
['Smith, R.', 'OR', 'White', 'OR', 'Blue, T.', 'AND', 'Black', 'AND', 'Red',
'AND NOT', 'Green']
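
with the separators kept, names and operators alternate in the result, so slicing can pull them apart again. a rough sketch of one way to pair each operator with the name that follows it (the variable names are only for illustration):

```python
import re

s = 'Smith, R. OR White OR Blue, T. AND Black AND Red AND NOT Green'

tokens = re.split(r"(?i)\s*(and not|and|or)\s*", s)

# even positions hold the names, odd positions hold the captured separators
terms = tokens[0::2]
operators = [op.upper() for op in tokens[1::2]]

# pair every operator with the term on its right
pairs = list(zip(operators, terms[1:]))

print(terms[0])  # 'Smith, R.'
print(pairs)
```

the first name has no operator in front of it, which is why it is left out of the pairs.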

hope this helps!

</F>





