Splitting a string
Fredrik Lundh
fredrik at pythonware.com
Tue Feb 14 03:00:15 EST 2006
Nico Grubert wrote:
> I'd like to split a string where 'and', 'or', 'and not' occurs.
>
> Example string:
> s = 'Smith, R. OR White OR Blue, T. AND Black AND Red AND NOT Green'
>
> I need to split s in order to get this list:
> ['Smith, R.', 'White', 'Blue, T.', 'Black', 'Red', 'Green']
>
> Any idea, how I can split a string where 'and', 'or', 'and not' occurs?
try re.split:
>>> s = 'Smith, R. OR White OR Blue, T. AND Black AND Red AND NOT Green'
>>> import re
>>> re.split("AND NOT|AND|OR", s) # look for longest first!
['Smith, R. ', ' White ', ' Blue, T. ', ' Black ', ' Red ', ' Green']
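Why longest first: Python's re tries the alternatives of a | pattern from left to right and takes the first one that matches at a given position. It does not search for the longest match. A quick check with the same s (a sketch added for illustration, not part of the original reply) shows what happens when the shorter AND is listed before AND NOT:

```python
import re

s = 'Smith, R. OR White OR Blue, T. AND Black AND Red AND NOT Green'

# Shorter alternative first: "AND" already matches at the start of
# "AND NOT", so "NOT" is left behind in the last piece.
wrong = re.split("AND|AND NOT|OR", s)

# Longest alternative first, as in the pattern above.
right = re.split("AND NOT|AND|OR", s)

print(repr(wrong[-1]))  # ' NOT Green'
print(repr(right[-1]))  # ' Green'
```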
to get rid of the whitespace, you can either use strip
>>> [w.strip() for w in re.split("AND NOT|AND|OR", s)]
['Smith, R.', 'White', 'Blue, T.', 'Black', 'Red', 'Green']
or tweak the split pattern somewhat:
>>> re.split(r"\s*(?:AND NOT|AND|OR)\s*", s)
['Smith, R.', 'White', 'Blue, T.', 'Black', 'Red', 'Green']
to make the split case insensitive (so it matches "AND" as well as "and"
and "AnD" and any other combination), prepend (?i) to the pattern:
>>> re.split(r"(?i)\s*(?:and not|and|or)\s*", s)
['Smith, R.', 'White', 'Blue, T.', 'Black', 'Red', 'Green']
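The same effect can be had by passing flags=re.IGNORECASE to re.split instead of embedding (?i). One thing to watch once matching ignores case: a bare "and" or "or" also matches inside ordinary words. The sketch below uses made-up names ("Sanders", "Ford", "Moore", not from the original question) to show the problem, then adds \b word boundaries around the keywords. It assumes the names themselves never contain a standalone "and" or "or"; split_terms is a hypothetical helper name.

```python
import re

s = 'Smith, R. OR White OR Blue, T. AND Black AND Red AND NOT Green'

# Same pattern as above, with the flag passed as an argument.
# Without word boundaries, "and"/"or" also match inside words.
naive = re.split(r"\s*(?:and not|and|or)\s*", "Sanders and Ford OR Moore",
                 flags=re.IGNORECASE)
print(naive)  # "Sanders", "Ford" and "Moore" all get cut apart

# \b requires a word boundary on both sides, so the keywords only
# match as whole words. split_terms is a hypothetical helper.
KEYWORDS = r"\s*\b(?:and not|and|or)\b\s*"

def split_terms(text):
    """Split text on AND NOT / AND / OR (any case), whole words only."""
    return re.split(KEYWORDS, text, flags=re.IGNORECASE)

print(split_terms("Sanders and Ford OR Moore"))  # ['Sanders', 'Ford', 'Moore']
print(split_terms(s))
```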
to keep the separators, change (?:...) to (...):
>>> re.split(r"(?i)\s*(and not|and|or)\s*", s)
['Smith, R.', 'OR', 'White', 'OR', 'Blue, T.', 'AND', 'Black', 'AND', 'Red',
'AND NOT', 'Green']
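With a capturing group, the separators end up at the odd positions of the result and the terms at the even ones, so plain slicing can pull them apart again. A sketch, assuming you want each operator paired with the term that follows it and normalised to upper case (both are additions for illustration, not part of the original answer):

```python
import re

s = 'Smith, R. OR White OR Blue, T. AND Black AND Red AND NOT Green'

parts = re.split(r"(?i)\s*(and not|and|or)\s*", s)

terms = parts[0::2]                             # even positions: the names
operators = [op.upper() for op in parts[1::2]]  # odd positions: captured separators

# Pair each operator with the term that comes after it.
pairs = list(zip(operators, terms[1:]))
print(terms)
print(pairs)
```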
hope this helps!
</F>
More information about the Python-list mailing list