Splitting a quoted string.
Paul Melis
paul at science.uva.nl
Wed May 16 07:17:23 EDT 2007
Hi,
mosscliffe wrote:
> I am looking for a simple split function to create a list of entries
> from a string which contains quoted elements. Like in 'google'
> search.
>
> eg string = 'bob john "johnny cash" 234 june'
>
> and I want to have a list of ['bob', 'john', 'johnny cash', '234',
> 'june']
>
> I wondered about using the csv routines, but I thought I would ask the
> experts first.
>
> There may be a simple function, but as yet I have not found it.
Here is a not-so-simple function using regular expressions. It repeatedly
matches two regexps, one that matches any sequence of characters except
a space and one that matches a double-quoted string. If there are two
matches the one occurring first in the string is taken and the matching
part of the string cut off. This is repeated until the whole string is
matched. If there are two matches at the same point in the string the
longer of the two matches is taken. (The A|B operator in a single regexp
does not do this for you, as it tries the alternatives from left to
right: if A matches then it is returned, even if B would match a longer
string.)
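The remark about the A|B operator can be checked with a quick sketch. It
uses the same two sub-patterns as the function below; the variable names
(word, quoted, combined, longest) are only made up for this illustration.

```python
import re

# Same two sub-patterns as in split_string: a run of non-space characters,
# and a double-quoted string.
word = r'[^ ]+'
quoted = r'"[^"]*"'

s = '"johnny cash" 234'

# Alternation tries the alternatives left to right at each position,
# so the shorter 'word' match wins here even though 'quoted' is longer.
combined = re.match(word + '|' + quoted, s).group(0)

# Matching the two patterns separately and comparing lengths picks the
# longer match instead.
candidates = [re.match(word, s).group(0), re.match(quoted, s).group(0)]
longest = max(candidates, key=len)

print(combined)  # "johnny
print(longest)   # "johnny cash"
```

That is why the function matches the two patterns separately and compares
the results itself.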
import re

def split_string(s):

    pat1 = re.compile('[^ ]+')
    pat2 = re.compile('"[^"]*"')

    parts = []

    m1 = pat1.search(s)
    m2 = pat2.search(s)
    while m1 or m2:

        if m1 and m2:
            # Both match, take match occurring earliest in the string
            p1 = m1.group(0)
            p2 = m2.group(0)
            if m1.start(0) < m2.start(0):
                part = p1
                s = s[m1.end(0):]
            elif m2.start(0) < m1.start(0):
                part = p2
                s = s[m2.end(0):]
            else:
                # Both match at the same string position, take longest match
                if len(p1) > len(p2):
                    part = p1
                    s = s[m1.end(0):]
                else:
                    part = p2
                    s = s[m2.end(0):]
        elif m1:
            part = m1.group(0)
            s = s[m1.end(0):]
        else:
            part = m2.group(0)
            s = s[m2.end(0):]

        parts.append(part)

        m1 = pat1.search(s)
        m2 = pat2.search(s)

    return parts
>>> s = 'bob john "johnny cash" 234 june'
>>> split_string(s)
['bob', 'john', '"johnny cash"', '234', 'june']
>>>
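Note that the quoted part still carries its double quotes in the result
above, while your example list has them removed. A small post-processing
step could take care of that. This is only a sketch: strip_quotes is a name
made up here, and it assumes that only a single pair of surrounding double
quotes needs to go.

```python
def strip_quotes(parts):
    """Remove one pair of surrounding double quotes from each part, if present."""
    result = []
    for part in parts:
        # Only strip when the part both starts and ends with a double quote.
        if len(part) >= 2 and part.startswith('"') and part.endswith('"'):
            part = part[1:-1]
        result.append(part)
    return result

# The list returned by split_string in the session above.
parts = ['bob', 'john', '"johnny cash"', '234', 'june']
print(strip_quotes(parts))  # ['bob', 'john', 'johnny cash', '234', 'june']
```

Parts without surrounding quotes are passed through unchanged.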
Paul